# Module 4: AI-Powered Data Analytics 🤖

## Learning Objectives
By the end of this module, you will be able to:
- Set up and configure AI tools for data analysis
- Use AI to accelerate data cleaning and exploration
- Apply prompt engineering techniques for effective AI assistance
- Generate analysis code and visualizations with AI
- Create automated reports and insights
- Understand ethical considerations and limitations of AI in analytics

## Course Context
This module builds on your Excel, SQL, and Python skills from previous modules, adding AI as a powerful accelerator to your data analytics toolkit.

## 📚 Today's Agenda
1. **AI Tools Setup** - Configure OpenAI/Azure OpenAI for data work
2. **AI-Assisted Data Cleaning** - Automate quality checks and cleaning
3. **Prompt Engineering** - Master effective AI communication
4. **Code Generation** - Let AI write analysis scripts
5. **AI Visualizations** - Smart chart recommendations
6. **Statistical Analysis** - AI-guided statistical testing
7. **Automated Reporting** - Generate insights automatically
8. **Ethics & Best Practices** - Responsible AI usage

Let's revolutionize your data analysis workflow! 🚀

## 🔧 Lab 1: AI Tools Setup

### Environment Configuration
Before we begin, let's set up our AI-powered analytics environment.

In [None]:
# First, ensure you have the required libraries installed
# Run this in your terminal: pip install openai pandas numpy matplotlib seaborn plotly python-dotenv

# Import required libraries
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
from dotenv import load_dotenv
import openai
import warnings
warnings.filterwarnings('ignore')

# Set display options for better output
pd.set_option('display.max_columns', None)
plt.style.use('seaborn-v0_8')

print("✅ Libraries imported successfully!")
print("📝 Next: Configure your API key in a .env file")

In [None]:
# Load environment variables and configure OpenAI
load_dotenv()

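# Configure AI client (add your API key to .env file)
# Define both names up front so they exist even if setup fails
client = None
model_name = os.getenv('OPENAI_MODEL', 'gpt-4')
try:
    api_key = os.getenv('OPENAI_API_KEY')
    if not api_key:
        raise ValueError("OPENAI_API_KEY is not set")
    client = openai.OpenAI(api_key=api_key)
    print("✅ OpenAI client created (the key is checked on the first request)")
except Exception as e:
    print("❌ API setup issue. Please check your .env file contains:")
    print("OPENAI_API_KEY=your_api_key_here")
    print(f"Error: {e}")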

# Helper function for AI interactions
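def ask_ai(prompt, temperature=0.3, max_tokens=1000):
    """Send a prompt to the model and return the reply text.

    On failure this returns a string starting with "Error:" instead of
    raising, so check for that prefix before using the result.
    """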
    try:
        response = client.chat.completions.create(
            model=model_name,
            messages=[
                {"role": "system", "content": "You are an expert data analyst assistant."},
                {"role": "user", "content": prompt}
            ],
            temperature=temperature,
            max_tokens=max_tokens
        )
        return response.choices[0].message.content
    except Exception as e:
        return f"Error: {e}"

# Test the connection
test_result = ask_ai("Say 'AI analytics assistant ready!' in a professional tone.")
print(f"🤖 Test Response: {test_result}")
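
Requests to a hosted model can fail temporarily, for example on rate limits or timeouts. The sketch below shows one way to retry with exponential backoff. It is an illustration rather than part of the course code: `call_with_retry` and its parameters are hypothetical names, and it assumes a `send` callable that raises on failure, whereas `ask_ai` above returns an error string instead.

```python
import time


def call_with_retry(send, prompt, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call send(prompt); on an exception, wait and try again.

    Hypothetical helper: `send` is any function that takes a prompt string
    and raises an exception when the request fails.
    """
    last_error = None
    for attempt in range(retries):
        try:
            return send(prompt)
        except Exception as exc:
            last_error = exc
            if attempt < retries - 1:
                # Wait 1x, 2x, 4x ... the base delay between attempts
                sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"Request failed after {retries} attempts") from last_error
```

The `sleep` parameter is injectable so the helper can be exercised without real waiting. To use it with this notebook you would need a variant of `ask_ai` that re-raises instead of returning the error text.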

## 🧹 Lab 2: AI-Assisted Data Cleaning

Let's use AI to help us clean and explore our enhanced customer transactions dataset.

In [None]:
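# Load the enhanced customer transactions dataset
# parse_dates makes min()/max() on purchase_date compare dates, not strings
df = pd.read_csv('data/customer-transactions-enhanced.csv', parse_dates=['purchase_date'])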

print("📊 Dataset Loaded Successfully!")
print(f"Shape: {df.shape}")
print(f"Columns: {list(df.columns)}")
print("\nFirst 5 rows:")
df.head()
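
Later cells read specific columns (`purchase_date`, `revenue`, `category`, `customer_region`, `payment_method`, `customer_age`), so it can help to confirm they exist right after loading. A minimal sketch, where `REQUIRED_COLUMNS` and `missing_columns` are illustrative names rather than part of the course files:

```python
# Columns that later cells in this notebook read from df
REQUIRED_COLUMNS = [
    "purchase_date", "revenue", "category",
    "customer_region", "payment_method", "customer_age",
]


def missing_columns(columns, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent, in order."""
    present = set(columns)
    return [name for name in required if name not in present]


# Example with a made-up column list; in the notebook, pass df.columns
print(missing_columns(["purchase_date", "revenue", "category"]))
# → ['customer_region', 'payment_method', 'customer_age']
```

An empty list means every column the later labs depend on is present.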

In [None]:
# Exercise 2.1: AI-Assisted Data Quality Assessment
def ai_data_quality_check(dataframe):
    """Use AI to assess data quality and suggest cleaning steps"""
    
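    # The summary below includes three sample rows, so mask or drop any
    # personal fields before sending real customer data to an external API.
    data_summary = f"""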
    Dataset Info:
    - Shape: {dataframe.shape}
    - Columns: {list(dataframe.columns)}
    - Data Types: {dataframe.dtypes.to_dict()}
    - Missing Values: {dataframe.isnull().sum().to_dict()}
    - Sample Data: {dataframe.head(3).to_dict()}
    """
    
    prompt = f"""
    Analyze this customer transaction dataset for data quality issues:
    
    {data_summary}
    
    Provide:
    1. List of potential data quality issues
    2. Python code to check for these issues
    3. Recommended cleaning steps
    4. Any anomalies or inconsistencies you notice
    
    Be specific and actionable.
    """
    
    return ask_ai(prompt, max_tokens=1200)

# Get AI assessment
print("🤖 AI Data Quality Assessment:")
quality_report = ai_data_quality_check(df)
print(quality_report)
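
The AI assessment is only a starting point, so it is worth computing a few quality counts locally and comparing them with what the model claims. The sketch below is one possible set of checks, not an exhaustive one. `basic_quality_report` is a hypothetical helper and the toy frame is made up for illustration.

```python
import pandas as pd


def basic_quality_report(dataframe):
    """Compute a few data quality counts locally with pandas."""
    numeric = dataframe.select_dtypes(include="number")
    return {
        "rows": len(dataframe),
        "duplicate_rows": int(dataframe.duplicated().sum()),
        # Only list columns that actually have missing values
        "missing_by_column": {
            col: int(n) for col, n in dataframe.isnull().sum().items() if n > 0
        },
        # Negative prices or quantities are usually worth a closer look
        "negative_by_column": {
            col: int(n) for col, n in (numeric < 0).sum().items() if n > 0
        },
    }


# Toy data for illustration only (not the course dataset)
sample = pd.DataFrame({
    "price": [3.5, 3.5, -1.0, None],
    "quantity": [1, 1, 2, 3],
    "category": ["cone", "cone", "cup", None],
})
print(basic_quality_report(sample))
```

In the notebook you would call `basic_quality_report(df)` and check whether the issues the AI listed show up in these counts.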

## 🎯 Lab 3: Prompt Engineering for Data Analytics

### Exercise 3.1: Effective vs Ineffective Prompts
Learn how to write prompts that produce more useful AI output for data analysis tasks.

In [None]:
# Compare different prompt styles

# ❌ POOR PROMPT
poor_prompt = "Analyze the data"

# ✅ GOOD PROMPT
good_prompt = f"""
CONTEXT: Customer transaction analysis for ice cream retail business
GOAL: Identify top 3 revenue optimization opportunities

DATASET: {df.shape[0]} transactions with columns {list(df.columns)}
TIMEFRAME: {df['purchase_date'].min()} to {df['purchase_date'].max()}

ANALYSIS REQUIRED:
1. Customer segmentation by spending patterns
2. Product performance by category and region
3. Payment method preferences and their impact

OUTPUT FORMAT:
- Executive summary (3-4 sentences)
- Top 3 specific findings with supporting data
- Actionable recommendations
- Python code to verify key findings

CONSTRAINTS: Focus on actionable insights that could increase revenue by 10%+
"""

print("🔴 Poor Prompt Response:")
response1 = ask_ai(poor_prompt)
print(response1[:200] + "...\n")

print("🟢 Good Prompt Response:")
response2 = ask_ai(good_prompt)
print(response2)

print("\n💡 Notice the difference in specificity and actionability!")
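
The sections used in the good prompt (context, goal, dataset, analysis steps, output format, constraints) can be turned into a reusable template. The sketch below is one way to do that. `build_prompt` and its argument names are hypothetical, and the example values are placeholders you would replace with your own.

```python
def build_prompt(context, goal, dataset, analysis_steps, output_items, constraints):
    """Assemble a structured prompt with the same sections as the good prompt."""
    steps = "\n".join(f"{i}. {step}" for i, step in enumerate(analysis_steps, 1))
    outputs = "\n".join(f"- {item}" for item in output_items)
    return (
        f"CONTEXT: {context}\n"
        f"GOAL: {goal}\n\n"
        f"DATASET: {dataset}\n\n"
        f"ANALYSIS REQUIRED:\n{steps}\n\n"
        f"OUTPUT FORMAT:\n{outputs}\n\n"
        f"CONSTRAINTS: {constraints}"
    )


# Placeholder values for illustration; describe your own dataset here
example = build_prompt(
    context="Customer transaction analysis for ice cream retail business",
    goal="Identify top 3 revenue optimization opportunities",
    dataset="Transactions table (insert row count and column list)",
    analysis_steps=["Customer segmentation by spending patterns",
                    "Product performance by category and region"],
    output_items=["Executive summary (3-4 sentences)",
                  "Python code to verify key findings"],
    constraints="Focus on actionable insights",
)
print(example)
```

Keeping the template in a function makes it harder to forget a section when you write prompts for new tasks.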

## 💻 Lab 4: AI Code Generation

### Exercise 4.1: Generate Analysis Code from Natural Language
Use AI to draft code for complex data analysis tasks, then review it before you run it.

In [None]:
# AI Code Generator for Data Analysis
def generate_analysis_code(task_description):
    """Generate Python code for data analysis tasks"""
    prompt = f"""
    Generate Python pandas code for this analysis:
    
    TASK: {task_description}
    
    DATASET: Variable name 'df' with columns {list(df.columns)}
    
    REQUIREMENTS:
    1. Include detailed comments
    2. Handle potential errors
    3. Print results clearly
    4. Use appropriate visualizations if needed
    5. Return only executable Python code
    
    CODE:
    """
    return ask_ai(prompt, temperature=0.1)

# Exercise: Generate code for different analysis tasks
tasks = [
    "Create RFM (Recency, Frequency, Monetary) analysis for customer segmentation",
    "Build a cohort analysis showing customer retention over time",
    "Identify the most profitable customer segments by age and region",
    "Create a statistical test to compare revenue between payment methods"
]

print("🤖 AI-Generated Analysis Code:")
print("=" * 60)

for i, task in enumerate(tasks, 1):
    print(f"\n📝 TASK {i}: {task}")
    print("-" * 50)
    
    code = generate_analysis_code(task)
    print(code)
    
    print("\n⚠️  EXERCISE: Review the AI-generated code above.")
    print("1. Check for logical errors")
    print("2. Verify it matches the task requirements")
    print("3. Test with a small data sample first")
    print("4. Add any missing error handling")
    print("=" * 60)
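
Even when asked for code only, a model may wrap its reply in a Markdown code fence or add explanation around it. Before reviewing the generated code, you may want to pull out the code block and confirm that it at least parses. The sketch below uses only the standard library. `extract_python_code` and `syntax_error_in` are hypothetical helpers, and a clean parse says nothing about whether the logic is right.

```python
import ast
import re

FENCE = "`" * 3  # a Markdown code fence, built this way to keep this block readable
CODE_BLOCK = re.compile(
    FENCE + r"(?:python|py)?[ \t]*\n(.*?)" + FENCE,
    re.DOTALL | re.IGNORECASE,
)


def extract_python_code(response):
    """Return the first fenced code block in a reply, or the whole reply if unfenced."""
    match = CODE_BLOCK.search(response)
    return (match.group(1) if match else response).strip()


def syntax_error_in(code):
    """Return None if the code parses as Python, else a short error message."""
    try:
        ast.parse(code)
        return None
    except SyntaxError as exc:
        return f"line {exc.lineno}: {exc.msg}"


# Made-up reply for illustration
reply = "Here is the code:\n" + FENCE + "python\ntotal = 1 + 1\nprint(total)\n" + FENCE
code = extract_python_code(reply)
print(code)
print("Syntax problem:", syntax_error_in(code))
```

`ast.parse` only reads the code and never executes it, so this check is safe to run on untrusted output. The review steps listed in the exercise still apply afterwards.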

## 📊 Lab 5: AI-Powered Visualizations

### Exercise 5.1: Smart Chart Recommendations
Let AI suggest the best visualizations for your analysis goals.

In [None]:
# AI Visualization Advisor
def get_visualization_recommendation(analysis_goal):
    """Get AI recommendations for optimal visualizations"""
    prompt = f"""
    VISUALIZATION TASK: {analysis_goal}
    
    DATASET CONTEXT:
    - Customer transactions: {df.shape[0]} records
    - Numerical columns: price, quantity, customer_age, discount_applied, revenue
    - Categorical: product_name, category, customer_region, payment_method
    - Date column: purchase_date
    
    RECOMMEND:
    1. Best 2-3 chart types for this analysis
    2. Why these charts are most effective
    3. Complete Python code using matplotlib/seaborn/plotly
    4. Insights these visualizations might reveal
    
    Provide executable code with proper styling.
    """
    return ask_ai(prompt, temperature=0.2)

# Request chart recommendations for several analysis goals
print("🎨 AI VISUALIZATION RECOMMENDATIONS")
print("=" * 70)

viz_goals = [
    "Show revenue trends and seasonal patterns over time",
    "Compare customer demographics and spending behavior", 
    "Analyze product performance across categories and regions",
    "Display payment method preferences and their revenue impact"
]

for goal in viz_goals:
    print(f"\n🎯 GOAL: {goal}")
    print("-" * 50)
    recommendation = get_visualization_recommendation(goal)
    print(recommendation)
    print("=" * 70)

# Let's also create one actual visualization
print("\n📊 SAMPLE VISUALIZATION:")
print("Revenue by Category and Region")

# Simple visualization based on our data
plt.figure(figsize=(12, 6))

# Revenue by category
plt.subplot(1, 2, 1)
category_revenue = df.groupby('category')['revenue'].sum().sort_values(ascending=True)
plt.barh(category_revenue.index, category_revenue.values)
plt.title('Total Revenue by Category')
plt.xlabel('Revenue ($)')

# Revenue by region
plt.subplot(1, 2, 2) 
region_revenue = df.groupby('customer_region')['revenue'].sum()
plt.pie(region_revenue.values, labels=region_revenue.index, autopct='%1.1f%%')
plt.title('Revenue Distribution by Region')

plt.tight_layout()
plt.show()

print("💡 TIP: AI can suggest the most effective chart types for your specific analysis goals!")
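
The dataset context in the visualization prompt lists column names by hand, which can drift from the actual data. One alternative is to derive the groups from the DataFrame's dtypes. This is a rough sketch: `describe_columns` is a hypothetical helper, the toy frame is made up, and every column that is neither numeric nor a date is labelled categorical, which may be wrong for IDs or free text.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype


def describe_columns(dataframe):
    """Group column names by kind so a prompt can list them accurately."""
    numeric = dataframe.select_dtypes(include="number").columns.tolist()
    dates = [c for c in dataframe.columns if is_datetime64_any_dtype(dataframe[c])]
    # Everything else is treated as categorical (an assumption, see above)
    other = [c for c in dataframe.columns if c not in numeric and c not in dates]
    return {"numerical": numeric, "date": dates, "categorical": other}


# Toy frame for illustration only (not the course dataset)
toy = pd.DataFrame({
    "revenue": [10.0, 12.5],
    "category": ["cone", "cup"],
    "purchase_date": pd.to_datetime(["2024-01-01", "2024-01-02"]),
})
print(describe_columns(toy))
```

Date columns are only detected if they were parsed as datetimes when the data was loaded; dates stored as plain strings end up in the categorical group.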

## 📝 Lab 6: Automated AI Reporting

### Exercise 6.1: Generate Executive Reports
Create comprehensive business reports automatically using AI analysis.

In [None]:
# AI Executive Report Generator
def generate_executive_report():
    """Create a comprehensive business report using AI"""
    
    # Calculate key metrics
    total_revenue = df['revenue'].sum()
    total_transactions = len(df)
    avg_order_value = df['revenue'].mean()  # assumes each row is one order
    top_category = df.groupby('category')['revenue'].sum().idxmax()
    top_region = df.groupby('customer_region')['revenue'].sum().idxmax()
    
    # Prepare data summary for AI
    business_metrics = f"""
    BUSINESS PERFORMANCE SUMMARY:
    - Total Revenue: ${total_revenue:,.2f}
    - Total Transactions: {total_transactions:,}
    - Average Order Value: ${avg_order_value:.2f}
    - Top Category: {top_category}
    - Top Region: {top_region}
    - Date Range: {df['purchase_date'].min()} to {df['purchase_date'].max()}
    
    DETAILED ANALYSIS:
    Revenue by Category: {df.groupby('category')['revenue'].sum().to_dict()}
    Revenue by Region: {df.groupby('customer_region')['revenue'].sum().to_dict()}
    Payment Methods: {df['payment_method'].value_counts().to_dict()}
    Customer Age Distribution: {df['customer_age'].describe().to_dict()}
    """
    
    prompt = f"""
    Create an executive summary report for ice cream retail business performance:
    
    {business_metrics}
    
    REPORT STRUCTURE:
    1. Executive Summary (2-3 key findings)
    2. Revenue Performance Analysis
    3. Customer Insights
    4. Growth Opportunities (3 specific recommendations)
    5. Risk Factors to Monitor
    6. Next Steps (actionable items)
    
    Write in professional business language suitable for executives.
    Include specific numbers and percentages where relevant.
    """
    
    return ask_ai(prompt, max_tokens=1500)

# Generate the automated report
print("📊 AUTOMATED EXECUTIVE REPORT")
print("Generated using AI-powered data analysis")
print("=" * 80)

report = generate_executive_report()
print(report)

print("\n" + "=" * 80)
print("✅ REPORT COMPLETE!")

# Best Practices Summary
print("\n\n🎯 BEST PRACTICES FOR AI-POWERED ANALYTICS:")
print("-" * 50)
print("""
✅ DO:
• Always verify AI-generated insights with data
• Use specific, context-rich prompts
• Test AI code on small samples first
• Cross-check statistical claims
• Maintain human oversight for business decisions

❌ DON'T:
• Blindly execute AI-generated code
• Share sensitive data without privacy review
• Rely solely on AI for critical decisions
• Skip validation of AI recommendations
• Ignore domain expertise and business context

🔒 ETHICS & PRIVACY:
• Anonymize sensitive customer data
• Follow organizational AI usage policies
• Understand AI model limitations
• Document AI-assisted analysis processes
• Maintain transparency in AI usage
""")

print("\n🚀 CONGRATULATIONS!")
print("You've completed Module 4: AI-Powered Data Analytics!")
print("\nSkills Acquired:")
print("✓ AI tool setup and configuration")
print("✓ AI-assisted data cleaning and quality assessment") 
print("✓ Prompt engineering for data analysis")
print("✓ AI code generation and validation")
print("✓ Smart visualization recommendations")
print("✓ Automated report generation")
print("✓ Ethical AI usage and best practices")
print("\n🎯 Next: Module 5 - Career Development & Advanced AI Integration")
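
One concrete way to act on the "anonymize sensitive customer data" practice is to replace direct identifiers with a keyed hash before any rows leave your machine. The sketch below uses the standard library's `hmac` and `hashlib`. `pseudonymize`, the secret, and the sample identifiers are all hypothetical. This is pseudonymization rather than full anonymization, since combinations of fields such as age and region can still identify someone.

```python
import hashlib
import hmac


def pseudonymize(value, secret):
    """Replace an identifier with a keyed hash that is stable but not readable."""
    digest = hmac.new(secret.encode("utf-8"), str(value).encode("utf-8"), hashlib.sha256)
    # A shortened hex digest keeps prompts compact; use the full digest if collisions matter
    return digest.hexdigest()[:16]


# Hypothetical identifiers and secret, for illustration only
secret = "replace-with-a-private-value"
for customer in ["alice@example.com", "bob@example.com", "alice@example.com"]:
    print(pseudonymize(customer, secret))
```

The same input always maps to the same token, so grouping and counting by customer still work on the masked data. Keep the secret out of the notebook, for example in the same `.env` file as the API key.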