# GitHub Copilot Data Analysis Practice

This notebook demonstrates how to use GitHub Copilot for data analysis tasks. Use the generated sample data to practice prompt engineering and explore Copilot's capabilities.

## Learning Objectives
- Practice effective prompting for data analysis
- Learn to leverage Copilot for exploratory data analysis
- Understand how to provide context for better suggestions
- Develop skills in Copilot-assisted visualization

## Setup Instructions
1. Ensure the sample data has been generated (run `generate_sample_data.py`)
2. Install required packages: `pip install -r requirements.txt`
3. Start exploring with Copilot assistance!

## 1. Data Loading and Initial Exploration

Let's start by loading our e-commerce datasets and getting familiar with the data structure.

In [None]:
# Import necessary libraries for data analysis and visualization
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Set display options for better readability
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', 100)

# Set plotting style (the 'seaborn-v0_8' style name requires matplotlib >= 3.6)
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")

In [None]:
# Load all datasets from the sample-data directory
# Use descriptive variable names to help Copilot understand the context

# Customer data: demographics, preferences, and segmentation
customers_df = pd.read_csv('sample-data/customers.csv')

# Product catalog: items, categories, pricing, and inventory
products_df = pd.read_csv('sample-data/products.csv')

# Order transactions: customer purchases with dates and totals
orders_df = pd.read_csv('sample-data/orders.csv')

# Order line items: detailed breakdown of each order
order_items_df = pd.read_csv('sample-data/order_items.csv')

# Website analytics: user behavior and traffic patterns
analytics_df = pd.read_csv('sample-data/website_analytics.csv')

# Customer support: service tickets and resolution data
support_df = pd.read_csv('sample-data/support_tickets.csv')

print("📊 Datasets loaded successfully!")
print(f"Customers: {len(customers_df):,} records")
print(f"Products: {len(products_df):,} records")
print(f"Orders: {len(orders_df):,} records")
print(f"Order Items: {len(order_items_df):,} records")
print(f"Analytics: {len(analytics_df):,} records")
print(f"Support Tickets: {len(support_df):,} records")

## 2. Copilot Practice: Dataset Overview Function

**Prompt Engineering Exercise**: Write a comment describing what you want, then let Copilot suggest the implementation.

In [None]:
# Create a comprehensive dataset overview function that:
# - Shows shape, column names, and data types
# - Identifies missing values and their percentages
# - Displays basic statistics for numerical columns
# - Shows unique value counts for categorical columns
# - Generates a summary report

def analyze_dataset_overview(df, dataset_name):
    # Let Copilot implement this function
    pass

In [None]:
# Test the function with our datasets
analyze_dataset_overview(customers_df, "Customers")
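
For comparison after you have tried the exercise, here is a minimal sketch of the kind of implementation the comment above might lead to. The name `dataset_overview_sketch` and the tiny `demo_df` frame are hypothetical, made up so the block runs without the sample CSV files; it is one possible shape, not the reference answer.

```python
import pandas as pd


def dataset_overview_sketch(df, dataset_name):
    """Print a short overview of df and return the per-column summary table."""
    n_rows, n_cols = df.shape
    missing = df.isna().sum()
    summary = pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": missing,
        "missing_pct": (missing / max(n_rows, 1) * 100).round(2),
        "unique": df.nunique(),
    })

    print(f"=== {dataset_name}: {n_rows:,} rows x {n_cols} columns ===")
    print(summary)

    # Basic statistics only make sense for numeric columns
    numeric = df.select_dtypes(include="number")
    if not numeric.empty:
        print("\nNumeric columns:")
        print(numeric.describe().T)

    return summary


# Hypothetical demo data (not the real customers.csv schema)
demo_df = pd.DataFrame({
    "customer_id": ["C1", "C2", "C3", "C4"],
    "age": [34, None, 51, 28],
    "segment": ["Gold", "Silver", "Gold", None],
})
overview = dataset_overview_sketch(demo_df, "Demo customers")
```

Returning the summary table as well as printing it makes the result easy to reuse, for example to compare missing-value rates across all six datasets.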

## 3. Customer Segmentation Analysis

**Copilot Challenge**: Analyze customer segments and their characteristics.

In [None]:
# Analyze customer segments (Bronze, Silver, Gold, Platinum)
# Calculate metrics for each segment:
# - Average age and gender distribution
# - Preferred categories and geographic distribution
# - Registration patterns over time
# Create visualizations to compare segments

def analyze_customer_segments(customers_df):
    # Let Copilot suggest the analysis approach
    pass
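
A possible starting point for the segment comparison, as a sketch only. It assumes the customer table has `segment`, `age`, and `gender` columns; those names are guesses, so check them against `customers_df.columns` first. The function name `summarize_segments` and the demo rows are hypothetical.

```python
import pandas as pd


def summarize_segments(customers_df, segment_col="segment"):
    """Per-segment customer count, mean age, and gender share.

    Column names ('customer_id', 'age', 'gender', 'segment') are assumptions.
    """
    stats = customers_df.groupby(segment_col).agg(
        customers=("customer_id", "count"),
        avg_age=("age", "mean"),
    )
    # Row-normalised crosstab: share of each gender within a segment
    gender_share = pd.crosstab(
        customers_df[segment_col], customers_df["gender"], normalize="index"
    ).add_prefix("share_")
    return stats.join(gender_share)


# Hypothetical demo data
demo_customers = pd.DataFrame({
    "customer_id": ["C1", "C2", "C3", "C4", "C5"],
    "segment": ["Gold", "Gold", "Silver", "Silver", "Silver"],
    "age": [30, 40, 25, 35, 45],
    "gender": ["F", "M", "F", "F", "M"],
})
segment_summary = summarize_segments(demo_customers)
print(segment_summary)
```

A table like this is a convenient input for the visualizations the exercise asks for, such as a grouped bar chart per segment.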

## 4. Sales Performance Dashboard

**Advanced Copilot Usage**: Create an interactive dashboard with multiple visualizations.

In [None]:
# Create a comprehensive sales dashboard that includes:
# 1. Monthly revenue trends with year-over-year comparison
# 2. Top-performing products and categories
# 3. Customer acquisition and retention metrics
# 4. Geographic sales distribution
# 5. Average order value trends
# Use plotly for interactive visualizations

def create_sales_dashboard(orders_df, order_items_df, customers_df, products_df):
    # Let Copilot build the dashboard
    pass
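
Before building charts, it can help to compute the dashboard's numbers as a plain table. The sketch below covers items 1 and 5 of the list above (monthly revenue with a year-over-year comparison, and average order value). It assumes an `order_date` column, which is a guess; `total_amount` is the column used later in this notebook. The function name and demo data are hypothetical.

```python
import pandas as pd


def monthly_sales_summary(orders_df, date_col="order_date", amount_col="total_amount"):
    """Monthly revenue, order count, average order value, and YoY revenue growth."""
    order_month = pd.to_datetime(orders_df[date_col]).dt.to_period("M")
    monthly = orders_df.groupby(order_month).agg(
        revenue=(amount_col, "sum"),
        orders=(amount_col, "count"),
    )
    monthly["avg_order_value"] = monthly["revenue"] / monthly["orders"]
    # Compares each row with the row 12 positions earlier, so this assumes every
    # calendar month between the first and last order has at least one order.
    monthly["revenue_yoy"] = monthly["revenue"].pct_change(periods=12)
    return monthly


# Hypothetical demo data: one order per month for 13 months,
# plus one extra order in January 2023
order_dates = list(pd.date_range("2023-01-01", periods=13, freq="MS"))
order_dates.append(pd.Timestamp("2023-01-20"))
demo_orders = pd.DataFrame({
    "order_date": order_dates,
    "total_amount": [100 + 10 * i for i in range(13)] + [50],
})
monthly = monthly_sales_summary(demo_orders)
print(monthly.tail(3))
```

The resulting frame can be passed to a plotting library once the numbers look right; keeping the aggregation separate from the plotting also makes Copilot's suggestions easier to verify.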

## 5. Predictive Analytics with Copilot

**Machine Learning Practice**: Use Copilot to help build predictive models.

In [None]:
# Build a customer lifetime value (CLV) prediction model
# Features to consider:
# - Customer demographics (age, location, segment)
# - Purchase behavior (frequency, recency, monetary value)
# - Website engagement (analytics data)
# - Support interactions (tickets and satisfaction)
# Use multiple algorithms and compare performance

from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.preprocessing import StandardScaler, LabelEncoder

def build_clv_prediction_model(customers_df, orders_df, order_items_df, analytics_df, support_df):
    # Let Copilot guide the feature engineering and model building
    pass
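
The purchase-behavior features named above (frequency, recency, monetary value) can be derived from the orders table alone. This is a sketch of that feature-engineering step only, not a full CLV model; it assumes an `order_date` column, and `build_rfm_features` and the demo rows are hypothetical.

```python
import pandas as pd


def build_rfm_features(orders_df, snapshot_date):
    """Recency (days), frequency (orders), and monetary value per customer.

    Assumes 'customer_id', 'order_date', and 'total_amount' columns.
    """
    orders = orders_df.copy()
    orders["order_date"] = pd.to_datetime(orders["order_date"])
    rfm = orders.groupby("customer_id").agg(
        last_order=("order_date", "max"),
        frequency=("order_date", "count"),
        monetary=("total_amount", "sum"),
    )
    # Recency: days between the snapshot date and the most recent order
    rfm["recency_days"] = (pd.Timestamp(snapshot_date) - rfm["last_order"]).dt.days
    return rfm.drop(columns="last_order")


# Hypothetical demo data
demo_orders = pd.DataFrame({
    "customer_id": ["C1", "C1", "C2"],
    "order_date": ["2024-01-01", "2024-03-01", "2024-02-15"],
    "total_amount": [50.0, 70.0, 20.0],
})
rfm = build_rfm_features(demo_orders, snapshot_date="2024-03-31")
print(rfm)
```

A table like `rfm` could then be joined with demographic, engagement, and support features and split with `train_test_split` before fitting the regressors imported above.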

## 6. Advanced Analytics: Cohort Analysis

**Complex Analysis with Copilot**: Perform cohort analysis to understand customer retention.

In [None]:
# Perform cohort analysis to understand customer retention patterns:
# 1. Group customers by their first purchase month (cohorts)
# 2. Track purchase behavior over subsequent months
# 3. Calculate retention rates for each cohort
# 4. Visualize retention heatmap
# 5. Identify patterns and insights

def perform_cohort_analysis(orders_df, customers_df):
    # Let Copilot implement the cohort analysis
    pass
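
Steps 1 to 3 above can be sketched with a few pandas operations. The version below assumes an `order_date` column (a guess at the schema) and uses made-up demo orders; `cohort_retention` is a hypothetical name.

```python
import pandas as pd


def cohort_retention(orders_df):
    """Share of each first-purchase cohort that orders again N months later."""
    orders = orders_df[["customer_id", "order_date"]].copy()
    orders["order_date"] = pd.to_datetime(orders["order_date"])
    first_order = orders.groupby("customer_id")["order_date"].transform("min")

    # Cohort label = month of first purchase; period = whole months since then
    orders["cohort"] = first_order.dt.strftime("%Y-%m")
    orders["period"] = (
        (orders["order_date"].dt.year - first_order.dt.year) * 12
        + (orders["order_date"].dt.month - first_order.dt.month)
    )

    active = (
        orders.groupby(["cohort", "period"])["customer_id"]
        .nunique()
        .unstack(fill_value=0)
    )
    # Column 0 is the cohort size, since every customer is active in month 0
    return active.div(active[0], axis=0)


# Hypothetical demo data: A and B first buy in January, C in February
demo_orders = pd.DataFrame({
    "customer_id": ["A", "A", "A", "B", "B", "C", "C"],
    "order_date": [
        "2024-01-05", "2024-02-10", "2024-03-03",
        "2024-01-20", "2024-03-15",
        "2024-02-01", "2024-03-09",
    ],
})
retention = cohort_retention(demo_orders)
print(retention)
```

Note that periods a recent cohort has not reached yet show up as 0 rather than as missing, so mask them before drawing conclusions. The table can be passed to `sns.heatmap(retention, annot=True, fmt=".0%")` for step 4.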

## 7. Real-time Analytics Simulation

**Streaming Data Practice**: Simulate real-time data processing with Copilot assistance.

In [None]:
# Simulate real-time analytics dashboard that processes streaming data:
# - Real-time sales monitoring
# - Alert system for unusual patterns
# - Live customer behavior tracking
# - Performance metrics updating
# Use threading and time delays to simulate streaming

import threading
import time
from IPython.display import clear_output

def simulate_realtime_analytics(orders_df, analytics_df):
    # Let Copilot create the real-time simulation
    pass
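
A single-threaded sketch of the idea, for orientation: a generator replays the orders table in small batches, and a monitor keeps a running total and flags unusually large orders. It deliberately leaves out threading and `clear_output`. The `order_id` column, the function names, and the fixed alert threshold are assumptions made for illustration.

```python
import time

import pandas as pd


def stream_batches(df, batch_size=2, delay_seconds=0.0):
    """Yield consecutive slices of df, pausing between them to mimic a live feed."""
    for start in range(0, len(df), batch_size):
        yield df.iloc[start:start + batch_size]
        time.sleep(delay_seconds)


def monitor_sales(orders_df, alert_threshold, batch_size=2, delay_seconds=0.0):
    """Keep a running revenue total and collect orders above alert_threshold."""
    running_revenue = 0.0
    alerts = []
    for batch in stream_batches(orders_df, batch_size, delay_seconds):
        running_revenue += float(batch["total_amount"].sum())
        unusual = batch[batch["total_amount"] > alert_threshold]
        alerts.extend(unusual["order_id"].tolist())
        print(f"revenue so far: {running_revenue:,.2f} | alerts: {alerts}")
    return running_revenue, alerts


# Hypothetical demo data ('order_id' is an assumed column name)
demo_orders = pd.DataFrame({
    "order_id": ["O1", "O2", "O3", "O4", "O5"],
    "total_amount": [20.0, 35.0, 500.0, 15.0, 40.0],
})
total_revenue, alert_ids = monitor_sales(demo_orders, alert_threshold=200.0)
```

Setting `delay_seconds` to a value such as `0.5` makes the replay visible in a notebook; moving `monitor_sales` onto a `threading.Thread` is a natural next prompt to try with Copilot.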

## 8. Copilot Prompt Engineering Exercises

Practice different prompting techniques and observe how Copilot responds.

In [None]:
# Exercise 1: Vague prompt
# "Analyze the data"
# Observe what Copilot suggests with minimal context

def analyze_data():
    pass

In [None]:
# Exercise 2: Specific prompt with context
# "Calculate monthly recurring revenue (MRR) for the e-commerce business
# using order data, considering customer subscription patterns and repeat purchases.
# Return the results as a pandas DataFrame with months as the index."

def calculate_monthly_recurring_revenue(orders_df, order_items_df):
    pass

In [None]:
# Exercise 3: Example-driven prompt
# "Create a function that calculates customer health score
# Example: calculate_customer_health_score('CUST_000001', orders_df, support_df, analytics_df) should return:
# {
#     'health_score': 85,
#     'factors': {
#         'purchase_frequency': 90,
#         'order_value': 80,
#         'support_satisfaction': 85,
#         'engagement': 88
#     },
#     'risk_level': 'low'
# }"

def calculate_customer_health_score(customer_id, orders_df, support_df, analytics_df):
    pass

## 9. Data Quality Assessment

Use Copilot to help identify and handle data quality issues.

In [None]:
# Comprehensive data quality assessment function that:
# - Detects missing values and patterns
# - Identifies potential duplicates
# - Finds outliers using statistical methods
# - Validates data consistency across related tables
# - Suggests data cleaning strategies
# - Generates a quality report with recommendations

def assess_data_quality(dataframes_dict):
    """
    dataframes_dict: Dictionary with dataset names as keys and DataFrames as values
    Example: {'customers': customers_df, 'orders': orders_df, ...}
    """
    # Let Copilot implement comprehensive data quality checks
    pass
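
As a reference point for a few of the checks listed above, the sketch below counts missing cells, fully duplicated rows, and IQR outliers per dataset, and adds a simple cross-table consistency check. It is an assumption-laden outline rather than a complete assessment: `quality_report`, `find_orphans`, the `order_id` column, and the demo frames are hypothetical, and Tukey's 1.5 x IQR rule is just one common outlier heuristic.

```python
import pandas as pd


def quality_report(dataframes_dict):
    """One row per dataset: size, missing cells, duplicate rows, IQR outliers."""
    rows = []
    for name, df in dataframes_dict.items():
        numeric = df.select_dtypes(include="number")
        outliers = 0
        if numeric.shape[1] > 0:
            q1, q3 = numeric.quantile(0.25), numeric.quantile(0.75)
            iqr = q3 - q1
            # Tukey's rule: flag values more than 1.5 * IQR outside the quartiles
            mask = (numeric < q1 - 1.5 * iqr) | (numeric > q3 + 1.5 * iqr)
            outliers = int(mask.sum().sum())
        rows.append({
            "dataset": name,
            "rows": len(df),
            "missing_cells": int(df.isna().sum().sum()),
            "duplicate_rows": int(df.duplicated().sum()),
            "iqr_outliers": outliers,
        })
    return pd.DataFrame(rows).set_index("dataset")


def find_orphans(child_df, parent_df, key):
    """Rows in child_df whose key value never appears in parent_df."""
    return child_df[~child_df[key].isin(parent_df[key])]


# Hypothetical demo data: one missing age, one duplicated order row,
# one very large order, and one order for an unknown customer (C9)
demo_customers = pd.DataFrame({
    "customer_id": ["C1", "C2", "C3"],
    "age": [30, None, 45],
})
demo_orders = pd.DataFrame({
    "order_id": ["O1", "O2", "O3", "O4", "O5", "O6", "O6", "O7"],
    "customer_id": ["C1", "C1", "C2", "C9", "C3", "C2", "C2", "C3"],
    "total_amount": [20.0, 25.0, 22.0, 24.0, 21.0, 23.0, 23.0, 500.0],
})
report = quality_report({"customers": demo_customers, "orders": demo_orders})
orphans = find_orphans(demo_orders, demo_customers, "customer_id")
print(report)
print(orphans)
```

Each flagged count is a prompt for a follow-up question (drop, impute, or investigate?), which is where the cleaning strategies and recommendations from the exercise come in.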

## 10. Performance Optimization with Copilot

Learn how to optimize data processing operations with Copilot's help.

In [None]:
# Create a slow, inefficient function first
def slow_customer_summary(customers_df, orders_df):
    """Intentionally inefficient implementation for optimization practice"""
    results = []
    for _, customer in customers_df.iterrows():
        customer_orders = orders_df[orders_df['customer_id'] == customer['customer_id']]
        total_spent = customer_orders['total_amount'].sum()
        order_count = len(customer_orders)
        avg_order_value = total_spent / order_count if order_count > 0 else 0
        results.append({
            'customer_id': customer['customer_id'],
            'total_spent': total_spent,
            'order_count': order_count,
            'avg_order_value': avg_order_value
        })
    return pd.DataFrame(results)

# Now ask Copilot to optimize this function
# "Optimize the above function for better performance using pandas vectorized operations
# and groupby functionality. The optimized version should be significantly faster."

def optimized_customer_summary(customers_df, orders_df):
    # Let Copilot suggest the optimized implementation
    pass
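
One vectorized rewrite that Copilot could plausibly suggest is sketched below, using the same `customer_id` and `total_amount` columns as `slow_customer_summary`. The name `customer_summary_vectorized` and the demo frames are hypothetical. A single `groupby` replaces the per-customer filtering, and a left merge keeps customers who have no orders.

```python
import pandas as pd


def customer_summary_vectorized(customers_df, orders_df):
    """Same columns as slow_customer_summary, computed with one groupby."""
    per_customer = (
        orders_df.groupby("customer_id")["total_amount"]
        .agg(total_spent="sum", order_count="size")
        .reset_index()
    )
    summary = (
        customers_df[["customer_id"]]
        .merge(per_customer, on="customer_id", how="left")
        # Customers without orders get NaN from the left merge
        .fillna({"total_spent": 0, "order_count": 0})
    )
    summary["order_count"] = summary["order_count"].astype(int)
    # 0 / 0 yields NaN in pandas, which maps back to the loop version's 0
    summary["avg_order_value"] = (
        summary["total_spent"] / summary["order_count"]
    ).fillna(0)
    return summary


# Hypothetical demo data: C3 has no orders
demo_customers = pd.DataFrame({"customer_id": ["C1", "C2", "C3"]})
demo_orders = pd.DataFrame({
    "customer_id": ["C1", "C1", "C2"],
    "total_amount": [10.0, 30.0, 5.0],
})
fast_summary = customer_summary_vectorized(demo_customers, demo_orders)
print(fast_summary)
```

To confirm the speed-up on the real data, time both versions with `%timeit` and check that they return the same values before trusting the faster one.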

## 11. Advanced Visualization Gallery

Create a collection of sophisticated visualizations with Copilot assistance.

In [None]:
# Create an advanced visualization gallery with:
# 1. Animated time series plots showing business growth
# 2. Interactive geographic heatmaps of customer distribution
# 3. Sankey diagrams showing customer journey flows
# 4. Radar charts comparing customer segments
# 5. Correlation matrix heatmaps with hierarchical clustering
# 6. Box plots with statistical annotations
# 7. Custom styled dashboards with corporate branding

def create_visualization_gallery(customers_df, orders_df, order_items_df, products_df, analytics_df):
    # Let Copilot create impressive visualizations
    pass

## 12. Automated Report Generation

Use Copilot to generate comprehensive business reports.

In [None]:
# Generate automated business intelligence reports with:
# - Executive summary with key metrics
# - Detailed analysis sections
# - Visualizations embedded in narrative
# - Actionable recommendations
# - Export options (HTML, PDF, PowerPoint)
# Use natural language generation techniques

def generate_business_report(all_dataframes, report_date, report_type='monthly'):
    """
    Generate comprehensive business intelligence report
    
    Parameters:
    - all_dataframes: dict containing all datasets
    - report_date: datetime object for report period
    - report_type: 'daily', 'weekly', 'monthly', or 'quarterly'
    
    Returns:
    - Formatted report with insights and recommendations
    """
    # Let Copilot create the automated reporting system
    pass
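
The executive-summary part of such a report can start as a small function that filters orders to the reporting period and renders key metrics as Markdown. This is a minimal sketch under assumptions: an `order_date` column exists, `render_executive_summary` and `PERIOD_FREQ` are hypothetical names, and export to HTML, PDF, or PowerPoint is out of scope here.

```python
import pandas as pd

# Map the report_type values from the docstring to pandas period frequencies
PERIOD_FREQ = {"daily": "D", "weekly": "W", "monthly": "M", "quarterly": "Q"}


def render_executive_summary(orders_df, report_date, report_type="monthly"):
    """Return a short Markdown summary of orders in the period containing report_date."""
    period = pd.Timestamp(report_date).to_period(PERIOD_FREQ[report_type])
    order_dates = pd.to_datetime(orders_df["order_date"])
    in_period = orders_df[
        (order_dates >= period.start_time) & (order_dates <= period.end_time)
    ]
    revenue = in_period["total_amount"].sum()
    order_count = len(in_period)
    avg_order_value = revenue / order_count if order_count else 0.0
    return "\n".join([
        f"## Executive Summary ({report_type}, {period})",
        f"- Orders: {order_count}",
        f"- Revenue: {revenue:,.2f}",
        f"- Average order value: {avg_order_value:,.2f}",
    ])


# Hypothetical demo data: two of the four orders fall in March 2024
demo_orders = pd.DataFrame({
    "order_date": ["2024-02-28", "2024-03-05", "2024-03-20", "2024-04-01"],
    "total_amount": [40.0, 100.0, 50.0, 70.0],
})
summary_md = render_executive_summary(demo_orders, "2024-03-15")
print(summary_md)
```

In a notebook, `IPython.display.Markdown(summary_md)` renders the result; further sections and charts can be appended to the same list of lines.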

## Summary and Next Steps

🎯 **Key Learning Points:**
1. **Context is King**: More specific prompts yield better results
2. **Iterative Development**: Build complexity gradually with Copilot
3. **Review and Refine**: Always validate Copilot's suggestions
4. **Domain Knowledge**: Your expertise guides Copilot's assistance

🚀 **Practice Recommendations:**
- Try different prompting styles and compare results
- Experiment with various data analysis scenarios
- Use Copilot for both exploration and production code
- Build your own analysis projects with real datasets

📚 **Additional Resources:**
- [GitHub Copilot Documentation](https://docs.github.com/en/copilot)
- [Advanced Prompt Engineering Guide](../study-materials/02-prompt-engineering.md)
- [Certification Practice Tests](../mock-questions/)

Good luck with your GitHub Copilot certification! 🌟