# Week 4: Python Aggregations & Summary Statistics - EXERCISES
## Wednesday Python Class - September 3, 2025

**Student Name**: _________________ **Date**: _________________

---

## Instructions:
- Complete each exercise using pandas aggregation functions
- Focus on business insights, not just technical solutions
- Write clean, well-commented code
- Test your results to ensure they make business sense
- Use the provided Olist e-commerce dataset

**Points Distribution**: 100 points total
- Part A: Basic Aggregations (25 points)
- Part B: GroupBy Operations (30 points)  
- Part C: Pivot Tables & Advanced Analysis (30 points)
- Part D: Business Intelligence Report (15 points)

---

## Setup: Load Libraries and Data

In [None]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

# Load the dataset
df = pd.read_csv('../lecture-materials/datasets/olist_sample_data.csv')
df['order_date'] = pd.to_datetime(df['order_date'])

# Add business columns (you'll need these for the exercises)
df['year'] = df['order_date'].dt.year
df['month'] = df['order_date'].dt.month
df['month_name'] = df['order_date'].dt.strftime('%B')

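# Create price segments
# Note: pd.cut bins are right-inclusive by default: (0, 100], (100, 200],
# (200, 300], (300, inf]. A price of exactly 200 is labeled 'Standard',
# and a price of 0 falls outside every bin (NaN).
df['price_segment'] = pd.cut(df['price'], 
                            bins=[0, 100, 200, 300, float('inf')], 
                            labels=['Budget', 'Standard', 'Premium', 'Luxury'])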

print(f"Dataset loaded: {df.shape[0]} orders, {df.shape[1]} columns")
print(f"Date range: {df['order_date'].min()} to {df['order_date'].max()}")
print("Ready to start exercises!")
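
The bin edges used for `price_segment` matter for later exercises that define high-value orders as ≥ R$200. The following is a minimal sketch on three made-up prices (not the Olist data) showing how the default right-inclusive bins differ from `right=False`; the variable names are illustrative only.

```python
import pandas as pd

# Made-up prices chosen to sit on or near the bin edges
prices = pd.Series([100.0, 200.0, 200.01])
bins = [0, 100, 200, 300, float("inf")]
labels = ["Budget", "Standard", "Premium", "Luxury"]

# Default: intervals are closed on the right, e.g. (100, 200]
right_closed = pd.cut(prices, bins=bins, labels=labels)

# right=False: intervals are closed on the left, e.g. [200, 300)
left_closed = pd.cut(prices, bins=bins, labels=labels, right=False)

print(pd.DataFrame({"price": prices,
                    "right_closed": right_closed,
                    "left_closed": left_closed}))
```

With the default setting, an order of exactly R$200 counts as 'Standard' even though a `price >= 200` filter would treat it as high-value, so the two definitions can disagree on boundary values.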

---
## Part A: Basic Aggregations (25 points)
**From Excel SUMIF/COUNTIF to Pandas**

### Exercise A1: Overall Business Metrics (5 points)
**Business Question**: What are our key performance indicators?

Calculate and display:
- Total number of orders
- Total revenue
- Average order value
- Minimum and maximum order values
- Overall customer satisfaction (average review score)
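
As a warm-up, here is a minimal sketch of basic aggregations on a tiny made-up frame. The column names `price` and `review_score` are assumptions for illustration; check `df.columns` for the names in the real dataset.

```python
import pandas as pd

# Toy data; column names are illustrative, not necessarily the dataset's
toy = pd.DataFrame({
    "price": [50.0, 120.0, 250.0, 80.0],
    "review_score": [5, 4, 3, 4],
})

# Several statistics for one column in a single call
price_stats = toy["price"].agg(["count", "sum", "mean", "min", "max"])

# A single statistic for another column
avg_review = toy["review_score"].mean()

print(price_stats)
print(f"Average review score: {avg_review:.2f}")
```

Passing a list of function names to `agg()` returns a Series indexed by those names, which is convenient when you want to format each KPI separately.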

In [None]:
# Your code here
# Hint: Use basic pandas aggregation functions like sum(), mean(), min(), max(), count()


### Exercise A2: Conditional Aggregations (10 points)
**Business Question**: How do high-value orders (≥ R$200) compare to regular orders?

Calculate and compare:
- Number of high-value vs regular orders
- Revenue from each segment
- Percentage of total orders and revenue each represents
- Average customer satisfaction for each segment
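
One way to approach a SUMIF/COUNTIF-style comparison is to build a boolean mask once and reuse it. The sketch below uses made-up values and an assumed `review_score` column; the `segment` label column is a hypothetical helper, not part of the dataset.

```python
import numpy as np
import pandas as pd

# Toy data; values and column names are illustrative
toy = pd.DataFrame({
    "price": [50.0, 200.0, 250.0, 100.0],
    "review_score": [5, 4, 3, 4],
})

is_high = toy["price"] >= 200                 # True for high-value rows

high_count = int(is_high.sum())               # True counts as 1 (like COUNTIF)
high_order_share = is_high.mean() * 100       # share of rows that are True
high_revenue = toy.loc[is_high, "price"].sum()            # like SUMIF
high_revenue_share = high_revenue / toy["price"].sum() * 100

# Label both segments, then compare them side by side
toy["segment"] = np.where(is_high, "High-value", "Regular")
avg_review_by_segment = toy.groupby("segment")["review_score"].mean()

print(f"High-value orders: {high_count} ({high_order_share:.1f}% of orders)")
print(f"High-value revenue: {high_revenue:.2f} ({high_revenue_share:.1f}% of revenue)")
print(avg_review_by_segment)
```

Taking the mean of a boolean mask gives the fraction of rows that satisfy the condition, which avoids dividing two separate counts by hand.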

In [None]:
# Your code here
# Hint: Use boolean indexing like df[df['price'] >= 200]


### Exercise A3: Value Counts Analysis (10 points)
**Business Question**: What's our market distribution and product focus?

Analyze and display:
- Top 5 customer states by order count (with percentages)
- Top 5 product categories by order count (with percentages)  
- Distribution of price segments (count and percentage)
- Create a simple visualization for one of these distributions
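
The sketch below shows one way to put counts and percentages in the same table. It uses a made-up Series of state codes; the name `customer_state` is an assumption about the dataset's column.

```python
import pandas as pd

# Toy data; the Series name is illustrative
states = pd.Series(["SP", "SP", "RJ", "MG", "SP", "RJ"], name="customer_state")

counts = states.value_counts()                      # sorted, largest first
shares = states.value_counts(normalize=True) * 100  # same order, as percentages

# Combine counts and percentages; assignment aligns on the index
summary = counts.to_frame("orders")
summary["percent"] = shares.round(1)

top2 = summary.head(2)
print(top2)
```

A table like `summary` can be passed straight to `summary["orders"].plot(kind="bar")` if you want a quick chart of the distribution.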

In [None]:
# Your code here
# Hint: Use value_counts() and value_counts(normalize=True)


---
## Part B: GroupBy Operations (30 points)
**From Excel Pivot Tables to Pandas GroupBy**

### Exercise B1: State Performance Analysis (10 points)
**Business Question**: Which states are our best markets?

Create a comprehensive analysis by customer state showing:
- Order count
- Total revenue
- Average order value
- Average customer satisfaction
- Average freight cost

Sort by total revenue (descending) and display the top 8 states.
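
As an illustration of the pattern, the sketch below aggregates a toy frame by group and sorts the result. It uses named aggregation, an alternative to the dictionary form that gives flat, readable column names. The column names are assumptions, not confirmed names from the dataset.

```python
import pandas as pd

# Toy data; column names are illustrative
toy = pd.DataFrame({
    "customer_state": ["SP", "SP", "RJ", "RJ", "MG"],
    "price": [100.0, 300.0, 150.0, 50.0, 500.0],
    "review_score": [5, 3, 4, 4, 2],
})

state_summary = (
    toy.groupby("customer_state")
       .agg(
           order_count=("price", "count"),      # non-null prices per state
           total_revenue=("price", "sum"),
           avg_order_value=("price", "mean"),
           avg_review=("review_score", "mean"),
       )
       .sort_values("total_revenue", ascending=False)
)

top2 = state_summary.head(2)
print(top2)
```

The dictionary form, for example `.agg({"price": ["count", "sum", "mean"]})`, produces the same numbers but with two-level column headers.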

In [None]:
# Your code here
# Hint: Use groupby().agg() with a dictionary


### Exercise B2: Category Performance with describe() (10 points)
**Business Question**: What's the statistical profile of our top product categories?

For the top 5 product categories by order count:
- Use describe() to show statistical summary of prices
- Calculate additional metrics: total revenue, average rating
- Identify which category has the most consistent pricing (lowest standard deviation)
- Identify which category has the highest customer satisfaction
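
A possible shape for this analysis, sketched on made-up data: pick the most frequent groups with `value_counts()`, filter to them, then call `describe()` per group. The `category` column name is hypothetical.

```python
import pandas as pd

# Toy data; column names are illustrative
toy = pd.DataFrame({
    "category": ["toys", "toys", "toys", "books", "books", "garden"],
    "price": [10.0, 20.0, 30.0, 50.0, 52.0, 99.0],
})

# Step 1: the most frequent categories (top 2 here, top 5 in the exercise)
top_categories = toy["category"].value_counts().head(2).index

# Step 2: keep only rows belonging to those categories
subset = toy[toy["category"].isin(top_categories)]

# Step 3: statistical profile of price per category
price_profile = subset.groupby("category")["price"].describe()

# Lowest standard deviation = most consistent pricing
most_consistent = price_profile["std"].idxmin()

print(price_profile)
print(f"Most consistent pricing: {most_consistent}")
```

`describe()` returns count, mean, std, min, quartiles, and max as columns, so `idxmin()` and `idxmax()` on a single column answer "which category" questions directly.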

In [None]:
# Your code here
# Hint: Combine value_counts(), groupby(), and describe()


### Exercise B3: Multi-Column GroupBy (10 points)
**Business Question**: How does performance vary by state AND price segment?

Create analysis showing:
- Order count by state and price segment
- Revenue by state and price segment
- Average customer satisfaction by state and price segment

Focus on the top 3 states by revenue. Identify the best-performing state-segment combination.
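
Grouping by two columns produces a result indexed by pairs. The sketch below shows the idea on made-up data; `state` and `segment` are placeholder column names.

```python
import pandas as pd

# Toy data; column names are placeholders
toy = pd.DataFrame({
    "state": ["SP", "SP", "SP", "RJ", "RJ"],
    "segment": ["Budget", "Budget", "Premium", "Budget", "Premium"],
    "price": [40.0, 60.0, 250.0, 80.0, 220.0],
})

combo = (
    toy.groupby(["state", "segment"])
       .agg(orders=("price", "count"), revenue=("price", "sum"))
)

# idxmax on a two-level index returns a (state, segment) tuple
best_combo = combo["revenue"].idxmax()

print(combo)
print(f"Best combination by revenue: {best_combo}")
```

Because `price_segment` in the setup cell is a categorical column, grouping by it may also list combinations with no orders; passing `observed=True` to `groupby()` is one way to keep only the combinations that actually occur.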

In [None]:
# Your code here
# Hint: Use groupby() with a list of two columns (your state column and 'price_segment') and multiple aggregations


---
## Part C: Pivot Tables & Advanced Analysis (30 points)
**Advanced Excel Pivot Table Functionality in Pandas**

### Exercise C1: Revenue Pivot Table (10 points)
**Business Question**: How is revenue distributed across states and price segments?

Requirements:
- States as rows, price segments as columns
- Revenue as values
- Include totals (margins)
- Create a second pivot showing percentages of total revenue
- Identify which state dominates each price segment
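
The sketch below illustrates a pivot with totals and a percentage view on made-up data. The column names are placeholders, and `margins_name="Total"` is just a labeling choice (the default label is `All`).

```python
import pandas as pd

# Toy data; column names are placeholders
toy = pd.DataFrame({
    "state": ["SP", "SP", "RJ", "RJ"],
    "segment": ["Budget", "Premium", "Budget", "Premium"],
    "price": [50.0, 250.0, 150.0, 50.0],
})

revenue_pivot = pd.pivot_table(
    toy,
    values="price",
    index="state",
    columns="segment",
    aggfunc="sum",
    fill_value=0,
    margins=True,
    margins_name="Total",
)

# Every cell as a percentage of the grand total
grand_total = revenue_pivot.loc["Total", "Total"]
percent_pivot = (revenue_pivot / grand_total * 100).round(1)

# Remove the totals before asking which state leads each segment
leaders = revenue_pivot.drop(index="Total", columns="Total").idxmax()

print(revenue_pivot)
print(percent_pivot)
print(leaders)
```

Dropping the margin row and column first matters: otherwise `idxmax()` would always report `Total` as the leader.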

In [None]:
# Your code here
# Hint: Use pd.pivot_table() with margins=True


### Exercise C2: Monthly Trends Analysis (10 points)
**Business Question**: How do our sales trends vary by month?

Create analysis showing:
- Monthly order counts and revenue
- Monthly average order value
- Monthly customer satisfaction trends
- Identify the best and worst performing months
- Create a simple line plot showing monthly revenue trend
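
One pitfall with monthly summaries is that grouping by month name alone sorts the result alphabetically (April before January). A minimal sketch of one workaround, using made-up dates and prices, is to group by the numeric month together with the name:

```python
import pandas as pd

# Toy data; dates and prices are made up
toy = pd.DataFrame({
    "order_date": pd.to_datetime(
        ["2024-01-15", "2024-01-20", "2024-02-03", "2024-04-10"]
    ),
    "price": [100.0, 50.0, 200.0, 75.0],
})
toy["month"] = toy["order_date"].dt.month
toy["month_name"] = toy["order_date"].dt.strftime("%B")

# Grouping by the numeric month first keeps calendar order
monthly = (
    toy.groupby(["month", "month_name"])
       .agg(orders=("price", "count"), revenue=("price", "sum"))
       .reset_index(level="month", drop=True)
)

best_month = monthly["revenue"].idxmax()
worst_month = monthly["revenue"].idxmin()

print(monthly)
print(f"Best: {best_month}, worst: {worst_month}")
```

A frame in this order can be plotted directly, for example with `monthly["revenue"].plot(marker="o")`, and the x-axis will follow the calendar. If the data spans more than one year, consider grouping by year as well so that the same month from different years is not merged.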

In [None]:
# Your code here
# Hint: Group by month (grouping by month_name alone sorts alphabetically, not chronologically); use matplotlib for visualization


### Exercise C3: Custom Business Metrics (10 points)
**Business Question**: Create custom KPIs for category performance

Write a custom aggregation function that calculates:
- Revenue variability (coefficient of variation of price = std / mean)
- Premium order percentage (orders ≥ R$200)
- Customer satisfaction rate (% with rating ≥ 4)
- Revenue per unique order (total revenue divided by the number of distinct order_id values)

Apply this function to analyze product categories.
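
The general pattern can be sketched as a function that receives one group's rows and returns a `pd.Series` of named metrics. The example below uses made-up data, assumed column names, and a hypothetical function name `category_kpis`; it covers only some of the metrics listed above.

```python
import pandas as pd

# Toy data; column names are illustrative
toy = pd.DataFrame({
    "category": ["toys", "toys", "books", "books"],
    "price": [100.0, 300.0, 50.0, 50.0],
    "review_score": [5, 3, 4, 2],
})

def category_kpis(group: pd.DataFrame) -> pd.Series:
    """Return custom KPIs for the rows of a single category."""
    price = group["price"]
    return pd.Series({
        "price_cv": price.std() / price.mean(),           # relative spread
        "premium_pct": (price >= 200).mean() * 100,       # % of orders >= 200
        "satisfied_pct": (group["review_score"] >= 4).mean() * 100,
    })

# Select the needed columns, then apply the function to each group
kpis = toy.groupby("category")[["price", "review_score"]].apply(category_kpis)

print(kpis.round(2))
```

Each key of the returned Series becomes a column in the result, with one row per category. Note that the standard deviation of a single-row group is NaN, so categories with only one order will show NaN for the coefficient of variation.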

In [None]:
# Your code here
# Hint: Create a custom function that returns a pd.Series with calculated metrics


---
## Part D: Business Intelligence Report (15 points)
**Executive Dashboard Creation**

### Exercise D1: Comprehensive Business Report (15 points)
**Business Challenge**: Create an executive summary for stakeholders

Your report should include:

1. **Executive Summary** (5 points):
   - Key overall metrics (orders, revenue, AOV, satisfaction)
   - Growth/trend indicators

2. **Market Analysis** (5 points):
   - Top 3 states with key metrics
   - Market share percentages
   - Performance benchmarks

3. **Strategic Recommendations** (5 points):
   - Based on your analysis, provide 3 specific business recommendations
   - Support each recommendation with data insights
   - Consider market opportunities and operational efficiency

**Format**: Professional presentation with clear sections and business language
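
For the presentation side, one option is a small helper that prints labeled metrics in aligned columns. The sketch below is only a formatting aid with made-up numbers; `format_section` is a hypothetical helper, not part of pandas or the course materials.

```python
def format_section(title: str, metrics: dict[str, str]) -> str:
    """Render a report section as aligned 'label : value' lines."""
    width = max(len(label) for label in metrics)
    lines = [title.upper(), "=" * len(title)]
    lines += [f"{label:<{width}} : {value}" for label, value in metrics.items()]
    return "\n".join(lines)

# Made-up values, formatted with thousands separators and fixed decimals
revenue = 1234567.891
satisfaction = 4.1234

section = format_section("Executive Summary", {
    "Total revenue": f"R$ {revenue:,.2f}",
    "Satisfaction": f"{satisfaction:.2f} / 5",
})
print(section)
```

Format specifiers such as `:,.2f` keep currency values readable for stakeholders without changing the underlying numbers.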

In [None]:
# Your code here - Create a comprehensive business intelligence report
# Combine multiple aggregation techniques to generate insights

def generate_executive_report(df):
    """Create comprehensive executive report"""
    # Your implementation here
    pass

# Call your function and format the output professionally


---
## Submission Checklist

Before submitting, verify:
- [ ] All code executes without errors
- [ ] Results are properly formatted and easy to read
- [ ] Business insights are clearly explained
- [ ] Appropriate aggregation functions used for each question
- [ ] Code is well-commented and clean
- [ ] Visualizations (where requested) are clear and informative
- [ ] Executive report addresses business stakeholders appropriately

---
## Self-Assessment Questions

1. **Excel Bridge**: How do pandas aggregations compare to Excel SUMIF/COUNTIF functions? What are the advantages?

2. **GroupBy vs Pivot**: When would you use groupby() vs pivot_table() for business analysis?

3. **Business Application**: How can these aggregation techniques help in real-world e-commerce analytics?

4. **Data Quality**: What data quality issues did you notice while working with the dataset? How would you address them?

---
*Write your answers to self-assessment questions below:*

### Self-Assessment Answers:

**Answer 1**: (Your answer here)

**Answer 2**: (Your answer here) 

**Answer 3**: (Your answer here)

**Answer 4**: (Your answer here)