# Week 5: Date & Time Operations - Python Exercises
**Business Scenario**: NaijaCommerce Seasonal Analysis & Operations Optimization
**Instructions**: Complete each exercise to help NaijaCommerce understand its temporal business patterns using Python.

## 🎯 Exercise Overview
These exercises mirror the SQL exercises from Thursday's class, so you can compare approaches and confirm that both tools yield the same business insights.

## 📚 Setup and Data Loading

In [None]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta
import warnings
warnings.filterwarnings('ignore')

# Set display options
pd.set_option('display.max_columns', None)
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")

print("📦 Libraries imported successfully!")
print("🚀 Ready to start temporal analysis exercises!")

In [None]:
# Create sample dataset (equivalent to our SQL dataset)
np.random.seed(42)
n_orders = 12000
start_date = datetime(2017, 1, 1)
end_date = datetime(2018, 12, 31)

# Generate orders dataset
orders_df = pd.DataFrame({
    'order_id': [f'order_{i:06d}' for i in range(n_orders)],
    'customer_id': [f'customer_{np.random.randint(1, 6000):06d}' for _ in range(n_orders)],
    'order_purchase_timestamp': pd.date_range(start=start_date, end=end_date, periods=n_orders),
    'order_status': np.random.choice(['delivered', 'shipped', 'cancelled', 'unavailable'], 
                                   n_orders, p=[0.80, 0.12, 0.06, 0.02])
})

# Add delivery and approval timestamps
approval_delays = np.random.exponential(scale=1.5, size=n_orders)
delivery_delays = np.random.exponential(scale=9, size=n_orders) + approval_delays

orders_df['order_approved_at'] = orders_df['order_purchase_timestamp'] + pd.to_timedelta(approval_delays, unit='D')
orders_df['order_delivered_customer_date'] = orders_df['order_purchase_timestamp'] + pd.to_timedelta(delivery_delays, unit='D')
orders_df['order_estimated_delivery_date'] = orders_df['order_purchase_timestamp'] + pd.to_timedelta(10, unit='D')

# Set delivery dates to NaT (missing) for non-delivered orders
mask = orders_df['order_status'] != 'delivered'
orders_df.loc[mask, 'order_delivered_customer_date'] = pd.NaT

# Create order items with pricing (for revenue analysis)
delivered_orders = orders_df[orders_df['order_status'] == 'delivered'].copy()
order_items = []
for _, order in delivered_orders.iterrows():
    n_items = np.random.choice([1, 2, 3], p=[0.75, 0.20, 0.05])
    for item_id in range(n_items):
        order_items.append({
            'order_id': order['order_id'],
            'order_item_id': item_id + 1,
            'price': np.random.exponential(scale=45) + 15,  # Price distribution
            'product_id': f'product_{np.random.randint(1, 800):04d}'
        })

order_items_df = pd.DataFrame(order_items)

print(f"📊 Dataset created: {len(orders_df):,} orders, {len(order_items_df):,} items")
print(f"📅 Date range: {orders_df['order_purchase_timestamp'].min().date()} to {orders_df['order_purchase_timestamp'].max().date()}")
print(f"🎯 Delivered orders: {(orders_df['order_status'] == 'delivered').sum():,}")

# Display sample data
print("\n📋 Sample Orders Data:")
orders_df.head()

## 📊 Exercise Set 1: Basic Date Extraction and Formatting
**Business Context**: The marketing team needs temporal insights for strategic planning.

### Exercise 1.1: Monthly Sales Overview
**Task**: Create a monthly summary showing year, month, month name, total orders, and unique customers for 2017-2018.

**Expected Output**: DataFrame with columns: year, month, month_name, total_orders, unique_customers
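
As a warm-up, here is a minimal sketch of the `.dt` accessor combined with named aggregation. It runs on a small hypothetical frame (`toy`, with a made-up `ts` column), not on `orders_df`, so the filtering steps of the exercise are left to you.

```python
import pandas as pd

# Hypothetical toy data, not the exercise dataset
toy = pd.DataFrame({
    "order_id": ["a", "b", "c", "d"],
    "customer_id": ["c1", "c1", "c2", "c3"],
    "ts": pd.to_datetime(["2017-01-05", "2017-01-20", "2017-02-03", "2018-01-10"]),
})

toy_monthly = (
    toy.assign(
        year=toy["ts"].dt.year,
        month=toy["ts"].dt.month,
        month_name=toy["ts"].dt.month_name(),
    )
    # Grouping by the numeric month keeps the rows in calendar order
    .groupby(["year", "month", "month_name"], as_index=False)
    .agg(
        total_orders=("order_id", "count"),
        unique_customers=("customer_id", "nunique"),
    )
)
print(toy_monthly)
```

Grouping by both `month` and `month_name` avoids the alphabetical ordering you would get from grouping by the name alone.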

In [None]:
# Exercise 1.1: Monthly Sales Overview
# YOUR CODE HERE:
# Hint: Use .dt accessor to extract year and month, then group by these components
# Filter for orders in 2017-2018 and exclude cancelled/unavailable orders

# Steps:
# 1. Filter data for 2017-2018 and valid order statuses
# 2. Extract year, month, and month_name from order_purchase_timestamp
# 3. Group by year and month
# 4. Count orders and unique customers

monthly_sales = None  # Your solution here

print("📊 Monthly Sales Overview:")
print(monthly_sales)

# Bonus: Create a simple visualization
# plt.figure(figsize=(12, 6))
# Add your visualization code here

### Exercise 1.2: Weekend vs Weekday Shopping Patterns
**Task**: Compare shopping patterns between weekdays and weekends.

**Expected Output**: DataFrame showing day_type, total_orders, percentage_of_total
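
The sketch below illustrates one way to turn `.dt.dayofweek` into a Weekend/Weekday label and a percentage split. The four dates are hypothetical and chosen so that two fall on a weekend.

```python
import numpy as np
import pandas as pd

# Hypothetical timestamps (2018-01-01 was a Monday)
toy_ts = pd.Series(pd.to_datetime([
    "2018-01-05",  # Friday
    "2018-01-06",  # Saturday
    "2018-01-07",  # Sunday
    "2018-01-08",  # Monday
]))

# dayofweek: 0 = Monday ... 6 = Sunday, so 5 and 6 are the weekend
toy_days = pd.DataFrame({
    "day_type": np.where(toy_ts.dt.dayofweek >= 5, "Weekend", "Weekday")
})

toy_counts = toy_days.groupby("day_type").size().reset_index(name="total_orders")
toy_counts["percentage_of_total"] = (
    toy_counts["total_orders"] / toy_counts["total_orders"].sum() * 100
).round(2)
print(toy_counts)
```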

In [None]:
# Exercise 1.2: Weekend vs Weekday Shopping Patterns
# YOUR CODE HERE:
# Hint: Use .dt.dayofweek (0=Monday, 6=Sunday) and classify days as Weekend/Weekday

# Steps:
# 1. Add day_of_week column using .dt.dayofweek
# 2. Create day_type column (Weekend for Saturday/Sunday, Weekday otherwise)
# 3. Group by day_type and count orders
# 4. Calculate percentages

day_type_analysis = None  # Your solution here

print("📅 Weekend vs Weekday Shopping Patterns:")
print(day_type_analysis)

### Exercise 1.3: Peak Shopping Hours Analysis
**Task**: Find the top 5 shopping hours by order volume.

**Expected Output**: DataFrame with hour_of_day, order_count, and rank
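
A small illustration of counting by hour and ranking, using six hypothetical timestamps and keeping the top 2 hours instead of the top 5:

```python
import pandas as pd

# Hypothetical purchase timestamps
toy_ts = pd.Series(pd.to_datetime([
    "2018-01-01 09:15", "2018-01-01 09:45", "2018-01-02 09:05",
    "2018-01-01 14:30", "2018-01-02 14:10", "2018-01-01 20:00",
]))

toy_hours = (
    pd.DataFrame({"hour_of_day": toy_ts.dt.hour})
    .groupby("hour_of_day").size().reset_index(name="order_count")
    .sort_values("order_count", ascending=False)
    .head(2)
    .reset_index(drop=True)
)
# Dense ranking gives tied hours the same rank without leaving gaps
toy_hours["rank"] = (
    toy_hours["order_count"].rank(method="dense", ascending=False).astype(int)
)
print(toy_hours)
```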

In [None]:
# Exercise 1.3: Peak Shopping Hours Analysis
# YOUR CODE HERE:
# Hint: Extract hour from timestamp and count orders by hour

# Steps:
# 1. Extract hour from order_purchase_timestamp
# 2. Group by hour and count orders
# 3. Sort by order count and get top 5
# 4. Add rank column

peak_hours = None  # Your solution here

print("🕐 Peak Shopping Hours:")
print(peak_hours)

# Bonus: Create a bar chart of all 24 hours
# plt.figure(figsize=(12, 6))
# Add your visualization code here

## 💰 Exercise Set 2: Seasonal Revenue Analysis
**Business Context**: Executive team wants to understand revenue patterns and growth trends.

### Exercise 2.1: Quarterly Performance Comparison
**Task**: Calculate revenue and orders by quarter, including quarter-over-quarter growth rates.

**Expected Output**: DataFrame with year, quarter, total_orders, total_revenue, qoq_growth
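
The merge, group, and `shift()` pattern can be sketched on two tiny hypothetical tables (`toy_orders` and `toy_items`, with a made-up `ts` column). The growth formula used here, (current - previous) / previous * 100, is one common definition and is an assumption, not something the exercise prescribes.

```python
import pandas as pd

# Hypothetical toy tables standing in for orders and order items
toy_orders = pd.DataFrame({
    "order_id": ["o1", "o2", "o3", "o4"],
    "ts": pd.to_datetime(["2017-02-01", "2017-05-01", "2017-06-01", "2017-08-01"]),
})
toy_items = pd.DataFrame({
    "order_id": ["o1", "o2", "o2", "o3", "o4"],
    "price": [100.0, 60.0, 40.0, 50.0, 300.0],
})

toy_quarterly = (
    toy_orders.merge(toy_items, on="order_id", how="inner")
    .assign(year=lambda d: d["ts"].dt.year, quarter=lambda d: d["ts"].dt.quarter)
    .groupby(["year", "quarter"], as_index=False)
    .agg(total_orders=("order_id", "nunique"), total_revenue=("price", "sum"))
    .sort_values(["year", "quarter"])
)

# shift(1) moves each value down one row, i.e. the previous quarter
prev_revenue = toy_quarterly["total_revenue"].shift(1)
toy_quarterly["qoq_growth"] = (
    (toy_quarterly["total_revenue"] - prev_revenue) / prev_revenue * 100
).round(2)
print(toy_quarterly)
```

The first quarter has no predecessor, so its growth is `NaN`. Note that `nunique` is used for orders because the merge repeats an order once per item.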

In [None]:
# Exercise 2.1: Quarterly Performance Comparison
# YOUR CODE HERE:
# Hint: Merge orders with order_items, extract quarters, use shift() for growth calculation

# Steps:
# 1. Merge orders_df with order_items_df
# 2. Extract year and quarter from order_purchase_timestamp
# 3. Group by year and quarter, calculate totals
# 4. Use shift() to get previous quarter values
# 5. Calculate quarter-over-quarter growth

quarterly_performance = None  # Your solution here

print("📊 Quarterly Performance with Growth:")
print(quarterly_performance)

### Exercise 2.2: Holiday Season Impact Analysis
**Task**: Compare each month's revenue with the average across all months and classify each month as a peak, regular, or low period.

**Expected Output**: DataFrame with month, month_name, total_revenue, avg_revenue_all_months, classification
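
One possible way to classify against the average is `np.select`. The ±10% thresholds and the revenue figures below are hypothetical; pick thresholds that make business sense for your results.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly revenue figures
toy_rev = pd.DataFrame({
    "month": [10, 11, 12],
    "total_revenue": [900.0, 1000.0, 1400.0],
})

avg_revenue = toy_rev["total_revenue"].mean()
toy_rev["avg_revenue_all_months"] = avg_revenue

# Assumed thresholds: 10% above average is Peak, 10% below is Low
ratio = toy_rev["total_revenue"] / avg_revenue
toy_rev["classification"] = np.select(
    [ratio >= 1.10, ratio <= 0.90],
    ["Peak", "Low"],
    default="Regular",
)
print(toy_rev)
```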

In [None]:
# Exercise 2.2: Holiday Season Impact Analysis
# YOUR CODE HERE:
# Hint: Calculate monthly revenue, then compare each month to overall average

# Steps:
# 1. Merge orders with items and group by month
# 2. Calculate total revenue per month
# 3. Calculate overall average revenue
# 4. Classify months as Peak/Regular/Low based on performance vs average

holiday_impact = None  # Your solution here

print("🎄 Holiday Season Impact Analysis:")
print(holiday_impact)

### Exercise 2.3: Year-over-Year Growth Analysis
**Task**: Calculate year-over-year growth for each month, comparing each month of 2018 with the same month of 2017.

**Expected Output**: DataFrame with year, month, total_orders, prev_year_orders, yoy_growth_percent
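
A sketch of the `shift(12)` idea on hypothetical monthly counts. It assumes the monthly table is sorted chronologically and has no missing months; if a month were absent, `shift(12)` would silently line up the wrong rows.

```python
import pandas as pd

# Hypothetical monthly order counts: 100 per month in 2017, 120 in 2018
toy_months = pd.period_range("2017-01", "2018-12", freq="M")
toy_yoy = pd.DataFrame({
    "year": toy_months.year,
    "month": toy_months.month,
    "total_orders": [100] * 12 + [120] * 12,
})

# Twelve rows back is the same calendar month one year earlier
toy_yoy["prev_year_orders"] = toy_yoy["total_orders"].shift(12)
toy_yoy["yoy_growth_percent"] = (
    (toy_yoy["total_orders"] - toy_yoy["prev_year_orders"])
    / toy_yoy["prev_year_orders"] * 100
).round(2)

# Keep only the months that have a prior-year value to compare against
toy_yoy = toy_yoy.dropna(subset=["prev_year_orders"])
print(toy_yoy)
```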

In [None]:
# Exercise 2.3: Year-over-Year Growth Analysis
# YOUR CODE HERE:
# Hint: Create monthly summary, then use shift(12) to get same month previous year

# Steps:
# 1. Group by year and month to get monthly order counts
# 2. Sort by year and month, then use shift(12) to get the same month of the previous year (this assumes no months are missing)
# 3. Calculate year-over-year growth percentage
# 4. Filter for months where comparison is possible

yoy_growth = None  # Your solution here

print("📈 Year-over-Year Growth Analysis:")
print(yoy_growth)

## 🚚 Exercise Set 3: Delivery Performance Metrics
**Business Context**: Logistics team needs to understand seasonal delivery performance.

### Exercise 3.1: Average Delivery Time by Month
**Task**: Calculate average delivery time and identify months with poor performance.

**Expected Output**: DataFrame with month_year, total_deliveries, avg_delivery_days, performance_rating
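
The date arithmetic and rating steps might look like the sketch below. The column names `purchased` and `delivered` and the 7-day and 14-day rating thresholds are hypothetical choices for illustration.

```python
import pandas as pd

# Hypothetical orders; the third one was never delivered
toy_del = pd.DataFrame({
    "purchased": pd.to_datetime(["2017-01-01", "2017-01-10", "2017-02-01"]),
    "delivered": pd.to_datetime(["2017-01-06", "2017-01-25", None]),
})

toy_del = toy_del.dropna(subset=["delivered"])
# Subtracting two datetime columns gives a Timedelta; .dt.days keeps whole days
toy_del["delivery_days"] = (toy_del["delivered"] - toy_del["purchased"]).dt.days
toy_del["month_year"] = toy_del["purchased"].dt.to_period("M").astype(str)

toy_perf = toy_del.groupby("month_year", as_index=False).agg(
    total_deliveries=("delivery_days", "size"),
    avg_delivery_days=("delivery_days", "mean"),
)
# Assumed thresholds: up to 7 days Good, up to 14 Average, above that Poor
toy_perf["performance_rating"] = pd.cut(
    toy_perf["avg_delivery_days"],
    bins=[-float("inf"), 7, 14, float("inf")],
    labels=["Good", "Average", "Poor"],
).astype(str)
print(toy_perf)
```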

In [None]:
# Exercise 3.1: Average Delivery Time by Month
# YOUR CODE HERE:
# Hint: Filter delivered orders, calculate delivery time using date arithmetic

# Steps:
# 1. Filter for delivered orders only
# 2. Calculate delivery_days = (delivered_date - purchase_date).dt.days
# 3. Group by month-year and calculate average delivery days
# 4. Add performance rating based on delivery time thresholds

delivery_performance = None  # Your solution here

print("📦 Monthly Delivery Performance:")
print(delivery_performance)

# Bonus: Create a line chart showing delivery time trends
# plt.figure(figsize=(12, 6))
# Add your visualization code here

### Exercise 3.2: Delivery Performance Benchmarking
**Task**: Calculate delivery time percentiles for setting performance targets.

**Expected Output**: DataFrame with year, p50_delivery_days, p75_delivery_days, p95_delivery_days
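
A short sketch of grouped percentiles with `.quantile()`, using hypothetical delivery times. pandas interpolates linearly between observations by default, which is why the 95th percentile below is not one of the input values.

```python
import pandas as pd

# Hypothetical delivery times in days
toy_days = pd.DataFrame({
    "year": [2017] * 5 + [2018] * 5,
    "delivery_days": [1, 2, 3, 4, 5, 2, 4, 6, 8, 10],
})

toy_bench = (
    toy_days.groupby("year")["delivery_days"]
    .quantile([0.50, 0.75, 0.95])  # one row per (year, quantile)
    .unstack()                     # quantiles become columns
)
toy_bench.columns = ["p50_delivery_days", "p75_delivery_days", "p95_delivery_days"]
toy_bench = toy_bench.reset_index()
print(toy_bench)
```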

In [None]:
# Exercise 3.2: Delivery Performance Benchmarking
# YOUR CODE HERE:
# Hint: Use .quantile() method for percentile calculations

# Steps:
# 1. Filter delivered orders and calculate delivery days
# 2. Group by year
# 3. Calculate 50th, 75th, and 95th percentiles using .quantile()

delivery_benchmarks = None  # Your solution here

print("📊 Delivery Performance Benchmarks:")
print(delivery_benchmarks)

### Exercise 3.3: Late Delivery Analysis
**Task**: Find orders delivered later than estimated and analyze patterns.

**Expected Output**: DataFrame with month, total_orders, late_deliveries, late_delivery_rate, avg_delay_days
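
The comparison of two date columns can be sketched as follows, with hypothetical `delivered` and `estimated` columns. Here the average delay is computed over late orders only, which is one reasonable reading of the task.

```python
import pandas as pd

# Hypothetical delivered vs estimated dates
toy_late = pd.DataFrame({
    "delivered": pd.to_datetime(["2017-03-05", "2017-03-20", "2017-03-12"]),
    "estimated": pd.to_datetime(["2017-03-10", "2017-03-15", "2017-03-10"]),
})

toy_late["is_late"] = toy_late["delivered"] > toy_late["estimated"]
delay = (toy_late["delivered"] - toy_late["estimated"]).dt.days
# Keep the delay only for late orders; on-time orders become NaN
toy_late["delay_days"] = delay.where(toy_late["is_late"])
toy_late["month"] = toy_late["delivered"].dt.month

toy_late_summary = toy_late.groupby("month", as_index=False).agg(
    total_orders=("is_late", "size"),
    late_deliveries=("is_late", "sum"),
    avg_delay_days=("delay_days", "mean"),  # mean() skips NaN
)
toy_late_summary["late_delivery_rate"] = (
    toy_late_summary["late_deliveries"] / toy_late_summary["total_orders"] * 100
).round(2)
print(toy_late_summary)
```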

In [None]:
# Exercise 3.3: Late Delivery Analysis
# YOUR CODE HERE:
# Hint: Compare delivered_date with estimated_delivery_date

# Steps:
# 1. Filter delivered orders with both delivery and estimated dates
# 2. Create is_late column: delivered_date > estimated_date
# 3. Calculate delay_days for late orders
# 4. Group by month and calculate late delivery statistics

late_delivery_analysis = None  # Your solution here

print("⚠️ Late Delivery Analysis:")
print(late_delivery_analysis)

## 👥 Exercise Set 4: Customer Behavior Temporal Analysis
**Business Context**: Marketing wants to understand customer engagement patterns.

### Exercise 4.1: Customer Purchase Frequency by Season
**Task**: Analyze how often customers make repeat purchases by quarter.

**Expected Output**: DataFrame with quarter, total_customers, repeat_customers, repeat_purchase_rate
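
A two-stage groupby is one way to approach this: first count orders per customer per quarter, then summarise per quarter. The toy data is hypothetical, and a "repeat customer" is assumed to mean more than one order within the same quarter.

```python
import pandas as pd

# Hypothetical orders
toy_o = pd.DataFrame({
    "customer_id": ["c1", "c1", "c2", "c3", "c3", "c3"],
    "ts": pd.to_datetime([
        "2017-01-05", "2017-02-10", "2017-03-01",
        "2017-04-02", "2017-05-03", "2017-06-04",
    ]),
})
toy_o["quarter"] = toy_o["ts"].dt.quarter

# Stage 1: orders per customer per quarter
per_customer = (
    toy_o.groupby(["quarter", "customer_id"]).size().reset_index(name="orders")
)

# Stage 2: customers and repeat customers per quarter
toy_freq = per_customer.groupby("quarter", as_index=False).agg(
    total_customers=("customer_id", "nunique"),
    repeat_customers=("orders", lambda s: int((s > 1).sum())),
)
toy_freq["repeat_purchase_rate"] = (
    toy_freq["repeat_customers"] / toy_freq["total_customers"] * 100
).round(2)
print(toy_freq)
```

With two years of data, `.dt.quarter` alone pools Q1 2017 with Q1 2018; add the year to the grouping if you want them separate.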

In [None]:
# Exercise 4.1: Customer Purchase Frequency by Season
# YOUR CODE HERE:
# Hint: Group by customer and quarter, count orders per customer per quarter

# Steps:
# 1. Extract quarter from order_purchase_timestamp
# 2. Group by customer_id and quarter, count orders per customer per quarter
# 3. Identify customers with multiple orders in the same quarter
# 4. Calculate repeat purchase rates by quarter

customer_frequency = None  # Your solution here

print("👥 Customer Purchase Frequency by Quarter:")
print(customer_frequency)

### Exercise 4.2: New vs Returning Customer Trends
**Task**: Identify customer acquisition patterns by month.

**Expected Output**: DataFrame with month_year, new_customers, returning_customers, customer_mix_ratio
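
The key step is attaching each customer's first order timestamp to every one of their orders, which `groupby(...).transform("min")` does without a merge. A sketch on hypothetical data (the mix ratio is left out because its definition is up to you):

```python
import numpy as np
import pandas as pd

# Hypothetical orders
toy_orders = pd.DataFrame({
    "customer_id": ["c1", "c2", "c1", "c3", "c2"],
    "ts": pd.to_datetime([
        "2017-01-05", "2017-01-09", "2017-02-02", "2017-02-10", "2017-02-15",
    ]),
})

# transform("min") returns one value per row, aligned with the original frame
first_ts = toy_orders.groupby("customer_id")["ts"].transform("min")
toy_orders["customer_type"] = np.where(
    toy_orders["ts"] == first_ts, "new", "returning"
)
toy_orders["month_year"] = toy_orders["ts"].dt.to_period("M").astype(str)

toy_mix = (
    toy_orders.groupby(["month_year", "customer_type"])["customer_id"]
    .nunique()
    .unstack(fill_value=0)  # months with no returning customers get 0
    .rename(columns={"new": "new_customers", "returning": "returning_customers"})
    .reset_index()
)
print(toy_mix)
```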

In [None]:
# Exercise 4.2: New vs Returning Customer Trends
# YOUR CODE HERE:
# Hint: Find each customer's first order date, then classify subsequent orders

# Steps:
# 1. Find first order date for each customer
# 2. For each order, determine if it's the customer's first order (new) or not (returning)
# 3. Group by month and count new vs returning customers
# 4. Calculate customer mix ratio

new_vs_returning = None  # Your solution here

print("🆕 New vs Returning Customer Trends:")
print(new_vs_returning)

### Exercise 4.3: Customer Lifetime Analysis
**Task**: Analyze customer purchase spans and frequency.

**Expected Output**: DataFrame with customer_lifetime_category, customer_count, avg_orders_per_customer
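
Below is a sketch of bucketing customer lifetimes with `pd.cut`, on hypothetical data. Lifetime is measured here as the difference in calendar months between first and last order, and a lifetime of exactly 12 months is assumed to fall in the "7-12" bucket.

```python
import pandas as pd

# Hypothetical orders
toy_o = pd.DataFrame({
    "customer_id": ["c1", "c2", "c2", "c3", "c3", "c3"],
    "ts": pd.to_datetime([
        "2017-01-05", "2017-01-10", "2017-03-15",
        "2017-01-01", "2017-06-01", "2018-03-01",
    ]),
})

toy_life = toy_o.groupby("customer_id").agg(
    first_order=("ts", "min"),
    last_order=("ts", "max"),
    n_orders=("ts", "size"),
)
# Calendar-month difference between first and last order
toy_life["lifetime_months"] = (
    (toy_life["last_order"].dt.year - toy_life["first_order"].dt.year) * 12
    + (toy_life["last_order"].dt.month - toy_life["first_order"].dt.month)
)
# Bins are right-inclusive: (-1, 0], (0, 3], (3, 6], (6, 12], (12, inf)
toy_life["customer_lifetime_category"] = pd.cut(
    toy_life["lifetime_months"],
    bins=[-1, 0, 3, 6, 12, float("inf")],
    labels=["0", "1-3", "4-6", "7-12", "12+"],
)

toy_life_summary = (
    toy_life.groupby("customer_lifetime_category", observed=True)
    .agg(customer_count=("n_orders", "size"),
         avg_orders_per_customer=("n_orders", "mean"))
    .reset_index()
)
print(toy_life_summary)
```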

In [None]:
# Exercise 4.3: Customer Lifetime Analysis
# YOUR CODE HERE:
# Hint: Calculate time between first and last order for each customer

# Steps:
# 1. For each customer, find first and last order dates
# 2. Calculate customer lifetime in months
# 3. Group customers into lifetime categories (0, 1-3, 4-6, 7-12, 12+ months)
# 4. Calculate average orders per customer in each category

customer_lifetime = None  # Your solution here

print("⏳ Customer Lifetime Analysis:")
print(customer_lifetime)

## 📈 Exercise Set 5: Advanced Temporal Business Intelligence
**Business Context**: Advanced analytics for trend identification and forecasting.

### Exercise 5.1: Rolling Averages for Trend Analysis
**Task**: Calculate 7-day and 30-day moving averages for daily orders.

**Expected Output**: DataFrame with date, daily_orders, rolling_7d_avg, rolling_30d_avg
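
A brief sketch of `.rolling()` on ten hypothetical daily counts. By default a window yields `NaN` until it is full; `min_periods=1` is shown as an alternative that averages whatever is available so far.

```python
import pandas as pd

# Hypothetical daily order counts, one row per calendar day
toy_daily = pd.DataFrame({
    "date": pd.date_range("2018-01-01", periods=10, freq="D"),
    "daily_orders": [10, 12, 14, 16, 18, 20, 22, 24, 26, 28],
})

# Full 7-row window: the first six rows are NaN
toy_daily["rolling_7d_avg"] = toy_daily["daily_orders"].rolling(window=7).mean()

# 30-row window that starts averaging from the first row
toy_daily["rolling_30d_avg"] = (
    toy_daily["daily_orders"].rolling(window=30, min_periods=1).mean()
)
print(toy_daily)
```

A row-based window only equals a 7-day window if every calendar day has a row. If some days had no orders, you would need to fill those dates with zero first.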

In [None]:
# Exercise 5.1: Rolling Averages for Trend Analysis
# YOUR CODE HERE:
# Hint: Group by date, then use .rolling() method for moving averages

# Steps:
# 1. Group orders by date (order_purchase_timestamp.dt.date)
# 2. Count daily orders
# 3. Sort by date
# 4. Calculate 7-day and 30-day rolling averages using .rolling()

rolling_analysis = None  # Your solution here

print("📊 Rolling Averages Analysis (first 20 days):")
print(rolling_analysis.head(20))

# Bonus: Create a line chart showing daily orders with moving averages
# plt.figure(figsize=(15, 6))
# Add your visualization code here

### Exercise 5.2: Cohort Analysis by Registration Month
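**Task**: Create a cohort analysis showing customer retention rates. The dataset has no registration date, so use each customer's first purchase month as the cohort month.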

**Expected Output**: DataFrame with cohort_month, months_since_first_order, active_customers, retention_rate
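
The cohort mechanics can be sketched on five hypothetical orders. Retention is defined here as active customers in a given month divided by the cohort's size in month 0, which is a common convention but still an assumption.

```python
import pandas as pd

# Hypothetical orders
toy_c = pd.DataFrame({
    "customer_id": ["c1", "c2", "c1", "c2", "c3"],
    "ts": pd.to_datetime([
        "2017-01-05", "2017-01-20", "2017-02-03", "2017-03-15", "2017-02-10",
    ]),
})

first_ts = toy_c.groupby("customer_id")["ts"].transform("min")
toy_c["cohort_month"] = first_ts.dt.to_period("M").astype(str)
toy_c["months_since_first_order"] = (
    (toy_c["ts"].dt.year - first_ts.dt.year) * 12
    + (toy_c["ts"].dt.month - first_ts.dt.month)
)

toy_cohort = (
    toy_c.groupby(["cohort_month", "months_since_first_order"])["customer_id"]
    .nunique()
    .reset_index(name="active_customers")
)

# Cohort size = customers active in month 0 of each cohort
cohort_size = (
    toy_cohort[toy_cohort["months_since_first_order"] == 0]
    .set_index("cohort_month")["active_customers"]
)
toy_cohort["retention_rate"] = (
    toy_cohort["active_customers"] / toy_cohort["cohort_month"].map(cohort_size) * 100
).round(1)
print(toy_cohort)
```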

In [None]:
# Exercise 5.2: Cohort Analysis by Registration Month
# YOUR CODE HERE:
# Hint: Group customers by first purchase month, track subsequent activity

# Steps:
# 1. Find first purchase month for each customer (cohort_month)
# 2. For each order, calculate months since first purchase
# 3. Group by cohort_month and months_since_first_order
# 4. Count active customers and calculate retention rates
# 5. Focus on first 12 months and recent cohorts

cohort_analysis = None  # Your solution here

print("👥 Cohort Analysis (sample):")
print(cohort_analysis.head(15))

# Bonus: Create a cohort heatmap
# cohort_matrix = cohort_analysis.pivot(index='cohort_month', 
#                                      columns='months_since_first_order', 
#                                      values='retention_rate')
# plt.figure(figsize=(12, 8))
# sns.heatmap(cohort_matrix, annot=True, fmt='.1f', cmap='Blues')
# plt.title('Customer Cohort Retention Heatmap')
# plt.show()

### Exercise 5.3: Seasonal Inventory Planning Analysis
**Task**: Calculate seasonal demand patterns with predictive insights.

**Expected Output**: DataFrame with month, avg_orders, seasonal_index, recommended_stock_level
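
A seasonal index compares each calendar month's average with the average across all months, so 100 means a typical month. In the sketch below the order counts and the `base_stock` figure of 500 units are hypothetical; scaling a base stock by the index is just one simple way to derive a recommendation.

```python
import pandas as pd

# Hypothetical monthly order counts over two years (three months shown)
toy_m = pd.DataFrame({
    "year": [2017, 2017, 2017, 2018, 2018, 2018],
    "month": [1, 2, 3, 1, 2, 3],
    "orders": [80, 100, 120, 100, 100, 100],
})

# Average orders for each calendar month across years
toy_season = toy_m.groupby("month", as_index=False).agg(avg_orders=("orders", "mean"))

# Index of 100 = average month; above 100 = busier than average
overall_avg = toy_season["avg_orders"].mean()
toy_season["seasonal_index"] = (toy_season["avg_orders"] / overall_avg * 100).round(1)

# Assumed rule: scale a hypothetical base stock by the seasonal index
base_stock = 500
toy_season["recommended_stock_level"] = (
    (base_stock * toy_season["seasonal_index"] / 100).round().astype(int)
)
print(toy_season)
```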

In [None]:
# Exercise 5.3: Seasonal Inventory Planning Analysis
# YOUR CODE HERE:
# Hint: Calculate seasonal indices by comparing monthly averages to yearly average

# Steps:
# 1. Group orders by month (across all years)
# 2. Calculate average monthly orders
# 3. Calculate seasonal index: (monthly_avg / average of all monthly averages) * 100
# 4. Create stock level recommendations based on seasonal index

seasonal_planning = None  # Your solution here

print("📦 Seasonal Inventory Planning:")
print(seasonal_planning)

## 🌍 Exercise Set 6: Nigerian Business Context
**Business Context**: Adapt analysis for Nigerian market conditions and cultural patterns.

### Exercise 6.1: Rainy Season Impact Analysis
**Task**: Compare delivery performance between rainy and dry seasons.

**Expected Output**: DataFrame with season_type, avg_delivery_days, order_volume, delivery_performance
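
Applying a classification function with `.map()` and then aggregating might look like this sketch. The May to October boundary follows the hint in this exercise; the function name `toy_season_of` and the data are hypothetical.

```python
import pandas as pd

def toy_season_of(month: int) -> str:
    """Classify a month number as Rainy (May-Oct) or Dry (Nov-Apr)."""
    return "Rainy" if 5 <= month <= 10 else "Dry"

# Hypothetical delivered orders with precomputed delivery times
toy_deliveries = pd.DataFrame({
    "month": [1, 4, 5, 10, 11],
    "delivery_days": [8, 10, 12, 14, 9],
})
toy_deliveries["season_type"] = toy_deliveries["month"].map(toy_season_of)

toy_season_summary = toy_deliveries.groupby("season_type", as_index=False).agg(
    avg_delivery_days=("delivery_days", "mean"),
    order_volume=("delivery_days", "size"),
)
print(toy_season_summary)
```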

In [None]:
# Exercise 6.1: Rainy Season Impact Analysis
# YOUR CODE HERE:
# Hint: Classify months into Rainy (May-Oct) and Dry (Nov-Apr) seasons

# Steps:
# 1. Create function to classify months into Nigerian seasons
# 2. Apply to delivery data
# 3. Group by season and calculate delivery metrics
# 4. Add performance classification

def get_nigerian_season(month):
    # Define the function here
    pass

rainy_season_impact = None  # Your solution here

print("🌧️ Rainy Season Impact Analysis:")
print(rainy_season_impact)

### Exercise 6.2: Nigerian Holiday Impact Analysis
**Task**: Identify sales spikes around major Nigerian holidays.

**Expected Output**: DataFrame with month, holiday_type, avg_monthly_orders, sales_spike_factor, recommended_preparation
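
A dictionary lookup is an alternative to a function for labelling months. In this sketch the spike factor is assumed to be each period's average monthly orders divided by the Regular Period average; the order counts are hypothetical.

```python
import pandas as pd

# Month-to-period lookup; unmapped months fall back to "Regular Period"
toy_period_map = {
    12: "Christmas Season",
    10: "Independence Month",
    9: "Back-to-School",
}

# Hypothetical monthly order counts
toy_monthly = pd.DataFrame({
    "month": [8, 9, 10, 11, 12],
    "monthly_orders": [100, 120, 110, 100, 180],
})
toy_monthly["holiday_type"] = (
    toy_monthly["month"].map(toy_period_map).fillna("Regular Period")
)

toy_periods = toy_monthly.groupby("holiday_type", as_index=False).agg(
    avg_monthly_orders=("monthly_orders", "mean")
)
baseline = toy_periods.loc[
    toy_periods["holiday_type"] == "Regular Period", "avg_monthly_orders"
].iloc[0]
toy_periods["sales_spike_factor"] = (
    toy_periods["avg_monthly_orders"] / baseline
).round(2)
print(toy_periods)
```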

In [None]:
# Exercise 6.2: Nigerian Holiday Impact Analysis
# YOUR CODE HERE:
# Hint: Classify months by Nigerian holidays and business periods

# Steps:
# 1. Create function to classify Nigerian business periods
# 2. Calculate average monthly orders by business period
# 3. Calculate sales spike factor compared to normal periods
# 4. Add business recommendations

def get_nigerian_business_period(month):
    # Define the function here
    # December: Christmas Season
    # October: Independence Month
    # September: Back-to-School
    # Other months: Regular Period
    pass

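# Named differently from Exercise 2.2's holiday_impact so that result is not overwritten
nigerian_holiday_impact = None  # Your solution here

print("🎄 Nigerian Holiday Impact Analysis:")
print(nigerian_holiday_impact)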

### Exercise 6.3: Business Day vs Holiday Analysis
**Task**: Compare business metrics between regular business days and holidays.

**Expected Output**: DataFrame with day_type, avg_daily_orders, customer_behavior_pattern
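
As a vectorised alternative to a row-wise function, `np.select` can apply the rules in priority order (holiday first, then weekend). The dates and daily order counts below are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical daily order counts
toy_daily = pd.DataFrame({
    "date": pd.to_datetime([
        "2018-10-01",  # Monday, Independence Day
        "2018-10-02",  # Tuesday
        "2018-10-06",  # Saturday
        "2018-12-25",  # Tuesday, Christmas Day
    ]),
    "daily_orders": [30, 20, 24, 50],
})

month = toy_daily["date"].dt.month
day = toy_daily["date"].dt.day
is_holiday = ((month == 10) & (day == 1)) | ((month == 12) & (day == 25))
is_weekend = toy_daily["date"].dt.dayofweek >= 5

# np.select uses the first condition that matches, so holidays win over weekends
toy_daily["day_type"] = np.select(
    [is_holiday, is_weekend], ["Holiday", "Weekend"], default="Business Day"
)

toy_day_summary = toy_daily.groupby("day_type", as_index=False).agg(
    avg_daily_orders=("daily_orders", "mean")
)
print(toy_day_summary)
```

Averaging is done over daily totals, not over individual orders, so each day counts once regardless of how busy it was.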

In [None]:
# Exercise 6.3: Business Day vs Holiday Analysis
# YOUR CODE HERE:
# Hint: Classify days into Business Day, Weekend, and specific holidays

# Steps:
# 1. Extract day of week and specific dates (Christmas, Independence Day)
# 2. Classify each order date as Business Day, Weekend, or Holiday
# 3. Calculate average daily orders by day type
# 4. Add customer behavior pattern insights

def classify_day_type(row):
    # Define the function here
    # Consider: weekends, Christmas (Dec 25), Independence Day (Oct 1)
    pass

business_day_analysis = None  # Your solution here

print("📅 Business Day vs Holiday Analysis:")
print(business_day_analysis)

## 🎯 Validation and Comparison
**Task**: Compare your Python results with Thursday's SQL findings.

### Final Summary Dashboard

In [None]:
# Create a comprehensive summary dashboard
# YOUR CODE HERE: Combine insights from all exercises

print("📊 NAIJACOMMERCE TEMPORAL ANALYSIS DASHBOARD")
print("=" * 50)

# Dataset overview
print(f"📋 DATASET OVERVIEW:")
print(f"   Total Orders: {len(orders_df):,}")
print(f"   Date Range: {orders_df['order_purchase_timestamp'].min().date()} to {orders_df['order_purchase_timestamp'].max().date()}")
print(f"   Delivered Orders: {(orders_df['order_status'] == 'delivered').sum():,}")

# Add more summary statistics from your exercises
# Peak shopping periods
# Delivery performance metrics
# Customer behavior insights
# Seasonal patterns

print("\n✅ Analysis Complete!")
print("🔄 Compare these results with your Thursday SQL analysis")
print("💡 Both tools should lead to the same business insights!")

## 📝 Reflection Questions

After completing the exercises, reflect on these questions:

1. **Tool Comparison**: Which operations were easier in Python vs SQL? Why?

2. **Business Insights**: Did you discover the same patterns using Python as you did with SQL?

3. **Visualization**: How did Python's visualization capabilities enhance your understanding?

4. **Performance**: For which types of analysis would you prefer Python over SQL?

5. **Integration**: How could you combine SQL and Python in a real business workflow?

## 🚀 Next Steps

These datetime skills will be essential for:
- **Google Looker Studio**: Creating time-based dashboards and reports
- **Streamlit**: Building interactive temporal analytics applications
- **Advanced Analytics**: Time series forecasting and predictive modeling
- **Business Intelligence**: Automated reporting and KPI tracking systems

---

## 📋 Submission Guidelines

1. **Complete all exercises** with working code and business insights
2. **Add visualizations** where indicated to enhance understanding
3. **Compare results** with your Thursday SQL analysis
4. **Include business interpretations** of your findings
5. **Test your code** to ensure it runs without errors

**Evaluation Criteria**:
- Correct implementation of datetime operations (40%)
- Business relevance and insight quality (30%)
- Code quality and pandas best practices (20%)
- Consistency with SQL analysis results (10%)

Good luck with your analysis! 🎯