# Week 3: Sorting & Calculated Fields in Python
## Nigerian E-commerce Analytics with Pandas

**Course**: PORA Academy Cohort 5 - Data Analytics & AI Bootcamp  
**Week**: 3 of 24  
**Date**: August 27, 2025  
**Duration**: 2 hours  

### Business Context
You are a data analyst for **Olist Nigeria**, a major e-commerce marketplace. Your task is to analyze order pricing patterns, create business metrics, and sort data to discover insights that drive strategic decisions.

### Learning Objectives
By the end of this session, you will be able to:
1. **Create calculated columns** using arithmetic operations and business logic
2. **Sort DataFrames** with single and multiple criteria
3. **Apply conditional logic** for business categorization
4. **Handle string manipulations** for data cleaning
5. **Build e-commerce KPIs** and performance metrics

## 📚 Setup and Data Loading

First, let's import our libraries and load the Olist e-commerce dataset.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime
import warnings
warnings.filterwarnings('ignore')

# Set pandas display options for better output
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.float_format', '{:.2f}'.format)

print("📦 Libraries imported successfully!")
print(f"🐼 Pandas version: {pd.__version__}")
print(f"🔢 NumPy version: {np.__version__}")

In [None]:
# Build the dataset for this session
# Note: In a real project, you would upload the Olist CSV files to Colab and read them with pd.read_csv()

# For this demo, we'll create sample data that mirrors the actual Olist structure
np.random.seed(42)  # For reproducible results

# Create sample e-commerce data
n_orders = 1000

# Generate realistic order data
order_data = {
    'order_id': [f'order_{i:06d}' for i in range(n_orders)],
    'order_item_id': np.random.randint(1, 5, n_orders),
    'product_id': [f'prod_{np.random.randint(1000, 9999):04d}' for _ in range(n_orders)],
    'seller_id': [f'seller_{np.random.randint(100, 999):03d}' for _ in range(n_orders)],
    'price': np.random.exponential(100, n_orders).round(2),
    'freight_value': np.random.exponential(15, n_orders).round(2),
    'product_category_name_english': np.random.choice([
        'health_beauty', 'computers_accessories', 'watches_gifts', 'toys', 
        'electronics', 'bed_bath_table', 'sports_leisure', 'furniture_decor',
        'housewares', 'auto', 'telephony', 'garden_tools'
    ], n_orders),
    'product_weight_g': np.random.lognormal(6, 1, n_orders).astype(int),
    'product_length_cm': np.random.randint(10, 100, n_orders),
    'product_height_cm': np.random.randint(5, 50, n_orders),
    'product_width_cm': np.random.randint(10, 80, n_orders)
}

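# Introduce some realistic missing values (about 10% of product weights)
# Integer arrays cannot store NaN, so convert the weights to float first
order_data['product_weight_g'] = order_data['product_weight_g'].astype(float)
missing_indices = np.random.choice(n_orders, size=int(n_orders * 0.1), replace=False)
order_data['product_weight_g'][missing_indices] = np.nan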

# Create DataFrame
df = pd.DataFrame(order_data)

print("🛒 Sample e-commerce dataset created successfully!")
print(f"📊 Dataset shape: {df.shape}")
print("\n🔍 First few rows:")
df.head()

In [None]:
# Quick data exploration
print("📋 Dataset Information:")
df.info()  # info() prints its report directly and returns None, so no print() is needed
print("\n📈 Basic Statistics:")
print(df.describe())
print("\n🏷️ Product Categories:")
print(df['product_category_name_english'].value_counts())

## 🧮 Module 1: Creating Calculated Columns

In e-commerce analysis, raw data only tells part of the story. We need to create calculated fields that provide business insights.

### 1.1 Basic Arithmetic Operations

In [None]:
# 💰 Basic business calculations

# Total order value (price + shipping)
df['total_order_value'] = df['price'] + df['freight_value']

# Platform commission (10% of product price)
df['platform_commission'] = df['price'] * 0.10

# Processing fee (5% of total order value)
df['processing_fee'] = df['total_order_value'] * 0.05

# Net revenue (total - commission - processing fee)
df['net_revenue'] = df['total_order_value'] - df['platform_commission'] - df['processing_fee']

print("💰 Basic Financial Calculations:")
print(df[['order_id', 'price', 'freight_value', 'total_order_value', 'platform_commission', 'net_revenue']].head())

# Summary statistics for new fields
print("\n📊 Summary Statistics for Calculated Fields:")
print(df[['total_order_value', 'platform_commission', 'net_revenue']].describe())
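
To sanity-check the fee formulas above, it can help to run them on a tiny table where the answers are easy to verify by hand. The sketch below uses a hypothetical two-row DataFrame named `demo` (not part of the course dataset) with the same 10% commission and 5% processing fee rates.

```python
import pandas as pd

# Hypothetical two-order table, used only to check the arithmetic by hand
demo = pd.DataFrame({'price': [100.0, 40.0], 'freight_value': [20.0, 10.0]})

demo['total_order_value'] = demo['price'] + demo['freight_value']
demo['platform_commission'] = demo['price'] * 0.10          # 10% of product price
demo['processing_fee'] = demo['total_order_value'] * 0.05   # 5% of total order value
demo['net_revenue'] = (
    demo['total_order_value'] - demo['platform_commission'] - demo['processing_fee']
)

print(demo.round(2))
```

For the first row: 100 + 20 = 120 in total, the commission is 10, the processing fee is 6, and the net revenue is 120 - 10 - 6 = 104.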

### 1.2 Advanced E-commerce Metrics

In [None]:
# 🎯 Advanced business KPIs

# Freight as percentage of product price (key logistics KPI)
df['freight_percentage'] = (df['freight_value'] / df['price'] * 100).round(2)

# Price per gram (efficiency metric for weight-based products)
df['price_per_gram'] = (df['price'] / df['product_weight_g']).round(4)

# Product volume for packaging optimization
df['volume_cm3'] = df['product_length_cm'] * df['product_height_cm'] * df['product_width_cm']

# Value density (price per cubic cm) - premium product indicator
df['value_density'] = (df['price'] / df['volume_cm3']).round(6)

# Shipping cost per gram (logistics efficiency)
df['shipping_cost_per_gram'] = (df['freight_value'] / df['product_weight_g']).round(4)

# Replace infinite values (produced by division by zero) with NaN so they count as missing
ratio_cols = ['freight_percentage', 'price_per_gram', 'value_density', 'shipping_cost_per_gram']
df[ratio_cols] = df[ratio_cols].replace([np.inf, -np.inf], np.nan)

print("🎯 Advanced E-commerce Metrics:")
print(df[['price', 'freight_percentage', 'price_per_gram', 'value_density', 'shipping_cost_per_gram']].head(10))

# Check for missing values in calculated fields
print("\n🔍 Missing Values in Calculated Fields:")
missing_calc = df[['price_per_gram', 'value_density', 'shipping_cost_per_gram']].isnull().sum()
print(missing_calc)
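
Why do ratio columns need cleaning at all? A minimal sketch, using made-up values chosen to trigger each edge case: dividing by zero yields `inf`, dividing by a missing value yields `NaN`, and `replace()` turns the infinities into `NaN` so both cases are treated as missing.

```python
import numpy as np
import pandas as pd

# Hypothetical values: a normal weight, a zero weight, and a missing weight
price = pd.Series([50.0, 50.0, 50.0])
weight = pd.Series([200.0, 0.0, np.nan])

ratio = price / weight                                # normal value, inf, NaN
cleaned = ratio.replace([np.inf, -np.inf], np.nan)    # inf becomes NaN

print(ratio.tolist())
print(cleaned.isna().sum(), "missing values after cleaning")
```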

### 1.3 Currency Conversion for Nigerian Market

In [None]:
# 💱 Convert Brazilian Real (BRL) to Nigerian Naira (NGN)
# Exchange rate approximation: 1 BRL = 500 NGN (simplified for learning)

EXCHANGE_RATE_NGN = 500

# Convert key financial metrics to Naira
df['price_ngn'] = (df['price'] * EXCHANGE_RATE_NGN).round(0)
df['freight_ngn'] = (df['freight_value'] * EXCHANGE_RATE_NGN).round(0) 
df['total_value_ngn'] = (df['total_order_value'] * EXCHANGE_RATE_NGN).round(0)
df['commission_ngn'] = (df['platform_commission'] * EXCHANGE_RATE_NGN).round(0)

print("💱 Currency Conversion (BRL → NGN):")
print(df[['price', 'price_ngn', 'freight_value', 'freight_ngn', 'total_order_value', 'total_value_ngn']].head())

# Nigerian market price ranges
print("\n🇳🇬 Nigerian Naira Price Distribution:")
print(f"Minimum order value: ₦{df['total_value_ngn'].min():,.0f}")
print(f"Maximum order value: ₦{df['total_value_ngn'].max():,.0f}")
print(f"Average order value: ₦{df['total_value_ngn'].mean():,.0f}")
print(f"Median order value: ₦{df['total_value_ngn'].median():,.0f}")

## 📊 Module 2: Data Sorting with sort_values()

Strategic sorting helps identify top performers, problem areas, and optimization opportunities.

### 2.1 Single Column Sorting

In [None]:
# 🏆 Find highest value orders (VIP customer identification)
highest_value = df.sort_values('total_order_value', ascending=False)
print("🏆 Top 10 Highest Value Orders:")
print(highest_value[['order_id', 'price', 'freight_value', 'total_order_value', 'total_value_ngn']].head(10))

print("\n" + "="*50)

# ⚡ Find most efficient shipping (lowest freight percentage)
efficient_shipping = df[df['freight_percentage'].notna()].sort_values('freight_percentage', ascending=True)
print("⚡ Most Efficient Shipping (Lowest Freight %):")
print(efficient_shipping[['order_id', 'price', 'freight_value', 'freight_percentage']].head(10))

print("\n" + "="*50)

# 💎 Products with highest value density (premium products)
premium_products = df[df['value_density'].notna()].sort_values('value_density', ascending=False)
print("💎 Highest Value Density Products (Premium Items):")
print(premium_products[['product_id', 'product_category_name_english', 'price', 'volume_cm3', 'value_density']].head(10))
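
When only the top or bottom few rows are needed, `nlargest()` and `nsmallest()` are a compact alternative to `sort_values().head()`. The sketch below uses a hypothetical four-row `orders` table to show that both approaches pick the same rows, and that sorting returns a new DataFrame rather than changing the original.

```python
import pandas as pd

# Hypothetical mini-table for illustration
orders = pd.DataFrame({
    'order_id': ['a', 'b', 'c', 'd'],
    'total_order_value': [120.0, 300.0, 45.0, 210.0],
})

top2_sorted = orders.sort_values('total_order_value', ascending=False).head(2)
top2_nlargest = orders.nlargest(2, 'total_order_value')

print(top2_nlargest)
# sort_values() returned a copy, so the original row order is unchanged
print(orders['order_id'].tolist())
```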

### 2.2 Multiple Column Sorting

In [None]:
# 🎯 Multi-level sorting for comprehensive business analysis
# Sort by: Category (A-Z) → Highest Value → Best Shipping Efficiency

df_sorted = df.sort_values(
    ['product_category_name_english', 'total_order_value', 'freight_percentage'],
    ascending=[True, False, True],  # Category A-Z, highest value first, best shipping first
    na_position='last'  # Put NaN values at the end
)

print("🎯 Multi-level Sorting: Category → Value → Shipping Efficiency")

# Focus on key categories for analysis
key_categories = ['health_beauty', 'computers_accessories', 'watches_gifts', 'electronics']
category_analysis = df_sorted[df_sorted['product_category_name_english'].isin(key_categories)]

print(category_analysis[[
    'product_category_name_english', 'price', 'total_order_value', 
    'freight_percentage', 'total_value_ngn'
]].head(20))
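
The interaction between the `ascending` list and `na_position` is easier to see on a small table. In this sketch (the `sample` data is made up), categories are sorted A-Z, values are sorted highest first within each category, and the missing value lands at the end of its own category rather than at the end of the whole table.

```python
import numpy as np
import pandas as pd

# Hypothetical mini-table with one missing value
sample = pd.DataFrame({
    'category': ['toys', 'auto', 'toys', 'auto'],
    'value': [50.0, 80.0, 90.0, np.nan],
})

ordered = sample.sort_values(
    ['category', 'value'],
    ascending=[True, False],   # category A-Z, then highest value first
    na_position='last',        # missing values go last within each category
)
print(ordered)
```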

In [None]:
# 🏅 Get top 3 products per category (business insight)

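def get_top_n_per_category(df, n=3):
    """Get the top N orders by total value within each category"""
    # Sort once, then keep the first n rows of each group. This avoids groupby().apply(),
    # which is slower and handles the grouping column differently across pandas versions.
    ranked = df.sort_values(
        ['product_category_name_english', 'total_order_value'],
        ascending=[True, False]
    )
    return ranked.groupby('product_category_name_english').head(n).reset_index(drop=True)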

top_per_category = get_top_n_per_category(df, 3)

print("🏅 Top 3 Products per Category by Order Value:")
display_cols = ['product_category_name_english', 'price', 'total_order_value', 'freight_percentage', 'total_value_ngn']
print(top_per_category[display_cols].head(15))

# Category summary, computed only from the top 3 orders kept for each category
print("\n📋 Category Performance Summary (top 3 orders per category):")
category_summary = top_per_category.groupby('product_category_name_english').agg({
    'total_order_value': ['mean', 'max'],
    'freight_percentage': 'mean'
}).round(2)
print(category_summary)
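
Another way to express "top N per category" is to rank rows inside each group and filter on the rank. This is a sketch on a hypothetical `sales` table; `rank_in_category` is a column name introduced here for illustration.

```python
import pandas as pd

# Hypothetical mini-table for illustration
sales = pd.DataFrame({
    'category': ['toys', 'toys', 'toys', 'auto', 'auto'],
    'value': [10.0, 30.0, 20.0, 5.0, 15.0],
})

# Rank 1 = highest value within its own category
sales['rank_in_category'] = (
    sales.groupby('category')['value']
    .rank(method='first', ascending=False)
    .astype(int)
)

top2 = sales[sales['rank_in_category'] <= 2].sort_values(['category', 'rank_in_category'])
print(top2)
```

Keeping the rank as a column is handy when the report needs to show each row's position, not just the filtered rows.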

## 🧠 Module 3: Conditional Logic and Categorization

Business categorization enables targeted strategies and operational decisions.

### 3.1 Simple Conditions with numpy.where

In [None]:
# 🏷️ Simple binary and multi-level categorization

# Value tier classification
df['value_tier'] = np.where(df['price'] > 200, 'High Value', 'Standard Value')

# Shipping cost evaluation
df['shipping_tier'] = np.where(
    df['freight_value'] == 0, 'Free Shipping',
    np.where(
        df['freight_value'] < 10, 'Low Shipping',
        np.where(
            df['freight_value'] < 25, 'Standard Shipping',
            'High Shipping'
        )
    )
)

# Nigerian market pricing categories (in Naira)
df['naira_price_category'] = np.where(
    df['total_value_ngn'] < 25000, 'Budget (₦0-25K)',
    np.where(
        df['total_value_ngn'] < 100000, 'Economy (₦25K-100K)',
        np.where(
            df['total_value_ngn'] < 250000, 'Premium (₦100K-250K)',
            'Luxury (₦250K+)'
        )
    )
)

print("🏷️ Simple Business Categorization:")
print(df[['price', 'value_tier', 'freight_value', 'shipping_tier', 'total_value_ngn', 'naira_price_category']].head(10))

# Distribution analysis
print("\n📊 Category Distributions:")
print("Value Tier:")
print(df['value_tier'].value_counts())
print("\nNigerian Price Categories:")
print(df['naira_price_category'].value_counts())
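
Nested `np.where()` calls get hard to read beyond two or three levels. One flatter option is `np.select()`, which takes a list of conditions and a matching list of labels. The sketch below rebuilds the shipping tiers on a small made-up `freight` Series.

```python
import numpy as np
import pandas as pd

# Hypothetical freight values, one per tier
freight = pd.Series([0.0, 4.5, 18.0, 60.0])

conditions = [freight == 0, freight < 10, freight < 25]
labels = ['Free Shipping', 'Low Shipping', 'Standard Shipping']

# Conditions are checked in order and the first match wins, like if/elif
shipping_tier = np.select(conditions, labels, default='High Shipping')
print(shipping_tier.tolist())
```

Because the first matching condition wins, the order of the list matters: `freight == 0` has to come before `freight < 10`.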

### 3.2 Advanced Conditional Logic with Custom Functions

In [None]:
# 🎯 Complex business categorization functions

def categorize_shipping_efficiency(freight_pct):
    """Categorize shipping efficiency with star ratings"""
    if pd.isna(freight_pct):
        return 'Unknown'
    elif freight_pct < 5:
        return 'Excellent ⭐⭐⭐⭐⭐'
    elif freight_pct < 10:
        return 'Very Good ⭐⭐⭐⭐'
    elif freight_pct < 20:
        return 'Good ⭐⭐⭐'
    elif freight_pct < 30:
        return 'Average ⭐⭐'
    else:
        return 'Needs Improvement ⭐'

def categorize_business_priority(row):
    """Complex business priority based on multiple factors"""
    price = row['price']
    freight_pct = row['freight_percentage'] if pd.notna(row['freight_percentage']) else 100
    weight = row['product_weight_g'] if pd.notna(row['product_weight_g']) else 1000
    
    if price > 500 and freight_pct < 15 and weight < 2000:
        return 'VIP Priority 🏆'
    elif price > 300 and freight_pct < 20:
        return 'High Priority 🚀'
    elif price > 100 and freight_pct < 25:
        return 'Standard Priority ✅'
    elif freight_pct > 40:
        return 'Review Required ⚠️'
    else:
        return 'Low Priority 📦'

def categorize_weight_class(weight):
    """Categorize products by weight for logistics planning"""
    if pd.isna(weight):
        return 'Unknown'
    elif weight < 500:
        return 'Light'
    elif weight < 2000:
        return 'Medium'
    elif weight < 5000:
        return 'Heavy'
    else:
        return 'Bulk'

# Apply categorization functions
df['shipping_efficiency_score'] = df['freight_percentage'].apply(categorize_shipping_efficiency)
df['business_priority'] = df.apply(categorize_business_priority, axis=1)
df['weight_category'] = df['product_weight_g'].apply(categorize_weight_class)

print("🎯 Advanced Business Categorization:")
print(df[['price', 'freight_percentage', 'shipping_efficiency_score', 'business_priority', 'weight_category']].head(12))
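
For simple threshold rules like the weight classes, `pd.cut()` can do the same job as a custom function without calling Python code row by row. This is a sketch on made-up weights; it assumes the same cut-offs as `categorize_weight_class` above.

```python
import numpy as np
import pandas as pd

# Hypothetical weights in grams, including a boundary value and a missing value
weights = pd.Series([120.0, 500.0, 1999.0, 4200.0, 9000.0, np.nan])

weight_class = pd.cut(
    weights,
    bins=[0, 500, 2000, 5000, np.inf],
    labels=['Light', 'Medium', 'Heavy', 'Bulk'],
    right=False,   # intervals are [low, high), matching the `<` comparisons
)
# Missing weights stay NaN after pd.cut(), so label them explicitly
weight_class = weight_class.cat.add_categories('Unknown').fillna('Unknown')
print(weight_class.tolist())
```

Rules that combine several columns, such as `categorize_business_priority`, still fit a row-wise function or `np.select()` better.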

In [None]:
# 📊 Business priority distribution and analysis

print("📊 Business Priority Distribution:")
priority_counts = df['business_priority'].value_counts()
print(priority_counts)
print(f"\nPercentage breakdown:")
priority_pct = (priority_counts / len(df) * 100).round(1)
for priority, pct in priority_pct.items():
    print(f"{priority}: {pct}%")

print("\n" + "="*60)

# Shipping efficiency analysis
print("⚡ Shipping Efficiency Analysis:")
efficiency_counts = df['shipping_efficiency_score'].value_counts()
print(efficiency_counts)

print("\n" + "="*60)

# Weight category distribution
print("⚖️ Product Weight Distribution:")
weight_counts = df['weight_category'].value_counts()
print(weight_counts)
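
The percentage breakdown can also be computed in one step with `value_counts(normalize=True)`, which returns shares instead of counts. A minimal sketch on a made-up `priority` Series:

```python
import pandas as pd

# Hypothetical labels for illustration
priority = pd.Series(['Low', 'Low', 'High', 'Low'])

share = (priority.value_counts(normalize=True) * 100).round(1)
print(share)
```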

## 🔤 Module 4: String Manipulation for Business Data

Clean, standardized text data is essential for professional business reporting.

In [None]:
# 🧹 String operations for business applications

# Create standardized order codes
df['order_code_prefix'] = df['order_id'].str[:8].str.upper()

# Clean and format category names
df['category_clean'] = df['product_category_name_english'].str.title().str.replace('_', ' ')

# Generate business-ready SKU codes
df['generated_sku'] = (
    df['product_category_name_english'].str[:3].str.upper() + '-' +
    df['order_item_id'].astype(str) + '-' +
    df['product_id'].str.replace('prod_', '').str[:4]
)

# Format currency displays for reports
df['price_display_brl'] = df['price'].apply(lambda x: f'R$ {x:.2f}')
df['price_display_ngn'] = df['price_ngn'].apply(lambda x: f'₦ {x:,.0f}')

# Weight display with proper units
df['weight_display'] = df['product_weight_g'].apply(
    lambda x: f'{x:.0f}g' if pd.notna(x) else 'Weight not specified'
)

# Dimension summary
df['dimensions_display'] = (
    df['product_length_cm'].astype(str) + '×' +
    df['product_height_cm'].astype(str) + '×' +
    df['product_width_cm'].astype(str) + ' cm'
)

print("🧹 String Manipulation for Business Presentation:")
display_cols = [
    'order_code_prefix', 'category_clean', 'generated_sku', 
    'price_display_brl', 'price_display_ngn', 'weight_display'
]
print(df[display_cols].head(10))
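
The `.str` accessor applies a string method to every value in a column, and calls can be chained. The sketch below runs the category clean-up and the SKU prefix on three sample category names so the effect of each step is visible.

```python
import pandas as pd

categories = pd.Series(['health_beauty', 'bed_bath_table', 'auto'])

# Replace underscores first, then capitalize each word
clean = categories.str.replace('_', ' ', regex=False).str.title()
# First three characters, upper-cased, as used for the SKU prefix
prefix = categories.str[:3].str.upper()

print(clean.tolist())
print(prefix.tolist())
```

Passing `regex=False` makes it explicit that `'_'` is a literal character rather than a pattern.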

In [None]:
# 🔍 Text analysis for data quality assessment

# Category name analysis
df['category_name_length'] = df['product_category_name_english'].str.len()
df['category_word_count'] = df['product_category_name_english'].str.split('_').str.len()

# Order ID validation
df['order_id_valid'] = df['order_id'].str.len() == 12  # Check expected format

print("🔍 Text Analysis and Data Quality:")
print(f"Average category name length: {df['category_name_length'].mean():.1f} characters")
print(f"Average words per category: {df['category_word_count'].mean():.1f} words")
print(f"Order ID format validity: {df['order_id_valid'].sum()}/{len(df)} ({df['order_id_valid'].mean()*100:.1f}%)")

print("\n📊 Category Name Length Distribution:")
length_stats = df.groupby('product_category_name_english')['category_name_length'].agg(['mean', 'min', 'max']).round(1)
print(length_stats.head(8))
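
A length check accepts any 12-character string, so it can miss malformed IDs. A stricter sketch, assuming the `order_` prefix followed by six digits used in this notebook's sample data, is to match the whole value against a pattern with `str.fullmatch()`. The IDs below are made up to show the difference.

```python
import pandas as pd

# Hypothetical IDs: one valid, one too short, one wrong case, one wrong characters
order_ids = pd.Series(['order_000123', 'order_12', 'ORDER_000123', 'orderXXXXXXX'])

length_ok = order_ids.str.len() == 12
pattern_ok = order_ids.str.fullmatch(r'order_\d{6}')

print(length_ok.tolist())
print(pattern_ok.tolist())
```

The last two IDs pass the length check but fail the pattern check.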

## 🎯 Hands-On Practice: Nigerian E-commerce Pricing Analysis

**Business Challenge**: Olist Nigeria wants to analyze their product portfolio and rank products by strategic importance for business growth.

### Exercise Requirements:
1. Calculate total value including shipping for each order
2. Create Nigerian price categories: Budget (<₦25,000), Premium (₦25,000-₦100,000), Luxury (>₦100,000)
3. Calculate shipping efficiency score (freight as % of price)
4. Create business recommendations based on multiple factors
5. Sort by business importance and display top 20 products

**Try to solve this yourself first, then check the solution below!**

In [None]:
# 🎯 YOUR TURN: Complete this exercise

# Step 1: Calculate total values and conversions
# TODO: Create 'total_with_shipping' column
# TODO: Convert to Nigerian Naira

# Step 2: Create price categorization
# TODO: Use pd.cut() or conditional logic for Nigerian price categories

# Step 3: Calculate shipping efficiency 
# TODO: Calculate freight as percentage of price

# Step 4: Business recommendation function
# TODO: Create function that recommends actions based on multiple factors

# Step 5: Strategic ranking
# TODO: Sort by business importance criteria

print("🎯 Exercise: Write your solution here!")
print("Hint: Start with Step 1 - calculating total values")

In [None]:
# 🎯 SOLUTION: Nigerian E-commerce Strategic Analysis

# Step 1: Calculate total values and conversions
df['total_with_shipping'] = df['price'] + df['freight_value']
df['total_ngn_strategic'] = df['total_with_shipping'] * EXCHANGE_RATE_NGN  # Convert to Naira (rate defined in section 1.3)

# Step 2: Nigerian price categorization using pd.cut()
df['price_category_ngn'] = pd.cut(
    df['total_ngn_strategic'],
    bins=[0, 25000, 100000, float('inf')],
    labels=['Budget', 'Premium', 'Luxury'],
    include_lowest=True
)

# Step 3: Shipping efficiency calculation
df['shipping_efficiency'] = (df['freight_value'] / df['price'] * 100).round(1)

# Step 4: Business recommendation function
def get_business_recommendation(row):
    """Strategic business recommendations based on multiple factors"""
    total_ngn = row['total_ngn_strategic']
    shipping_eff = row['shipping_efficiency'] if pd.notna(row['shipping_efficiency']) else 50
    category = row['product_category_name_english']
    
    # High value with good shipping efficiency
    if total_ngn > 100000 and shipping_eff < 15:
        return 'Promote Heavily 🚀'
    
    # Premium tech/beauty products
    elif (total_ngn > 50000 and shipping_eff < 20 and 
          category in ['computers_accessories', 'health_beauty', 'electronics']):
        return 'Strategic Focus 🎯'
    
    # Budget items with good shipping - bundle opportunities
    elif total_ngn < 25000 and shipping_eff < 25:
        return 'Bundle Opportunity 📦'
    
    # Poor shipping efficiency needs review
    elif shipping_eff > 35:
        return 'Review Logistics ⚠️'
    
    # High value but poor efficiency
    elif total_ngn > 100000 and shipping_eff > 25:
        return 'Optimize Shipping 🔧'
    
    else:
        return 'Standard Treatment ✅'

# Apply business recommendation
df['business_recommendation'] = df.apply(get_business_recommendation, axis=1)

# Step 5: Strategic ranking
# Create priority score for sorting
priority_order = {
    'Promote Heavily 🚀': 1,
    'Strategic Focus 🎯': 2, 
    'Optimize Shipping 🔧': 3,
    'Bundle Opportunity 📦': 4,
    'Review Logistics ⚠️': 5,
    'Standard Treatment ✅': 6
}

df['priority_score'] = df['business_recommendation'].map(priority_order)

# Strategic ranking with multiple criteria
strategic_ranking = df.sort_values([
    'priority_score',           # Business recommendation priority
    'total_ngn_strategic',      # Higher value first within same priority
    'shipping_efficiency'       # Better shipping efficiency first
], ascending=[True, False, True])

print("🎯 Strategic Product Ranking for Nigerian Market:")
display_cols = [
    'product_category_name_english', 'total_ngn_strategic', 'price_category_ngn',
    'shipping_efficiency', 'business_recommendation'
]
strategic_top20 = strategic_ranking[display_cols].head(20)
print(strategic_top20)

print("\n📊 Business Recommendation Summary:")
recommendation_summary = df['business_recommendation'].value_counts()
print(recommendation_summary)
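
An alternative to a separate numeric `priority_score` column is an ordered categorical, which lets `sort_values()` follow a custom label order directly. This is a sketch with shortened, hypothetical labels rather than the full recommendation strings used above.

```python
import pandas as pd

# Hypothetical mini-table with shortened labels
recs = pd.DataFrame({
    'recommendation': ['Standard', 'Promote', 'Review', 'Promote'],
    'value_ngn': [20000, 150000, 30000, 180000],
})

# List the categories from most to least important
priority = ['Promote', 'Review', 'Standard']
recs['recommendation'] = pd.Categorical(
    recs['recommendation'], categories=priority, ordered=True
)

# Sorting now follows the category order, not alphabetical order
ranked = recs.sort_values(['recommendation', 'value_ngn'], ascending=[True, False])
print(ranked)
```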

## 📈 Data Visualization: Business Insights Dashboard

In [None]:
# 📈 Create business visualizations

plt.figure(figsize=(15, 12))

# Subplot 1: Price distribution by category
plt.subplot(2, 3, 1)
df.boxplot(column='total_ngn_strategic', by='price_category_ngn', ax=plt.gca())
plt.suptitle('')  # Remove the automatic "Boxplot grouped by ..." figure title that pandas adds
plt.title('Order Value Distribution by Price Category')
plt.xlabel('Price Category')
plt.ylabel('Total Value (NGN)')
plt.yscale('log')

# Subplot 2: Shipping efficiency distribution
plt.subplot(2, 3, 2)
df[df['shipping_efficiency'] < 100]['shipping_efficiency'].hist(bins=20, alpha=0.7, color='skyblue')
plt.title('Shipping Efficiency Distribution')
plt.xlabel('Freight as % of Price')
plt.ylabel('Number of Orders')

# Subplot 3: Business recommendations
plt.subplot(2, 3, 3)
recommendation_counts = df['business_recommendation'].value_counts()
plt.pie(recommendation_counts.values, labels=recommendation_counts.index, autopct='%1.1f%%')
plt.title('Business Recommendations Distribution')

# Subplot 4: Category performance
plt.subplot(2, 3, 4)
category_avg = df.groupby('product_category_name_english')['total_ngn_strategic'].mean().sort_values(ascending=False).head(8)
plt.bar(range(len(category_avg)), category_avg.values, color='lightcoral')
plt.title('Average Order Value by Category')
plt.xlabel('Product Category')
plt.ylabel('Average Order Value (NGN)')
plt.xticks(range(len(category_avg)), [cat.replace('_', '\n') for cat in category_avg.index], rotation=45, ha='right')

# Subplot 5: Value density vs shipping efficiency scatter
plt.subplot(2, 3, 5)
scatter_data = df[(df['value_density'].notna()) & (df['shipping_efficiency'] < 50)]
plt.scatter(scatter_data['shipping_efficiency'], scatter_data['value_density'], 
           alpha=0.6, c=scatter_data['total_ngn_strategic'], cmap='viridis')
plt.xlabel('Shipping Efficiency (%)')
plt.ylabel('Value Density (BRL/cm³)')
plt.title('Value Density vs Shipping Efficiency')
plt.colorbar(label='Total Value (NGN)')

# Subplot 6: Weight category distribution
plt.subplot(2, 3, 6)
weight_counts = df['weight_category'].value_counts()
plt.bar(weight_counts.index, weight_counts.values, color='lightgreen')
plt.title('Product Weight Category Distribution')
plt.xlabel('Weight Category')
plt.ylabel('Number of Products')
plt.xticks(rotation=45)

plt.tight_layout()
plt.show()

print("📈 Business Dashboard Visualizations Created!")

## 🎯 Key Takeaways & Business Insights

### What We've Learned:
1. **Calculated Columns**: Created business-relevant metrics from raw e-commerce data
2. **Strategic Sorting**: Identified top performers and optimization opportunities
3. **Conditional Logic**: Built sophisticated categorization systems for business decisions
4. **String Operations**: Cleaned and formatted data for professional presentation

### Business Insights Discovered:
- **High-Value Opportunities**: Products worth promoting based on value and efficiency
- **Logistics Optimization**: Items with poor shipping efficiency need review
- **Market Segmentation**: Clear price tiers for Nigerian market positioning
- **Operational Priorities**: Data-driven fulfillment and customer service priorities

### Tomorrow's SQL Connection:
These same business calculations will be implemented in SQL using:
- Calculated DataFrame columns → calculated fields in the SELECT list  
- `sort_values()` → ORDER BY clauses
- `np.where()` and custom categorization functions → CASE expressions
- Boolean filtering (`df[condition]`) → WHERE and HAVING clauses

The business logic remains identical - only the syntax changes!
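
As a preview, the sketch below runs the same calculation, categorization, and sort in pandas and in SQL. It assumes SQLite through Python's built-in `sqlite3` module purely for illustration, since the SQL dialect used in class may differ, and the three-row `orders` table is made up.

```python
import sqlite3
import pandas as pd

# Hypothetical mini-table for illustration
orders = pd.DataFrame({
    'order_id': ['a', 'b', 'c'],
    'price': [250.0, 80.0, 120.0],
    'freight_value': [20.0, 10.0, 30.0],
})

# pandas version: calculated column, conditional label, sort
orders_pd = orders.assign(
    total_order_value=orders['price'] + orders['freight_value'],
    value_tier=orders['price'].gt(200).map({True: 'High Value', False: 'Standard Value'}),
).sort_values('total_order_value', ascending=False)

# SQL version, run against an in-memory SQLite database
conn = sqlite3.connect(':memory:')
orders.to_sql('orders', conn, index=False)
query = """
    SELECT order_id,
           price + freight_value AS total_order_value,
           CASE WHEN price > 200 THEN 'High Value' ELSE 'Standard Value' END AS value_tier
    FROM orders
    ORDER BY total_order_value DESC
"""
orders_sql = pd.read_sql_query(query, conn)
conn.close()

print(orders_sql)
```

Both versions return the orders in the same sequence with the same tier labels.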

## 📝 Practice Assignment Preview

**Assignment**: Seller Performance Analysis for Olist Nigeria

**Your Task**: Analyze seller performance metrics to identify top performers and growth opportunities:

1. **Calculate seller KPIs**: Average order value, shipping efficiency, product diversity
2. **Create performance tiers**: Platinum, Gold, Silver, Bronze seller classifications
3. **Identify growth opportunities**: Sellers with high potential but current challenges
4. **Generate business recommendations**: Actionable insights for seller relationship management

**Deliverables**:
- Jupyter notebook with complete analysis
- Business dashboard visualizations
- Executive summary of findings and recommendations

**Due**: Before next Thursday's SQL class, so that it stays in sync with the SQL assignment

## 📚 Additional Resources

### Python Documentation:
- [Pandas User Guide](https://pandas.pydata.org/docs/user_guide/)
- [NumPy Documentation](https://numpy.org/doc/stable/)
- [pandas Cookbook](https://pandas.pydata.org/docs/user_guide/cookbook.html)

### Business Context:
- Nigerian E-commerce Market Analysis
- E-commerce KPIs and Metrics Guide
- Data-Driven Decision Making in Retail

### Next Session Preview:
- **Thursday**: Same business logic implemented in SQL
- **Week 4**: Aggregations and summary statistics (GROUP BY operations)
- **Week 5**: Date/time operations for temporal analysis

---

**🎉 Congratulations! You've completed Week 3 Python content. You now have the skills to create calculated fields and sort data like a professional e-commerce analyst!**