Q2 - Time Travel with Multi-Index

Question: Welcome to Time Travel with Multi-Index!
You are given a multi-index time series dataset of sales data for different regions and product categories.

Your task is to use time series analysis and MultiIndex manipulation to answer the following questions:
- What are the total sales for each region and product category over time?
- Calculate the moving average of sales for each region and product category.
- Identify the region with the highest sales growth rate.
- Determine the top-selling product category for each region.
- Find the month with the highest overall sales.

Dataset:

sales_data: a DataFrame with a three-level MultiIndex (Region, Product_Category, Date) and a single Sales column holding monthly sales for 2023.
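To see how the three index levels behave, here is a minimal sketch on a hypothetical four-row frame (`toy`, `north` and `per_region` are illustrative names and the values are made up). `xs` takes a cross-section by level name, and `groupby(level=...)` aggregates over the levels that are not listed.

```python
import pandas as pd

# Hypothetical toy data with the same index layout as sales_data
idx = pd.MultiIndex.from_tuples(
    [
        ('North Pole', 'Gizmos', pd.Timestamp('2023-01-31')),
        ('North Pole', 'Gizmos', pd.Timestamp('2023-02-28')),
        ('North Pole', 'Widgets', pd.Timestamp('2023-01-31')),
        ('South Pole', 'Gizmos', pd.Timestamp('2023-01-31')),
    ],
    names=['Region', 'Product_Category', 'Date'],
)
toy = pd.DataFrame({'Sales': [100, 150, 200, 50]}, index=idx)

# Cross-section: every row for one region (the Region level is dropped from the result)
north = toy.xs('North Pole', level='Region')

# Aggregate over Product_Category and Date, keeping only the Region level
per_region = toy.groupby(level='Region')['Sales'].sum()

print(north)
print(per_region)
```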

In [None]:
import pandas as pd
import numpy as np

# Seed for reproducibility
np.random.seed(1)

# Month-end dates for 2023 ('ME' requires pandas >= 2.2; older versions use 'M')
date_range = pd.date_range(start='2023-01-01', end='2023-12-31', freq='ME')

# Regions and product categories
regions = ['North Pole', 'South Pole', 'East Egg', 'Westworld']
product_categories = ['Gizmos', 'Widgets', 'Doodads', 'Thingamajigs']

# Generate synthetic sales data
data = []
for region in regions:
    for category in product_categories:
        sales = np.random.randint(1000, 5000, size=len(date_range))
        for date, sale in zip(date_range, sales):
            data.append([region, category, date, sale])

# Create DataFrame
sales_data = pd.DataFrame(data, columns=['Region', 'Product_Category', 'Date', 'Sales'])

# Set multi-index
sales_data.set_index(['Region', 'Product_Category', 'Date'], inplace=True)

# Display the dataset
sales_data.head()
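The nested loops above build one row per (region, category, month) combination. A vectorized sketch of the same layout uses `pd.MultiIndex.from_product`. This is an alternative construction, not the notebook's method. `sales_data_vec` is a hypothetical name, and the random values are not guaranteed to match the loop version. `pd.offsets.MonthEnd()` is used in place of the `'ME'` alias so that the sketch also runs on pandas releases earlier than 2.2.

```python
import numpy as np
import pandas as pd

np.random.seed(1)

# Same 12 month-end dates as freq='ME', written with an offset object
date_range = pd.date_range(start='2023-01-01', end='2023-12-31', freq=pd.offsets.MonthEnd())
regions = ['North Pole', 'South Pole', 'East Egg', 'Westworld']
product_categories = ['Gizmos', 'Widgets', 'Doodads', 'Thingamajigs']

# Cartesian product of all three levels, ordered region -> category -> date
index = pd.MultiIndex.from_product(
    [regions, product_categories, date_range],
    names=['Region', 'Product_Category', 'Date'],
)

# One random sales figure per index row (randint's upper bound is exclusive)
sales_data_vec = pd.DataFrame(
    {'Sales': np.random.randint(1000, 5000, size=len(index))}, index=index
)
print(sales_data_vec.shape)
```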

In [None]:
# Calculate the total sales for each region and product category per month.
# sales_data is already month-end data, so this resample mainly re-bins each group by month.
total_sales_rp = sales_data.groupby(['Region', 'Product_Category']).resample('ME', level='Date').sum()
total_sales_rp
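The same per-month aggregation can also be written as a single `groupby` with `pd.Grouper`, which bins one index level by a frequency. A minimal sketch follows on hypothetical day-level data (`toy` and `monthly` are illustrative names). Two January sales collapse into one month-end row.

```python
import pandas as pd

# Hypothetical day-level toy data: two sales in January, one in February
idx = pd.MultiIndex.from_tuples(
    [
        ('North Pole', 'Gizmos', pd.Timestamp('2023-01-05')),
        ('North Pole', 'Gizmos', pd.Timestamp('2023-01-20')),
        ('North Pole', 'Gizmos', pd.Timestamp('2023-02-10')),
    ],
    names=['Region', 'Product_Category', 'Date'],
)
toy = pd.DataFrame({'Sales': [100, 50, 70]}, index=idx)

# Group on the two categorical levels and bin the Date level into month-end periods
monthly = toy.groupby(
    [
        pd.Grouper(level='Region'),
        pd.Grouper(level='Product_Category'),
        pd.Grouper(level='Date', freq=pd.offsets.MonthEnd()),
    ]
).sum()
print(monthly)
```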

In [None]:
# 3-month moving average per (Region, Product_Category); groupby().rolling() prepends
# the group keys to the index, so drop those duplicated leading levels afterwards
moving_average_sales = total_sales_rp.groupby(level=['Region', 'Product_Category']).rolling(window=3).mean().droplevel([0, 1])
moving_average_sales
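In pandas, `groupby(...).rolling(...)` returns the group keys followed by the full original index, so Region and Product_Category appear twice. The sketch below uses hypothetical single-group data (`monthly`, `rolled`, `n_levels_before` are illustrative names) to show the duplicated levels and how to remove them with `droplevel`. With the default `min_periods`, the first two windows are NaN.

```python
import pandas as pd

# Hypothetical monthly totals for a single (Region, Product_Category) group
idx = pd.MultiIndex.from_product(
    [['North Pole'], ['Gizmos'], pd.date_range('2023-01-31', periods=4, freq=pd.offsets.MonthEnd())],
    names=['Region', 'Product_Category', 'Date'],
)
monthly = pd.DataFrame({'Sales': [30, 60, 90, 120]}, index=idx)

rolled = monthly.groupby(level=['Region', 'Product_Category']).rolling(window=3).mean()
# Group keys + original three levels => Region and Product_Category are duplicated
n_levels_before = rolled.index.nlevels

# Drop the two leading (duplicated) levels to get back to (Region, Product_Category, Date)
rolled = rolled.droplevel([0, 1])
print(rolled)
```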

In [None]:
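# Identify the region with the highest sales growth rate, measured as the relative
# change in total regional sales from the first month to the last month
regional_monthly = total_sales_rp.groupby(level=['Region', 'Date'])['Sales'].sum().unstack('Region')
sales_growth = regional_monthly.iloc[-1] / regional_monthly.iloc[0] - 1
highest_growth_region = sales_growth.idxmax()
highest_growth_region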
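A growth rate only makes sense along the time axis of each region, so regional totals need to be arranged with months as rows and regions as columns before any comparison. The worked sketch below uses hypothetical numbers (`regional_monthly`, `period_growth`, `avg_mom_growth` are illustrative names). It contrasts two common definitions, whole-period growth (last / first - 1) and the average month-over-month change. The two can rank regions differently on real data.

```python
import pandas as pd

# Hypothetical regional monthly totals (rows: months, columns: regions)
regional_monthly = pd.DataFrame(
    {'North Pole': [100, 110, 150], 'South Pole': [200, 180, 220]},
    index=pd.date_range('2023-01-31', periods=3, freq=pd.offsets.MonthEnd()),
)

# Whole-period growth: (last month - first month) / first month
period_growth = regional_monthly.iloc[-1] / regional_monthly.iloc[0] - 1

# Alternative: mean of the month-over-month percentage changes (first row is NaN and skipped)
avg_mom_growth = regional_monthly.pct_change().mean()

print(period_growth)
print(period_growth.idxmax())
```

Here North Pole grows 50% over the period (100 to 150) while South Pole grows 10% (200 to 220).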

In [None]:
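# Determine the top-selling product category for each region (total sales over the year)
product_sales = total_sales_rp.groupby(level=['Region', 'Product_Category'])['Sales'].sum()
# Pivot categories into columns; idxmax(axis=1) returns the best-selling category name per region
top_selling_product = product_sales.unstack('Product_Category').idxmax(axis=1)
top_selling_product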
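On a MultiIndexed Series, a per-group `idxmax` returns the whole index tuple, not just the category. The sketch below compares that result with the `unstack` + `idxmax(axis=1)` pattern, which yields plain category names. It uses hypothetical totals; `product_sales`, `top_tuples` and `top_names` are illustrative names.

```python
import pandas as pd

# Hypothetical total sales per (Region, Product_Category)
product_sales = pd.Series(
    [500, 900, 800, 700],
    index=pd.MultiIndex.from_tuples(
        [
            ('East Egg', 'Doodads'),
            ('East Egg', 'Gizmos'),
            ('Westworld', 'Doodads'),
            ('Westworld', 'Gizmos'),
        ],
        names=['Region', 'Product_Category'],
    ),
    name='Sales',
)

# Option 1: idxmax per group returns the full (Region, Product_Category) tuple
top_tuples = product_sales.groupby(level='Region').idxmax()

# Option 2: pivot categories into columns, then take each row's column label of the max
top_names = product_sales.unstack('Product_Category').idxmax(axis=1)

print(top_tuples)
print(top_names)
```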

In [None]:
# Find the month with the highest overall sales
total_monthly_sales = total_sales_rp.groupby('Date').sum()
highest_sales_month = total_monthly_sales['Sales'].idxmax()
highest_sales_month.strftime('%Y-%m')
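The month lookup sums across every other level for each date and then takes `idxmax`, which returns the first label if several months tie. Below is a small sketch on hypothetical data (`monthly`, `overall`, `best_month` are illustrative names). It also shows `Timestamp.to_period('M')` as another way to get a year-month label.

```python
import pandas as pd

# Hypothetical monthly totals for two product lines
idx = pd.MultiIndex.from_product(
    [['Gizmos', 'Widgets'], pd.date_range('2023-01-31', periods=3, freq=pd.offsets.MonthEnd())],
    names=['Product_Category', 'Date'],
)
monthly = pd.DataFrame({'Sales': [10, 40, 20, 30, 15, 25]}, index=idx)

# Sum across categories for each month, then locate the month with the largest total
overall = monthly.groupby(level='Date')['Sales'].sum()
best_month = overall.idxmax()

print(best_month.strftime('%Y-%m'))
# A monthly Period gives the same year-month label without a day component
print(best_month.to_period('M'))
```

The monthly totals here are 40, 55 and 45, so February 2023 is selected.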