# Assignment 3: Building Time Series Forecasts

**Student Name:** [Your Name Here]

**Date:** [Date]

---

## Assignment Overview

In this assignment, you'll analyze temporal data from Corporación Favorita stores to identify trends, seasonality, and anomalies, then build forecasting models using decomposition techniques. You'll work with real retail sales data to predict future sales patterns.

---

## Step 1: Import Libraries and Load Data

Start by installing the libraries this assignment needs.

In [None]:
%pip install pandas numpy matplotlib seaborn scikit-learn statsmodels

Next, import the required libraries.

In [2]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta

# For time series analysis
from statsmodels.tsa.seasonal import seasonal_decompose
from sklearn.metrics import mean_squared_error
import warnings
warnings.filterwarnings('ignore')

# Set random seed for reproducibility
np.random.seed(42)

# Set plotting style
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)

print("✓ Libraries imported successfully")

✓ Libraries imported successfully


Finally, load the training, store, and holiday CSV data from the `data` directory using `pd.read_csv()`. Display basic information about the training data and print out the first few rows to get an understanding of what the training data looks like.

In [None]:
# Load the datasets
# TODO: Load all three required CSV files
train_df = None  # Replace with pd.read_csv('data/train.csv')
holidays_df = None  # Replace with pd.read_csv('data/holidays_events.csv')
stores_df = None  # Replace with pd.read_csv('data/stores.csv')

# Display basic information
if train_df is not None:
    print(f"Training data shape: {train_df.shape}")
    print(f"Date range: {train_df['date'].min()} to {train_df['date'].max()}")
    print(f"\nFirst few rows:")
    # TODO: Display the first few rows

print("\n" + "="*80)
print("CHECKPOINT: Verify datasets loaded correctly")
print(f"Train data shape: {train_df.shape if train_df is not None else 'Not loaded'}")
print(f"Holidays data shape: {holidays_df.shape if holidays_df is not None else 'Not loaded'}")
print(f"Stores data shape: {stores_df.shape if stores_df is not None else 'Not loaded'}")
print("="*80)

### Explore Available Stores and Product Families
Display store information from `stores_df` to help you choose a store. Consider store type, cluster, and city.

In [None]:
# Explore the data to help choose your store and products
if train_df is not None:
    print("Available stores:")
    print(f"Total number of stores: {train_df['store_nbr'].nunique()}")
    
    print("\nAvailable product families:")
    families = train_df['family'].value_counts().head(20)
    print(families)
    
    # TODO: Display store information from stores_df to help choose a store
    # Consider looking at store type, cluster, and city

---
## Step 2: Select and Prepare Your Time Series Data

### Select Your Store and Product Families

Choose one store and two contrasting product families. Good contrasts might be:
- PRODUCE vs BEVERAGES (perishable vs non-perishable)
- BREAD/BAKERY vs AUTOMOTIVE (daily necessity vs occasional purchase)
- GROCERY I vs CELEBRATION (staples vs seasonal)

In [None]:
# TODO: Select your store and product families
selected_store = None  # Replace with your chosen store number (e.g., 1)
product_family_1 = None  # Replace with first product family (e.g., 'PRODUCE')
product_family_2 = None  # Replace with second product family (e.g., 'BEVERAGES')

print(f"Selected Store: {selected_store}")
print(f"Product Family 1: {product_family_1}")
print(f"Product Family 2: {product_family_2}")

### Filter Data
Filter `train_df` to your selected store and product families. Restrict the data to 2016-01-01 through 2017-08-15 so that both series cover the same period.

In [None]:
# TODO: Filter train_df for selected store and date range 2016-01-01 to 2017-08-15

# Convert date column to datetime if needed
if train_df is not None:
    train_df['date'] = pd.to_datetime(train_df['date'])

# Filter for date range
start_date = '2016-01-01'
end_date = '2017-08-15'

# TODO: Create filtered datasets for each product family
product1_data = None  # Filter for store, product_family_1, and date range
product2_data = None  # Filter for store, product_family_2, and date range

print("\n" + "="*80)
print("CHECKPOINT: Data filtered successfully")
print(f"Product 1 data shape: {product1_data.shape if product1_data is not None else 'Not filtered'}")
print(f"Product 2 data shape: {product2_data.shape if product2_data is not None else 'Not filtered'}")
print("="*80)
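
As a minimal sketch of the filtering step, the example below builds a boolean mask on a tiny synthetic table. The names `toy_train` and `filter_sales` are hypothetical and not part of the assignment template; the column names `store_nbr`, `family`, `date`, and `sales` follow the cells above.

```python
import pandas as pd

# Synthetic stand-in for train_df (assumption: the real file has the same columns).
toy_train = pd.DataFrame({
    "date": pd.to_datetime(
        ["2015-12-31", "2016-01-01", "2016-01-02", "2016-01-02", "2017-08-16"]
    ),
    "store_nbr": [1, 1, 1, 2, 1],
    "family": ["PRODUCE", "PRODUCE", "BEVERAGES", "PRODUCE", "PRODUCE"],
    "sales": [10.0, 12.0, 30.0, 8.0, 11.0],
})


def filter_sales(df, store, family, start, end):
    """Return rows for one store and one family with start <= date <= end."""
    mask = (
        (df["store_nbr"] == store)
        & (df["family"] == family)
        & df["date"].between(start, end)  # both bounds are inclusive by default
    )
    return df.loc[mask].copy()


toy_produce = filter_sales(toy_train, 1, "PRODUCE", "2016-01-01", "2017-08-15")
print(toy_produce)
```

Returning a copy avoids pandas chained-assignment warnings if you later add columns to the filtered frame.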

### Aggregate Daily Sales
Aggregate daily sales and handle missing dates.

In [None]:

# TODO: Group by date and sum sales for each product family
# TODO: Create a complete date range and fill missing dates with 0 sales

# Example structure (replace with your implementation):
# date_range = pd.date_range(start=start_date, end=end_date, freq='D')
# product1_ts = product1_data.groupby('date')['sales'].sum().reindex(date_range, fill_value=0)

product1_ts = None  # Replace with time series for product 1
product2_ts = None  # Replace with time series for product 2
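
The sketch below shows what the aggregate-and-reindex pattern does on a small synthetic example. The names `toy_rows`, `toy_range`, and `toy_ts` are hypothetical. Filling gaps with 0 assumes that a missing date means no sales were recorded that day, which is worth checking against your own data.

```python
import pandas as pd

# Two rows share 2016-01-01, and 2016-01-02 and 2016-01-04 are missing entirely.
toy_rows = pd.DataFrame({
    "date": pd.to_datetime(["2016-01-01", "2016-01-01", "2016-01-03"]),
    "sales": [5.0, 7.0, 4.0],
})

# Complete daily calendar for the period of interest.
toy_range = pd.date_range(start="2016-01-01", end="2016-01-04", freq="D")

# Sum sales per date, then align to the full calendar, filling gaps with 0.
toy_ts = (
    toy_rows.groupby("date")["sales"]
    .sum()
    .reindex(toy_range, fill_value=0)
)
print(toy_ts)
```

After reindexing, the series has one value per calendar day, which is what `rolling` and `seasonal_decompose` expect when you reason in terms of 7-day windows.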

### Plot Raw Time Series
Use Matplotlib to plot both time series and inspect the raw patterns.

In [None]:
# Visualize both time series
# TODO: Create a figure with 2 subplots showing both time series
fig, axes = plt.subplots(2, 1, figsize=(14, 8))

# Plot Product 1
# TODO: Plot product1_ts on axes[0]

# Plot Product 2
# TODO: Plot product2_ts on axes[1]

plt.tight_layout()
plt.show()

print("Observation: What patterns do you see in the raw data?")

### Document Your Choice (2-3 sentences)
Explain why you chose these specific products. 
- What contrasts do they represent? 
- Why will they be interesting to compare?

[ REPLACE WITH YOUR JUSTIFICATION ]

---
## Step 3: Identify Trends Using Moving Averages

### Calculate Moving Averages
Calculate the 7-day and 30-day moving averages for both products.

In [None]:
# TODO: Calculate 7-day and 30-day moving averages for both products

# For Product 1
product1_ma7 = None  # Replace with product1_ts.rolling(window=7).mean()
product1_ma30 = None  # Replace with product1_ts.rolling(window=30).mean()

# For Product 2
product2_ma7 = None  # Replace with product2_ts.rolling(window=7).mean()
product2_ma30 = None  # Replace with product2_ts.rolling(window=30).mean()
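
One detail that is easy to miss: `rolling(window=7).mean()` returns `NaN` until a full window is available. The sketch below illustrates this on a synthetic series (`toy_ts` is a hypothetical name) and shows `min_periods=1` as one possible way to get values from the first day, at the cost of averaging over fewer points early on.

```python
import pandas as pd

# Eight days of steadily increasing synthetic sales.
toy_ts = pd.Series(
    [10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0, 24.0],
    index=pd.date_range("2016-01-01", periods=8, freq="D"),
)

# Trailing 7-day mean: the first 6 entries are NaN.
toy_ma7 = toy_ts.rolling(window=7).mean()

# Same window, but partial windows are allowed at the start.
toy_ma7_partial = toy_ts.rolling(window=7, min_periods=1).mean()

print(pd.DataFrame({"sales": toy_ts, "ma7": toy_ma7, "ma7_partial": toy_ma7_partial}))
```

Matplotlib skips `NaN` values when drawing lines, so the default behaviour is usually fine for the plots in this step.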

### Plot Original Sales
Using Matplotlib, plot the original sales with both moving averages (7-day and 30-day) overlaid.

In [None]:
# Plot original sales with moving averages
fig, axes = plt.subplots(2, 1, figsize=(14, 8))

# Product 1
# TODO: Plot original, 7-day MA, and 30-day MA for product 1
# axes[0].plot(product1_ts.index, product1_ts.values, alpha=0.4, label='Daily Sales')
# axes[0].plot(product1_ma7.index, product1_ma7.values, label='7-Day MA')
# axes[0].plot(product1_ma30.index, product1_ma30.values, label='30-Day MA')

# Product 2
# TODO: Plot original, 7-day MA, and 30-day MA for product 2

plt.tight_layout()
plt.show()

### Identify and Explain Trend Changes

Note the dates where the moving averages change direction, then merge the data with `holidays_events.csv` to see which holidays or events might explain those changes.

In [None]:
# Merge with holidays to explain trend changes
# TODO: Convert holidays_df date to datetime and filter for your date range
if holidays_df is not None:
    holidays_df['date'] = pd.to_datetime(holidays_df['date'])
    relevant_holidays = None  # Filter holidays_df for your date range
    
    # TODO: Display holidays that might explain trend changes
    print("Key holidays/events in the period:")
    # Display relevant holidays

### Document Trend Analysis
For each product family, document:
1. Overall trend direction (growing, declining, stable)
2. Any trend changes that correlate with holidays or events
3. Business implications of the trends you discovered

Update the markdown cell below with your analysis.

**Product 1 Trends:**
- Overall trend direction: [ Growing/Declining/Stable? ]
- Key trend changes: [ List at least 3 significant changes and dates ]
- Holiday correlations: [ Which holidays affected sales? ]
- Business implications: [ What do these trends mean for inventory? ]

**Product 2 Trends:**
- Overall trend direction: [ Growing/Declining/Stable? ]
- Key trend changes: [ List at least 3 significant changes and dates ]
- Holiday correlations: [ Which holidays affected sales? ]
- Business implications: [ What do these trends mean for inventory? ]

---
## Step 4: Detect and Visualize Seasonal Patterns

Analyze the seasonal components of your sales data.

### Day-of-Week Analysis

Add day of week to your data and calculate the average sales by day. Create a bar plot to visualize the weekday patterns.

In [None]:
# Analyze day-of-week patterns
# TODO: Add day of week to your data and calculate average sales by day

# For Product 1
product1_dow = None  # Create DataFrame with date and sales
# Add day of week: product1_dow['day_of_week'] = product1_dow.index.day_name()
# Group by day of week and calculate mean sales

# For Product 2
product2_dow = None  # Similar for product 2

# Create bar plot comparing weekday patterns
# TODO: Create side-by-side bar plot showing average sales by day of week
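
A minimal sketch of the day-of-week aggregation on synthetic data follows. The names `toy_ts`, `toy_dow`, `toy_dow_avg`, and `day_order` are hypothetical. The explicit reindex is there because `groupby` sorts day names alphabetically, which would scramble the weekday order in a bar plot.

```python
import pandas as pd

# Two full weeks of synthetic sales starting on a Monday (2016-01-04).
toy_ts = pd.Series(
    range(14),
    index=pd.date_range("2016-01-04", periods=14, freq="D"),
    dtype=float,
)

day_order = [
    "Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday", "Sunday",
]

# Turn the series into a frame and label each row with its weekday name.
toy_dow = toy_ts.to_frame(name="sales")
toy_dow["day_of_week"] = toy_dow.index.day_name()

# Average per weekday, then restore calendar order.
toy_dow_avg = toy_dow.groupby("day_of_week")["sales"].mean().reindex(day_order)
print(toy_dow_avg)
```

The resulting series can be passed directly to `toy_dow_avg.plot(kind="bar")` or placed next to a second product's averages in a DataFrame for a side-by-side plot.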

### Monthly Seasonality Analysis

Calculate the average sales by month for both products. Once calculated, create a line plot showing monthly patterns for both products.

In [None]:
# Analyze monthly patterns
# TODO: Calculate average sales by month for both products

# For Product 1
product1_monthly = None  # Group by month and calculate mean sales

# For Product 2  
product2_monthly = None  # Group by month and calculate mean sales

# Create visualization
# TODO: Create line plot showing monthly patterns for both products

### Holiday Impact Analysis

Compare average sales on holidays with average sales on regular days.

In [None]:
# Analyze holiday vs non-holiday sales
# TODO: Compare average sales on holidays vs regular days

# Create a list of holiday dates
holiday_dates = None  # Extract unique dates from holidays_df

# Calculate average sales on holidays vs non-holidays for both products
# TODO: Split data into holiday and non-holiday sales and compare
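
One way to split a date-indexed series into holiday and non-holiday days is a boolean mask built with `isin`, sketched below on synthetic data. All names (`toy_ts`, `toy_holidays`, `toy_holiday_dates`) are hypothetical, and the sketch treats every row in the holiday table as a holiday, which may be too coarse for the real file.

```python
import pandas as pd

# Four days of synthetic sales.
toy_ts = pd.Series(
    [10.0, 20.0, 10.0, 30.0],
    index=pd.date_range("2016-12-23", periods=4, freq="D"),
)

# Synthetic holiday table; the same date can appear more than once.
toy_holidays = pd.DataFrame(
    {"date": pd.to_datetime(["2016-12-24", "2016-12-24", "2016-12-26"])}
)

# Unique holiday dates as a DatetimeIndex.
toy_holiday_dates = pd.DatetimeIndex(toy_holidays["date"].unique())

# True where the sales date is a holiday.
is_holiday = toy_ts.index.isin(toy_holiday_dates)

holiday_avg = toy_ts[is_holiday].mean()
regular_avg = toy_ts[~is_holiday].mean()
print(f"Holiday average: {holiday_avg:.1f}, regular average: {regular_avg:.1f}")
```

With only a handful of holiday days in the sample, the holiday average can be noisy, so it helps to report the number of days in each group alongside the means.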

### Seasonal Pattern Findings

**Document your findings:**
- Which days of the week have highest/lowest sales?
- Are there monthly patterns (e.g., payday effects)?
- How do holidays affect each product differently?
- What business decisions could these patterns inform?

Update the markdown cell below with your analysis.

[ Write your analysis here ]

---
## Step 5: Decompose Time Series and Build Forecasts

### Time Series Decomposition

In [None]:
# Perform seasonal decomposition
# TODO: Use seasonal_decompose to separate trend, seasonal, and residual components

# For Product 1
decomposition1 = None  # seasonal_decompose(product1_ts, model='additive', period=7)

# For Product 2
decomposition2 = None  # seasonal_decompose(product2_ts, model='additive', period=7)

In [None]:
# Visualize decomposition for Product 1
if decomposition1 is not None:
    fig, axes = plt.subplots(4, 1, figsize=(14, 10))
    
    # TODO: Plot each component
    # decomposition1.observed.plot(ax=axes[0], title=f'{product_family_1} - Original')
    # decomposition1.trend.plot(ax=axes[1], title='Trend')
    # decomposition1.seasonal.plot(ax=axes[2], title='Seasonal')
    # decomposition1.resid.plot(ax=axes[3], title='Residual')
    
    plt.tight_layout()
    plt.show()

In [None]:
# Visualize decomposition for Product 2
# TODO: Similar visualization for Product 2

### Build Forecasts Using Decomposition

In [None]:
# Split data for validation
# Use last 30 days as test set
test_days = 30

# For Product 1
train1 = None  # product1_ts[:-test_days]
test1 = None  # product1_ts[-test_days:]

# For Product 2
train2 = None  # product2_ts[:-test_days]
test2 = None  # product2_ts[-test_days:]

In [None]:
# Create baseline forecast (simple method: repeat the mean of the last 30 training days)
# TODO: Calculate baseline forecasts

# For Product 1
baseline_forecast1 = None  # np.repeat(train1[-30:].mean(), test_days)

# For Product 2
baseline_forecast2 = None  # np.repeat(train2[-30:].mean(), test_days)

In [None]:
# Create decomposition-based forecast
# TODO: Use the decomposition components to build a forecast
# Simple approach: extend trend + repeat seasonal pattern

# For Product 1
# 1. Extend trend (use last trend value or simple linear extension)
# 2. Repeat seasonal pattern for next 30 days
# 3. Combine trend + seasonal
decomp_forecast1 = None  # Your decomposition-based forecast

# For Product 2
decomp_forecast2 = None  # Your decomposition-based forecast
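
The three steps in the comments above can be sketched as follows. `forecast_from_components` is a hypothetical helper, not part of statsmodels. It assumes `trend` and `seasonal` are date-indexed Series like the `.trend` and `.seasonal` attributes of a `seasonal_decompose` result, and that the decomposition was fitted on the training split only, so that the test period does not leak into the forecast. The toy inputs are built by hand to stay self-contained.

```python
import numpy as np
import pandas as pd


def forecast_from_components(trend, seasonal, horizon, period=7):
    """Hold the last valid trend value flat and repeat the last seasonal cycle."""
    # The trend from a centered moving average has NaN at both ends.
    last_trend = trend.dropna().iloc[-1]

    # The day after the series ends has the same phase as `period` days earlier,
    # so tiling the final cycle keeps the pattern aligned.
    last_cycle = seasonal.iloc[-period:].to_numpy()
    repeats = int(np.ceil(horizon / period))
    future_seasonal = np.tile(last_cycle, repeats)[:horizon]

    future_index = pd.date_range(
        trend.index[-1] + pd.Timedelta(days=1), periods=horizon, freq="D"
    )
    return pd.Series(last_trend + future_seasonal, index=future_index)


# Hand-built components: flat trend of 100 and a fixed weekly pattern.
idx = pd.date_range("2016-01-04", periods=21, freq="D")
toy_seasonal = pd.Series(
    np.tile([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0], 3), index=idx
)
toy_trend = pd.Series(100.0, index=idx)
toy_trend.iloc[:3] = np.nan   # mimic the NaN edges of a centered 7-day average
toy_trend.iloc[-3:] = np.nan

toy_forecast = forecast_from_components(toy_trend, toy_seasonal, horizon=10)
print(toy_forecast)
```

Holding the trend flat is the simplest choice. A linear extension fitted to the last few weeks of the trend is a reasonable alternative if the series is clearly growing or declining.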

### Calculate Forecast Accuracy

In [None]:
# Calculate RMSE for both methods
# TODO: Calculate RMSE for baseline and decomposition forecasts

# Product 1
baseline_rmse1 = None  # np.sqrt(mean_squared_error(test1, baseline_forecast1))
decomp_rmse1 = None  # np.sqrt(mean_squared_error(test1, decomp_forecast1))

# Product 2
baseline_rmse2 = None  # np.sqrt(mean_squared_error(test2, baseline_forecast2))
decomp_rmse2 = None  # np.sqrt(mean_squared_error(test2, decomp_forecast2))
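
For reference, RMSE is the square root of the mean squared difference between actual and predicted values. The NumPy-only sketch below computes the same quantity as `np.sqrt(mean_squared_error(...))`; the helper `rmse` and the toy arrays are hypothetical.

```python
import numpy as np


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))


toy_actual = [10.0, 12.0, 14.0]
toy_baseline = [12.0, 12.0, 12.0]  # flat forecast at the mean

# Errors are -2, 0, 2, so the mean squared error is 8/3.
toy_rmse = rmse(toy_actual, toy_baseline)
print(f"RMSE: {toy_rmse:.3f}")
```

RMSE is in the same units as sales, so values for two product families with very different sales volumes are not directly comparable.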

In [None]:
# Create comparison table
comparison_data = {
    'Product': [product_family_1, product_family_1, product_family_2, product_family_2],
    'Method': ['Baseline', 'Decomposition', 'Baseline', 'Decomposition'],
    'RMSE': [baseline_rmse1, decomp_rmse1, baseline_rmse2, decomp_rmse2]
}

comparison_df = pd.DataFrame(comparison_data)

# TODO: Calculate percentage improvement
# Add improvement column to comparison_df

print("\n" + "="*80)
print("FORECAST PERFORMANCE COMPARISON")
print("="*80)
# TODO: Display comparison table
print("="*80)
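
One possible way to add the improvement column is to look up each product's baseline RMSE and express every row relative to it, as sketched below. The table `toy_comparison`, its RMSE values, and the column names `Baseline RMSE` and `Improvement (%)` are made up for illustration.

```python
import pandas as pd

# Synthetic comparison table with the same layout as comparison_df.
toy_comparison = pd.DataFrame({
    "Product": ["A", "A", "B", "B"],
    "Method": ["Baseline", "Decomposition", "Baseline", "Decomposition"],
    "RMSE": [100.0, 80.0, 50.0, 55.0],
})

# Baseline RMSE per product, indexed by product name.
baseline_by_product = (
    toy_comparison.loc[toy_comparison["Method"] == "Baseline"]
    .set_index("Product")["RMSE"]
)

# Attach the matching baseline to every row, then compute the relative change.
toy_comparison["Baseline RMSE"] = toy_comparison["Product"].map(baseline_by_product)
toy_comparison["Improvement (%)"] = (
    (toy_comparison["Baseline RMSE"] - toy_comparison["RMSE"])
    / toy_comparison["Baseline RMSE"]
    * 100
)
print(toy_comparison.to_string(index=False))
```

A positive value means the method has a lower RMSE than the baseline; a negative value, as for product B in this toy table, means the decomposition forecast did worse.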

### Visualize Forecasts

In [None]:
# Plot actual vs forecasted values
# TODO: Create visualization showing:
# - Historical data (train)
# - Actual test data
# - Baseline forecast
# - Decomposition forecast

fig, axes = plt.subplots(2, 1, figsize=(14, 8))

# Product 1 forecast visualization
# TODO: Plot on axes[0]

# Product 2 forecast visualization  
# TODO: Plot on axes[1]

plt.tight_layout()
plt.show()

---
## Step 6: Generate Business Recommendations

### Executive Summary (200-300 words)

Based on your analysis, write a brief executive summary that includes:
- **Key Patterns Discovered:** Summarize the main trends and seasonal patterns for each product
- **Inventory Planning Recommendations:** Specific recommendations based on your findings
- **High-Risk Periods:** Identify periods requiring special attention
- **Predictability Analysis:** Which product is more predictable and why?
- **Specific Action Item:** One concrete action the store manager should take based on your forecast

[ Write your 200 to 300 word executive summary here ]

---
## Step 7: Submit Your Work

Before submitting:
1. Ensure all code cells run without errors
2. Verify all visualizations display correctly
3. Check that your analysis sections are complete
4. Review your executive summary

Push to GitHub:
```bash
git add .
git commit -m 'completed time series forecasting assignment'
git push
```

Submit your GitHub repository link on the course platform.