# 👩‍💻 Forecasting Retail Sales Using Facebook Prophet
## 📋 Overview
In this lab, you'll apply Facebook Prophet to forecast retail sales data, a crucial task for businesses optimizing inventory and marketing strategies. You'll prepare time series data, build and configure a Prophet model with appropriate seasonality settings, generate forecasts for future periods, and evaluate the model's performance. By the end, you'll have a complete sales forecasting pipeline that provides actionable business insights.
## 🎯 Learning Outcomes
By the end of this lab, you will be able to:

- Prepare time series data specifically for Prophet's required format
- Configure and train a Prophet model with appropriate seasonality settings
- Generate and visualize sales forecasts including confidence intervals
- Interpret forecast components (trend, seasonality, holidays) for business insights
- Evaluate forecast performance using appropriate metrics

## 🚀 Starting Point
Access the starter code by running the cells below. The lab uses a retail dataset of historical sales as the basis for forecasting.

Required tools/setup:

- Python 3.x
- pandas, numpy, matplotlib
- prophet (Facebook Prophet)
- scikit-learn (for evaluation metrics)

In [None]:
# Install plotly (optional dependency used by Prophet's interactive plots)
!pip install -qq plotly

# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from prophet import Prophet
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Load the dataset
data = pd.read_csv('stores_sales_forecasting.csv', encoding='latin-1')
print("Dataset preview:")
print(data.head())
print("\nDataset shape:", data.shape)

## Task 1: Prepare Time Series Data for Prophet
**Context:** In retail analytics, we need to transform raw sales data into a format suitable for time series forecasting. Prophet requires a specific data structure with date ('ds') and target ('y') columns.

**Steps:**

1. Convert 'Order Date' and 'Ship Date' columns to datetime using pandas' `to_datetime()` function.


2. Aggregate sales by date to create a daily time series:

    - Use `groupby()` on 'Order Date' and sum the 'Sales' column
    - Reset the index to convert the grouped data back to a DataFrame
    - Rename the columns to match Prophet's required format ('ds' for date, 'y' for target)


3. Examine the prepared time series data to ensure it's ready for modeling.

In [None]:
# Your code for converting dates to datetime
# ...

# Your code for aggregating sales by date
# ...

# Print the prepared data
# ...

**💡 Tip:** Keep your date column sorted in ascending order. `groupby()` sorts by date by default, and a chronological order makes plots and `head()`/`tail()` checks easier to read.

**⚙️ Test Your Work:**

- The resulting DataFrame should have exactly two columns named 'ds' and 'y'
- Print the first 5 rows to confirm the structure is correct
- Expected output: A DataFrame with dates in the 'ds' column and daily sales totals in the 'y' column
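
The reshaping steps above can be sketched on a small made-up frame. The three rows and their values are invented for illustration and are not taken from `stores_sales_forecasting.csv`; only the column names 'Order Date' and 'Sales' come from the lab. The names `toy` and `toy_daily` are hypothetical.

```python
import pandas as pd

# Hypothetical toy orders (values invented for illustration)
toy = pd.DataFrame({
    "Order Date": ["2017-01-03", "2017-01-01", "2017-01-01"],
    "Sales": [50.0, 20.0, 35.0],
})

# Parse the date strings into datetime64 values
toy["Order Date"] = pd.to_datetime(toy["Order Date"])

# Sum sales per order date, then rename to Prophet's 'ds' / 'y' convention
toy_daily = (
    toy.groupby("Order Date")["Sales"].sum()
       .reset_index()
       .rename(columns={"Order Date": "ds", "Sales": "y"})
)

print(toy_daily)
```

The two orders placed on 2017-01-01 collapse into a single row, and 2017-01-02 is simply absent because no order was placed that day. Aggregating this way does not fill in missing dates.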

## Task 2: Visualize the Time Series Data
**Context:** Before building a forecasting model, it's important to understand the patterns in your data visually to identify trends, seasonality, and potential outliers.

**Steps:**

1. Create a figure with appropriate dimensions using `plt.figure(figsize=(width, height))`.


2. Plot the time series data:

    - Use the 'ds' column for x-axis (dates)
    - Use the 'y' column for y-axis (sales)
    - Add appropriate title and axis labels


3. Display the plot to examine patterns in the sales data.

In [None]:
# Your code for creating the time series plot
# ...

# Show the plot
# ...

**💡 Tip:** Look for weekly patterns, yearly seasonality, and any unusual spikes or dips that might affect your forecast.

**⚙️ Test Your Work:**

- A line plot should appear showing sales over time
- The plot should have labeled axes and a title
- Observe if there are clear seasonal patterns or trends
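
A numeric check can complement the visual inspection for weekly patterns mentioned in the tip. The sketch below is one possible approach, not part of the lab's required steps: it averages sales by weekday on a synthetic series whose weekend bump is built in on purpose. The data and the names `toy_ts` and `weekday_mean` are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical four weeks of daily sales with a built-in weekend bump
dates = pd.date_range("2017-01-02", periods=28, freq="D")  # starts on a Monday
sales = np.full(28, 100.0)
sales[dates.dayofweek >= 5] = 150.0  # Saturday=5, Sunday=6
toy_ts = pd.DataFrame({"ds": dates, "y": sales})

# Average sales per weekday; a flat profile suggests little weekly seasonality
weekday_mean = toy_ts.groupby(toy_ts["ds"].dt.day_name())["y"].mean()
print(weekday_mean)
```

On real data the differences between weekdays will be noisier, so treat this as a rough sanity check alongside the line plot.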

## Task 3: Build and Fit the Prophet Model
**Context:** Prophet is designed to handle multiple seasonalities and holidays, making it ideal for retail sales forecasting where these factors significantly impact sales.

**Steps:**

1. Create a Prophet model with appropriate parameters:

    - Set `yearly_seasonality=True` to capture annual patterns
    - Set `weekly_seasonality=True` for weekly patterns
    - Set `daily_seasonality=False` unless you have intra-day data
    - Set `seasonality_mode='multiplicative'` as retail often has multiplicative seasonality


2. Add US holidays to the model using `add_country_holidays()` to account for holiday effects on sales.


3. Fit the model to your prepared time series data using the `fit()` method.

In [None]:
# Your code for creating the Prophet model
# ...

# Your code for adding holidays
# ...

# Your code for fitting the model
# ...

**💡 Tip:** Multiplicative seasonality is often more appropriate for retail data where seasonal fluctuations increase as the overall trend increases.

**⚙️ Test Your Work:**

- The model should fit without errors
- Check that the model parameters match your settings (for example, `model.seasonality_mode` should return `'multiplicative'`)
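
To see why the choice of seasonality mode matters, the sketch below builds two synthetic series with NumPy alone, without Prophet. The trend slope, the 20-unit swing, and the 20% swing are invented numbers, and `seasonal_swing` is a hypothetical helper written for this illustration.

```python
import numpy as np

t = np.arange(730)                       # two years of daily steps
trend = 100.0 + 0.5 * t                  # hypothetical rising baseline
season = np.sin(2 * np.pi * t / 365.0)   # unit-amplitude yearly cycle

additive = trend + 20.0 * season               # fixed swing of +/-20 units
multiplicative = trend * (1.0 + 0.2 * season)  # swing of +/-20% of the trend

def seasonal_swing(series, start, stop):
    """Peak-to-trough range of the detrended series between two indices."""
    detrended = series[start:stop] - trend[start:stop]
    return detrended.max() - detrended.min()

for name, series in [("additive", additive), ("multiplicative", multiplicative)]:
    year1 = seasonal_swing(series, 0, 365)
    year2 = seasonal_swing(series, 365, 730)
    print(f"{name}: year 1 swing = {year1:.1f}, year 2 swing = {year2:.1f}")
```

The additive series keeps the same swing in both years, while the multiplicative swing grows with the baseline. If your sales plot shows seasonal peaks getting taller as overall sales rise, that points toward `seasonality_mode='multiplicative'`.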

## Task 4: Generate Future Forecasts
**Context:** With a fitted model, businesses can now generate forecasts for future periods to inform inventory, staffing, and marketing decisions.

**Steps:**

1. Create a future dataframe for predictions:

    - Use `make_future_dataframe()` with an appropriate period parameter (e.g., 90 days)
    - This creates dates extending beyond your training data


2. Generate forecast values using the `predict()` method with the future dataframe.


3. Review the forecast results, focusing on the predicted values ('yhat') and the bounds of the uncertainty interval ('yhat_lower', 'yhat_upper'), which covers 80% by default.

In [None]:
# Your code for creating future dataframe
# ...

# Your code for generating the forecast
# ...

# Print the forecast for the next few days
# ...

**💡 Tip:** The 'periods' parameter should match the business planning horizon (e.g., 30 days for monthly planning, 90 days for quarterly).

**⚙️ Test Your Work:**

- Print the forecast columns ['ds', 'yhat', 'yhat_lower', 'yhat_upper'] for the last 5 days
- Verify that the forecast extends beyond your original data
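
The idea behind the future dataframe can be sketched with plain pandas. This is an approximation of what `make_future_dataframe()` returns with daily frequency and history included, not Prophet's actual implementation. The ten training dates and the names `history`, `new_dates`, and `future_sketch` are hypothetical.

```python
import pandas as pd

# Hypothetical training dates: ten consecutive days ending 2017-12-31
history = pd.DataFrame({"ds": pd.date_range("2017-12-22", periods=10, freq="D")})
periods = 90  # forecast horizon in days

# New dates start the day after the last training date
last_date = history["ds"].max()
new_dates = pd.date_range(start=last_date + pd.Timedelta(days=1),
                          periods=periods, freq="D")

# History plus horizon, analogous to the frame passed to predict()
future_sketch = pd.concat([history, pd.DataFrame({"ds": new_dates})],
                          ignore_index=True)

print("First date:", future_sketch["ds"].min().date())
print("Last date: ", future_sketch["ds"].max().date())
print("Rows:      ", len(future_sketch))
```

Comparing the last date of the resulting frame with the last date of the training data is a quick way to confirm that the forecast really extends past the history.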

## Task 5: Visualize and Interpret the Forecast
**Context:** Visual representation of forecasts helps stakeholders understand future sales predictions and the factors influencing them.

**Steps:**

1. Plot the forecast using Prophet's built-in plotting function:

    - Use `model.plot(forecast)` to visualize the forecast with its uncertainty intervals
    - Add appropriate title and labels


2. Plot the forecast components to understand what's driving your predictions:

    - Use `model.plot_components(forecast)` to see trend, seasonality, and holiday effects
    - These components help explain why sales might increase or decrease at certain times

In [None]:
# Your code for plotting the forecast
# ...

# Your code for plotting the components
# ...

**💡 Tip:** The components plot shows separate graphs for trend, weekly seasonality, yearly seasonality, and holidays, helping you identify which factors have the biggest impact on sales.

**⚙️ Test Your Work:**

- Two plots should appear: the forecast with its uncertainty intervals and the components breakdown
- Verify that the components include trend, weekly seasonality, yearly seasonality, and holidays
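
When reading the components plot for a multiplicative model, it helps to remember that seasonal and holiday effects are shown relative to the trend rather than in sales units. Assuming all components are multiplicative and there are no additive terms, Prophet combines them as `yhat = trend * (1 + multiplicative_terms)`. The arithmetic below uses invented numbers for a single date.

```python
# Hypothetical component values for one forecast date (invented numbers)
trend = 1000.0      # baseline sales level from the trend component
weekly = 0.10       # +10% relative to trend
yearly = -0.05      # -5% relative to trend
holidays = 0.20     # +20% relative to trend

# In multiplicative mode the relative effects are summed, then scaled by the trend
multiplicative_terms = weekly + yearly + holidays
yhat = trend * (1.0 + multiplicative_terms)

print(f"yhat = {yhat:.2f}")
```

Here a net +25% effect on a baseline of 1000 gives a forecast of 1250. The same +25% applied to a higher trend value would add more sales in absolute terms.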

## Task 6: Evaluate Forecast Performance
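**Context:** To determine the reliability of forecasts, businesses need to measure how well the model performs on historical data. Because these dates were also used to fit the model, the resulting errors are in-sample and tend to understate the error on unseen dates.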

**Steps:**

1. Filter the forecast to include only dates that exist in your original data.


2. Merge the actual sales with predicted values based on the date.


3. Calculate error metrics:

    - Mean Absolute Error (MAE)
    - Root Mean Squared Error (RMSE)
    - Mean Absolute Percentage Error (MAPE)


4. Interpret the results in the context of the business.

In [None]:
# Your code for filtering historical forecast data
# ...

# Your code for merging actual and predicted values
# ...

# Your code for calculating error metrics
# ...

# Print the metrics
# ...

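**💡 Tip:** MAPE is particularly useful because it expresses the average difference between forecasted and actual values as a percentage, which is intuitive for stakeholders. It is undefined on days with zero sales and inflated on days with very small sales, so handle those days explicitly.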

**⚙️ Test Your Work:**

- You should have numeric values for MAE, RMSE, and MAPE
- MAE and RMSE are in the same units as your sales data
- MAPE should be expressed as a percentage
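
The three metrics can be worked through by hand on a tiny example. The actual and predicted values below are invented for illustration, and the formulas are written out with NumPy so each step is visible.

```python
import numpy as np

# Hypothetical actual and predicted daily sales (invented numbers)
y_true = np.array([100.0, 200.0, 400.0])
y_pred = np.array([110.0, 180.0, 400.0])

errors = y_true - y_pred

mae = np.mean(np.abs(errors))                   # average absolute miss, in sales units
rmse = np.sqrt(np.mean(errors ** 2))            # penalizes large misses more heavily
mape = np.mean(np.abs(errors / y_true)) * 100   # average relative miss, in percent

print(f"MAE:  {mae:.2f}")
print(f"RMSE: {rmse:.2f}")
print(f"MAPE: {mape:.2f}%")
```

RMSE comes out larger than MAE here because the single 20-unit miss is weighted more heavily once squared. A large gap between the two on real data suggests a few days with unusually big errors.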

## ✅ Success Checklist
- Time series data is properly prepared with 'ds' and 'y' columns
- Visualizations clearly show the original time series and forecast
- Prophet model is configured with appropriate seasonality settings and holidays
- Forecast extends the appropriate number of periods into the future
- Component plots show trend, seasonality, and holiday effects
- Error metrics are calculated and interpreted
- Program runs without errors

## 🔍 Common Issues & Solutions
**Problem:** Prophet installation fails

**Solution:** Use `!pip install prophet` or try the conda installation `conda install -c conda-forge prophet`

**Problem:** "No module named 'prophet'" error

**Solution:** Releases before 1.0 were published under the name `fbprophet`. On such older installs, import with `from fbprophet import Prophet` instead of `from prophet import Prophet`, or upgrade to the `prophet` package

**Problem:** Model performance is poor with high error metrics

**Solution:** Try adjusting seasonality parameters, adding extra regressors, or experimenting with different changepoint settings

**Problem:** Strange seasonal patterns in the forecast

**Solution:** Check if `seasonality_mode` is set appropriately ('additive' vs 'multiplicative') based on your data patterns
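
For the "No module named 'prophet'" problem above, one possible workaround is an import that tries both package names. This is a sketch rather than an officially recommended pattern. It falls back to `None` when neither package is installed so that the cell itself does not raise.

```python
# Try the current package name first, then the pre-1.0 name
try:
    from prophet import Prophet
except ImportError:
    try:
        from fbprophet import Prophet
    except ImportError:
        Prophet = None  # neither package is installed

if Prophet is None:
    print("Prophet is not installed; run `pip install prophet` first.")
else:
    print("Prophet imported from:", Prophet.__module__)
```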

## 🔑 Key Points
- Prophet works best when data is properly prepared with clear date and target value columns
- Including relevant holidays and seasonal patterns significantly improves retail forecasting accuracy
- Component plots help explain the forecasts to business stakeholders by breaking down trends and seasonality
- Error metrics provide an objective measure of forecast reliability
- Forecasting enables proactive business decisions for inventory management and resource allocation

## 💻 Exemplar Solution

<details>

<summary><strong>Click HERE to see an exemplar solution</strong></summary>    
    
```python
# Import necessary libraries
import pandas as pd
import numpy as np
from prophet import Prophet
import matplotlib.pyplot as plt
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Load the dataset
data = pd.read_csv('stores_sales_forecasting.csv', encoding='latin-1')
print(data.head())
print("Dataset shape:", data.shape)

# Task 1: Prepare Time Series Data for Prophet
# Convert date columns to datetime
data['Order Date'] = pd.to_datetime(data['Order Date'])
data['Ship Date'] = pd.to_datetime(data['Ship Date'])

# Aggregate sales by date for time series forecasting
daily_sales = data.groupby('Order Date')['Sales'].sum().reset_index()
daily_sales.columns = ['ds', 'y']  # Rename for Prophet compatibility

# Examine the aggregated data
print("\nDaily sales data:")
print(daily_sales.head())

# Task 2: Visualize the Time Series Data
# Visualize the time series
plt.figure(figsize=(12, 6))
plt.plot(daily_sales['ds'], daily_sales['y'])
plt.title('Daily Sales Over Time')
plt.xlabel('Date')
plt.ylabel('Total Sales')
plt.grid(True)
plt.show()

# Task 3: Build and Fit the Prophet Model
# Create and fit Prophet model
model = Prophet(
    yearly_seasonality=True,
    weekly_seasonality=True,
    daily_seasonality=False,
    seasonality_mode='multiplicative',  # Retail often has multiplicative seasonality

    # --- Additional Parameters You Can Add/Adjust ---

    # 1. changepoint_prior_scale:
    # Controls the flexibility of the trend. A higher value lets the trend
    # change more abruptly; a lower value makes it smoother. Default is 0.05.
    # 0.01 is used here for a smoother trend; raise it if the trend misses
    # real shifts (marketing campaigns, economic changes, product launches).
    changepoint_prior_scale=0.01, # Experiment with values like 0.005, 0.1, 0.5

    # 2. seasonality_prior_scale:
    # Controls the strength of the seasonality components (yearly, weekly, etc.).
    # A higher value means stronger seasonality. Default is 10.0.
    # Retail sales often have very strong seasonality around holidays, so
    # increasing this might help capture those patterns more robustly.
    seasonality_prior_scale=12.0, # Experiment with values like 5.0, 20.0

    # 3. holidays_prior_scale:
    # Controls the strength of holiday effects. Default is 10.0.
    # If holidays have a particularly significant impact on your sales beyond
    # regular seasonality, increasing this value can give them more weight.
    holidays_prior_scale=10.0, # Experiment with values like 5.0, 20.0

    # 4. interval_width:
    # Adjusts the width of the uncertainty intervals (e.g., yhat_lower, yhat_upper).
    # Default is 0.80 (for 80% confidence interval). Use 0.95 for 95% confidence.
    # This affects the visualization of uncertainty, not the forecast itself.
    interval_width=0.80, # Set to 0.95 for wider, 95% confidence intervals

    # 5. growth:
    # Specifies the trend model. Default is 'linear'. Another option is 'logistic'
    # for saturating growth. 'logistic' requires adding 'cap' and 'floor' columns
    # to your DataFrame to define the maximum/minimum possible values for y.
    # growth='linear' # Usually fine for most sales data unless saturation is expected

    # 6. daily_seasonality:
    # You already have this set to False. Keep it unless your data truly has
    # clear intra-day patterns (e.g., hourly sales).
    # daily_seasonality=False
)

# Add US holidays
model.add_country_holidays(country_name='US')

# Fit the model
model.fit(daily_sales)

# Task 4: Generate Future Forecasts
# Generate future forecasts (next 90 days)
future = model.make_future_dataframe(periods=90)
forecast = model.predict(future)

# Print forecast for the next few periods
print("\nForecast for the next few days:")
print(forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail(5))

# Task 5: Visualize and Interpret the Forecast
# Visualize the forecast (model.plot creates its own figure, so pass figsize directly)
fig = model.plot(forecast, figsize=(12, 6))
ax = fig.gca()
ax.set_title('Sales Forecast')
ax.set_xlabel('Date')
ax.set_ylabel('Sales')
ax.grid(True)
plt.show()

# Visualize forecast components
fig2 = model.plot_components(forecast)
plt.show()

# Task 6: Evaluate Forecast Performance
# Calculate metrics on the historical period
historical_forecast = forecast[forecast['ds'].isin(daily_sales['ds'])]
historical_data = daily_sales.copy()

# Merge actual and predicted values
evaluation = pd.merge(historical_data,
                      historical_forecast[['ds', 'yhat']],
                      on='ds',
                      how='left')

# Calculate error metrics
mae = mean_absolute_error(evaluation['y'], evaluation['yhat'])
rmse = np.sqrt(mean_squared_error(evaluation['y'], evaluation['yhat']))
# MAPE calculation: handle division by zero if y can be 0
mape = np.mean(np.abs((evaluation['y'] - evaluation['yhat']) / evaluation['y'].replace(0, np.nan))) * 100 # Replace 0 with NaN for MAPE calc

print("\nModel Performance Metrics:")
print(f"MAE: ${mae:.2f}")
print(f"RMSE: ${rmse:.2f}")
print(f"MAPE: {mape:.2f}%")
    
```    