# Python Assignment: Facebook Prophet Forecasting with External Regressors

This assignment will challenge you to leverage Facebook Prophet's capabilities for time series forecasting, specifically incorporating external regressors. You will construct a time series with a synthetic relationship to an external variable, build Prophet models with and without this regressor, and then critically evaluate the impact of including additional information on forecast accuracy. Understanding how to use external regressors is crucial for improving model performance and incorporating business insights into your forecasts.

## Part 1: Data Generation and Preparation (30 points)

We'll create a synthetic time series dataset that includes a clear trend, seasonality, and a dependency on an external regressor. This ensures you can clearly observe the effects of adding the regressor.
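The recipe builds `y` from four parts: a linear trend, a sine wave, `2.5 * external_regressor_1`, and Gaussian noise. Before fitting Prophet, a quick ordinary-least-squares fit with NumPy can confirm that the 2.5 coefficient is actually recoverable from the data. This sketch re-creates the Part 1 recipe inside a function. `generate_series` and `estimate_regressor_coef` are illustrative names, not part of the assignment. It also assumes the exact seasonal shape is known, which Prophet does not get to assume.

```python
import numpy as np

def generate_series(n_days=365 * 4, coef=2.5, seed=42):
    """Re-create the Part 1 recipe (same components, same RNG call order)."""
    np.random.seed(seed)
    trend = np.linspace(50, 200, n_days)
    season = 30 * np.sin(np.linspace(0, 8 * np.pi, n_days))
    x = np.maximum(0, np.linspace(10, 30, n_days) + 5 * np.random.randn(n_days))
    noise = np.random.normal(0, 8, n_days)
    y = trend + season + coef * x + noise
    return x, y

def estimate_regressor_coef(x, y):
    """OLS of y on [intercept, time index, known seasonal shape, regressor]."""
    n = len(y)
    t = np.arange(n, dtype=float)
    season_shape = np.sin(np.linspace(0, 8 * np.pi, n))
    X = np.column_stack([np.ones(n), t, season_shape, x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[-1]  # last column is the regressor

x, y = generate_series()
coef_hat = estimate_regressor_coef(x, y)
print(f"Estimated coefficient on external_regressor_1: {coef_hat:.3f}")
```

Because the noise is independent of the regressor, the estimate should land close to 2.5. A value far from it would point to a bug in the generation code rather than in the Prophet models.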

In [None]:
# 1.1 Install Prophet (if not already installed)
# !pip install prophet pandas numpy matplotlib scikit-learn

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from prophet import Prophet
from sklearn.metrics import mean_squared_error, mean_absolute_error
from sklearn.preprocessing import StandardScaler
import warnings

warnings.filterwarnings('ignore') # Suppress warnings for cleaner output
np.random.seed(42) # for reproducibility

# 1.2 Generate Synthetic Time Series Data with an External Regressor
#    Create a DataFrame with:
#    - `ds`: A datetime column (Prophet reads dates from a column named `ds`, not from the index);
#      daily frequency, at least 3 years of data, e.g., from '2020-01-01'.
#    - `y`: The target variable. It should have:
#        - A clear increasing or decreasing linear trend.
#        - Annual seasonality (e.g., sinusoidal pattern).
#        - A direct linear dependency on `external_regressor_1` (e.g., `y = ... + coef * external_regressor_1`).
#        - Random noise.
#    - `external_regressor_1`: This should be a continuous numerical variable that also has its own trend or pattern.
#      Ensure it has a significant, visible relationship with `y`.

n_days = 365 * 4 # 4 years of daily data
dates = pd.date_range(start='2020-01-01', periods=n_days, freq='D')

# Trend component for y
y_trend = np.linspace(50, 200, n_days)

# Annual seasonality for y
y_annual_seasonality = 30 * np.sin(np.linspace(0, 8 * np.pi, n_days))

# External Regressor 1: Has its own subtle trend and noise
external_regressor_1 = np.linspace(10, 30, n_days) + 5 * np.random.randn(n_days)
external_regressor_1 = np.maximum(0, external_regressor_1) # Ensure non-negative

# Define the relationship between y and external_regressor_1
regressor_effect = 2.5 * external_regressor_1 # Each unit of regressor adds 2.5 to y

# Noise for y
y_noise = np.random.normal(0, 8, n_days)

# Combine components for y
y = y_trend + y_annual_seasonality + regressor_effect + y_noise

# Create the DataFrame in Prophet's required format
data = pd.DataFrame({
    'ds': dates,
    'y': y,
    'external_regressor_1': external_regressor_1
})

print("Data Head:\n", data.head())
print("\nData Info:")
data.info()

# 1.3 Split Data into Training and Testing Sets
#    Hold out the last 30 days (one month of daily data) for testing.
#    Split chronologically, never at random, so the model is not trained on dates after the test period.

train_size = len(data) - 30
train_df = data.iloc[:train_size].copy()
test_df = data.iloc[train_size:].copy()

print(f"\nTraining data points: {len(train_df)}")
print(f"Test data points: {len(test_df)}")

# 1.4 Visualize Raw Data and Regressor
#    Plot `y` over time.
#    Plot `external_regressor_1` over time (perhaps on a secondary y-axis or a separate subplot to see its trend).

fig, ax1 = plt.subplots(figsize=(15, 7))

# Plot y
ax1.plot(data['ds'], data['y'], label='Target (y)', color='blue')
ax1.set_xlabel('Date')
ax1.set_ylabel('Target Value (y)', color='blue')
ax1.tick_params(axis='y', labelcolor='blue')
ax1.set_title('Time Series Data with External Regressor')

# Create a second y-axis for the regressor
ax2 = ax1.twinx()
ax2.plot(data['ds'], data['external_regressor_1'], label='External Regressor 1', color='orange', linestyle='--')
ax2.set_ylabel('Regressor Value', color='orange')
ax2.tick_params(axis='y', labelcolor='orange')

fig.legend(loc="upper left", bbox_to_anchor=(0.1, 0.95))
ax1.grid(True)  # Grid on the primary axis; after twinx(), plt.grid() would target the twin axis
plt.show()


## Part 2: Basic Prophet Model (without External Regressors) (20 points)

Establish a baseline by training and forecasting with Prophet using only the `ds` and `y` columns. This will allow for a clear comparison once the regressor is added.
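Step 2.3 scores both models with RMSE and MAE. Written out from their definitions, they look like this. This is a plain-Python sketch: `rmse` and `mae` are illustrative helpers, while the notebook itself uses scikit-learn's `mean_squared_error` and `mean_absolute_error`, and the numbers are made up.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error: sqrt(mean((y - yhat)^2))."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error: mean(|y - yhat|)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)

# Toy values: errors of -2, 0 and +6.
actual = [100.0, 110.0, 120.0]
predicted = [102.0, 110.0, 114.0]
rmse_value = rmse(actual, predicted)
mae_value = mae(actual, predicted)
print(f"RMSE={rmse_value:.3f}  MAE={mae_value:.3f}")
```

RMSE squares each error before averaging, so a single 6-unit miss weighs more than three 2-unit misses, while MAE treats them equally. RMSE is always at least as large as MAE, and the gap widens when errors are uneven.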

In [None]:
# 2.1 Initialize and Fit Basic Prophet Model
#    Create a Prophet model instance.
#    Fit the model on your `train_df` using only `ds` and `y`.

print("\n--- Training Basic Prophet Model (without regressors) ---")
# TODO: Initialize Prophet model
# daily_seasonality models patterns *within* a day, which one observation per day cannot reveal,
# so it stays off for this daily-frequency series.
m_basic = Prophet(daily_seasonality=False)
m_basic.fit(train_df[['ds', 'y']])

# 2.2 Create Future DataFrame and Forecast
#    Create a future DataFrame for the forecast period (matching `test_df`'s duration).
#    Generate predictions using the basic model.

future_basic = m_basic.make_future_dataframe(periods=len(test_df), freq='D', include_history=False)
print("Future DataFrame for basic model head:\n", future_basic.head())

forecast_basic = m_basic.predict(future_basic)
print("Basic Model Forecast Head:\n", forecast_basic[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].head())

# 2.3 Evaluate Basic Model Performance
#    Align forecast with actuals and calculate RMSE and MAE for the test period.

y_true_basic = test_df['y'].values
y_pred_basic = forecast_basic['yhat'].values

rmse_basic = np.sqrt(mean_squared_error(y_true_basic, y_pred_basic))
mae_basic = mean_absolute_error(y_true_basic, y_pred_basic)

print(f"\nBasic Model (No Regressors) RMSE: {rmse_basic:.4f}")
print(f"Basic Model (No Regressors) MAE: {mae_basic:.4f}")

# 2.4 Plot Basic Model Forecast
#    Plot the basic model's forecast (including its uncertainty interval, 80% by default) alongside the actual test data.

fig = m_basic.plot(forecast_basic, figsize=(12, 6))
ax = fig.gca()
ax.plot(test_df['ds'], test_df['y'], 'o', color='red', markersize=4, label='Actual Test Data')
ax.set_title('Basic Prophet Forecast (No Regressors)')
plt.legend()
plt.show()


## Part 3: Prophet with External Regressors (30 points)

Now you will add `external_regressor_1` to your Prophet model and observe its impact on the forecast. This section highlights a crucial requirement: to make a prediction, the model needs the regressor's value for every future date it forecasts.
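Because the regressor model needs a value of `external_regressor_1` for every date it predicts, it is worth checking the future frame before calling `predict`. Prophet itself should refuse a frame with a missing or NaN regressor column, but an explicit check gives a clearer message at the point where the frame is assembled. `check_future_regressors` is a hypothetical helper, and the dates and values below are toy data.

```python
import pandas as pd

def check_future_regressors(future, regressors):
    """Raise ValueError if any regressor column is absent or contains NaN."""
    missing = [name for name in regressors if name not in future.columns]
    if missing:
        raise ValueError(f"future frame lacks regressor column(s): {missing}")
    has_nan = [name for name in regressors if future[name].isna().any()]
    if has_nan:
        raise ValueError(f"future frame has NaN in regressor column(s): {has_nan}")

# Three future dates, but regressor values are known for only two of them.
future = pd.DataFrame({'ds': pd.date_range('2023-12-01', periods=3, freq='D')})
known = pd.DataFrame({
    'ds': pd.date_range('2023-12-01', periods=2, freq='D'),
    'external_regressor_1': [25.0, 26.0],
})
future = future.merge(known, on='ds', how='left')  # 2023-12-03 receives NaN

try:
    check_future_regressors(future, ['external_regressor_1'])
    problem = None
except ValueError as err:
    problem = str(err)
print(problem)
```

In the notebook the same call would go right after the merge in step 3.2, e.g. `check_future_regressors(future_regressor, ['external_regressor_1'])`.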

In [None]:
# 3.1 Initialize and Fit Prophet Model with External Regressor
#    Create a new Prophet model instance.
#    Add `external_regressor_1` using `m.add_regressor()`.
#    Fit the model on your `train_df`, ensuring `external_regressor_1` is also present.

print("\n--- Training Prophet Model with External Regressor ---")
# TODO: Initialize Prophet model
m_regressor = Prophet(daily_seasonality=False)
# TODO: Add external regressor (Prophet standardizes it by default, standardize='auto')
m_regressor.add_regressor('external_regressor_1')
# TODO: Fit the model with regressor
m_regressor.fit(train_df[['ds', 'y', 'external_regressor_1']])

# 3.2 Prepare Future DataFrame for Regressor Model
#    Create a future DataFrame. Crucially, this DataFrame **must** also contain the future values for `external_regressor_1`.
#    For this synthetic data, you can simply use the `test_df`'s `ds` and `external_regressor_1` columns.

future_regressor = m_regressor.make_future_dataframe(periods=len(test_df), freq='D', include_history=False)

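# TODO: Merge the future regressor values into the future DataFrame.
# A left merge leaves NaN for any date missing from test_df, and predict() rejects NaN regressor values.
future_regressor = pd.merge(future_regressor, test_df[['ds', 'external_regressor_1']], on='ds', how='left')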

print("Future DataFrame for regressor model head (with regressor):\n", future_regressor.head())

# 3.3 Generate Forecast with Regressor Model
#    Generate predictions using the model with the regressor and the prepared `future_regressor` DataFrame.

forecast_regressor = m_regressor.predict(future_regressor)
print("Regressor Model Forecast Head:\n", forecast_regressor[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].head())

# 3.4 Evaluate Regressor Model Performance
#    Align forecast with actuals and calculate RMSE and MAE for the test period.

y_true_regressor = test_df['y'].values
y_pred_regressor = forecast_regressor['yhat'].values

rmse_regressor = np.sqrt(mean_squared_error(y_true_regressor, y_pred_regressor))
mae_regressor = mean_absolute_error(y_true_regressor, y_pred_regressor)

print(f"\nRegressor Model RMSE: {rmse_regressor:.4f}")
print(f"Regressor Model MAE: {mae_regressor:.4f}")

# 3.5 Plot Regressor Model Forecast and Components
#    Plot the regressor model's forecast (including its uncertainty interval, 80% by default) alongside the actual test data.
#    Additionally, use `m_regressor.plot_components(forecast_regressor)` to visualize the impact of the regressor.

fig = m_regressor.plot(forecast_regressor, figsize=(12, 6))
ax = fig.gca()
ax.plot(test_df['ds'], test_df['y'], 'o', color='red', markersize=4, label='Actual Test Data')
ax.set_title('Prophet Forecast with External Regressor')
plt.legend()
plt.show()

print("\n--- Model Components with Regressor ---")
# TODO: Plot model components (the regressor appears in the 'extra_regressors_additive' panel)
fig_components = m_regressor.plot_components(forecast_regressor)
plt.show()


## Part 4: Evaluation and Interpretation (15 points)

Compare the performance of the two models and interpret the results from the components plot.

### Your Analysis:

1.  **Compare the RMSE and MAE of the basic Prophet model versus the Prophet model with the external regressor.** What do these metrics tell you about the impact of the external regressor on forecast accuracy?

    * **Comparison:** _(Your comparison here)_
    * **Impact of Regressor:** _(Your explanation here)_

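2.  **Examine the `m_regressor.plot_components()` output for the model with the external regressor.** Describe what the regressor panel shows (Prophet labels it `extra_regressors_additive`; with a single additive regressor it is exactly the contribution of `external_regressor_1`). Does it align with how you designed its relationship with `y` in Part 1?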

    * **Description of Regressor Component:** _(Your description here)_
    * **Alignment with Design:** _(Your explanation here)_

3.  **In a real-world scenario, what is the biggest challenge when using external regressors for future forecasting? How might you address this challenge?**

    * **Challenge:** _(Your explanation here)_
    * **Addressing the Challenge:** _(Your proposed solutions here)_


## Part 5: Reflection and Advanced Topics (5 points)

Consider other advanced features and considerations for Prophet.

### Your Answers to Reflection Questions:

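1.  **Prophet lets you specify whether a regressor is `additive` or `multiplicative` through the `mode` argument of `add_regressor()` (if omitted, it follows the model's `seasonality_mode`, which defaults to `additive`). Briefly explain when you might choose one over the other.**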

    _(Your answer here)_

2.  **Beyond external regressors, name two other advanced features or considerations in Prophet that can significantly impact forecast quality, and briefly explain their purpose.** (e.g., holidays, custom seasonality, changepoints, saturation).

    * **Feature 1:** _(Name and brief explanation)_
    * **Feature 2:** _(Name and brief explanation)_
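For question 1, it may help to see the arithmetic. Prophet combines its components as `yhat = trend * (1 + multiplicative_terms) + additive_terms`. The sketch below applies that formula with made-up numbers: a regressor contributing +5 units in additive mode versus +5% of the trend in multiplicative mode. `combine` is an illustrative helper, not a Prophet function.

```python
def combine(trend, additive_terms=0.0, multiplicative_terms=0.0):
    """Apply Prophet's component formula: trend * (1 + multiplicative) + additive."""
    return trend * (1 + multiplicative_terms) + additive_terms

results = {}
for trend in (100.0, 200.0):
    additive = combine(trend, additive_terms=5.0)              # +5 units at any trend level
    multiplicative = combine(trend, multiplicative_terms=0.05)  # +5% of the current trend
    results[trend] = (additive, multiplicative)
    print(f"trend={trend:.0f}: additive={additive:.1f}, multiplicative={multiplicative:.1f}")
```

Both modes agree at a trend of 100, but at 200 the additive effect is still +5 while the multiplicative effect has grown to +10.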


## Deliverables:

1.  This completed Jupyter Notebook (`prophet_external_regressors_assignment.ipynb`) with all code cells executed and reflection questions answered.
2.  Ensure all plots are clearly visible and well-labeled within the notebook.