# 2. Time Series Forecasting

**Objective:** To build and evaluate a predictive model to forecast the future price of our high-growth asset, TSLA. A reliable forecast is the cornerstone of our data-driven portfolio strategy.

**Stakeholder Insight:** By forecasting TSLA's expected return, we move beyond relying solely on historical averages. This allows us to incorporate a forward-looking view into our optimization, potentially capturing market dynamics more effectively.

## 2.1. Setup

Load the processed data from the previous step and import modeling libraries.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import joblib
import json
import sys
import os

# Add src directory to path for modular imports
sys.path.append(os.path.abspath(os.path.join('..', 'src')))

from data_ingestion import split_data
from modeling import train_and_log_arima_model, get_forecast
from evaluation import evaluate_model
from config import FORECAST_ASSET, TRAIN_TEST_SPLIT_DATE

# Configure plots for better visualization
plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['figure.figsize'] = (15, 7)

## 2.2. Load and Prepare Data

We load the cleaned data and split it into training and testing sets. The test set is reserved to provide an unbiased evaluation of our model's performance.
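
As a rough illustration of a date-based split, the sketch below assumes `split_data` simply cuts the series at the split date; the project's actual implementation in `data_ingestion` may differ. The helper name `chronological_split` and the toy prices are hypothetical.

```python
import pandas as pd

def chronological_split(series: pd.Series, split_date: str):
    """Hypothetical stand-in for split_data: cut a date-indexed series at split_date."""
    cutoff = pd.Timestamp(split_date)
    train = series[series.index < cutoff]   # observations strictly before the cutoff
    test = series[series.index >= cutoff]   # cutoff date onward, never seen in training
    return train, test

# Toy data: ten business days of made-up prices
dates = pd.date_range("2024-01-01", periods=10, freq="B")
prices = pd.Series(range(100, 110), index=dates, dtype=float)

train, test = chronological_split(prices, "2024-01-08")
print(len(train), len(test))
```

Splitting by date rather than at random keeps every test observation later than every training observation, which is what prevents look-ahead leakage in a time series.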

In [None]:
# Load the pre-processed data from the previous step
data = pd.read_csv('../data/processed/all_data.csv', index_col='Date', parse_dates=True)
asset_data = data[FORECAST_ASSET]

# Split chronologically at the date defined in the config file: the training set
# is used for model fitting and the test set for validation.
train_data, test_data = split_data(asset_data, TRAIN_TEST_SPLIT_DATE)

## 2.3. ARIMA Model Training

We use an **AutoRegressive Integrated Moving Average (ARIMA)** model. The `train_and_log_arima_model` function automatically searches for the optimal model parameters (p, d, q) and logs the results to MLflow for versioning and comparison.

In [None]:
# Fit the ARIMA model. The function takes both the training and test sets
# so that it can evaluate the model as part of the same run.
model = train_and_log_arima_model(train_data, test_data)

## 2.4. Model Evaluation

We now evaluate our trained model on the unseen test data to see how well it performs.

In [None]:
# Evaluate the model's performance on the test set using standard metrics.
evaluation_metrics = evaluate_model(model, test_data)
print("Evaluation Metrics:")
print(evaluation_metrics)

### Stakeholder Insight

The evaluation metrics (MAE, RMSE) measure the model's average prediction error in dollar terms. Judge them against TSLA's price level over the test period: errors that are small relative to the price suggest the model is tracking actual prices closely and provides a solid foundation for our forecast, while large errors mean the forecast deserves less weight.
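
For reference, the two metrics can be computed as below. This is a minimal sketch with made-up numbers, assuming `evaluate_model` reports the standard definitions; `mae_rmse` is a hypothetical helper.

```python
import numpy as np

def mae_rmse(actual, predicted):
    """Return (MAE, RMSE) for two equal-length sequences of prices."""
    errors = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    mae = np.mean(np.abs(errors))          # average absolute miss, in USD
    rmse = np.sqrt(np.mean(errors ** 2))   # penalizes large misses more heavily
    return mae, rmse

actual = [250.0, 255.0, 248.0, 260.0]      # made-up test prices
predicted = [252.0, 251.0, 249.0, 254.0]   # made-up forecasts

mae, rmse = mae_rmse(actual, predicted)
print(f"MAE: {mae:.2f}, RMSE: {rmse:.2f}")  # MAE: 3.25, RMSE: 3.77
```

Dividing MAE by the average test-period price turns the dollar error into a percentage, which makes it easier to compare across assets or time periods.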

## 2.5. Generate and Visualize Forecast

With the model validated, we retrain it on the entire dataset and use it to forecast future prices. Visualizing the forecast with confidence intervals is key to understanding the uncertainty involved.

In [None]:
# Generate the forecast for the specified period using the full dataset.
annual_return, forecast, conf_int = get_forecast(model, asset_data)

# Plot the forecast to visualize the future price trend and confidence intervals.
plt.plot(asset_data.loc['2023':], label='Historical Prices') # Plot recent history for context
plt.plot(forecast, label='Forecast', color='red')
plt.fill_between(forecast.index, conf_int.iloc[:, 0], conf_int.iloc[:, 1], color='pink', alpha=0.5, label='95% Confidence Interval')
plt.title(f'{FORECAST_ASSET} Price Forecast for the Next Year')
plt.ylabel('Price (USD)')
plt.legend()
plt.show()

### Stakeholder Recommendation

The forecast yields an expected **annual return for TSLA**, printed in the cell below. This is a key input for the next phase of portfolio optimization. **Crucially, observe the widening confidence interval (the pink shaded area).** It shows that forecast uncertainty grows as the horizon lengthens. While the model provides a valuable directional trend, long-range point forecasts should be treated with caution. This reinforces the need for periodic re-evaluation and rebalancing of the portfolio.
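
To see why the band widens, consider the simplest case, a random walk (ARIMA(0,1,0)), where the standard error of an h-step-ahead forecast is sigma * sqrt(h). The sketch below uses an assumed one-day standard error of 5 USD purely for illustration; the fitted model's intervals will differ.

```python
import math

SIGMA = 5.0   # assumed one-day forecast standard error in USD (illustrative only)
Z_95 = 1.96   # normal quantile for a two-sided 95% interval

def interval_half_width(horizon_days: int, sigma: float = SIGMA) -> float:
    """95% interval half-width for a random walk, horizon_days steps ahead."""
    return Z_95 * sigma * math.sqrt(horizon_days)

# Roughly one day, one month, one quarter, and one year of trading days
for h in (1, 21, 63, 252):
    print(f"h={h:>3} days: +/- {interval_half_width(h):.1f} USD")
```

Because the width scales with the square root of the horizon, a one-year interval under these assumptions is about sixteen times wider than a one-day interval.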

In [None]:
# Print the forecasted annual return as a percentage.
print(f"The model-forecasted annual return is: {annual_return * 100:.2f}%")

## 2.6. Conclusion and Next Steps

We have trained, evaluated, and used a forecasting model. Its output, the expected annual return for TSLA, will now be a key input for our portfolio optimization.
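
How `get_forecast` derives the annual return is not shown in this notebook. One simple convention, assumed here only for illustration, compares the last forecasted price with the last observed price; `simple_annual_return` and the numbers are hypothetical.

```python
def simple_annual_return(last_price: float, forecast_end_price: float) -> float:
    """Hypothetical convention: simple return over a forecast horizon assumed to be one year."""
    return forecast_end_price / last_price - 1.0

# Made-up numbers for illustration only
expected_return = simple_annual_return(last_price=200.0, forecast_end_price=230.0)
print(f"{expected_return:.2%}")  # 15.00%
```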

**Action:** Save the trained model and the forecasted return for the next phase.

In [None]:
# Create a directory for model artifacts
os.makedirs('../reports/artifacts', exist_ok=True)

# Save the model for use in the dashboard or other applications
joblib.dump(model, '../reports/artifacts/arima_model.pkl')

# Save the forecast return for use in portfolio optimization
with open('../reports/artifacts/forecast_return.json', 'w') as f:
    # Cast to a built-in float so NumPy scalar types serialize cleanly
    json.dump({'annual_return': float(annual_return)}, f)

print("Model and forecast return saved to reports/artifacts/")