**This project demonstrates the use of [AutoGluon](https://auto.gluon.ai/stable/index.html) [Time Series Forecasting](https://auto.gluon.ai/stable/tutorials/timeseries/index.html) to forecast iron ore prices. The workflow covers data preprocessing, feature engineering, model training, evaluation, and visualization. The results show the model's predictions and the calculated Mean Absolute Error (MAE) as an indication of the model's performance.**


## Setup and Libraries

In [None]:
!pip install autogluon.timeseries
!pip install autogluon

In [None]:
import pandas as pd
import plotly.express as px
import matplotlib.pyplot as plt
import plotly.graph_objects as go
from google.colab import drive
from autogluon.timeseries import TimeSeriesPredictor, TimeSeriesDataFrame
from sklearn.metrics import mean_absolute_error

In [None]:
# Mount Google Drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


## Load the Dataset and Data Preparation

**Convert the 'Date' column to the required format and create necessary columns for AutoGluon.**

In [None]:
# Load the data file
df = pd.read_csv("/content/drive/MyDrive/colab driver/ore_price_single_monthly.csv", parse_dates=['Date'])

# Define the 'item_id' value as a variable
item_id_value = 'iron_ore'

# Keep only the date part (year, month, day) of the 'Date' column.
# normalize() avoids a string round-trip, which can swap day and month on re-parsing.
df['timestamp'] = df['Date'].dt.normalize()

# Add the 'item_id' column and assign the defined value
df['item_id'] = item_id_value

# Rename the 'Price' column to 'target'
df = df.rename(columns={'Price': 'target'})

# Select and arrange the necessary columns
df = df[['item_id', 'timestamp', 'target']]
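
When truncating timestamps to dates, a round-trip through a day-first string such as `'%d/%m/%Y'` is fragile, because `pd.to_datetime` reads ambiguous strings month-first by default. The sketch below compares that round-trip with `dt.normalize()`. The dates are made up for illustration and are not taken from the iron ore dataset.

```python
import pandas as pd

# Made-up month-start timestamps; the first one carries a time component
dates = pd.Series(pd.to_datetime(["2020-02-01 10:30", "2020-03-01 00:00"]))

# String round-trip through a day-first format, parsed with default settings
round_trip = pd.to_datetime(dates.dt.strftime("%d/%m/%Y"))

# Direct truncation to midnight, no string parsing involved
normalized = dates.dt.normalize()

print("round trip:", round_trip.dt.strftime("%Y-%m-%d").tolist())
print("normalized:", normalized.dt.strftime("%Y-%m-%d").tolist())
```

In this sketch the round-trip reads 1 February as 2 January, while `normalize()` keeps the original calendar date.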

In [None]:
# Visualization
fig = go.Figure(data=go.Scatter(
    x=df['timestamp'],
    y=df['target'],
    mode='lines',
    name='Iron Ore Price',
    line=dict(width=2)))

fig.update_layout(
    title="Iron Ore Price over Time",
    xaxis_title="Date",
    yaxis_title="Price",
    showlegend=False,
    xaxis_range=['2006-01-01', '2025-01-01'])

fig.show()

In [None]:
total_rows = df.shape[0]
print(f"Total number of rows in the dataset: {total_rows}")

Total number of rows in the dataset: 222


## Feature Engineering

**Add lag features and rolling window features to capture temporal dependencies in the data.**

In [None]:
# Feature engineering: add lag features and rolling window features
df['lag_1'] = df['target'].shift(1)
df['lag_2'] = df['target'].shift(2)
df['lag_3'] = df['target'].shift(3)

df['rolling_mean_3'] = df['target'].rolling(window=3).mean()
df['rolling_std_3'] = df['target'].rolling(window=3).std()

# Drop rows with NaN values created by lag and rolling window operations
df = df.dropna().reset_index(drop=True)

# df.head(15)
# df.tail(15)
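
To make the lag and rolling-window features concrete, here is a minimal sketch on a made-up five-value series (the values are illustrative, not iron ore prices). It also shows a hypothetical variant, `past_rolling_mean_3`, that shifts the series before rolling so the feature uses only earlier observations; that variant is an assumption for illustration and is not part of the notebook's pipeline.

```python
import pandas as pd

# Toy series standing in for the 'target' column (values are made up)
toy = pd.DataFrame({"target": [10.0, 12.0, 11.0, 15.0, 14.0]})

# Lag feature: the previous observation
toy["lag_1"] = toy["target"].shift(1)

# Rolling mean over the current value and the two before it
toy["rolling_mean_3"] = toy["target"].rolling(window=3).mean()

# Hypothetical variant: rolling mean over the three values before the current one
toy["past_rolling_mean_3"] = toy["target"].shift(1).rolling(window=3).mean()

# Rows without a full window contain NaN and are dropped
toy_clean = toy.dropna().reset_index(drop=True)

print(toy)
print(f"Rows kept after dropna: {len(toy_clean)} of {len(toy)}")
```

Each lag or window of size k leaves NaN in the first rows, so `dropna()` shortens the series; on a dataset of only 222 monthly rows that loss is small but worth keeping in mind.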

## Convert Data for AutoGluon

**Convert the prepared dataframe into the format required by AutoGluon.**

In [None]:
train_data = TimeSeriesDataFrame.from_data_frame(
    df,
    id_column="item_id",
    timestamp_column="timestamp")

## Define and Train the Model

**Set up and train the AutoGluon TimeSeriesPredictor with specified hyperparameters.**

In [None]:
# Define the model
predictor = TimeSeriesPredictor(
    prediction_length=36,  # 36 months, i.e. a 3-year price forecast
    freq="MS",
    path="autogluon-iron-ore-monthly",
    target="target",
    eval_metric="MAE"
)

# Define the hyperparameters
hyperparameters = {
    "ETS": {
        "seasonal": "additive",
        "damped_trend": True
    },
    "Theta": {
        "decomposition_type": "multiplicative"
    },
    "DeepAR": {
        "epochs": 200,
        "learning_rate": 0.00005,
        "num_cells": 100,
        "num_layers": 3,
        "dropout_rate": 0.2,
        "batch_size": 32,
        "early_stopping_patience": 30
    }
}

# Train the model
predictor.fit(
    train_data=train_data,
    presets="high_quality",
    time_limit=3600,
    hyperparameters=hyperparameters)

Beginning AutoGluon training... Time limit = 3600s
AutoGluon will save models to 'autogluon-iron-ore-monthly'
AutoGluon Version:  1.1.1
Python Version:     3.10.12
Operating System:   Linux
Platform Machine:   x86_64
Platform Version:   #1 SMP PREEMPT_DYNAMIC Thu Jun 27 21:05:47 UTC 2024
CPU Count:          2
GPU Count:          0
Memory Avail:       10.35 GB / 12.67 GB (81.7%)
Disk Space Avail:   76.21 GB / 107.72 GB (70.8%)
Setting presets to: high_quality

Fitting with arguments:
{'enable_ensemble': True,
 'eval_metric': MAE,
 'freq': 'MS',
 'hyperparameters': {'DeepAR': {'batch_size': 32,
                                'dropout_rate': 0.2,
                                'early_stopping_patience': 30,
                                'epochs': 200,
                                'learning_rate': 5e-05,
                                'num_cells': 100,
                                'num_layers': 3},
                     'ETS': {'damped_trend': True, 'seasonal': 'additive'},
     

<autogluon.timeseries.predictor.TimeSeriesPredictor at 0x7c7f1acf5060>

---
**Evaluation and Explanation**

---
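The MAE reported by AutoGluon and the MAE computed later in this notebook are easy to confuse, but they measure different things. AutoGluon computes its MAE during model training on a validation set held out from the data. The MAE computed in the "Evaluate the Model" section instead compares the 36 forecast values with the last 36 observed values. Because the forecast covers the months after the last observation, that figure is only a rough reference and not an out-of-sample error.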

**AutoGluon Validation MAE**
During model training, AutoGluon splits the dataset into training and validation sets. The validation set is used to evaluate the model's performance during training. The following MAE values are reported for different models:

*   ETS Model: Validation MAE = 93.33
*   Theta Model: Validation MAE = 88.39
*   DeepAR Model: Validation MAE = 28.41
*   Weighted Ensemble Model: Validation MAE = `15.41` (Best model)

These values show each model's performance on the validation set. The Weighted Ensemble Model, which combines the predictions from the DeepAR and Theta models, performed best, with the lowest validation MAE.
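
The sketch below illustrates why a weighted combination of forecasts can score a lower MAE than any single member. It uses NumPy only; the forecasts, the actual values, and the weights are all made up for illustration, and they are not the weights AutoGluon selected in this run.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error between two equally sized arrays."""
    return float(np.mean(np.abs(y_true - y_pred)))

# Made-up validation values and two member forecasts
actual = np.array([100.0, 110.0, 120.0])
forecast_a = np.array([90.0, 100.0, 110.0])    # consistently too low
forecast_b = np.array([105.0, 115.0, 125.0])   # consistently too high

# Hypothetical ensemble weights (non-negative, summing to 1)
w_a, w_b = 0.25, 0.75
ensemble = w_a * forecast_a + w_b * forecast_b

mae_a = mae(actual, forecast_a)
mae_b = mae(actual, forecast_b)
mae_ens = mae(actual, ensemble)

print(f"MAE A: {mae_a}, MAE B: {mae_b}, MAE ensemble: {mae_ens}")
```

Because the two members err in opposite directions here, their errors partly cancel in the weighted average. When members share the same bias, an ensemble gains much less.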

## Make Predictions

Use the best model found during training to make predictions.

In [None]:
# Make predictions using the best model explicitly
best_model = predictor.model_best
predictions = predictor.predict(train_data, model=best_model)

# Convert predictions to a pandas DataFrame format
predictions_df = predictions.reset_index()

# Convert predictions columns to merge with original data
predictions_df = predictions_df[['item_id', 'timestamp', 'mean', '0.1', '0.9']]
predictions_df.columns = ['item_id', 'timestamp', 'mean', 'q0.1', 'q0.9']

## Combine Predictions with Original Data

Merge the predictions with the original data for comparison and visualization.

In [None]:
# Merge with original data
combined_df = pd.concat([df[['item_id', 'timestamp', 'target']],
                         predictions_df[['item_id', 'timestamp', 'mean']].rename(columns={'mean': 'target'})],
                        ignore_index=True)

# Label the original data and predictions
combined_df['type'] = 'actual'
combined_df.loc[len(df):, 'type'] = 'forecast'

## Evaluate the Model

Calculate the Mean Absolute Error (MAE) for the predictions.

In [None]:
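# Calculate MAE between the forecast and the most recent observations.
# Note: the forecast covers the months after the last observation, so the two
# arrays cover different periods; treat the result as a rough reference only.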
overlap_period = min(len(df), len(predictions_df))
actual = df['target'].values[-overlap_period:]
predicted = predictions_df['mean'].values[:overlap_period]
mae = mean_absolute_error(actual, predicted)
mae_fig = go.Figure()
mae_fig.add_trace(go.Indicator(mode="number",value=mae,title={"text": "Mean Absolute Error (MAE)"},))
mae_fig.update_layout(height=200)
mae_fig.show()
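
For an error estimate where forecasts and actual values cover the same months, one common approach is a hold-out split: withhold the last `horizon` observations, forecast them from the earlier data, and compare. The sketch below shows the mechanics on a made-up monthly series, with a naive "repeat the last value" forecast standing in for the trained predictor. The series, `horizon`, and the naive forecast are assumptions for illustration, not part of this notebook's pipeline.

```python
import numpy as np
import pandas as pd

horizon = 3  # number of final months to withhold (illustrative)

# Made-up monthly series
ts = pd.Series(
    [100.0, 102.0, 101.0, 105.0, 107.0, 110.0, 108.0, 111.0],
    index=pd.date_range("2023-01-01", periods=8, freq="MS"),
)

# Split: everything before the last `horizon` months is used for fitting
train, test = ts.iloc[:-horizon], ts.iloc[-horizon:]

# Stand-in forecast: repeat the last training value over the test months
naive_forecast = pd.Series(train.iloc[-1], index=test.index)

# MAE on timestamps that the forecast and the actual values share
holdout_mae = float(np.mean(np.abs(test.values - naive_forecast.values)))
print(f"Hold-out MAE over {horizon} months: {holdout_mae:.2f}")
```

In the notebook's setting, the same idea would mean fitting on the data without its last 36 months and comparing the resulting forecast with those 36 withheld values.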

In [None]:
# predictions_df.head(10)

## Visualization

**Visualize the actual and forecasted values using Plotly.**

P10 and P90 are the 0.1 and 0.9 quantiles of the forecast distribution. P10 is the lower line: there is a 10% probability that the actual value will fall below it. P90 is the upper line: there is a 90% probability that the actual value will fall below it. Together they bound an 80% prediction interval.
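
Under the quantile reading used by the plot code (the `q0.1` column is drawn as "P 10" and `q0.9` as "P 90"), roughly 10% of outcomes fall below P10 and roughly 80% fall between the two lines. A minimal NumPy sketch on a made-up set of outcomes, not on the model's forecast distribution:

```python
import numpy as np

# Made-up outcomes: the integers 1..100 stand in for possible prices
outcomes = np.arange(1, 101, dtype=float)

# Lower and upper quantiles, as in the 'q0.1' and 'q0.9' forecast columns
p10, p90 = np.quantile(outcomes, [0.1, 0.9])

share_below_p10 = float(np.mean(outcomes < p10))
share_inside = float(np.mean((outcomes >= p10) & (outcomes <= p90)))

print(f"P10 = {p10:.1f}, P90 = {p90:.1f}")
print(f"Share below P10: {share_below_p10:.2f}")
print(f"Share between P10 and P90: {share_inside:.2f}")
```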

In [None]:
# Visualization
fig = px.line(combined_df, x='timestamp', y='target', color='type', title='Time Series Forecast with AutoGluon')

fig.add_scatter(x=predictions_df['timestamp'], y=predictions_df['q0.1'], mode='lines', name='P 10', line=dict(color='green', width=1))
fig.add_scatter(x=predictions_df['timestamp'], y=predictions_df['q0.9'], mode='lines', name='P 90', line=dict(color='black', width=1))

fig.add_traces(go.Scatter(
    x=list(predictions_df['timestamp']) + list(predictions_df['timestamp'][::-1]),
    y=list(predictions_df['q0.1']) + list(predictions_df['q0.9'][::-1]),
    fill='toself',
    fillcolor='rgba(0,100,80,0.2)',
    line=dict(color='rgba(255,255,255,0)'),
    showlegend=False,
    name='Confidence Interval'))

fig.update_xaxes(rangeslider_visible=True)

# Update layout to ensure visibility of all components
fig.update_layout(
    title='Time Series Forecast with AutoGluon',
    xaxis_title='Date',
    yaxis_title='Price',
    legend_title='Legend',
    xaxis_range=['2006','2028']
)
fig.show()

In [None]:
# Calculate the yearly average of the forecasts
predictions_df['year'] = predictions_df['timestamp'].dt.year
yearly_avg = predictions_df.groupby('year')['mean'].mean().round(2).reset_index()
print("Yearly average of the forecasts:")
print(yearly_avg)

Yearly average of the forecasts:
   year   mean
0  2024  93.24
1  2025  67.58
2  2026  70.69
3  2027  80.61
