# Forecasting With Machine Learning
Apply ML to any forecasting task with these four strategies.


# 📘 Lesson: Forecasting Beyond One Step Ahead

## 🧭 Introduction

In Lessons 2 and 3, we treated forecasting as a simple regression problem with all of our features derived from a single input: the **time index** ⏳. This allowed us to forecast any point in the future simply by generating the right **trend** and **seasonal** features.

But once we added **lag features** in Lesson 4, the game changed 🎮.  
Lag features depend on knowing the past target value — which isn’t available in the future!  
A **lag 1** lets you predict one step ahead, but not two or more. 🤯

In Lesson 4, we assumed we could always generate the necessary lags for each forecast (i.e., predicting just one step forward).  
🔮 **Real-world forecasting** usually requires more than this! So in this lesson, we’ll learn how to forecast for multiple steps ahead using ML models.
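
To make the lag problem concrete, here is a minimal sketch using a made-up four-point yearly series (the values and the integer `Year` index are illustrative assumptions, not data from the course). It extends the index two steps past the last observation and shows that the lag 1 feature exists for the first future step only.

```python
import pandas as pd

# Toy yearly series; the values are illustrative only.
ts = pd.Series(
    [10.0, 12.0, 15.0, 19.0],
    index=pd.Index(range(2010, 2014), name="Year"),
    name="y",
)

# Extend the index two steps past the last observation (2013).
future_index = pd.Index(range(2010, 2016), name="Year")
frame = pd.DataFrame({"y": ts.reindex(future_index)})

# Lag 1 is simply the target shifted down by one row.
frame["y_lag_1"] = frame["y"].shift(1)

# 2014 still has a lag-1 value (the 2013 observation),
# but 2015 would need the unknown 2014 value, so it is NaN.
print(frame)
```

The row for 2014 can be predicted from a known lag, while the row for 2015 cannot, which is why forecasting two or more steps ahead needs one of the strategies discussed later.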

---

## 📐 Defining the Forecasting Task

Before building a forecasting model, you need to define:

1. 🧠 What **features** are available at the time of forecast?  
2. ⏰ What **period** are you forecasting? (Target)

### 🔹 Forecast Origin
The **forecast origin** is the time at which you are making a prediction.  
🧩 Typically, this is the last timestamp for which you have data.  
Anything before this point can be used to generate features.

### 🔹 Forecast Horizon
The **forecast horizon** is the time window you want to forecast — e.g., a 1-step or 5-step horizon.  
This defines your **target** 🏁.

### 🔹 Lead Time
The gap between the forecast origin and the start of the horizon is called the **lead time** (or **latency**).  
⏳ A "3-step ahead" forecast means you're forecasting three time steps ahead of your most recent data.
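
As a rough sketch, the three definitions can be turned into a small helper that lists the periods in the forecast horizon. The function name `forecast_window` and its parameters are hypothetical and exist only for this illustration; the dates are the *Store Sales* ones used later in this notebook.

```python
import pandas as pd


def forecast_window(origin, lead_time, steps, freq="D"):
    """Return the periods covered by the forecast horizon.

    origin:    last period for which data is available
    lead_time: steps between the origin and the first forecast period
               (1 means the horizon starts right after the origin)
    steps:     number of periods in the horizon
    """
    origin = pd.Period(origin, freq=freq)
    return pd.period_range(start=origin + lead_time, periods=steps, freq=freq)


# Origin 2017-08-15, 1-step lead time, 16-step horizon.
horizon = forecast_window("2017-08-15", lead_time=1, steps=16)
print(horizon[0], horizon[-1])  # 2017-08-16 2017-08-31
```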

---

## 🛠️ Preparing Data for Forecasting

To use ML models for forecasting, we need to turn our time series 📊 into a **dataframe** 🧾.

We already did part of this in Lesson 4 when we created **lag features**.  
Now, we also need to create our **target columns**, especially for **multistep forecasting**.

### 🧾 Example: Multistep Forecasting Table

Each row = a **single forecast**  
Each target column = one **future step**  
Each feature column = one **past lag**

| Year | y_step_3 | y_step_2 | y_step_1 | y_lag_2 | y_lag_3 | y_lag_4 | y_lag_5 | y_lag_6 |
|------|----------|----------|----------|---------|---------|---------|---------|---------|
| 2010 |   2      |   1      |   0      |   NaN   |   NaN   |   NaN   |   NaN   |   NaN   |
| 2011 |   3      |   2      |   1      |   NaN   |   NaN   |   NaN   |   NaN   |   NaN   |
| 2012 |   4      |   3      |   2      |    0    |   NaN   |   NaN   |   NaN   |   NaN   |
| 2013 |   5      |   4      |   3      |    1    |    0    |   NaN   |   NaN   |   NaN   |
| 2014 |   6      |   5      |   4      |    2    |    1    |    0    |   NaN   |   NaN   |
| 2015 |   7      |   6      |   5      |    3    |    2    |    1    |    0    |   NaN   |
| 2016 |   8      |   7      |   6      |    4    |    3    |    2    |    1    |    0    |
| 2017 |   9      |   8      |   7      |    5    |    4    |    3    |    2    |    1    |
| 2018 |  10      |   9      |   8      |    6    |    5    |    4    |    3    |    2    |
| 2019 |  11      |  10      |   9      |    7    |    6    |    5    |    4    |    3    |

> 🔍 This shows a **3-step forecast horizon** with a **2-step lead time** and **5 lag features**.
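
A table like this can be built with two `shift`-based helpers. The sketch below is an assumption about how such helpers might look: the notebook later imports `make_lags` and `make_multistep_target` from a `utils` module whose source is not shown here, so these bodies (and the `lead_time` parameter) are illustrative rather than the course's actual implementation. The toy series simply counts up from 0 starting in 2010.

```python
import pandas as pd


def make_lags(ts, lags, lead_time=1):
    # Assumed helper: one column per lag, starting at `lead_time`.
    return pd.concat(
        {f"y_lag_{i}": ts.shift(i) for i in range(lead_time, lags + lead_time)},
        axis=1,
    )


def make_multistep_target(ts, steps):
    # Assumed helper: y_step_1 is the current row, later steps look ahead.
    return pd.concat(
        {f"y_step_{i + 1}": ts.shift(-i) for i in range(steps)},
        axis=1,
    )


# Toy series 0, 1, 2, ... indexed by year.
ts = pd.Series(
    range(12),
    index=pd.Index(range(2010, 2022), name="Year"),
    dtype="float64",
)

X = make_lags(ts, lags=5, lead_time=2)   # y_lag_2 ... y_lag_6
y = make_multistep_target(ts, steps=3)   # y_step_1 ... y_step_3
table = pd.concat([y, X], axis=1)
print(table.loc[2010:2019])
```

The columns come out in ascending step order rather than the descending order shown in the table, but the values in each row are the same.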

---

## 📊 Multistep Forecasting Strategies

How do we make multistep forecasts from a model?  
Here are **4 common strategies**, each with its own pros and cons:

---

### 1️⃣ Multioutput Model

- 🧠 A single model that outputs multiple values (steps)
- ✅ Works great with models like **Linear Regression** and **Neural Networks**
- ❌ Not every algorithm supports it natively (older versions of **XGBoost**, for example, cannot produce multiple outputs)
- 🚀 Efficient and simple!
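
A minimal sketch of the multioutput idea, assuming a synthetic noisy trend and a plain sliding-window construction of lags and targets (neither comes from the course data). scikit-learn's `LinearRegression` accepts a two-dimensional target, so a single `fit` call covers every step of the horizon.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic series: linear trend plus a little noise (illustrative only).
rng = np.random.default_rng(0)
series = np.arange(60, dtype=float) + rng.normal(scale=0.5, size=60)

n_lags, n_steps = 4, 3

# Each row: the n_lags values before time t as features,
# the n_steps values from time t onward as targets.
starts = range(n_lags, len(series) - n_steps + 1)
X = np.array([series[t - n_lags:t] for t in starts])
Y = np.array([series[t:t + n_steps] for t in starts])

# One model, one fit, three outputs.
model = LinearRegression().fit(X, Y)

# Forecast the next 3 steps from the last 4 observations.
forecast = model.predict(series[-n_lags:].reshape(1, -1))
print(forecast.shape)  # (1, 3)
```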

---

### 2️⃣ Direct Strategy

- 🎯 Train **a separate model for each time step**
- E.g., 1 model for 1-step ahead, another for 2-steps ahead, etc.
- ✅ Each model is specialized 🧪
- ❌ Expensive to train (many models!)
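
One way to sketch the Direct strategy is scikit-learn's `MultiOutputRegressor`, which fits an independent copy of the base estimator for each target column. The synthetic seasonal series and the choice of `DecisionTreeRegressor` are assumptions made to keep the example dependency-light; another regressor such as `XGBRegressor` could be wrapped the same way.

```python
import numpy as np
from sklearn.multioutput import MultiOutputRegressor
from sklearn.tree import DecisionTreeRegressor

# Synthetic monthly-style series with a 12-step cycle (illustrative only).
rng = np.random.default_rng(1)
t = np.arange(120)
series = 10 + np.sin(2 * np.pi * t / 12) + rng.normal(scale=0.1, size=t.size)

n_lags, n_steps = 6, 3
starts = range(n_lags, len(series) - n_steps + 1)
X = np.array([series[s - n_lags:s] for s in starts])
Y = np.array([series[s:s + n_steps] for s in starts])

# One independent tree per forecast step.
direct = MultiOutputRegressor(DecisionTreeRegressor(max_depth=4, random_state=0))
direct.fit(X, Y)

print(len(direct.estimators_))  # 3, one fitted model per step
forecast = direct.predict(series[-n_lags:].reshape(1, -1))
```

Because the models never see each other's predictions, the training cost grows with the horizon, but no error is passed from one step to the next.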

---

### 3️⃣ Recursive Strategy

- 🔁 Train **one model** for 1-step ahead
- Feed each prediction back in as a lag feature to forecast the next step
- ✅ Only one model to train
- ❌ Errors can accumulate 🔺
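
The recursive loop can be sketched by hand. The example below assumes a noiseless linear series so the result is easy to check, and the helper name `recursive_forecast` is hypothetical. A single one-step model is trained, then each prediction is appended to the window of lags used for the next prediction.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Noiseless trend 0, 1, ..., 29 (illustrative only).
series = np.arange(30, dtype=float)
n_lags = 3

# One-step-ahead training set: previous 3 values -> next value.
X = np.array([series[t - n_lags:t] for t in range(n_lags, len(series))])
y = series[n_lags:]
model = LinearRegression().fit(X, y)


def recursive_forecast(model, history, n_lags, steps):
    """Forecast `steps` values by feeding predictions back in as lags."""
    window = list(history[-n_lags:])
    preds = []
    for _ in range(steps):
        features = np.array(window[-n_lags:]).reshape(1, -1)
        next_value = model.predict(features)[0]
        preds.append(next_value)
        window.append(next_value)  # the prediction becomes the newest lag
    return np.array(preds)


forecast = recursive_forecast(model, series, n_lags, steps=5)
print(np.round(forecast, 2))
```

On real, noisy data every prediction carries some error, and because later steps are built on earlier predictions that error compounds over the horizon.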

---

### 4️⃣ DirRec Strategy (Direct + Recursive)

- 🧬 Hybrid approach!
- Train **a model for each step**
- Use **previous predictions as lags**
- ✅ Captures dependencies better than Direct
- ❌ Still suffers from error propagation 😬
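
scikit-learn's `RegressorChain` gives a compact sketch of DirRec: it fits one model per target column, and each model after the first also receives the earlier steps as extra features. With the default settings the chain trains on the true earlier targets and substitutes its own predictions at prediction time. The synthetic series and the `LinearRegression` base model below are assumptions chosen for a quick, checkable example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.multioutput import RegressorChain

# Synthetic seasonal series (illustrative only).
rng = np.random.default_rng(2)
t = np.arange(100)
series = 5 + np.sin(2 * np.pi * t / 7) + rng.normal(scale=0.1, size=t.size)

n_lags, n_steps = 4, 3
starts = range(n_lags, len(series) - n_steps + 1)
X = np.array([series[s - n_lags:s] for s in starts])
Y = np.array([series[s:s + n_steps] for s in starts])

# The estimator is passed positionally to avoid depending on the keyword name.
chain = RegressorChain(LinearRegression())
chain.fit(X, Y)

# Step 1 sees 4 lags, step 2 sees 4 lags + step 1, step 3 sees 4 lags + steps 1-2.
print([est.coef_.shape[0] for est in chain.estimators_])  # [4, 5, 6]

forecast = chain.predict(series[-n_lags:].reshape(1, -1))
```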

---

🧠 Which one is best? It depends on:
- Forecast horizon ⏳
- Data size 📈
- Model type 🤖
- Tolerance for error propagation 🚨

Now we’re ready to forecast like a pro! 🌟📉📈 Let's dive in!!


In [6]:
from pathlib import Path
import ipywidgets as widgets
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.multioutput import RegressorChain
from sklearn.preprocessing import LabelEncoder
from xgboost import XGBRegressor


comp_dir = Path('store_sales_dataset')

store_sales = pd.read_csv(
    comp_dir / 'train.csv',
    usecols=['store_nbr', 'family', 'date', 'sales', 'onpromotion'],
    dtype={
        'store_nbr': 'category',
        'family': 'category',
        'sales': 'float32',
        'onpromotion': 'uint32',
    },
    parse_dates=['date'],
)
store_sales['date'] = store_sales.date.dt.to_period('D')
store_sales = store_sales.set_index(['store_nbr', 'family', 'date']).sort_index()

family_sales = (
    store_sales
    .groupby(['family', 'date'], observed=True)  # explicit to avoid the pandas FutureWarning on categorical groupers
    .mean()
    .unstack('family')
    .loc['2017']
)

test = pd.read_csv(
    comp_dir / 'test.csv',
    dtype={
        'store_nbr': 'category',
        'family': 'category',
        'onpromotion': 'uint32',
    },
    parse_dates=['date'],
)
test['date'] = test.date.dt.to_period('D')
test = test.set_index(['store_nbr', 'family', 'date']).sort_index()




Let's consider the following three forecasting tasks:

a. 3-step forecast using 4 lag features with a 2-step lead time

b. 1-step forecast using 3 lag features with a 1-step lead time

c. 3-step forecast using 4 lag features with a 1-step lead time

In [None]:
from utils import (
    create_multistep_example,
    load_multistep_data,
    make_lags,
    make_multistep_target,
    plot_multistep,
)
datasets = load_multistep_data()

data_tabs = widgets.Tab([widgets.Output() for _ in datasets])
for i, df in enumerate(datasets):
    data_tabs.set_title(i, f'Dataset {i+1}')
    with data_tabs.children[i]:
        display(df)

display(data_tabs)



# 1) Match description to dataset
task_a = 2

task_b = 1

task_c = 3



In [None]:
print("Training Data", "\n" + "-" * 13 + "\n", store_sales)
print("\n")
print("Test Data", "\n" + "-" * 9 + "\n", test)

# 2) Identify the forecasting task for *Store Sales* competition

The training set ends on 2017-08-15, which gives us the forecast origin. The test set comprises the dates 2017-08-16 to 2017-08-31, and this gives us the forecast horizon. There is one step between the origin and horizon, so we have a lead time of one day.

Put another way, we need a 16-step forecast with a 1-step lead time. We can use lags starting with lag 1, and we make the entire 16-step forecast using features from 2017-08-15.

# 3) Create multistep dataset for *Store Sales*
Create targets suitable for the *Store Sales* forecasting task. Use 4 days of lag features. Drop any missing values from both targets and features.

In [None]:
y = family_sales.loc[:, 'sales']

# Make 4 lag features
X = make_lags(y, lags=4).dropna()

# Make multistep target
y = make_multistep_target(y, steps=16).dropna()

y, X = y.align(X, join='inner', axis=0)

Let's prepare the data for XGBoost.

In [None]:
le = LabelEncoder()
X = (X
    .stack('family')  # wide to long
    .reset_index('family')  # convert index to column
    .assign(family=lambda x: le.fit_transform(x.family))  # label encode
)
y = y.stack('family')  # wide to long

display(y)

# 4) Forecast with the DirRec strategy
Instantiate a model that applies the DirRec strategy to XGBoost.


In [None]:
from sklearn.multioutput import RegressorChain

# Pass the estimator positionally so the call does not depend on the keyword name
model = RegressorChain(XGBRegressor())
model.fit(X, y)

y_pred = pd.DataFrame(
    model.predict(X),
    index=y.index,
    columns=y.columns,
).clip(0.0)

See a sample of the 16-step predictions this model makes on the training data.

In [None]:
FAMILY = 'BEAUTY'
START = '2017-04-01'
EVERY = 16

y_pred_ = y_pred.xs(FAMILY, level='family', axis=0).loc[START:]
y_ = family_sales.loc[START:, 'sales'].loc[:, FAMILY]
plot_params = {'figsize': (10, 6)}

fig, ax = plt.subplots(1, 1, figsize=(11, 4))
ax = y_.plot(**plot_params, ax=ax, alpha=0.5)
ax = plot_multistep(y_pred_, ax=ax, every=EVERY)
_ = ax.legend([FAMILY, FAMILY + ' Forecast'])