In this example we forecast the number of bikes in 5 bike stations in the city of Toulouse.
We do so using SAIL's River wrapper.

This tutorial is based on River's own example: https://riverml.xyz/dev/examples/bike-sharing-forecasting/.
Unlike River's tutorial, we avoid the function evaluate.progressive_val_score.
Instead, we control the training and evaluation loop ourselves.
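
To make the idea of a hand-written training and evaluation loop concrete, here is a minimal sketch in plain Python. It does not use SAIL or River: `RunningMean` and `progressive_mae` are hypothetical names introduced only for illustration, and the three-sample stream is made up. Each sample is predicted first, scored, and only then used for learning.

```python
# Minimal test-then-train (progressive validation) loop in plain Python.
# RunningMean is a hypothetical stand-in model, not part of SAIL or River.


class RunningMean:
    """Predicts the mean of all targets learned so far."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def predict(self, x):
        return self.mean

    def learn(self, x, y):
        self.n += 1
        self.mean += (y - self.mean) / self.n


def progressive_mae(stream, model):
    """Predict each sample first, score it, then learn from it."""
    abs_error, n_scored = 0.0, 0
    for i, (x, y) in enumerate(stream):
        if i > 0:  # do not score before the model has learned anything
            abs_error += abs(y - model.predict(x))
            n_scored += 1
        model.learn(x, y)
    return abs_error / n_scored if n_scored else 0.0


# Made-up toy stream of (features, target) pairs
stream = [({"hour": 8}, 4.0), ({"hour": 9}, 6.0), ({"hour": 10}, 8.0)]
mae = progressive_mae(stream, RunningMean())
print(mae)
```

The loops in the cells below follow the same predict, score, learn order, with SAIL transformers and a linear model in place of the toy model.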


In [1]:
import pandas as pd
from river import datasets, metrics, optim, stats
from sail.transformers.river.feature_extraction import TargetAgg
from sail.models.river.linear_model import LinearRegression
from sail.transformers.river.preprocessing import StandardScaler
from sail.transformers.river.compose import Select

### Loading the dataset


In [2]:
dataset = datasets.Bikes()
x, y = [], []
for data, label in dataset:
    x.append(data)
    y.append(label)

df = pd.DataFrame(x)
df["target"] = y

X = df.drop(["description","target"], axis=1)
y = df["target"]

### Create SAIL transformers and start incremental training


In [4]:
select = Select(['clouds', 'humidity', 'pressure', 'temperature', 'wind'])
scaler = StandardScaler()
model = LinearRegression(optimizer=optim.SGD(0.001))

metric = metrics.MAE()

batch_size = 1
for start in range(0, X.shape[0], batch_size):
    end = start + batch_size
    X_train = X.iloc[start:end]
    y_train = y.iloc[start:end]

    if start > 0:
        # Predicting
        X_train_predict = X_train.copy()
        X_train_predict = select.transform(X_train_predict)
        X_train_predict = scaler.transform(X_train_predict)
        yhat = model.predict(X_train_predict)

        # Update the metric
        metric.update(y_train.to_numpy(), yhat)

    # Partially fitting the model
    X_train = select.partial_fit_transform(X_train)
    X_train = scaler.partial_fit_transform(X_train)
    model.partial_fit(X_train, y_train)

    if start % 20000 == 0:
        print("MAE after", start, "iterations", metric.get())

print("Finally, MAE:", metric.get())

MAE after 0 iterations 0.0
MAE after 20000 iterations [4.91369848]
MAE after 40000 iterations [5.33356474]
MAE after 60000 iterations [5.33099467]
MAE after 80000 iterations [5.39232983]
MAE after 100000 iterations [5.42310781]
MAE after 120000 iterations [5.54129902]
MAE after 140000 iterations [5.61305014]
MAE after 160000 iterations [5.62248674]
MAE after 180000 iterations [5.5678413]
Finally, MAE: [5.56392979]


### Add a new SAIL transformer: TargetAgg and restart incremental training

For each station we can look at the average number of bikes per hour. To do so we first have to extract the hour from the moment field. We can then use a TargetAgg to aggregate the values of the target.
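
As a rough illustration of what aggregating the target by station and hour means, the sketch below keeps a running mean of the target per (station, hour) group in plain Python. It is not the TargetAgg implementation: `GroupedRunningMean` is a hypothetical class, the rows are made up, and returning 0.0 for an unseen group is an assumption of this sketch.

```python
from collections import defaultdict


class GroupedRunningMean:
    """Hypothetical helper: running mean of the target per group of keys."""

    def __init__(self, by):
        self.by = by
        self.counts = defaultdict(int)
        self.means = defaultdict(float)

    def _key(self, x):
        return tuple(x[name] for name in self.by)

    def transform(self, x):
        # Assumption of this sketch: unseen groups map to 0.0
        return self.means.get(self._key(x), 0.0)

    def learn(self, x, y):
        key = self._key(x)
        self.counts[key] += 1
        self.means[key] += (y - self.means[key]) / self.counts[key]


agg_sketch = GroupedRunningMean(by=["station", "hour"])
rows = [  # made-up samples
    ({"station": "A", "hour": 8}, 10.0),
    ({"station": "A", "hour": 8}, 20.0),
    ({"station": "B", "hour": 8}, 4.0),
]

features = []
for x, y in rows:
    features.append(agg_sketch.transform(x))  # uses past targets only
    agg_sketch.learn(x, y)

print(features)
```

In this sketch the feature for a sample is computed before its own target is learned, so the aggregate never contains the value being predicted.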


In [None]:
df["hour"] = df.moment.dt.hour

X = df.drop(["moment", "description","target"], axis=1)
y = df["target"]

In [3]:
metric = metrics.MAE()

agg = TargetAgg(
    by=["station", "hour"],
    how=stats.Mean(),
)
scaler = StandardScaler()
model = LinearRegression(optimizer=optim.SGD(0.001))

batch_size = 1
for start in range(0, X.shape[0], batch_size):
    end = start + batch_size
    X_train = X.iloc[start:end]
    y_train = y.iloc[start:end]
    
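    if start > 0:
        # Build the prediction features: add the aggregated target for this
        # (station, hour), drop the grouping columns, then scale
        X_train_predict = X_train.copy()
        X_train_predict.insert(0, "agg", agg.transform(X_train_predict), True)
        X_train_predict = X_train_predict.drop(["station", "hour"], axis=1)
        X_train_predict = scaler.transform(X_train_predict)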

        # Predicting
        yhat = model.predict(X_train_predict)

        # Update the metric
        metric.update(y_train.to_numpy(), yhat)

    # Update the aggregate and the scaler with the new sample while
    # building the training features
    X_train.insert(0, "agg", agg.partial_fit_transform(X_train, y_train), True)
    X_train = X_train.drop(["station", "hour"], axis=1)
    X_train = scaler.partial_fit_transform(X_train, y_train)
    
    # Partially fitting the model
    model.partial_fit(X_train, y_train)

    if start % 20000 == 0:
        print("MAE after", start, "iterations", metric.get())

print("Finally, MAE:", metric.get())

MAE after 0 iterations 0.0
MAE after 20000 iterations [3.69599632]
MAE after 40000 iterations [3.81656079]
MAE after 60000 iterations [3.83577123]
MAE after 80000 iterations [3.90303467]
MAE after 100000 iterations [3.88279632]
MAE after 120000 iterations [3.91873956]
MAE after 140000 iterations [3.97662073]
MAE after 160000 iterations [3.94625184]
MAE after 180000 iterations [3.93115752]
Finally, MAE: [3.93017217]


The model has improved considerably by adding the average number of bikes: the final MAE drops from about 5.56 to about 3.93.
However, in real-life scenarios we will not be able to update the average number of bikes immediately.

Instead, we will have to wait for some time before the true values become available.

River's evaluate.progressive_val_score allows you to simulate these real-life scenarios by adding a "delay". For more information: https://riverml.xyz/dev/api/stream/simulate-qa/
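
The effect of such a delay can be sketched in a few lines of plain Python. This is not River's implementation: `delayed_progressive_mae` is a hypothetical function, the model is a simple running mean, and the stream of (time, target) pairs is made up. A prediction is made when a sample arrives, but its true value is only scored and learned once `delay` time units have passed.

```python
from collections import deque


def delayed_progressive_mae(stream, delay):
    """Hypothetical sketch: stream yields (t, y) with non-decreasing t.

    The true value of a sample seen at time t is revealed at t + delay.
    """
    pending = deque()  # (arrival_time, y_true, y_pred)
    n_learned, mean = 0, 0.0  # running-mean model
    abs_error, n_scored = 0.0, 0

    for t, y in stream:
        # Reveal every label whose delay has elapsed, then score and learn
        while pending and pending[0][0] <= t:
            _, y_true, y_pred = pending.popleft()
            abs_error += abs(y_true - y_pred)
            n_scored += 1
            n_learned += 1
            mean += (y_true - mean) / n_learned
        # Predict with what the model knows right now
        pending.append((t + delay, y, mean))

    # Labels that arrive after the stream ends are still scored
    for _, y_true, y_pred in pending:
        abs_error += abs(y_true - y_pred)
        n_scored += 1
    return abs_error / n_scored if n_scored else 0.0


toy_stream = [(0, 10.0), (1, 20.0), (2, 30.0)]  # made-up (time, target) pairs
mae_no_delay = delayed_progressive_mae(toy_stream, delay=0)
mae_delayed = delayed_progressive_mae(toy_stream, delay=2)
print(mae_no_delay, mae_delayed)
```

On this toy stream the delayed run scores worse, because the model predicts with fewer revealed targets at each step.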
