# Stream Regression
---

## `BikeSharing` dataset

**Description:** This dataset contains the hourly count of rental bikes in the Capital Bikeshare system for 2011 and 2012, together with the corresponding weather and seasonal information. The task is to predict the number of bikes rented in each hour.

**Features:** 10

|       Attribute      | Description |
|:--------------------:|:-----------------------------|
| `season`             | Season (1: winter, 2: spring, 3: summer, 4: fall) |
| `yr`                 | Year (0: 2011, 1: 2012) |
| `mnth`               | Month (1 to 12) |
| `hr`                 | Hour (0 to 23) |
| `workingday`         | 1 if the day is neither a weekend nor a holiday, otherwise 0 |
| `weathersit`         | 1: Clear, Few clouds, Partly cloudy; 2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist; 3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds; 4: Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog |
| `temp`               | Normalized temperature in Celsius, computed as (t - t_min)/(t_max - t_min) with t_min = -8, t_max = +39 |
| `atemp`              | Normalized "feels like" temperature in Celsius, computed as (t - t_min)/(t_max - t_min) with t_min = -16, t_max = +50 |
| `hum`                | Normalized humidity (values divided by 100, the maximum) |
| `windspeed`          | Normalized wind speed (values divided by 67, the maximum) |


**Target:** `cnt`, the total count of rental bikes

**Samples:** 17,379


```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from river.stream import iter_pandas
from river.metrics.base import Metrics
from river.metrics import MAE, MAPE, MSE, RMSE
from river.evaluate import progressive_val_score, iter_progressive_val_score
from river.preprocessing import StandardScaler
from river.drift import ADWIN
```

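```python
# Load the dataset; the target `cnt` is expected to be the last column
data = pd.read_csv("/content/BikeSharing.csv")
features = data.columns[:-1]  # all attributes except the target
# Subset of features used by the linear model
numerical_features = ['mnth', 'hr', 'weathersit', 'temp', 'atemp', 'hum', 'windspeed']
```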

Metrics used in this example:

- `MAE()` – Mean Absolute Error.
  Average of the absolute differences between predictions and true values. Easy to interpret, and less sensitive to outliers than squared-error metrics.

- `MSE()` – Mean Squared Error.
  Average of the squared differences between predictions and true values. Squaring penalizes large errors more heavily.

- `RMSE()` – Root Mean Squared Error.
  Square root of the MSE. It is expressed in the same units as the target, which makes it easier to interpret, while still penalizing large errors.

- `MAPE()` – Mean Absolute Percentage Error.
  Average of the absolute errors relative to the true values, expressed as a percentage. It becomes very large when true values are close to zero, and is undefined when a true value is exactly zero.
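To make the four definitions concrete, here is a minimal pure-Python sketch of each metric computed over a whole batch. The function names are hypothetical, not River's API (River's metric classes are updated one sample at a time), and the MAPE is reported in percent, assuming River's convention of expressing it as a percentage.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Mean squared error: large errors dominate because they are squared."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    return math.sqrt(mse(y_true, y_pred))

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent.
    Raises ZeroDivisionError if any true value is exactly zero."""
    return 100 * sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0, 2, 8]
print(mae(y_true, y_pred))   # 0.5
print(mse(y_true, y_pred))   # 0.375
print(rmse(y_true, y_pred))  # about 0.612
print(mape(y_true, y_pred))  # about 32.74
```

Note how the single true value of -0.5 contributes a 100% error on its own to the MAPE, even though its absolute error is the same as for the true value 3: this is the sensitivity to near-zero targets described above.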

## Linear Regression

---
[Linear Regression](https://riverml.xyz/0.21.2/api/linear-model/LinearRegression/) is River's online linear regression model, trained incrementally with stochastic gradient descent.
Note that a linear model cannot use **categorical** features directly (they would first need to be encoded, e.g. one-hot), so only the **numerical** features listed in `numerical_features` are used here.
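The online update behind such a model can be sketched in a few lines. The class below is a hypothetical, minimal SGD learner on the squared loss that only mirrors River's `learn_one`/`predict_one` naming; it is not River's implementation. It does illustrate why the cell below passes a separate `intercept_lr`: the intercept gets its own step size, independent of the feature weights.

```python
class OnlineLinearRegression:
    """Hypothetical minimal online linear regressor (plain SGD on squared loss).
    Not River's implementation; it only mirrors the learn_one/predict_one naming."""

    def __init__(self, lr=0.01, intercept_lr=0.1):
        self.lr = lr                      # step size for the feature weights
        self.intercept_lr = intercept_lr  # separate step size for the intercept
        self.weights = {}
        self.intercept = 0.0

    def predict_one(self, x):
        # Dot product between the weight dict and the feature dict, plus intercept
        return self.intercept + sum(self.weights.get(k, 0.0) * v for k, v in x.items())

    def learn_one(self, x, y):
        # Gradient of 0.5 * (y_pred - y) ** 2 with respect to y_pred
        err = self.predict_one(x) - y
        for k, v in x.items():
            self.weights[k] = self.weights.get(k, 0.0) - self.lr * err * v
        self.intercept -= self.intercept_lr * err


# Usage: learn y = 2x + 1 from a repeated stream of noise-free samples
model = OnlineLinearRegression(lr=0.05, intercept_lr=0.1)
for _ in range(500):
    for xi in [0.0, 0.5, 1.0, 1.5, 2.0]:
        model.learn_one({'x': xi}, 2 * xi + 1)
print(model.predict_one({'x': 3.0}))  # close to 7.0
```

Because each step is proportional to the feature values, unscaled features with very different ranges make a single learning rate hard to choose, which is one reason the pipeline standardizes the inputs first.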

```python
from river.linear_model import LinearRegression

# Standardize the features, then fit the linear model online
model = (StandardScaler() |
         LinearRegression(intercept_lr=.1))
metrics = Metrics(metrics=[MAE(), MSE(), RMSE(), MAPE()])
# Only the numerical features are streamed to the linear model
stream = iter_pandas(X=data[numerical_features], y=data['cnt'])

progressive_val_score(dataset=stream,
                      model=model,
                      metric=metrics,
                      print_every=3000)
```

| Samples | MAE | MSE | RMSE | MAPE |
|--:|--:|--:|--:|--:|
| 3,000 | 288.279961 | 17,047,160.775399 | 4,128.820749 | 1,385.867405 |
| 6,000 | 185.032462 | 8,529,694.792679 | 2,920.564122 | 763.405372 |
| 9,000 | 148.078618 | 5,689,921.174497 | 2,385.355566 | 573.827743 |
| 12,000 | 139.00848 | 4,273,111.613931 | 2,067.150603 | 513.158487 |
| 15,000 | 137.200488 | 3,424,603.683628 | 1,850.568476 | 443.332994 |
| 17,379 | 134.258975 | 2,959,394.502725 | 1,720.289075 | 410.826825 |



## KNNRegressor
---

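[KNNRegressor](https://riverml.xyz/0.21.2/api/neighbors/KNNRegressor/) is the k-nearest-neighbours adaptation for regression tasks: it predicts by aggregating (by default, averaging) the targets of the *k* stored samples closest to the incoming one. Here the neighbour search uses the SWINN engine, an approximate, graph-based index over a sliding window of the most recent 1,000 samples, with the L1 (Manhattan) distance.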
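The idea of a sliding-window nearest-neighbour regressor can be sketched without any index structure. The class below is hypothetical: it does an exact brute-force scan over a bounded window, whereas SWINN approximates this search with a graph so that it stays fast on larger windows. The aggregation is a plain mean of the neighbours' targets.

```python
from collections import deque


class SlidingWindowKNNRegressor:
    """Hypothetical brute-force k-NN regressor over a sliding window.
    SWINN approximates the neighbour search; this sketch does an exact scan."""

    def __init__(self, n_neighbors=5, maxlen=1000, p=1):
        self.n_neighbors = n_neighbors
        self.p = p  # Minkowski order: p=1 is the L1 (Manhattan) distance
        self.window = deque(maxlen=maxlen)  # oldest samples are evicted automatically

    def _dist(self, a, b):
        # Minkowski distance over the union of feature names (missing -> 0)
        keys = set(a) | set(b)
        return sum(abs(a.get(k, 0.0) - b.get(k, 0.0)) ** self.p for k in keys) ** (1 / self.p)

    def learn_one(self, x, y):
        self.window.append((x, y))

    def predict_one(self, x):
        if not self.window:
            return 0.0
        nearest = sorted(self.window, key=lambda xy: self._dist(x, xy[0]))[: self.n_neighbors]
        return sum(y for _, y in nearest) / len(nearest)


# Usage: with maxlen=3 the first sample (x=0) is evicted from the window
model = SlidingWindowKNNRegressor(n_neighbors=2, maxlen=3)
for xi, yi in [(0.0, 0.0), (1.0, 10.0), (2.0, 20.0), (3.0, 30.0)]:
    model.learn_one({'x': xi}, yi)
print(model.predict_one({'x': 0.0}))  # mean of the targets at x=1 and x=2 -> 15.0
```

The window size trades memory and prediction cost against how much history the model can draw on; it also makes the model forget old behaviour, which can help when the data drifts.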

```python
import functools

from river import utils
from river.neighbors import KNNRegressor, SWINN

# L1 (Manhattan) distance: Minkowski distance with p=1
l1_dist = functools.partial(utils.math.minkowski_distance, p=1)
model = (StandardScaler() |
         KNNRegressor(n_neighbors=5,
                      # Approximate neighbour search over the last 1,000 samples
                      engine=SWINN(dist_func=l1_dist, seed=42, maxlen=1000)))
metrics = Metrics(metrics=[MAE(), MSE(), RMSE(), MAPE()])
stream = iter_pandas(X=data[features], y=data['cnt'])

progressive_val_score(dataset=stream,
                      model=model,
                      metric=metrics,
                      print_every=3000)
```

| Samples | MAE | MSE | RMSE | MAPE |
|--:|--:|--:|--:|--:|
| 3,000 | 44.902594 | 4,429.26204 | 66.552701 | 198.744291 |
| 6,000 | 57.192264 | 7,171.208933 | 84.682991 | 164.019858 |
| 9,000 | 59.318665 | 7,681.654907 | 87.645051 | 157.885905 |
| 12,000 | 67.674065 | 10,284.12618 | 101.410681 | 177.25251 |
| 15,000 | 74.087372 | 12,503.110003 | 111.817306 | 164.356208 |
| 17,379 | 77.316151 | 13,685.312653 | 116.984241 | 163.918046 |



## AMRules
---


[AMRules](https://riverml.xyz/0.21.2/api/rules/AMRules/) is a rule-based algorithm for incremental regression. Like Hoeffding Trees, AMRules relies on the Hoeffding bound to decide when to expand its rules, and it uses the variance-ratio heuristic to evaluate candidate splits. It also has capabilities not usually found in decision trees: each rule has its own drift detector (here `ADWIN`) and is removed when a drift is detected, and rules can flag incoming instances as anomalies and skip training on them.
As **prediction type**, we use *adaptive*, which dynamically selects, for each rule, between the target mean and a linear regression model, depending on which has been more accurate so far.
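The Hoeffding bound used by AMRules (and by the Hoeffding trees below) states that, with probability 1 - δ, the observed mean of n samples of a variable with range R lies within ε = sqrt(R² ln(1/δ) / (2n)) of its true mean. A learner can therefore commit to the best candidate split once its merit exceeds the runner-up's by more than ε. The sketch below is a simplified, hypothetical version of that decision: it assumes a merit in [0, 1] (so R = 1) and a tie-breaking threshold `tau` for near-equal candidates; real implementations compute the merits from their split statistics.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """With probability 1 - delta, the mean of n observations of a variable with
    range `value_range` lies within this epsilon of its true mean."""
    return math.sqrt(value_range ** 2 * math.log(1 / delta) / (2 * n))


def should_split(best_merit, second_merit, delta, n, value_range=1.0, tau=0.05):
    """Hypothetical split rule: split when the best candidate is confidently better
    than the runner-up, or when epsilon is so small that the two are effectively tied."""
    eps = hoeffding_bound(value_range, delta, n)
    return (best_merit - second_merit > eps) or (eps < tau)


# With delta=0.01 and 100 observations (the values used in the cell below)
print(hoeffding_bound(1.0, 0.01, 100))    # about 0.152
print(should_split(0.5, 0.3, 0.01, 100))  # gap 0.2 > epsilon -> True
print(should_split(0.5, 0.4, 0.01, 100))  # gap 0.1 < epsilon -> False
```

A smaller `delta` demands more evidence (a larger ε) before expanding, while a larger `n_min` means the bound is evaluated less often but on more data.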

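```python
from river.rules import AMRules

model = (StandardScaler() |
         AMRules(
             delta=0.01,              # Hoeffding-bound confidence for rule expansion
             n_min=100,               # weight a rule observes between expansion attempts
             drift_detector=ADWIN(),  # per-rule drift detector
             pred_type='adaptive'     # choose between target mean and linear model
         ))
metrics = Metrics(metrics=[MAE(), MSE(), RMSE(), MAPE()])
stream = iter_pandas(X=data[features], y=data['cnt'])

progressive_val_score(dataset=stream,
                      model=model,
                      metric=metrics,
                      print_every=3000)
```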

| Samples | MAE | MSE | RMSE | MAPE |
|--:|--:|--:|--:|--:|
| 3,000 | 47.483389 | 6,712.076797 | 81.927265 | 350.621094 |
| 6,000 | 61.898876 | 8,963.839316 | 94.677554 | 233.630399 |
| 9,000 | 71.896545 | 163,564.406279 | 404.430966 | 264.029194 |
| 12,000 | 78.210291 | 126,897.26972 | 356.22643 | 304.070931 |
| 15,000 | 83.945597 | 105,891.676256 | 325.410013 | 272.681425 |
| 17,379 | 90.048823 | 95,436.57975 | 308.928114 | 285.616572 |



## HoeffdingTreeRegressor
---
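[HoeffdingTreeRegressor](https://riverml.xyz/0.21.2/api/tree/HoeffdingTreeRegressor/) is the regression variant of the Hoeffding Tree (HT). As **leaf prediction**, we use *adaptive*, which dynamically selects, in each leaf, between the target mean and a linear regression model, depending on which has had the lower recent error; `model_selector_decay` controls how much weight past errors keep in that comparison. Features listed in `nominal_attributes` are treated as categorical.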
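The *adaptive* leaf strategy can be illustrated with a toy leaf. The class below is a hypothetical sketch, not River's code: it keeps an incremental mean and a one-feature linear model, scores both on each new sample before updating them, accumulates their squared errors with exponential decay (the role played by `model_selector_decay`), and predicts with whichever currently has the lower decayed error.

```python
class AdaptiveLeaf:
    """Hypothetical sketch of 'adaptive' leaf prediction: mean vs. linear model,
    chosen by exponentially decayed squared error."""

    def __init__(self, decay=0.9, lr=0.1):
        self.decay = decay  # closer to 1 -> past errors are remembered longer
        self.lr = lr
        self.n = 0
        self.mean = 0.0
        self.w = 0.0  # single-feature linear model, for brevity
        self.b = 0.0
        self.err_mean = 0.0
        self.err_model = 0.0

    def _model_pred(self, x):
        return self.w * x + self.b

    def predict_one(self, x):
        # Use the linear model only if it has been strictly more accurate
        if self.err_model < self.err_mean:
            return self._model_pred(x)
        return self.mean

    def learn_one(self, x, y):
        # Score both predictors on the new sample before updating them
        self.err_mean = self.decay * self.err_mean + (y - self.mean) ** 2
        self.err_model = self.decay * self.err_model + (y - self._model_pred(x)) ** 2
        # Incremental mean of the target
        self.n += 1
        self.mean += (y - self.mean) / self.n
        # One SGD step on the linear model (squared loss)
        err = self._model_pred(x) - y
        self.w -= self.lr * err * x
        self.b -= self.lr * err


# Usage: the target depends linearly on x, so the model should win over the mean
leaf = AdaptiveLeaf(decay=0.9, lr=0.1)
for _ in range(300):
    for x in (-1.0, 0.0, 1.0):
        leaf.learn_one(x, 3 * x)
print(leaf.predict_one(1.0))  # close to 3.0 (the mean alone would predict about 0)
```

On a leaf whose target is roughly constant, the mean would keep the lower error and be used instead, which is the point of letting each leaf choose.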

```python
from river.tree import HoeffdingTreeRegressor

model = (StandardScaler() |
         HoeffdingTreeRegressor(
             grace_period=100,            # samples a leaf observes between split attempts
             leaf_prediction='adaptive',  # mean vs. linear model, chosen per leaf
             model_selector_decay=0.9,    # decay of the errors used for that choice
             nominal_attributes=['season', 'yr', 'workingday', 'weathersit'],  # categorical
         ))
metrics = Metrics(metrics=[MAE(), MSE(), RMSE(), MAPE()])
stream = iter_pandas(X=data[features], y=data['cnt'])

progressive_val_score(dataset=stream,
                      model=model,
                      metric=metrics,
                      print_every=3000)
```

| Samples | MAE | MSE | RMSE | MAPE |
|--:|--:|--:|--:|--:|
| 3,000 | 37.4899 | 3,510.511037 | 59.249566 | 97.853457 |
| 6,000 | 50.185978 | 5,956.307979 | 77.177121 | 80.258187 |
| 9,000 | 51.661091 | 7,369.524825 | 85.845937 | 86.018372 |
| 12,000 | 59.220675 | 9,725.277491 | 98.616822 | 88.190962 |
| 15,000 | 66.988523 | 12,196.026776 | 110.435623 | 80.271243 |
| 17,379 | 68.881706 | 12,686.542083 | 112.634551 | 79.807926 |



## HoeffdingAdaptiveTreeRegressor
---
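[HoeffdingAdaptiveTreeRegressor](https://riverml.xyz/0.21.2/api/tree/HoeffdingAdaptiveTreeRegressor/) is the regression variant of the Hoeffding Adaptive Tree (HAT). It extends the Hoeffding Tree with drift detectors (ADWIN by default) in its nodes: when a drift is detected, an alternate subtree is grown, and it replaces the original branch if it becomes more accurate. As **leaf prediction**, we again use *adaptive*, which dynamically selects between the target mean and a linear regression model in each leaf.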
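To give an intuition for what a node-level drift detector does, here is a much-simplified check inspired by ADWIN. It is explicitly *not* ADWIN: ADWIN keeps an adaptively sized window and tests every possible split point, whereas this hypothetical function only compares the older and newer halves of a fixed window. It flags a drift when the two means differ by more than a Hoeffding-style threshold, assuming the monitored values (for example, normalized errors) lie in [0, 1].

```python
import math
from collections import deque


def mean_shift_detected(window, delta=0.002):
    """Hypothetical, simplified drift check: compare the means of the older and
    newer halves of the window against a Hoeffding-style threshold.
    Assumes values lie in [0, 1]."""
    n = len(window)
    if n < 2:
        return False
    values = list(window)
    old, new = values[: n // 2], values[n // 2:]
    n0, n1 = len(old), len(new)
    m = 1 / (1 / n0 + 1 / n1)  # harmonic mean of the two sub-window sizes
    eps = math.sqrt(math.log(4 / delta) / (2 * m))
    return abs(sum(old) / n0 - sum(new) / n1) > eps


# Usage: an abrupt change in the monitored error is flagged ...
window = deque(maxlen=100)
for v in [0.0] * 50 + [1.0] * 50:
    window.append(v)
print(mean_shift_detected(window))  # True

# ... while a noisy but stationary signal is not
stable = deque([0.0, 1.0] * 50, maxlen=100)
print(mean_shift_detected(stable))  # False
```

In a HAT-style tree, a detection like this at a node is what triggers growing the alternate subtree.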

```python
from river.tree import HoeffdingAdaptiveTreeRegressor

model = (StandardScaler() |
         HoeffdingAdaptiveTreeRegressor(
             grace_period=100,
             leaf_prediction='adaptive',
             model_selector_decay=0.9,
             seed=1,  # for reproducibility
             nominal_attributes=['season', 'yr', 'workingday', 'weathersit']
         ))
metrics = Metrics(metrics=[MAE(), MSE(), RMSE(), MAPE()])
stream = iter_pandas(X=data[features], y=data['cnt'])

progressive_val_score(dataset=stream,
                      model=model,
                      metric=metrics,
                      print_every=3000)
```

| Samples | MAE | MSE | RMSE | MAPE |
|--:|--:|--:|--:|--:|
| 3,000 | 35.412433 | 3,138.203416 | 56.01967 | 98.429186 |
| 6,000 | 46.440371 | 5,054.758403 | 71.096824 | 83.90049 |
| 9,000 | 48.891568 | 19,688.977358 | 140.317416 | 101.068058 |
| 12,000 | 58.22859 | 18,758.59558 | 136.962022 | 109.19483 |
| 15,000 | 65.51235 | 18,900.316196 | 137.478421 | 99.905188 |
| 17,379 | 67.944803 | 18,385.118067 | 135.591733 | 98.20099 |



## AdaptiveRandomForestRegressor
---
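[ARFRegressor](https://riverml.xyz/0.21.2/api/forest/ARFRegressor/) is the Adaptive Random Forest (ARF) adaptation for regression tasks: an ensemble of Hoeffding trees, each trained with online bagging, that considers a random subset of features at each split and replaces trees whose drift detectors signal a change. As **leaf prediction**, we again use *adaptive*, which dynamically selects between the target mean and a linear regression model in each leaf. Note that, unlike the other models, this run excludes `season` from the input features.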
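Two ingredients of ARF can be sketched with the standard library alone. Online bagging replaces bootstrap resampling by training each tree on every incoming sample k times, with k drawn from a Poisson distribution; ARF is, to our understanding, configured with λ = 6 by default (`lambda_value`). Local random subspaces mean each split attempt only considers about sqrt(m) of the m features. Both helper functions below are hypothetical illustrations, not River's internals.

```python
import math
import random


def poisson(lam, rng):
    """Draw a Poisson(lam) integer with Knuth's method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def split_candidates(features, rng):
    """Local random subspace: a split attempt considers about sqrt(m) features."""
    k = max(1, int(math.sqrt(len(features))))
    return rng.sample(features, k)


rng = random.Random(1)
# The 9 features streamed to ARF in the cell below (`season` is dropped)
features = ['yr', 'mnth', 'hr', 'workingday', 'weathersit', 'temp', 'atemp', 'hum', 'windspeed']
# Online bagging: how many times each of 10 trees learns from the current sample
weights = [poisson(6, rng) for _ in range(10)]
print(weights)
print(split_candidates(features, rng))  # 3 of the 9 features
```

Because the subset is re-drawn at every split, different nodes of the same tree can look at different features; this is the "local" subspace approach that SRP contrasts with.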

```python
from river.forest import ARFRegressor

model = (StandardScaler() |
         ARFRegressor(
             n_models=10,
             seed=1,
             model_selector_decay=0.9,
             # `season` is not part of this run's features (see the stream below)
             nominal_attributes=['yr', 'workingday', 'weathersit'],
             leaf_prediction='adaptive'
         ))
metrics = Metrics(metrics=[MAE(), MSE(), RMSE(), MAPE()])
stream = iter_pandas(X=data[features.drop('season')], y=data['cnt'])

progressive_val_score(dataset=stream,
                      model=model,
                      metric=metrics,
                      print_every=3000)
```

| Samples | MAE | MSE | RMSE | MAPE |
|--:|--:|--:|--:|--:|
| 3,000 | 40.669926 | 4,112.69312 | 64.130282 | 163.430288 |
| 6,000 | 53.736399 | 6,387.935346 | 79.92456 | 133.926258 |
| 9,000 | 55.958774 | 16,622.703154 | 128.929062 | 133.725333 |
| 12,000 | 62.050059 | 15,995.976187 | 126.4752 | 142.99385 |
| 15,000 | 68.20507 | 16,401.697991 | 128.069114 | 131.863318 |
| 17,379 | 70.35022 | 16,335.029803 | 127.808567 | 131.62528 |



## SRPRegressor
---
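[SRPRegressor](https://riverml.xyz/0.21.2/api/ensemble/SRPRegressor/) is the Streaming Random Patches (SRP) ensemble for regression. Each base model (here a `HoeffdingTreeRegressor`) is trained on a random subset of the features, chosen once per model (global random subspaces), combined with online bagging of the instances; warning and drift detectors trigger the replacement of base models. As **leaf prediction** of the base trees, we use *adaptive*, which dynamically selects between the target mean and a linear regression model in each leaf.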
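The difference from ARF's per-split sampling is that SRP draws each base model's feature subset once and keeps it. The sketch below illustrates that global subspace assignment under stated assumptions: the helper names are hypothetical, the subset size is taken as a fraction of all features (we assume a default of 0.6, i.e. River's `subspace_size`), and the Poisson-weighted online bagging that SRP applies on top is left out.

```python
import random


def global_subspaces(features, n_models, subspace_size, rng):
    """Give every base model one fixed random feature subset, drawn once up front
    (global subspaces), instead of re-sampling at every split."""
    k = max(1, round(subspace_size * len(features)))
    return [sorted(rng.sample(features, k)) for _ in range(n_models)]


def project(x, subspace):
    """Keep only the features a given base model is allowed to see."""
    return {f: x[f] for f in subspace if f in x}


rng = random.Random(1)
features = ['season', 'yr', 'mnth', 'hr', 'workingday',
            'weathersit', 'temp', 'atemp', 'hum', 'windspeed']
subspaces = global_subspaces(features, n_models=10, subspace_size=0.6, rng=rng)
x = {f: 0.0 for f in features}  # placeholder sample
print(project(x, subspaces[0]))  # the 6 features model 0 always sees
```

Since every tree always sees the same reduced input, the trees are more diverse than trees that only differ in their bagging weights, which is the motivation behind this design.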

```python
from river.ensemble import SRPRegressor
from river.tree import HoeffdingTreeRegressor

model = (StandardScaler() |
         SRPRegressor(
             n_models=10,
             seed=1,
             drift_detector=ADWIN(delta=0.001),   # signals that a base model must be replaced
             warning_detector=ADWIN(delta=0.01),  # starts training a background model
             model=HoeffdingTreeRegressor(        # base learner, same settings as above
                 grace_period=100,
                 leaf_prediction='adaptive',
                 model_selector_decay=0.9,
                 nominal_attributes=['season', 'yr', 'workingday', 'weathersit']
             )
         ))
metrics = Metrics(metrics=[MAE(), MSE(), RMSE(), MAPE()])
stream = iter_pandas(X=data[features], y=data['cnt'])

progressive_val_score(dataset=stream,
                      model=model,
                      metric=metrics,
                      print_every=3000)
```

| Samples | MAE | MSE | RMSE | MAPE |
|--:|--:|--:|--:|--:|
| 3,000 | 33.8939 | 2,725.414766 | 52.205505 | 149.401463 |
| 6,000 | 45.338757 | 4,598.222117 | 67.810192 | 132.176415 |
| 9,000 | 49.537383 | 5,335.916063 | 73.047355 | 141.604269 |
| 12,000 | 55.38207 | 6,845.138537 | 82.735352 | 154.179523 |
| 15,000 | 62.50347 | 8,750.134056 | 93.542151 | 149.48341 |
| 17,379 | 64.21346 | 9,284.258881 | 96.354859 | 145.466495 |



## Summary
---
Final progressive validation scores after all 17,379 samples:

| Model | MAE | MSE | RMSE | MAPE |
|:--|--:|--:|--:|--:|
| Linear Regression | 134.258975 | 2,959,394.502725 | 1,720.289075 | 410.826825 |
| KNN Regressor | 77.316151 | 13,685.312653 | 116.984241 | 163.918046 |
| AMRules | 90.048823 | 95,436.57975 | 308.928114 | 285.616572 |
| Hoeffding Tree Regressor | 68.881706 | 12,686.542083 | 112.634551 | 79.807926 |
| Hoeffding Adaptive Tree Regressor | 67.944803 | 18,385.118067 | 135.591733 | 98.20099 |
| Adaptive Random Forest Regressor | 70.35022 | 16,335.029803 | 127.808567 | 131.62528 |
| Streaming Random Patches Regressor | 64.21346 | 9,284.258881 | 96.354859 | 145.466495 |

**Note:** SRPRegressor, which uses global random subspaces, achieves the lowest MAE, MSE and RMSE of all the models (the Hoeffding Tree Regressor has the lowest MAPE), and it takes less time to execute than ARFRegressor.