# Stream Regression
---

## `BikeSharing` dataset

**Description:** This dataset contains the hourly count of rental bikes in the Capital Bikeshare system for the years 2011 and 2012, together with the corresponding weather and seasonal information. The task is to predict the number of bikes that will be rented.

**Features:** 10
 
|       Attribute      | Description                  |
|:--------------------:|:-----------------------------|
| `season`             | Season (1: winter, 2: spring, 3: summer, 4: fall) |
| `yr`                 | Year (0: 2011, 1: 2012) |
| `mnth`               | Month (1 to 12) |
| `hr`                 | Hour (0 to 23) |
| `workingday`         | 1 if the day is neither a weekend nor a holiday, otherwise 0 |
| `weathersit`         | 1) Clear, Few clouds, Partly cloudy; 2) Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist; 3) Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds; 4) Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog |
| `temp`               | Normalized temperature (measured in Celsius). The values are derived via (t-t_min)/(t_max-t_min), t_min=-8, t_max=+39 |
| `atemp`              | Normalized feeling temperature (measured in Celsius). The values are derived via (t-t_min)/(t_max-t_min), t_min=-16, t_max=+50 |
| `hum`                | Normalized humidity. The values are divided by 100 (max) |
| `windspeed`          | Normalized wind speed. The values are divided by 67 (max) |


**Target:** `cnt` (count of total rental bikes)
 
**Samples:** 17,379
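
The min-max scaling described for `temp` and `atemp` can be inverted to recover approximate values in degrees Celsius. The helper below is a minimal sketch, assuming the `t_min`/`t_max` constants listed in the attribute table; the function and constant names are illustrative and belong neither to the dataset nor to river.

```python
def normalize(t, t_min, t_max):
    """Min-max scale a raw value into [0, 1]."""
    return (t - t_min) / (t_max - t_min)


def denormalize(value, t_min, t_max):
    """Invert the min-max scaling to recover the raw value."""
    return value * (t_max - t_min) + t_min


# Constants taken from the attribute table (hypothetical names).
TEMP_RANGE = (-8, 39)
ATEMP_RANGE = (-16, 50)

temp_celsius = denormalize(0.5, *TEMP_RANGE)    # a normalized `temp` of 0.5
atemp_celsius = denormalize(0.5, *ATEMP_RANGE)  # a normalized `atemp` of 0.5
```

The models in this notebook consume the normalized values directly; the inversion is only useful for interpreting individual rows.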


In [12]:
import pandas as pd
from river.stream import iter_pandas
from river.metrics.base import Metrics
from river.metrics import MAE,MAPE,MSE,RMSE,base
from river.evaluate import progressive_val_score
from river.preprocessing import StandardScaler
from river.drift import ADWIN
import numpy as np

In [4]:
data = pd.read_csv("../datasets/BikeSharing.csv")
features = data.columns[:-1]
numerical_features = ['mnth','hr','weathersit','temp','atemp','hum','windspeed']

## Linear Regression

---
[Linear Regression](https://riverml.xyz/0.21.2/api/linear-model/LinearRegression/) is a simple linear model that is updated incrementally, one sample at a time.
Note that a Linear Regression model cannot handle **categorical** features directly, so only the **numerical** ones are used here.
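
To make the idea of an incrementally trained linear model concrete, here is a minimal pure-Python sketch of stochastic gradient descent on the squared loss with a separate learning rate for the intercept. It is not river's implementation (river uses its own optimizers and loss objects); the class name and defaults are assumptions chosen for illustration.

```python
class OnlineLinearRegression:
    """Plain SGD on the squared loss, one sample at a time (illustrative only)."""

    def __init__(self, lr=0.1, intercept_lr=0.1):
        self.lr = lr
        self.intercept_lr = intercept_lr
        self.weights = {}
        self.intercept = 0.0

    def predict_one(self, x):
        return self.intercept + sum(self.weights.get(k, 0.0) * v for k, v in x.items())

    def learn_one(self, x, y):
        # The error is computed once, before any parameter is changed.
        error = self.predict_one(x) - y
        for k, v in x.items():
            self.weights[k] = self.weights.get(k, 0.0) - self.lr * error * v
        self.intercept -= self.intercept_lr * error


# Toy stream following y = 2x + 1 on already-standardized inputs.
model = OnlineLinearRegression()
for _ in range(100):
    for value in (-1.0, 0.0, 1.0):
        model.learn_one({"x": value}, 2.0 * value + 1.0)
```

On this noise-free toy stream the weight approaches 2 and the intercept approaches 1. Real streams with unscaled or noisy features can diverge for the same learning rates, which is one reason the pipeline puts a `StandardScaler` in front of the model.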

In [13]:
from river.linear_model import LinearRegression

model = (StandardScaler() |
        LinearRegression(intercept_lr=.1))
metrics = Metrics(metrics=[MAE(),MSE(),RMSE(),MAPE()])
stream = iter_pandas(X=data[numerical_features], y=data['cnt'])

progressive_val_score(dataset=stream,
                      model=model,
                      metric=metrics,
                      print_every=1000)

[1,000] MAE: 735.343523
MSE: 51,125,088.341664
RMSE: 7,150.181001
MAPE: 3,546.289952
[2,000] MAE: 400.344274
MSE: 25,566,803.941949
RMSE: 5,056.362719
MAPE: 1,991.906978
[3,000] MAE: 288.279961
MSE: 17,047,160.775399
RMSE: 4,128.820749
MAPE: 1,385.867405
[4,000] MAE: 236.423313
MSE: 12,788,332.199829
RMSE: 3,576.077768
MAPE: 1,079.296069
[5,000] MAE: 205.389904
MSE: 10,233,139.485564
RMSE: 3,198.927865
MAPE: 888.089114
[6,000] MAE: 185.032462
MSE: 8,529,694.792679
RMSE: 2,920.564122
MAPE: 763.405372
[7,000] MAE: 170.503776
MSE: 7,312,972.429118
RMSE: 2,704.250807
MAPE: 679.262035
[8,000] MAE: 158.479504
MSE: 6,400,147.288562
RMSE: 2,529.851238
MAPE: 618.194916
[9,000] MAE: 148.078618
MSE: 5,689,921.174497
RMSE: 2,385.355566
MAPE: 573.827743
[10,000] MAE: 141.696339
MSE: 5,122,247.95603
RMSE: 2,263.238378
MAPE: 550.890392
[11,000] MAE: 140.222639
MSE: 4,659,041.05356
RMSE: 2,158.481191
MAPE: 533.852722
[12,000] MAE: 139.00848
MSE: 4,273,111.613931
RMSE: 2,067.150603
MAPE: 513.158487
[13

MAE: 134.258975
MSE: 2,959,394.502725
RMSE: 1,720.289075
MAPE: 410.826825
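
The scores above come from progressive validation: each sample is first used to test the model and only afterwards to train it, so every prediction is made on unseen data. The following is a minimal sketch of that protocol with running MAE, MSE, RMSE, and MAPE; the `RunningMean` baseline and the function name are hypothetical, and skipping zero targets in MAPE is a simplification that may differ from river's handling.

```python
import math


class RunningMean:
    """Baseline regressor that predicts the mean of the targets seen so far."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def predict_one(self, x):
        return self.mean

    def learn_one(self, x, y):
        self.n += 1
        self.mean += (y - self.mean) / self.n


def progressive_validation(stream, model):
    """Predict first, score the prediction, then learn from the sample."""
    n = abs_sum = sq_sum = pct_sum = 0.0
    for x, y in stream:
        y_pred = model.predict_one(x)  # 1. test on the unseen sample
        err = y - y_pred
        n += 1
        abs_sum += abs(err)
        sq_sum += err * err
        if y != 0:  # simplification: skip zero targets to avoid dividing by zero
            pct_sum += abs(err / y)
        model.learn_one(x, y)          # 2. only then train on it
    mse = sq_sum / n
    return {"MAE": abs_sum / n, "MSE": mse, "RMSE": math.sqrt(mse), "MAPE": 100 * pct_sum / n}


scores = progressive_validation([({}, 2.0), ({}, 4.0), ({}, 6.0)], RunningMean())
```

Because MAPE divides each error by the true value, it can exceed 100% whenever the error is larger than the target, which is easy to hit for hours with very few rentals.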

## KNNRegressor
---

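[KNNRegressor](https://riverml.xyz/0.21.2/api/neighbors/KNNRegressor/) is the k-nearest neighbors (KNN) adaptation for regression tasks. Here it uses the `SWINN` search engine with a Manhattan (L1) distance and a buffer of at most 1,000 samples.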
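
As an illustration of the idea, the sketch below implements a brute-force nearest-neighbor regressor over a sliding window with the L1 distance. It is not river's `SWINN` engine, which uses a graph-based search; the class name is hypothetical, and averaging the neighbors' targets is an assumption about the aggregation.

```python
from collections import deque


def manhattan(a, b):
    """L1 distance between two feature dicts (missing keys count as 0)."""
    return sum(abs(a.get(k, 0.0) - b.get(k, 0.0)) for k in set(a) | set(b))


class WindowKNNRegressor:
    """Brute-force KNN over the most recent `maxlen` samples (illustrative only)."""

    def __init__(self, n_neighbors=5, maxlen=1000):
        self.n_neighbors = n_neighbors
        self.window = deque(maxlen=maxlen)  # oldest samples are evicted automatically

    def learn_one(self, x, y):
        self.window.append((x, y))

    def predict_one(self, x):
        if not self.window:
            return 0.0
        nearest = sorted(self.window, key=lambda item: manhattan(x, item[0]))
        targets = [y for _, y in nearest[: self.n_neighbors]]
        return sum(targets) / len(targets)


knn = WindowKNNRegressor(n_neighbors=2, maxlen=3)
for value, target in [(0.0, 0.0), (1.0, 10.0), (10.0, 100.0)]:
    knn.learn_one({"a": value}, target)
before = knn.predict_one({"a": 0.4})  # neighbors at 0.0 and 1.0

knn.learn_one({"a": 0.5}, 50.0)       # evicts the oldest sample (a=0.0)
after = knn.predict_one({"a": 0.4})   # neighbors at 0.5 and 1.0
```

The bounded window is what lets the model forget old behavior: once a sample leaves the buffer it no longer influences predictions.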

In [18]:
from river.neighbors import KNNRegressor,SWINN
import functools
from river import utils

l1_dist = functools.partial(utils.math.minkowski_distance, p=1)
model = (StandardScaler() |
        KNNRegressor(n_neighbors=5, engine=SWINN(dist_func=l1_dist,seed=42,maxlen=1000)))
metrics = Metrics(metrics=[MAE(),MSE(),RMSE(),MAPE()])
stream = iter_pandas(X=data[features], y=data['cnt'])

progressive_val_score(dataset=stream,
                      model=model,
                      metric=metrics,
                      print_every=1000)

[1,000] MAE: 30.015
MSE: 1,799.621
RMSE: 42.42194
MAPE: 200.218581
[2,000] MAE: 36.341
MSE: 2,717.981
RMSE: 52.134259
MAPE: 210.661063
[3,000] MAE: 44.844667
MSE: 4,435.529333
RMSE: 66.59977
MAPE: 196.138964
[4,000] MAE: 50.5755
MSE: 5,701.9105
RMSE: 75.510996
MAPE: 187.548853
[5,000] MAE: 54.667
MSE: 6,633.2058
RMSE: 81.444495
MAPE: 171.011353
[6,000] MAE: 57.1395
MSE: 7,176.460167
RMSE: 84.71399
MAPE: 162.237467
[7,000] MAE: 58.837857
MSE: 7,595.294714
RMSE: 87.150988
MAPE: 157.402088
[8,000] MAE: 59.18625
MSE: 7,660.1565
RMSE: 87.52232
MAPE: 153.581202
[9,000] MAE: 59.260667
MSE: 7,685.615778
RMSE: 87.667644
MAPE: 156.196917
[10,000] MAE: 60.7365
MSE: 8,070.4229
RMSE: 89.835533
MAPE: 165.202612
[11,000] MAE: 64.109727
MSE: 9,139.849727
RMSE: 95.602561
MAPE: 170.128652
[12,000] MAE: 67.61275
MSE: 10,286.573583
RMSE: 101.422747
MAPE: 175.582461
[13,000] MAE: 70.067538
MSE: 11,161.438923
RMSE: 105.647711
MAPE: 170.424833
[14,000] MAE: 71.931429
MSE: 11,756.521
RMSE: 108.427492
MAPE: 16

MAE: 77.254675
MSE: 13,688.211232
RMSE: 116.996629
MAPE: 162.493998

## AMRules
---


[AMRules](https://riverml.xyz/0.21.2/api/rules/AMRules/) is a rule-based algorithm for incremental regression tasks. AMRules relies on the Hoeffding bound to build its rule set, similarly to Hoeffding Trees. The Variance-Ratio heuristic is used to evaluate rules' splits. Moreover, this rule-based regressor has additional capabilities not usually found in decision trees.
As the **prediction type**, we use *adaptive*, which dynamically selects between "mean" and "Linear Regression" for each incoming example.
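
The Hoeffding bound mentioned above states that, with probability 1 - delta, the observed mean of n samples of a variable with range R lies within epsilon = sqrt(R^2 ln(1/delta) / (2n)) of its true mean. The sketch below computes that bound and applies a generic "best candidate beats the runner-up by more than epsilon" check. The check is a simplified Hoeffding-style criterion, not the exact rule-expansion test used by AMRules, and the function names are hypothetical.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Hoeffding bound: sqrt(R^2 * ln(1 / delta) / (2 * n))."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def is_confident(best_merit, second_merit, delta, n, value_range=1.0):
    """True if the gap between the two best candidates exceeds the bound."""
    return (best_merit - second_merit) > hoeffding_bound(value_range, delta, n)


# With the settings used in this notebook (delta=0.01, n_min=100):
eps_100 = hoeffding_bound(1.0, delta=0.01, n=100)
eps_1000 = hoeffding_bound(1.0, delta=0.01, n=1000)

clear_gap = is_confident(0.9, 0.5, delta=0.01, n=100)
narrow_gap = is_confident(0.9, 0.8, delta=0.01, n=100)
```

The bound shrinks as more samples arrive, so small differences between candidate splits become decidable only after enough data has been observed.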

In [19]:
from river.rules import AMRules

model = (StandardScaler() |
        AMRules(
            delta=0.01,
            n_min=100,
            drift_detector=ADWIN(),
            pred_type='adaptive'
        ))
metrics = Metrics(metrics=[MAE(),MSE(),RMSE(),MAPE()])
stream = iter_pandas(X=data[features], y=data['cnt'])

progressive_val_score(dataset=stream,
                      model=model,
                      metric=metrics,
                      print_every=1000)

[1,000] MAE: 34.073961
MSE: 2,333.182324
RMSE: 48.303026
MAPE: 347.575514
[2,000] MAE: 41.58626
MSE: 6,564.024595
RMSE: 81.018668
MAPE: 433.914491
[3,000] MAE: 47.483389
MSE: 6,712.076797
RMSE: 81.927265
MAPE: 350.621094
[4,000] MAE: 54.68614
MSE: 7,855.283332
RMSE: 88.630036
MAPE: 291.658369
[5,000] MAE: 61.305879
MSE: 8,940.030828
RMSE: 94.551736
MAPE: 276.00514
[6,000] MAE: 63.500615
MSE: 9,267.999252
RMSE: 96.270448
MAPE: 248.769111
[7,000] MAE: 64.973929
MSE: 9,452.150493
RMSE: 97.222171
MAPE: 228.719888
[8,000] MAE: 65.384508
MSE: 9,340.362874
RMSE: 96.645553
MAPE: 221.772122
[9,000] MAE: 93.134097
MSE: 800,663.323923
RMSE: 894.797924
MAPE: 493.052955
[10,000] MAE: 93.885754
MSE: 722,116.80404
RMSE: 849.774561
MAPE: 525.908911
[11,000] MAE: 96.666083
MSE: 658,684.497159
RMSE: 811.593801
MAPE: 538.760464
[12,000] MAE: 97.908436
MSE: 605,577.260946
RMSE: 778.188448
MAPE: 523.914997
[13,000] MAE: 98.849456
MSE: 560,688.987768
RMSE: 748.791685
MAPE: 497.128307
[14,000] MAE: 99.177562

MAE: 98.716742
MSE: 424,409.866798
RMSE: 651.467472
MAPE: 405.74488

## HoeffdingTreeRegressor
---
[HoeffdingTreeRegressor](https://riverml.xyz/0.21.2/api/tree/HoeffdingTreeRegressor/) is the Hoeffding Tree (HT) adaptation for regression tasks. For **leaf prediction**, we use *adaptive*, which dynamically selects between "mean" and "Linear Regression" for each incoming example.
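
One way to picture the *adaptive* choice is a selector that keeps an exponentially faded squared error for each of the two predictors and uses whichever is currently lower. The sketch below assumes that mechanism and treats `model_selector_decay` as the fading factor; river's internal implementation may differ in detail, and the class name is hypothetical.

```python
class FadedErrorSelector:
    """Track exponentially faded squared errors of two competing predictors."""

    def __init__(self, decay=0.9):
        self.decay = decay
        self.errors = {"mean": 0.0, "model": 0.0}

    def update(self, y_true, predictions):
        """`predictions` maps a predictor name to its prediction for this sample."""
        for name, y_pred in predictions.items():
            self.errors[name] = self.decay * self.errors[name] + (y_true - y_pred) ** 2

    def best(self):
        """Name of the predictor with the lowest faded error (ties go to 'mean')."""
        return min(self.errors, key=self.errors.get)


selector = FadedErrorSelector(decay=0.9)
selector.update(10.0, {"mean": 9.0, "model": 5.0})   # the mean is closer
first_choice = selector.best()
selector.update(10.0, {"mean": 0.0, "model": 10.0})  # the linear model is closer
second_choice = selector.best()
```

A decay below 1 makes old errors fade, so the selector can switch predictors when the stream changes.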

In [20]:
from river.tree import HoeffdingTreeRegressor

model = (StandardScaler() |
        HoeffdingTreeRegressor(
            grace_period=100,
            leaf_prediction='adaptive',
            model_selector_decay=0.9,            
            nominal_attributes=['season','yr','workingday','weathersit'],           
        ))
metrics = Metrics(metrics=[MAE(),MSE(),RMSE(),MAPE()])
stream = iter_pandas(X=data[features], y=data['cnt'])

progressive_val_score(dataset=stream, 
                      model=model, 
                      metric=metrics, 
                      print_every=1000)

[1,000] MAE: 26.758816
MSE: 1,516.643752
RMSE: 38.944111
MAPE: 135.528001
[2,000] MAE: 30.577758
MSE: 2,202.844686
RMSE: 46.934472
MAPE: 113.675784
[3,000] MAE: 37.4899
MSE: 3,510.511037
RMSE: 59.249566
MAPE: 97.853457
[4,000] MAE: 44.279857
MSE: 4,873.181758
RMSE: 69.808178
MAPE: 89.323575
[5,000] MAE: 48.465732
MSE: 5,685.60826
RMSE: 75.402972
MAPE: 82.797089
[6,000] MAE: 50.185978
MSE: 5,956.307979
RMSE: 77.177121
MAPE: 80.258187
[7,000] MAE: 50.950212
MSE: 6,033.007028
RMSE: 77.672434
MAPE: 77.423233
[8,000] MAE: 50.544324
MSE: 5,889.136255
RMSE: 76.740708
MAPE: 76.160884
[9,000] MAE: 51.661091
MSE: 7,369.524825
RMSE: 85.845937
MAPE: 86.018372
[10,000] MAE: 52.456146
MSE: 7,439.15977
RMSE: 86.250564
MAPE: 90.403291
[11,000] MAE: 55.857505
MSE: 8,606.302065
RMSE: 92.770157
MAPE: 89.909142
[12,000] MAE: 59.220675
MSE: 9,725.277491
RMSE: 98.616822
MAPE: 88.190962
[13,000] MAE: 62.274777
MSE: 10,648.576395
RMSE: 103.19194
MAPE: 85.115081
[14,000] MAE: 64.601209
MSE: 11,300.027786
RMSE:

MAE: 68.881706
MSE: 12,686.542083
RMSE: 112.634551
MAPE: 79.807926

## HoeffdingAdaptiveTreeRegressor
---
[HoeffdingAdaptiveTreeRegressor](https://riverml.xyz/0.21.2/api/tree/HoeffdingAdaptiveTreeRegressor/) is the Hoeffding Adaptive Tree (HAT) adaptation for regression tasks. For **leaf prediction**, we use *adaptive*, which dynamically selects between "mean" and "Linear Regression" for each incoming example.
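
What distinguishes an adaptive tree from a plain Hoeffding Tree is that it monitors its branches with a drift detector and replaces them when their error changes. The sketch below shows the core idea behind an ADWIN-style detector in a deliberately simplified form: it checks every split of a window of values in [0, 1] and drops the older part when the two sub-window means differ by more than a Hoeffding-style threshold. It is not river's `ADWIN` (which uses compressed buckets), and the class name is hypothetical.

```python
import math


class SimpleAdwin:
    """Exhaustive-cut variant of the ADWIN idea for values in [0, 1]."""

    def __init__(self, delta=0.002):
        self.delta = delta
        self.window = []

    def update(self, value):
        """Add a value; return True if a change was detected and old data dropped."""
        self.window.append(value)
        drift = False
        shrunk = True
        while shrunk and len(self.window) > 1:
            shrunk = False
            n = len(self.window)
            total = sum(self.window)
            head_sum = 0.0
            for split in range(1, n):
                head_sum += self.window[split - 1]
                n0, n1 = split, n - split
                mean0 = head_sum / n0
                mean1 = (total - head_sum) / n1
                m = 1.0 / (1.0 / n0 + 1.0 / n1)  # harmonic-style size term
                eps = math.sqrt(math.log(4.0 * n / self.delta) / (2.0 * m))
                if abs(mean0 - mean1) > eps:
                    self.window = self.window[split:]  # forget the older part
                    drift = shrunk = True
                    break
        return drift


detector = SimpleAdwin(delta=0.002)
stable_flags = [detector.update(0.0) for _ in range(200)]  # stationary phase
shift_flags = [detector.update(1.0) for _ in range(20)]    # abrupt change
```

During the stationary phase no change is flagged; shortly after the values jump, the detector fires and the window shrinks to recent data only.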

In [21]:
from river.tree import HoeffdingAdaptiveTreeRegressor

model = (StandardScaler() |
        HoeffdingAdaptiveTreeRegressor(
            grace_period=100,
            leaf_prediction='adaptive',
            model_selector_decay=0.9,
            seed=1,            
            nominal_attributes=['season','yr','workingday','weathersit']
        ))
metrics = Metrics(metrics=[MAE(),MSE(),RMSE(),MAPE()])
stream = iter_pandas(X=data[features], y=data['cnt'])

progressive_val_score(dataset=stream, 
                      model=model, 
                      metric=metrics, 
                      print_every=1000)

[1,000] MAE: 25.223692
MSE: 1,404.530824
RMSE: 37.477071
MAPE: 127.13104
[2,000] MAE: 27.801114
MSE: 1,825.676437
RMSE: 42.727935
MAPE: 107.728053
[3,000] MAE: 35.085616
MSE: 3,137.103959
RMSE: 56.009856
MAPE: 100.4674
[4,000] MAE: 44.557521
MSE: 4,939.765456
RMSE: 70.283465
MAPE: 101.588627
[5,000] MAE: 49.630134
MSE: 5,862.779579
RMSE: 76.56879
MAPE: 94.255019
[6,000] MAE: 51.345082
MSE: 6,155.373118
RMSE: 78.456186
MAPE: 92.391688
[7,000] MAE: 54.094229
MSE: 6,786.165622
RMSE: 82.378187
MAPE: 90.476854
[8,000] MAE: 54.415821
MSE: 6,800.147578
RMSE: 82.463007
MAPE: 90.109306
[9,000] MAE: 54.886511
MSE: 9,254.114518
RMSE: 96.198308
MAPE: 99.692771
[10,000] MAE: 55.606362
MSE: 9,157.607741
RMSE: 95.69539
MAPE: 103.773302
[11,000] MAE: 58.446783
MSE: 10,076.936576
RMSE: 100.383946
MAPE: 101.309362
[12,000] MAE: 60.210008
MSE: 10,431.392204
RMSE: 102.134187
MAPE: 98.338156
[13,000] MAE: 61.145745
MSE: 10,539.498742
RMSE: 102.662061
MAPE: 94.260906
[14,000] MAE: 62.159938
MSE: 10,730.9604

MAE: 63.892874
MSE: 10,926.391394
RMSE: 104.529381
MAPE: 85.230238

## AdaptiveRandomForestRegressor
---
[ARFRegressor](https://riverml.xyz/0.21.2/api/forest/ARFRegressor/) is the Adaptive Random Forest (ARF) adaptation for regression tasks. For **leaf prediction**, we use *adaptive*, which dynamically selects between "mean" and "Linear Regression" for each incoming example.
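
Streaming forests cannot bootstrap-sample a dataset they never hold in memory, so they typically approximate bagging by giving each sample a Poisson-distributed weight per ensemble member. The sketch below illustrates that resampling step with trivially simple members (weighted running means). It assumes a Poisson rate of 6, which is believed to match river's default `lambda_value` but should be checked against the linked documentation; the class and function names are hypothetical.

```python
import math
import random


def poisson(lam, rng):
    """Draw one Poisson(lam) sample using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


class BaggedMeans:
    """Online bagging over weighted running means (illustrative only)."""

    def __init__(self, n_models=10, lam=6, seed=1):
        self.rng = random.Random(seed)
        self.lam = lam
        self.members = [[0.0, 0.0] for _ in range(n_models)]  # [weighted sum, total weight]

    def learn_one(self, y):
        for member in self.members:
            weight = poisson(self.lam, self.rng)  # how often this member "sees" the sample
            member[0] += weight * y
            member[1] += weight

    def predict_one(self):
        preds = [s / w for s, w in self.members if w > 0]
        return sum(preds) / len(preds) if preds else 0.0


rng = random.Random(42)
draws = [poisson(6, rng) for _ in range(10_000)]
draw_mean = sum(draws) / len(draws)

ensemble = BaggedMeans()
for _ in range(20):
    ensemble.learn_one(5.0)
ensemble_prediction = ensemble.predict_one()
```

In a real forest each member is a tree that also restricts itself to a random subset of features at every split, which this sketch leaves out.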

In [23]:
from river.forest import ARFRegressor

model = (StandardScaler() |
        ARFRegressor(
            n_models=10,
            seed=1,
            model_selector_decay=0.9,           
            nominal_attributes=['season','yr','workingday','weathersit'],
            leaf_prediction='adaptive'
        ))
metrics = Metrics(metrics=[MAE(),MSE(),RMSE(),MAPE()])
stream = iter_pandas(X=data[features.drop('season')], y=data['cnt'])

progressive_val_score(dataset=stream, 
                      model=model, 
                      metric=metrics, 
                      print_every=1000)

[1,000] MAE: 27.203403
MSE: 1,658.751753
RMSE: 40.727776
MAPE: 202.242978
[2,000] MAE: 33.224231
MSE: 2,518.978681
RMSE: 50.189428
MAPE: 192.828196
[3,000] MAE: 41.608445
MSE: 4,373.026693
RMSE: 66.128864
MAPE: 164.867927
[4,000] MAE: 48.557059
MSE: 5,575.085451
RMSE: 74.666495
MAPE: 150.639754
[5,000] MAE: 52.522809
MSE: 6,100.895022
RMSE: 78.108226
MAPE: 139.69539
[6,000] MAE: 54.484408
MSE: 6,470.880459
RMSE: 80.441783
MAPE: 134.076615
[7,000] MAE: 55.210988
MSE: 6,669.23035
RMSE: 81.665356
MAPE: 127.799098
[8,000] MAE: 54.726657
MSE: 6,526.012694
RMSE: 80.78374
MAPE: 123.882905
[9,000] MAE: 55.666096
MSE: 7,507.265628
RMSE: 86.644478
MAPE: 129.812128
[10,000] MAE: 56.766551
MSE: 7,667.998901
RMSE: 87.567111
MAPE: 132.845716
[11,000] MAE: 59.034461
MSE: 8,292.152854
RMSE: 91.061259
MAPE: 131.192271
[12,000] MAE: 61.016697
MSE: 8,790.042341
RMSE: 93.755226
MAPE: 127.570557
[13,000] MAE: 62.891046
MSE: 9,243.507623
RMSE: 96.143162
MAPE: 122.359976
[14,000] MAE: 64.884081
MSE: 9,773.43

MAE: 68.643015
MSE: 10,896.113678
RMSE: 104.384451
MAPE: 116.025267

## SRPRegressor
---
[SRPRegressor](https://riverml.xyz/0.21.2/api/ensemble/SRPRegressor/) is the Streaming Random Patches (SRP) adaptation for regression tasks. For the **leaf prediction** of its base `HoeffdingTreeRegressor`, we use *adaptive*, which dynamically selects between "mean" and "Linear Regression" for each incoming example.
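
The random-subspace part of the method can be sketched as follows: each ensemble member is assigned a fixed random subset of the features and only ever sees those. This is a minimal illustration, not river's implementation; the function names and the subspace fraction of 0.6 are assumptions, and the feature list simply repeats the attribute table.

```python
import random

FEATURES = ["season", "yr", "mnth", "hr", "workingday",
            "weathersit", "temp", "atemp", "hum", "windspeed"]


def random_subspaces(features, n_models=10, subspace_size=0.6, seed=1):
    """Pick one fixed random feature subset per ensemble member."""
    rng = random.Random(seed)
    k = max(1, round(subspace_size * len(features)))
    return [sorted(rng.sample(features, k)) for _ in range(n_models)]


def project(x, subspace):
    """Restrict a feature dict to the features of one member's subspace."""
    return {name: x[name] for name in subspace if name in x}


subspaces = random_subspaces(FEATURES)
sample = {name: 0.0 for name in FEATURES}
projected = project(sample, subspaces[0])
```

Since every member trains on a different view of the data, their errors are less correlated, and averaging their predictions tends to reduce variance.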

In [24]:
from river.ensemble import SRPRegressor
from river.tree import HoeffdingTreeRegressor

model = (StandardScaler() |
        SRPRegressor(
            n_models=10,
            seed=1,
            drift_detector=ADWIN(delta=0.001),
            warning_detector=ADWIN(delta=0.01),
            model = HoeffdingTreeRegressor(
                grace_period=100,
                leaf_prediction='adaptive',
                model_selector_decay=0.9,                
                nominal_attributes=['season','yr','workingday','weathersit']
            )            
        ))
metrics = Metrics(metrics=[MAE(),MSE(),RMSE(),MAPE()])
stream = iter_pandas(X=data[features], y=data['cnt'])

progressive_val_score(dataset=stream, 
                      model=model, 
                      metric=metrics, 
                      print_every=1000)

[1,000] MAE: 21.660852
MSE: 1,082.717083
RMSE: 32.904667
MAPE: 149.90175
[2,000] MAE: 26.294077
MSE: 1,551.655382
RMSE: 39.391057
MAPE: 152.856793
[3,000] MAE: 33.184048
MSE: 2,640.543477
RMSE: 51.386219
MAPE: 143.382742
[4,000] MAE: 39.362225
MSE: 3,647.993961
RMSE: 60.398625
MAPE: 138.091711
[5,000] MAE: 42.500925
MSE: 4,100.260749
RMSE: 64.033278
MAPE: 128.909848
[6,000] MAE: 44.988719
MSE: 4,529.089262
RMSE: 67.298509
MAPE: 125.737001
[7,000] MAE: 46.489551
MSE: 4,766.829872
RMSE: 69.042233
MAPE: 123.707196
[8,000] MAE: 47.093812
MSE: 4,859.926185
RMSE: 69.713171
MAPE: 122.466616
[9,000] MAE: 46.717087
MSE: 4,805.897888
RMSE: 69.324584
MAPE: 125.1953
[10,000] MAE: 47.364178
MSE: 4,949.929464
RMSE: 70.355735
MAPE: 132.298181
[11,000] MAE: 50.073606
MSE: 5,609.177207
RMSE: 74.89444
MAPE: 138.362816
[12,000] MAE: 51.970348
MSE: 6,087.215358
RMSE: 78.020609
MAPE: 138.024677
[13,000] MAE: 53.506743
MSE: 6,447.132637
RMSE: 80.294039
MAPE: 133.544938
[14,000] MAE: 55.068632
MSE: 6,837.673

MAE: 59.287195
MSE: 7,949.29465
RMSE: 89.158817
MAPE: 127.874836

#### Note that SRPRegressor, using Global Random Subspaces, achieves the lowest final MAE, MSE, and RMSE of all the models above (HoeffdingTreeRegressor has the lowest MAPE) and takes less time to execute than ARFRegressor.
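
The final scores printed at the end of each section can be collected and compared directly. The snippet below copies those values from the outputs above and picks the best model per metric; the dictionary layout and variable names are illustrative.

```python
# Final progressive-validation scores, copied from the outputs in this notebook.
final_scores = {
    "LinearRegression": {"MAE": 134.258975, "MSE": 2959394.502725, "RMSE": 1720.289075, "MAPE": 410.826825},
    "KNNRegressor": {"MAE": 77.254675, "MSE": 13688.211232, "RMSE": 116.996629, "MAPE": 162.493998},
    "AMRules": {"MAE": 98.716742, "MSE": 424409.866798, "RMSE": 651.467472, "MAPE": 405.74488},
    "HoeffdingTreeRegressor": {"MAE": 68.881706, "MSE": 12686.542083, "RMSE": 112.634551, "MAPE": 79.807926},
    "HoeffdingAdaptiveTreeRegressor": {"MAE": 63.892874, "MSE": 10926.391394, "RMSE": 104.529381, "MAPE": 85.230238},
    "ARFRegressor": {"MAE": 68.643015, "MSE": 10896.113678, "RMSE": 104.384451, "MAPE": 116.025267},
    "SRPRegressor": {"MAE": 59.287195, "MSE": 7949.29465, "RMSE": 89.158817, "MAPE": 127.874836},
}

# Lower is better for all four metrics.
best_per_metric = {
    metric: min(final_scores, key=lambda name: final_scores[name][metric])
    for metric in ("MAE", "MSE", "RMSE", "MAPE")
}

for metric, name in best_per_metric.items():
    print(f"{metric}: {name} ({final_scores[name][metric]:,.6f})")
```

Execution time is not recorded in the outputs, so it is not part of this comparison.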