# Stream Regression
---

## `BikeSharing` dataset

**Description:** This dataset contains the hourly count of rental bikes in the Capital Bikeshare system for the years 2011 and 2012, together with the corresponding weather and seasonal information. The task is to predict the number of bikes that will be rented.

**Features:** 10
 
|       Attribute      | Description |
|:--------------------:|:-----------------------------|
| `season`             | Season (1: winter, 2: spring, 3: summer, 4: fall)
| `yr`                 | Year (0: 2011, 1: 2012)
| `mnth`               | Month (1 to 12)
| `hr`                 | Hour (0 to 23)
| `workingday`         | 1 if the day is neither a weekend nor a holiday, otherwise 0
| `weathersit`         | 1) Clear, Few clouds, Partly cloudy; 2) Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist; 3) Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds; 4) Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog
| `temp`               | Normalized temperature in Celsius. The values are derived via (t-t_min)/(t_max-t_min), t_min=-8, t_max=+39
| `atemp`              | Normalized feeling temperature in Celsius. The values are derived via (t-t_min)/(t_max-t_min), t_min=-16, t_max=+50
| `hum`                | Normalized humidity. The values are divided by 100 (max)
| `windspeed`          | Normalized wind speed. The values are divided by 67 (max)


**Target:** `cnt` | Count of total rental bikes
 
**Samples:** 17,379
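
The min-max formula given for `temp` and `atemp` can be inverted to recover values in Celsius. The sketch below is only an illustration: the helper name `denormalize` is hypothetical, and the bounds are the ones listed in the table.

```python
def denormalize(value, t_min, t_max):
    """Invert min-max scaling: value = (t - t_min) / (t_max - t_min)."""
    return value * (t_max - t_min) + t_min

# Bounds as listed in the attribute table.
TEMP_MIN, TEMP_MAX = -8, 39
ATEMP_MIN, ATEMP_MAX = -16, 50

# A normalized value of 0.5 is the midpoint of each range.
temp_celsius = denormalize(0.5, TEMP_MIN, TEMP_MAX)      # 15.5
atemp_celsius = denormalize(0.5, ATEMP_MIN, ATEMP_MAX)   # 17.0
```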


In [1]:
import pandas as pd
from river.stream import iter_pandas
from river.metrics import Metrics,MAE,MSE,RMSE,base
from river.evaluate import progressive_val_score
from river.preprocessing import StandardScaler
from river.drift import ADWIN
import numpy as np

In [2]:
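class MAPE(base.MeanMetric, base.RegressionMetric):
    """Mean absolute percentage error, in percent."""
    def _eval(self, y_true, y_pred):
        # Per-sample absolute percentage error; undefined when y_true is 0.
        y_true, y_pred = np.array(y_true), np.array(y_pred)
        return np.mean(np.abs((y_true - y_pred) / y_true)) * 100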
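
To see what this metric reports over a stream, here is a library-free sketch of a running MAPE. It assumes the custom metric is averaged sample by sample; the class name `RunningMAPE` and the toy values are hypothetical. It also shows why MAPE can exceed 100%: a small absolute error on a very small target produces a large percentage error.

```python
class RunningMAPE:
    """Running mean of per-sample absolute percentage errors, in percent."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def update(self, y_true, y_pred):
        ape = abs((y_true - y_pred) / y_true) * 100
        self.n += 1
        # Incremental mean update, no need to store past errors.
        self.mean += (ape - self.mean) / self.n
        return self

    def get(self):
        return self.mean


mape = RunningMAPE()
# Toy (y_true, y_pred) pairs: the last one has a tiny target and a 200% error.
for y_true, y_pred in [(100, 90), (50, 60), (1, 3)]:
    mape.update(y_true, y_pred)
```

The three per-sample errors are 10%, 20%, and 200%, so the single low-count sample dominates the average.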

In [3]:
data = pd.read_csv("../datasets/BikeSharing.csv")
features = data.columns[:-1]
numerical_features = ['mnth','hr','weathersit','temp','atemp','hum','windspeed']

## Linear Regression

---
[Linear Regression](https://riverml.xyz/0.10.1/api/linear-model/LinearRegression/) fits a linear model incrementally, one sample at a time.
Note that a linear regression model cannot handle **categorical** features directly, so we use only the **numerical** ones.

In [17]:
from river.linear_model import LinearRegression

model = (StandardScaler() |
        LinearRegression(intercept_lr=.1))
metrics = Metrics(metrics=[MAE(),MSE(),RMSE(),MAPE()])
stream = iter_pandas(X=data[numerical_features], y=data['cnt'])

progressive_val_score(dataset=stream,
                      model=model,
                      metric=metrics,
                      print_every=1000)

[1,000] MAE: 702.412945, MSE: 46,184,160.882792, RMSE: 6,795.892942, MAPE: 3,322.300054
[2,000] MAE: 383.854206, MSE: 23,096,328.899143, RMSE: 4,805.864012, MAPE: 1,879.986611
[3,000] MAE: 277.286153, MSE: 15,400,176.912274, RMSE: 3,924.305915, MAPE: 1,311.33675
[4,000] MAE: 228.175123, MSE: 11,553,093.91782, RMSE: 3,398.984248, MAPE: 1,023.402274
[5,000] MAE: 198.79005, MSE: 9,244,948.59481, RMSE: 3,040.550706, MAPE: 843.377409
[6,000] MAE: 179.532748, MSE: 7,706,202.294215, RMSE: 2,776.004736, MAPE: 726.15731
[7,000] MAE: 165.79017, MSE: 6,607,121.644385, RMSE: 2,570.43219, MAPE: 647.345553
[8,000] MAE: 154.355256, MSE: 5,782,527.79314, RMSE: 2,404.68871, MAPE: 590.274773
[9,000] MAE: 144.412695, MSE: 5,140,926.032258, RMSE: 2,267.361028, MAPE: 549.016392
[10,000] MAE: 138.396775, MSE: 4,628,152.295476, RMSE: 2,151.314086, MAPE: 528.559934
[11,000] MAE: 137.222862, MSE: 4,209,863.137087, RMSE: 2,051.795101, MAPE: 513.552669
[12,000] MAE: 136.258552, MSE: 3,861,365.169476, RMSE: 1,965

MAE: 132.359553, MSE: 2,675,088.190485, RMSE: 1,635.569684, MAPE: 397.981011
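
The numbers printed by `progressive_val_score` are cumulative: each sample is first predicted, then used for training, and the error is averaged over everything seen so far. The following sketch of that test-then-train loop uses a hypothetical running-mean baseline and a toy stream; it is not the library's implementation.

```python
def progressive_validation(stream, model):
    """Predict each sample before learning from it; return the running MAE."""
    n, mae = 0, 0.0
    for x, y in stream:
        y_pred = model.predict_one(x)      # 1) test on the unseen sample
        n += 1
        mae += (abs(y - y_pred) - mae) / n
        model.learn_one(x, y)              # 2) then train on it
    return mae


class RunningMeanModel:
    """Hypothetical baseline: always predicts the mean of the targets seen so far."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def predict_one(self, x):
        return self.mean

    def learn_one(self, x, y):
        self.n += 1
        self.mean += (y - self.mean) / self.n
        return self


toy_stream = [({"hr": 0}, 10), ({"hr": 1}, 20), ({"hr": 2}, 30)]
baseline_mae = progressive_validation(toy_stream, RunningMeanModel())
```

Because the average includes the first, untrained predictions, large early errors keep weighing on the later checkpoints, which is consistent with the slowly decreasing MSE in the log above.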

## KNNRegressor
---

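[KNNRegressor](https://riverml.xyz/0.10.1/api/neighbors/KNNRegressor/) is the k-nearest neighbors (KNN) adaptation for regression tasks. It predicts from the most recent samples, which are kept in a sliding window (`window_size=1000` below).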
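
As a rough illustration of the idea, here is a minimal sliding-window KNN regressor in plain Python. It assumes Euclidean distance and a plain mean over the neighbors' targets; the class name `WindowKNNRegressor` is hypothetical and the library's actual implementation may differ.

```python
from collections import deque
import math


class WindowKNNRegressor:
    """Keeps the last `window_size` samples and averages the k nearest targets."""

    def __init__(self, n_neighbors=5, window_size=1000):
        self.n_neighbors = n_neighbors
        self.window = deque(maxlen=window_size)  # oldest samples drop out automatically

    def learn_one(self, x, y):
        self.window.append((x, y))
        return self

    def predict_one(self, x):
        if not self.window:
            return 0.0  # nothing stored yet

        def distance(item):
            other, _ = item
            return math.sqrt(sum((x[k] - other.get(k, 0.0)) ** 2 for k in x))

        nearest = sorted(self.window, key=distance)[: self.n_neighbors]
        return sum(y for _, y in nearest) / len(nearest)


knn = WindowKNNRegressor(n_neighbors=2, window_size=3)
for hr, cnt in [(0, 10), (1, 20), (2, 30), (3, 40)]:
    knn.learn_one({"hr": float(hr)}, cnt)

# The window now holds only hr = 1, 2, 3; the two nearest to hr = 3 are 3 and 2.
prediction = knn.predict_one({"hr": 3.0})  # 35.0
```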

In [18]:
from river.neighbors import KNNRegressor

model = (StandardScaler() |
        KNNRegressor(window_size=1000))
metrics = Metrics(metrics=[MAE(),MSE(),RMSE(),MAPE()])
stream = iter_pandas(X=data[features], y=data['cnt'])

progressive_val_score(dataset=stream,
                      model=model,
                      metric=metrics,
                      print_every=1000)

[1,000] MAE: 31.739783, MSE: 2,028.1698, RMSE: 45.035206, MAPE: 225.183307
[2,000] MAE: 37.879492, MSE: 2,891.47506, RMSE: 53.772438, MAPE: 249.152532
[3,000] MAE: 46.324594, MSE: 4,634.606493, RMSE: 68.077944, MAPE: 235.640988
[4,000] MAE: 50.804996, MSE: 5,602.46592, RMSE: 74.849622, MAPE: 210.876861
[5,000] MAE: 54.460437, MSE: 6,454.568712, RMSE: 80.340331, MAPE: 189.168704
[6,000] MAE: 56.444697, MSE: 6,929.107473, RMSE: 83.241261, MAPE: 175.824021
[7,000] MAE: 58.613655, MSE: 7,455.649051, RMSE: 86.3461, MAPE: 167.263411
[8,000] MAE: 58.880998, MSE: 7,510.97703, RMSE: 86.665893, MAPE: 161.388982
[9,000] MAE: 59.053509, MSE: 7,535.757209, RMSE: 86.808739, MAPE: 166.152635
[10,000] MAE: 60.668418, MSE: 7,943.33842, RMSE: 89.125408, MAPE: 178.425456
[11,000] MAE: 64.124217, MSE: 9,037.029738, RMSE: 95.063293, MAPE: 182.715363
[12,000] MAE: 67.172315, MSE: 10,018.612197, RMSE: 100.093018, MAPE: 185.660626
[13,000] MAE: 69.249937, MSE: 10,744.05968, RMSE: 103.653556, MAPE: 179.091209


MAE: 75.776292, MSE: 13,010.496556, RMSE: 114.063564, MAPE: 166.858389

## AMRules
---


[AMRules](https://riverml.xyz/0.10.1/api/rules/AMRules/) is a rule-based algorithm for incremental regression tasks. AMRules relies on the Hoeffding bound to build its rule set, similarly to Hoeffding Trees. The Variance-Ratio heuristic is used to evaluate rule splits. Moreover, this rule-based regressor has additional capabilities not usually found in decision trees.
For the **prediction type**, we use *adaptive*, which dynamically selects between "mean" and "Linear Regression" for each incoming example.
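
The Hoeffding bound mentioned above states that, with probability 1 - delta, the true mean of a variable with range R lies within epsilon = sqrt(R^2 ln(1/delta) / (2n)) of the mean observed over n samples. The sketch below evaluates it for `delta=0.01`, the value passed to `AMRules` in this notebook; the range `R = 1.0` is an assumption made only for illustration.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Epsilon such that the observed mean is within epsilon of the true mean
    with probability 1 - delta, after n observations."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


# R = 1.0 is an illustrative assumption; delta matches the cell in this section.
eps_100 = hoeffding_bound(1.0, delta=0.01, n=100)
eps_1000 = hoeffding_bound(1.0, delta=0.01, n=1000)
```

The bound shrinks as more samples arrive, so a candidate split whose advantage over the runner-up exceeds epsilon can be accepted with the stated confidence.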

In [7]:
from river.rules import AMRules

model = (StandardScaler() |
        AMRules(
            delta=0.01,
            n_min=100,
            drift_detector=ADWIN(),
            pred_type='adaptive'
        ))
metrics = Metrics(metrics=[MAE(),MSE(),RMSE(),MAPE()])
stream = iter_pandas(X=data[features], y=data['cnt'])

progressive_val_score(dataset=stream,
                      model=model,
                      metric=metrics,
                      print_every=1000)

[1,000] MAE: 32.590182, MSE: 1,947.654555, RMSE: 44.132239, MAPE: 285.244251
[2,000] MAE: 44.871019, MSE: 50,335.508206, RMSE: 224.355763, MAPE: 415.380055
[3,000] MAE: 54.678233, MSE: 37,073.771243, RMSE: 192.545504, MAPE: 386.155082
[4,000] MAE: 61.689957, MSE: 31,184.689506, RMSE: 176.591873, MAPE: 336.298804
[5,000] MAE: 67.978598, MSE: 27,848.749597, RMSE: 166.879446, MAPE: 329.20018
[6,000] MAE: 70.383538, MSE: 25,124.863109, RMSE: 158.508243, MAPE: 325.627602
[7,000] MAE: 70.580298, MSE: 22,914.114938, RMSE: 151.374089, MAPE: 300.184356
[8,000] MAE: 68.929821, MSE: 20,870.464011, RMSE: 144.466134, MAPE: 276.432837
[9,000] MAE: 98.127483, MSE: 446,501.117746, RMSE: 668.207391, MAPE: 392.244574
[10,000] MAE: 99.000328, MSE: 403,457.031062, RMSE: 635.182675, MAPE: 449.78846
[11,000] MAE: 98.971592, MSE: 368,286.238412, RMSE: 606.865915, MAPE: 442.950483
[12,000] MAE: 99.312243, MSE: 339,237.45977, RMSE: 582.44095, MAPE: 422.282343
[13,000] MAE: 99.746414, MSE: 314,686.435329, RMSE:

MAE: 104.968841, MSE: 242,158.323115, RMSE: 492.095847, MAPE: 352.943342

## HoeffdingTreeRegressor
---
[HoeffdingTreeRegressor](https://riverml.xyz/0.10.1/api/tree/HoeffdingTreeRegressor/) is the Hoeffding Tree (HT) adaptation for regression tasks. For the **leaf prediction**, we use *adaptive*, which dynamically selects between "mean" and "Linear Regression" for each incoming example.
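
One plausible way to implement such a selection is to track an exponentially faded squared error for each of the two predictors and use whichever is currently lower. The sketch below works under that assumption, with `decay` playing the role of `model_selector_decay`; the class `AdaptiveSelector` is hypothetical and is not the library's code.

```python
class AdaptiveSelector:
    """Chooses between two predictions based on faded squared errors."""

    def __init__(self, decay=0.9):
        self.decay = decay
        self.err_mean = 0.0     # faded squared error of the mean predictor
        self.err_linear = 0.0   # faded squared error of the linear predictor

    def update(self, y, pred_mean, pred_linear):
        # Older errors are discounted by `decay` at every step.
        self.err_mean = self.decay * self.err_mean + (y - pred_mean) ** 2
        self.err_linear = self.decay * self.err_linear + (y - pred_linear) ** 2

    def choose(self, pred_mean, pred_linear):
        return pred_mean if self.err_mean <= self.err_linear else pred_linear


selector = AdaptiveSelector(decay=0.9)
selector.update(y=10, pred_mean=12, pred_linear=10.5)
selector.update(y=20, pred_mean=13, pred_linear=19)

# The linear predictor has been more accurate recently, so it is selected.
chosen = selector.choose(pred_mean=14, pred_linear=21)
```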

In [8]:
from river.tree import HoeffdingTreeRegressor

model = (StandardScaler() |
        HoeffdingTreeRegressor(
            grace_period=100,
            leaf_prediction='adaptive',
            model_selector_decay=0.9,            
            nominal_attributes=['season','yr','workingday','weathersit'],           
        ))
metrics = Metrics(metrics=[MAE(),MSE(),RMSE(),MAPE()])
stream = iter_pandas(X=data[features], y=data['cnt'])

progressive_val_score(dataset=stream, 
                      model=model, 
                      metric=metrics, 
                      print_every=1000)

[1,000] MAE: 27.29901, MSE: 1,559.328408, RMSE: 39.488333, MAPE: 125.236207
[2,000] MAE: 30.468168, MSE: 2,049.217485, RMSE: 45.268283, MAPE: 104.30825
[3,000] MAE: 37.571582, MSE: 3,413.954358, RMSE: 58.429054, MAPE: 90.340989
[4,000] MAE: 43.010506, MSE: 4,560.455028, RMSE: 67.531141, MAPE: 81.252637
[5,000] MAE: 46.193905, MSE: 5,225.581332, RMSE: 72.288183, MAPE: 75.352576
[6,000] MAE: 47.445984, MSE: 5,422.90597, RMSE: 73.640383, MAPE: 73.637629
[7,000] MAE: 48.851862, MSE: 5,692.930826, RMSE: 75.451513, MAPE: 72.333929
[8,000] MAE: 49.042905, MSE: 5,718.460227, RMSE: 75.620501, MAPE: 72.977934
[9,000] MAE: 50.076157, MSE: 7,030.353824, RMSE: 83.847205, MAPE: 79.652527
[10,000] MAE: 50.912125, MSE: 7,060.652796, RMSE: 84.027691, MAPE: 83.595658
[11,000] MAE: 54.014824, MSE: 8,076.920767, RMSE: 89.871691, MAPE: 83.052694
[12,000] MAE: 57.837782, MSE: 9,254.905104, RMSE: 96.202417, MAPE: 82.34025
[13,000] MAE: 60.724616, MSE: 10,137.81184, RMSE: 100.686701, MAPE: 79.583618

MAE: 68.229484, MSE: 12,735.514685, RMSE: 112.851738, MAPE: 78.697376

## HoeffdingAdaptiveTreeRegressor
---
[HoeffdingAdaptiveTreeRegressor](https://riverml.xyz/0.10.1/api/tree/HoeffdingAdaptiveTreeRegressor/) is the Hoeffding Adaptive Tree (HAT) adaptation for regression tasks. For the **leaf prediction**, we use *adaptive*, which dynamically selects between "mean" and "Linear Regression" for each incoming example.

In [9]:
from river.tree import HoeffdingAdaptiveTreeRegressor

model = (StandardScaler() |
        HoeffdingAdaptiveTreeRegressor(
            grace_period=100,
            leaf_prediction='adaptive',
            model_selector_decay=0.9,
            seed=1,            
            nominal_attributes=['season','yr','workingday','weathersit']
        ))
metrics = Metrics(metrics=[MAE(),MSE(),RMSE(),MAPE()])
stream = iter_pandas(X=data[features], y=data['cnt'])

progressive_val_score(dataset=stream, 
                      model=model, 
                      metric=metrics, 
                      print_every=1000)

[1,000] MAE: 25.202601, MSE: 1,374.689305, RMSE: 37.076803, MAPE: 133.395034
[2,000] MAE: 27.202401, MSE: 2,109.47032, RMSE: 45.92897, MAPE: 110.968447
[3,000] MAE: 35.12362, MSE: 3,486.829491, RMSE: 59.049382, MAPE: 101.016207
[4,000] MAE: 45.017906, MSE: 5,331.458947, RMSE: 73.01684, MAPE: 104.575141
[5,000] MAE: 49.826018, MSE: 6,215.198049, RMSE: 78.836527, MAPE: 98.00587
[6,000] MAE: 52.383747, MSE: 6,718.679383, RMSE: 81.967551, MAPE: 94.731297
[7,000] MAE: 54.121729, MSE: 7,007.600091, RMSE: 83.71141, MAPE: 90.821663
[8,000] MAE: 54.35347, MSE: 7,032.944518, RMSE: 83.862653, MAPE: 88.816309
[9,000] MAE: 56.149199, MSE: 9,929.448866, RMSE: 99.64662, MAPE: 96.226089
[10,000] MAE: 58.360644, MSE: 10,116.438989, RMSE: 100.58051, MAPE: 101.720224
[11,000] MAE: 62.242584, MSE: 11,190.768373, RMSE: 105.786428, MAPE: 105.434505
[12,000] MAE: 64.264621, MSE: 11,540.118874, RMSE: 107.424945, MAPE: 105.317253
[13,000] MAE: 65.723855, MSE: 11,796.184878, RMSE: 108.610243, MAPE: 101.981091

MAE: 67.402293, MSE: 12,008.804075, RMSE: 109.584689, MAPE: 93.153622

## AdaptiveRandomForestRegressor
---
[AdaptiveRandomForestRegressor](https://riverml.xyz/0.10.1/api/ensemble/AdaptiveRandomForestRegressor/) is the Adaptive Random Forest (ARF) adaptation for regression tasks. For the **leaf prediction**, we use *adaptive*, which dynamically selects between "mean" and "Linear Regression" for each incoming example.

In [10]:
from river.ensemble import AdaptiveRandomForestRegressor

model = (StandardScaler() |
        AdaptiveRandomForestRegressor(
            n_models=10,
            seed=1,
            model_selector_decay=0.9,           
            nominal_attributes=['season','yr','workingday','weathersit'],
            leaf_prediction='adaptive'
        ))
metrics = Metrics(metrics=[MAE(),MSE(),RMSE(),MAPE()])
stream = iter_pandas(X=data[features.drop('season')], y=data['cnt'])

progressive_val_score(dataset=stream, 
                      model=model, 
                      metric=metrics, 
                      print_every=1000)

[1,000] MAE: 28.745496, MSE: 1,772.284532, RMSE: 42.09851, MAPE: 263.339068
[2,000] MAE: 34.30199, MSE: 2,510.463593, RMSE: 50.104527, MAPE: 242.55457
[3,000] MAE: 44.180038, MSE: 4,773.66668, RMSE: 69.091727, MAPE: 202.607483
[4,000] MAE: 52.063998, MSE: 6,240.948547, RMSE: 78.999674, MAPE: 195.416226
[5,000] MAE: 57.030373, MSE: 7,001.615735, RMSE: 83.675658, MAPE: 183.049899
[6,000] MAE: 59.326317, MSE: 7,452.264176, RMSE: 86.326498, MAPE: 176.489962
[7,000] MAE: 60.45756, MSE: 7,716.880211, RMSE: 87.845775, MAPE: 170.07653
[8,000] MAE: 60.6858, MSE: 7,692.328044, RMSE: 87.705918, MAPE: 172.308509
[9,000] MAE: 60.495537, MSE: 7,656.192451, RMSE: 87.499671, MAPE: 179.018958
[10,000] MAE: 61.838088, MSE: 7,959.058961, RMSE: 89.213558, MAPE: 194.887916
[11,000] MAE: 65.598719, MSE: 9,219.405949, RMSE: 96.017738, MAPE: 201.645975
[12,000] MAE: 68.62497, MSE: 10,072.777604, RMSE: 100.363228, MAPE: 201.383015
[13,000] MAE: 71.627585, MSE: 10,976.59257, RMSE: 104.769235, MAPE: 194.469381

MAE: 79.04681, MSE: 13,567.759837, RMSE: 116.480727, MAPE: 175.556848

## SRPRegressor
---
[SRPRegressor](https://riverml.xyz/0.10.1/api/ensemble/SRPRegressor/) is the Streaming Random Patches (SRP) adaptation for regression tasks. For the **leaf prediction** of its base trees, we use *adaptive*, which dynamically selects between "mean" and "Linear Regression" for each incoming example.

In [11]:
from river.ensemble import SRPRegressor
from river.tree import HoeffdingTreeRegressor

model = (StandardScaler() |
        SRPRegressor(
            n_models=10,
            seed=1,
            drift_detector=ADWIN(delta=0.001),
            warning_detector=ADWIN(delta=0.01),
            model = HoeffdingTreeRegressor(
                grace_period=100,
                leaf_prediction='adaptive',
                model_selector_decay=0.9,                
                nominal_attributes=['season','yr','workingday','weathersit']
            )            
        ))
metrics = Metrics(metrics=[MAE(),MSE(),RMSE(),MAPE()])
stream = iter_pandas(X=data[features], y=data['cnt'])

progressive_val_score(dataset=stream, 
                      model=model, 
                      metric=metrics, 
                      print_every=1000)

[1,000] MAE: 24.848872, MSE: 1,319.556808, RMSE: 36.325705, MAPE: 207.739665
[2,000] MAE: 28.781345, MSE: 1,783.193626, RMSE: 42.227877, MAPE: 195.408213
[3,000] MAE: 35.216968, MSE: 2,887.231078, RMSE: 53.732961, MAPE: 166.799959
[4,000] MAE: 41.23148, MSE: 3,942.859123, RMSE: 62.79219, MAPE: 151.216253
[5,000] MAE: 43.933795, MSE: 4,325.948176, RMSE: 65.771941, MAPE: 139.389822
[6,000] MAE: 45.693313, MSE: 4,645.891973, RMSE: 68.16078, MAPE: 131.383307
[7,000] MAE: 47.018871, MSE: 4,844.525095, RMSE: 69.602623, MAPE: 127.87677
[8,000] MAE: 47.714774, MSE: 4,948.561424, RMSE: 70.346012, MAPE: 126.797352
[9,000] MAE: 47.972327, MSE: 4,993.645842, RMSE: 70.665733, MAPE: 133.727811
[10,000] MAE: 48.787203, MSE: 5,160.183908, RMSE: 71.83442, MAPE: 145.140938
[11,000] MAE: 51.728682, MSE: 5,897.840229, RMSE: 76.797397, MAPE: 151.121696
[12,000] MAE: 54.078749, MSE: 6,502.903993, RMSE: 80.640585, MAPE: 151.487107
[13,000] MAE: 55.925903, MSE: 6,964.978038, RMSE: 83.456444, MAPE: 146.264078


MAE: 60.974581, MSE: 8,372.197939, RMSE: 91.499716, MAPE: 132.351003

#### Note that SRPRegressor, using Global Random Subspaces, achieves the lowest MAE, MSE, and RMSE of the models above (HoeffdingTreeRegressor has the lowest MAPE) and takes less time to execute than AdaptiveRandomForestRegressor
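
To make the idea of global random subspaces concrete, the sketch below draws one random feature subset per ensemble member and projects each incoming sample onto it. This is an illustration only: the helper names and `subspace_size=0.6` are hypothetical, since the cell above does not show how the subspaces are configured.

```python
import random

FEATURES = ["season", "yr", "mnth", "hr", "workingday",
            "weathersit", "temp", "atemp", "hum", "windspeed"]


def draw_subspaces(features, n_models, subspace_size, seed):
    """Draw one fixed random feature subset per ensemble member."""
    rng = random.Random(seed)
    k = max(1, round(subspace_size * len(features)))
    return [sorted(rng.sample(features, k)) for _ in range(n_models)]


def project(x, subspace):
    """Keep only the features that belong to a member's subspace."""
    return {name: x[name] for name in subspace}


# subspace_size=0.6 is an illustrative assumption, not a value from the notebook.
subspaces = draw_subspaces(FEATURES, n_models=10, subspace_size=0.6, seed=1)

sample = {name: 0.0 for name in FEATURES}
member_view = project(sample, subspaces[0])  # what the first member would see
```

Each member trains on fewer features than the full input, which is one reason such an ensemble can be cheaper to run.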