<a href="https://colab.research.google.com/github/alemolteni/codecarbon_project/blob/main/3_0_Stream_Classification_codecarbon.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Stream Classification
---

## `NEweather` dataset

**Description:** The National Oceanic and Atmospheric Administration (NOAA)
has compiled a database of weather measurements from over 7,000 weather
stations worldwide. Records date back to the mid-1900s. Daily measurements
include a variety of features (temperature, pressure, wind speed, etc.) as
well as a series of indicators for precipitation and other weather-related
events. The `NEweather` dataset contains data from this database, specifically
from Offutt Air Force Base in Bellevue, Nebraska, covering over 50 years
(1949-1999).

**Features:** 8 Daily weather measurements

|       Attribute      | Description |
|:--------------------:|:-----------------------------|
| `temp`                   | Temperature
| `dew_pnt`                | Dew Point
| `sea_lvl_press`          | Sea Level Pressure
| `visibility`             | Visibility
| `avg_wind_spd`           | Average Wind Speed
| `max_sustained_wind_spd` | Maximum Sustained Wind Speed
| `max_temp`               | Maximum Temperature
| `min_temp`               | Minimum Temperature


**Class:** `rain` | 0: no rain, 1: rain

**Samples:** 18,159


In [1]:
from google.colab import drive
drive.mount('/gdrive')
!cp "/gdrive/My Drive/CodeCarbon/datasets/NEweather.csv" /content
!cp "/gdrive/My Drive/CodeCarbon/datasets/agr_a_20k.csv" /content

Mounted at /gdrive


In [2]:
!pip install river==0.7

Collecting river==0.7
  Downloading river-0.7.0.tar.gz (845 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: river
  Building wheel for river (setup.py) ... done
  Created wheel for river: filename=river-0.7.0-cp310-cp310-linux_x86_64.whl size=2359771 sha256=d3f5bb3b265b96d3323283d65426fe80c97d3d59021d766b1130d5ec1337fd8b
  Stored in directory: /root/.cache/pip/wheels/71/e9/7e/105173d51ebb5262f6f2dc4f6a5003ec86365255e8fd989733
Successfully built river
Installing collected packages: river
Successfully installed river-0.7.0


In [3]:
!pip install codecarbon

Collecting codecarbon
  Downloading codecarbon-2.2.4-py3-none-any.whl (176 kB)
Collecting arrow (from codecarbon)
  Downloading arrow-1.2.3-py3-none-any.whl (66 kB)
Collecting pynvml (from codecarbon)
  Downloading pynvml-11.5.0-py3-none-any.whl (53 kB)
Collecting fuzzywuzzy (from codecarbon)
  Downloading fuzzywuzzy-0.18.0-py2.py3-none-any.whl (18 kB)
Installing collected packages: fuzzywuzzy, pynvml, arrow, codecarbon
Successfully installed arrow-1.2.3 codecarbon-2.2.4 fuzzywuzzy-0.18.0 pynvml-11.5.0


In [4]:
import pandas as pd
from river.stream import iter_pandas
from river.metrics import Metrics, Accuracy, BalancedAccuracy, CohenKappa, GeometricMean
from river.evaluate import progressive_val_score
from codecarbon import EmissionsTracker

In [5]:
data = pd.read_csv("NEweather.csv")
features = data.columns[:-1]

In this example, we load the data from a CSV file with `pandas.read_csv` and use the [iter_pandas](https://riverml.xyz/latest/api/stream/iter-pandas/) utility method to iterate over the `DataFrame` one sample at a time.
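The following is a minimal sketch of the test-then-train loop that this kind of streaming evaluation relies on. Each sample is first used to test the model and only then to train it. The helper `iter_rows`, the `MajorityClass` stand-in model, and the toy rows are hypothetical and use only the standard library; they are not river's implementation of `iter_pandas` or `progressive_val_score`.

```python
from collections import Counter


def iter_rows(rows, target):
    """Yield (features_dict, label) pairs, one row at a time (hypothetical helper)."""
    for row in rows:
        x = {key: value for key, value in row.items() if key != target}
        yield x, row[target]


class MajorityClass:
    """Hypothetical stand-in model: predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = Counter()

    def predict_one(self, x):
        return self.counts.most_common(1)[0][0] if self.counts else None

    def learn_one(self, x, y):
        self.counts[y] += 1
        return self


def prequential_accuracy(stream, model):
    """Test on each sample first, then train on it, and return the accuracy."""
    correct = total = 0
    for x, y in stream:
        y_pred = model.predict_one(x)   # test ...
        correct += int(y_pred == y)
        total += 1
        model.learn_one(x, y)           # ... then train
    return correct / total if total else 0.0


# Toy rows for illustration only; they are not taken from NEweather.csv.
rows = [
    {"temp": 30.1, "visibility": 9.0, "rain": 0},
    {"temp": 28.4, "visibility": 8.5, "rain": 0},
    {"temp": 31.0, "visibility": 3.7, "rain": 1},
    {"temp": 29.3, "visibility": 9.9, "rain": 0},
]
accuracy = prequential_accuracy(iter_rows(rows, target="rain"), MajorityClass())
print(f"Accuracy: {accuracy:.2%}")
```

Because every prediction is made before the model has seen the corresponding label, the resulting accuracy reflects performance on unseen data without a separate hold-out set.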

## Naïve Bayes
---
[GaussianNB](https://riverml.xyz/latest/api/naive-bayes/GaussianNB/) maintains a Gaussian distribution $G_{cf}$ for each class $c$ and each feature $f$. Each Gaussian is updated using the value associated with its feature; the details can be found in `proba.Gaussian`. The joint log-likelihood is then obtained by summing, for each class, the log probabilities of the individual features.
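The idea can be sketched in plain Python as follows. This is a simplified illustration, not river's code: `RunningGaussian` and `TinyGaussianNB` are hypothetical names, the running variance uses Welford's update, the class prior is added to the per-feature sum, and the variance floor of `1e-9` is an assumption made to avoid division by zero.

```python
import math
from collections import defaultdict


class RunningGaussian:
    """Running mean and variance of one feature for one class (Welford's update)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, value):
        self.n += 1
        delta = value - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (value - self.mean)

    def log_pdf(self, value):
        variance = self.m2 / (self.n - 1) if self.n > 1 else 0.0
        variance = max(variance, 1e-9)  # assumed floor, avoids division by zero
        return (-0.5 * math.log(2 * math.pi * variance)
                - (value - self.mean) ** 2 / (2 * variance))


class TinyGaussianNB:
    """Hypothetical sketch: one Gaussian per (class, feature) pair."""

    def __init__(self):
        self.class_counts = defaultdict(int)
        self.gaussians = defaultdict(lambda: defaultdict(RunningGaussian))

    def learn_one(self, x, y):
        self.class_counts[y] += 1
        for feature, value in x.items():
            self.gaussians[y][feature].update(value)
        return self

    def joint_log_likelihood(self, x):
        total = sum(self.class_counts.values())
        # Log prior of the class plus the sum of the per-feature log densities.
        return {
            c: math.log(count / total)
            + sum(self.gaussians[c][f].log_pdf(v) for f, v in x.items())
            for c, count in self.class_counts.items()
        }

    def predict_one(self, x):
        scores = self.joint_log_likelihood(x)
        return max(scores, key=scores.get) if scores else None


nb = TinyGaussianNB()
for temp, label in [(9.0, 0), (10.0, 0), (11.0, 0), (19.0, 1), (20.0, 1), (21.0, 1)]:
    nb.learn_one({"temp": temp}, label)
print(nb.predict_one({"temp": 10.5}), nb.predict_one({"temp": 19.5}))
```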

In [6]:
from river.naive_bayes import GaussianNB

tracker = EmissionsTracker()
tracker.start()

model = GaussianNB()
metrics = Metrics(metrics=[Accuracy()])
stream = iter_pandas(X=data[features], y=data['rain'])

progressive_val_score(dataset=stream,
                      model=model,
                      metric=metrics,
                      print_every=1000)

tracker.stop()


[codecarbon INFO @ 20:13:51] [setup] RAM Tracking...
[codecarbon INFO @ 20:13:51] [setup] GPU Tracking...
[codecarbon INFO @ 20:13:51] Tracking Nvidia GPU via pynvml
[codecarbon INFO @ 20:13:51] [setup] CPU Tracking...
[codecarbon INFO @ 20:13:52] CPU Model on constant consumption mode: Intel(R) Xeon(R) CPU @ 2.20GHz
[codecarbon INFO @ 20:13:52] >>> Tracker's metadata:
[codecarbon INFO @ 20:13:52]   Platform system: Linux-5.15.107+-x86_64-with-glibc2.31
[codecarbon INFO @ 20:13:52]   Python version: 3.10.12
[codecarbon INFO @ 20:13:52]   CodeCarbon version: 2.2.4
[codecarbon INFO @ 20:13:52]   Available RAM : 12.678 GB
[codecarbon INFO @ 20:13:52]   CPU count: 2
[codecarbon INFO @ 20:13:52]   CPU model: Intel(R) Xeon(R) CPU @ 2.20GHz
[codecarbon INFO @ 20:13:52]   GPU count: 1
[codecarbon INFO @ 20:13:52]   GPU model: 1 x Tesla T4

[1,000] Accuracy: 71.27%
[2,000] Accuracy: 69.88%
[3,000] Accuracy: 68.99%
[4,000] Accuracy: 68.82%
[5,000] Accuracy: 69.09%
[6,000] Accuracy: 69.13%
[7,000] Accuracy: 69.15%
[8,000] Accuracy: 68.50%
[9,000] Accuracy: 68.65%
[10,000] Accuracy: 69.04%
[11,000] Accuracy: 69.52%
[12,000] Accuracy: 69.74%
[13,000] Accuracy: 69.79%
[14,000] Accuracy: 69.88%
[15,000] Accuracy: 70.14%
[16,000] Accuracy: 70.05%
[17,000] Accuracy: 69.70%


[codecarbon INFO @ 20:13:58] Energy consumed for RAM : 0.000007 kWh. RAM Power : 4.754396438598633 W
[codecarbon INFO @ 20:13:58] Energy consumed for all GPUs : 0.000015 kWh. Total GPU Power : 10.754 W
[codecarbon INFO @ 20:13:58] Energy consumed for all CPUs : 0.000060 kWh. Total CPU Power : 42.5 W
[codecarbon INFO @ 20:13:58] 0.000082 kWh of electricity used since the beginning.


[18,000] Accuracy: 69.36%


1.940630047744291e-07

## K-Nearest Neighbors
---
[KNN](https://riverml.xyz/latest/api/neighbors/KNNClassifier/) is a non-parametric classification method that keeps track of the last `window_size` training samples. The predicted class label for a given query sample is obtained in two steps:

- Find the `n_neighbors` samples closest to the query sample in the data window.
- Aggregate the class labels of those `n_neighbors` samples to define the predicted class for the query sample.
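
The two steps above can be sketched with a bounded `deque` as the data window. `TinyWindowKNN` is a hypothetical name, and the Euclidean distance and unweighted majority vote are simplifying assumptions; river's `KNNClassifier` may weight or measure neighbors differently.

```python
import math
from collections import Counter, deque


class TinyWindowKNN:
    """Hypothetical sketch of KNN over a sliding window of recent samples."""

    def __init__(self, n_neighbors=5, window_size=1000):
        self.n_neighbors = n_neighbors
        # A deque with maxlen drops the oldest sample once the window is full.
        self.window = deque(maxlen=window_size)

    def learn_one(self, x, y):
        self.window.append((x, y))
        return self

    def predict_one(self, x):
        if not self.window:
            return None

        def distance(item):
            stored_x, _ = item
            # Euclidean distance over the query's features (assumed metric).
            return math.sqrt(sum((x[f] - stored_x.get(f, 0.0)) ** 2 for f in x))

        # Step 1: find the closest n_neighbors samples in the window.
        nearest = sorted(self.window, key=distance)[: self.n_neighbors]
        # Step 2: aggregate their labels with an unweighted majority vote.
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]


knn = TinyWindowKNN(n_neighbors=1, window_size=3)
for value, label in [(0.0, "no"), (1.0, "no"), (10.0, "yes"), (11.0, "yes")]:
    knn.learn_one({"a": value}, label)
print(len(knn.window), knn.predict_one({"a": 9.0}), knn.predict_one({"a": 0.4}))
```

Since the window holds only three samples here, the first sample has already been forgotten by the time the predictions are made, which is the behaviour that lets a windowed KNN follow recent data.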

In [7]:
from river.neighbors import KNNClassifier

tracker = EmissionsTracker()
tracker.start()

model = KNNClassifier(n_neighbors=5, window_size=1000)
metrics = Metrics(metrics=[Accuracy()])
stream = iter_pandas(X=data[features], y=data['rain'])

progressive_val_score(dataset=stream,
                      model=model,
                      metric=metrics,
                      print_every=1000)

tracker.stop()

[codecarbon INFO @ 20:13:58] [setup] RAM Tracking...
[codecarbon INFO @ 20:13:58] [setup] GPU Tracking...
[codecarbon INFO @ 20:13:58] Tracking Nvidia GPU via pynvml
[codecarbon INFO @ 20:13:58] [setup] CPU Tracking...
[codecarbon INFO @ 20:13:59] CPU Model on constant consumption mode: Intel(R) Xeon(R) CPU @ 2.20GHz
[codecarbon INFO @ 20:13:59] >>> Tracker's metadata:
[codecarbon INFO @ 20:13:59]   Platform system: Linux-5.15.107+-x86_64-with-glibc2.31
[codecarbon INFO @ 20:13:59]   Python version: 3.10.12
[codecarbon INFO @ 20:13:59]   CodeCarbon version: 2.2.4
[codecarbon INFO @ 20:13:59]   Available RAM : 12.678 GB
[codecarbon INFO @ 20:13:59]   CPU count: 2
[codecarbon INFO @ 20:13:59]   CPU model: Intel(R) Xeon(R) CPU @ 2.20GHz
[codecarbon INFO @ 20:13:59]   GPU count: 1
[codecarbon INFO @ 20:13:59]   GPU model: 1 x Tesla T4

[1,000] Accuracy: 77.18%
[2,000] Accuracy: 78.34%
[3,000] Accuracy: 78.86%
[4,000] Accuracy: 78.29%
[5,000] Accuracy: 78.06%
[6,000] Accuracy: 77.95%
[7,000] Accuracy: 78.24%
[8,000] Accuracy: 77.96%
[9,000] Accuracy: 78.12%
[10,000] Accuracy: 78.16%
[11,000] Accuracy: 78.35%
[12,000] Accuracy: 78.47%
[13,000] Accuracy: 78.36%


[codecarbon INFO @ 20:14:14] Energy consumed for RAM : 0.000020 kWh. RAM Power : 4.754396438598633 W
[codecarbon INFO @ 20:14:14] Energy consumed for all GPUs : 0.000043 kWh. Total GPU Power : 10.357000000000001 W
[codecarbon INFO @ 20:14:14] Energy consumed for all CPUs : 0.000177 kWh. Total CPU Power : 42.5 W
[codecarbon INFO @ 20:14:14] 0.000240 kWh of electricity used since the beginning.


[14,000] Accuracy: 78.26%
[15,000] Accuracy: 78.36%
[16,000] Accuracy: 78.24%
[17,000] Accuracy: 78.10%


[codecarbon INFO @ 20:14:17] Energy consumed for RAM : 0.000023 kWh. RAM Power : 4.754396438598633 W
[codecarbon INFO @ 20:14:17] Energy consumed for all GPUs : 0.000050 kWh. Total GPU Power : 10.456 W
[codecarbon INFO @ 20:14:17] Energy consumed for all CPUs : 0.000205 kWh. Total CPU Power : 42.5 W
[codecarbon INFO @ 20:14:17] 0.000278 kWh of electricity used since the beginning.


[18,000] Accuracy: 77.90%


6.61735034072963e-07

## Hoeffding Tree
---

Tree-based models such as the [Hoeffding Tree](https://riverml.xyz/latest/api/tree/HoeffdingTreeClassifier/) are popular due to their interpretability. They use a tree data structure to model the data. When a sample arrives, it traverses the tree until it reaches a leaf node. Internal nodes define the path for a data sample based on the values of its features. Leaf nodes are models that provide predictions for unlabeled samples and can update their internal state using the labels from labeled samples.
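
The traversal described above can be sketched as follows. The tree here is hand-built and fixed, with hypothetical `Split` and `Leaf` classes and made-up thresholds; it only illustrates routing and leaf updates and leaves out how a Hoeffding tree decides when to grow new splits.

```python
from collections import Counter


class Leaf:
    """Leaf node: a tiny model that predicts the majority class it has seen."""

    def __init__(self):
        self.counts = Counter()

    def predict(self):
        return self.counts.most_common(1)[0][0] if self.counts else None

    def learn(self, y):
        self.counts[y] += 1


class Split:
    """Internal node: routes a sample left or right based on one feature."""

    def __init__(self, feature, threshold, left, right):
        self.feature = feature
        self.threshold = threshold
        self.left = left
        self.right = right

    def child_for(self, x):
        return self.left if x[self.feature] <= self.threshold else self.right


def find_leaf(node, x):
    """Follow the internal nodes until a leaf is reached."""
    while isinstance(node, Split):
        node = node.child_for(x)
    return node


# Hand-built tree with illustrative thresholds (not learned from data).
tree = Split("temp", 50.0,
             left=Leaf(),
             right=Split("visibility", 5.0, left=Leaf(), right=Leaf()))

labeled = [
    ({"temp": 40.0, "visibility": 3.0}, 1),
    ({"temp": 60.0, "visibility": 3.0}, 1),
    ({"temp": 60.0, "visibility": 9.0}, 0),
]
for x, y in labeled:
    find_leaf(tree, x).learn(y)  # labeled samples update the leaf they reach

prediction = find_leaf(tree, {"temp": 70.0, "visibility": 8.0}).predict()
print(prediction)
```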

In [8]:
from river.tree import HoeffdingTreeClassifier

tracker = EmissionsTracker()
tracker.start()

model = HoeffdingTreeClassifier()
metrics = Metrics(metrics=[Accuracy()])
stream = iter_pandas(X=data[features], y=data['rain'])

progressive_val_score(dataset=stream,
                      model=model,
                      metric=metrics,
                      print_every=1000)

tracker.stop()

[codecarbon INFO @ 20:14:17] [setup] RAM Tracking...
[codecarbon INFO @ 20:14:17] [setup] GPU Tracking...
[codecarbon INFO @ 20:14:17] Tracking Nvidia GPU via pynvml
[codecarbon INFO @ 20:14:17] [setup] CPU Tracking...
[codecarbon INFO @ 20:14:18] CPU Model on constant consumption mode: Intel(R) Xeon(R) CPU @ 2.20GHz
[codecarbon INFO @ 20:14:18] >>> Tracker's metadata:
[codecarbon INFO @ 20:14:18]   Platform system: Linux-5.15.107+-x86_64-with-glibc2.31
[codecarbon INFO @ 20:14:18]   Python version: 3.10.12
[codecarbon INFO @ 20:14:18]   CodeCarbon version: 2.2.4
[codecarbon INFO @ 20:14:18]   Available RAM : 12.678 GB
[codecarbon INFO @ 20:14:18]   CPU count: 2
[codecarbon INFO @ 20:14:18]   CPU model: Intel(R) Xeon(R) CPU @ 2.20GHz
[codecarbon INFO @ 20:14:18]   GPU count: 1
[codecarbon INFO @ 20:14:18]   GPU model: 1 x Tesla T4

[1,000] Accuracy: 70.87%
[2,000] Accuracy: 69.73%
[3,000] Accuracy: 70.89%
[4,000] Accuracy: 71.29%
[5,000] Accuracy: 71.79%
[6,000] Accuracy: 72.13%
[7,000] Accuracy: 72.82%
[8,000] Accuracy: 72.58%
[9,000] Accuracy: 72.80%
[10,000] Accuracy: 72.85%
[11,000] Accuracy: 73.30%
[12,000] Accuracy: 73.55%
[13,000] Accuracy: 73.80%
[14,000] Accuracy: 73.73%
[15,000] Accuracy: 73.99%
[16,000] Accuracy: 74.03%
[17,000] Accuracy: 73.93%


[codecarbon INFO @ 20:14:23] Energy consumed for RAM : 0.000006 kWh. RAM Power : 4.754396438598633 W
[codecarbon INFO @ 20:14:23] Energy consumed for all GPUs : 0.000013 kWh. Total GPU Power : 10.456 W
[codecarbon INFO @ 20:14:23] Energy consumed for all CPUs : 0.000052 kWh. Total CPU Power : 42.5 W
[codecarbon INFO @ 20:14:23] 0.000070 kWh of electricity used since the beginning.


[18,000] Accuracy: 73.58%


1.6725079984596167e-07

## Hoeffding Adaptive Tree
---
The [HAT](https://riverml.xyz/latest/api/tree/HoeffdingAdaptiveTreeClassifier/) model uses `ADWIN` to detect changes. If a change is detected in a given branch, an alternate branch is created, and it eventually replaces the original branch if it shows better performance on new data.
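
To give an intuition for the change detection step, here is a naive sketch of the adaptive windowing idea behind `ADWIN`. It is not river's implementation, which uses a compressed bucket structure; `NaiveAdwin` is a hypothetical class that checks every split of a plain window, and the cut threshold is assumed to be the Hoeffding-style bound $\epsilon = \sqrt{\ln(4n/\delta) / (2m)}$, where $m$ is the harmonic combination of the two sub-window sizes.

```python
import math
from collections import deque


class NaiveAdwin:
    """Naive adaptive window: drop old data when two sub-windows disagree."""

    def __init__(self, delta=0.002, max_window=500):
        self.delta = delta
        self.window = deque(maxlen=max_window)

    def update(self, value):
        """Add a value and return True if a change was flagged."""
        self.window.append(value)
        return self._shrink()

    def _shrink(self):
        changed = False
        shrunk = True
        while shrunk and len(self.window) >= 2:
            shrunk = False
            values = list(self.window)
            n = len(values)
            total = sum(values)
            head_sum = 0.0
            for i in range(1, n):
                head_sum += values[i - 1]
                n0, n1 = i, n - i
                mean_old = head_sum / n0
                mean_new = (total - head_sum) / n1
                m = 1.0 / (1.0 / n0 + 1.0 / n1)
                # Assumed cut threshold (Hoeffding-style bound).
                eps = math.sqrt(math.log(4.0 * n / self.delta) / (2.0 * m))
                if abs(mean_old - mean_new) > eps:
                    # The older part no longer matches recent data: forget it.
                    for _ in range(n0):
                        self.window.popleft()
                    changed = shrunk = True
                    break
        return changed


detector = NaiveAdwin(delta=0.002)
stream = [0] * 200 + [1] * 200  # synthetic error signal with an abrupt change
detections = [i for i, value in enumerate(stream) if detector.update(value)]
print("change flagged at positions:", detections)
```

In a tree such as HAT, a detector of this kind would monitor the error of a branch, and a flagged change would be the trigger for growing an alternate branch.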

In [9]:
from river.tree import HoeffdingAdaptiveTreeClassifier

tracker = EmissionsTracker()
tracker.start()

model = HoeffdingAdaptiveTreeClassifier(seed=42)
metrics = Metrics(metrics=[Accuracy()])
stream = iter_pandas(X=data[features], y=data['rain'])

progressive_val_score(dataset=stream,
                      model=model,
                      metric=metrics,
                      print_every=1000)

tracker.stop()

[codecarbon INFO @ 20:14:23] [setup] RAM Tracking...
[codecarbon INFO @ 20:14:23] [setup] GPU Tracking...
[codecarbon INFO @ 20:14:23] Tracking Nvidia GPU via pynvml
[codecarbon INFO @ 20:14:23] [setup] CPU Tracking...
[codecarbon INFO @ 20:14:25] CPU Model on constant consumption mode: Intel(R) Xeon(R) CPU @ 2.20GHz
[codecarbon INFO @ 20:14:25] >>> Tracker's metadata:
[codecarbon INFO @ 20:14:25]   Platform system: Linux-5.15.107+-x86_64-with-glibc2.31
[codecarbon INFO @ 20:14:25]   Python version: 3.10.12
[codecarbon INFO @ 20:14:25]   CodeCarbon version: 2.2.4
[codecarbon INFO @ 20:14:25]   Available RAM : 12.678 GB
[codecarbon INFO @ 20:14:25]   CPU count: 2
[codecarbon INFO @ 20:14:25]   CPU model: Intel(R) Xeon(R) CPU @ 2.20GHz
[codecarbon INFO @ 20:14:25]   GPU count: 1
[codecarbon INFO @ 20:14:25]   GPU model: 1 x Tesla T4

[1,000] Accuracy: 68.37%
[2,000] Accuracy: 69.48%
[3,000] Accuracy: 71.09%
[4,000] Accuracy: 72.02%
[5,000] Accuracy: 72.85%
[6,000] Accuracy: 73.33%
[7,000] Accuracy: 73.91%
[8,000] Accuracy: 73.51%
[9,000] Accuracy: 73.81%
[10,000] Accuracy: 73.85%
[11,000] Accuracy: 74.03%
[12,000] Accuracy: 74.16%
[13,000] Accuracy: 74.14%
[14,000] Accuracy: 73.96%
[15,000] Accuracy: 74.28%
[16,000] Accuracy: 74.34%


[codecarbon INFO @ 20:14:32] Energy consumed for RAM : 0.000009 kWh. RAM Power : 4.754396438598633 W
[codecarbon INFO @ 20:14:32] Energy consumed for all GPUs : 0.000021 kWh. Total GPU Power : 10.456 W
[codecarbon INFO @ 20:14:32] Energy consumed for all CPUs : 0.000084 kWh. Total CPU Power : 42.5 W
[codecarbon INFO @ 20:14:32] 0.000114 kWh of electricity used since the beginning.


[17,000] Accuracy: 74.12%
[18,000] Accuracy: 73.60%


2.7102453901552257e-07

## Concept Drift Impact

Concept drift can negatively impact learning methods if it is not properly handled. Many real-world applications suffer **model degradation** because their models cannot adapt to changes in the data.

---
## `AGRAWAL` dataset

We will load the data from a CSV file. The data was generated using the `AGRAWAL` data generator with 3 **gradual drifts** at the 5k, 10k, and 15k marks. It contains 9 features: 6 numeric and 3 categorical.

There are 10 functions for generating binary class labels from the features. These functions determine whether a **loan** should be approved.

| Feature    | Description            | Values                                                                |
|------------|------------------------|-----------------------------------------------------------------------|
| `salary`     | salary                 | uniformly distributed from 20k to 150k                                |
| `commission` | commission             | if (salary <   75k) then 0 else uniformly distributed from 10k to 75k |
| `age`        | age                    | uniformly distributed from 20 to 80                                   |
| `elevel`     | education level        | uniformly chosen from 0 to 4                                          |
| `car`        | car maker              | uniformly chosen from 1 to 20                                         |
| `zipcode`    | zip code of the town   | uniformly chosen from 0 to 8                                          |
| `hvalue`     | value of the house     | uniformly distributed from 50k x zipcode to 100k x zipcode            |
| `hyears`     | years house owned      | uniformly distributed from 1 to 30                                    |
| `loan`       | total loan amount      | uniformly distributed from 0 to 500k                                  |

**Class:** `class` | 0: no loan, 1: loan

**Samples:** 20,000

`elevel`, `car`, and `zipcode` are categorical features.
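
As a rough illustration of the feature table above, the sketch below samples one feature vector following the stated ranges. `sample_agrawal_features` is a hypothetical helper, not the generator that produced `agr_a_20k.csv`: it omits the labeling functions and the drifts, and drawing `age` and `hyears` as integers is an assumption.

```python
import random


def sample_agrawal_features(rng):
    """Draw one feature dict following the ranges listed in the table."""
    salary = rng.uniform(20_000, 150_000)
    # Commission is 0 below a 75k salary, otherwise uniform in [10k, 75k].
    commission = 0.0 if salary < 75_000 else rng.uniform(10_000, 75_000)
    zipcode = rng.randint(0, 8)
    return {
        "salary": salary,
        "commission": commission,
        "age": rng.randint(20, 80),      # integer ages are an assumption
        "elevel": rng.randint(0, 4),     # categorical
        "car": rng.randint(1, 20),       # categorical
        "zipcode": zipcode,              # categorical
        # House value depends on the zip code, as stated in the table.
        "hvalue": rng.uniform(50_000 * zipcode, 100_000 * zipcode),
        "hyears": rng.randint(1, 30),    # integer years are an assumption
        "loan": rng.uniform(0, 500_000),
    }


example = sample_agrawal_features(random.Random(42))
print(example)
```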

In [10]:
data = pd.read_csv("agr_a_20k.csv")
features = data.columns[:-1]

## Naïve Bayes

In [11]:
from river.naive_bayes import GaussianNB

tracker = EmissionsTracker()
tracker.start()

model = GaussianNB()
metrics = Metrics(metrics=[Accuracy()])
stream = iter_pandas(X=data[features], y=data['class'])

progressive_val_score(dataset=stream,
                      model=model,
                      metric=metrics,
                      print_every=1000)

tracker.stop()

[codecarbon INFO @ 20:14:32] [setup] RAM Tracking...
[codecarbon INFO @ 20:14:32] [setup] GPU Tracking...
[codecarbon INFO @ 20:14:32] Tracking Nvidia GPU via pynvml
[codecarbon INFO @ 20:14:32] [setup] CPU Tracking...
[codecarbon INFO @ 20:14:34] CPU Model on constant consumption mode: Intel(R) Xeon(R) CPU @ 2.20GHz
[codecarbon INFO @ 20:14:34] >>> Tracker's metadata:
[codecarbon INFO @ 20:14:34]   Platform system: Linux-5.15.107+-x86_64-with-glibc2.31
[codecarbon INFO @ 20:14:34]   Python version: 3.10.12
[codecarbon INFO @ 20:14:34]   CodeCarbon version: 2.2.4
[codecarbon INFO @ 20:14:34]   Available RAM : 12.678 GB
[codecarbon INFO @ 20:14:34]   CPU count: 2
[codecarbon INFO @ 20:14:34]   CPU model: Intel(R) Xeon(R) CPU @ 2.20GHz
[codecarbon INFO @ 20:14:34]   GPU count: 1
[codecarbon INFO @ 20:14:34]   GPU model: 1 x Tesla T4

[1,000] Accuracy: 83.98%
[2,000] Accuracy: 86.29%
[3,000] Accuracy: 87.00%
[4,000] Accuracy: 87.55%
[5,000] Accuracy: 87.42%
[6,000] Accuracy: 80.50%
[7,000] Accuracy: 74.71%
[8,000] Accuracy: 70.87%
[9,000] Accuracy: 68.01%
[10,000] Accuracy: 66.25%
[11,000] Accuracy: 66.75%
[12,000] Accuracy: 67.30%
[13,000] Accuracy: 67.96%
[14,000] Accuracy: 68.74%
[15,000] Accuracy: 69.29%
[16,000] Accuracy: 68.33%
[17,000] Accuracy: 67.45%
[18,000] Accuracy: 66.90%
[19,000] Accuracy: 66.32%


[codecarbon INFO @ 20:14:40] Energy consumed for RAM : 0.000007 kWh. RAM Power : 4.754396438598633 W
[codecarbon INFO @ 20:14:40] Energy consumed for all GPUs : 0.000016 kWh. Total GPU Power : 10.456 W
[codecarbon INFO @ 20:14:40] Energy consumed for all CPUs : 0.000066 kWh. Total CPU Power : 42.5 W
[codecarbon INFO @ 20:14:40] 0.000090 kWh of electricity used since the beginning.


[20,000] Accuracy: 65.94%


2.1316725506739618e-07

## KNN with ADWIN
---

This classifier is an improvement over the regular KNN method, as it is resistant to concept drift. It uses the `ADWIN` change detector to decide which samples to keep and which ones to forget, and by doing so it regulates the size of the sample window.

In [12]:
from river.neighbors import KNNADWINClassifier
from river import compose

tracker = EmissionsTracker()
tracker.start()

model = (
    compose.Discard('elevel', 'car', 'zipcode') |
    KNNADWINClassifier(n_neighbors=5, window_size=1000)
)
metrics = Metrics(metrics=[Accuracy()])
stream = iter_pandas(X=data[features], y=data['class'])

progressive_val_score(dataset=stream,
                      model=model,
                      metric=metrics,
                      print_every=1000)

tracker.stop()

[codecarbon INFO @ 20:14:40] [setup] RAM Tracking...
[codecarbon INFO @ 20:14:40] [setup] GPU Tracking...
[codecarbon INFO @ 20:14:40] Tracking Nvidia GPU via pynvml
[codecarbon INFO @ 20:14:40] [setup] CPU Tracking...
[codecarbon INFO @ 20:14:41] CPU Model on constant consumption mode: Intel(R) Xeon(R) CPU @ 2.20GHz
[codecarbon INFO @ 20:14:41] >>> Tracker's metadata:
[codecarbon INFO @ 20:14:41]   Platform system: Linux-5.15.107+-x86_64-with-glibc2.31
[codecarbon INFO @ 20:14:41]   Python version: 3.10.12
[codecarbon INFO @ 20:14:41]   CodeCarbon version: 2.2.4
[codecarbon INFO @ 20:14:41]   Available RAM : 12.678 GB
[codecarbon INFO @ 20:14:41]   CPU count: 2
[codecarbon INFO @ 20:14:41]   CPU model: Intel(R) Xeon(R) CPU @ 2.20GHz
[codecarbon INFO @ 20:14:41]   GPU count: 1
[codecarbon INFO @ 20:14:41]   GPU model: 1 x Tesla T4

[1,000] Accuracy: 58.16%
[2,000] Accuracy: 58.08%
[3,000] Accuracy: 58.72%
[4,000] Accuracy: 59.56%
[5,000] Accuracy: 59.99%
[6,000] Accuracy: 59.46%
[7,000] Accuracy: 60.55%
[8,000] Accuracy: 61.30%
[9,000] Accuracy: 61.98%
[10,000] Accuracy: 62.32%
[11,000] Accuracy: 61.23%
[12,000] Accuracy: 60.97%
[13,000] Accuracy: 60.88%
[14,000] Accuracy: 60.97%
[15,000] Accuracy: 61.00%
[16,000] Accuracy: 61.25%


[codecarbon INFO @ 20:14:56] Energy consumed for RAM : 0.000020 kWh. RAM Power : 4.754396438598633 W
[codecarbon INFO @ 20:14:56] Energy consumed for all GPUs : 0.000044 kWh. Total GPU Power : 10.556000000000001 W
[codecarbon INFO @ 20:14:56] Energy consumed for all CPUs : 0.000177 kWh. Total CPU Power : 42.5 W
[codecarbon INFO @ 20:14:56] 0.000241 kWh of electricity used since the beginning.


[17,000] Accuracy: 62.22%
[18,000] Accuracy: 63.09%
[19,000] Accuracy: 63.80%


[codecarbon INFO @ 20:14:59] Energy consumed for RAM : 0.000024 kWh. RAM Power : 4.754396438598633 W
[codecarbon INFO @ 20:14:59] Energy consumed for all GPUs : 0.000053 kWh. Total GPU Power : 10.456 W
[codecarbon INFO @ 20:14:59] Energy consumed for all CPUs : 0.000213 kWh. Total CPU Power : 42.5 W
[codecarbon INFO @ 20:14:59] 0.000290 kWh of electricity used since the beginning.


[20,000] Accuracy: 64.41%


6.89409982708149e-07

## Hoeffding Tree

In [13]:
from river.tree import HoeffdingTreeClassifier

tracker = EmissionsTracker()
tracker.start()

model = HoeffdingTreeClassifier(nominal_attributes=['elevel', 'car', 'zipcode'])
metrics = Metrics(metrics=[Accuracy()])
stream = iter_pandas(X=data[features], y=data['class'])

progressive_val_score(dataset=stream,
                      model=model,
                      metric=metrics,
                      print_every=1000)

tracker.stop()

[codecarbon INFO @ 20:14:59] [setup] RAM Tracking...
[codecarbon INFO @ 20:14:59] [setup] GPU Tracking...
[codecarbon INFO @ 20:14:59] Tracking Nvidia GPU via pynvml
[codecarbon INFO @ 20:14:59] [setup] CPU Tracking...
[codecarbon INFO @ 20:15:01] CPU Model on constant consumption mode: Intel(R) Xeon(R) CPU @ 2.20GHz
[codecarbon INFO @ 20:15:01] >>> Tracker's metadata:
[codecarbon INFO @ 20:15:01]   Platform system: Linux-5.15.107+-x86_64-with-glibc2.31
[codecarbon INFO @ 20:15:01]   Python version: 3.10.12
[codecarbon INFO @ 20:15:01]   CodeCarbon version: 2.2.4
[codecarbon INFO @ 20:15:01]   Available RAM : 12.678 GB
[codecarbon INFO @ 20:15:01]   CPU count: 2
[codecarbon INFO @ 20:15:01]   CPU model: Intel(R) Xeon(R) CPU @ 2.20GHz
[codecarbon INFO @ 20:15:01]   GPU count: 1
[codecarbon INFO @ 20:15:01]   GPU model: 1 x Tesla T4

[1,000] Accuracy: 82.18%
[2,000] Accuracy: 82.79%
[3,000] Accuracy: 84.63%
[4,000] Accuracy: 86.27%
[5,000] Accuracy: 87.08%
[6,000] Accuracy: 80.76%
[7,000] Accuracy: 76.87%
[8,000] Accuracy: 74.67%
[9,000] Accuracy: 74.14%
[10,000] Accuracy: 74.41%
[11,000] Accuracy: 73.54%
[12,000] Accuracy: 73.48%
[13,000] Accuracy: 73.84%
[14,000] Accuracy: 74.56%
[15,000] Accuracy: 75.55%
[16,000] Accuracy: 74.16%
[17,000] Accuracy: 73.17%
[18,000] Accuracy: 72.73%
[19,000] Accuracy: 72.42%


[codecarbon INFO @ 20:15:06] Energy consumed for RAM : 0.000007 kWh. RAM Power : 4.754396438598633 W
[codecarbon INFO @ 20:15:06] Energy consumed for all GPUs : 0.000015 kWh. Total GPU Power : 10.357000000000001 W
[codecarbon INFO @ 20:15:06] Energy consumed for all CPUs : 0.000062 kWh. Total CPU Power : 42.5 W
[codecarbon INFO @ 20:15:06] 0.000085 kWh of electricity used since the beginning.


[20,000] Accuracy: 72.28%


2.0104779522657926e-07

## Hoeffding Adaptive Tree

In [14]:
from river.tree import HoeffdingAdaptiveTreeClassifier

tracker = EmissionsTracker()
tracker.start()

model = HoeffdingAdaptiveTreeClassifier(nominal_attributes=['elevel', 'car', 'zipcode'], seed=42)
metrics = Metrics(metrics=[Accuracy()])
stream = iter_pandas(X=data[features], y=data['class'])

progressive_val_score(dataset=stream,
                      model=model,
                      metric=metrics,
                      print_every=1000)

tracker.stop()

[codecarbon INFO @ 20:15:06] [setup] RAM Tracking...
[codecarbon INFO @ 20:15:06] [setup] GPU Tracking...
[codecarbon INFO @ 20:15:06] Tracking Nvidia GPU via pynvml
[codecarbon INFO @ 20:15:06] [setup] CPU Tracking...
[codecarbon INFO @ 20:15:08] CPU Model on constant consumption mode: Intel(R) Xeon(R) CPU @ 2.20GHz
[codecarbon INFO @ 20:15:08] >>> Tracker's metadata:
[codecarbon INFO @ 20:15:08]   Platform system: Linux-5.15.107+-x86_64-with-glibc2.31
[codecarbon INFO @ 20:15:08]   Python version: 3.10.12
[codecarbon INFO @ 20:15:08]   CodeCarbon version: 2.2.4
[codecarbon INFO @ 20:15:08]   Available RAM : 12.678 GB
[codecarbon INFO @ 20:15:08]   CPU count: 2
[codecarbon INFO @ 20:15:08]   CPU model: Intel(R) Xeon(R) CPU @ 2.20GHz
[codecarbon INFO @ 20:15:08]   GPU count: 1
[codecarbon INFO @ 20:15:08]   GPU model: 1 x Tesla T4

[1,000] Accuracy: 84.38%
[2,000] Accuracy: 87.84%
[3,000] Accuracy: 89.03%
[4,000] Accuracy: 90.30%
[5,000] Accuracy: 90.74%
[6,000] Accuracy: 84.38%
[7,000] Accuracy: 81.33%
[8,000] Accuracy: 79.51%
[9,000] Accuracy: 78.25%
[10,000] Accuracy: 77.10%
[11,000] Accuracy: 75.24%
[12,000] Accuracy: 74.58%
[13,000] Accuracy: 75.38%
[14,000] Accuracy: 76.71%
[15,000] Accuracy: 77.75%
[16,000] Accuracy: 76.36%
[17,000] Accuracy: 76.38%
[18,000] Accuracy: 76.60%
[19,000] Accuracy: 76.79%


[codecarbon INFO @ 20:15:15] Energy consumed for RAM : 0.000009 kWh. RAM Power : 4.754396438598633 W
[codecarbon INFO @ 20:15:15] Energy consumed for all GPUs : 0.000020 kWh. Total GPU Power : 10.258000000000001 W
[codecarbon INFO @ 20:15:15] Energy consumed for all CPUs : 0.000081 kWh. Total CPU Power : 42.5 W
[codecarbon INFO @ 20:15:15] 0.000110 kWh of electricity used since the beginning.


[20,000] Accuracy: 76.91%


2.618318779121877e-07