# Session 13 - Measuring environmental impact

In this session, we look at one way to measure the impact of our code on the world around us: approximating the *environmental impact* of our cultural data science footprint.

To do this, we're going to use the open-source software package *CodeCarbon*. You can find more information at the following links:

- CodeCarbon Website: [https://codecarbon.io/](https://codecarbon.io/)
- GitHub Repo: [https://github.com/mlco2/codecarbon](https://github.com/mlco2/codecarbon)
- Documentation: [https://mlco2.github.io/codecarbon/](https://mlco2.github.io/codecarbon/)

We will compare three methods for sentiment analysis: logistic regression, a simple neural network, and a HuggingFace transformer.

## Method 1 - Creating a tracker object

In [1]:
from codecarbon import EmissionsTracker

tracker = EmissionsTracker(project_name="sum_tracker_object")
tracker.start()
sum(1 for _ in range(1_000_000_000))  # run code
tracker.stop()

[codecarbon INFO @ 11:23:32] [setup] RAM Tracking...
[codecarbon INFO @ 11:23:32] [setup] CPU Tracking...
 Linux OS detected: Please ensure RAPL files exist at \sys\class\powercap\intel-rapl to measure CPU

[codecarbon INFO @ 11:23:33] CPU Model on constant consumption mode: AMD Ryzen 7 PRO 6850U with Radeon Graphics
[codecarbon INFO @ 11:23:33] [setup] GPU Tracking...
[codecarbon INFO @ 11:23:33] No GPU found.
[codecarbon INFO @ 11:23:33] >>> Tracker's metadata:
[codecarbon INFO @ 11:23:33]   Platform system: Linux-6.8.0-59-generic-x86_64-with-glibc2.39
[codecarbon INFO @ 11:23:33]   Python version: 3.12.3
[codecarbon INFO @ 11:23:33]   CodeCarbon version: 2.8.3
[codecarbon INFO @ 11:23:33]   Available RAM : 14.400 GB
[codecarbon INFO @ 11:23:33]   CPU count: 16
[codecarbon INFO @ 11:23:33]   CPU model: AMD Ryzen 7 PRO 6850U with Radeon Graphics
[codecarbon INFO @ 11:23:33]   GPU count: None
[codecarbon INFO @ 11:23:33]   GPU model: None
[codecarbon INFO @ 11:23:35] Saving emissions d

4.5250663138548104e-05
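
The bare number printed above is the return value of `tracker.stop()`, which CodeCarbon reports in kilograms of CO2-equivalent. Values this small are easier to read in grams or milligrams. The helper below is a minimal sketch of that conversion; `describe_emissions` is a hypothetical name used for illustration and is not part of CodeCarbon.

```python
def describe_emissions(kg_co2eq):
    """Format an emissions estimate given in kg CO2eq using a readable unit."""
    grams = kg_co2eq * 1_000
    if grams >= 1:
        return f"{grams:.2f} g CO2eq"
    return f"{grams * 1_000:.2f} mg CO2eq"


# Value returned by tracker.stop() in the run above
summary = describe_emissions(4.5250663138548104e-05)
print(summary)
```

For the run above this gives roughly 45 milligrams of CO2-equivalent.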

## Method 2 - Context manager

In [3]:
with EmissionsTracker(project_name="sum_context_manager") as tracker:
    sum(1 for _ in range(1_000_000_000))  # run code

[codecarbon INFO @ 11:24:57] [setup] RAM Tracking...
[codecarbon INFO @ 11:24:57] [setup] CPU Tracking...
 Linux OS detected: Please ensure RAPL files exist at \sys\class\powercap\intel-rapl to measure CPU

[codecarbon INFO @ 11:24:58] CPU Model on constant consumption mode: AMD Ryzen 7 PRO 6850U with Radeon Graphics
[codecarbon INFO @ 11:24:58] [setup] GPU Tracking...
[codecarbon INFO @ 11:24:58] No GPU found.
[codecarbon INFO @ 11:24:58] >>> Tracker's metadata:
[codecarbon INFO @ 11:24:58]   Platform system: Linux-6.8.0-59-generic-x86_64-with-glibc2.39
[codecarbon INFO @ 11:24:58]   Python version: 3.12.3
[codecarbon INFO @ 11:24:58]   CodeCarbon version: 2.8.3
[codecarbon INFO @ 11:24:58]   Available RAM : 14.400 GB
[codecarbon INFO @ 11:24:58]   CPU count: 16
[codecarbon INFO @ 11:24:58]   CPU model: AMD Ryzen 7 PRO 6850U with Radeon Graphics
[codecarbon INFO @ 11:24:58]   GPU count: None
[codecarbon INFO @ 11:24:58]   GPU model: None
[codecarbon INFO @ 11:25:01] Saving emissions d

## Method 3 - A Python decorator



In [4]:
from codecarbon import track_emissions


@track_emissions(project_name="sum_decorator")
def foo():
    sum(1 for _ in range(1_000_000_000))  # run code

In [5]:
foo()

[codecarbon INFO @ 11:25:24] [setup] RAM Tracking...
[codecarbon INFO @ 11:25:24] [setup] CPU Tracking...
 Linux OS detected: Please ensure RAPL files exist at \sys\class\powercap\intel-rapl to measure CPU

[codecarbon INFO @ 11:25:25] CPU Model on constant consumption mode: AMD Ryzen 7 PRO 6850U with Radeon Graphics
[codecarbon INFO @ 11:25:25] [setup] GPU Tracking...
[codecarbon INFO @ 11:25:25] No GPU found.
[codecarbon INFO @ 11:25:25] >>> Tracker's metadata:
[codecarbon INFO @ 11:25:25]   Platform system: Linux-6.8.0-59-generic-x86_64-with-glibc2.39
[codecarbon INFO @ 11:25:25]   Python version: 3.12.3
[codecarbon INFO @ 11:25:25]   CodeCarbon version: 2.8.3
[codecarbon INFO @ 11:25:25]   Available RAM : 14.400 GB
[codecarbon INFO @ 11:25:25]   CPU count: 16
[codecarbon INFO @ 11:25:25]   CPU model: AMD Ryzen 7 PRO 6850U with Radeon Graphics
[codecarbon INFO @ 11:25:25]   GPU count: None
[codecarbon INFO @ 11:25:25]   GPU model: None
[codecarbon INFO @ 11:25:29] Saving emissions d
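
Conceptually, a tracking decorator wraps the function so that measurement starts before the call and stops after it, much like the context manager in Method 2. The sketch below illustrates that pattern with a stand-in that records wall-clock time instead of energy. `track_duration` is a hypothetical name, and this is not a description of how CodeCarbon implements `track_emissions` internally.

```python
import functools
import time


def track_duration(project_name):
    """Hypothetical stand-in for a tracking decorator: times each call."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # runs even if func raises, like a tracker stopped in `finally`
                elapsed = time.perf_counter() - start
                wrapper.durations.append((project_name, elapsed))
        wrapper.durations = []  # one (project_name, seconds) entry per call
        return wrapper
    return decorator


@track_duration(project_name="sum_decorator")
def foo():
    return sum(1 for _ in range(100_000))


result = foo()
print(foo.durations)
```

The `try`/`finally` is the important design choice: the measurement is recorded even when the wrapped function fails partway through.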

## Inspecting output
By default, CodeCarbon writes its results to a file named `emissions.csv` in the working directory, which we can inspect.

In [2]:
import pandas as pd

pd.read_csv("emissions.csv")

FileNotFoundError: [Errno 2] No such file or directory: 'emissions.csv'
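
The `FileNotFoundError` above means `emissions.csv` was not in the notebook's working directory when the cell ran, for example because no tracker had saved its results there yet (an assumption; the source does not say why). A minimal sketch of a more forgiving loader follows. It uses the standard-library `csv` module rather than pandas only to stay dependency-free, and `load_emissions` is a hypothetical helper name.

```python
import csv
from pathlib import Path


def load_emissions(path="emissions.csv"):
    """Return the rows of a CodeCarbon CSV as dicts, or None if the file is missing."""
    csv_path = Path(path)
    if not csv_path.exists():
        print(f"{csv_path.resolve()} does not exist; run a tracker first.")
        return None
    with csv_path.open(newline="") as handle:
        return list(csv.DictReader(handle))


rows = load_emissions("emissions.csv")
if rows is not None:
    print(f"Loaded {len(rows)} tracked runs")
```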

## Exercise
Compare different methods of sentiment analysis of IMDB movie reviews.
1. Locate and load the IMDB dataset on UCloud.
2. Train and test a logistic regression classifier. See notebook from class 4.
3. Train and test a neural network classifier. See notebook from class 5.
4. Test a HuggingFace transformer model for sentiment analysis. See notebook from class 11. NOTE: The input can be a bit too long for the transformer. You can truncate input like this: `pipeline(..., truncation=True)`.

Questions:
- What is the training cost of the two more "traditional" classifiers?
- What is the relative cost of inference between the classifiers?
- What is the trade-off between cost and performance?
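
One way to approach the last two questions is to express each classifier's inference emissions relative to the cheapest one and read that next to its accuracy. The sketch below shows the idea. The numbers are placeholders for illustration only, not measurements, and `relative_costs` is a hypothetical helper name.

```python
def relative_costs(emissions_kg):
    """Scale each model's emissions so that the cheapest model has cost 1.0."""
    baseline = min(emissions_kg.values())
    return {name: value / baseline for name, value in emissions_kg.items()}


# Placeholder numbers for illustration only; substitute your own measurements.
inference_emissions = {"model_a": 2e-08, "model_b": 5e-08, "model_c": 4e-05}
accuracy = {"model_a": 0.70, "model_b": 0.75, "model_c": 0.90}

costs = relative_costs(inference_emissions)
for name in inference_emissions:
    print(f"{name}: {costs[name]:,.1f}x the cheapest model, accuracy {accuracy[name]:.2f}")
```

Putting the two columns side by side makes it easier to judge whether a gain in accuracy is worth the additional emissions.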

NOTE: CodeCarbon offers more functionality than shown above, including splitting into subtasks and directing output to specific files. Feel free to experiment.

In [1]:
!pip install codecarbon scikit-learn transformers torch

Defaulting to user installation because normal site-packages is not writeable


In [2]:
from sklearn.model_selection import train_test_split
import pandas as pd
from codecarbon import EmissionsTracker

# track data loading
with EmissionsTracker(project_name="data_loading"):
    data = pd.read_csv("/work/data/imdb/IMDB Dataset.csv")

    X = data["review"]
    y = data["sentiment"]

    X_train, X_test, y_train, y_test = train_test_split(X,  # texts for the model
                                                        y,  # classification labels
                                                        test_size=0.05,  # create a 95/5 split
                                                        random_state=42)  # random state for reproducibility

[codecarbon INFO @ 12:14:32] [setup] RAM Tracking...
[codecarbon INFO @ 12:14:32] [setup] CPU Tracking...
 Linux OS detected: Please ensure RAPL files exist at /sys/class/powercap/intel-rapl/subsystem to measure CPU

[codecarbon INFO @ 12:14:33] CPU Model on constant consumption mode: AMD EPYC 7702 64-Core Processor
[codecarbon INFO @ 12:14:33] [setup] GPU Tracking...
[codecarbon INFO @ 12:14:33] No GPU found.
[codecarbon INFO @ 12:14:33] The below tracking methods have been set up:
                RAM Tracking Method: RAM power estimation model
                CPU Tracking Method: cpu_load
                GPU Tracking Method: Unspecified
            
[codecarbon INFO @ 12:14:33] >>> Tracker's metadata:
[codecarbon INFO @ 12:14:33]   Platform system: Linux-5.15.0-131-generic-x86_64-with-glibc2.39
[codecarbon INFO @ 12:14:33]   Python version: 3.12.3
[codecarbon INFO @ 12:14:33]   CodeCarbon version: 3.0.1
[codecarbon INFO @ 12:14:33]   Available RAM : 1007.294 GB
[codecarbon INFO @ 12:

In [3]:
from sklearn.feature_extraction.text import TfidfVectorizer

# vectorization
with EmissionsTracker(project_name="vectorization"):
    vectorizer = TfidfVectorizer(ngram_range=(1, 2),  # unigrams and bigrams (1 word and 2 word units)
                                 lowercase=True,  # why use lowercase?
                                 max_df=0.95,  # remove very common words
                                 min_df=0.05,  # remove very rare words
                                 max_features=250)  # keep only the top 250 features

    # first we fit to the training data...
    X_train_feats = vectorizer.fit_transform(X_train)

    #... then do it for our test data
    X_test_feats = vectorizer.transform(X_test)



[codecarbon INFO @ 12:14:38] [setup] RAM Tracking...
[codecarbon INFO @ 12:14:38] [setup] CPU Tracking...
 Linux OS detected: Please ensure RAPL files exist at /sys/class/powercap/intel-rapl/subsystem to measure CPU

[codecarbon INFO @ 12:14:40] CPU Model on constant consumption mode: AMD EPYC 7702 64-Core Processor
[codecarbon INFO @ 12:14:40] [setup] GPU Tracking...
[codecarbon INFO @ 12:14:40] No GPU found.
[codecarbon INFO @ 12:14:40] The below tracking methods have been set up:
                RAM Tracking Method: RAM power estimation model
                CPU Tracking Method: cpu_load
                GPU Tracking Method: Unspecified
            
[codecarbon INFO @ 12:14:40] >>> Tracker's metadata:
[codecarbon INFO @ 12:14:40]   Platform system: Linux-5.15.0-131-generic-x86_64-with-glibc2.39
[codecarbon INFO @ 12:14:40]   Python version: 3.12.3
[codecarbon INFO @ 12:14:40]   CodeCarbon version: 3.0.1
[codecarbon INFO @ 12:14:40]   Available RAM : 1007.294 GB
[codecarbon INFO @ 12:

In [4]:
from sklearn.metrics import classification_report
from sklearn.linear_model import LogisticRegression

with EmissionsTracker(project_name="logistic_regression_training"):
    # track training and inference of logistic regression, possibly with different configurations
    classifier = LogisticRegression(random_state=42).fit(X_train_feats, y_train)

with EmissionsTracker(project_name="logistic_regression_inference"):
    y_pred = classifier.predict(X_test_feats)

    classifier_metrics = classification_report(y_test, y_pred)
    print(classifier_metrics)

[codecarbon INFO @ 12:15:09] [setup] RAM Tracking...
[codecarbon INFO @ 12:15:09] [setup] CPU Tracking...
 Linux OS detected: Please ensure RAPL files exist at /sys/class/powercap/intel-rapl/subsystem to measure CPU

[codecarbon INFO @ 12:15:10] CPU Model on constant consumption mode: AMD EPYC 7702 64-Core Processor
[codecarbon INFO @ 12:15:10] [setup] GPU Tracking...
[codecarbon INFO @ 12:15:10] No GPU found.
[codecarbon INFO @ 12:15:10] The below tracking methods have been set up:
                RAM Tracking Method: RAM power estimation model
                CPU Tracking Method: cpu_load
                GPU Tracking Method: Unspecified
            
[codecarbon INFO @ 12:15:10] >>> Tracker's metadata:
[codecarbon INFO @ 12:15:10]   Platform system: Linux-5.15.0-131-generic-x86_64-with-glibc2.39
[codecarbon INFO @ 12:15:10]   Python version: 3.12.3
[codecarbon INFO @ 12:15:10]   CodeCarbon version: 3.0.1
[codecarbon INFO @ 12:15:10]   Available RAM : 1007.294 GB
[codecarbon INFO @ 12:

              precision    recall  f1-score   support

    negative       0.78      0.77      0.77      1239
    positive       0.78      0.78      0.78      1261

    accuracy                           0.78      2500
   macro avg       0.78      0.78      0.78      2500
weighted avg       0.78      0.78      0.78      2500



[codecarbon INFO @ 12:15:20] Delta energy consumed for CPU with cpu_load : 0.000006 kWh, power : 40.0 W
[codecarbon INFO @ 12:15:20] Energy consumed for All CPU : 0.000006 kWh
[codecarbon INFO @ 12:15:20] 0.000016 kWh of electricity used since the beginning.


In [5]:
from sklearn.neural_network import MLPClassifier

# track training and inference of a neural network, possibly with different configurations
with EmissionsTracker(project_name="neural_network_training"):
    classifier = MLPClassifier(activation="relu",
                               hidden_layer_sizes=(20,),
                               max_iter=1000,
                               random_state=42).fit(X_train_feats, y_train)

with EmissionsTracker(project_name="neural_network_inference"):
    y_pred = classifier.predict(X_test_feats)

    classifier_metrics = classification_report(y_test, y_pred)
    print(classifier_metrics)

[codecarbon INFO @ 12:15:20] [setup] RAM Tracking...
[codecarbon INFO @ 12:15:20] [setup] CPU Tracking...
 Linux OS detected: Please ensure RAPL files exist at /sys/class/powercap/intel-rapl/subsystem to measure CPU

[codecarbon INFO @ 12:15:22] CPU Model on constant consumption mode: AMD EPYC 7702 64-Core Processor
[codecarbon INFO @ 12:15:22] [setup] GPU Tracking...
[codecarbon INFO @ 12:15:22] No GPU found.
[codecarbon INFO @ 12:15:22] The below tracking methods have been set up:
                RAM Tracking Method: RAM power estimation model
                CPU Tracking Method: cpu_load
                GPU Tracking Method: Unspecified
            
[codecarbon INFO @ 12:15:22] >>> Tracker's metadata:
[codecarbon INFO @ 12:15:22]   Platform system: Linux-5.15.0-131-generic-x86_64-with-glibc2.39
[codecarbon INFO @ 12:15:22]   Python version: 3.12.3
[codecarbon INFO @ 12:15:22]   CodeCarbon version: 3.0.1
[codecarbon INFO @ 12:15:22]   Available RAM : 1007.294 GB
[codecarbon INFO @ 12:

              precision    recall  f1-score   support

    negative       0.73      0.75      0.74      1239
    positive       0.75      0.73      0.74      1261

    accuracy                           0.74      2500
   macro avg       0.74      0.74      0.74      2500
weighted avg       0.74      0.74      0.74      2500



[codecarbon INFO @ 12:16:17] Delta energy consumed for CPU with cpu_load : 0.000006 kWh, power : 40.0 W
[codecarbon INFO @ 12:16:17] Energy consumed for All CPU : 0.000006 kWh
[codecarbon INFO @ 12:16:17] 0.000016 kWh of electricity used since the beginning.


In [None]:
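# track inference of a transformer model for sentiment analysis, possibly also alternative models

from transformers import pipeline

with EmissionsTracker(project_name="transformer_loading"):
    classifier = pipeline(task="sentiment-analysis", truncation=True)

with EmissionsTracker(project_name="transformer_inference"):
    results = classifier(X_test.to_list())

    # the pipeline returns dicts such as {"label": "POSITIVE", "score": ...};
    # lowercase the labels so they match the "positive"/"negative" labels in y_test
    # (assumes the default model's POSITIVE/NEGATIVE label names)
    y_pred = [result["label"].lower() for result in results]

    classifier_metrics = classification_report(y_test, y_pred)
    print(classifier_metrics)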


[codecarbon INFO @ 12:16:21] [setup] RAM Tracking...
[codecarbon INFO @ 12:16:21] [setup] CPU Tracking...
 Linux OS detected: Please ensure RAPL files exist at /sys/class/powercap/intel-rapl/subsystem to measure CPU

[codecarbon INFO @ 12:16:22] CPU Model on constant consumption mode: AMD EPYC 7702 64-Core Processor
[codecarbon INFO @ 12:16:22] [setup] GPU Tracking...
[codecarbon INFO @ 12:16:22] No GPU found.
[codecarbon INFO @ 12:16:22] The below tracking methods have been set up:
                RAM Tracking Method: RAM power estimation model
                CPU Tracking Method: cpu_load
                GPU Tracking Method: Unspecified
            
[codecarbon INFO @ 12:16:22] >>> Tracker's metadata:
[codecarbon INFO @ 12:16:22]   Platform system: Linux-5.15.0-131-generic-x86_64-with-glibc2.39
[codecarbon INFO @ 12:16:22]   Python version: 3.12.3
[codecarbon INFO @ 12:16:22]   CodeCarbon version: 3.0.1
[codecarbon INFO @ 12:16:22]   Available RAM : 1007.294 GB
[codecarbon INFO @ 12:

model.safetensors:   0%|          | 0.00/268M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/48.0 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

Device set to use cpu
[codecarbon INFO @ 12:16:30] Energy consumed for RAM : 0.000093 kWh. RAM Power : 70.0 W
[codecarbon INFO @ 12:16:31] Delta energy consumed for CPU with cpu_load : 0.000053 kWh, power : 40.0 W
[codecarbon INFO @ 12:16:31] Energy consumed for All CPU : 0.000053 kWh
[codecarbon INFO @ 12:16:31] 0.000146 kWh of electricity used since the beginning.
[codecarbon INFO @ 12:16:31] [setup] RAM Tracking...
[codecarbon INFO @ 12:16:31] [setup] CPU Tracking...
 Linux OS detected: Please ensure RAPL files exist at /sys/class/powercap/intel-rapl/subsystem to measure CPU

[codecarbon INFO @ 12:16:32] CPU Model on constant consumption mode: AMD EPYC 7702 64-Core Processor
[codecarbon INFO @ 12:16:32] [setup] GPU Tracking...
[codecarbon INFO @ 12:16:32] No GPU found.
[codecarbon INFO @ 12:16:32] The below tracking methods have been set up:
                RAM Tracking Method: RAM power estimation model
                CPU Tracking Method: cpu_load
                GPU Tracking Meth
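
The log lines for the transformer loading step above report power and energy together, and the two are linked by energy = power × time. As a sanity check, the sketch below converts each logged energy value back into a duration, assuming the power stayed constant over the interval (an assumption made here for the arithmetic). Both components should imply about the same elapsed time.

```python
JOULES_PER_KWH = 3_600_000  # 1 kWh = 1000 W * 3600 s


def implied_seconds(energy_kwh, power_watts):
    """Duration that yields `energy_kwh` at a constant `power_watts`."""
    return energy_kwh * JOULES_PER_KWH / power_watts


# Values copied from the transformer_loading log lines above
ram_seconds = implied_seconds(0.000093, 70.0)
cpu_seconds = implied_seconds(0.000053, 40.0)
total_kwh = 0.000093 + 0.000053

print(f"RAM: {ram_seconds:.1f} s, CPU: {cpu_seconds:.1f} s, total: {total_kwh:.6f} kWh")
```

Both estimates land just under five seconds, and the sum of the two energy values matches the 0.000146 kWh total in the log.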

In [None]:
# Bonus question if you finish quickly: What about generative models? You can start with generative models from HuggingFace (GPT-2, T5, etc.). You can also try to set up small LLMs like llama3 via llama_cpp. You can also use ollama, but it is less clear whether CodeCarbon captures all of the energy use when the model runs via an API rather than directly in Python code.

In [None]:
pd.read_csv("emissions.csv")

Unnamed: 0,timestamp,project_name,run_id,experiment_id,duration,emissions,emissions_rate,cpu_power,gpu_power,ram_power,...,cpu_count,cpu_model,gpu_count,gpu_model,longitude,latitude,ram_total_size,tracking_mode,on_cloud,pue
7,2025-05-12T11:46:54,logistic_regression_training,46d03c0d-0f03-4f47-8388-a5f0a4480307,5b0fa12a-3dd7-45bb-9766-cc326314d9f1,0.467834,9.409605e-07,2e-06,42.5,0.0,5.399928,...,16,AMD Ryzen 7 PRO 6850U with Radeon Graphics,,,10.1995,56.1897,14.399807,machine,N,1.0
8,2025-05-12T11:46:58,logistic_regression_inference,edf24fdc-d73a-48f8-bfa5-d09fdd7a3c70,5b0fa12a-3dd7-45bb-9766-cc326314d9f1,0.034353,6.680603e-08,2e-06,42.5,0.0,5.399928,...,16,AMD Ryzen 7 PRO 6850U with Radeon Graphics,,,10.1995,56.1897,14.399807,machine,N,1.0
9,2025-05-12T11:47:13,transformer_loading,da557ca9-e2e7-4141-8cb5-ca07dae6add0,5b0fa12a-3dd7-45bb-9766-cc326314d9f1,0.834587,1.681187e-06,2e-06,42.5,0.0,5.399928,...,16,AMD Ryzen 7 PRO 6850U with Radeon Graphics,,,10.1995,56.1897,14.399807,machine,N,1.0
10,2025-05-12T11:47:17,transformer_inference,e2a554e7-8e41-4341-95fb-0fc837f8a657,5b0fa12a-3dd7-45bb-9766-cc326314d9f1,0.003812,4.284286e-09,1e-06,42.5,0.0,5.399928,...,16,AMD Ryzen 7 PRO 6850U with Radeon Graphics,,,10.1995,56.1897,14.399807,machine,N,1.0
11,2025-05-12T11:47:34,transformer_loading,107897c6-03df-4f42-ac62-8b36218cf862,5b0fa12a-3dd7-45bb-9766-cc326314d9f1,0.257213,5.167259e-07,2e-06,42.5,0.0,5.399928,...,16,AMD Ryzen 7 PRO 6850U with Radeon Graphics,,,10.1995,56.1897,14.399807,machine,N,1.0
12,2025-05-12T11:47:40,transformer_inference,47c351b1-6260-4fe5-9d3d-e9c4fca8ebde,5b0fa12a-3dd7-45bb-9766-cc326314d9f1,1.028528,2.072657e-06,2e-06,42.5,0.0,5.399928,...,16,AMD Ryzen 7 PRO 6850U with Radeon Graphics,,,10.1995,56.1897,14.399807,machine,N,1.0
13,2025-05-12T11:48:00,transformer_loading,633d06b2-a743-40a7-a403-4ca5377fd9b6,5b0fa12a-3dd7-45bb-9766-cc326314d9f1,0.220433,4.416195e-07,2e-06,42.5,0.0,5.399928,...,16,AMD Ryzen 7 PRO 6850U with Radeon Graphics,,,10.1995,56.1897,14.399807,machine,N,1.0
14,2025-05-12T11:48:04,transformer_inference,d0fa935b-f864-43fe-9210-6c970b1ca1ad,5b0fa12a-3dd7-45bb-9766-cc326314d9f1,0.004722,6.794822e-09,1e-06,42.5,0.0,5.399928,...,16,AMD Ryzen 7 PRO 6850U with Radeon Graphics,,,10.1995,56.1897,14.399807,machine,N,1.0
15,2025-05-12T11:49:32,transformer_loading,ce7650d2-876e-499d-a6f7-bc03c68c47e3,5b0fa12a-3dd7-45bb-9766-cc326314d9f1,0.223287,4.48076e-07,2e-06,42.5,0.0,5.399928,...,16,AMD Ryzen 7 PRO 6850U with Radeon Graphics,,,10.1995,56.1897,14.399807,machine,N,1.0
16,2025-05-12T11:52:17,transformer_inference,34613f49-f19e-468f-8458-d8469b3991e1,5b0fa12a-3dd7-45bb-9766-cc326314d9f1,160.186538,0.000323191,2e-06,42.5,0.0,5.399928,...,16,AMD Ryzen 7 PRO 6850U with Radeon Graphics,,,10.1995,56.1897,14.399807,machine,N,1.0
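
The table above has one row per tracked run, with several project names repeated. The sketch below sums emissions per `project_name` and estimates the grid carbon intensity implied by each row, assuming energy is approximately (`cpu_power` + `ram_power`) × `duration` at constant power. The embedded rows are a subset copied from the table, and the standard-library `csv` module stands in for pandas only to keep the example self-contained.

```python
import csv
import io
from collections import defaultdict

# Subset of columns and rows copied from the table above
SAMPLE = """project_name,duration,emissions,cpu_power,ram_power
logistic_regression_training,0.467834,9.409605e-07,42.5,5.399928
logistic_regression_inference,0.034353,6.680603e-08,42.5,5.399928
transformer_loading,0.223287,4.48076e-07,42.5,5.399928
transformer_inference,160.186538,0.000323191,42.5,5.399928
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))

# Total emissions (kg CO2eq) per project
totals = defaultdict(float)
for row in rows:
    totals[row["project_name"]] += float(row["emissions"])


def implied_intensity(row):
    """kg CO2eq per kWh, assuming constant CPU + RAM power over the run."""
    watts = float(row["cpu_power"]) + float(row["ram_power"])
    kwh = watts * float(row["duration"]) / 3_600_000
    return float(row["emissions"]) / kwh


ratio = totals["transformer_inference"] / totals["logistic_regression_inference"]
print(f"Transformer inference emitted about {ratio:,.0f}x as much as logistic regression inference")
for row in rows:
    print(f'{row["project_name"]}: {implied_intensity(row):.3f} kg CO2eq/kWh')
```

For these rows the implied intensity comes out at roughly 0.15 kg CO2eq per kWh, which suggests that on this machine emissions scale almost directly with run time.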
