# Session 13 - Measuring environmental impact

In this session, we look at one way to measure the impact of our code on the world around us: approximating the *environmental impact* of our cultural data science footprint.

To do this, we're going to use the open-source software package *CodeCarbon*. You can find more information at the following links:

- CodeCarbon Website: [https://codecarbon.io/](https://codecarbon.io/)
- GitHub Repo: [https://github.com/mlco2/codecarbon](https://github.com/mlco2/codecarbon)
- Documentation: [https://mlco2.github.io/codecarbon/](https://mlco2.github.io/codecarbon/)

We will compare three methods for sentiment analysis: logistic regression, a simple neural network, and a HuggingFace transformer.

## Method 1 - Creating a tracker object

In [1]:
from codecarbon import EmissionsTracker

tracker = EmissionsTracker(project_name="sum_tracker_object")
tracker.start()
sum(1 for _ in range(1_000_000_000)) # run code
tracker.stop()

[codecarbon INFO @ 11:26:56] [setup] RAM Tracking...
[codecarbon INFO @ 11:26:56] [setup] CPU Tracking...
 Linux OS detected: Please ensure RAPL files exist at /sys/class/powercap/intel-rapl/subsystem to measure CPU

[codecarbon INFO @ 11:26:58] CPU Model on constant consumption mode: AMD EPYC 7702 64-Core Processor
[codecarbon INFO @ 11:26:58] [setup] GPU Tracking...
[codecarbon INFO @ 11:26:58] No GPU found.
[codecarbon INFO @ 11:26:58] The below tracking methods have been set up:
                RAM Tracking Method: RAM power estimation model
                CPU Tracking Method: cpu_load
                GPU Tracking Method: Unspecified
            
[codecarbon INFO @ 11:26:58] >>> Tracker's metadata:
[codecarbon INFO @ 11:26:58]   Platform system: Linux-5.15.0-131-generic-x86_64-with-glibc2.39
[codecarbon INFO @ 11:26:58]   Python version: 3.12.3
[codecarbon INFO @ 11:26:58]   CodeCarbon version: 3.0.1
[codecarbon INFO @ 11:26:58]   Available RAM : 1007.294 GB
[codecarbon INFO @ 11:

0.0001541341602596955

The number above is the return value of `tracker.stop()`: the estimated emissions of the tracked code in kilograms of CO₂-equivalent.

## Method 2 - Context manager

In [3]:
with EmissionsTracker(project_name="sum_context_manager") as tracker:
    sum(1 for _ in range(1_000_000_000)) # run code

[codecarbon INFO @ 11:24:57] [setup] RAM Tracking...
[codecarbon INFO @ 11:24:57] [setup] CPU Tracking...
 Linux OS detected: Please ensure RAPL files exist at \sys\class\powercap\intel-rapl to measure CPU

[codecarbon INFO @ 11:24:58] CPU Model on constant consumption mode: AMD Ryzen 7 PRO 6850U with Radeon Graphics
[codecarbon INFO @ 11:24:58] [setup] GPU Tracking...
[codecarbon INFO @ 11:24:58] No GPU found.
[codecarbon INFO @ 11:24:58] >>> Tracker's metadata:
[codecarbon INFO @ 11:24:58]   Platform system: Linux-6.8.0-59-generic-x86_64-with-glibc2.39
[codecarbon INFO @ 11:24:58]   Python version: 3.12.3
[codecarbon INFO @ 11:24:58]   CodeCarbon version: 2.8.3
[codecarbon INFO @ 11:24:58]   Available RAM : 14.400 GB
[codecarbon INFO @ 11:24:58]   CPU count: 16
[codecarbon INFO @ 11:24:58]   CPU model: AMD Ryzen 7 PRO 6850U with Radeon Graphics
[codecarbon INFO @ 11:24:58]   GPU count: None
[codecarbon INFO @ 11:24:58]   GPU model: None
[codecarbon INFO @ 11:25:01] Saving emissions d

## Method 3 - A Python decorator



In [4]:
from codecarbon import track_emissions

@track_emissions(project_name="sum_decorator")
def foo():
    sum(1 for _ in range(1_000_000_000)) # run code

In [5]:
foo()

[codecarbon INFO @ 11:25:24] [setup] RAM Tracking...
[codecarbon INFO @ 11:25:24] [setup] CPU Tracking...
 Linux OS detected: Please ensure RAPL files exist at \sys\class\powercap\intel-rapl to measure CPU

[codecarbon INFO @ 11:25:25] CPU Model on constant consumption mode: AMD Ryzen 7 PRO 6850U with Radeon Graphics
[codecarbon INFO @ 11:25:25] [setup] GPU Tracking...
[codecarbon INFO @ 11:25:25] No GPU found.
[codecarbon INFO @ 11:25:25] >>> Tracker's metadata:
[codecarbon INFO @ 11:25:25]   Platform system: Linux-6.8.0-59-generic-x86_64-with-glibc2.39
[codecarbon INFO @ 11:25:25]   Python version: 3.12.3
[codecarbon INFO @ 11:25:25]   CodeCarbon version: 2.8.3
[codecarbon INFO @ 11:25:25]   Available RAM : 14.400 GB
[codecarbon INFO @ 11:25:25]   CPU count: 16
[codecarbon INFO @ 11:25:25]   CPU model: AMD Ryzen 7 PRO 6850U with Radeon Graphics
[codecarbon INFO @ 11:25:25]   GPU count: None
[codecarbon INFO @ 11:25:25]   GPU model: None
[codecarbon INFO @ 11:25:29] Saving emissions d

## Inspecting output

By default, CodeCarbon writes its results to a file called `emissions.csv` in the current working directory, which we can inspect with pandas.

In [2]:
import pandas as pd
pd.read_csv("emissions.csv")

Unnamed: 0,timestamp,project_name,run_id,experiment_id,duration,emissions,emissions_rate,cpu_power,gpu_power,ram_power,...,cpu_count,cpu_model,gpu_count,gpu_model,longitude,latitude,ram_total_size,tracking_mode,on_cloud,pue
0,2025-05-12T11:27:36,sum_tracker_object,9c1481d6-1b43-4a91-8900-8b8c4f7e48e0,5b0fa12a-3dd7-45bb-9766-cc326314d9f1,35.016761,0.000154,4e-06,40.0,0.0,70.0,...,256,AMD EPYC 7702 64-Core Processor,,,9.9906,57.0288,1007.293625,machine,N,1.0
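
As a rough sanity check on the row above, the reported emissions can be related to the power and duration columns. The sketch below assumes that the `cpu_power` and `ram_power` values (in watts) stayed roughly constant over the whole run, which is only approximately true. The carbon intensity it prints is back-computed from the displayed (rounded) emissions and is not a value reported by CodeCarbon.

```python
# Values copied from the emissions.csv row shown above.
duration_s = 35.016761   # seconds
cpu_power_w = 40.0       # watts
ram_power_w = 70.0       # watts
gpu_power_w = 0.0        # watts (no GPU found)
emissions_kg = 0.000154  # kg CO2-equivalent, rounded as displayed

# Assumption: the power draw was constant for the whole run.
total_power_w = cpu_power_w + ram_power_w + gpu_power_w
energy_kwh = total_power_w * duration_s / 3_600_000  # watt-seconds (joules) -> kWh

# Back-computed for illustration: emissions per unit of energy.
implied_intensity_g_per_kwh = emissions_kg * 1000 / energy_kwh

print(f"Energy: {energy_kwh:.5f} kWh")
print(f"Implied carbon intensity: {implied_intensity_g_per_kwh:.0f} g CO2eq/kWh")
```

The general relationship is emissions = energy consumed × carbon intensity of the local electricity grid, so the same code run in a region with a different energy mix would report different emissions.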


## Exercise
Compare different methods of sentiment analysis of IMDB movie reviews.
1. Locate and load the IMDB dataset on UCloud.
2. Train and test a logistic regression classifier. See notebook from class 4.
3. Train and test a neural network classifier. See notebook from class 5.
4. Test a HuggingFace transformer model for sentiment analysis. See notebook from class 11. NOTE: Some reviews are too long for the transformer's input limit. You can truncate the input like this: `pipeline(..., truncation=True)`.

Questions:
- What is the training cost of the two more "traditional" classifiers?
- What is the relative cost of inference between the classifiers?
- What is the trade-off between cost and performance?
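
One way to approach the last two questions is to express each classifier's measured emissions as a multiple of the cheapest one and set that next to its accuracy. The helpers below are a minimal sketch: `relative_costs`, `summarise` and the dictionary layout are hypothetical names chosen for illustration, and the emissions and accuracy values have to come from your own tracked runs.

```python
def relative_costs(emissions_kg):
    """Express each model's emissions as a multiple of the cheapest model's."""
    if not emissions_kg:
        raise ValueError("need at least one measurement")
    baseline = min(emissions_kg.values())
    if baseline <= 0:
        raise ValueError("emissions must be positive")
    return {name: value / baseline for name, value in emissions_kg.items()}


def summarise(emissions_kg, accuracy):
    """Return (model, relative cost, accuracy) rows, cheapest model first."""
    costs = relative_costs(emissions_kg)
    rows = [(name, costs[name], accuracy[name]) for name in costs]
    return sorted(rows, key=lambda row: row[1])
```

Calling `summarise(inference_emissions, test_accuracy)` with two dictionaries keyed by model name gives one row per classifier, which makes it easy to see how much extra cost each gain in accuracy requires.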

NOTE: CodeCarbon offers more functionality than shown above, including splitting into subtasks and directing output to specific files. Feel free to experiment.
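
As a starting point for the subtask functionality mentioned in the note, the sketch below runs a list of steps as separate CodeCarbon tasks. It assumes the tracker provides `start_task(name)`, `stop_task()` and `stop()`, as `EmissionsTracker` does in the CodeCarbon documentation at the time of writing. `run_tasks` itself is a hypothetical helper, and the tracker is passed in as an argument so that the function does not depend on a particular configuration.

```python
def run_tasks(tracker, steps):
    """Run each (name, function) pair as its own tracked task.

    `tracker` is expected to behave like codecarbon.EmissionsTracker,
    i.e. to provide start_task(name), stop_task() and stop().
    Returns a dict mapping each task name to whatever stop_task() returned.
    """
    per_task = {}
    try:
        for name, func in steps:
            tracker.start_task(name)
            try:
                func()
            finally:
                # Always close the task, even if func() raised an exception.
                per_task[name] = tracker.stop_task()
    finally:
        # Shut the tracker down once all tasks are done (or one has failed).
        tracker.stop()
    return per_task
```

For example, `run_tasks(EmissionsTracker(project_name="imdb", output_file="imdb_emissions.csv"), [("load data", load_data), ("train", train)])` would track two hypothetical functions `load_data` and `train` separately. The `output_dir` and `output_file` constructor arguments control where the CSV file is written.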

In [3]:
# track data loading

import pandas as pd

data = pd.read_csv("/work/data/imdb/IMDB Dataset.csv")

In [4]:
data.head()

Unnamed: 0,review,sentiment
0,One of the other reviewers has mentioned that ...,positive
1,A wonderful little production. <br /><br />The...,positive
2,I thought this was a wonderful way to spend ti...,positive
3,Basically there's a family where a little boy ...,negative
4,"Petter Mattei's ""Love in the Time of Money"" is...",positive


In [None]:
# track training and inference of logistic regression, possibly with different configurations

In [None]:
# track training and inference of a neural network, possibly with different configurations

In [None]:
# track inference of a transformer model for sentiment analysis, possibly also alternative models

In [11]:
# Bonus question if you finish quickly: What about generative models? You can start with generative models from HuggingFace (GPT-2, T5, etc.). You can also try to set up small LLMs like llama3 via llama_cpp. You can also use ollama, but it is less clear whether codecarbon captures everything when the model runs via an API rather than directly in Python code.