# Session 12 - Measuring environmental impact

In this session, we're going to look at one way of measuring the impact that our code has on the world around us. Specifically, we'll see how to approximate the *environmental impact* of our cultural data science work.

To do this, we're going to use the open-source software package *CodeCarbon*. You can find more information at the following links:

- CodeCarbon Website: [https://codecarbon.io/](https://codecarbon.io/)
- GitHub Repo: [https://github.com/mlco2/codecarbon](https://github.com/mlco2/codecarbon)
- Documentation: [https://mlco2.github.io/codecarbon/](https://mlco2.github.io/codecarbon/)
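
As a rough intuition for what is being approximated: CodeCarbon's documentation describes the estimate as the energy a computation consumes multiplied by the carbon intensity of the electricity used. The sketch below shows only that arithmetic. The function name and the numbers are made up for illustration and are not CodeCarbon output.

```python
def estimate_emissions_kg(energy_kwh: float, intensity_g_per_kwh: float) -> float:
    """Estimate emissions in kg CO2eq from energy use and grid carbon intensity."""
    # grams CO2eq = kWh * (g CO2eq / kWh); divide by 1000 to get kilograms
    return energy_kwh * intensity_g_per_kwh / 1000.0


# Hypothetical values, chosen only to make the arithmetic easy to follow
example_energy_kwh = 0.05     # energy used by some computation
example_intensity = 400.0     # grams of CO2eq emitted per kWh of electricity

example_emissions_kg = estimate_emissions_kg(example_energy_kwh, example_intensity)
print(f"{example_emissions_kg:.3f} kg CO2eq")
```

The same computation run on a cleaner grid (a lower intensity value) gives a proportionally smaller estimate, which is why location matters for these measurements.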

We'll do some testing on HuggingFace pipelines.

## Testing HuggingFace pipelines

In [None]:
import os
from codecarbon import EmissionsTracker
from transformers import pipeline
import datasets
import pandas as pd
from tqdm.notebook import tqdm

__Text summarization pipeline__

You may remember from a couple of weeks ago that *text summarization* was quite a compute-intensive task. So let's see exactly how compute-intensive it is.

In [None]:
text = """In this work, we presented the Transformer, the first sequence transduction model based entirely on attention, replacing the recurrent layers most commonly used in encoder-decoder architectures with multi-headed self-attention. 
For translation tasks, the Transformer can be trained significantly faster than architectures based on recurrent or convolutional layers. 
On both WMT 2014 English-to-German and WMT 2014 English-to-French translation tasks, we achieve a new state of the art. 
In the former task our best model outperforms even all previously reported ensembles."""

In [None]:
summarizer = pipeline(task="summarization", 
                      min_length=10,
                      max_length=30)

There are a number of different ways that we can work with CodeCarbon, all of which are clearly explained in the documentation.

We'll go through each of them one at a time here.

## Method 1 - Creating a tracker object

In [None]:
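tracker = EmissionsTracker()
tracker.start()               # begin measuring
summary = summarizer(text)    # the code we want to measure
tracker.stop()                # stop measuring; returns the estimated emissions in kg CO2eq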

## Method 2 - Context manager

In [None]:
with EmissionsTracker() as tracker:
    summary = summarizer(text)
    print(summary)
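
One reason to prefer the context manager form is that cleanup happens even when the tracked code fails. The toy example below is a hypothetical stand-in written with the standard library, not CodeCarbon's implementation. It only records elapsed time, but it shows the same pattern: the measurement is stored even though the block raises an error.

```python
import time
from contextlib import contextmanager


@contextmanager
def toy_tracker(log):
    """Hypothetical stand-in for a tracker: stores elapsed seconds in `log`."""
    start = time.perf_counter()
    try:
        yield log
    finally:
        # Runs whether or not the tracked block raised an exception
        log["duration_s"] = time.perf_counter() - start


measurements = {}
try:
    with toy_tracker(measurements):
        sum(range(100_000))                      # stand-in for summarizer(text)
        raise RuntimeError("simulated failure")  # something goes wrong mid-run
except RuntimeError:
    pass

# The duration was still recorded, despite the error
print("duration_s" in measurements)
```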

## Method 3 - A Python decorator



In [None]:
from codecarbon import track_emissions

@track_emissions
def summarization(text):
    summary = summarizer(text)
    print(summary)

In [None]:
summarization(text)
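
If decorators are new to you, the toy example below shows the general pattern of wrapping a function so that something is measured on every call. It is a hypothetical stand-in that only times the call, not how `track_emissions` is implemented. Note that the wrapped function returns its result rather than only printing it, so the value can be reused later.

```python
import functools
import time


def toy_track(func):
    """Hypothetical stand-in for a tracking decorator: times every call."""
    @functools.wraps(func)  # keep the original function's name and docstring
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)  # pass the result through unchanged
        finally:
            wrapper.last_duration_s = time.perf_counter() - start
    wrapper.last_duration_s = None  # no call has been measured yet
    return wrapper


@toy_track
def word_count(text):
    """Count whitespace-separated words."""
    return len(text.split())


n_words = word_count("attention is all you need")
print(n_words, word_count.__name__, word_count.last_duration_s)
```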

## A more complex example

We can make the results more useful by changing the tracker parameters. The full list can be found at [https://mlco2.github.io/codecarbon/parameters.html](https://mlco2.github.io/codecarbon/parameters.html).

In the example that follows, we're going to download a HuggingFace dataset and a pretrained emotion classification model. 

We also introduce specific *tasks* to more clearly understand the impact of different parts of our code.

In [None]:
outfolder = os.path.join("..", "emissions")
os.makedirs(outfolder, exist_ok=True)  # don't fail if the folder already exists

tracker = EmissionsTracker(project_name="sentiment classification",
                           experiment_id="sentiment_classifier",
                           output_dir=outfolder,
                           output_file="emissions_sentiment.csv")

# tracking data downloading
tracker.start_task("load dataset")
dataset = datasets.load_dataset("imdb", 
                                split="test")
imdb_emissions = tracker.stop_task()

# tracking downloading and initializing model
tracker.start_task("build model")
classifier = pipeline(task="sentiment-analysis", 
                      model="cardiffnlp/twitter-roberta-base-emotion")
model_emissions = tracker.stop_task()

# tracking classification pipeline (first 1000 reviews, each cut to its first 100 characters)
tracker.start_task("run classification")
preds = []
for row in tqdm(dataset["text"][:1000]):
    preds.append(classifier(row[:100]))
classifier_emissions = tracker.stop_task()

tracker.stop()

__Inspecting the results__

In [None]:
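# path assumes the output_dir and output_file passed to the tracker above
emissions_df = pd.read_csv(os.path.join(outfolder, "emissions_sentiment.csv"))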

In [None]:
emissions_df.columns
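
Once the results are in a DataFrame, a couple of derived columns can make the numbers easier to read. The sketch below uses made-up rows rather than real measurements, and assumes the columns `duration` (seconds), `emissions` (kg CO2eq) and `energy_consumed` (kWh) as found in CodeCarbon's CSV output. Check `emissions_df.columns` for what your installed version actually writes.

```python
import io

import pandas as pd

# Made-up rows for illustration only; these are not real measurements
fake_csv = io.StringIO(
    "project_name,duration,emissions,energy_consumed\n"
    "sentiment classification,12.0,0.000004,0.00002\n"
    "sentiment classification,48.0,0.000016,0.00008\n"
)
example_df = pd.read_csv(fake_csv)

# Convert kg CO2eq to grams, which is easier to read for small runs
example_df["emissions_g"] = example_df["emissions"] * 1000

# Grams emitted per second of run time, for comparing runs of different length
example_df["g_per_second"] = example_df["emissions_g"] / example_df["duration"]

print(example_df[["duration", "emissions_g", "g_per_second"]])
```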

## Tasks

- Now that you have the basics down, head over and consider Assignment 5!