# CodeCarbon Tracking
This notebook is an example of how we can track the carbon impact of our work within the AI Lab.

We have several methods, depending on what we want to track.

We use a Python package called [CodeCarbon](https://github.com/mlco2/codecarbon) for this.

For more info on usage check their [documentation](https://mlco2.github.io/codecarbon/index.html).

## Input Parameters
For the full list of parameters, check the documentation. The ones below are worth knowing up front, since they influence the logging structure and Azure-specific results:
- `project_name`: Name of the project, defaults to `codecarbon`
- `experiment_id`: ID of the experiment
- `pue`: Power Usage Effectiveness of the data center where the experiment is being run. For Azure west-europe this is [1.185](https://azure.microsoft.com/en-us/blog/how-microsoft-measures-datacenter-water-and-energy-use-to-improve-azure-cloud-sustainability/).  
Note that you should only set this value if the code runs on Azure compute.
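
As a rough illustration of what `pue` represents: PUE is the ratio of total data center energy to the energy used by the IT equipment itself, so accounting for it scales the measured energy by that factor. The sketch below shows only this arithmetic. The helper name `apply_pue` and the 2.0 kWh input are hypothetical, and how CodeCarbon applies the value internally may differ.

```python
def apply_pue(it_energy_kwh: float, pue: float = 1.0) -> float:
    """Scale IT equipment energy by the data center's PUE (illustrative helper)."""
    if pue < 1.0:
        # By definition the facility cannot use less energy than its IT equipment
        raise ValueError("PUE cannot be below 1.0")
    return it_energy_kwh * pue


# PUE value for Azure west-europe as quoted above
AZURE_WEST_EUROPE_PUE = 1.185

# Hypothetical measurement: 2.0 kWh consumed by the hardware itself
total_kwh = apply_pue(2.0, AZURE_WEST_EUROPE_PUE)
print(f"{total_kwh:.3f} kWh")
```

With the default `pue` of 1.0 the measured energy is left unchanged, which is why the parameter should only be set when the data center value is actually known.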

### Tracking whole Notebook
To track the whole notebook, we start the tracker at the beginning of the notebook and stop it at the end.

We can also use this method for tracking 

In [None]:
# Codeblock where we import the EmissionsTracker and start the tracker.
# This should be placed at the beginning of the notebook if we want to track the whole notebook.
from codecarbon import EmissionsTracker
tracker = EmissionsTracker()
tracker.start()

In [2]:
# Codeblock where we stop the tracker.
# This should be placed at the end of the notebook if we want to track the whole notebook.
tracker.stop()

### Tracking multiple _small_ pieces of code individually
To track a small piece of code we use the _task manager_. 

This way CodeCarbon will track the emissions of each task. 

To avoid overhead, task results are not written to disk; you have to read them from the return value of `stop_task()`. 

If no task name is provided, CodeCarbon generates a UUID.



In [3]:
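from codecarbon import EmissionsTracker
from datasets import load_dataset


def build_model():
    # Placeholder for illustration: replace with your own model construction code
    return None


# Create the tracker before the try block so it always exists in the finally clause
tracker = EmissionsTracker(project_name="small_pieces", measure_power_secs=10)
try:
    # We start a specific task using the tracker
    tracker.start_task("load dataset")
    dataset = load_dataset("imdb", split="test")
    # We have to stop individual tasks; stop_task() returns the task's results
    imdb_emissions = tracker.stop_task()

    tracker.start_task("build model")
    model = build_model()
    model_emissions = tracker.stop_task()
finally:
    # We always stop the tracker itself, even if one of the tasks raised an exception
    _ = tracker.stop()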

Please note that you can't use task mode and normal mode at the same time, because `start_task` stops the scheduler to prevent it from interfering with the task measurement.
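
Since task results only live in the values returned by `stop_task()`, it can help to collect them and compare tasks side by side. The sketch below is a minimal example under the assumption that each returned object exposes `emissions` (kg CO2eq) and `duration` (seconds) attributes. The helper `summarize_tasks` is hypothetical, and the numbers are stand-in values rather than real measurements.

```python
from types import SimpleNamespace


def summarize_tasks(task_results):
    """Return (name, emissions_kg, duration_s) rows, largest emitter first."""
    rows = [(name, r.emissions, r.duration) for name, r in task_results.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Stand-in values for illustration only; in the notebook these would be
# the objects returned by tracker.stop_task()
task_results = {
    "load dataset": SimpleNamespace(emissions=1.2e-6, duration=4.0),
    "build model": SimpleNamespace(emissions=3.5e-6, duration=9.5),
}

for name, kg, secs in summarize_tasks(task_results):
    print(f"{name}: {kg:.2e} kg CO2eq in {secs:.1f} s")
```

In the cell above, the dictionary could be filled with `imdb_emissions` and `model_emissions` instead of the stand-in objects.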



### Tracking a specific codeblock
To track a specific codeblock we can use the `EmissionsTracker` as a context manager.

In [None]:
from codecarbon import EmissionsTracker

with EmissionsTracker() as tracker:
    # Compute intensive training code goes here
    print()

### Tracking a function
To track code wrapped within a function, we can apply the `@track_emissions` decorator to the function.

In [9]:
from codecarbon import track_emissions

@track_emissions
def training_loop():
    # Compute intensive training code goes here
    print()

## Examples
The following examples train a Logistic Regression model using scikit-learn.

### Using the Explicit Object
We can use this to simply start and stop a tracker object and track all code between start and stop.

In [4]:
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
import time

from codecarbon import EmissionsTracker

# We create some random data to train on
np.random.seed(42)  
X = np.random.rand(1000, 5)  
y = (np.sum(X, axis=1) > 2.5).astype(int) 

# Split the data into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize a simple logistic regression model (it is trained below, inside the tracked section)
model = LogisticRegression()

# Initialize and start the emissions tracker
tracker = EmissionsTracker(project_name="explicit_object")
tracker.start()

# Fit the model
model.fit(X_train, y_train)

# We sleep for 5 sec to actually have some results
time.sleep(5)

# Stop the emissions tracker and output the emissions data
emissions: float = tracker.stop()
print(f"Emissions during training: {emissions:.8f} kg CO2eq")


### Using the Context Manager

In [5]:
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
import time

from codecarbon import EmissionsTracker

# We create some random data to train on
np.random.seed(42)  
X = np.random.rand(1000, 5)  
y = (np.sum(X, axis=1) > 2.5).astype(int) 

# Split the data into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize a simple logistic regression model (it is trained below, inside the tracked block)
model = LogisticRegression()


with EmissionsTracker(project_name="context_manager") as tracker:
    # Fit the model
    model.fit(X_train, y_train)
    # We sleep for 5 sec to actually have some results
    time.sleep(5)
    

### Using the Decorator

In [6]:
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
import time

from codecarbon import track_emissions


@track_emissions(project_name="decorator")
def train_model():
    # We create some random data to train on
    np.random.seed(42)  
    X = np.random.rand(1000, 5)  
    y = (np.sum(X, axis=1) > 2.5).astype(int) 
    # Split the data into training and testing sets (80% train, 20% test)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Initialize and train a simple logistic regression model
    model = LogisticRegression()
    model.fit(X_train, y_train)
    # We sleep for 5 sec to actually have some results
    time.sleep(5)
    return model

model = train_model()

## Visualization
CodeCarbon comes with a Dash app that visualizes the emissions.

To host it locally, we execute the CLI command below:

`carbonboard --filepath="emissions.csv" --port=3333`
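
For a quick check without starting the dashboard, the CSV file can also be aggregated directly. The sketch below sums emissions per project using only the standard library. It assumes the file has `project_name` and `emissions` columns; the helper name `total_emissions_by_project` is hypothetical, so adjust the column names if your file differs.

```python
import csv
import os
from collections import defaultdict


def total_emissions_by_project(csv_file):
    """Sum the `emissions` column per `project_name` from an open CSV file."""
    totals = defaultdict(float)
    for row in csv.DictReader(csv_file):
        totals[row["project_name"]] += float(row["emissions"])
    return dict(totals)


# Only read the file if a tracker has already written it
if os.path.exists("emissions.csv"):
    with open("emissions.csv", newline="") as f:
        for project, kg in total_emissions_by_project(f).items():
            print(f"{project}: {kg:.8f} kg CO2eq")
```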