### Connecting to a runtime

Before you begin, connect to a GPU-enabled Python runtime.

#### Install required dependencies

In [None]:
!apt-get update
!apt-get install -y iputils-ping
!apt-get install -y traceroute

#### Import required libraries

In [None]:
from google.colab import files
import torch
import time
import requests

#### Confirm that your runtime is GPU-supported

Observe the GPU type, its power draw and its maximum power.

In [None]:
!nvidia-smi

Next, get the CPU information.

In [None]:
!lscpu | grep 'Model name'

Look up the typical power range for this CPU model (for example, its rated TDP). We will use this information later to estimate power consumption.

### 1. GPU cold start energy consumption

Cloud GPU nodes may be "sleeping". Powering them up wastes energy and time. Local GPUs don't have this issue.

Run the following code and observe the difference in response time.

In [None]:
t0 = time.time()
torch.cuda.is_available()
t1 = time.time()

print(f"Initialization time: {t1-t0:.5f} s")

t0 = time.time()
torch.cuda.is_available()
t1 = time.time()

print(f"Second run time: {t1-t0:.5f} s")

### 2. CPU vs. GPU performance

However, using a GPU can significantly reduce execution time. Here, we will compare CPU and GPU performance on a matrix multiplication task.

In [None]:
for device in ["cpu", "cuda"]:
    if device == "cuda" and not torch.cuda.is_available():
        print("GPU unavailable, skipping.")
        continue

    device = torch.device(device)
    print(f"Using device: {device}")

    x = torch.rand(5000, 5000, device=device)

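    # CUDA kernels run asynchronously, so wait for pending GPU work
    # before reading the clock; otherwise the GPU time is underestimated.
    if device.type == "cuda":
        torch.cuda.synchronize()
    t0 = time.time()
    for i in range(10):
        x = x @ x
    if device.type == "cuda":
        torch.cuda.synchronize()
    t1 = time.time()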

    print(f"Computation time: {t1-t0:.5f} s")

What is the difference in performance? Write down your observations.

Based on the models of CPU and GPU used, find information about their power draw and estimate the energy consumption in kWh.
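As a minimal sketch of this estimate: energy is the average power draw multiplied by the run time, and 1 kWh equals 3.6 million joules. The function name `energy_kwh` and the example numbers (70 W for 12 s) are illustrative assumptions, not measurements. Substitute the power figures you looked up and the times you measured.

```python
def energy_kwh(power_w: float, duration_s: float) -> float:
    """Energy in kWh for a constant power draw (W) over a duration (s)."""
    joules = power_w * duration_s      # watts * seconds = joules
    return joules / 3_600_000.0        # 1 kWh = 3.6e6 J

# Hypothetical example: a device drawing 70 W for 12 s
example = energy_kwh(70.0, 12.0)
print(f"Estimated energy: {example:.6f} kWh")
```

This assumes the device runs at a constant power for the whole computation, which is only a rough approximation of real hardware behavior.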

### 3. Colab cloud VM location

Running processes on a cloud instance using a GPU can significantly decrease the computation time. However, the carbon emissions depend largely on the geolocation of the server where the virtual machine (VM) is running.

The geolocation of the Google Cloud VM your notebook is running on can be determined by obtaining its IP address.

Get the external IP address of the Colab server instance.

In [None]:
!curl -s https://ipinfo.io

### 4. Carbon intensity of cloud compute

First, print the location of the Colab VM again.

In [None]:
loc = requests.get("https://ipinfo.io").json()
# Use single quotes inside the f-string so it also parses on Python < 3.12
print(f"Colab server region: {loc['country']}, {loc['region']}")

Now, we will investigate the carbon intensity of the region where the Colab VM is running. To do so, follow these steps:

1. Open https://app.electricitymaps.com/settings/api-access, copy your Test API Key and paste it below. You need to register first to get a key.

2. Go to https://app.electricitymaps.com/map/live/fifteen_minutes and find the exact region your VM is located in.

3. Go to https://app.electricitymaps.com/developer-hub/playground and search for that region in the Region dropdown menu. After you select it, it will appear as the zone parameter in the Request string. Paste it into the url variable below.

In [None]:
token = "YOUR_TEST_API_KEY"  # paste your Electricity Maps Test API Key here
region = loc["country"]

# Replace the zone parameter with the one shown in the playground Request string
url = "https://api.electricitymaps.com/v3/carbon-intensity/latest?zone=US-NW-PACW&temporalGranularity=hourly&emissionFactorType=direct"
data = requests.get(url, headers={"auth-token": token}).json()

print(f"Carbon intensity for a Colab server in {region} is: {data['carbonIntensity']} g CO2e/kWh")

On https://app.electricitymaps.com/map/live/fifteen_minutes, investigate which sources the energy comes from and which of them are dominant. Compare the result to your own country's carbon intensity. Write down your observations.

For the previous matrix multiplication example, now calculate the carbon emissions in g CO2e.
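The calculation can be sketched as follows: emissions are the energy used (kWh) multiplied by the carbon intensity of the grid (g CO2e/kWh). The function name `emissions_g_co2e` and the example values are hypothetical placeholders; use your own energy estimate and the `carbonIntensity` value returned for your zone.

```python
def emissions_g_co2e(energy_kwh: float, carbon_intensity_g_per_kwh: float) -> float:
    """Carbon emissions in g CO2e for a given energy use and grid carbon intensity."""
    return energy_kwh * carbon_intensity_g_per_kwh

# Hypothetical example: 0.0002 kWh on a grid emitting 350 g CO2e/kWh
example_g = emissions_g_co2e(0.0002, 350.0)
print(f"Estimated emissions: {example_g:.4f} g CO2e")
```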

### 5. Data transfer speed

Cloud computing that occurs thousands of kilometers away also increases latency and the energy needed for data transfer.

Next, we will measure the data transfer latency and energy consumption depending on where the data is hosted.

For measuring data transfer, Colab cannot sniff packets, but we *can* measure effective throughput.

#### 5.1. Download the MNIST dataset to your computer

Open a terminal and run:

```
wget https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
```

The output will give you the length of the file in bytes and the time it takes to transfer the file from a remote server to your local machine. Write it down.

#### 5.2. Upload MNIST from your computer to Colab

Now, upload the `mnist.npz` file from your local machine to Colab and time it.

`files.upload()` prompts the user to upload files from their local machine to the runtime. It returns a dictionary of the uploaded files, keyed by file name, whose values are the uploaded data.

In [None]:
t0 = time.time()
uploaded = files.upload()
t1 = time.time()

print(f"Local->Colab upload time: {t1-t0:.2f} s")
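To turn the measured time into an effective throughput, divide the number of bytes transferred by the elapsed time. The sketch below assumes a dictionary shaped like the one `files.upload()` returns (file name mapped to bytes); the helper name `upload_throughput` and the dummy data are illustrative only. Keep in mind that the measured time likely also includes the time you spend in the file picker, so the result is a lower bound on the actual transfer rate.

```python
def upload_throughput(uploaded: dict, elapsed_s: float) -> float:
    """Effective throughput in MB/s for a files.upload()-style dictionary."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    total_bytes = sum(len(data) for data in uploaded.values())
    return total_bytes / 1_000_000 / elapsed_s

# Stand-in for the dictionary returned by files.upload(); the content is dummy bytes
fake_upload = {"mnist.npz": b"\x00" * 2_000_000}
rate = upload_throughput(fake_upload, elapsed_s=4.0)
print(f"Effective throughput: {rate:.2f} MB/s")
```

In the notebook you would call it as `upload_throughput(uploaded, t1 - t0)`. The same division applies to the byte count and time reported by `wget`.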

#### 5.3. Download the same file from Colab back to your machine

In [None]:
files.download('/content/mnist.npz')

The download time cannot be measured from the notebook. Can you still observe a difference between downloading the same file from its original location using `wget` and downloading it from Colab?



#### 5.4. Download MNIST directly to Colab

Now, download the same dataset to Colab using `wget`.
How long does it take, and how does transferring the file from remote cloud storage compare with uploading it from your local machine?


In [None]:
!wget -O /dev/null https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz

The output will give you the length of the file in bytes and the time it takes to transfer the file from a remote cloud storage to Colab.

If you were to run a machine learning model on Colab, where would you download the data from? Does it depend on the location where the data is stored?

Now, let's try to estimate the distance to the dataset location.

First, let's use ping to roughly estimate the distance from the Google Cloud VM where Colab is running to the Google Cloud Storage location where the MNIST dataset is stored.

CAUTION: Ping will only tell you:

* Packets transmitted and received
* Round-trip time (RTT)
* The IP address that responded
* Packet loss percentage
* Minimum, maximum, average, and standard deviation of the round-trip time

The IP address you get is not reliable for geolocation because Google Cloud Storage endpoints (including storage.googleapis.com) are served using anycast routing. This means that many servers around the world share the same IP address, your packets are routed to the nearest Google edge location, and the IP does not indicate the physical data center.

However, ping can help you roughly estimate proximity:

* RTT ~1–10 ms - likely the same region
* RTT ~30–80 ms - likely the same continent
* RTT >150 ms - probably cross-continent
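The rule of thumb above can be written as a small helper. This is only a sketch: the function name `classify_rtt` is made up for illustration, and because the listed ranges leave gaps (for example 10 to 30 ms), values that fall between them are reported as inconclusive rather than forced into a category.

```python
def classify_rtt(rtt_ms: float) -> str:
    """Map an average round-trip time (ms) to the rough proximity categories above."""
    if rtt_ms < 0:
        raise ValueError("RTT cannot be negative")
    if rtt_ms <= 10:
        return "likely the same region"
    if 30 <= rtt_ms <= 80:
        return "likely the same continent"
    if rtt_ms > 150:
        return "probably cross-continent"
    return "inconclusive"

# Hypothetical average RTT values in milliseconds
for rtt in (2.5, 45.0, 120.0, 210.0):
    print(f"{rtt:6.1f} ms -> {classify_rtt(rtt)}")
```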

In [None]:
!ping -c 5 storage.googleapis.com

What can you conclude based on the round-trip time?

Let's see how many hops exist between Google Cloud VM and Google Cloud Storage where MNIST is stored.

In [None]:
!traceroute storage.googleapis.com

Run the same command on your local machine. The last hop is the hop to the destination (Google Cloud Storage).

What latencies do you observe?

### 6. ML model training energy consumption

Finally, we will investigate the energy consumed when training an ML model on CPU and GPU.

We will measure the execution time and the CPU and GPU utilization and power consumption in order to estimate the carbon emissions.

Install the necessary dependencies first.

In [None]:
!pip install psutil
!pip install gcsfs

Load the necessary libraries.

In [None]:
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader

import gcsfs
import gc
import json
import pandas as pd
import numpy as np
import os
import requests
import re
import psutil

from sklearn.model_selection import train_test_split, KFold
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn import metrics
from sklearn.metrics import r2_score

import subprocess
import threading

import time
# timeit.default_timer is time.perf_counter, so a single import is enough
from time import perf_counter as timer

import warnings
warnings.filterwarnings("ignore")

#### Resource logger

Examine the class and describe what it does.

In [None]:
class ResourceLogger:
    def __init__(self, interval=0.2):
        self.interval = interval
        self.running = False
        self.data = []

    def get_gpu_stats(self):
        try:
            result = subprocess.run(
                ["nvidia-smi",
                 "--query-gpu=utilization.gpu,utilization.memory,power.draw",
                 "--format=csv,noheader,nounits"],
                stdout=subprocess.PIPE, text=True
            ).stdout.strip()

            gpu_util, mem_util, power = map(float, re.split(r",\s*", result))
            return gpu_util, mem_util, power
        except Exception:
            # nvidia-smi is missing or its output could not be parsed (e.g. no GPU)
            return None, None, None

    def sample(self):
        while self.running:
            cpu = psutil.cpu_percent(interval=None)
            gpu_util, mem_util, power = self.get_gpu_stats()
            t = time.time()

            self.data.append({
                "timestamp": t,
                "cpu_util": cpu,
                "gpu_util": gpu_util,
                "gpu_mem": mem_util,
                "gpu_power_w": power
            })
            time.sleep(self.interval)

    def start(self):
        self.running = True
        self.thread = threading.Thread(target=self.sample)
        self.thread.start()

    def stop(self):
        self.running = False
        self.thread.join()

logger = ResourceLogger(interval=0.2)

#### Download and load the dataset

In [None]:
!wget -O YearPredictionMSD.txt.zip "https://archive.ics.uci.edu/ml/machine-learning-databases/00203/YearPredictionMSD.txt.zip"

In [None]:
local_url = "YearPredictionMSD.txt.zip"

print("Loading CSV...")
start = timer()
year = pd.read_csv(local_url, header=None)

load_time = timer() - start
print(f"Dataset load time: {load_time}s")

x = year.iloc[:, 1:].to_numpy(dtype=np.float32)
y = year.iloc[:, 0].to_numpy(dtype=np.float32)

del year
gc.collect()

Where do you think this dataset is located and why do you think it takes this amount of time to load it?

#### Preprocess and prepare the dataset

In [None]:
# Train-test split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.1, random_state=0)

# Normalization
scaler_x = MinMaxScaler()
scaler_y = StandardScaler()

scaler_x.fit(x_train)
x_train = scaler_x.transform(x_train)
x_test = scaler_x.transform(x_test)

scaler_y.fit(y_train.reshape(-1, 1))
y_train = scaler_y.transform(y_train.reshape(-1, 1)).ravel()
y_test = scaler_y.transform(y_test.reshape(-1, 1)).ravel()

#### PyTorch Linear Regression Model

In [None]:
class LinearRegressionModel(nn.Module):
    def __init__(self, in_features):
        super().__init__()
        self.linear = nn.Linear(in_features, 1, bias=True)

    def forward(self, x):
        return self.linear(x)

#### Training function

In [None]:
def train_lbfgs(X, y, device):
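    # Use float32 to match the model parameters; float16 inputs raise a dtype mismatch.
    # Reshape y to (N, 1) so it lines up with the model output in MSELoss.
    X = torch.tensor(X, dtype=torch.float32, device=device)
    y = torch.tensor(y, dtype=torch.float32, device=device).view(-1, 1)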

    model = LinearRegressionModel(X.shape[1]).to(device)
    criterion = nn.MSELoss()
    optimizer = torch.optim.LBFGS(model.parameters(), lr=0.5, max_iter=200, history_size=5)

    def closure():
        optimizer.zero_grad()
        pred = model(X)
        loss = criterion(pred, y)
        loss.backward()
        return loss

    optimizer.step(closure)
    return model

#### Helper scripts for energy use calculation

In [None]:
def gpu_power_to_energy(logs, power_key="gpu_power_w"):
    """
    logs: list of dicts with 'timestamp' and power_key
    Integrate power over time using the trapezoidal rule on the log intervals
    Returns energy_kwh, duration_s
    """
    if len(logs) < 2:
        return 0.0, 0.0

    energy_j = 0.0
    for i in range(1, len(logs)):
        t0 = logs[i-1]["timestamp"]
        t1 = logs[i]["timestamp"]
        dt = max(0.0, t1 - t0)
        # Treat missing readings (e.g. no GPU present) as 0 W
        p0 = logs[i-1].get(power_key) or 0.0
        p1 = logs[i].get(power_key) or 0.0
        # Integration approximation (trapezoidal integration)
        energy_j += 0.5 * (p0 + p1) * dt  # watts * seconds = joules

    duration = logs[-1]["timestamp"] - logs[0]["timestamp"]
    energy_kwh = energy_j / 3_600_000.0
    return energy_kwh, duration

def cpu_util_to_energy(logs, cpu_power_w):
    """
    logs: list of dicts with 'timestamp' and "cpu_util"
    Convert sampled CPU percent series to estimated energy assuming a host cpu_power_w
    Returns energy_kwh, duration_s
    """
    if len(logs) < 2:
        return 0.0, 0.0

    total_area = 0.0
    total_time = 0.0
    for i in range(1, len(logs)):
        t0 = logs[i-1]["timestamp"]
        t1 = logs[i]["timestamp"]
        dt = max(0.0, t1 - t0)
        p0 = logs[i-1].get("cpu_util") or 0.0
        p1 = logs[i].get("cpu_util") or 0.0
        # Integration approximation (trapezoidal integration)
        total_area += 0.5 * (p0 + p1) * dt
        total_time += dt

    avg_percent = (total_area / total_time) if total_time>0 else 0.0
    cpu_energy_j = cpu_power_w * (avg_percent/100.0) * total_time
    cpu_energy_kwh = cpu_energy_j / 3_600_000.0

    return cpu_energy_kwh, total_time
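A quick way to build confidence in the trapezoidal integration used above is to feed it synthetic samples whose energy is known in advance. The standalone sketch below uses a simplified helper, `trapezoid_energy_kwh`, which is a hypothetical name operating on `(timestamp, power)` tuples rather than the logger's dictionaries. A constant 100 W over 36 s should give 3600 J, which is 0.001 kWh.

```python
def trapezoid_energy_kwh(samples):
    """Integrate power over time with the trapezoidal rule.

    samples: list of (timestamp_s, power_w) tuples, sorted by timestamp.
    Returns the energy in kWh.
    """
    energy_j = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        energy_j += 0.5 * (p0 + p1) * (t1 - t0)  # watts * seconds = joules
    return energy_j / 3_600_000.0

# Synthetic check: constant 100 W sampled every 0.2 s from t = 0 s to t = 36 s
samples = [(i * 0.2, 100.0) for i in range(181)]
check = trapezoid_energy_kwh(samples)
print(f"Energy: {check:.6f} kWh (expected about 0.001 kWh)")
```

The same kind of check can be applied to `gpu_power_to_energy` by passing it a list of dictionaries with `timestamp` and `gpu_power_w` keys.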

#### Run training on CPU and GPU with utilization logging

In [None]:
# Estimate CPU power based on CPU model
cpu_power_w = 70.0

In [None]:
results = {}

for device in ["cpu", "cuda"]:
    if device == "cuda" and not torch.cuda.is_available():
        print("GPU unavailable, skipping.")
        continue
    print(f"Running on {device}...")

    # Convert to PyTorch tensors
    X_train = torch.tensor(x_train, dtype=torch.float32).to(device)
    Y_train = torch.tensor(y_train, dtype=torch.float32).view(-1, 1).to(device)

    X_test = torch.tensor(x_test, dtype=torch.float32).to(device)
    Y_test = torch.tensor(y_test, dtype=torch.float32).view(-1, 1).to(device)

    train_dataset = TensorDataset(X_train, Y_train)
    test_dataset = TensorDataset(X_test, Y_test)

    model = LinearRegressionModel(X_train.shape[1]).to(device)
    criterion = nn.MSELoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    # Training loop
    loader = DataLoader(train_dataset, batch_size=2048, shuffle=True)

    logger.data = []
    logger.start()

    start = timer()
    epochs = 10

    for epoch in range(epochs):
        for xb, yb in loader:
            optimizer.zero_grad()
            pred = model(xb)
            loss = criterion(pred, yb)
            loss.backward()
            optimizer.step()
    if device == "cuda":
        torch.cuda.synchronize()  # wait for queued GPU work so the timing is accurate
    train_time = timer() - start

    logger.stop()

    # Model testing
    model.eval()
    with torch.no_grad():
        preds = model(X_test)
        mse = metrics.mean_squared_error(
            Y_test.cpu().numpy(), preds.cpu().numpy()
        )

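    print(f"PyTorch {device} MSE: {mse} (training time: {train_time}s)")
    # The helpers return energy in kWh
    cpu_energy_kwh, cpu_duration = cpu_util_to_energy(logger.data, cpu_power_w)
    gpu_energy_kwh, gpu_duration = gpu_power_to_energy(logger.data, "gpu_power_w")
    print(f"CPU energy use: {cpu_energy_kwh} kWh")
    print(f"GPU energy use: {gpu_energy_kwh} kWh")
    # Keep the per-device numbers for the comparison afterwards
    results[device] = {"mse": mse, "train_time_s": train_time,
                       "cpu_energy_kwh": cpu_energy_kwh, "gpu_energy_kwh": gpu_energy_kwh}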

    del model, X_train, Y_train, X_test, Y_test
    del train_dataset, test_dataset, loader
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
        torch.cuda.ipc_collect()

Compare the results and comment on them. Use the energy usage information to calculate carbon emissions based on the previously determined VM location.