
# Introduction to Python 🐍
**Pre-work notebook** for the course:  
**"Green Algorithms for Artificial Intelligence: Design and Implementation"**

> Goal: become comfortable reading, writing, and running basic Python with good efficiency practices (time, memory, and energy) before the course begins.



## What you'll find in this notebook
1. Environment setup
2. Running cells and getting help
3. Essential syntax: variables, types, and operations
4. Control structures
5. Functions and modules
6. Data structures (lists, tuples, dictionaries, sets)
7. File I/O with *pathlib*
8. NumPy and vectorization: foundations for efficient compute
9. Pandas: data manipulation
10. Quick visualization with Matplotlib
11. Performance & efficiency: **CodeCarbon** energy/emissions tracking
    - 11.5 Template: **Hugging Face Transformers** + **CodeCarbon** (quick fine-tuning on SST-2)
12. Practice exercises with solutions (at the end)


## 1) Environment setup
### How to open and run this notebook in **Google Colab**
1. Go to Google Colab.
2. **File → Upload notebook** and select this `.ipynb` file.
3. (Optional) **Runtime → Change runtime type** and choose **GPU** or **TPU** if needed.
4. Install dependencies in a cell, for example:
   ```python
   !pip -q install numpy pandas matplotlib codecarbon
   ```
5. Run the cells in order (**Runtime → Run all**) and inspect time & energy metrics via **CodeCarbon**.
6. To save changes in Drive: **File → Save a copy in Drive**.

> Tip: when measuring energy with CodeCarbon in Colab, avoid heavy tasks in other tabs to reduce measurement noise.



## 2) Running cells and getting help
- Run a cell with **Shift+Enter**.
- Use `help(obj)` or `obj?` for quick docs.


In [None]:
# Run this
x = 42
print("Hello, Python. x =", x)

In [None]:
# Quick help
help(len)  # or use len? in an interactive cell


## 3) Essential syntax: variables, types, and operations
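Python is **dynamically** and **strongly typed**: a name can be rebound to a value of any type, but values are not silently converted between unrelated types (for example, `"1" + 1` raises `TypeError`). Common types and operators:
- `int`, `float`, `bool`, `str`, and `NoneType` (the type of `None`)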
- Arithmetic operators: `+ - * / // % **`
- Comparison operators: `== != < <= > >=`
- Logical operators: `and or not`
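
A minimal sketch of what "dynamic" and "strong" typing look like in practice. The names `value`, `result`, and `total` are illustrative only.

```python
# Dynamic typing: the same name can be rebound to objects of different types.
value = 42
print(type(value).__name__)   # int
value = "forty-two"
print(type(value).__name__)   # str

# Strong typing: unrelated types are not silently converted.
try:
    result = "1" + 1
except TypeError as exc:
    result = None
    print("TypeError:", exc)

# An explicit conversion is required instead.
total = int("1") + 1
print(total)  # 2

# None is a value; its type is NoneType.
print(type(None).__name__)  # NoneType
```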


In [None]:
a = 10
b = 3
print("sum:", a + b)
print("power:", a ** b)
print("true division:", a / b)
print("integer division:", a // b)
print("modulo:", a % b)
print("a>b?:", a > b)

text = "Sustainable AI"
print(text.upper(), len(text))


## 4) Control structures
### Conditionals and loops


In [None]:
n = 7

if n % 2 == 0:
    print("even")
else:
    print("odd")

# Loops
for i in range(5):
    print("i:", i)

# while with condition
s = 0
k = 1
while k <= 5:
    s += k
    k += 1
s


## 5) Functions and modules
- Define functions with `def`.
- Document with *docstrings*.
- Import modules with `import`.
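
As a small illustration of the three points above, the sketch below defines a function with a docstring and imports a single name from a module. The function `hypotenuse` is a hypothetical example, not part of the course material.

```python
from math import sqrt  # import one name instead of the whole module


def hypotenuse(a: float, b: float) -> float:
    """Return the length of the hypotenuse of a right triangle."""
    return sqrt(a * a + b * b)


print(hypotenuse(3, 4))    # 5.0
print(hypotenuse.__doc__)  # the docstring is stored on the function object
```

`help(hypotenuse)` shows the same docstring together with the signature.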


In [None]:
def is_prime(n: int) -> bool:
    """Return True if n is prime, False otherwise."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    i = 3
    while i * i <= n:
        if n % i == 0:
            return False
        i += 2
    return True

[(k, is_prime(k)) for k in range(1, 20)]

In [None]:
import math
math.sqrt(144), math.pi


## 6) Data structures


In [None]:
# Lists (mutable)
nums = [3, 1, 4, 1, 5, 9]
nums.append(2)
nums.sort()
nums

In [None]:
# Tuples (immutable)
point = (10, 20)
point

In [None]:
# Dictionaries (key->value)
energy = {"CPU": 65.0, "GPU": 150.0, "TPU": 120.0}
energy["GPU"]

In [None]:
# Sets (no duplicates)
unique = {1, 2, 2, 3, 3, 3}
unique


## 7) File I/O with *pathlib*


In [None]:
from pathlib import Path

p = Path("demo.txt")
p.write_text("Hello file!\nThis is an example.")
print("Content:\n", p.read_text())

# Clean up
p.unlink()


## 8) NumPy and vectorization: foundations for efficient compute
- `numpy` offers **efficient arrays** and vectorized operations (avoid Python loops).
- Key for performance and energy: **less interpreter, more bulk operations**.


In [None]:
import numpy as np

a = np.arange(100000, dtype=np.float64)
b = np.arange(100000, 0, -1, dtype=np.float64)

# Vectorized operation
c = a * b + np.sin(a)
c[:5], c.shape, c.dtype

In [None]:
# Quick comparison (rough, single-run timing; not a rigorous benchmark)
import math
import time

import numpy as np

def python_loop(a_list, b_list):
    out = []
    append = out.append  # bind the method once to avoid repeated attribute lookups
    for x, y in zip(a_list, b_list):
        append(x*y + math.sin(x))
    return out

a_list = list(range(5000000))
b_list = list(range(5000000, 0, -1))

t0 = time.perf_counter()
out_list = python_loop(a_list, b_list)
t1 = time.perf_counter()

# NumPy, including the list -> array conversion (each list is converted once)
t2 = time.perf_counter()
a_np = np.array(a_list, dtype=np.float64)
b_np = np.array(b_list, dtype=np.float64)
out_np = a_np * b_np + np.sin(a_np)
t3 = time.perf_counter()

# NumPy on arrays that already exist, same size as the lists for a fair comparison
t4 = time.perf_counter()
out_np = a_np * b_np + np.sin(a_np)
t5 = time.perf_counter()

print("Python loop time:", round(t1 - t0, 4), "s")
print("NumPy time (with creation):", round(t3 - t2, 4), "s")
print("NumPy time (arrays already created):", round(t5 - t4, 4), "s")


## 9) Pandas: data manipulation


In [None]:
import pandas as pd

df = pd.DataFrame({
    "algorithm": ["A", "B", "C", "A", "B", "C"],
    "accuracy": [0.81, 0.85, 0.83, 0.82, 0.84, 0.86],
    "energy_wh": [5.1, 7.3, 6.0, 5.0, 7.1, 6.2],
    "time_s": [1.2, 2.1, 1.5, 1.1, 2.0, 1.6]
})
df.head()

In [None]:
# Aggregations
summary = df.groupby("algorithm").agg(accuracy_mean=("accuracy", "mean"),
                                      energy_mean=("energy_wh", "mean"),
                                      time_mean=("time_s", "mean"))
summary


## 10) Quick visualization with Matplotlib
> Tip: avoid hard-coding styles/colors when you just need a quick look.


In [None]:
import matplotlib.pyplot as plt

summary_sorted = summary.sort_values("accuracy_mean")
plt.figure()
plt.bar(summary_sorted.index, summary_sorted["accuracy_mean"])
plt.title("Mean accuracy per algorithm")
plt.xlabel("Algorithm")
plt.ylabel("Mean accuracy")
plt.show()


## 11) Performance and efficiency: time, memory, and **energy** with CodeCarbon
We'll measure **execution time**, **estimated energy consumption**, and **CO₂e emissions** using **[CodeCarbon](https://mlco2.github.io/codecarbon/)**.

**Key ideas:**
- **Measure before you optimize**: avoid blind changes.
- **Vectorize** when possible (less interpreter → less CPU → less energy).
- **Reduce copies** and pick proper dtypes (`float32` vs `float64`).
- **Be mindful of parallelism**: more threads or GPUs can finish sooner yet use more total energy, so check both against the latency target (SLA) you actually need to meet.
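
The point about dtypes and copies can be sketched as follows. The array size and variable names are arbitrary choices for illustration.

```python
import numpy as np

n = 1_000_000
x64 = np.ones(n, dtype=np.float64)
x32 = np.ones(n, dtype=np.float32)

# float32 stores each element in 4 bytes instead of 8.
print(x64.nbytes, x32.nbytes)  # 8000000 4000000

# `x32 * 2` allocates a new array; `*=` reuses the existing buffer.
y = x32 * 2   # new array
x32 *= 2      # in place, no extra allocation

# Basic slices are views: they share memory with the original array.
view = x64[:10]
print(np.shares_memory(view, x64))  # True
```

Note that `float32` halves memory traffic at the cost of precision, so it is worth checking that results stay within the accuracy your task needs.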



### 11.1) Setup: install CodeCarbon (Colab or local)
Run the cell below if `codecarbon` is not installed yet. In **Google Colab**, running this single cell is typically all the setup you need.


In [None]:
# Install CodeCarbon if needed (ignore if already installed).
# In some environments you may need to restart the kernel after installation.
try:
    import codecarbon  # noqa: F401
except ImportError:
    !pip install codecarbon


### 11.2) Measuring a simple experiment (pure Python vs NumPy)
We compare two approaches for the same task: a **Python loop** and a **vectorized NumPy** version.  
**CodeCarbon** will estimate energy and emissions. For portability, we use **offline** mode and set the country to **Spain (ESP)**.


**What this cell does**

Define two equivalent functions (Python loop vs NumPy) and measure **time**, **energy**, and **emissions** with CodeCarbon (offline, ESP).

In [None]:
from time import perf_counter
import numpy as np

from codecarbon import OfflineEmissionsTracker

def experiment_python(N=200_000):
    import math
    out = []
    append = out.append
    for i in range(N):
        append(i * (N - i) + math.sin(i))
    return sum(out)

def experiment_numpy(N=200_000):
    a = np.arange(N, dtype=np.float64)
    b = np.arange(N, 0, -1, dtype=np.float64)  # b[i] == N - i, matching the loop version
    c = a * b + np.sin(a)
    return float(np.sum(c))

def measure(func, *args, **kwargs):
    # Offline mode: no network lookup; the country code selects the grid carbon intensity (Spain).
    tracker = OfflineEmissionsTracker(country_iso_code="ESP")
    tracker.start()
    t0 = perf_counter()
    res = func(*args, **kwargs)
    t1 = perf_counter()
    emissions = tracker.stop()
    print(f"\nCarbon emissions from computation: {tracker.final_emissions * 1000:.4f} g CO2eq")
    print("\nDetailed emissions data:", tracker.final_emissions_data)
    energy_kwh = tracker.final_emissions_data.energy_consumed  # estimated energy
    return {
        "result": res,
        "time_s": t1 - t0,
        "energy_kwh": energy_kwh,
        "emissions_kgco2": emissions,
        "method": func.__name__,
    }

m_py = measure(experiment_python, 1500000)
m_np = measure(experiment_numpy, 1500000)
m_py, m_np

In [None]:
import pandas as pd
from math import isfinite

df_cc = pd.DataFrame([m_py, m_np])
cols = ["method", "time_s", "energy_kwh", "emissions_kgco2"]
for c in cols[1:]:
    df_cc[c] = df_cc[c].apply(lambda x: round(x, 6) if (isinstance(x, (int, float)) and isfinite(x)) else x)
df_cc[cols]


**Quick read:** on typical machines, the **vectorized NumPy** approach tends to be faster and consume less energy than an equivalent Python loop, leading to **lower emissions** for the same work.

> Note: numbers are **estimates** and depend on your hardware and system load. On GPU/TPU, also consider data movement overheads: speeding up doesn't always reduce total energy if the problem is very small.
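
Because single runs are noisy, one common way to get steadier timings is to repeat the measurement and look at the best and median values. The sketch below uses the standard-library `timeit` module. The `work` function is a hypothetical stand-in for your own code, and the repeat counts are arbitrary.

```python
import statistics
import timeit


def work():
    """Hypothetical workload used only for illustration."""
    return sum(i * i for i in range(10_000))


CALLS = 20  # calls per timing
# Take 5 independent timings, each covering CALLS calls to work().
timings = timeit.repeat(work, repeat=5, number=CALLS)

per_call = [t / CALLS for t in timings]
print("best  :", min(per_call), "s")
print("median:", statistics.median(per_call), "s")
```

The minimum approximates the cost with the least interference from other processes, while the spread between minimum and median gives a rough idea of how noisy the machine is.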



### 11.4) Good practices (recap)
1. **Profile** and use measurement tools (time + energy).
2. **Vectorize** or use native libraries (BLAS, FFT).
3. **Mind memory** (dtypes and copies).
4. **Batching** and lazy loading for large datasets.
5. **Document** trade-offs (time vs energy vs accuracy).
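
Point 4 (batching and lazy loading) can be sketched with a generator that yields one batch at a time, so the full dataset never has to sit in memory. The helper name `batched` and the toy input are assumptions for illustration.

```python
from itertools import islice


def batched(iterable, batch_size):
    """Yield lists of up to batch_size items without materializing the input."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# range() is lazy, so only one batch is held in memory at a time.
total = 0
n_batches = 0
for batch in batched(range(10), 4):
    total += sum(batch)
    n_batches += 1

print(n_batches, total)  # 3 45
```

On Python 3.12 and later, `itertools.batched` provides similar behavior (it yields tuples rather than lists).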
