# Python & PyTorch Basics

**Recommended runtime:** Google Colab (GPU optional) or local JupyterLab.

- ✅ Python basics: syntax, containers, functions, files, iteration & higher-order functions
- ✅ PyTorch basics: Tensor, GPU device, Autograd, `nn.Module`, `DataLoader`, train/eval loops
- ✅ Example 1 (NLP): Bag-of-Words small example (from scratch)
- ✅ Example 2 (CV): FashionMNIST quick training (MLP; runs on CPU/GPU)
- ⛳ Optional: Tiny BERT fine-tuning (1 epoch) with `transformers`

> Note: cells try to **degrade gracefully**: if a tool is missing, the notebook prints alternatives or skips steps.  
> Last updated: 2025-11-12 10:12


## 0) Runtime Checks

In [1]:

import os, sys, platform
print("Python:", sys.version.split()[0])
print("Platform:", platform.platform())
IN_COLAB = 'COLAB_GPU' in os.environ or 'COLAB_RELEASE_TAG' in os.environ
print("In Colab:", IN_COLAB)
try:
    import torch
    print("PyTorch:", torch.__version__)
    print("CUDA available:", torch.cuda.is_available())
except Exception as e:
    print("PyTorch not found:", e)


Python: 3.12.12
Platform: Linux-6.6.105+-x86_64-with-glibc2.35
In Colab: True
PyTorch: 2.8.0+cu126
CUDA available: False


## 1) Requirements

In [2]:
%pip install -qq -U transformers datasets accelerate pyarrow==19

In [3]:
import os, wandb
from getpass import getpass

os.environ["WANDB_API_KEY"] = getpass("Enter your WANDB_API_KEY: ")
wandb.login()

Enter your WANDB_API_KEY: ··········


wandb: Currently logged in as: lanjinrao to https://api.wandb.ai. Use `wandb login --relogin` to force relogin


True

In [4]:

import numpy as np, pandas as pd, matplotlib.pyplot as plt
print("NumPy:", np.__version__)
print("Pandas:", pd.__version__)
print("Matplotlib:", plt.matplotlib.__version__)


NumPy: 2.0.2
Pandas: 2.2.2
Matplotlib: 3.10.0


## 2) Python Basics · Syntax / Containers / Functions / Files / Iteration
Covers numbers/strings, lists/tuples/dicts, functions/args, loops, file I/O, list comprehensions, and basic higher-order functions.

In [5]:

# Numbers, strings, lists/tuples/dicts
x = 5
y = 2.5
s = "Hello, Python!"
lst = [1, 2, 3]
tup = ('a', 1)
d = {'k': 3, 'v': 9}
print(x+y, s.upper(), lst, tup, d['k'])


7.5 HELLO, PYTHON! [1, 2, 3] ('a', 1) 3
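
As a small extension of the cell above, the sketch below shows a few everyday container operations: slicing, tuple unpacking, and safe dictionary access. The names mirror the cell; `last_two`, `missing`, and `pairs` are illustrative additions.

```python
# Illustrative values only; names mirror the cell above.
lst = [1, 2, 3]
tup = ('a', 1)
d = {'k': 3, 'v': 9}

last_two = lst[-2:]          # slicing returns a new list
letter, number = tup         # tuple unpacking assigns element by element
missing = d.get('z', 0)      # .get returns a default instead of raising KeyError
pairs = sorted(d.items())    # (key, value) tuples, sorted by key

print(last_two, letter, number, missing, pairs)
```

Lists are mutable and tuples are not, so `lst.append(4)` works while `tup` has no `append` method.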


In [6]:

# Functions & keyword-only args
def greet(name, title="Dr.", *, excited=False):
    msg = f"Hi, {title} {name}"
    return msg + "!!!" if excited else msg

print(greet("Smith"))
print(greet("Ada", title="Prof.", excited=True))


Hi, Dr. Smith
Hi, Prof. Ada!!!
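
The bare `*` in the signature makes `excited` keyword-only. A minimal sketch of what happens when it is passed positionally (the function is repeated so the snippet runs on its own; `error_message` is an illustrative name):

```python
def greet(name, title="Dr.", *, excited=False):
    msg = f"Hi, {title} {name}"
    # Parsed as: (msg + "!!!") if excited else msg
    return msg + "!!!" if excited else msg

try:
    greet("Ada", "Prof.", True)   # a third positional argument is rejected
except TypeError as err:
    error_message = str(err)
    print("TypeError:", error_message)
else:
    error_message = None
```

Keyword-only flags keep call sites readable: `greet("Ada", excited=True)` says what `True` means.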


In [7]:

# Loops, comprehensions, higher-order utilities
from functools import lru_cache, reduce
squares = [i*i for i in range(6)]
even_squares = [z for z in squares if z % 2 == 0]
sum_squares = reduce(lambda a,b: a+b, squares, 0)

@lru_cache(maxsize=None)
def fib(n):
    return n if n < 2 else fib(n-1)+fib(n-2)

print("squares:", squares)
print("even_squares:", even_squares)
print("sum_squares:", sum_squares)
print("fib(20):", fib(20))


squares: [0, 1, 4, 9, 16, 25]
even_squares: [0, 4, 16]
sum_squares: 55
fib(20): 6765
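
The cell above uses `reduce` and `lru_cache`. The sketch below adds three other common higher-order built-ins (`map`, `filter`, `sorted` with a `key` function) and inspects the cache statistics that `lru_cache` exposes. The word list is an illustrative assumption.

```python
from functools import lru_cache

words = ["tensor", "autograd", "nn", "loader"]   # illustrative data

lengths = list(map(len, words))                          # apply len to each item
long_words = list(filter(lambda w: len(w) > 2, words))   # keep items passing the test
by_length = sorted(words, key=len)                       # stable sort by length

@lru_cache(maxsize=None)
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

fib(20)
info = fib.cache_info()   # named tuple: hits, misses, maxsize, currsize

print(lengths, long_words, by_length)
print(info)
```

Each distinct argument is computed once and later calls are served from the cache, which is why the memoized `fib` runs in linear rather than exponential time.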


In [8]:

# Simple file I/O
from pathlib import Path
p = Path("demo.txt")
p.write_text("First line\nSecond line\nThird line\n", encoding="utf-8")
print("File content:")
print(p.read_text(encoding="utf-8"))


File content:
First line
Second line
Third line
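
`Path.read_text` loads the whole file at once. For larger files it is more common to iterate line by line. The sketch below shows that pattern plus append mode; it writes into a temporary directory (an assumption made here so the snippet does not touch `demo.txt`).

```python
import tempfile
from pathlib import Path

# A temporary directory keeps the sketch from touching the working directory.
with tempfile.TemporaryDirectory() as tmp:
    p = Path(tmp) / "demo.txt"
    p.write_text("First line\nSecond line\nThird line\n", encoding="utf-8")

    # Iterating over the file object yields one line at a time.
    with p.open(encoding="utf-8") as f:
        numbered = [f"{i}: {line.rstrip()}" for i, line in enumerate(f, start=1)]

    # Mode "a" appends instead of overwriting.
    with p.open("a", encoding="utf-8") as f:
        f.write("Fourth line\n")

    n_lines = len(p.read_text(encoding="utf-8").splitlines())

print(numbered)
print("lines after append:", n_lines)
```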



## 3) PyTorch Basics · Tensor / GPU / Autograd / nn
Goal: get a quick grasp of tensors, device moves, broadcasting, autograd, and building a minimal `nn.Module`.

In [9]:

import torch
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Using device:", device)
a = torch.randn(2,3, device=device)
b = torch.randn(3,2, device=device)
c = a @ b    # matmul
print("a:", a.shape, "b:", b.shape, "c:", c.shape)
x = torch.arange(6, dtype=torch.float32, device=device).reshape(2,3)
v = torch.tensor([1.0, 2.0, 3.0], device=device)
print("Broadcast:", (x+v).shape)


Using device: cpu
a: torch.Size([2, 3]) b: torch.Size([3, 2]) c: torch.Size([2, 2])
Broadcast: torch.Size([2, 3])
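
Broadcasting aligns shapes from the trailing dimension; two sizes are compatible when they are equal or one of them is 1. The sketch below illustrates the rule on CPU tensors, including a case that fails. The names `row` and `col` are illustrative.

```python
import torch

x = torch.arange(6, dtype=torch.float32).reshape(2, 3)   # shape (2, 3)
row = torch.tensor([1.0, 2.0, 3.0])                      # shape (3,)
col = torch.tensor([[10.0], [20.0]])                     # shape (2, 1)

print((x + row).shape)     # row is treated as (1, 3) and reused for every row
print((x + col).shape)     # col is reused for every column
print((row + col).shape)   # (3,) with (2, 1) broadcasts to (2, 3)

try:
    x + torch.ones(2)      # trailing sizes 3 and 2 are incompatible
    broadcast_failed = False
except RuntimeError as err:
    broadcast_failed = True
    print("RuntimeError:", err)
```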


In [10]:

# Autograd demo: y = (x^2).sum() -> dy/dx = 2x
x = torch.randn(4, requires_grad=True)
y = (x**2).sum()
y.backward()
print("x:", x)
print("x.grad:", x.grad)
# Detach to stop tracking
z = x.detach()
print("Detached requires_grad:", z.requires_grad)


x: tensor([-0.5997, -1.6607,  1.4610,  0.0182], requires_grad=True)
x.grad: tensor([-1.1994, -3.3214,  2.9220,  0.0365])
Detached requires_grad: False
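
The printed gradient can be checked against the analytic result dy/dx = 2x. The sketch below does that and also shows that gradients accumulate across `backward()` calls, which is why the training loops later in this notebook call `opt.zero_grad()` on every step. The variable names are illustrative.

```python
import torch

x = torch.randn(4, requires_grad=True)

(x ** 2).sum().backward()
matches_2x = torch.allclose(x.grad, 2 * x.detach())   # dy/dx = 2x
print("grad equals 2x:", matches_2x)

# A second backward pass adds to the existing .grad instead of replacing it.
(x ** 2).sum().backward()
accumulated = x.grad.clone()
print("after two passes:", accumulated)

# Reset in place before the next step.
x.grad.zero_()
print("after zero_():", x.grad)
```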


In [11]:

# Minimal nn.Module
import torch.nn as nn
class TinyNet(nn.Module):
    def __init__(self, d_in=10, d_h=16, d_out=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_in, d_h),
            nn.ReLU(),
            nn.Linear(d_h, d_out)
        )
    def forward(self, x):
        return self.net(x)

model = TinyNet().to(device)
print(model)
dummy = torch.randn(5, 10, device=device)
logits = model(dummy)
print("logits:", logits.shape)


TinyNet(
  (net): Sequential(
    (0): Linear(in_features=10, out_features=16, bias=True)
    (1): ReLU()
    (2): Linear(in_features=16, out_features=2, bias=True)
  )
)
logits: torch.Size([5, 2])
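
To see what `nn.Module` registers, it can help to list the parameters and count them. The sketch below rebuilds the same layer stack as `TinyNet` with a plain `nn.Sequential` (an assumption made so the snippet runs on its own).

```python
import torch
import torch.nn as nn

# Same layer sizes as TinyNet: 10 -> 16 -> 2.
net = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 2))

for name, p in net.named_parameters():
    print(name, tuple(p.shape))

# (10*16 weights + 16 biases) + (16*2 weights + 2 biases) = 210
n_params = sum(p.numel() for p in net.parameters())
print("total parameters:", n_params)
```

`nn.Linear` stores its weight as `(out_features, in_features)`, so the first layer's weight has shape `(16, 10)`.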


## 4) Example #1 · Bag-of-Words (from scratch, tiny dataset)
- Tokenization → vocab → vectorization → linear classifier (`nn.Linear`).
- Good for showing the full pipeline: preprocessing → tensors → training → evaluation.
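
The tokenization → vocab → vectorization steps can be sketched in plain Python before tensors enter the picture. The two-sentence corpus below is a made-up illustration, not the notebook's dataset.

```python
import re

def tokenize(s):
    # Lowercase, then keep runs of ASCII letters only.
    return re.findall(r"[a-z]+", s.lower())

corpus = ["bad plot and bad acting", "an excellent film"]   # toy sentences

# Assign each new token the next free index.
vocab = {}
for sentence in corpus:
    for tok in tokenize(sentence):
        vocab.setdefault(tok, len(vocab))

def vectorize(s):
    counts = [0] * len(vocab)
    for tok in tokenize(s):
        if tok in vocab:              # out-of-vocabulary tokens are dropped
            counts[vocab[tok]] += 1
    return counts

print(vocab)
print(vectorize("bad bad film, truly"))
```

Word order is discarded: the vector only records how often each known token occurs, and the unseen word "truly" contributes nothing.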

In [12]:

import re, random, torch, torch.nn as nn, torch.optim as optim
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

data = [
    ("deep learning changes everything", 1),
    ("neural networks are powerful", 1),
    ("this movie was fantastic", 1),
    ("awful plot and bad acting", 0),
    ("terrible movie and boring", 0),
    ("an excellent and enjoyable film", 1),
    ("bad direction and poor script", 0),
    ("i loved the visuals", 1),
]

random.shuffle(data)

def tokenize(s): return re.findall(r"[a-z]+", s.lower())
vocab = {}
for s,_ in data:
    for tok in tokenize(s):
        if tok not in vocab: vocab[tok] = len(vocab)
V = len(vocab)

def vectorize(s):
    x = torch.zeros(V)
    for tok in tokenize(s):
        if tok in vocab:
            x[vocab[tok]] += 1.0
    return x

X = torch.stack([vectorize(s) for s,_ in data])
y = torch.tensor([lbl for _,lbl in data], dtype=torch.long)
n_train = int(0.75*len(data))
Xtr, Xte = X[:n_train], X[n_train:]
ytr, yte = y[:n_train], y[n_train:]

model = nn.Sequential(nn.Linear(V, 2)).to(device)
loss_fn = nn.CrossEntropyLoss()
opt = optim.Adam(model.parameters(), lr=0.05)

Xtr_d = Xtr.to(device)
ytr_d = ytr.to(device)
for epoch in range(80):
    opt.zero_grad()
    logits = model(Xtr_d)
    loss = loss_fn(logits, ytr_d)
    loss.backward(); opt.step()
    if (epoch+1) % 20 == 0:
        print(f"epoch {epoch+1:03d} loss={loss.item():.4f}")

@torch.no_grad()
def evaluate(X, y):
    logits = model(X.to(device))
    pred = logits.argmax(dim=1).cpu()
    acc = (pred == y).float().mean().item()
    return acc, pred

acc, pred = evaluate(Xte, yte)
print("Test acc:", round(acc, 3))
print("Pred vs true:", list(zip(pred.tolist(), yte.tolist())))


epoch 020 loss=0.0097
epoch 040 loss=0.0027
epoch 060 loss=0.0019
epoch 080 loss=0.0016
Test acc: 0.5
Pred vs true: [(0, 1), (0, 0)]


## 5) Example #2 · FashionMNIST (MLP quick training)
- Pipeline: `Dataset/DataLoader → Model → Loss/Optimizer → Train/Eval`
- Runs on CPU by default; uses GPU if available.
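
The `Dataset/DataLoader` stage can be previewed without downloading anything. The sketch below uses random tensors as a stand-in for FashionMNIST (an assumption for illustration) to show the batch shapes a loader yields, including the smaller final batch.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in: 10 grayscale 28x28 "images" with labels in 0..9.
images = torch.rand(10, 1, 28, 28)
labels = torch.randint(0, 10, (10,))
ds = TensorDataset(images, labels)

loader = DataLoader(ds, batch_size=4, shuffle=False)
batch_shapes = [(tuple(xb.shape), tuple(yb.shape)) for xb, yb in loader]
for shapes in batch_shapes:
    print(shapes)
```

With 10 samples and `batch_size=4` the loader yields batches of 4, 4, and 2; passing `drop_last=True` would discard the short one.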

In [13]:

import torch, torch.nn as nn, torch.optim as optim, os
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
tfm = transforms.Compose([transforms.ToTensor()])
train_ds = datasets.FashionMNIST(root="data", train=True, download=True, transform=tfm)
test_ds  = datasets.FashionMNIST(root="data", train=False, download=True, transform=tfm)
train_loader = DataLoader(train_ds, batch_size=128, shuffle=True, num_workers=2, pin_memory=True)
test_loader  = DataLoader(test_ds,  batch_size=256, shuffle=False, num_workers=2, pin_memory=True)

class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(28*28, 256),
            nn.ReLU(),
            nn.Linear(256, 10)
        )
    def forward(self, x): return self.net(x)

model = MLP().to(device)
opt = optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def train_one_epoch(model, loader):
    model.train()
    total, correct, total_loss = 0, 0, 0.0
    for xb, yb in loader:
        xb, yb = xb.to(device), yb.to(device)
        opt.zero_grad()
        logits = model(xb)
        loss = loss_fn(logits, yb)
        loss.backward(); opt.step()
        total_loss += loss.item() * xb.size(0)
        correct += (logits.argmax(1) == yb).sum().item()
        total += xb.size(0)
    return total_loss/total, correct/total

@torch.no_grad()
def evaluate(model, loader):
    model.eval()
    total, correct, total_loss = 0, 0, 0.0
    for xb, yb in loader:
        xb, yb = xb.to(device), yb.to(device)
        logits = model(xb)
        loss = loss_fn(logits, yb)
        total_loss += loss.item() * xb.size(0)
        correct += (logits.argmax(1) == yb).sum().item()
        total += xb.size(0)
    return total_loss/total, correct/total

for epoch in range(3):
    tr_loss, tr_acc = train_one_epoch(model, train_loader)
    te_loss, te_acc = evaluate(model, test_loader)
    print(f"epoch {epoch+1} | train acc {tr_acc:.3f} | val acc {te_acc:.3f}")

os.makedirs("checkpoints", exist_ok=True)
torch.save(model.state_dict(), "checkpoints/fmnist_mlp.pt")
print("Saved to checkpoints/fmnist_mlp.pt")


100%|██████████| 26.4M/26.4M [00:01<00:00, 18.9MB/s]
100%|██████████| 29.5k/29.5k [00:00<00:00, 305kB/s]
100%|██████████| 4.42M/4.42M [00:00<00:00, 5.60MB/s]
100%|██████████| 5.15k/5.15k [00:00<00:00, 15.4MB/s]


epoch 1 | train acc 0.809 | val acc 0.836
epoch 2 | train acc 0.858 | val acc 0.854
epoch 3 | train acc 0.871 | val acc 0.863
Saved to checkpoints/fmnist_mlp.pt
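
A saved `state_dict` holds only tensors, so reloading needs the model class to rebuild the architecture first. The sketch below round-trips an untrained copy of the MLP through a temporary file (an assumption, so it does not depend on `checkpoints/fmnist_mlp.pt` existing) and checks that both models produce the same outputs.

```python
import tempfile
from pathlib import Path

import torch
import torch.nn as nn

class MLP(nn.Module):
    # Same architecture as the training cell, repeated so the sketch runs alone.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(), nn.Linear(28 * 28, 256), nn.ReLU(), nn.Linear(256, 10)
        )

    def forward(self, x):
        return self.net(x)

source = MLP()   # stands in for the trained model
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "fmnist_mlp.pt"   # temporary path, not the real checkpoint
    torch.save(source.state_dict(), path)
    restored = MLP()
    restored.load_state_dict(torch.load(path, map_location="cpu"))

restored.eval()   # switch to evaluation mode before inference
xb = torch.rand(3, 1, 28, 28)
with torch.no_grad():
    outputs_match = torch.allclose(source(xb), restored(xb))
print("outputs match:", outputs_match)
```

`map_location="cpu"` lets a checkpoint saved on a GPU machine load on a CPU-only one.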


## 6) (Optional) Tiny BERT Fine-tuning
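> Requires internet access to download the `distilbert-base-uncased` checkpoint and the `ag_news` dataset (and to install `transformers` / `datasets` if they are missing). Runtime depends on your environment.

This cell fine-tunes **`distilbert-base-uncased`** for 1 epoch on a 1,000-example sample of `ag_news` and evaluates on 500 test examples, demonstrating the Trainer API.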

In [14]:

RUN_BERT = True  # Set to False to skip the fine-tuning demo
if RUN_BERT:
    from datasets import load_dataset
    from transformers import AutoTokenizer, DataCollatorWithPadding, AutoModelForSequenceClassification, TrainingArguments, Trainer

    ds = load_dataset("ag_news")
    small_train = ds["train"].shuffle(seed=42).select(range(1000))
    small_test  = ds["test"].shuffle(seed=42).select(range(500))

    tok = AutoTokenizer.from_pretrained("distilbert-base-uncased")
    def tokenize(batch): return tok(batch["text"], truncation=True)
    small_train = small_train.map(tokenize, batched=True)
    small_test  = small_test.map(tokenize, batched=True)

    collate = DataCollatorWithPadding(tokenizer=tok)
    model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=4)

    args = TrainingArguments(
        output_dir="bert_demo",
        eval_strategy="epoch",
        num_train_epochs=1,
        per_device_train_batch_size=16,
        per_device_eval_batch_size=32,
        learning_rate=2e-5,
        logging_steps=50,
        fp16=False
    )

    def compute_metrics(eval_pred):
        import numpy as np
        logits, labels = eval_pred
        preds = logits.argmax(axis=-1)
        acc = (preds == labels).astype(float).mean().item()
        return {"accuracy": acc}

    # Recent transformers releases deprecate `tokenizer=` in favor of `processing_class=`.
    trainer = Trainer(model=model, args=args,
                      train_dataset=small_train, eval_dataset=small_test,
                      processing_class=tok, data_collator=collate,
                      compute_metrics=compute_metrics)
    trainer.train()
    print(trainer.evaluate())
else:
    print("Set RUN_BERT=True to run the tiny BERT fine-tuning demo.")


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Epoch,Training Loss,Validation Loss,Accuracy
1,1.0924,0.782734,0.838




{'eval_loss': 0.782734215259552, 'eval_accuracy': 0.838, 'eval_runtime': 91.1521, 'eval_samples_per_second': 5.485, 'eval_steps_per_second': 0.176, 'epoch': 1.0}
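
The cell above tokenizes with `truncation=True` only and leaves padding to `DataCollatorWithPadding`, which pads each batch to its longest sequence. The function below is a conceptual sketch of that idea in plain Python; `pad_batch` is a hypothetical helper, not the library's implementation, and the token ids are made up.

```python
def pad_batch(sequences, pad_id=0):
    """Pad token-id lists to the longest sequence in this batch."""
    max_len = max(len(seq) for seq in sequences)
    input_ids = [seq + [pad_id] * (max_len - len(seq)) for seq in sequences]
    # 1 marks real tokens, 0 marks padding the model should ignore.
    attention_mask = [[1] * len(seq) + [0] * (max_len - len(seq)) for seq in sequences]
    return {"input_ids": input_ids, "attention_mask": attention_mask}

# Made-up token ids, not real tokenizer output.
batch = pad_batch([[101, 7592, 102], [101, 2023, 2003, 2307, 102]])
print(batch["input_ids"])
print(batch["attention_mask"])
```

Padding per batch rather than to a global maximum length keeps short batches small, which saves compute on datasets with varied text lengths.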


---

### Appendix · Quick Cheatsheet
- **conda env**  
`conda create -n my_env python=3.10` · `conda activate my_env` · `conda install -c conda-forge numpy`  
`conda env export > env.yml` · `conda env create -f env.yml`
- **mamba**: a faster drop-in replacement for conda
- **pip**: `pip install packagename` (inside a conda environment, install what you can with conda first, then use pip for the rest)
- **PyTorch install**: see https://pytorch.org/get-started/ for CUDA-matched commands
- **Training loop quartet**: `Dataset/DataLoader` → `Model(nn.Module)` → `Loss` → `Optimizer.step()`
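
The quartet in the last bullet can be sketched end to end on synthetic data. The regression target `y = 3x + 1`, the learning rate, and the epoch count are illustrative assumptions, not values used elsewhere in this notebook.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# 1) Dataset/DataLoader: synthetic points on y = 3x + 1 with a little noise.
X = torch.linspace(-1, 1, 64).unsqueeze(1)
y = 3 * X + 1 + 0.01 * torch.randn_like(X)
loader = DataLoader(TensorDataset(X, y), batch_size=16, shuffle=True)

# 2) Model, 3) Loss, 4) Optimizer
model = nn.Linear(1, 1)
loss_fn = nn.MSELoss()
opt = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(100):
    for xb, yb in loader:
        opt.zero_grad()                 # clear gradients from the previous step
        loss = loss_fn(model(xb), yb)   # forward pass and loss
        loss.backward()                 # backpropagate
        opt.step()                      # update parameters

w, b = model.weight.item(), model.bias.item()
print(f"learned w={w:.2f}, b={b:.2f}")
```

The learned weight and bias should land close to 3 and 1; the same four-line inner loop appears in both examples of this notebook.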
