# 📘 Notebook 1: Python for Deep Learning

## 1\. Introduction

This notebook covers the specific subset of Python features that are ubiquitous in Deep Learning. We skip the basics (if statements, basic loops) and focus on:

  * **List Comprehensions:** For efficient data preprocessing.
  * **Argument Packing and Unpacking:** `*args` and `**kwargs` (essential for wrapping models).
  * **Object-Oriented Programming:** Specifically `__init__`, `__call__`, and inheritance (the skeleton of every PyTorch model).
  * **Type Hinting:** For readable, modern ML code.


## 2\. Pythonic Data Handling: Comprehensions & Slicing

Deep learning code constantly moves data between lists, arrays, and tensors. Index-based "C-style" loops are verbose, and in Python they are usually slower than the equivalent comprehension or slice.

### 2.1 List Comprehensions

**Why it matters:** You will often need to transform a list of file paths or image labels in one line.

In [1]:
# Traditional loop (Avoid this in scripts)
files = ["image1.jpg", "image2.png", "data.txt", "image3.jpeg"]
image_files = []
for f in files:
    if f.endswith((".jpg", ".png", ".jpeg")):
        image_files.append(f)

print(f"Loop result: {image_files}")

Loop result: ['image1.jpg', 'image2.png', 'image3.jpeg']


In [None]:

# Pythonic List Comprehension (Preferred)
# Syntax: [expression for item in iterable if condition]
# It is generally faster and more readable.
clean_images = [f for f in files if f.endswith((".jpg", ".png", ".jpeg"))]

print(f"Comprehension result: {clean_images}")

# Example: Normalizing pixel values (0-255) to (0-1)
pixels = [0, 255, 128, 64]
normalized = [p / 255.0 for p in pixels]
print(f"Normalized: {normalized}")
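The same syntax extends to sets and dictionaries, which is handy for turning string labels into integer class indices. The following is a minimal sketch, assuming hypothetical file names of the form `<class>_<id>.jpg` (a naming scheme invented here for illustration):

```python
# Hypothetical file names of the form "<class>_<id>.jpg" (illustrative only).
paths = ["cat_001.jpg", "dog_001.jpg", "cat_002.jpg", "bird_001.jpg"]

# List comprehension: extract the class name from each path.
labels = [p.split("_")[0] for p in paths]

# Set comprehension removes duplicates; sorting makes the mapping deterministic.
class_names = sorted({label for label in labels})

# Dict comprehension: map each class name to an integer index.
class_to_idx = {name: idx for idx, name in enumerate(class_names)}

# Encode every label as its integer index.
encoded = [class_to_idx[label] for label in labels]

print(f"Class mapping: {class_to_idx}")
print(f"Encoded labels: {encoded}")
```

Sorting the class names before enumerating them keeps the mapping stable between runs, which matters when a saved model is reloaded later.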

### 2.2 Slicing and Indexing

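**Why it matters:** The `start:stop:step` slicing syntax for lists carries over to NumPy arrays and PyTorch tensors, with two caveats: a list slice is a copy, whereas a basic NumPy or PyTorch slice is a view of the same memory, and PyTorch does not accept negative steps (use `torch.flip` instead of `[::-1]`).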

In [None]:
data = list(range(10)) # [0, 1, 2, ..., 9]

print(f"First 3 items: {data[:3]}")
print(f"Last 3 items: {data[-3:]}")
print(f"Every 2nd item: {data[::2]}")
print(f"Reverse list: {data[::-1]}")
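Two common uses of slicing in training code are splitting a dataset and cutting it into mini-batches. The sketch below uses plain lists and illustrative values (an 80/20 split and a batch size of 4, both chosen arbitrarily); it also shows that a list slice is an independent copy.

```python
data = list(range(10))  # [0, 1, 2, ..., 9]

# 80/20 train/validation split using integer arithmetic for the cut point.
split = (len(data) * 8) // 10
train, val = data[:split], data[split:]

# Mini-batches of size 4: step the start index, slice a window each time.
# The last batch may be shorter than batch_size.
batch_size = 4
batches = [data[i:i + batch_size] for i in range(0, len(data), batch_size)]

# A list slice is a new list: changing it leaves the original untouched.
head = data[:3]
head[0] = 99

print(f"Train: {train}")
print(f"Val: {val}")
print(f"Batches: {batches}")
print(f"Original first item is still: {data[0]}")
```

Slicing past the end of a list does not raise an error, which is why the final, shorter batch needs no special handling.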

## 3\. Functions & Argument Unpacking

In PyTorch, you often wrap layers or functions where you don't know exactly how many arguments will be passed.

### 3.1 `*args` and `**kwargs`

**Why it matters:** You will see signatures like `def forward(self, x, **kwargs):` in many transformer implementations, where they handle optional arguments such as attention masks or caching.


In [None]:
def train_model(model_name, **hyperparameters):
    """
    **kwargs packs keyword arguments into a dictionary.
    """
    print(f"Training {model_name}...")
    for key, value in hyperparameters.items():
        print(f" -> Setting {key} to {value}")

# Flexible calling
train_model("ResNet50", learning_rate=0.01, optimizer="Adam", batch_size=32)

# ---------------------------------------------------------

def summary(layer_type, *dimensions):
    """
    *args packs positional arguments into a tuple.
    """
    print(f"Layer: {layer_type}")
    print(f"Dimensions: {dimensions}")

summary("Conv2d", 3, 64, 3, 3) # (in_channels, out_channels, kernel_h, kernel_w)
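The two definitions above show packing. Wrapping relies on the reverse direction as well: a wrapper accepts arbitrary arguments and unpacks them into the call it forwards. The following is a minimal sketch; `timed` and `scale` are hypothetical helpers written for this example, with `scale` standing in for a layer or model call.

```python
import time

def timed(fn, *args, **kwargs):
    """Call fn with whatever arguments were given and measure the elapsed time."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)  # unpack the tuple and dict back into a call
    elapsed = time.perf_counter() - start
    return result, elapsed

def scale(x, factor=1.0, bias=0.0):
    # Hypothetical stand-in for a layer or model call.
    return x * factor + bias

# The wrapper does not need to know scale's signature.
result, elapsed = timed(scale, 3.0, factor=2.0, bias=1.0)
print(f"Result: {result} (took {elapsed:.6f}s)")

# The same operator unpacks an existing dict at the call site.
params = {"factor": 2.0, "bias": 1.0}
same = scale(3.0, **params)
print(f"Same result via dict unpacking: {same}")
```

Because `timed` never names the parameters of `fn`, it keeps working if `scale` later gains new optional arguments.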

### 3.2 Lambda Functions

**Why it matters:** Useful for quick transforms in data loaders (e.g., sorting a list of tuples based on the second element).


In [2]:
# List of (file_path, label)
dataset = [("img1.jpg", 1), ("img2.jpg", 0), ("img3.jpg", 1)]

# Sort by label
sorted_dataset = sorted(dataset, key=lambda x: x[1])
print(f"Sorted: {sorted_dataset}")

Sorted: [('img2.jpg', 0), ('img1.jpg', 1), ('img3.jpg', 1)]
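The `key=` parameter is also accepted by `min` and `max`, and a lambda can return a tuple to sort on several fields at once. The sketch below uses made-up checkpoint names and loss values purely for illustration.

```python
# Hypothetical (checkpoint_name, validation_loss) pairs; the values are made up.
checkpoints = [("epoch1.pt", 0.52), ("epoch2.pt", 0.31), ("epoch3.pt", 0.38)]

# Pick the entry with the smallest loss.
best = min(checkpoints, key=lambda c: c[1])
print(f"Best checkpoint: {best}")

# Tuple key: sort by label descending (negated), then by path ascending.
dataset = [("img1.jpg", 1), ("img2.jpg", 0), ("img3.jpg", 1)]
ordered = sorted(dataset, key=lambda x: (-x[1], x[0]))
print(f"Ordered: {ordered}")
```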


## 4\. Object-Oriented Programming (The PyTorch Way)

This is the **most critical section**. Every Neural Network in PyTorch is a class that inherits from `nn.Module`.

### 4.1 The `__init__` and `__call__` Duality

**Why it matters:**

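  * `__init__`: Runs once when the object is created; define your layers (weights) here.
  * `__call__`: Runs each time the instance is called like a function; in plain Python, this is where the forward pass (how data flows) goes.
  * *Note: In PyTorch you implement `forward()` instead. `nn.Module.__call__` runs any registered hooks and calls your `forward()`, which is why you write `model(x)` rather than `model.forward(x)`.*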

In [None]:
class SimpleLayer:
    def __init__(self, weight):
        """Initialize weights/parameters here."""
        self.weight = weight
        print(f"Layer initialized with weight: {self.weight}")

    def __call__(self, x):
        """
        Makes the instance callable like a function: layer(x).
        This mimics how PyTorch models work.
        """
        return x * self.weight

# Usage
layer = SimpleLayer(weight=2.0) # __init__ runs
output = layer(5)               # __call__ runs
print(f"Output: {output}")
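The note about `forward()` can be made concrete with a toy base class. The sketch below is a simplified imitation written for this notebook, not PyTorch's actual implementation: `MiniModule` and `Scale` are hypothetical names, and only the method name `register_forward_hook` and the hook signature `(module, inputs, output)` are borrowed from PyTorch.

```python
class MiniModule:
    """Toy imitation of the call-then-forward pattern (not PyTorch's real code)."""

    def __init__(self):
        self._forward_hooks = []

    def register_forward_hook(self, hook):
        # Store a function to run after every forward pass.
        self._forward_hooks.append(hook)

    def forward(self, x):
        raise NotImplementedError("Subclasses define the computation here.")

    def __call__(self, *args, **kwargs):
        # Run the subclass's forward(), then give each hook a look at the result.
        output = self.forward(*args, **kwargs)
        for hook in self._forward_hooks:
            hook(self, args, output)
        return output


class Scale(MiniModule):
    def __init__(self, weight):
        super().__init__()  # creates self._forward_hooks
        self.weight = weight

    def forward(self, x):
        return x * self.weight


seen = []
layer = Scale(weight=2.0)
layer.register_forward_hook(lambda module, inputs, output: seen.append(output))

output = layer(5)            # goes through __call__, so the hook fires
direct = layer.forward(5)    # bypasses __call__, so the hook does not fire

print(f"Output: {output}, hook saw: {seen}")
```

Calling `forward()` directly returns the same value but skips the hooks, which is the practical reason to prefer `layer(x)`.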

### 4.2 Inheritance and `super()`

**Why it matters:** In PyTorch you rarely write a model class from scratch; you extend a base class such as `nn.Module` and inherit its behavior.


In [None]:
class BaseModel:
    def save(self):
        print("Saving model to disk...")

class Classifier(BaseModel):
    def __init__(self, num_classes):
        # critical: initialize the parent class
        super().__init__()
        self.num_classes = num_classes

    def predict(self, x):
        return f"Predicting {x} into {self.num_classes} classes."

model = Classifier(num_classes=10)
print(model.predict("image_data"))
model.save() # Inherited method
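Because `BaseModel` defines no `__init__` of its own, the `super().__init__()` call above has no visible effect. It becomes essential once the parent sets up state in its constructor. The sketch below uses hypothetical classes (`Base`, `Forgetful`, `Correct`) to show what goes wrong when the call is omitted.

```python
class Base:
    def __init__(self):
        self.history = []  # state that every subclass relies on

    def log(self, message):
        self.history.append(message)


class Forgetful(Base):
    def __init__(self, num_classes):
        # super().__init__() is missing, so self.history is never created.
        self.num_classes = num_classes


class Correct(Base):
    def __init__(self, num_classes):
        super().__init__()  # runs Base.__init__, creating self.history
        self.num_classes = num_classes


good = Correct(num_classes=10)
good.log("initialized")
print(f"History: {good.history}")

bad = Forgetful(num_classes=10)
error = None
try:
    bad.log("initialized")
except AttributeError as exc:
    error = exc
    print(f"AttributeError: {exc}")
```

The failure appears only when `log()` is first used, not when the object is created, which is what makes a missing `super().__init__()` easy to overlook.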


## 5\. Modern Python: Typing and Dataclasses

Deep learning codebases can get messy. Modern Python features help keep them clean.

### 5.1 Type Hinting

**Why it matters:** Helps your IDE (VS Code/PyCharm) autocomplete and catch errors before you run a 3-hour training job.


In [None]:
from typing import List, Tuple, Optional

def preprocess_batch(images: List[str], size: int = 256) -> Tuple[int, int]:
    """
    Inputs:
        images: A list of filenames
        size: Target resize dimension
    Returns:
        Tuple of (batch_size, image_size)
    """
    return len(images), size

# The hints don't enforce types at runtime, but they are crucial for documentation
print(preprocess_batch(["img1", "img2"]))
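`Optional[X]` marks a value that may be `None`, which is the usual way to annotate an argument that can be left out. The sketch below uses a hypothetical helper, `load_labels`, and also demonstrates that hints are not checked when the code runs; a static checker such as mypy is what would flag the mismatched call.

```python
from typing import List, Optional

def load_labels(names: List[str], limit: Optional[int] = None) -> List[str]:
    """Return all names, or only the first `limit` names when a limit is given."""
    if limit is None:
        return list(names)
    return names[:limit]

print(load_labels(["cat", "dog", "bird"]))
print(load_labels(["cat", "dog", "bird"], limit=2))

# Not enforced at runtime: a tuple is accepted although the hint says List[str],
# and the function quietly returns a tuple instead of a list.
unchecked = load_labels(("cat", "dog"), limit=1)
print(f"Returned type: {type(unchecked).__name__}")

# The hints are stored on the function, where tools can read them.
print(load_labels.__annotations__)
```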

### 5.2 Dataclasses for Configuration

**Why it matters:** Instead of passing massive dictionaries of hyperparameters (learning rate, epochs, dropout) around, use Dataclasses.


In [None]:
from dataclasses import dataclass

@dataclass
class TrainingConfig:
    learning_rate: float
    batch_size: int
    epochs: int = 10 # Default value
    use_gpu: bool = True

# Usage
config = TrainingConfig(learning_rate=1e-4, batch_size=64)

print(f"Config: {config}")
print(f"LR: {config.learning_rate}")
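Dataclasses also come with helpers that are convenient for experiment tracking. The sketch below is one possible variant, not a required pattern: `FrozenTrainingConfig` is a hypothetical name (chosen so that it does not overwrite `TrainingConfig` above), and `frozen=True` is an optional setting that makes instances read-only.

```python
from dataclasses import asdict, dataclass, replace

@dataclass(frozen=True)  # frozen=True makes instances read-only
class FrozenTrainingConfig:
    learning_rate: float
    batch_size: int
    epochs: int = 10
    use_gpu: bool = True

base = FrozenTrainingConfig(learning_rate=1e-4, batch_size=64)

# Convert to a plain dict, e.g. for logging or saving as JSON.
as_dict = asdict(base)
print(f"As dict: {as_dict}")

# Create a modified copy without touching the original.
variant = replace(base, learning_rate=1e-3, epochs=20)
print(f"Variant: {variant}")

# Assigning to a field of a frozen instance raises an error
# (FrozenInstanceError, a subclass of AttributeError).
frozen_error = None
try:
    base.epochs = 50
except AttributeError as exc:
    frozen_error = exc
    print(f"Cannot modify: {type(exc).__name__}")
```

A read-only config guards against a hyperparameter being changed by accident halfway through a run; `replace` gives an explicit way to derive a new configuration instead.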

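## 6\. Useful Libraries for ML

Two libraries appear constantly: one from the standard library and one third-party package.

  * `pathlib` (standard library): Object-oriented file path handling that covers most uses of `os.path`.
  * `tqdm` (third-party, `pip install tqdm`): Progress bars (essential for training loops).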

In [None]:
# Pathlib example
from pathlib import Path

# Create a dummy path object (works on Windows/Linux/Mac automatically)
p = Path("data/images")
print(f"Parent directory: {p.parent}")
print(f"Absolute path: {p.resolve()}")

# ---------------------------------------------------------

# TQDM Example (You may need to install it: pip install tqdm)
# This creates a progress bar for your loops.
import time
from tqdm import tqdm

print("Training simulation...")
for i in tqdm(range(5)):
    time.sleep(0.1) # Simulate work
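`pathlib` is also useful for collecting dataset files. The sketch below creates a throwaway directory with `tempfile` so that it runs anywhere; the file names and the set of accepted extensions are illustrative assumptions.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Build a small, disposable directory tree with empty placeholder files.
    root = Path(tmp) / "data" / "images"
    root.mkdir(parents=True)
    for name in ["a.jpg", "b.png", "notes.txt"]:
        (root / name).touch()

    # Keep only files whose extension is in the accepted set.
    image_suffixes = {".jpg", ".png", ".jpeg"}
    images = sorted(p for p in root.iterdir() if p.suffix.lower() in image_suffixes)

    names = [p.name for p in images]  # file name with extension
    stems = [p.stem for p in images]  # file name without extension

    # glob matches a shell-style pattern directly.
    jpgs = sorted(p.name for p in root.glob("*.jpg"))

print(f"Images: {names}")
print(f"Stems: {stems}")
print(f"JPGs only: {jpgs}")
```

The `/` operator joins path components with the correct separator for the current operating system, so the same code works on Windows, Linux, and macOS.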

## 7\. Comprehensive Checkpoint & Challenge

To ensure you are ready to move on, try to solve this mini-challenge without looking back at the code above.

**The Challenge:**
Create a class `ImageLoader` that simulates a deep learning data pipeline.

1.  **`__init__`**: Accepts a list of file paths and a `target_size` (int).
2.  **`__call__`**: Accepts an index `i`. It should return a dictionary `{'path': ..., 'size': ...}` for the file at that index.
3.  **Process**: Filter out any file that doesn't end in `.jpg` using a list comprehension.
4.  **Bonus**: Use a dataclass to store the configuration (paths and size).

**Self-Evaluation Checklist:**

  * [ ] Can I write a list comprehension to filter or transform data in one line?
  * [ ] Do I understand that `model(x)` actually calls the `__call__` method?
  * [ ] Am I comfortable using `**kwargs` to pass optional arguments?
  * [ ] Do I understand why `super().__init__()` is necessary in inheritance?