# 🔢 Notebook 04: Padding, Tensors & DataLoader

## From Variable Sequences to Fixed-Size Batches

This notebook tackles a core challenge in training neural networks on text: every sequence in a batch must have the same size, but text sequences vary in length. You'll implement padding and truncation, convert the results to PyTorch tensors, and build a `DataLoader` for efficient batch loading.


## 🧠 Concept Primer: Padding and Batching

### What We're Doing
Converting variable-length encoded sequences into fixed-size tensors that neural networks can process in batches.

### Why This Step is Critical
**Batched neural network inputs must share a fixed size.** The challenges:
- **Variable sequence lengths** (reviews have different numbers of words)
- **Batch processing** (networks train faster on batches than on one sample at a time)
- **Tensor shape** (a batch tensor must be rectangular, so every row needs the same length)
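To see the rectangular-shape requirement concretely, the sketch below uses toy token IDs (with `1` as the pad ID, matching the hint in TODO #1). It tries to build a tensor from ragged lists and then from padded ones.

```python
import torch

ragged = [[2, 3, 4], [5, 6]]      # two "reviews" of different lengths
padded = [[2, 3, 4], [5, 6, 1]]   # second review padded with the pad ID 1

try:
    torch.tensor(ragged)          # PyTorch rejects ragged nested lists
    ragged_ok = True
except (ValueError, TypeError) as err:
    ragged_ok = False
    print("Ragged input failed:", err)

batch = torch.tensor(padded)      # every row has length 3, so this works
print(batch.shape)                # torch.Size([2, 3])
```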

### What We'll Build
- **Padding function** that extends short sequences with `<pad>` tokens
- **Truncation logic** that cuts long sequences to max length
- **Tensor conversion** using `torch.tensor()`
- **DataLoader setup** for efficient batch loading

### Shape Expectations
- **Input sequences**: Variable length → Fixed length (128)
- **Tensors**: `[batch_size, sequence_length]` → `[16, 128]`
- **Labels**: `[batch_size]` → `[16]`
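As a quick check on these shapes, the sketch below builds zero-filled placeholder tensors with the expected shapes. It then reports how much memory one batch of token IDs takes as `torch.long`, which uses 8 bytes per element.

```python
import torch

batch_size, max_len = 16, 128

# Placeholder batch with the shapes listed above (the values don't matter here)
X_batch = torch.zeros(batch_size, max_len, dtype=torch.long)
y_batch = torch.zeros(batch_size, dtype=torch.long)

n_bytes = X_batch.nelement() * X_batch.element_size()  # 2048 elements * 8 bytes
print(X_batch.shape, y_batch.shape)  # torch.Size([16, 128]) torch.Size([16])
print(n_bytes)                       # 16384
```

Doubling `max_len` to 256 doubles the token-ID memory to 32,768 bytes per batch. The embedding outputs the model computes from these IDs grow proportionally too.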

### Expected Output Example
```python
for X_batch, y_batch in train_dataloader:
    print(X_batch.shape, y_batch.shape)
    break  # inspect only the first batch
# Output: torch.Size([16, 128]) torch.Size([16])
```
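To inspect a single batch without writing a loop, you can pull the first one with `next(iter(...))`. The sketch below uses random stand-in data in place of the real encoded reviews: 40 fake sequences with IDs drawn from 2 to 99 and binary labels. These values are assumptions for illustration only.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

torch.manual_seed(0)

# Hypothetical stand-in data: 40 already-padded sequences of length 128
fake_X = torch.randint(2, 100, (40, 128), dtype=torch.long)
fake_y = torch.randint(0, 2, (40,), dtype=torch.long)

loader = DataLoader(TensorDataset(fake_X, fake_y), batch_size=16, shuffle=True)

X_batch, y_batch = next(iter(loader))  # take the first batch only
print(X_batch.shape, y_batch.shape)    # torch.Size([16, 128]) torch.Size([16])
```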


## 🔧 TODO #1: Implement Padding Function

**Task:** Create function that pads short sequences and truncates long ones to fixed length.

**Hint:** If `len(seq) < max_len`, extend with `1` (pad token); if longer, slice `seq[:max_len]`

**Expected Function Signature:**
```python
def pad_or_truncate(seq, max_len=128):
    # Your implementation here
    return padded_seq  # List of length max_len
```

**Expected Output Example:**
```python
pad_or_truncate([2, 3, 4], max_len=5)
# Returns: [2, 3, 4, 1, 1]  # padded

pad_or_truncate([2, 3, 4, 5, 6, 7, 8], max_len=5)
# Returns: [2, 3, 4, 5, 6]  # truncated
```
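Once you've attempted TODO #1, you can compare your solution against the minimal sketch below, which follows the hint: truncate with a slice, then pad with the pad ID. The `pad_id=1` keyword is an added assumption that matches the pad token in the hint. Your own signature only needs `seq` and `max_len`.

```python
def pad_or_truncate(seq, max_len=128, pad_id=1):
    """Return a new list of exactly max_len token IDs.

    Sequences longer than max_len are cut at the end; shorter ones are
    right-padded with pad_id (assumed to be 1, the <pad> index).
    """
    truncated = list(seq[:max_len])  # slicing a short list just copies it
    return truncated + [pad_id] * (max_len - len(truncated))


print(pad_or_truncate([2, 3, 4], max_len=5))              # [2, 3, 4, 1, 1]
print(pad_or_truncate([2, 3, 4, 5, 6, 7, 8], max_len=5))  # [2, 3, 4, 5, 6]
```

Slicing first lets one code path handle both cases. For a long sequence the pad count comes out to zero, and for a short one the slice changes nothing.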


```python
# TODO #1: Implement padding function
# Your code here
```


## 🔧 TODO #2: Encode and Pad All Texts

**Task:** Apply encoding and padding to all training texts.

**Hint:** Use list comprehension: `padded_text_seqs = [pad_or_truncate(encode_text(text, vocab)) for text in train_texts]`

**Expected Variable:**
- `padded_text_seqs` → List of lists, each inner list has length 128

**Shape Check:** All sequences should have the same length (128)
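`encode_text` and `vocab` are not defined in this notebook. To try the shape check in isolation, the sketch below substitutes a hypothetical whitespace encoder and a tiny toy vocabulary. It assumes `<unk>` = 0 and `<pad>` = 1, which may differ from your actual vocabulary, and uses `max_len=6` so the output is easy to read.

```python
# Hypothetical stand-ins for the real vocab and encode_text
toy_vocab = {"<unk>": 0, "<pad>": 1, "great": 2, "movie": 3, "boring": 4, "plot": 5}

def toy_encode_text(text, vocab):
    # Lowercase, split on whitespace, map unknown words to <unk>
    return [vocab.get(word, vocab["<unk>"]) for word in text.lower().split()]

def pad_or_truncate(seq, max_len=128, pad_id=1):
    seq = list(seq[:max_len])
    return seq + [pad_id] * (max_len - len(seq))

toy_texts = ["Great movie", "Boring plot and boring acting overall honestly"]
toy_padded = [pad_or_truncate(toy_encode_text(t, toy_vocab), max_len=6) for t in toy_texts]

print(toy_padded)  # [[2, 3, 1, 1, 1, 1], [4, 5, 0, 4, 0, 0]]

# Shape check: every sequence has the same length
assert len({len(s) for s in toy_padded}) == 1
```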


```python
# TODO #2: Encode and pad all texts
# Your code here
```


## 🔧 TODO #3: Convert to PyTorch Tensors

**Task:** Convert padded sequences and labels to PyTorch tensors.

**Hint:** Use `torch.tensor(padded_text_seqs, dtype=torch.long)` and `torch.tensor(train_labels, dtype=torch.long)`

**Expected Variables:**
- `X_tensor` → Tensor of shape `[num_samples, 128]`
- `y_tensor` → Tensor of shape `[num_samples]`

**Data Types:** Use `torch.long` (64-bit integers) for both. `nn.Embedding` looks up rows by integer token index, and `nn.CrossEntropyLoss` expects class-index targets as integers.
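To see why the dtype matters, the sketch below feeds integer and float IDs into a small `nn.Embedding` layer (sizes chosen arbitrarily for illustration). It then computes `nn.CrossEntropyLoss` with `long` class targets. Note that `torch.tensor` already infers `int64` from a list of Python ints, so passing `dtype=torch.long` mainly makes that choice explicit.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
embedding = nn.Embedding(num_embeddings=10, embedding_dim=4)  # toy vocab of 10 IDs

ids = torch.tensor([[2, 3, 1]], dtype=torch.long)
vectors = embedding(ids)          # one 4-dim vector per token ID
print(vectors.shape)              # torch.Size([1, 3, 4])

try:
    embedding(ids.float())        # PyTorch rejects non-integer index tensors
    float_ok = True
except (RuntimeError, TypeError) as err:
    float_ok = False
    print("Float indices failed:", err)

# Class-index targets for CrossEntropyLoss are integers (torch.long)
logits = torch.randn(3, 2)        # 3 samples, 2 classes
targets = torch.tensor([0, 1, 1], dtype=torch.long)
loss = nn.CrossEntropyLoss()(logits, targets)
print(loss.item())
```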


```python
# TODO #3: Convert to PyTorch tensors
import torch

# Your code here
```


## 🔧 TODO #4: Create DataLoader

**Task:** Build PyTorch DataLoader for efficient batch processing.

**Hint:** Use `TensorDataset(X_tensor, y_tensor)` then `DataLoader(train_dataset, batch_size=16, shuffle=True)`

**Expected Variables:**
- `train_dataset` → TensorDataset combining features and labels
- `train_dataloader` → DataLoader with batch size 16 and shuffling

**Expected Output:** When you iterate through the DataLoader, each full batch should have shapes `[16, 128]` and `[16]`. The final batch may be smaller if the dataset size is not a multiple of 16.
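A `TensorDataset` pairs rows of its tensors by index. A `DataLoader` with `drop_last=False` (the default) yields `ceil(N / batch_size)` batches. The sketch below checks both on random stand-in tensors; the 50 fake samples are an arbitrary size chosen for illustration.

```python
import math
import torch
from torch.utils.data import TensorDataset, DataLoader

# Hypothetical stand-ins for X_tensor and y_tensor
X_demo = torch.randint(2, 100, (50, 128))
y_demo = torch.randint(0, 2, (50,))

demo_dataset = TensorDataset(X_demo, y_demo)
x0, label0 = demo_dataset[0]  # indexing returns (X_demo[0], y_demo[0])
print(len(demo_dataset), x0.shape, label0.shape)  # 50 torch.Size([128]) torch.Size([])

demo_loader = DataLoader(demo_dataset, batch_size=16, shuffle=True)
print(len(demo_loader), math.ceil(50 / 16))       # 4 4

sizes = [xb.shape[0] for xb, _ in demo_loader]
print(sizes)                                      # [16, 16, 16, 2]
```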


```python
# TODO #4: Create DataLoader
from torch.utils.data import TensorDataset, DataLoader

# Your code here
```


## 📝 Reflection Prompts

### 🤔 Understanding Check
1. **Why shuffle training data?** What would happen if you didn't shuffle?

2. **What's the tradeoff of `max_len=128` vs `max_len=256`?** Consider memory usage and information loss.

3. **Why use `torch.long` for both inputs and labels?** What would happen with other data types?

4. **How does padding affect the model's understanding?** Will the model "see" the padding tokens?

### 🎯 Batching Strategy
- Why is batch processing more efficient than processing one sample at a time?
- How does the batch size of 16 affect training speed vs memory usage?
- What happens if your dataset size isn't divisible by batch size?

---

**Write your reflections here:**
