**Task 4: Training Loop Implementation (BONUS)**

**Background**  

In this task, we implement a structured **training loop** for the **Multi-Task Learning (MTL) model** from Task 2. The goal is to simulate training without running a full-scale model.  

This implementation focuses on:  
- **Handling hypothetical data** (simulated inputs and labels)  
- **Forward pass** (sending inputs through the model)  
- **Loss computation and optimization** (separate losses for each task)  
- **Evaluation metrics** (accuracy for both tasks)  

Structuring the training process this way lets us check that the **model updates its weights** on a combined objective covering both tasks. Although actual training is not required, the same loop can be adapted to real-world datasets.

In [7]:
# Step 1: Import Dependencies

import torch
import torch.nn as nn
import torch.optim as optim
from models.multitask_model import MultiTaskModel

**Code Explanation:**
- Imports **PyTorch** modules for defining the model, loss function, and optimizer.
- Imports **MultiTaskModel** from the `models/` folder (modularized from Task 2).

In [8]:
# Step 2: Create Hypothetical Data

# Define batch size and number of classes for each task
batch_size = 4
num_classes_taskA = 3  # Example: Sentence classification
num_classes_taskB = 2  # Example: Sentiment analysis

# Hypothetical input sentences (raw strings, not yet tokenized)
input_sentences = [
    "The weather is nice today.",
    "I love programming in Python!",
    "This restaurant has great food.",
    "Artificial Intelligence is evolving rapidly."
]

# Simulate sentence embeddings with random values
# (the sentences above are not actually converted)
input_tensors = torch.rand((batch_size, 768))  # Random 768-d embeddings

# Generate random target labels for both tasks
labels_taskA = torch.randint(0, num_classes_taskA, (batch_size,))
labels_taskB = torch.randint(0, num_classes_taskB, (batch_size,))

**Code Explanation:**
- Defines **batch size** and number of **classes** for both tasks.
- Simulates **dummy input sentences** (not actual text processing).
- Creates **random 768-dimensional embeddings** to mimic BERT outputs (note that the loop in Step 4 passes `input_sentences` to the model, not these tensors).
- Generates **random class labels** for both tasks.
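
Before feeding simulated data into a loss function, it can help to check shapes and dtypes. The sketch below repeats the data generation with a fixed seed (an assumption, the notebook does not set one) and verifies that the labels are valid integer class indices, which is the target format `nn.CrossEntropyLoss` expects.

```python
import torch

torch.manual_seed(0)  # assumption: fixed seed for repeatable runs, not in the original

batch_size = 4
num_classes_taskA = 3
num_classes_taskB = 2

# Random stand-ins for 768-d sentence embeddings and class labels
input_tensors = torch.rand((batch_size, 768))
labels_taskA = torch.randint(0, num_classes_taskA, (batch_size,))
labels_taskB = torch.randint(0, num_classes_taskB, (batch_size,))

# Sanity checks: one embedding and one label per sample
print(input_tensors.shape, labels_taskA.shape, labels_taskB.shape)
print(labels_taskA.dtype)  # integer class indices

# Every label must be a valid class index for its task
labels_in_range = bool(
    (labels_taskA >= 0).all() and (labels_taskA < num_classes_taskA).all()
    and (labels_taskB >= 0).all() and (labels_taskB < num_classes_taskB).all()
)
print(labels_in_range)
```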

In [9]:
# Step 3: Define Model, Loss Function & Optimizer

# Initialize the multi-task model
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = MultiTaskModel(
    num_classes_taskA=num_classes_taskA,
    num_classes_taskB=num_classes_taskB
    ).to(device)

# Define loss functions for both tasks
# (CrossEntropyLoss is typical for classification)
loss_fn_taskA = nn.CrossEntropyLoss()
loss_fn_taskB = nn.CrossEntropyLoss()

# Define optimizer
optimizer = optim.Adam(model.parameters(), lr=0.001)

**Code Explanation:**
- Instantiates the **MultiTaskModel** and moves it to the GPU if one is available, otherwise the CPU.
- Uses **CrossEntropyLoss** (a standard classification loss function).
- Sets up **Adam optimizer** with a learning rate of `0.001`.
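
The `MultiTaskModel` class is imported from `models/` and is not shown in this section. As a rough mental model only, a multi-task classifier of this kind can be pictured as a shared layer followed by one head per task. The class below, `ToyMultiTaskModel`, is a hypothetical stand-in that operates on precomputed 768-d embeddings; its name, layer sizes, and `forward` signature are assumptions and do not describe the Task 2 implementation.

```python
import torch
import torch.nn as nn


class ToyMultiTaskModel(nn.Module):
    """Hypothetical stand-in: a shared layer plus one linear head per task."""

    def __init__(self, num_classes_taskA, num_classes_taskB,
                 embed_dim=768, hidden_dim=128):
        super().__init__()
        # Shared representation used by both tasks
        self.shared = nn.Sequential(nn.Linear(embed_dim, hidden_dim), nn.ReLU())
        # Task-specific classification heads
        self.head_taskA = nn.Linear(hidden_dim, num_classes_taskA)
        self.head_taskB = nn.Linear(hidden_dim, num_classes_taskB)

    def forward(self, embeddings):
        features = self.shared(embeddings)
        return self.head_taskA(features), self.head_taskB(features)


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
toy_model = ToyMultiTaskModel(num_classes_taskA=3, num_classes_taskB=2).to(device)

# A batch of 4 random embeddings yields one logit vector per sample and task
logits_a, logits_b = toy_model(torch.rand(4, 768, device=device))
print(logits_a.shape, logits_b.shape)
```

Both heads read the same `features` tensor, which is what makes the shared layer receive training signal from both tasks.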


In [10]:
# Step 4: Define the Training Loop

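# Define number of epochs (for demonstration only)
num_epochs = 3

# Move labels to the model's device so the loss inputs match on GPU
labels_taskA = labels_taskA.to(device)
labels_taskB = labels_taskB.to(device)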

for epoch in range(num_epochs):
    print(f"\nEpoch {epoch+1}/{num_epochs}")

    # Forward pass: Get predictions for both tasks
    logits_taskA, logits_taskB = model(input_sentences)

    # Compute losses for each task
    loss_taskA = loss_fn_taskA(logits_taskA, labels_taskA)
    loss_taskB = loss_fn_taskB(logits_taskB, labels_taskB)

    # Compute total loss (sum of both task losses)
    total_loss = loss_taskA + loss_taskB

    # Backpropagation
    optimizer.zero_grad()
    total_loss.backward()
    optimizer.step()

    # Print loss for tracking
    print(
        f"Loss Task A: {loss_taskA.item():.4f}, "
        f"Loss Task B: {loss_taskB.item():.4f}, "
        f"Total Loss: {total_loss.item():.4f}"
    )



Epoch 1/3
Loss Task A: 0.9914, Loss Task B: 0.7179, Total Loss: 1.7093

Epoch 2/3
Loss Task A: 0.7000, Loss Task B: 0.5656, Total Loss: 1.2657

Epoch 3/3
Loss Task A: 0.8685, Loss Task B: 0.5623, Total Loss: 1.4308


**Code Explanation:**
- Loops through **3 epochs** to simulate training; each epoch here is a single update on the same four-sample batch.
- Passes **input sentences through the model** to get predictions for both tasks.
- Computes **CrossEntropy loss** separately for **Task A** and **Task B**.
- Combines both losses into **`total_loss = loss_taskA + loss_taskB`**.
- Performs **backpropagation (`.backward()`)** and **updates weights (`.step()`)**.
- Prints **loss values** for monitoring.
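
With a real dataset, each epoch would usually iterate over many mini-batches rather than a single fixed batch. The following is a minimal sketch of that pattern using `TensorDataset` and `DataLoader`. The random data, the small stand-in layers (`shared`, `head_a`, `head_b`), and the dataset size of 16 are all assumptions made for illustration, not part of the notebook.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)  # assumption: fixed seed for repeatable runs

# Hypothetical dataset: 16 random "embeddings" with random labels per task
embeddings = torch.rand(16, 768)
labels_a = torch.randint(0, 3, (16,))
labels_b = torch.randint(0, 2, (16,))
loader = DataLoader(
    TensorDataset(embeddings, labels_a, labels_b), batch_size=4, shuffle=True
)

# Hypothetical stand-in for the model: one shared layer and two task heads
shared = nn.Sequential(nn.Linear(768, 32), nn.ReLU())
head_a = nn.Linear(32, 3)
head_b = nn.Linear(32, 2)
params = (
    list(shared.parameters()) + list(head_a.parameters()) + list(head_b.parameters())
)
optimizer = optim.Adam(params, lr=0.001)
loss_fn = nn.CrossEntropyLoss()

epoch_losses = []
for epoch in range(3):
    running_loss = 0.0
    for x, y_a, y_b in loader:
        features = shared(x)
        loss = loss_fn(head_a(features), y_a) + loss_fn(head_b(features), y_b)

        optimizer.zero_grad()  # clear gradients from the previous batch
        loss.backward()        # backpropagate the combined loss
        optimizer.step()       # update shared and task-specific weights

        running_loss += loss.item()

    # Report the mean loss over all batches in the epoch
    epoch_losses.append(running_loss / len(loader))
    print(f"Epoch {epoch + 1}: mean loss {epoch_losses[-1]:.4f}")
```

Averaging the loss over batches gives a steadier per-epoch number than printing the loss of a single batch.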

In [11]:
# Step 5: Define Evaluation Metrics

def compute_accuracy(logits, labels):
    """Calculate accuracy by comparing predictions with actual labels."""
    predictions = torch.argmax(logits, dim=1)
    correct = (predictions == labels).sum().item()
    return correct / labels.size(0)

# Compute accuracy for both tasks
accuracy_taskA = compute_accuracy(logits_taskA, labels_taskA)
accuracy_taskB = compute_accuracy(logits_taskB, labels_taskB)

# Print accuracy for both tasks
print(
    f"Task A Accuracy: {accuracy_taskA:.2f}, "
    f"Task B Accuracy: {accuracy_taskB:.2f}"
)

Task A Accuracy: 0.75, Task B Accuracy: 0.75


**Code Explanation:**
- Uses `torch.argmax()` to get **predicted class labels** from logits.
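- Compares predictions with the **labels** to compute **accuracy**. The logits are those from the last training forward pass, so this is accuracy on the training batch with random labels, not a measure of generalization.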
- Prints **accuracy** for **Task A** and **Task B**.
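
To make the accuracy computation concrete, here is a small worked example with hand-written logits, followed by the usual pattern for evaluation-time inference (`eval()` plus `torch.no_grad()`). The example tensors and the `toy_head` layer are hypothetical and exist only for illustration.

```python
import torch


def compute_accuracy(logits, labels):
    """Fraction of samples whose highest-scoring class matches the label."""
    predictions = torch.argmax(logits, dim=1)
    correct = (predictions == labels).sum().item()
    return correct / labels.size(0)


# Hand-written logits for 4 samples and 3 classes
example_logits = torch.tensor([
    [2.0, 0.1, 0.3],  # argmax -> class 0
    [0.2, 1.5, 0.1],  # argmax -> class 1
    [0.3, 0.2, 0.9],  # argmax -> class 2
    [1.0, 0.2, 0.1],  # argmax -> class 0 (label is 1, so this one is wrong)
])
example_labels = torch.tensor([0, 1, 2, 1])

example_accuracy = compute_accuracy(example_logits, example_labels)
print(f"Accuracy: {example_accuracy:.2f}")  # prints "Accuracy: 0.75"

# Evaluation-time inference: disable dropout-style layers and gradient tracking
toy_head = torch.nn.Linear(768, 3)  # hypothetical stand-in for a task head
toy_head.eval()
with torch.no_grad():
    eval_logits = toy_head(torch.rand(4, 768))
print(eval_logits.requires_grad)  # prints "False"
```

Running evaluation under `torch.no_grad()` avoids building a computation graph, which saves memory when no backward pass is needed.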

**Key Decisions & Insights**  

- **Hypothetical Data Handling:**  
  - Generated **random tensors** of shape `(batch_size, 768)` as stand-ins for BERT embeddings.  
  - Created **random labels** for classification tasks.  
  - Ensured that the training loop can function without a real dataset.  

- **Loss Computation:**  
  - Used **CrossEntropyLoss** for both tasks (classification-based).  
  - Summed both losses to compute the **total loss** (`loss_taskA + loss_taskB`).  
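  - The unweighted sum gives both tasks **equal weight**. It does not by itself prevent one loss from dominating when the loss scales differ, in which case per-task weights may be needed.  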

- **Optimization & Backpropagation:**  
  - Used **Adam optimizer** for adaptive learning rate updates.  
  - Applied **zero_grad() → backward() → step()** in sequence.  
  - Calling `zero_grad()` every iteration clears the previous step's gradients so they do not accumulate.  

- **Evaluation Metrics:**  
  - Computed **accuracy** for both tasks using `torch.argmax()`.  
  - Accuracy is simple to interpret, though on four samples with random labels it only confirms that the metric code runs.  

- **Final Takeaways:**  
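  - The training loop performs **multi-task learning** with a single forward and backward pass per step.  
  - The model trains on **both tasks simultaneously**: each task head receives gradients only from its own loss, while any shared layers receive the sum of both tasks' gradients.  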
  - The modular setup makes it **easy to integrate real datasets** later.  

This setup forms a **scalable and reusable framework** for multi-task training.
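
The points about loss combination and gradient flow can be checked numerically. The sketch below uses tiny hypothetical layers (`shared`, `head_a`, `head_b`) and hypothetical task weights `w_a` and `w_b`, none of which come from the notebook. It shows that backpropagating Task A's loss alone leaves Task B's head without a gradient, and that the shared layer's gradient under a weighted sum equals the weighted sum of the per-task gradients.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # assumption: fixed seed for repeatable runs

# Tiny hypothetical network: one shared layer, two task heads
shared = nn.Linear(8, 4)
head_a = nn.Linear(4, 3)
head_b = nn.Linear(4, 2)
modules = (shared, head_a, head_b)

x = torch.rand(4, 8)
y_a = torch.randint(0, 3, (4,))
y_b = torch.randint(0, 2, (4,))
loss_fn = nn.CrossEntropyLoss()


def task_losses():
    """Run a forward pass and return (loss_a, loss_b)."""
    features = torch.relu(shared(x))
    return loss_fn(head_a(features), y_a), loss_fn(head_b(features), y_b)


def clear_grads():
    for module in modules:
        module.zero_grad()


# 1) Backpropagate only Task A's loss
loss_a, _ = task_losses()
loss_a.backward()
head_b_grad_from_a = head_b.weight.grad      # head B is untouched by loss A
shared_grad_a = shared.weight.grad.clone()

# 2) Backpropagate only Task B's loss
clear_grads()
_, loss_b = task_losses()
loss_b.backward()
shared_grad_b = shared.weight.grad.clone()

# 3) Backpropagate a weighted sum (weights are hypothetical)
clear_grads()
w_a, w_b = 1.0, 0.5
loss_a, loss_b = task_losses()
(w_a * loss_a + w_b * loss_b).backward()
shared_grad_total = shared.weight.grad.clone()

print(head_b_grad_from_a is None)
grads_add_up = torch.allclose(
    shared_grad_total, w_a * shared_grad_a + w_b * shared_grad_b, atol=1e-6
)
print(grads_add_up)
```

Setting `w_a = w_b = 1.0` recovers the unweighted sum used in Step 4; other values are one simple way to rebalance tasks whose losses differ in scale.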