**Task 2: Multi-Task Learning Expansion**

**Background:**  
In Task 2, we extend our SentenceTransformer (from Task 1) into a multi-task framework that handles two tasks:
- **Task A:** Classifying sentences into predefined classes (e.g., three classes).
- **Task B:** Performing sentiment analysis (e.g., binary classification: positive/negative).

**Design Overview:**  
- **Shared Encoder:** Use the SentenceTransformer to extract fixed-length embeddings.
- **Task-Specific Heads:**  
  - A two-layer feedforward network for sentence classification.
  - A two-layer feedforward network for sentiment analysis.
- Both heads consume the same sentence embedding, so the encoder's representation is shared while each head learns its own task-specific mapping.


In [31]:
# Step 1: Imports and Setup

# Import required libraries.
import torch
import torch.nn as nn
from models.sentence_transformer import SentenceTransformer

# Set the random seed for reproducibility.
torch.manual_seed(42)

<torch._C.Generator at 0x78be91069ff0>

**Code Explanation:**  
- This block imports all necessary libraries, including PyTorch and the previously defined `SentenceTransformer` from Task 1. A fixed random seed is set to ensure that results are reproducible.



In [32]:
# Step 2: MultiTaskModel Class Definition

class MultiTaskModel(nn.Module):
    """
    A multi-task model that expands the SentenceTransformer for two tasks:
      - Task A: Sentence Classification.
      - Task B: Sentiment Analysis.

    This model uses a shared encoder (the SentenceTransformer) and
    two separate task-specific heads.
    """

    def __init__(
        self,
        pretrained_model_name='bert-base-uncased',
        num_classes_taskA=3,
        num_classes_taskB=2,
    ):
        """
        Initializes the MultiTaskModel.

        Args:
            pretrained_model_name (str): Name of the pre-trained BERT model.
            num_classes_taskA (int): Number of classes for sentence
                classification.
            num_classes_taskB (int): Number of classes for sentiment analysis.
        """
        super().__init__()
        # Shared encoder: reusing the SentenceTransformer from Task 1.
        self.encoder = SentenceTransformer(pretrained_model_name)
        hidden_size = self.encoder.transformer.config.hidden_size

        # Task A: Sentence Classification Head.
        self.classification_head = nn.Sequential(
            nn.Linear(hidden_size, hidden_size // 2),
            nn.ReLU(),
            nn.Linear(hidden_size // 2, num_classes_taskA)
        )

        # Task B: Sentiment Analysis Head.
        self.sentiment_head = nn.Sequential(
            nn.Linear(hidden_size, hidden_size // 2),
            nn.ReLU(),
            nn.Linear(hidden_size // 2, num_classes_taskB)
        )

    def forward(self, input_sentences):
        """
        Forward pass for the multi-task model.

        Args:
            input_sentences (list): A list of input sentence strings.

        Returns:
            tuple: (logits_taskA, logits_taskB) where:
                - logits_taskA: Output from the sentence classification head.
                - logits_taskB: Output from the sentiment analysis head.
        """
        # Generate sentence embeddings using the shared encoder.
        embeddings = self.encoder(input_sentences)

        # Process the embeddings through each task-specific head.
        logits_taskA = self.classification_head(embeddings)
        logits_taskB = self.sentiment_head(embeddings)

        return logits_taskA, logits_taskB


**Code Explanation:**

- **Class Definition:**  
  This step defines the `MultiTaskModel` class, which wraps the SentenceTransformer from Task 1 as a shared encoder and adds two separate task-specific heads:
  - **Task A (Sentence Classification):** A two-layer feedforward network (`Linear`, `ReLU`, `Linear`) that maps each embedding to `num_classes_taskA` logits.
  - **Task B (Sentiment Analysis):** A network of the same shape that maps each embedding to `num_classes_taskB` logits.
  
- **Forward Method:**  
  The `forward` method processes the input sentences by generating embeddings with the shared encoder and then passing those embeddings through each of the task-specific heads. The outputs (logits) for each task are returned as a tuple.
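
To give a sense of how small the heads are relative to the encoder, the sketch below counts the parameters of one head of each size. It assumes `hidden_size = 768` (the embedding width of `bert-base-uncased`); `make_head` and `count_params` are hypothetical helpers introduced only for this illustration.

```python
import torch.nn as nn

# Assumption: hidden size of bert-base-uncased.
hidden_size = 768


def make_head(num_classes):
    """Hypothetical helper: builds a head with the same layout as the model's heads."""
    return nn.Sequential(
        nn.Linear(hidden_size, hidden_size // 2),
        nn.ReLU(),
        nn.Linear(hidden_size // 2, num_classes),
    )


def count_params(module):
    """Hypothetical helper: total number of parameters in a module."""
    return sum(p.numel() for p in module.parameters())


params_taskA = count_params(make_head(3))
params_taskB = count_params(make_head(2))

print(params_taskA)  # 296451
print(params_taskB)  # 296066
```

Each head adds roughly 0.3 million parameters, which is small next to the roughly 110 million parameters of BERT-base that the two tasks share.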



In [33]:
# Step 3: Testing the MultiTaskModel

# Set up the computation device (GPU if available, else CPU).
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Instantiate the MultiTaskModel (using 3 classes for Task A and 2 for Task B)
# and move it to the device.
multi_task_model = MultiTaskModel(
    num_classes_taskA=3,
    num_classes_taskB=2,
).to(device)

# Define sample sentences for testing.
sample_sentences = [
    "The quick brown fox jumps over the lazy dog.",
    "Transformers have revolutionized natural language processing."
]

# Run inference without tracking gradients, since nothing is trained here.
with torch.no_grad():
    # Generate the shared sentence embeddings using the encoder.
    embeddings = multi_task_model.encoder(sample_sentences)

    # Pass the sample sentences through the multi-task model to obtain
    # logits for both tasks.
    logits_taskA, logits_taskB = multi_task_model(sample_sentences)

# Print the shared embeddings and their shape.
print("Embeddings shape:", embeddings.shape)
print("Embeddings:")
print(embeddings.detach().cpu().numpy())

# Print the output logits shapes for each task.
print("Task A (Sentence Classification) Logits Shape:", logits_taskA.shape)
print("Task B (Sentiment Analysis) Logits Shape:", logits_taskB.shape)

Embeddings shape: torch.Size([2, 768])
Embeddings:
[[-0.36080578  0.22707793 -0.3029696  ... -0.42242897  0.69488996
   0.62128514]
 [-0.44105065 -0.14493446 -0.28509557 ... -0.32411036 -0.15410942
   0.34400165]]
Task A (Sentence Classification) Logits Shape: torch.Size([2, 3])
Task B (Sentiment Analysis) Logits Shape: torch.Size([2, 2])


**Code Explanation:**  
In this testing block:
- The computation device is set to GPU if available; otherwise, CPU is used.
- The `MultiTaskModel` is instantiated with 3 classes for Task A and 2 classes for Task B.
- Sample sentences are processed to generate:
  - **Shared Embeddings:** Expected shape `(2, hidden_size)` (e.g., `(2, 768)` for BERT).
  - **Task A Logits:** Expected shape `(2, 3)` for sentence classification.
  - **Task B Logits:** Expected shape `(2, 2)` for sentiment analysis.
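- This block demonstrates that the model processes input sentences end to end and outputs raw logits of the expected shape for each task, which is the primary objective of Task 2. Because both heads are randomly initialized and untrained, the logit values themselves carry no meaning yet.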
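
As an illustration of what happens downstream of the logits, the following sketch converts them into class probabilities and predicted labels. The logit values are made up for this example (they are not outputs of the model above), and only the shapes `(2, 3)` and `(2, 2)` match the test run.

```python
import torch

# Hypothetical logits standing in for the model outputs:
# 2 sentences, 3 classes for Task A and 2 classes for Task B.
logits_taskA = torch.tensor([[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]])
logits_taskB = torch.tensor([[-0.3, 1.2], [0.8, -0.4]])

# Softmax turns each row of logits into a probability distribution.
probs_taskA = torch.softmax(logits_taskA, dim=-1)
probs_taskB = torch.softmax(logits_taskB, dim=-1)

# Argmax picks the most likely class per sentence. Applying it to the
# logits directly gives the same labels, because softmax preserves order.
preds_taskA = probs_taskA.argmax(dim=-1)
preds_taskB = probs_taskB.argmax(dim=-1)

print(preds_taskA.tolist())  # [0, 2]
print(preds_taskB.tolist())  # [1, 0]
```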



**Architectural Changes for Multi-Task Learning (Task 2)**

- **Shared Encoder:**  
  We reuse the pre-trained SentenceTransformer from Task 1 to extract fixed-length embeddings for all tasks. This ensures consistent representations and needs far fewer parameters than training a separate encoder for each task.

- **Task-Specific Heads:**  
  Two separate feedforward networks are added on top of the shared encoder:  
  - **Task A (Sentence Classification):** A two-layer network that outputs logits for a predefined number of classes (e.g., three classes).  
  - **Task B (Sentiment Analysis):** A two-layer network that outputs logits for binary sentiment (e.g., positive/negative).

- **Output as Logits:**  
  The model returns raw logits for each task, which is the input that loss functions such as `CrossEntropyLoss` expect during training. At inference, taking the argmax over the logits yields the predicted class; softmax is needed only when class probabilities are required.

- **Modular Design:**  
  By separating the shared encoder and task-specific heads, the architecture stays flexible: each head has its own parameters and can be trained or replaced without changing the other, while both draw on the same shared features. This modularity also facilitates future expansion to additional tasks.
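
The points above can be sketched as a single joint training step. This is a minimal illustration under several assumptions, not the training procedure of this notebook: `TinyMultiTask` is a hypothetical stand-in whose linear "encoder" replaces BERT so the example runs without downloading weights, the batch and labels are random dummy data, and the two losses are summed with equal weight.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)


class TinyMultiTask(nn.Module):
    """Hypothetical stand-in: a linear 'encoder' plus two task heads."""

    def __init__(self, in_dim=16, hidden_size=8,
                 num_classes_taskA=3, num_classes_taskB=2):
        super().__init__()
        self.encoder = nn.Linear(in_dim, hidden_size)
        self.classification_head = nn.Linear(hidden_size, num_classes_taskA)
        self.sentiment_head = nn.Linear(hidden_size, num_classes_taskB)

    def forward(self, x):
        embeddings = torch.relu(self.encoder(x))
        return (self.classification_head(embeddings),
                self.sentiment_head(embeddings))


model = TinyMultiTask()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

# Dummy batch: 4 feature vectors with one label per task (made-up data).
features = torch.randn(4, 16)
labels_taskA = torch.tensor([0, 2, 1, 0])
labels_taskB = torch.tensor([1, 0, 0, 1])

# CrossEntropyLoss takes raw logits; no softmax is applied beforehand.
logits_taskA, logits_taskB = model(features)
loss_taskA = criterion(logits_taskA, labels_taskA)
loss_taskB = criterion(logits_taskB, labels_taskB)
loss = loss_taskA + loss_taskB  # assumption: equal task weights

optimizer.zero_grad()
loss.backward()
optimizer.step()

# Both task losses send gradients into the shared encoder.
encoder_has_grad = model.encoder.weight.grad is not None

# Optional: freeze the shared encoder so only the heads remain trainable.
for param in model.encoder.parameters():
    param.requires_grad = False
trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)
```

Summing the losses is the simplest way to combine the tasks; weighting them differently or alternating between task-specific batches are other options when one task dominates. An optimizer built after freezing would normally be given only the parameters that still require gradients.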