# **Task 1: Sentence Transformer Implementation**

In [1]:
import warnings
warnings.filterwarnings("ignore")

In [2]:
from sentence_transformers import SentenceTransformer

# Load a pre-trained Sentence Transformer model
model = SentenceTransformer('bert-base-nli-mean-tokens')

# Sample sentences to encode
sentences = [
    "Machine learning is amazing!",
    "Transformers are powerful for NLP tasks.",
    "Sentence embeddings help capture semantic meaning."
]

# Encode sentences into fixed-length embeddings
embeddings = model.encode(sentences)

# Display embeddings
for i, sentence in enumerate(sentences):
    print(f"Sentence: {sentence}")
    print(f"Embedding: {embeddings[i][:5]}... (truncated)\n")


Sentence: Machine learning is amazing!
Embedding: [-0.46188468  0.04595638  1.1395051   0.20642886 -0.2947008 ]... (truncated)

Sentence: Transformers are powerful for NLP tasks.
Embedding: [ 0.2185293  -0.3930077   1.0106223   0.23320135 -0.76732814]... (truncated)

Sentence: Sentence embeddings help capture semantic meaning.
Embedding: [-0.3018907   0.24977122  0.8252767   0.5386457   0.0526836 ]... (truncated)



#### **Architecture Choices**
*  **Model Selection:** "bert-base-nli-mean-tokens" is a BERT-base model fine-tuned on Natural Language Inference (NLI) data, with mean pooling over the token embeddings. This makes it suitable for sentence similarity and general-purpose sentence embeddings.
*  **Framework:** The sentence-transformers library keeps the implementation short by providing a high-level encode() API for embedding generation.
*  **Embedding Representation:** Each sentence is encoded into a fixed-length, 768-dimensional vector, suitable for downstream tasks like classification, clustering, and similarity scoring.
*  **Performance Consideration:** Starting from a pre-trained model speeds up development and avoids training from scratch.
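
To make the similarity-scoring use case concrete, here is a minimal, dependency-free sketch of cosine similarity between two embedding vectors. The `cosine_similarity` helper and the toy 3-dimensional vectors are illustrative assumptions, not part of the notebook; real 768-dimensional embeddings would be passed in the same way.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy vectors standing in for real sentence embeddings (illustrative only).
emb_a = [1.0, 0.0, 1.0]
emb_b = [1.0, 0.0, 1.0]
emb_c = [0.0, 1.0, 0.0]

sim_same = cosine_similarity(emb_a, emb_b)        # identical direction
sim_orthogonal = cosine_similarity(emb_a, emb_c)  # no shared direction
```

With the embeddings computed above, `cosine_similarity(embeddings[0], embeddings[1])` should work as well, since each row is a sequence of floats. The sentence-transformers library also ships a `util.cos_sim` helper for the same purpose.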

# **Task 2: Multi-Task Learning Expansion**

In [3]:
import torch
import torch.nn as nn
from sentence_transformers import SentenceTransformer

class MultiTaskSentenceTransformer(nn.Module):
    def __init__(self, model_name="bert-base-nli-mean-tokens", num_classes=3, num_sentiments=3):
        super(MultiTaskSentenceTransformer, self).__init__()

        # Load pre-trained transformer as the shared backbone
        self.sentence_transformer = SentenceTransformer(model_name)

        # Task A: Sentence Classification head
        self.classification_head = nn.Linear(768, num_classes)

        # Task B: Sentiment Analysis head
        self.sentiment_head = nn.Linear(768, num_sentiments)

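    def forward(self, sentences):
        # Encode sentences using the shared transformer.
        # Note: encode() is an inference helper that runs without gradient
        # tracking, so only the two heads below are trainable through this path.
        embeddings = self.sentence_transformer.encode(sentences, convert_to_tensor=True)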

        # Task A: Sentence Classification
        class_logits = self.classification_head(embeddings)

        # Task B: Sentiment Analysis
        sentiment_logits = self.sentiment_head(embeddings)

        return class_logits, sentiment_logits


# Instantiate the model
model = MultiTaskSentenceTransformer()

# Sample input sentences
sample_sentences = ["I love AI research!", "The movie was terrible.", "This is a great innovation."]

# Forward pass
class_outputs, sentiment_outputs = model(sample_sentences)

print("Task A - Classification Output:", class_outputs)
print("Task B - Sentiment Output:", sentiment_outputs)


Task A - Classification Output: tensor([[ 0.2454,  0.0176,  0.3753],
        [-0.1844,  0.1087,  0.5830],
        [ 0.3544, -0.0117,  0.8064]], grad_fn=<AddmmBackward0>)
Task B - Sentiment Output: tensor([[ 0.3242, -0.1185,  0.3239],
        [ 0.4710,  0.2860, -0.0037],
        [-0.0236,  0.2406,  0.4187]], grad_fn=<AddmmBackward0>)


#### **Changes Made to the Architecture**

**Shared Encoder:** Retain the pre-trained Sentence Transformer as a common encoder to generate embeddings for all tasks.

**Task-Specific Output Heads:** Introduce separate output layers for each task, such as:
* A linear layer for sentence classification.
* Another linear layer for sentiment analysis.

**Modified Forward Pass:**

* Process input sentences through the shared encoder to obtain embeddings.
* Pass these embeddings through each task-specific output head to generate respective predictions.
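
One detail of this forward pass is worth checking: `SentenceTransformer.encode` is an inference helper and appears to run under `torch.no_grad()`, so the heads receive gradients but the shared encoder does not. The sketch below reproduces that effect with a stand-in linear layer in place of the transformer (an assumption, so the example runs without downloading a model).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

backbone = nn.Linear(16, 8)   # stand-in for the shared encoder (hypothetical)
head = nn.Linear(8, 3)        # stand-in for one task head
inputs = torch.randn(4, 16)
labels = torch.tensor([0, 1, 2, 0])

# Case 1: embeddings computed without gradient tracking, as encode() does.
with torch.no_grad():
    embeddings = backbone(inputs)
loss = nn.functional.cross_entropy(head(embeddings), labels)
loss.backward()
backbone_grad_no_grad = backbone.weight.grad   # stays None: backbone is cut off
head_has_grad = head.weight.grad is not None   # the head still learns

# Case 2: ordinary forward pass, gradients flow into the backbone too.
head.zero_grad()
loss = nn.functional.cross_entropy(head(backbone(inputs)), labels)
loss.backward()
backbone_grad_tracked = backbone.weight.grad
```

If the backbone is meant to be fine-tuned, the forward pass would need to run the underlying modules with gradient tracking enabled rather than going through `encode`.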



# **Task 3: Training Considerations**

### **Scenario 1: Freezing the Entire Network**

**What it means:** All model parameters (transformer + task-specific heads) are frozen, meaning no learning occurs.

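**Implication:** The model can only be used for inference. Because the task heads in this notebook are randomly initialized, their outputs are meaningful only if the heads were trained beforehand.

**When to use:**
* When the embeddings are used directly, without fine-tuning (for example, similarity search or clustering).
* When the pre-trained embeddings and previously trained heads already perform well enough on the downstream tasks.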
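
A minimal sketch of this scenario in PyTorch, assuming small linear layers as stand-ins for the transformer and the two heads (the layer sizes are arbitrary):

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the multi-task model.
model = nn.ModuleDict({
    "backbone": nn.Linear(16, 8),
    "classification_head": nn.Linear(8, 3),
    "sentiment_head": nn.Linear(8, 3),
})

# Freeze every parameter and switch to evaluation mode.
for param in model.parameters():
    param.requires_grad = False
model.eval()

# Inference only: no computation graph is built.
with torch.no_grad():
    features = model["backbone"](torch.randn(4, 16))
    class_logits = model["classification_head"](features)

num_trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
```

Counting the parameters that still have `requires_grad=True` is a quick way to confirm that nothing is left to train.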

### **Scenario 2: Freezing Only the Transformer Backbone**

**What it means:** The transformer model remains frozen, but the task-specific heads (classification & sentiment) are trainable.

**Implication:**
* Helps preserve pre-trained knowledge.
* Reduces computational cost since only a small part of the model updates.
* Might limit performance if embeddings are not well-suited for the tasks.

**When to use:**
* If there’s limited data and we want to prevent catastrophic forgetting.
* When fine-tuning for domain-specific tasks while keeping general language understanding intact.
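
The following sketch shows one way to freeze only the backbone and hand the optimizer just the head parameters. `TinyMultiTask` is a hypothetical stand-in that uses a linear layer instead of a transformer, and the labels are made up for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class TinyMultiTask(nn.Module):
    """Hypothetical stand-in: a linear 'backbone' instead of a transformer."""
    def __init__(self, in_dim=16, hidden=8, num_classes=2, num_sentiments=3):
        super().__init__()
        self.backbone = nn.Linear(in_dim, hidden)
        self.classification_head = nn.Linear(hidden, num_classes)
        self.sentiment_head = nn.Linear(hidden, num_sentiments)

    def forward(self, x):
        features = self.backbone(x)
        return self.classification_head(features), self.sentiment_head(features)

model = TinyMultiTask()

# Freeze the backbone; the heads keep requires_grad=True.
for param in model.backbone.parameters():
    param.requires_grad = False

# Give the optimizer only the parameters that can still change.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-3)

backbone_before = model.backbone.weight.clone()
head_before = model.classification_head.weight.clone()

# One training step on made-up data.
class_logits, sentiment_logits = model(torch.randn(4, 16))
loss = (nn.functional.cross_entropy(class_logits, torch.tensor([0, 1, 0, 1]))
        + nn.functional.cross_entropy(sentiment_logits, torch.tensor([1, 0, 2, 0])))
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

After the step, the backbone weights are unchanged while the head weights have moved.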


### **Scenario 3: Freezing Only One Task-Specific Head**


**What it means:** One of the task-specific heads is frozen while the rest of the model trains.

**Implication:**
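* Allows improving one task while keeping the learned weights of the other head intact.
* Useful when one task is already well-trained and does not require further updates.
* Because the shared backbone still updates, the inputs to the frozen head can shift, so its task should still be monitored during training.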

**When to use:**
* If one task head has converged to a good performance level.
* To prevent a well-trained head from overfitting on noisy data.
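
A short sketch of this scenario, again with linear layers as hypothetical stand-ins. It freezes the sentiment head, takes one update on the classification loss, and then checks two things: the frozen head's weights are untouched, but its outputs still change because the shared backbone moved.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins for the transformer and the two heads.
backbone = nn.Linear(16, 8)
classification_head = nn.Linear(8, 2)
sentiment_head = nn.Linear(8, 3)

# Freeze only the sentiment head.
sentiment_head.requires_grad_(False)

params = list(backbone.parameters()) + list(classification_head.parameters())
optimizer = torch.optim.SGD(params, lr=0.1)

inputs = torch.randn(4, 16)
sentiment_logits_before = sentiment_head(backbone(inputs)).detach()
sentiment_weight_before = sentiment_head.weight.clone()

# One update driven by the classification loss only.
loss = nn.functional.cross_entropy(
    classification_head(backbone(inputs)), torch.tensor([0, 1, 0, 1])
)
optimizer.zero_grad()
loss.backward()
optimizer.step()

sentiment_logits_after = sentiment_head(backbone(inputs)).detach()
head_unchanged = torch.equal(sentiment_head.weight, sentiment_weight_before)
outputs_drifted = not torch.allclose(sentiment_logits_before, sentiment_logits_after)
```

This is why freezing a head alone does not guarantee stable performance on its task when the backbone is shared.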

### **Transfer Learning Approach**




**Choosing a Pre-Trained Model:**

* Select a model like bert-base-nli-mean-tokens, roberta-base, or distilbert-base-uncased.
* For domain-specific text, consider models like BioBERT (for biomedical texts) or LEGAL-BERT (for legal texts).

**Freezing vs. Unfreezing Layers:**

* Start by freezing the transformer and training only the task heads.
* Gradually unfreeze transformer layers if performance does not improve.

**Fine-Tuning Strategy:**

* Use progressive unfreezing: unfreeze the transformer layers closest to the task heads first, then work back toward the earlier layers.
* Apply differential learning rates, with lower rates for the transformer and higher rates for the task heads.
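
Both ideas can be sketched with plain PyTorch. The three stacked linear layers stand in for transformer layers, and the learning rates (2e-5 and 1e-3) and the `unfreeze_top_layers` helper are illustrative assumptions rather than recommended settings.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in: three stacked "encoder layers" plus one task head.
encoder_layers = nn.ModuleList([nn.Linear(8, 8) for _ in range(3)])
head = nn.Linear(8, 3)

# Start with the whole encoder frozen.
for layer in encoder_layers:
    layer.requires_grad_(False)

def unfreeze_top_layers(layers, n):
    """Make the last n layers (those closest to the head) trainable."""
    if n > 0:
        for layer in list(layers)[-n:]:
            layer.requires_grad_(True)

def count_trainable(module):
    return sum(p.numel() for p in module.parameters() if p.requires_grad)

# Differential learning rates via optimizer parameter groups.
optimizer = torch.optim.AdamW([
    {"params": encoder_layers.parameters(), "lr": 2e-5},  # small steps for the encoder
    {"params": head.parameters(), "lr": 1e-3},            # larger steps for the head
])

# Stage 0 trains the head only; each later stage unfreezes one more layer.
trainable_per_stage = []
for stage in range(4):
    unfreeze_top_layers(encoder_layers, stage)
    trainable_per_stage.append(count_trainable(encoder_layers))
```

In a real run, each stage would train for some epochs before the next layer is unfrozen. Frozen parameters can stay in the optimizer, since parameters without gradients are skipped during the update.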

# **Task 4: Training Loop Implementation (BONUS)**

In [4]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, Dataset
from sentence_transformers import SentenceTransformer

# Define the Multi-Task Learning model
class MultiTaskSentenceTransformer(nn.Module):
    def __init__(self, model_name="bert-base-nli-mean-tokens", num_classes=2, num_sentiments=3):
        super(MultiTaskSentenceTransformer, self).__init__()
        self.sentence_transformer = SentenceTransformer(model_name)
        self.classification_head = nn.Linear(768, num_classes)
        self.sentiment_head = nn.Linear(768, num_sentiments)

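    def forward(self, sentences):
        # encode() runs without gradient tracking, so the backbone is
        # effectively frozen here and only the two heads are trained.
        embeddings = self.sentence_transformer.encode(sentences, convert_to_tensor=True)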
        class_logits = self.classification_head(embeddings)
        sentiment_logits = self.sentiment_head(embeddings)
        return class_logits, sentiment_logits

# Custom Dataset class
class MultiTaskDataset(Dataset):
    def __init__(self, sentences, class_labels, sentiment_labels):
        self.sentences = sentences
        self.class_labels = class_labels
        self.sentiment_labels = sentiment_labels

    def __len__(self):
        return len(self.sentences)

    def __getitem__(self, idx):
        sentence = self.sentences[idx]
        class_label = self.class_labels[idx]
        sentiment_label = self.sentiment_labels[idx]
        return sentence, class_label, sentiment_label

# Sample data
sentences = [
    "I love AI research!",
    "The movie was terrible.",
    "Deep learning is fascinating.",
    "This restaurant is awful.",
    "I enjoy studying machine learning."
]
class_labels = torch.tensor([0, 1, 0, 1, 0])  # Example category labels
sentiment_labels = torch.tensor([1, 0, 2, 0, 1])  # Example sentiment labels: 0 = negative, 1 = positive, 2 = neutral

# Create Dataset and DataLoader
dataset = MultiTaskDataset(sentences, class_labels, sentiment_labels)
dataloader = DataLoader(dataset, batch_size=2, shuffle=True)

# Initialize model, loss function, and optimizer
model = MultiTaskSentenceTransformer()
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training loop
num_epochs = 5
for epoch in range(num_epochs):
    total_loss = 0
    for batch_sentences, batch_class_labels, batch_sentiment_labels in dataloader:
        class_outputs, sentiment_outputs = model(batch_sentences)
        loss_a = criterion(class_outputs, batch_class_labels)
        loss_b = criterion(sentiment_outputs, batch_sentiment_labels)
        loss = loss_a + loss_b

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        total_loss += loss.item()

    print(f"Epoch {epoch+1}/{num_epochs}, Loss: {total_loss:.4f}")

print("Training complete!")


Epoch 1/5, Loss: 4.6553
Epoch 2/5, Loss: 2.2863
Epoch 3/5, Loss: 1.2797
Epoch 4/5, Loss: 0.9284
Epoch 5/5, Loss: 0.6568
Training complete!


#### **Key Features of This Training Loop**
**Data Handling:**

* Uses a custom PyTorch Dataset to simulate multi-task learning data.
* Each item yields a sentence together with its classification label and its sentiment label.

**Forward Pass:**

* Computes embeddings using SentenceTransformer.
* Passes embeddings through two separate task-specific heads.

**Loss Computation:**

* Uses CrossEntropyLoss for both tasks.
* Final loss = loss for Task A + loss for Task B (ensuring both tasks contribute to training).

**Optimization:**

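* Adam is given all model parameters, but because encode() runs without gradient tracking, only the two task heads receive gradients and are updated; the backbone stays effectively frozen.
* Uses mini-batches (batch size 2) to improve efficiency.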

**Performance Metrics:**

* Tracks total loss per epoch.
* Can be extended with accuracy, precision, recall, etc.
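
As a sketch of the suggested extension, accuracy can be computed from a batch of logits as follows. The `accuracy` helper and the hand-made logits are illustrative assumptions; in the training loop, `class_outputs` and `sentiment_outputs` would take their place.

```python
import torch

def accuracy(logits, labels):
    """Fraction of rows whose arg-max class matches the label."""
    predictions = logits.argmax(dim=1)
    return (predictions == labels).float().mean().item()

# Hand-made logits for four examples and three classes (illustrative only).
logits = torch.tensor([
    [2.0, 0.1, 0.3],   # predicts class 0
    [0.2, 1.5, 0.1],   # predicts class 1
    [0.3, 0.2, 0.9],   # predicts class 2
    [1.1, 0.4, 0.2],   # predicts class 0
])
labels = torch.tensor([0, 1, 2, 1])

acc = accuracy(logits, labels)  # three of the four predictions match
```

Calling the helper once per task on held-out data gives a separate accuracy for classification and for sentiment.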