# Albert Zhang - ML Apprentice Take Home Exercise
## Sentence Transformers & Multi-Task Learning
---
This notebook is structured to complete the following tasks:
- **Task 1**: Sentence Transformer Implementation
- **Task 2**: Multi-Task Learning Expansion
- **Task 3**: Training Considerations & Transfer Learning
- **Task 4**: Multi-Task Learning Training Loop (Bonus)

---

## Task 1: Sentence Transformer Implementation
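We use HuggingFace Transformers to implement a sentence transformer. The model encodes each input sentence into a fixed-length embedding by taking the final hidden state at the `[CLS]` position, which gives 768 dimensions per sentence for `distilbert-base-uncased`.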

In [None]:
from transformers import AutoTokenizer, AutoModel
import torch

class SentenceTransformer(torch.nn.Module):
    def __init__(self, model_name='distilbert-base-uncased'):
        super(SentenceTransformer, self).__init__()
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.transformer = AutoModel.from_pretrained(model_name)
    
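    def forward(self, sentences):
        # Tokenize a list of strings into one padded batch on the model's device
        tokens = self.tokenizer(sentences, return_tensors='pt', padding=True, truncation=True)
        tokens = tokens.to(self.transformer.device)
        # No torch.no_grad() here, so the backbone stays fine-tunable;
        # wrap the call in torch.no_grad() when only inference is needed
        outputs = self.transformer(**tokens)
        return outputs.last_hidden_state[:, 0, :]  # hidden state at the [CLS] position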

# Example usage
model = SentenceTransformer()
embeddings = model(["This is a test sentence.", "I love machine learning!"])
print(embeddings.shape)

torch.Size([2, 768])
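
The model above pools by taking the `[CLS]` position. A common alternative for sentence embeddings is masked mean pooling over all non-padded tokens. The sketch below is not part of the original implementation; it uses a small hand-written tensor in place of `last_hidden_state`, and the helper name `mean_pool` is an assumption made for illustration.

```python
import torch

def mean_pool(last_hidden_state, attention_mask):
    """Average token vectors per sentence, ignoring padded positions."""
    mask = attention_mask.unsqueeze(-1).to(last_hidden_state.dtype)  # (batch, seq, 1)
    summed = (last_hidden_state * mask).sum(dim=1)                   # (batch, hidden)
    counts = mask.sum(dim=1).clamp(min=1.0)                          # avoid division by zero
    return summed / counts

# Synthetic stand-in for transformer output: 2 sentences, 4 positions, hidden size 3
hidden = torch.tensor([
    [[1.0, 1.0, 1.0], [3.0, 3.0, 3.0], [9.0, 9.0, 9.0], [9.0, 9.0, 9.0]],
    [[2.0, 0.0, 2.0], [4.0, 0.0, 4.0], [6.0, 0.0, 6.0], [8.0, 0.0, 8.0]],
])
# The first sentence has two padded positions that must not affect the average
mask = torch.tensor([[1, 1, 0, 0], [1, 1, 1, 1]])

pooled = mean_pool(hidden, mask)
print(pooled.shape)
print(pooled)
```

With a real tokenizer, `attention_mask` would come from the tokenizer output alongside `input_ids`; whether mean pooling beats `[CLS]` pooling depends on the task and should be checked empirically.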


## Task 2: Multi-Task Learning Expansion
We expand our model to support two tasks:
- **Task A**: Sentence Classification
- **Task B**: Sentiment Analysis 

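This is done by adding two linear task-specific heads on top of the shared sentence embedding, so both tasks reuse a single encoder pass. Each head returns raw logits, one per class.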

In [2]:
class MultiTaskModel(SentenceTransformer):
    def __init__(self, model_name='distilbert-base-uncased', num_classes_a=3, num_classes_b=2):
        super(MultiTaskModel, self).__init__(model_name)
        self.classifier_a = torch.nn.Linear(self.transformer.config.hidden_size, num_classes_a)
        self.classifier_b = torch.nn.Linear(self.transformer.config.hidden_size, num_classes_b)

    def forward(self, sentences):
        embeddings = super().forward(sentences)
        return {
            'task_a': self.classifier_a(embeddings),
            'task_b': self.classifier_b(embeddings)
        }
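
As a rough illustration of what the two heads return, the sketch below applies two linear layers to a random tensor that stands in for a batch of sentence embeddings, so it runs without downloading a model. The batch size of 4 and the random input are assumptions; the hidden size and class counts mirror the defaults above.

```python
import torch

torch.manual_seed(0)
hidden_size, num_classes_a, num_classes_b = 768, 3, 2

# Stand-ins for classifier_a and classifier_b from MultiTaskModel
classifier_a = torch.nn.Linear(hidden_size, num_classes_a)
classifier_b = torch.nn.Linear(hidden_size, num_classes_b)

# Random tensor standing in for a batch of 4 sentence embeddings
embeddings = torch.randn(4, hidden_size)

logits = {
    'task_a': classifier_a(embeddings),
    'task_b': classifier_b(embeddings),
}

# Heads return raw logits: softmax gives probabilities, argmax the predicted class
probs_a = torch.softmax(logits['task_a'], dim=-1)
preds = {task: out.argmax(dim=-1) for task, out in logits.items()}

print(logits['task_a'].shape, logits['task_b'].shape)
print(preds)
```

Because the heads are untrained here, the predicted classes are arbitrary; only the shapes are meaningful.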

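## Task 3: Training Considerations & Transfer Learning
Scenarios discussed:
- **Freezing the entire network**: No parameters are updated, so the model is used for inference only. This needs minimal resources, but it only makes sense once the task heads have already been trained.
- **Freezing the backbone only**: The transformer acts as a fixed feature extractor and only the task-specific heads are trained. This enables quick adaptation with little data, at the cost of embeddings that are not tuned to the tasks.
- **Freezing a task-specific head**: A head that already performs well can be kept fixed during multi-stage training while the other head or the backbone is updated. Its outputs can still shift if the shared backbone changes.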

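### Transfer Learning Approach:
- Pre-trained Model: `distilbert-base-uncased`
- Frozen Layers: The first few transformer layers, or the entire transformer when data or compute is limited, since fewer trainable parameters make each training step cheaper
- Trainable Layers: The task-specific heads, plus any unfrozen upper transformer layers, for domain adaptation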
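
The freezing scenarios above come down to toggling `requires_grad` on groups of parameters. The following is a minimal sketch under stated assumptions: `ToyMultiTask` is a hypothetical stand-in that reuses the attribute names `transformer`, `classifier_a`, and `classifier_b` but replaces DistilBERT with a stack of small linear layers, and `set_trainable` and `count_trainable` are helper names introduced here.

```python
import torch
from torch import nn

class ToyMultiTask(nn.Module):
    """Hypothetical stand-in with the same attribute names as MultiTaskModel."""
    def __init__(self, hidden_size=8, num_layers=4):
        super().__init__()
        # Stand-in backbone: a stack of small layers instead of DistilBERT
        self.transformer = nn.ModuleList(
            [nn.Linear(hidden_size, hidden_size) for _ in range(num_layers)]
        )
        self.classifier_a = nn.Linear(hidden_size, 3)
        self.classifier_b = nn.Linear(hidden_size, 2)

def set_trainable(module, trainable):
    """Enable or disable gradient updates for every parameter in a module."""
    for param in module.parameters():
        param.requires_grad = trainable

def count_trainable(module):
    return sum(p.numel() for p in module.parameters() if p.requires_grad)

model = ToyMultiTask()

# Scenario: freeze the backbone only, leaving both heads trainable
set_trainable(model.transformer, False)
heads_only = count_trainable(model)

# Scenario: freeze only the first two backbone layers
set_trainable(model.transformer, True)
for layer in model.transformer[:2]:
    set_trainable(layer, False)
partial = count_trainable(model)

# Hand only the trainable parameters to the optimizer
trainable_params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable_params, lr=1e-3)

print(heads_only, partial)
```

On the real model the same helper could be applied to `model.transformer` as a whole; the attribute path to individual DistilBERT layers is not shown in this notebook and should be checked against the loaded model before freezing a subset.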

## Task 4: Multi-Task Training Loop (Bonus)
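We define a simple training loop intended for synthetic data. Each batch provides the sentences and one label tensor per task; the two cross-entropy losses are summed with equal weight and optimized in a single backward pass.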

In [3]:
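import torch.nn.functional as F

def train_loop(model, data_loader, optimizer):
    """Run one epoch and return the mean combined loss per batch."""
    model.train()
    total_loss, num_batches = 0.0, 0
    for sentences, labels_a, labels_b in data_loader:
        outputs = model(list(sentences))

        # One cross-entropy loss per task, summed with equal weight
        loss_a = F.cross_entropy(outputs['task_a'], labels_a)
        loss_b = F.cross_entropy(outputs['task_b'], labels_b)
        loss = loss_a + loss_b

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # Track the running loss so the epoch can be summarized
        total_loss += loss.item()
        num_batches += 1
    return total_loss / max(num_batches, 1)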
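
To exercise a loop of this shape without downloading a model, the sketch below builds a few synthetic batches and records the loss and per-task accuracy at each step. Everything in it is an assumption made for illustration: `ToyMultiTask` is a hypothetical stand-in that turns sentences into crude bag-of-characters features, and the sentences and labels are made up.

```python
import torch
import torch.nn.functional as F
from torch import nn

class ToyMultiTask(nn.Module):
    """Hypothetical stand-in for MultiTaskModel that needs no download."""
    def __init__(self, num_features=32, num_classes_a=3, num_classes_b=2):
        super().__init__()
        self.num_features = num_features
        self.classifier_a = nn.Linear(num_features, num_classes_a)
        self.classifier_b = nn.Linear(num_features, num_classes_b)

    def encode(self, sentences):
        # Crude bag-of-characters counts, used only to keep the sketch self-contained
        feats = torch.zeros(len(sentences), self.num_features)
        for i, sentence in enumerate(sentences):
            for ch in sentence:
                feats[i, ord(ch) % self.num_features] += 1.0
        return feats

    def forward(self, sentences):
        emb = self.encode(sentences)
        return {'task_a': self.classifier_a(emb), 'task_b': self.classifier_b(emb)}

# Synthetic batches shaped as (sentences, labels_a, labels_b)
torch.manual_seed(0)
sentences = ["good movie", "bad movie", "stock prices fell", "the team won"]
labels_a = torch.tensor([0, 0, 1, 2])  # made-up topic labels
labels_b = torch.tensor([1, 0, 0, 1])  # made-up sentiment labels
data_loader = [(sentences, labels_a, labels_b)] * 5

model = ToyMultiTask()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

history = []
model.train()
for step, (batch_sentences, y_a, y_b) in enumerate(data_loader):
    outputs = model(batch_sentences)
    loss_a = F.cross_entropy(outputs['task_a'], y_a)
    loss_b = F.cross_entropy(outputs['task_b'], y_b)
    loss = loss_a + loss_b

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Per-task accuracy on this batch, computed from the pre-update logits
    acc_a = (outputs['task_a'].argmax(dim=-1) == y_a).float().mean().item()
    acc_b = (outputs['task_b'].argmax(dim=-1) == y_b).float().mean().item()
    history.append({'step': step, 'loss': loss.item(), 'acc_a': acc_a, 'acc_b': acc_b})
    print(history[-1])
```

Tracking accuracy per task, not just the summed loss, makes it easier to spot one task improving at the expense of the other. If that happens, weighting the two losses unequally is one option to consider.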