

# Fetch ML Assessment: Task 1  
## Sentence Transformer Implementation Using Pre-trained BERT


> **Background:**  
> In this assessment, our goal is to implement a sentence transformer model that encodes input sentences into fixed-length embeddings. We will use a pre-trained BERT model (specifically `bert-base-uncased`) and extract the [CLS] token's output as the sentence embedding. This implementation lays the foundation for later extensions into multi-task learning.  
>  
> **Design Priorities:**  
> - **Clarity & Modularity:** The code is split into clearly defined steps, and each function/class is documented.  
> - **Efficiency:** Reusing a pre-trained model gives us general-purpose sentence representations without training an encoder from scratch.  
> - **Reproducibility:** A fixed random seed is set to help reproduce results.

---



In [2]:
# Step 1: Environment Setup

# Install the transformers library if it is not already installed using pip.
# Uncomment the following line if installation is required.
# !pip install transformers

# Import required libraries.
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

# Set the random seed for reproducibility.
torch.manual_seed(42)

<torch._C.Generator at 0x796ace016070>

### **Code Explanation**

> In this step, we import the necessary libraries:  
> - **PyTorch**: Used to build our neural network model and handle tensor operations.  
> - **Hugging Face Transformers**: Provides the pre-trained BERT model and its tokenizer.  
>  
> We also set a random seed (`torch.manual_seed(42)`) so that any randomly initialized or stochastic operations behave the same across runs. The pre-trained weights themselves load identically every time.
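
`torch.manual_seed` covers PyTorch's default generator only. A minimal sketch of a broader seeding helper follows, assuming one also wants Python's built-in `random` module and any CUDA generators seeded. The name `set_seed` is a hypothetical helper and is not part of the original notebook.

```python
import random

import torch


def set_seed(seed: int = 42) -> None:
    """Seed the random number generators a notebook like this may touch.

    Hypothetical helper for illustration; not part of the original code.
    """
    random.seed(seed)                      # Python's built-in RNG
    torch.manual_seed(seed)                # PyTorch default generator
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)   # generators on every visible GPU


# Drawing after re-seeding reproduces the same values.
set_seed(42)
first = torch.rand(3)
set_seed(42)
second = torch.rand(3)
print(torch.equal(first, second))  # True
```

Seeding makes random draws repeatable, but it does not by itself guarantee bit-identical results across different hardware or library versions.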


In [3]:
# Step 2: Implementing the Sentence Transformer Class


# Define the SentenceTransformer class which uses a pre-trained BERT model.
class SentenceTransformer(nn.Module):
    """
    A sentence transformer model that converts input sentences into
    fixed-length embeddings.
    Utilizes a pre-trained BERT model and extracts the [CLS] token as the
    sentence embedding.
    """

    def __init__(self, pretrained_model_name='bert-base-uncased'):
        """
        Initializes the SentenceTransformer.

        Args:
            pretrained_model_name (str): Name of the pre-trained BERT
            model to load.
        """
        super().__init__()
        # Load the pre-trained BERT model and its corresponding tokenizer.
        self.transformer = BertModel.from_pretrained(pretrained_model_name)
        self.tokenizer = BertTokenizer.from_pretrained(pretrained_model_name)

    def forward(self, input_sentences):
        """
        Forward pass for the model. Converts input sentences into embeddings.

        Args:
            input_sentences (list): List of input sentence strings.

        Returns:
            torch.Tensor: Tensor of shape (batch_size, hidden_size)
            representing the sentence embeddings.
        """
        # Tokenize the input sentences with padding and truncation.
        encoded_input = self.tokenizer(
            input_sentences,
            padding=True,
            truncation=True,
            return_tensors="pt"
        )

        # Move the token tensors to the same device as the model parameters.
        device = next(self.parameters()).device
        input_ids = encoded_input['input_ids'].to(device)
        attention_mask = encoded_input['attention_mask'].to(device)

        # Obtain outputs from the BERT transformer.
        outputs = self.transformer(input_ids, attention_mask=attention_mask)

        # Extract the embedding of the [CLS] token (the first token)
        # as the sentence representation.
        cls_embeddings = outputs.last_hidden_state[:, 0, :]
        return cls_embeddings

### **Code Explanation**

> This cell defines the `SentenceTransformer` class.  
>  
> **Key Points:**  
> - **Initialization:** The constructor loads the pre-trained BERT model (`bert-base-uncased`) along with its tokenizer.  
> - **Forward Method:**  
>   - Tokenizes input sentences (using padding and truncation) to create input IDs and attention masks.  
>   - Moves these tokenized inputs to the same device (CPU or GPU) as the model.  
>   - Passes the tokens through the BERT model and extracts the [CLS] token's embedding as the fixed-length representation for each sentence.  
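
To make the padding, attention-mask, and [CLS]-slicing steps concrete without downloading a model, here is a small sketch on hand-made data. The helper `pad_batch` is hypothetical and only mimics what `padding=True` produces. The ids 0, 101, and 102 are the `[PAD]`, `[CLS]`, and `[SEP]` ids in `bert-base-uncased`; every other id, and the random hidden states, are made-up placeholders.

```python
import torch

PAD_ID = 0  # id of [PAD] in bert-base-uncased


def pad_batch(sequences, pad_id=PAD_ID):
    """Right-pad lists of token ids to equal length and build the attention mask.

    Hypothetical stand-in for the tokenizer's `padding=True` behaviour.
    """
    max_len = max(len(seq) for seq in sequences)
    input_ids = [seq + [pad_id] * (max_len - len(seq)) for seq in sequences]
    attention_mask = [[1] * len(seq) + [0] * (max_len - len(seq)) for seq in sequences]
    return torch.tensor(input_ids), torch.tensor(attention_mask)


# 101 = [CLS], 102 = [SEP]; the ids in between are arbitrary placeholders.
batch = [[101, 2001, 2002, 2003, 102], [101, 2004, 102]]
input_ids, attention_mask = pad_batch(batch)
print(input_ids.shape)          # torch.Size([2, 5])
print(attention_mask.tolist())  # [[1, 1, 1, 1, 1], [1, 1, 1, 0, 0]]

# Stand-in for `outputs.last_hidden_state`: (batch_size, seq_len, hidden_size).
hidden = torch.randn(2, 5, 8)
cls_embeddings = hidden[:, 0, :]  # position 0 holds [CLS] for every sentence
print(cls_embeddings.shape)     # torch.Size([2, 8])
```

Because [CLS] always sits at position 0, the slice yields one vector per sentence no matter how much padding the rest of the row contains.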


In [5]:
# Step 3: Testing the Sentence Transformer

# Set up the device for computation: use GPU if available, otherwise CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Create an instance of the SentenceTransformer and move it to the
# selected device.
model = SentenceTransformer().to(device)

# Define a list of sample sentences to test the model.
sample_sentences = [
    "The quick brown fox jumps over the lazy dog.",
    "Transformers are very effective for NLP tasks."
]

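# Generate embeddings for the sample sentences. This is inference only, so
# switch to evaluation mode and skip gradient tracking.
model.eval()
with torch.no_grad():
    embeddings = model(sample_sentences)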

# Print the shape of the generated embeddings.
print("Embeddings shape:", embeddings.shape)

# Print the actual embedding values.
print("Embeddings:")
print(embeddings.detach().cpu().numpy())

Embeddings shape: torch.Size([2, 768])
Embeddings:
[[-0.36080578  0.22707793 -0.3029696  ... -0.42242897  0.69488996
   0.62128514]
 [-0.41661704 -0.15820043  0.14587061 ... -0.62564856 -0.02686349
   0.44702652]]



> **Code Explanation**  
> In this testing step:  
> - **Device Selection:** We set the computation device to GPU if available; otherwise, the CPU is used.  
> - **Model Instantiation:** An instance of `SentenceTransformer` is created and moved to the selected device.  
> - **Testing:** Two sample sentences are processed through the model to generate embeddings.  
> - **Output Verification:** The printed shape is `torch.Size([2, 768])`: one row per input sentence and one column per hidden unit (`bert-base-uncased` has a hidden size of 768).  
>  
> This confirms that the model returns one fixed-length embedding per sentence, regardless of sentence length.
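
A shape check says little about the values themselves. The sketch below shows two further sanity checks one might run: that the embeddings are finite, and how similar the sentences are under cosine similarity. The function names `check_embeddings` and `pairwise_cosine` are hypothetical, and the tensor is random stand-in data rather than real BERT output.

```python
import torch
import torch.nn.functional as F


def check_embeddings(embeddings: torch.Tensor, batch_size: int, hidden_size: int = 768) -> None:
    """Raise AssertionError if the shape is wrong or any value is NaN/inf."""
    assert embeddings.shape == (batch_size, hidden_size), embeddings.shape
    assert torch.isfinite(embeddings).all(), "embeddings contain NaN or inf"


def pairwise_cosine(embeddings: torch.Tensor) -> torch.Tensor:
    """Return the (batch, batch) matrix of cosine similarities between rows."""
    normed = F.normalize(embeddings, p=2, dim=1)  # unit-length rows
    return normed @ normed.T


# Stand-in for `model(sample_sentences)`; random values, NOT real BERT output.
fake_embeddings = torch.randn(2, 768)
check_embeddings(fake_embeddings, batch_size=2)
similarity = pairwise_cosine(fake_embeddings)
print(similarity.shape)  # torch.Size([2, 2])
```

With real embeddings, the off-diagonal entries of `similarity` give a quick feel for how close the model places the two sentences, while the diagonal should be 1 up to floating-point error.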


**Architectural Decisions Outside the Transformer Backbone**

Outside the transformer backbone, we made three key decisions:  
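1. We chose the [CLS] token for pooling because BERT's pre-training uses that position for its sentence-level (next-sentence-prediction) objective, which makes it the conventional single-vector summary of the input. Masked mean pooling over the token embeddings is a common alternative.  
2. We intentionally added no extra layers (such as a projection head) on top of BERT, which keeps the model simple and avoids new parameters that could overfit during later fine-tuning.  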
3. We implemented device consistency by moving all input tensors to the same device as the model.
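
For comparison with the [CLS] choice in decision 1, here is a minimal sketch of masked mean pooling, a commonly used alternative that averages token embeddings while ignoring padding. The functions `cls_pool` and `mean_pool` are illustrative names, not part of the notebook, and the tensors are toy values chosen so the arithmetic can be checked by hand.

```python
import torch


def cls_pool(last_hidden_state: torch.Tensor) -> torch.Tensor:
    """Take the first token's vector, as the notebook's forward pass does."""
    return last_hidden_state[:, 0, :]


def mean_pool(last_hidden_state: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token vectors, counting only positions where the mask is 1."""
    mask = attention_mask.unsqueeze(-1).to(last_hidden_state.dtype)  # (B, T, 1)
    summed = (last_hidden_state * mask).sum(dim=1)                   # (B, H)
    counts = mask.sum(dim=1).clamp(min=1e-9)                         # avoid divide-by-zero
    return summed / counts


# Toy hidden states: batch of 2, sequence length 3, hidden size 2.
hidden = torch.tensor([
    [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]],  # third position is padding
    [[5.0, 6.0], [7.0, 8.0], [9.0, 10.0]],
])
mask = torch.tensor([[1, 1, 0], [1, 1, 1]])

print(cls_pool(hidden))         # rows: [1., 2.] and [5., 6.]
print(mean_pool(hidden, mask))  # rows: [2., 3.] and [7., 8.]
```

The padded position (the row of 100s) has no effect on the mean, which is the reason for multiplying by the attention mask before summing.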