<a href="https://www.kaggle.com/code/aisuko/sentence-embeddings-with-transformers?scriptVersionId=168679785" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

# Overview

In [Semantic Search](https://www.kaggle.com/code/aisuko/semantic-search), we used `sentence-transformers` to compute the embeddings of our sentences. In this notebook, we compute the same kind of sentence embeddings with Transformers alone, without installing `sentence-transformers`.

In [1]:
%%capture
!pip install transformers==4.35.2

In [2]:
import os

os.environ['MODEL_NAME']='sentence-transformers/all-mpnet-base-v2'

# Loading the tokenizer

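We load the tokenizer first. Padding and truncation are not configured at load time; they are requested later, when the tokenizer is called on the input.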

In [3]:
from transformers import AutoTokenizer

tokenizer=AutoTokenizer.from_pretrained(os.getenv('MODEL_NAME'))

# Loading the model

In [4]:
from transformers import AutoModel

model=AutoModel.from_pretrained(os.getenv('MODEL_NAME'))
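# Note: max_seq_length is a sentence-transformers setting and has no effect on a
# plain Transformers model; the 200-token limit is enforced by the tokenizer (max_length=200)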
model.to('cuda')

MPNetModel(
  (embeddings): MPNetEmbeddings(
    (word_embeddings): Embedding(30527, 768, padding_idx=1)
    (position_embeddings): Embedding(514, 768, padding_idx=1)
    (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (encoder): MPNetEncoder(
    (layer): ModuleList(
      (0-11): 12 x MPNetLayer(
        (attention): MPNetAttention(
          (attn): MPNetSelfAttention(
            (q): Linear(in_features=768, out_features=768, bias=True)
            (k): Linear(in_features=768, out_features=768, bias=True)
            (v): Linear(in_features=768, out_features=768, bias=True)
            (o): Linear(in_features=768, out_features=768, bias=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
          (dropout): Dropout(p=0.1, inplace=False)
        )
        (intermediate): MPNetIntermediate(
          (dense): Linear(in_

# Converting the input to tokens

In [5]:
# sentences=[
#     "Which sports venue is a historic landmark in Melbourne?",
#     "What are some of the events hosted in Melbourne throughout the year?"
# ]

sentence = "The weather is so cold in Melbourne today."

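# With a single sentence, padding=True adds no padding; truncation caps the input at 200 tokens
encoded_input=tokenizer(sentence, padding=True, truncation=True, max_length=200, return_tensors='pt')
# Move the input tensors to the same device as the model
encoded_input=encoded_input.to('cuda')
encoded_input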

{'input_ids': tensor([[   0, 2000, 4637, 2007, 2065, 3151, 2003, 4944, 2655, 1016,    2]],
       device='cuda:0'), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]], device='cuda:0')}
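
The attention mask above is all ones because a single sentence needs no padding. To show what changes when sentences of different lengths are batched together, here is a minimal, dependency-free sketch of padding to the longest sequence. The helper `pad_batch` is hypothetical, and so is the shorter second sequence. It illustrates the idea only and is not the tokenizer's actual implementation. The pad id of 1 is an assumption taken from `padding_idx=1` in the model printout.

```python
# Assumption: pad id 1, matching padding_idx=1 in the model printout
PAD_ID = 1

def pad_batch(batch, pad_id=PAD_ID):
    """Hypothetical helper: right-pad every sequence to the longest one in the batch."""
    max_len = max(len(ids) for ids in batch)
    input_ids = [ids + [pad_id] * (max_len - len(ids)) for ids in batch]
    # 1 marks a real token, 0 marks padding
    attention_mask = [[1] * len(ids) + [0] * (max_len - len(ids)) for ids in batch]
    return {"input_ids": input_ids, "attention_mask": attention_mask}

batch = [
    # Token ids of the sentence encoded above
    [0, 2000, 4637, 2007, 2065, 3151, 2003, 4944, 2655, 1016, 2],
    # Hypothetical shorter sequence, used only to force padding
    [0, 2000, 4637, 2],
]

padded = pad_batch(batch)
for ids, mask in zip(padded["input_ids"], padded["attention_mask"]):
    print(ids, mask)
```

The zeros in the second mask are the positions that pooling has to ignore when averaging token embeddings.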

# Computing token embeddings

In [6]:
import torch

# Inference only, so gradient tracking is disabled
with torch.no_grad():
    model_output=model(**encoded_input)

## Mean Pooling

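We average the token embeddings into a single sentence vector. The attention mask is used as a weight so that padding tokens do not contribute to the mean.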

In [7]:
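def mean_pooling(model_output, attention_mask):
    # The first element of model_output contains all token embeddings
    token_embeddings=model_output[0]
    # Expand the mask from (batch, seq_len) to (batch, seq_len, hidden_size)
    input_mask_expanded=attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    # Sum only the embeddings of real (non-padding) tokens
    sum_embeddings=torch.sum(token_embeddings*input_mask_expanded,1)
    # Number of real tokens per sentence; clamp to avoid division by zero
    sum_mask=torch.clamp(input_mask_expanded.sum(1), min=1e-9)
    return sum_embeddings/sum_mask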

sentence_embeddings=mean_pooling(model_output,encoded_input['attention_mask'])

print(sentence_embeddings.shape)

torch.Size([1, 768])
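
With a single unpadded sentence, the mask makes no visible difference, so here is a small toy check of why it matters once padding is present. This is a sketch on made-up numbers, not model output. The function `masked_mean` is a hypothetical stand-in that takes the token embeddings directly, and the third token plays the role of a padding position with arbitrary values.

```python
import torch

def masked_mean(token_embeddings, attention_mask):
    # (batch, seq_len) -> (batch, seq_len, 1), broadcast over the hidden dimension
    mask = attention_mask.unsqueeze(-1).to(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(dim=1)
    # Number of real tokens per sentence; clamp to avoid division by zero
    counts = mask.sum(dim=1).clamp(min=1e-9)
    return summed / counts

# One "sentence" with three token positions and a hidden size of 2 (made-up values)
token_embeddings = torch.tensor([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]])
# The last position is padding and should be ignored
attention_mask = torch.tensor([[1, 1, 0]])

masked = masked_mean(token_embeddings, attention_mask)
naive = token_embeddings.mean(dim=1)

print(masked)  # average over the two real tokens only
print(naive)   # average over all three positions, skewed by the padding values
```

The masked result averages only the two real tokens, while the plain mean is pulled toward the padding values.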
