# Sentence Transformer Demo

In this notebook, we will:

- Load a Sentence Transformer model (`distilbert-base-uncased` with mean pooling)
- Tokenize and encode three sample sentences
- Print the shape of the resulting embeddings and the first five values of each

import torch
from transformers import AutoTokenizer
from src.model import SentenceTransformerModel

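# Load the tokenizer and the project's embedding model from the same checkpoint
model_name = 'distilbert-base-uncased'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = SentenceTransformerModel(model_name=model_name, pooling='mean')
model.eval()  # inference mode: disables dropout so outputs are deterministic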

sentences = [
    "Hello, how are you?",
    "I love exploring deep learning techniques.",
    "Sentence transformers encode text into embeddings."
]

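# Tokenize all sentences as one batch: pad to the longest sentence,
# truncate anything beyond 32 tokens, and return PyTorch tensors
encoded = tokenizer(
    sentences,
    padding=True,
    truncation=True,
    max_length=32,
    return_tensors='pt'
)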

# Forward pass without gradient tracking; the attention mask marks padding tokens
with torch.no_grad():
    embeddings = model(encoded['input_ids'], encoded['attention_mask'])

print('Embeddings shape:', embeddings.shape)
for i, sentence in enumerate(sentences):
    print(f"\nSentence: {sentence}")
    print("Embedding (first 5 values):", embeddings[i][:5])

Embeddings shape: torch.Size([3, 256])

Sentence: Hello, how are you?
Embedding (first 5 values): tensor([ 0.0420, -0.1092, -0.0064, -0.0521, -0.0463])

Sentence: I love exploring deep learning techniques.
Embedding (first 5 values): tensor([-0.0459, -0.0822,  0.0737,  0.0821, -0.0132])

Sentence: Sentence transformers encode text into embeddings.
Embedding (first 5 values): tensor([-0.2098, -0.1235,  0.1296,  0.1418,  0.0641])
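
The model above is created with `pooling='mean'` and receives the attention mask alongside the token IDs. The following is a minimal sketch of what masked mean pooling usually looks like, not the actual code in `src.model`: the helper name `masked_mean_pool` and the toy tensors are assumptions made for illustration. Note also that the printed embedding size of 256 differs from DistilBERT's hidden size of 768, so `SentenceTransformerModel` presumably does more than pool (for example, a projection layer), which this sketch does not attempt to reproduce.

```python
import torch


def masked_mean_pool(token_embeddings, attention_mask):
    """Average token embeddings over real (non-padding) positions.

    Hypothetical helper for illustration; not taken from src.model.
    token_embeddings: (batch, seq_len, hidden)
    attention_mask:   (batch, seq_len), 1 for real tokens and 0 for padding
    """
    # Add a trailing dimension so the mask broadcasts over the hidden size
    mask = attention_mask.unsqueeze(-1).to(token_embeddings.dtype)
    # Zero out padding positions, then sum over the sequence dimension
    summed = (token_embeddings * mask).sum(dim=1)
    # Number of real tokens per sequence; clamp guards against division by zero
    counts = mask.sum(dim=1).clamp(min=1e-9)
    return summed / counts


# Toy batch: 2 sequences, 3 token positions, hidden size 2
token_embeddings = torch.tensor([
    [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]],  # last position is padding
    [[2.0, 2.0], [4.0, 4.0], [6.0, 6.0]],
])
attention_mask = torch.tensor([[1, 1, 0], [1, 1, 1]])

pooled = masked_mean_pool(token_embeddings, attention_mask)
print(pooled.shape)
```

Because the padded position in the first sequence is masked out, its large values do not affect the average. This is why the attention mask is passed to the model together with `input_ids` when sentences of different lengths are batched with `padding=True`.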
