# SBERT API Notebook (Sentence-Transformers)
Purpose: Demonstrate the native sentence-transformers API (SBERT) in a small, generic way.

- This notebook is intentionally not tied to the project dataset.
- The project pipeline and results belong in `SBERT_Example.ipynb` / `SBERT_Example.md`.

## 1) Imports

We use `SentenceTransformer` to generate embeddings and the `util` helpers for cosine similarity and semantic search.



In [3]:
%pip install -U tf-keras

Collecting tf-keras
  Using cached tf_keras-2.20.1-py3-none-any.whl.metadata (1.8 kB)
Using cached tf_keras-2.20.1-py3-none-any.whl (1.7 MB)
Installing collected packages: tf-keras
Successfully installed tf-keras-2.20.1
Note: you may need to restart the kernel to use updated packages.


In [4]:
import os
os.environ["TRANSFORMERS_NO_TF"] = "1"
os.environ["TRANSFORMERS_NO_FLAX"] = "1"

In [5]:
from sentence_transformers import SentenceTransformer, util
import numpy as np
import torch

## 2) Load a pretrained SBERT model
We load a small, widely used model: `all-MiniLM-L6-v2` (fast, 384-dimensional embeddings).

In [6]:
model_name = "sentence-transformers/all-MiniLM-L6-v2"
model = SentenceTransformer(model_name)
print("Loaded:", model_name)

Loaded: sentence-transformers/all-MiniLM-L6-v2


## 3) Encode text into embeddings
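The main API call is `model.encode(...)`. It takes a list of strings and returns one embedding per string; with `convert_to_numpy=True` the result is a NumPy array of shape `(n_sentences, embedding_dim)`.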


In [7]:
sentences = [
    "The company reported higher profits this quarter.",
    "Earnings increased due to strong sales.",
    "The firm announced layoffs and cost cutting.",
    "Markets were mostly unchanged today.",
]

emb = model.encode(sentences, batch_size=32, convert_to_numpy=True, show_progress_bar=False)
print("Embeddings shape:", emb.shape)
print("First 5 dims of sentence 0:", emb[0][:5])

Embeddings shape: (4, 384)
First 5 dims of sentence 0: [ 0.03116684 -0.00846325 -0.03795664 -0.00757171 -0.01198465]


## 4) Cosine similarity with `util.cos_sim`
We can compare two sentences by encoding them and computing cosine similarity.

In [8]:
a = model.encode(["profit increased"], convert_to_tensor=True)
b = model.encode(["earnings improved"], convert_to_tensor=True)
c = model.encode(["the company went bankrupt"], convert_to_tensor=True)

print("cos(profit increased, earnings improved) =", float(util.cos_sim(a, b)))
print("cos(profit increased, company went bankrupt) =", float(util.cos_sim(a, c)))

cos(profit increased, earnings improved) = 0.6514344811439514
cos(profit increased, company went bankrupt) = 0.3798919916152954
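
To make the score above less of a black box, here is a minimal sketch of the cosine formula itself, written in plain Python on made-up 3-dimensional vectors (not real embeddings). The function name `cosine_similarity` and the toy values are assumptions for illustration; `util.cos_sim` does the same kind of computation on tensors and returns a matrix of pairwise scores, which is why the cell above wraps the 1x1 result in `float(...)`.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

# Toy vectors, made up for illustration only.
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])  # parallel vectors
orthogonal = cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])      # perpendicular vectors
print(same_direction, orthogonal)
```

Parallel vectors score (approximately) 1 and perpendicular vectors score 0, regardless of vector length, which is why cosine similarity is a convenient measure for comparing embeddings.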


## 5) Semantic search (top-k similar items)
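Given a query and a corpus, `util.semantic_search` returns the most similar corpus entries. The result holds one list of hits per query; each hit is a dict with a `corpus_id` (index into the corpus) and a `score`.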

In [9]:
query = "strong quarterly results"
corpus = [
    "profits rose significantly",
    "losses widened this quarter",
    "markets were flat",
    "sales increased and margins improved",
]

query_emb = model.encode([query], convert_to_tensor=True)
corpus_emb = model.encode(corpus, convert_to_tensor=True)

hits = util.semantic_search(query_emb, corpus_emb, top_k=2)
for h in hits[0]:
    print("score=%.4f" % h["score"], "|", corpus[h["corpus_id"]])

score=0.4823 | profits rose significantly
score=0.4509 | losses widened this quarter
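
Conceptually, a top-k search like this scores every corpus vector against the query and keeps the best `k`. The sketch below illustrates that idea in plain Python on made-up 2-dimensional vectors; `top_k_search` and `_cosine` are hypothetical helper names, and this is not how the library implements the search internally. The hit format mirrors the `corpus_id` / `score` keys used in the cell above.

```python
import heapq
import math

def _cosine(u, v):
    """Cosine similarity of two non-zero, equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))

def top_k_search(query_vec, corpus_vecs, top_k=2):
    """Return the top_k corpus entries by cosine similarity, best first."""
    scored = (
        {"corpus_id": i, "score": _cosine(query_vec, vec)}
        for i, vec in enumerate(corpus_vecs)
    )
    return heapq.nlargest(top_k, scored, key=lambda hit: hit["score"])

# Toy vectors, made up for illustration only.
toy_corpus = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
toy_hits = top_k_search([1.0, 0.1], toy_corpus, top_k=2)
for hit in toy_hits:
    print("score=%.4f" % hit["score"], "| corpus_id =", hit["corpus_id"])
```

Note that the ranking reflects vector similarity only. As the real output above shows, a sentence with the opposite sentiment ("losses widened this quarter") can still rank highly because it shares the topic of the query.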


## 6) Optional wrapper pattern (non-core)
Some projects introduce thin wrapper functions around SentenceTransformer to reduce boilerplate (e.g., model loading, batch encoding).
These wrappers do not modify SBERT's behavior and are not required to use the API. They simply standardize repeated calls in larger codebases.
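
As a rough illustration of the pattern, the sketch below wraps model loading and batch encoding in one small class. Everything here is hypothetical: the names `Embedder`, `embed`, and the `loader` parameter are assumptions made for this example and are not part of sentence-transformers or of any project. The only library calls used are `SentenceTransformer(model_name)` and `model.encode(...)` with the same arguments as in section 3. The `loader` hook exists so the wrapper can be exercised with a stub, without downloading a model.

```python
DEFAULT_MODEL = "sentence-transformers/all-MiniLM-L6-v2"

def _load_sentence_transformer(model_name):
    # Imported lazily so that importing this module stays cheap.
    from sentence_transformers import SentenceTransformer
    return SentenceTransformer(model_name)

class Embedder:
    """Hypothetical thin wrapper: lazy model loading plus fixed encode defaults."""

    def __init__(self, model_name=DEFAULT_MODEL, batch_size=32,
                 loader=_load_sentence_transformer):
        self.model_name = model_name
        self.batch_size = batch_size
        self._loader = loader
        self._model = None

    @property
    def model(self):
        # Load the model on first use and reuse it afterwards.
        if self._model is None:
            self._model = self._loader(self.model_name)
        return self._model

    def embed(self, texts):
        """Encode an iterable of strings with the project-wide defaults."""
        return self.model.encode(
            list(texts),
            batch_size=self.batch_size,
            convert_to_numpy=True,
            show_progress_bar=False,
        )

class _StubModel:
    """Stand-in used only to exercise the wrapper without a real model."""
    def encode(self, sentences, **kwargs):
        return [[float(len(s))] for s in sentences]

stub_embedder = Embedder(loader=lambda name: _StubModel())
stub_vectors = stub_embedder.embed(["ab", "abcd"])
print(stub_vectors)
```

With the default loader, `Embedder().embed(sentences)` would be expected to behave like the `model.encode(...)` call in section 3; the wrapper only centralizes the arguments.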
