# 2 - Embeddings 

## 2.1 - Local SentenceTransformers

SentenceTransformers is a Python framework for state-of-the-art sentence, text, and image embeddings. It allows you to compute dense vector representations for sentences, paragraphs, and images that can be used for various tasks such as semantic search, clustering, and classification.

More info at: https://sbert.net
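
To make the semantic search use case concrete, here is a minimal sketch of ranking documents against a query by cosine similarity. It uses only the standard library and made-up 2-dimensional vectors in place of real embeddings; the helper names `normalize` and `search` are hypothetical and not part of SentenceTransformers.

```python
import math

def normalize(v):
    """Scale a vector to unit length so dot products equal cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def search(query_vec, corpus_vecs, top_k=2):
    """Return (score, index) for the top_k corpus vectors closest to the query."""
    q = normalize(query_vec)
    scored = [
        (sum(a * b for a, b in zip(q, normalize(doc))), idx)
        for idx, doc in enumerate(corpus_vecs)
    ]
    return sorted(scored, reverse=True)[:top_k]

# Made-up 2-dimensional vectors standing in for real embeddings
corpus = [[1.0, 0.0], [0.0, 1.0], [3.0, 4.0]]
hits = search([1.0, 0.2], corpus, top_k=2)

for score, idx in hits:
    print(f"document {idx}: score {score:.4f}")
```

With real embeddings the same ranking logic would apply, only with vectors of a few hundred dimensions produced by a model.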


In [2]:
%pip install sentence-transformers
from sentence_transformers import SentenceTransformer, util

# 1. Load a pretrained Sentence Transformer model
model = SentenceTransformer('all-MiniLM-L6-v2')

# The sentences to encode (French for: inventory management, demand forecasting, multimodal transport)
sentences = ['Gestion des stocks', 'Prévision de la demande', 'Transport multimodal']

# 2. Calculate embeddings by calling model.encode()
emb = model.encode(sentences, convert_to_tensor=True)

# 3. Print the embeddings and their similarity
print('Embeddings shape:', emb.shape)

# Cosine similarity between the first two sentences
print(f"Similarity [{sentences[0]}]-[{sentences[1]}]:", util.cos_sim(emb[0], emb[1]).item())

# Pairwise cosine similarities between all sentences
cosine_scores = util.cos_sim(emb, emb)
print('Similarity matrix:\n', cosine_scores)


Note: you may need to restart the kernel to use updated packages.
Embeddings shape: torch.Size([3, 384])
Similarity [Gestion des stocks]-[Prévision de la demande]: 0.36206135153770447
Similarity matrix:
 tensor([[1.0000, 0.3621, 0.0111],
        [0.3621, 1.0000, 0.1350],
        [0.0111, 0.1350, 1.0000]])
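
The pairwise scores above are cosine similarities: the dot product of two vectors divided by the product of their lengths. The sketch below reproduces that calculation with the standard library only, on made-up 3-dimensional vectors rather than real embeddings; `cosine_similarity` and `similarity_matrix` are illustrative helper names, not library functions.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def similarity_matrix(vectors):
    """Pairwise cosine similarities; entry [i][j] compares vectors i and j."""
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]

# Toy vectors (made up for illustration, not real embeddings)
toy = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 2.0]]
matrix = similarity_matrix(toy)

for row in matrix:
    print([round(x, 4) for x in row])
```

As in the notebook output, the matrix is symmetric and its diagonal is 1 (up to floating-point rounding), since every vector is maximally similar to itself.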


## 2.2 - Embeddings endpoint with LM Studio

LM Studio can run a local server that exposes an embeddings endpoint over HTTP. The example below defines a small class that calls this endpoint and mimics the `encode()` method of SentenceTransformer, so it can be swapped in for the local model. The server must be running, with an embedding model loaded, before the cells below are executed.

You can install LM Studio on your computer by following the instructions at: https://lmstudio.ai

In [3]:
import requests
import numpy as np

api_url="http://localhost:1234/v1/embeddings"

class LMSEmbeddingModel:
    """
    A lightweight stand-in for SentenceTransformer backed by an LM Studio server.
    Compatible with: model.encode(texts)
    """

    def __init__(self, model_name, api_url=api_url):
        self.model_name = model_name
        self.api_url = api_url

    def encode(self, texts, convert_to_tensor=False):
        """
        Encode a list of texts into embeddings.
        - texts: str or List[str]
        - convert_to_tensor: ignored; accepted only for compatibility.
          A NumPy array is always returned.
        """

        # A single text is wrapped in a list
        if isinstance(texts, str):
            texts = [texts]

        payload = {
            "model": self.model_name,
            "input": texts
        }

        # A timeout avoids hanging indefinitely if the server is not responding
        response = requests.post(self.api_url, json=payload, timeout=60)
        response.raise_for_status()
        data = response.json()

        embeddings = [item["embedding"] for item in data["data"]]
        embeddings = np.array(embeddings)

        return embeddings
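
The `encode` method above reads `data["data"]` and takes the `"embedding"` field of each item. The sketch below illustrates that parsing step on a hand-written response body, so it runs without a server. It assumes the endpoint follows the OpenAI-style embeddings schema, in which each item also carries an `"index"` field; that field, the shortened vectors, and the helper name `extract_embeddings` are assumptions made for illustration.

```python
import json

# Hypothetical response body, shaped like an OpenAI-style embeddings reply.
# The vectors are shortened, made-up values for illustration only.
sample_response = json.loads("""
{
  "object": "list",
  "data": [
    {"object": "embedding", "index": 1, "embedding": [0.4, 0.5, 0.6]},
    {"object": "embedding", "index": 0, "embedding": [0.1, 0.2, 0.3]}
  ],
  "model": "example-embedding-model"
}
""")

def extract_embeddings(response_body):
    """Return the embedding vectors ordered by their 'index' field."""
    items = sorted(response_body["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]

vectors = extract_embeddings(sample_response)
print(len(vectors), "vectors of dimension", len(vectors[0]))
```

Sorting by `"index"` is a defensive choice that keeps each vector aligned with its input text even if the items were returned out of order.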


In [4]:
from sentence_transformers import util

# 1. Load an embedding model from LM Studio
api_url="http://localhost:1234/v1/embeddings"
model = LMSEmbeddingModel("text-embedding-sentence-transformers_all-minilm-l12-v2", api_url=api_url)

sentences = ["Gestion des stocks", "Prévision de la demande", "Transport multimodal"]

# 2. Calculate embeddings by calling model.encode()
emb = model.encode(sentences, convert_to_tensor=True)

# 3. Print the embeddings and their similarity
print("Shape:", emb.shape)

print(f"Similarity [{sentences[0]}]-[{sentences[1]}]:", util.cos_sim(emb[0], emb[1]).item())

# Pairwise cosine similarities (util.cos_sim converts the NumPy arrays to tensors)
cosine_scores = util.cos_sim(emb, emb)
print("Similarity matrix:\n", cosine_scores)

Shape: (3, 384)
Similarity [Gestion des stocks]-[Prévision de la demande]: 0.2928846007211735
Similarity matrix:
 tensor([[ 1.0000,  0.2929, -0.0111],
        [ 0.2929,  1.0000,  0.0086],
        [-0.0111,  0.0086,  1.0000]], dtype=torch.float64)
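
A similarity matrix is easier to read once its off-diagonal entries are ranked. The following sketch does this for the rounded values printed above, using plain Python lists instead of a tensor; `ranked_pairs` is a hypothetical helper written for this example.

```python
sentences = ["Gestion des stocks", "Prévision de la demande", "Transport multimodal"]

# Similarity matrix copied from the LM Studio output above (rounded values)
scores = [
    [1.0000, 0.2929, -0.0111],
    [0.2929, 1.0000, 0.0086],
    [-0.0111, 0.0086, 1.0000],
]

def ranked_pairs(matrix):
    """List (score, i, j) for every pair i < j, highest score first."""
    pairs = [
        (matrix[i][j], i, j)
        for i in range(len(matrix))
        for j in range(i + 1, len(matrix))
    ]
    return sorted(pairs, reverse=True)

for score, i, j in ranked_pairs(scores):
    print(f"{score:+.4f}  {sentences[i]} <-> {sentences[j]}")
```

The closest pair is the same as in section 2.1 (stock management and demand forecasting), but the scores differ. The two sections use different model identifiers (`all-MiniLM-L6-v2` locally, an `all-minilm-l12-v2` variant through LM Studio), so the values are not expected to match.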
