# Embeddings

Embeddings are neural representations of text in which words, phrases, or entire documents are mapped into a dense vector space. These vectors capture semantic meaning, so texts with similar meanings have similar vector representations.
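"Similar" is most commonly measured with cosine similarity, the metric computed by scikit-learn's `cosine_similarity` later in this notebook. For two embedding vectors $\mathbf{u}, \mathbf{v} \in \mathbb{R}^d$ it compares direction only, ranging from $-1$ (opposite) through $0$ (orthogonal) to $1$ (same direction):

$$
\cos(\mathbf{u}, \mathbf{v}) = \frac{\mathbf{u} \cdot \mathbf{v}}{\lVert \mathbf{u} \rVert \, \lVert \mathbf{v} \rVert} = \frac{\sum_{i=1}^{d} u_i v_i}{\sqrt{\sum_{i=1}^{d} u_i^2} \, \sqrt{\sum_{i=1}^{d} v_i^2}}
$$

For `intfloat/multilingual-e5-base`, $d = 768$.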

✅ For RAG-based chatbots and intent classification, Hugging Face helps with:

- Access to powerful pretrained embedding models
- Easy use of custom fine-tuned LLMs
- Building retrievers and rerankers
- Hosting your own model or dataset

🧠 What Are Embeddings?
Embeddings are vector representations of text. They capture meaning so that similar texts have similar vectors, making them great for:

- Search (RAG)
- Intent classification
- Clustering
- Semantic matching
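Intent classification can reuse the same similarity measure. One simple approach, assumed here (nearest centroid; a classifier trained on top of the embeddings is another option), is to embed a few labeled example utterances per intent, average them into one centroid per intent, and assign a query to the most similar centroid. The 3-dimensional vectors, the intent names `visa_inquiry` and `flight_booking`, and the `min_score` fallback threshold are all hypothetical; in practice the vectors would come from `model.encode`.

```python
import math

# Hypothetical toy 3-dimensional vectors for illustration only; real ones would
# come from model.encode(...) and have 768 dimensions for multilingual-e5-base.
LABELED_EXAMPLES = {
    "visa_inquiry": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "flight_booking": [[0.1, 0.9, 0.1], [0.0, 0.8, 0.2]],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def centroid(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def classify_intent(query_vec, examples, min_score=0.5):
    """Return (intent, score) for the closest centroid, or ("unknown", score)
    when even the best match falls below the hypothetical min_score threshold."""
    centroids = {intent: centroid(vecs) for intent, vecs in examples.items()}
    scores = {intent: cosine(query_vec, c) for intent, c in centroids.items()}
    best = max(scores, key=scores.get)
    if scores[best] < min_score:
        return "unknown", scores[best]
    return best, scores[best]


print(classify_intent([0.7, 0.3, 0.1], LABELED_EXAMPLES)[0])  # visa_inquiry
print(classify_intent([0.0, 0.0, 1.0], LABELED_EXAMPLES)[0])  # unknown
```

The fallback lets a chatbot hand off queries that match no known intent. Any real threshold has to be tuned on held-out data for the chosen model, since score ranges differ between models.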

```python
from sentence_transformers import SentenceTransformer

# Load a multilingual embedding model from the Hugging Face Model Hub
model = SentenceTransformer("intfloat/multilingual-e5-base")
```

```python
# Encode a sentence into an embedding vector
embedding = model.encode("Japan visa requirements for Filipino travelers")
print(embedding.shape)  # Typically (768,) or (384,) depending on model
```

```
(768,)
```
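Every input, whatever its length, becomes one vector of the same fixed size. Vectors are often scaled to unit length before indexing, because for unit vectors the plain dot product equals cosine similarity and the norms no longer need to be computed at search time. A minimal pure-Python sketch of that identity on toy vectors; `l2_normalize`, `dot`, and `cosine` are illustrative helpers, not library functions. `SentenceTransformer.encode` can do the scaling itself via `normalize_embeddings=True`.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the norms."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


# Toy vectors for illustration; real embeddings from this model have 768 dimensions.
a = [3.0, 4.0, 0.0]
b = [1.0, 2.0, 2.0]

a_unit, b_unit = l2_normalize(a), l2_normalize(b)
print(round(math.sqrt(dot(a_unit, a_unit)), 6))  # 1.0 (unit length)
print(round(dot(a_unit, b_unit), 6), round(cosine(a, b), 6))  # 0.733333 0.733333
```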


```python
# Maximum number of tokens per input; longer inputs are truncated
model.get_max_seq_length()
```

```
512
```
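Because sentence-transformers truncates inputs longer than this limit, the tail of a long document would not influence its embedding. For RAG, long documents are therefore commonly split into overlapping chunks that are embedded separately. A minimal sketch, assuming whitespace-separated words as a rough stand-in for the model's subword tokens (real token counts are usually higher, so leave headroom below 512); `chunk_words`, `max_words`, and `overlap` are hypothetical names and defaults.

```python
def chunk_words(text, max_words=200, overlap=50):
    """Split text into overlapping windows of at most max_words words.

    Words (whitespace-split) only approximate tokens here; a real pipeline
    would count tokens with the model's own tokenizer.
    """
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be in [0, max_words)")
    words = text.split()
    step = max_words - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reached the end of the text
    return chunks


doc = " ".join(f"w{i}" for i in range(10))
for chunk in chunk_words(doc, max_words=4, overlap=1):
    print(chunk)
# w0 w1 w2 w3
# w3 w4 w5 w6
# w6 w7 w8 w9
```

Each chunk would then be encoded separately (for example `model.encode(chunk_words(long_text))`) and stored with a pointer back to its source document. The overlap keeps a sentence that straddles a boundary fully present in at least one chunk.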

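```python
from sklearn.metrics.pairwise import cosine_similarity

# Embed one query and three candidate documents
query = model.encode("How to apply for visa?")
documents = model.encode(["Japan visa application", "Flight booking info", "Visa denied reason"])

# Result has shape (1, 3): one row per query, one column per document
scores = cosine_similarity([query], documents)
print(scores)  # Higher score = more similar to the query
```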

```
[[0.8928432 0.8220366 0.8586167]]
```
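Retrieval for RAG then keeps the highest-scoring documents from the query's row of scores. A minimal pure-Python sketch that reuses the three documents and the scores printed above; `top_k` is a hypothetical helper (with NumPy, `np.argsort(scores[0])[::-1][:k]` yields the same order of indices).

```python
def top_k(documents, scores, k=2):
    """Return the k (document, score) pairs with the highest scores, best first."""
    ranked = sorted(zip(documents, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


docs = ["Japan visa application", "Flight booking info", "Visa denied reason"]
# Scores copied from the cosine_similarity output above (one query, three documents).
row = [0.8928432, 0.8220366, 0.8586167]

for doc, score in top_k(docs, row, k=2):
    print(f"{score:.3f}  {doc}")
# 0.893  Japan visa application
# 0.859  Visa denied reason
```

Even the unrelated "Flight booking info" scores 0.82 here. The E5 model card notes that its cosine scores tend to cluster in a high range (roughly 0.7 to 1.0), so the relative order is more informative than the absolute value. The same card recommends prefixing inputs with `"query: "` and `"passage: "`, which the cell above does not do.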
