### Embedding Techniques Using Hugging Face

In [6]:
import os
from dotenv import load_dotenv

# Load variables from .env file
load_dotenv()

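# Access the Hugging Face token (the public model used below downloads without it;
# a token is needed for gated or private models)
hf_token = os.getenv("HUGGINGFACEHUB_API_TOKEN")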


In [7]:
from langchain_community.embeddings import HuggingFaceEmbeddings

# Initialize embedding model (runs locally using sentence-transformers)
embedding_model = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)




### Texts to embed

In [9]:
# Texts to embed
texts = [
    "LangChain is a framework for building LLM-powered apps.",
    "Mount Everest is the tallest mountain."
]

# Generate embeddings
vectors = embedding_model.embed_documents(texts)

# Preview
for i, vec in enumerate(vectors):
    print(f"\n--- Embedding {i+1} ---")
    print(f"Length: {len(vec)}")
    print(f"Preview: {vec[:5]}...")

len(vectors)


--- Embedding 1 ---
Length: 384
Preview: [-0.029253385961055756, -0.03157414123415947, 0.008384971879422665, -0.09047165513038635, 0.020469117909669876]...

--- Embedding 2 ---
Length: 384
Preview: [-0.023237472400069237, 0.06839925050735474, -0.05747893825173378, 0.010138127021491528, -0.061703842133283615]...


2
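Each sentence becomes a 384-dimensional vector. A common way to compare two embeddings is cosine similarity, which is close to 1 when the vectors point in the same direction and 0 when they are orthogonal. Below is a minimal standard-library sketch; `cosine_similarity` is a hypothetical helper, and the 3-dimensional inputs are made up for illustration, not real embeddings.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors (illustrative only, not real embeddings)
print(cosine_similarity([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]))  # → 1.0 (same direction)
print(cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # → 0.0 (orthogonal)
```

In this notebook, `cosine_similarity(vectors[0], vectors[1])` would score the two example sentences against each other.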

### FAISS
FAISS stands for Facebook AI Similarity Search.
It is a high-performance library developed by Meta (Facebook AI Research) to:
- Store high-dimensional vectors (such as text embeddings)
- Efficiently search for the most similar ones, even across millions of vectors

LangChain wraps FAISS as a vector store, which makes it possible to search documents by meaning rather than only by matching keywords. It's a key building block for apps that combine LLMs with retrieval.
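At its simplest, "search for the most similar vectors" means exact nearest-neighbour search: compare the query with every stored vector and keep the closest k. FAISS's flat index (`IndexFlatL2`) does this exhaustively and reports squared L2 distances, which rank results the same way; its other index types trade some accuracy for speed at larger scale. The pure-Python sketch below, with a hypothetical `knn_search` helper and toy 2-D vectors, only illustrates the idea and is far slower than FAISS.

```python
import heapq
import math


def knn_search(index: list[list[float]], query: list[float], k: int) -> list[tuple[int, float]]:
    """Exact k-nearest-neighbour search by Euclidean (L2) distance.

    Compares the query with every stored vector (brute force, like a flat index)
    and returns (position, distance) pairs sorted from closest to farthest.
    """
    distances = (
        (i, math.dist(vec, query))  # math.dist: Euclidean distance (Python 3.8+)
        for i, vec in enumerate(index)
    )
    return heapq.nsmallest(k, distances, key=lambda pair: pair[1])


# Toy stored vectors (illustrative only)
stored = [
    [0.0, 0.0],
    [3.0, 4.0],
    [9.0, 12.0],
]
print(knn_search(stored, [3.0, 4.0], k=2))  # → [(1, 0.0), (0, 5.0)]
```

Exhaustive search is exact but its cost grows linearly with the number of stored vectors, which is why large collections usually move to approximate indexes.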

In [12]:
from langchain_community.vectorstores import FAISS
from langchain_core.documents import Document

# Convert texts to LangChain Documents
docs = [Document(page_content=text) for text in texts]



In [13]:
# Build a FAISS index from the documents (each one is embedded with embedding_model)
vector_store = FAISS.from_documents(docs, embedding_model)

In [14]:
# Perform a similarity search (the query is embedded with the same model)
query = "What is the tallest mountain?"
results = vector_store.similarity_search(query, k=1)

# Display the result
print("\nTop Match:")
print(results[0].page_content)


Top Match:
Mount Everest is the tallest mountain.


### How FAISS Works in LangChain
🔤 Input text is turned into embeddings by a model (e.g. Hugging Face, OpenAI)

💾 Those embeddings are stored in a FAISS index

🔍 The query is embedded with the same model, and FAISS returns the stored vectors closest to it
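The three steps can be mimicked end to end in plain Python. This is a sketch under loud assumptions: `toy_embed` is a hypothetical word-count "embedding" over a hand-picked vocabulary (a real model such as all-MiniLM-L6-v2 captures meaning, not just shared words), a plain list stands in for the FAISS index, and results are ranked by cosine similarity.

```python
import math
import re

# Hypothetical stand-in for an embedding model: word counts over a fixed vocabulary.
VOCAB = ["langchain", "framework", "llm", "apps", "mount", "everest", "tallest", "mountain"]


def toy_embed(text: str) -> list[float]:
    words = re.findall(r"[a-z]+", text.lower())
    return [float(words.count(term)) for term in VOCAB]


def cosine(a: list[float], b: list[float]) -> float:
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


# 🔤 + 💾: embed each text and keep (text, vector) pairs; FAISS would hold the vectors
corpus = [
    "LangChain is a framework for building LLM-powered apps.",
    "Mount Everest is the tallest mountain.",
]
store = [(text, toy_embed(text)) for text in corpus]


# 🔍: embed the query with the same function and return the k closest texts
def search(query: str, k: int = 1) -> list[str]:
    q = toy_embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


print(search("What is the tallest mountain?"))  # → ['Mount Everest is the tallest mountain.']
```

The key design point carries over to the real pipeline: documents and queries must be embedded by the same model, otherwise their vectors are not comparable.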

### Real-World Example
Imagine you build a chatbot that answers questions over company documents:

1. Split your documents into chunks.
2. Embed each chunk and save the vectors in FAISS.
3. When the user asks, "What's our refund policy?", LangChain:
   - embeds the query,
   - uses FAISS to find the most relevant document chunks, and
   - feeds those chunks to the LLM, which generates an answer grounded in them.
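The chunking and context-assembly steps might look like the sketch below. LangChain ships its own text splitters (for example `RecursiveCharacterTextSplitter`, configured with `chunk_size` and `chunk_overlap`); `split_into_chunks` here is a simpler, hypothetical fixed-window splitter, and `build_prompt` is a hypothetical helper that formats retrieved chunks for the LLM. Retrieval and the LLM call itself are omitted.

```python
def split_into_chunks(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Cut text into fixed-size character windows; neighbouring windows share `overlap` chars."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop once a window reaches the end, so every chunk adds text the previous one lacked.
    starts = range(0, max(len(text) - overlap, 1), step)
    return [text[start:start + chunk_size] for start in starts]


def build_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Number the retrieved chunks and combine them with the question into one prompt."""
    context = "\n\n".join(f"[{i + 1}] {chunk}" for i, chunk in enumerate(retrieved_chunks))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


print(split_into_chunks("abcdefghij", chunk_size=4, overlap=2))
# → ['abcd', 'cdef', 'efgh', 'ghij']
```

Overlapping windows reduce the chance that a sentence answering the question is cut in half at a chunk boundary.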

### Additional reading
https://python.langchain.com/docs/integrations/vectorstores/faiss/