### Text Embedding Search with Vector Store and Retriever

This program converts words into text embeddings, numeric vectors that capture meaning. It stores the embeddings in a vector store and uses a vector store retriever to find the stored words closest in meaning to a query. This demonstrates how embedding-based search compares text by meaning rather than by exact wording.

- Text Embeddings: Words like "car" and "bicycle" are converted into numeric vectors by an embedding model.
- Vector Store: The vectors are indexed and saved so they can be searched later.
- Vector Store Retriever: Embeds a query like "automobile" and returns the stored words whose vectors are closest to it.
- Purpose: Shows how embeddings and vector search connect related words.
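To make "closest matches" concrete, here is a minimal sketch of ranking words by cosine similarity, one common way to compare embeddings. The three-dimensional vectors, `toy_vectors`, and `toy_query` are made up for illustration and are not output from the model used below; the FAISS index in this notebook may also rank by a different distance measure.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical 3-dimensional "embeddings", invented for this example.
toy_vectors = {
    "car": [0.9, 0.1, 0.0],
    "bicycle": [0.6, 0.7, 0.1],
    "airplane": [0.2, 0.1, 0.9],
}
toy_query = [1.0, 0.0, 0.0]  # stand-in for the embedding of "automobile"

# Rank stored words from most to least similar to the query.
ranked = sorted(
    toy_vectors,
    key=lambda word: cosine_similarity(toy_query, toy_vectors[word]),
    reverse=True,
)
print(ranked)
```

Words whose vectors point in nearly the same direction as the query come first in the ranking, which is the idea the retriever below applies to real embeddings.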

In [3]:
# Install the library needed to turn text into numbers (embeddings) for the program.
# !pip install sentence-transformers

In [None]:
# FAISS (Facebook AI Similarity Search) is a library for efficient similarity search and clustering of dense vectors.
# Install the FAISS library optimized for CPU usage
# !pip install faiss-cpu

In [1]:
# Suppress a specific warning when using Hugging Face models in environments that don't support symlinks (like on Windows).
import os
os.environ["HF_HUB_DISABLE_SYMLINKS_WARNING"] = "1"

In [4]:
# Import necessary libraries
from langchain.vectorstores import FAISS  # To create and manage a vector store
from langchain.schema import Document  # For creating Document objects
from langchain.embeddings import HuggingFaceEmbeddings  # Use HuggingFace for embeddings

# Step 1: Create a text embedding model using HuggingFaceEmbeddings
embedding_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

# Sample text data
texts = ["car", "vehicle", "train", "bicycle", "airplane"]  # Words to embed and compare

# Step 2: Convert text to Document objects
documents = [Document(page_content=text) for text in texts]  # Wrap texts in Document objects

# Step 3: Create a vector store using HuggingFaceEmbeddings
vector_store = FAISS.from_documents(documents, embedding_model)  # Create vector store with embeddings

# Step 4: Define the query and embed it
query = "automobile"  # Input to find related terms
query_embedding = embedding_model.embed_query(query)  # Query vector, shown for illustration; the retriever embeds the query itself

# Step 5: Retrieve relevant documents with `invoke`
retriever = vector_store.as_retriever()  # Initialize retriever from the vector store
results = retriever.invoke(query)  # invoke replaces the deprecated get_relevant_documents

# Step 6: Display results
print(f"Query: {query}")
print("Top matches:")
for result in results:
    print(result.page_content)  # Print the matching document content

Query: automobile
Top matches:
car
vehicle
bicycle
airplane
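
The run above lists four of the five stored words. A retriever of this kind returns a fixed number of nearest neighbours rather than filtering by a similarity threshold, and LangChain's similarity search is understood to default to k=4. The sketch below mimics that behaviour with a hypothetical `top_k` helper and made-up two-dimensional vectors; it does not call FAISS or LangChain, and the numbers were chosen only so the ordering resembles the output above.

```python
import math

def top_k(query_vec, store, k=4):
    """Return the k stored words closest to query_vec by Euclidean distance."""
    by_distance = sorted(store, key=lambda word: math.dist(query_vec, store[word]))
    return by_distance[:k]

# Made-up 2-D vectors for illustration; real embeddings have hundreds of dimensions.
toy_store = {
    "car": [1.0, 0.1],
    "vehicle": [0.9, 0.3],
    "train": [0.1, 1.0],
    "bicycle": [0.7, 0.5],
    "airplane": [0.5, 0.8],
}
toy_query = [1.0, 0.0]  # stand-in for the embedding of "automobile"

print(top_k(toy_query, toy_store))       # four nearest words; the farthest one is dropped
print(top_k(toy_query, toy_store, k=2))  # only the two nearest words
```

In the notebook itself, the number of results should be adjustable when creating the retriever, for example `vector_store.as_retriever(search_kwargs={"k": 2})`.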
