# Vector Embeddings with OpenAI - Beginner's Guide

## What are Vector Embeddings?
Think of embeddings as a way to convert words and sentences into numbers that computers can understand. Just like how we might describe a movie with ratings (comedy: 8/10, action: 3/10), embeddings describe text with many numbers.
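
To make the movie analogy concrete, here is a minimal sketch with hand-made three-number "embeddings". The movie titles, the ratings, and the `cosine_similarity` helper are all made up for illustration; real embeddings come from a model and have far more numbers.

```python
import math

# Hypothetical hand-made "embeddings": [comedy, action, romance] ratings out of 10.
movies = {
    "Buddy Cop Comedy": [8, 3, 1],
    "Slapstick Road Trip": [9, 2, 2],
    "Explosive Heist": [1, 9, 1],
}

def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

comedy_vs_comedy = cosine_similarity(movies["Buddy Cop Comedy"], movies["Slapstick Road Trip"])
comedy_vs_action = cosine_similarity(movies["Buddy Cop Comedy"], movies["Explosive Heist"])

print(f"Comedy vs. comedy: {comedy_vs_comedy:.3f}")
print(f"Comedy vs. action: {comedy_vs_action:.3f}")
```

The two comedies score higher with each other than the comedy does with the action movie, which is the same idea embeddings apply to text.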

## Why do we need them?
- **Find similar content**: Discover documents that talk about similar topics
- **Smart search**: Search by meaning, not just matching exact words
- **Organize information**: Group related content automatically

## What you'll learn in this tutorial:
1. How to convert text into numbers (embeddings)
2. How to store these numbers efficiently
3. How to find similar documents
4. How to build a simple Q&A system that can answer questions from your documents

## Setup and Installation

First, we need to import the tools we'll use. Think of this like getting all your ingredients ready before cooking.

In [1]:
# Install required packages (uncomment if needed)
# !pip install langchain langchain-openai langchain-community
# !pip install chromadb faiss-cpu
# !pip install python-dotenv

from langchain_openai import AzureOpenAIEmbeddings, AzureChatOpenAI
from langchain_community.vectorstores import Chroma, FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.documents import Document
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
import os
from dotenv import load_dotenv

load_dotenv()

True

## 1. Creating Your First Embedding

**What's happening here?**
We're going to take a sentence and convert it into a list of numbers. Each sentence gets its own unique set of numbers, kind of like a fingerprint!

**Why 3072 numbers?**
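The embedding model deployed for this tutorial describes each sentence with 3072 numbers (its "dimension"). That size is consistent with OpenAI's `text-embedding-3-large`; other embedding models use different sizes. More numbers allow a more detailed description, at the cost of more storage.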

In [2]:
# Step 1: Set up the embedding generator
# This connects to Azure OpenAI to convert text into numbers
embeddings = AzureOpenAIEmbeddings(
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
    api_key=os.getenv("AZURE_OPENAI_KEY"),
    api_version=os.getenv("AZURE_OPENAI_API_VERSION"),
    azure_deployment=os.getenv("AZURE_OPENAI_EMBEDDING_DEPLOYMENT_NAME")  # The AI model that creates embeddings
)

# Step 2: Convert one sentence into numbers
text = "LangChain is a framework for developing applications powered by language models."
embedding_vector = embeddings.embed_query(text)

# Step 3: See the results
print(f"Text: {text}")
print(f"Embedding dimension: {len(embedding_vector)}")  # How many numbers describe this sentence
print(f"First 10 values: {embedding_vector[:10]}")  # Just showing the first 10 numbers

Text: LangChain is a framework for developing applications powered by language models.
Embedding dimension: 3072
First 10 values: [-0.04118821769952774, -0.011433425359427929, -0.03338002413511276, 0.0036600905004888773, -0.03455125540494919, -0.01826559379696846, -0.011649545282125473, 0.02858356386423111, -0.022155746817588806, 0.009781155735254288]


## 2. Converting Multiple Sentences

Now let's convert several sentences at once. This is useful when you have many documents to process.

In [3]:
# Create sample documents
texts = [
    "Python is a high-level programming language known for its simplicity.",
    "Machine learning is a subset of artificial intelligence.",
    "LangChain provides tools for building LLM applications.",
    "Vector databases store embeddings for efficient similarity search.",
    "Azure OpenAI Service provides access to powerful language models."
]

# Generate embeddings for multiple documents
document_embeddings = embeddings.embed_documents(texts)

print(f"Number of documents: {len(document_embeddings)}")
print(f"Embedding dimension: {len(document_embeddings[0])}")
print(f"\nFirst document: {texts[0]}")
print(f"Its embedding (first 5 values): {document_embeddings[0][:5]}")

Number of documents: 5
Embedding dimension: 3072

First document: Python is a high-level programming language known for its simplicity.
Its embedding (first 5 values): [-0.031239548698067665, -0.02715788409113884, -0.012222188524901867, 0.02540208399295807, -0.0032550697214901447]


## 3. Storing Embeddings - Meet FAISS

**What is FAISS?**
FAISS is like a super-organized filing cabinet for your embeddings. It stores them in a way that makes finding similar documents really fast.

**Why use it?**
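Comparing your search with every single document one by one in plain Python gets slow as the collection grows. FAISS is built to do these vector comparisons very quickly, and for large collections it also offers approximate indexes that avoid checking every document.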
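
For intuition, the simplest possible "vector store" is just a collection you scan from start to finish. The sketch below is that naive baseline, not how FAISS is implemented; the tiny 2-D vectors and the `brute_force_search` helper are hypothetical.

```python
import math

# Hypothetical 2-D "embeddings" keyed by document text.
store = {
    "cats and kittens": [0.9, 0.1],
    "dogs and puppies": [0.8, 0.3],
    "stock market news": [0.1, 0.9],
}

def brute_force_search(query_vector, store, k=2):
    """Compare the query against every stored vector and return the k closest texts."""
    scored = [(math.dist(query_vector, vec), text) for text, vec in store.items()]
    scored.sort()  # smallest distance first
    return [text for _, text in scored[:k]]

nearest = brute_force_search([0.9, 0.15], store, k=2)
print(nearest)
```

The work here grows with every document you add, which is why a dedicated library becomes useful once the collection is large.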

In [4]:
# Step 1: Create a list of sentences (our mini-document collection)
documents = [
    "The quick brown fox jumps over the lazy dog.",
    "Machine learning algorithms learn patterns from data.",
    "Python is widely used in data science and AI.",
    "Natural language processing helps computers understand human language.",
    "Deep learning uses neural networks with multiple layers.",
    "Cloud computing provides on-demand computing resources.",
    "Azure is Microsoft's cloud platform for building and deploying applications."
]

# Step 2: Convert all sentences to embeddings and store them in FAISS
# FAISS will organize them for fast searching
vectorstore_faiss = FAISS.from_texts(documents, embeddings)

# Step 3: Confirm it worked
print("FAISS vector store created successfully!")
print(f"Number of documents in store: {vectorstore_faiss.index.ntotal}")

FAISS vector store created successfully!
Number of documents in store: 7


## 4. Finding Similar Documents

**The Magic Moment!**
Now we can search for documents by meaning, not just exact words. 

For example: If you search "AI learning", it will find documents about "machine learning" even though the exact words don't match!

In [5]:
# Step 1: Write your search question
query = "Tell me about artificial intelligence and learning from data"

# Step 2: Find the 3 most similar documents
# k=3 means "give me the top 3 results"
results = vectorstore_faiss.similarity_search(query, k=3)

# Step 3: Show the results
print(f"Query: {query}\n")
print("Top 3 most similar documents:")
for i, doc in enumerate(results, 1):
    print(f"\n{i}. {doc.page_content}")

Query: Tell me about artificial intelligence and learning from data

Top 3 most similar documents:

1. Machine learning algorithms learn patterns from data.

2. Python is widely used in data science and AI.

3. Deep learning uses neural networks with multiple layers.


## 5. Understanding Similarity Scores

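**What are these scores?**
With the default FAISS settings used here, each score is a distance between the query and the document:
- Lower score = more similar (closer match)
- Higher score = less similar (further apart)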

Think of it like distance: The closer two things are, the smaller the distance between them!
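
Here is a minimal sketch of the distance idea, using plain Python and made-up 2-D vectors instead of real embeddings. It assumes a Euclidean-style distance; other vector stores or settings may report a different kind of score (for example, a similarity where higher is better), so check which one you are looking at.

```python
import math

# Hypothetical 2-D vectors standing in for real embeddings.
query_vec = [1.0, 0.0]
close_doc = [0.9, 0.1]
far_doc = [0.0, 1.0]

close_score = math.dist(query_vec, close_doc)  # straight-line distance
far_score = math.dist(query_vec, far_doc)

print(f"Close document score: {close_score:.4f}")
print(f"Far document score:   {far_score:.4f}")
```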

In [6]:
# Similarity search with scores (lower score = more similar)
results_with_scores = vectorstore_faiss.similarity_search_with_score(query, k=3)

print(f"Query: {query}\n")
print("Results with similarity scores:")
for i, (doc, score) in enumerate(results_with_scores, 1):
    print(f"\n{i}. Score: {score:.4f}")
    print(f"   Document: {doc.page_content}")

Query: Tell me about artificial intelligence and learning from data

Results with similarity scores:

1. Score: 0.9004
   Document: Machine learning algorithms learn patterns from data.

2. Score: 1.1342
   Document: Python is widely used in data science and AI.

3. Score: 1.2108
   Document: Deep learning uses neural networks with multiple layers.


## 6. Another Storage Option - Chroma

**What's different about Chroma?**
Chroma is another way to store embeddings. In this section we use it to show how to attach extra information (metadata) to each document.

**Example**: You can tag documents as "beginner", "intermediate", or "advanced" and then search only within that category!

In [7]:
# Create Chroma vector store with metadata
docs_with_metadata = [
    Document(page_content="Python is great for data science.", metadata={"topic": "programming", "level": "beginner"}),
    Document(page_content="Machine learning requires understanding of statistics.", metadata={"topic": "ML", "level": "intermediate"}),
    Document(page_content="Deep neural networks power modern AI.", metadata={"topic": "AI", "level": "advanced"}),
    Document(page_content="LangChain simplifies LLM application development.", metadata={"topic": "tools", "level": "intermediate"}),
]

# Create Chroma vector store
vectorstore_chroma = Chroma.from_documents(
    documents=docs_with_metadata,
    embedding=embeddings,
    collection_name="langchain_tutorial"
)

print("Chroma vector store created successfully!")
print(f"Number of documents: {vectorstore_chroma._collection.count()}")

Chroma vector store created successfully!
Number of documents: 4


## 7. Filtering Your Search

**Why is this useful?**
Imagine you have 1000 documents. You can search only within documents tagged as "intermediate level" instead of searching through everything!

It's like filtering products by price range when shopping online.
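
The idea can be sketched in a few lines of plain Python: keep only the records whose tags match, then rank what is left. This is an illustration under made-up data, not Chroma's internals; the records, vectors, and the `filtered_search` helper are hypothetical.

```python
import math

# Hypothetical records: (text, made-up 2-D vector, metadata).
records = [
    ("Intro to loops", [0.9, 0.1], {"level": "beginner"}),
    ("Gradient descent", [0.2, 0.8], {"level": "intermediate"}),
    ("Decorators in depth", [0.8, 0.3], {"level": "intermediate"}),
]

def filtered_search(query_vector, records, metadata_filter, k=1):
    """Keep records whose metadata matches every filter key, then rank by distance."""
    candidates = [
        (math.dist(query_vector, vec), text)
        for text, vec, meta in records
        if all(meta.get(key) == value for key, value in metadata_filter.items())
    ]
    candidates.sort()  # smallest distance first
    return [text for _, text in candidates[:k]]

hits = filtered_search([0.9, 0.1], records, {"level": "intermediate"})
print(hits)
```

Note that the closest record overall ("Intro to loops") is skipped because its tag does not match the filter.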

In [8]:
# Search with metadata filtering
query = "artificial intelligence"

# Filter for intermediate level documents only
results = vectorstore_chroma.similarity_search(
    query,
    k=2,
    filter={"level": "intermediate"}
)

print(f"Query: {query}")
print("Filtered results (intermediate level only):\n")
for i, doc in enumerate(results, 1):
    print(f"{i}. {doc.page_content}")
    print(f"   Metadata: {doc.metadata}\n")

Query: artificial intelligence
Filtered results (intermediate level only):

1. Machine learning requires understanding of statistics.
   Metadata: {'level': 'intermediate', 'topic': 'ML'}

2. LangChain simplifies LLM application development.
   Metadata: {'topic': 'tools', 'level': 'intermediate'}



## 8. Breaking Long Documents into Pieces

**The Problem**: Documents can be very long, but embeddings work best with shorter pieces of text.

**The Solution**: Split long documents into smaller "chunks" that overlap a little bit.

**Why overlap?**: So we don't lose important information that might be split between chunks!
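
To see what overlap means, here is a deliberately simple sliding-window splitter. It is a sketch, not the splitter used in this tutorial: `chunk_with_overlap` is a hypothetical helper that cuts at fixed character positions, while `RecursiveCharacterTextSplitter` prefers natural break points such as paragraphs and line breaks, so its chunks can look different and may show little or no overlap.

```python
def chunk_with_overlap(text, chunk_size, chunk_overlap):
    """Split text into fixed-size character windows that share `chunk_overlap` characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window moves forward
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]

sample = "abcdefghijklmnopqrstuvwxyz"
pieces = chunk_with_overlap(sample, chunk_size=10, chunk_overlap=3)
for piece in pieces:
    print(piece)
```

Each piece starts with the last three characters of the piece before it, so text near a cut point appears in both neighbours.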

In [9]:
# Sample long document
long_text = """
LangChain is a framework for developing applications powered by language models. 
It enables applications that are context-aware and can reason about the provided context.

The framework consists of several key components. First, there are LLM wrappers that 
provide a common interface for different language models. Second, prompt templates help 
structure inputs to language models effectively.

LangChain also provides chains, which are sequences of calls to LLMs or other utilities. 
Agents use LLMs to decide which actions to take and in what order. Memory components 
allow chains and agents to remember previous interactions.

Vector stores and embeddings enable semantic search capabilities. This is crucial for 
building retrieval augmented generation (RAG) systems that can access and use external 
knowledge sources to answer questions more accurately.
"""

# Create text splitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=200,
    chunk_overlap=50,
    length_function=len,
    separators=["\n\n", "\n", " ", ""]
)

# Split the text
chunks = text_splitter.split_text(long_text)

print(f"Original text length: {len(long_text)} characters")
print(f"Number of chunks: {len(chunks)}")
print("\nChunks:")
for i, chunk in enumerate(chunks, 1):
    print(f"\nChunk {i} ({len(chunk)} chars):")
    print(chunk)

Original text length: 865 characters
Number of chunks: 7

Chunks:

Chunk 1 (171 chars):
LangChain is a framework for developing applications powered by language models. 
It enables applications that are context-aware and can reason about the provided context.

Chunk 2 (173 chars):
The framework consists of several key components. First, there are LLM wrappers that 
provide a common interface for different language models. Second, prompt templates help

Chunk 3 (48 chars):
structure inputs to language models effectively.

Chunk 4 (174 chars):
LangChain also provides chains, which are sequences of calls to LLMs or other utilities. 
Agents use LLMs to decide which actions to take and in what order. Memory components

Chunk 5 (58 chars):
allow chains and agents to remember previous interactions.

Chunk 6 (173 chars):
Vector stores and embeddings enable semantic search capabilities. This is crucial for 
building retrieval augmented generation (RAG) systems that can access and use external

Chunk 7 (54 chars):
knowledge sources to answer questions more accurately.



## 9. Building a Q&A System (RAG)

**What is RAG?**
RAG stands for "Retrieval Augmented Generation". That's a fancy way of saying:
1. **Retrieval**: Find relevant documents from your collection
2. **Generation**: Use AI to write an answer based on those documents

**Think of it like**: 
- You ask a question
- The system finds relevant information in your documents
- AI reads that information and gives you an answer

Let's create our knowledge base first!

In [10]:
# Create knowledge base
knowledge_base = [
    "LangChain was created by Harrison Chase in 2022.",
    "LangChain supports multiple LLM providers including OpenAI, Azure, and Anthropic.",
    "The main components of LangChain are: Models, Prompts, Chains, Agents, and Memory.",
    "Vector stores in LangChain help with semantic search and retrieval.",
    "LangChain Expression Language (LCEL) allows you to build chains using the pipe operator.",
    "Agents in LangChain can use tools to interact with external systems.",
    "Memory components allow chatbots to remember conversation history.",
]

# Create vector store from knowledge base
knowledge_vectorstore = FAISS.from_texts(knowledge_base, embeddings)

print("Knowledge base created successfully!")
print(f"Documents in knowledge base: {len(knowledge_base)}")

Knowledge base created successfully!
Documents in knowledge base: 7


## 10. Creating the Q&A Chain

**What's a "chain"?**
A chain is a series of steps that happen in order:
1. User asks a question
2. System finds 3 most relevant documents
3. AI reads those documents
4. AI writes an answer based only on those documents

**Why "only on those documents"?**
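This makes it much less likely that the AI makes up information, because the prompt tells it to use only what's in your documents. It is not a hard guarantee, so it's still worth checking answers against the sources.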
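
Before wiring up the real chain, the steps above can be sketched with plain functions. Everything here is a stand-in: `retrieve` ranks by shared words instead of embeddings, `build_prompt` is a hypothetical helper, the sample documents are made up, and no AI model is called. The sketch stops at the prompt that would be sent.

```python
def retrieve(question, documents, k=2):
    """Stand-in retriever: rank documents by how many question words they share."""
    question_words = set(question.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(question_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(question, context_docs):
    """Put the retrieved text and the question into one prompt string."""
    context = "\n\n".join(context_docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}\nAnswer:"

sample_docs = [
    "The library opens at 9 am on weekdays.",
    "Parking is free after 6 pm.",
    "The cafe sells coffee and sandwiches.",
]
sample_question = "When does the library open?"
top_docs = retrieve(sample_question, sample_docs, k=1)
prompt_text = build_prompt(sample_question, top_docs)
print(prompt_text)
```

The real chain follows the same shape, with the vector store doing the retrieval and the language model writing the answer from the prompt.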

In [11]:
# Step 1: Connect to the AI that will write answers
llm = AzureChatOpenAI(
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
    deployment_name=os.getenv("AZURE_OPENAI_DEPLOYMENT_NAME"),
    api_version=os.getenv("AZURE_OPENAI_API_VERSION"),
    api_key=os.getenv("AZURE_OPENAI_KEY"),
    temperature=0  # 0 = more focused answers, 1 = more creative answers
)

# Step 2: Create a template for how to ask questions
template = """Answer the question based only on the following context:
{context}

Question: {question}

Answer:"""
prompt = ChatPromptTemplate.from_template(template)

# Step 3: Set up the document finder
# It will search for the top 3 most relevant documents
retriever = knowledge_vectorstore.as_retriever(search_kwargs={"k": 3})

# Step 4: Create a helper function to format documents
def format_docs(docs):
    """Combine multiple documents into one text block"""
    return "\n\n".join(doc.page_content for doc in docs)

# Step 5: Build the complete Q&A chain
# This connects: search → format → ask AI → get answer
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

# Step 6: Try it out with a question!
question = "Who created LangChain and when?"
answer = rag_chain.invoke(question)

print(f"Question: {question}")
print(f"\nAnswer: {answer}")

# Step 7: Show which documents were used to answer
source_docs = retriever.invoke(question)
print(f"\nSource documents used:")
for i, doc in enumerate(source_docs, 1):
    print(f"{i}. {doc.page_content}")

Question: Who created LangChain and when?

Answer: LangChain was created by Harrison Chase in 2022.

Source documents used:
1. LangChain was created by Harrison Chase in 2022.
2. The main components of LangChain are: Models, Prompts, Chains, Agents, and Memory.
3. LangChain supports multiple LLM providers including OpenAI, Azure, and Anthropic.


## 11. Testing with More Questions

Let's try several different questions to see how well our Q&A system works!

In [12]:
# Test with multiple questions
questions = [
    "What are the main components of LangChain?",
    "What is LCEL?",
    "What LLM providers does LangChain support?"
]

for question in questions:
    answer = rag_chain.invoke(question)
    print(f"\nQ: {question}")
    print(f"A: {answer}")
    print("-" * 80)


Q: What are the main components of LangChain?
A: The main components of LangChain are Models, Prompts, Chains, Agents, and Memory.
--------------------------------------------------------------------------------

Q: What is LCEL?
A: LangChain Expression Language (LCEL) is a feature that allows you to build chains using the pipe operator.
--------------------------------------------------------------------------------

Q: What LLM providers does LangChain support?
A: LangChain supports multiple LLM providers including OpenAI, Azure, and Anthropic.
--------------------------------------------------------------------------------


## 12. Advanced: Getting Diverse Results (MMR)

**The Problem**: Sometimes similarity search returns very similar documents (almost duplicates).

**The Solution**: MMR (Maximal Marginal Relevance) finds documents that are:
- Relevant to your question
- Different from each other

**Example**: Instead of getting 3 documents all saying "Python is popular", you might get:
1. Python is popular
2. Python has many libraries
3. Python is used in AI
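
For the curious, the selection rule behind MMR can be sketched in plain Python. This is a simplified illustration with made-up 2-D vectors, not the library's implementation; `cosine`, `mmr_select`, and the `lambda_mult` weighting (0.5 = equal weight on relevance and diversity) are assumptions of this sketch.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def mmr_select(query_vec, doc_vecs, k=2, lambda_mult=0.5):
    """Greedily pick k indices, rewarding relevance and penalizing similarity to earlier picks."""
    selected = []
    remaining = list(range(len(doc_vecs)))
    while remaining and len(selected) < k:
        def mmr_score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            redundancy = max((cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical vectors: documents 0 and 1 are near-duplicates, document 2 is different.
doc_vecs = [[1.0, 0.0], [0.99, 0.05], [0.6, 0.8]]
query_vec = [0.9, 0.3]

plain_top2 = sorted(range(len(doc_vecs)), key=lambda i: cosine(query_vec, doc_vecs[i]), reverse=True)[:2]
picked = mmr_select(query_vec, doc_vecs, k=2)
print("Plain similarity picks:", plain_top2)
print("MMR picks:", picked)
```

Plain similarity returns both near-duplicates, while the MMR rule swaps the second one for the document that adds something new.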

See the difference below!

In [13]:
# Create retriever with MMR
retriever_mmr = knowledge_vectorstore.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 3, "fetch_k": 5}
)

# Compare standard similarity search vs MMR
query = "LangChain components and features"

print("=== Standard Similarity Search ===")
standard_results = knowledge_vectorstore.similarity_search(query, k=3)
for i, doc in enumerate(standard_results, 1):
    print(f"{i}. {doc.page_content}")

print("\n=== MMR Search (more diverse results) ===")
mmr_results = retriever_mmr.invoke(query)
for i, doc in enumerate(mmr_results, 1):
    print(f"{i}. {doc.page_content}")

=== Standard Similarity Search ===
1. LangChain supports multiple LLM providers including OpenAI, Azure, and Anthropic.
2. The main components of LangChain are: Models, Prompts, Chains, Agents, and Memory.
3. LangChain was created by Harrison Chase in 2022.

=== MMR Search (more diverse results) ===
1. LangChain supports multiple LLM providers including OpenAI, Azure, and Anthropic.
2. LangChain Expression Language (LCEL) allows you to build chains using the pipe operator.
3. The main components of LangChain are: Models, Prompts, Chains, Agents, and Memory.


## 13. Saving Your Work

**Why save?**
Creating embeddings takes time and costs money (API calls). By saving them, you can reuse them later without recreating everything!

**When to use this?**
- You have a large document collection
- You want to use the same documents multiple times
- You want to share your vector store with others
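
The underlying idea is just "compute once, write to disk, read back later". Here is a minimal sketch of that idea using a JSON file; the vectors are made up, the file name is arbitrary, and this is not the format FAISS uses when it saves an index.

```python
import json
import os
import tempfile

# Hypothetical precomputed vectors; in practice these would come from the embedding model.
vectors = {"hello world": [0.1, 0.2, 0.3], "goodbye": [0.4, 0.5, 0.6]}

# Write into a fresh temporary folder so the sketch does not touch your project files.
cache_path = os.path.join(tempfile.mkdtemp(), "embeddings_cache.json")

# Save once...
with open(cache_path, "w", encoding="utf-8") as f:
    json.dump(vectors, f)

# ...and load later instead of calling the embedding API again.
with open(cache_path, encoding="utf-8") as f:
    restored = json.load(f)

print(restored == vectors)
```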

In [14]:
# Step 1: Save the vector store to your computer
vectorstore_faiss.save_local("faiss_index")
print("FAISS index saved to 'faiss_index' directory")

# Step 2: Load it back (useful when you restart your program)
loaded_vectorstore = FAISS.load_local(
    "faiss_index", 
    embeddings,
    allow_dangerous_deserialization=True  # Loading uses pickle, so only enable this for index files you created or trust
)
print("FAISS index loaded successfully!")

# Step 3: Test that it still works
test_query = "cloud computing"
results = loaded_vectorstore.similarity_search(test_query, k=2)
print(f"\nTest query: {test_query}")
for i, doc in enumerate(results, 1):
    print(f"{i}. {doc.page_content}")

FAISS index saved to 'faiss_index' directory
FAISS index loaded successfully!

Test query: cloud computing
1. Cloud computing provides on-demand computing resources.
2. Azure is Microsoft's cloud platform for building and deploying applications.


## 🎉 Congratulations! You've Completed the Tutorial!

### What You Learned:

1. **Embeddings** = Converting text to numbers so computers can understand meaning
2. **Vector Stores** (FAISS & Chroma) = Smart storage for fast searching
3. **Similarity Search** = Finding documents by meaning, not just keywords
4. **Metadata** = Adding tags to filter your searches
5. **Text Splitting** = Breaking long documents into digestible pieces
6. **RAG System** = Building a Q&A bot that answers from your documents
7. **MMR** = Getting diverse, non-repetitive results
8. **Persistence** = Saving your work for later

### 🚀 What's Next?

**Easy Next Steps:**
- Try with your own text documents
- Experiment with different chunk sizes (try 100, 300, 500)
- Add more documents to your knowledge base

**Intermediate Challenges:**
- Build a Q&A system for a PDF document
- Create a chatbot that remembers conversation history
- Try different embedding models

**Advanced Ideas:**
- Build a search engine for your personal notes
- Create a recommendation system
- Combine multiple data sources

### 💡 Key Takeaway
You now know how to build AI applications that can understand and search through text by meaning, not just keywords. This is the foundation of modern AI search systems!

**Questions?** Review the earlier sections or try running the code again with different examples!