### 📖 Where We Are

**In the last notebook**, we built our first complete, end-to-end RAG system using **ChromaDB**. We learned how a vector store is the core of the retrieval process and how to build different types of RAG chains (simple, conversational, and LCEL-based).

**In this notebook**, we'll reinforce those concepts by building another RAG system, but this time with a different vector store: **FAISS (Facebook AI Similarity Search)**. The goal is to see how LangChain's abstractions make it easy to swap out components and to understand the unique characteristics of FAISS, which is renowned for its incredible speed and efficiency.

### 1. Building a RAG System with LangChain and FAISS

### What is FAISS?

**FAISS** is an open-source library developed by Facebook AI (now Meta AI) for efficient similarity search and clustering of dense vectors. It is a toolkit for building search indices rather than a database: FAISS itself stores and searches vectors, and LangChain's `FAISS` wrapper adds the document store around it.

**Analogy: A High-Speed In-Memory Index ⚡**

- If **ChromaDB** is like a self-contained, persistent local library on your computer, **FAISS** is like the high-speed, in-memory index at the core of a search engine.
- It's designed for raw search performance. You build the index in your application's memory and can then save it to disk to be reloaded later. This makes it very fast for applications where the entire index fits in RAM.

**Key Advantages:**
- **Extremely Fast**: Optimized C++ search routines, with approximate index types that scale to billions of vectors.
- **Memory Efficient**: Offers compressed index types (for example, product quantization) that reduce the memory footprint.
- **Scalable**: Can leverage GPU acceleration for even greater performance.
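
To make "similarity search" concrete, here is a minimal pure-Python sketch of what an exact (flat) L2 index does conceptually: compare the query against every stored vector and keep the `k` closest. The names `squared_l2` and `flat_l2_search` and the toy 2-D vectors are illustrative assumptions, not part of FAISS. FAISS performs this kind of scan in optimized native code and also provides approximate index types that avoid the full scan.

```python
import heapq


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def flat_l2_search(vectors, query, k):
    """Return the k (distance, index) pairs closest to `query`, nearest first."""
    scored = ((squared_l2(vec, query), i) for i, vec in enumerate(vectors))
    return heapq.nsmallest(k, scored)


# Toy "index" of four 2-D vectors (real embeddings have hundreds of dimensions).
toy_vectors = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 3.0)]
hits = flat_l2_search(toy_vectors, query=(0.9, 0.1), k=2)

for distance, index in hits:
    print(f"vector {index} at squared distance {distance:.2f}")
```

The cost of this scan grows linearly with the number of stored vectors, which is why larger deployments usually move to approximate or compressed indexes.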

In [1]:
# --- Load Libraries ---
import os
from dotenv import load_dotenv
import warnings
warnings.filterwarnings('ignore')

# LangChain core components for building chains and prompts.
from langchain_core.documents import Document
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
from langchain_core.messages import HumanMessage, AIMessage

# LangChain specific components for the RAG pipeline.
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_groq import ChatGroq
# This is our new vector store for this notebook.
from langchain_community.vectorstores import FAISS

# Load environment variables from a .env file.
load_dotenv()

True

In [2]:
# --- 1 & 2. Document Loading and Splitting ---
# We will manually create Document objects to focus on the FAISS-specific parts.
sample_documents = [
    Document(
        page_content="Artificial Intelligence (AI) is the simulation of human intelligence in machines.",
        metadata={"source": "AI Introduction", "topic": "AI"}
    ),
    Document(
        page_content="Machine Learning is a subset of AI that enables systems to learn from data.",
        metadata={"source": "ML Basics", "topic": "ML"}
    ),
    Document(
        page_content="Deep Learning is a subset of machine learning based on artificial neural networks.",
        metadata={"source": "Deep Learning", "topic": "DL"}
    ),
    Document(
        page_content="Natural Language Processing (NLP) is a branch of AI that helps computers understand human language.",
        metadata={"source": "NLP Overview", "topic": "NLP"}
    )
]

# Since our documents are already small, we can skip the splitting step for this example.
chunks = sample_documents
print(f"Using {len(chunks)} documents as chunks.")

Using 4 documents as chunks.


In [3]:
# --- 3. Embedding Generation ---
# We'll use the same trusted Hugging Face model as before.
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

### 4. Creating and Persisting a FAISS Vector Store

In [4]:
# --- 4. Vector Storage ---
# The `FAISS.from_documents` method is a convenient way to create an in-memory FAISS index.
# It takes our document chunks, uses the embedding model to create vectors, and stores them in the index.
vectorstore = FAISS.from_documents(documents=chunks, embedding=embeddings)
print(f"FAISS vector store created in memory with {vectorstore.index.ntotal} vectors.")

FAISS vector store created in memory with 4 vectors.


Since the FAISS index lives in memory, we need to explicitly save it to disk if we want to use it later without re-creating it. This is a key difference from ChromaDB, which writes to disk on its own once it is configured with a persist directory.

In [5]:
# The `save_local` method writes the FAISS index and the pickled document store to a folder.
vectorstore.save_local(folder_path="faiss_index")
print("Vector store saved to 'faiss_index' directory.")

Vector store saved to 'faiss_index' directory.
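
A common follow-up is to rebuild the index only when no saved copy exists. The sketch below is one way to check for a saved index using only the standard library. It assumes the LangChain wrapper's default file names (`index.faiss` and `index.pkl`), which may differ if you pass a custom `index_name`. The helper name `faiss_index_saved` is hypothetical.

```python
import tempfile
from pathlib import Path


def faiss_index_saved(folder_path, index_name="index"):
    """Return True if both files of a saved index are present in the folder.

    Assumes the default naming scheme: <index_name>.faiss and <index_name>.pkl.
    """
    folder = Path(folder_path)
    return all(
        (folder / f"{index_name}{ext}").is_file() for ext in (".faiss", ".pkl")
    )


# Demo against a throwaway directory with empty stand-in files.
with tempfile.TemporaryDirectory() as tmp:
    before = faiss_index_saved(tmp)
    (Path(tmp) / "index.faiss").touch()
    (Path(tmp) / "index.pkl").touch()
    after = faiss_index_saved(tmp)

print(before, after)  # → False True
```

In this notebook, you could call `FAISS.load_local(...)` when the check returns `True`, and otherwise call `FAISS.from_documents(...)` followed by `save_local(...)`.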


In [6]:
# To use the saved index, we load it back into memory.
loaded_vectorstore = FAISS.load_local(
    folder_path="faiss_index",
    embeddings=embeddings,
    # The documents are stored with pickle, and unpickling can execute arbitrary code,
    # so LangChain requires this explicit opt-in. Only load indexes you created or trust.
    allow_dangerous_deserialization=True
)
print(f"Loaded vector store with {loaded_vectorstore.index.ntotal} vectors.")

Loaded vector store with 4 vectors.
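
The reason behind the `allow_dangerous_deserialization` flag is easier to see with a small demonstration. The standard-library sketch below shows that loading a pickle can call a function chosen by whoever created the file. The `Payload` class is a made-up example, and the expression it evaluates is deliberately harmless.

```python
import pickle


class Payload:
    """An object that tells pickle to call a function when it is loaded."""

    def __reduce__(self):
        # Instead of storing the object's data, pickle records the instruction
        # "call eval('6 * 7')" and replays it at load time.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())
loaded = pickle.loads(blob)  # eval("6 * 7") runs here, during loading

print(loaded)  # → 42
```

A malicious file could substitute any callable and arguments, which is why a saved index from an untrusted source should never be loaded this way.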


### 5. Similarity Search with FAISS

In [7]:
query = "What is deep learning?"

# The `similarity_search` method works just like it did with ChromaDB.
results = loaded_vectorstore.similarity_search(query, k=2)
print(f"Query: {query}\n")
print("Top 2 similar chunks:")
for doc in results:
    print(f"- {doc.page_content} (Source: {doc.metadata['source']})")

Query: What is deep learning?

Top 2 similar chunks:
- Deep Learning is a subset of machine learning based on artificial neural networks. (Source: Deep Learning)
- Machine Learning is a subset of AI that enables systems to learn from data. (Source: ML Basics)


In [8]:
# `similarity_search_with_score` returns a distance, not a similarity. With the default index this is
# the (squared) L2 distance, so a smaller score means a closer match.
results_with_scores = loaded_vectorstore.similarity_search_with_score(query=query, k=2)
print("\nSimilarity search with scores:")
for doc, score in results_with_scores:
    print(f"- Score: {score:.4f}, Content: {doc.page_content}")


Similarity search with scores:
- Score: 0.5146, Content: Deep Learning is a subset of machine learning based on artificial neural networks.
- Score: 0.9469, Content: Machine Learning is a subset of AI that enables systems to learn from data.
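
Distance scores can be hard to read if you are used to cosine similarity. The sketch below converts one into the other under two assumptions that you should verify for your own setup: the embeddings are unit-normalized, and the reported score is the squared L2 distance. Under those assumptions the identity ‖a − b‖² = 2 − 2·cos(a, b) holds. The function name `cosine_from_squared_l2` is illustrative.

```python
import math


def cosine_from_squared_l2(squared_distance):
    """Cosine similarity of two unit vectors, given their squared L2 distance."""
    return 1.0 - squared_distance / 2.0


# Sanity check of the identity on two unit vectors that are 60 degrees apart.
a = (1.0, 0.0)
b = (math.cos(math.pi / 3), math.sin(math.pi / 3))
squared_distance = sum((x - y) ** 2 for x, y in zip(a, b))
cosine_direct = sum(x * y for x, y in zip(a, b))
print(math.isclose(cosine_from_squared_l2(squared_distance), cosine_direct))

# Applying the conversion to the two scores printed above.
for score in (0.5146, 0.9469):
    print(f"distance {score:.4f} -> cosine {cosine_from_squared_l2(score):.3f}")
```

If the embeddings are not normalized, the conversion does not apply and the scores should be compared only as relative distances.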


In [9]:
# The LangChain FAISS wrapper also supports metadata filtering. FAISS itself only stores vectors,
# so the wrapper runs the vector search first and then filters the candidates by metadata.
filter_dict = {"topic": "ML"}
filtered_results = loaded_vectorstore.similarity_search(
    query="Tell me about learning from data",
    k=1,
    filter=filter_dict
)
print("\nFiltered search results (topic=ML):")
print(filtered_results[0].page_content)


Filtered search results (topic=ML):
Machine Learning is a subset of AI that enables systems to learn from data.
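
Because the filter is applied to the candidates that the vector search has already returned, the size of that candidate pool matters. The pure-Python sketch below imitates this "search, then filter" behaviour on a toy ranking. The function `filtered_search` and its tuple-based documents are simplifications for illustration, not the wrapper's actual code.

```python
def filtered_search(ranked_docs, metadata_filter, k, fetch_k):
    """Post-filter a ranking.

    ranked_docs: list of (content, metadata) tuples, already sorted nearest-first.
    """
    candidates = ranked_docs[:fetch_k]  # step 1: plain vector search
    matches = [
        doc
        for doc in candidates
        if all(doc[1].get(key) == value for key, value in metadata_filter.items())
    ]  # step 2: keep only candidates whose metadata matches
    return matches[:k]  # step 3: return at most k results


ranked = [
    ("Deep Learning ...", {"topic": "DL"}),
    ("Machine Learning ...", {"topic": "ML"}),
    ("Artificial Intelligence ...", {"topic": "AI"}),
    ("Natural Language Processing ...", {"topic": "NLP"}),
]

wide = filtered_search(ranked, {"topic": "NLP"}, k=1, fetch_k=4)
narrow = filtered_search(ranked, {"topic": "NLP"}, k=1, fetch_k=2)
print(len(wide), len(narrow))  # → 1 0
```

With a small candidate pool the matching document never reaches the filter. In LangChain, the corresponding knob is the `fetch_k` argument of `similarity_search`, which is worth raising when a filter is selective.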


### 6. Building RAG Chains with FAISS and LCEL

In [10]:
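# Initialize our LLM, in this case, a fast model from Groq.
# load_dotenv() has already placed GROQ_API_KEY in the environment, where ChatGroq looks for it.
# Fail early with a clear message if the key is missing.
if not os.getenv("GROQ_API_KEY"):
    raise RuntimeError("GROQ_API_KEY is not set. Add it to your .env file.")
llm = ChatGroq(model="gemma2-9b-it")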

In [11]:
# Convert our loaded FAISS vector store into a retriever.
# The retriever is the standard LangChain interface for fetching documents.
retriever = loaded_vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 2} # Fetch the top 2 documents
)

#### Simple RAG Chain
This is the most basic RAG chain. It takes a question, retrieves context, and generates an answer. It is stateless.

In [12]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

# Define our prompt template. The chain below fills in {context} and {question}.
simple_prompt = ChatPromptTemplate.from_template("""Answer the question based only on the following context:
Context: {context}
Question: {question}
Answer:""")

# A helper function to format the retrieved documents.
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

# Build the RAG chain using the pipe operator.
simple_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | simple_prompt
    | llm
    | StrOutputParser()
)

print("--- Simple RAG Chain Test ---")
question = "What is the relationship between AI and Deep Learning?"
answer = simple_chain.invoke(question)
print(f"Q: {question}")
print(f"A: {answer}")

--- Simple RAG Chain Test ---
Q: What is the relationship between AI and Deep Learning?
A: Deep Learning is a subset of Artificial Intelligence. 
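
If the pipe syntax feels opaque, it may help to see the idea rebuilt in a few lines of plain Python. The sketch below is a deliberately simplified model of the pattern, not LangChain's implementation: `Step`, `fan_out`, and the fake retriever and LLM are all hypothetical stand-ins. It shows the two moves the chain relies on: a dict that runs several branches on the same input, and `|` that feeds one step's output into the next.

```python
class Step:
    """Wraps a function so that steps can be chained with `|`."""

    def __init__(self, fn):
        self.fn = fn

    def __or__(self, other):
        # `a | b` builds a new step that runs a, then passes its output to b.
        return Step(lambda x: other.invoke(self.invoke(x)))

    def invoke(self, x):
        return self.fn(x)


def fan_out(**branches):
    """Run every branch on the same input and collect the results in a dict."""
    return Step(lambda x: {name: step.invoke(x) for name, step in branches.items()})


# Stand-ins for the real components.
fake_retriever = Step(lambda q: ["Deep Learning is a subset of machine learning."])
format_step = Step(lambda docs: "\n\n".join(docs))
passthrough = Step(lambda x: x)
prompt_step = Step(lambda d: f"Context: {d['context']}\nQuestion: {d['question']}")
fake_llm = Step(lambda prompt: prompt.upper())  # "answers" by shouting the prompt

toy_chain = (
    fan_out(context=fake_retriever | format_step, question=passthrough)
    | prompt_step
    | fake_llm
)

result = toy_chain.invoke("What is deep learning?")
print(result)
```

The question string reaches both branches: one turns it into context through retrieval, and the other passes it through untouched, which mirrors the role of `RunnablePassthrough()` in the real chain.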



#### Streaming RAG Chain
LCEL makes streaming incredibly simple. By using the `.stream()` method instead of `.invoke()`, we can get the response back token by token as it's generated.

In [13]:
# The chain for streaming is identical to the simple chain.
streaming_rag_chain = simple_chain

print("--- Streaming RAG Chain Test ---")
question = "What is NLP?"
print(f"Q: {question}")
print("A: ", end="", flush=True)
# We use the .stream() method to get a token-by-token response.
for chunk in streaming_rag_chain.stream(question):
    print(chunk, end="", flush=True)
print()

--- Streaming RAG Chain Test ---
Q: What is NLP?
A: NLP is a branch of AI that helps computers understand human language.  
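
The difference between `.invoke()` and `.stream()` is essentially the difference between a function that returns a finished string and a generator that yields pieces as they become available. The sketch below illustrates this with a fake model that "generates" one word at a time. Both function names are made up for the example, and real models stream tokens rather than whole words.

```python
def fake_llm_stream(answer):
    """Yield an answer piece by piece, the way a streaming model would."""
    for word in answer.split(" "):
        yield word + " "


def fake_llm_invoke(answer):
    """Blocking variant: collect every piece, then return the full string."""
    return "".join(fake_llm_stream(answer))


text = "NLP helps computers understand human language."

# Streaming: each piece can be displayed as soon as it arrives.
pieces = []
for piece in fake_llm_stream(text):
    pieces.append(piece)
    print(piece, end="", flush=True)
print()

# Invoking: nothing is available until the whole answer is assembled.
full_answer = fake_llm_invoke(text)
print(len(pieces), full_answer == "".join(pieces))  # → 6 True
```

The total work is the same in both cases. Streaming only changes when the caller gets to see partial results, which improves perceived latency in interactive applications.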



#### Conversational RAG Chain
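This chain accepts the conversation history as an input, allowing the model to understand follow-up questions that refer to previous turns in the dialogue. The chain itself is still stateless: we keep the history in a list and pass it in on every call.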

In [14]:
# This prompt now includes a placeholder for chat history.
conversational_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful AI assistant. Use the provided context to answer questions."),
    # The ("placeholder", "{chat_history}") tuple is shorthand for a `MessagesPlaceholder`:
    # the list of messages passed as `chat_history` is inserted at this position.
    ("placeholder", "{chat_history}"),
    ("human", "Context: {context}\n\nQuestion: {input}"),
])

# This chain is slightly different: it uses `RunnablePassthrough.assign` to dynamically add context.
# The context is retrieved based on the `input` (the user's question) only.
conversational_rag_chain = (
    RunnablePassthrough.assign(
        context=lambda x: format_docs(retriever.invoke(x["input"])))
    | conversational_prompt
    | llm
    | StrOutputParser()
)

In [15]:
print("--- Conversational RAG Example ---")
# We manually manage the chat history as a list of messages.
chat_history = []

# First question
q1 = "What is machine learning?"
print(f"Q1: {q1}")
a1 = conversational_rag_chain.invoke({"input": q1, "chat_history": chat_history})
print(f"A1: {a1}")

# Update the history with the first interaction.
chat_history.extend([HumanMessage(content=q1), AIMessage(content=a1)])

# Follow-up question
q2 = "What is a subset of it?" # 'it' refers to machine learning
print(f"Q2: {q2}")
# The LLM sees chat_history in the prompt, so it can resolve 'it'.
# Note that retrieval still uses only the raw follow-up text.
a2 = conversational_rag_chain.invoke({"input": q2, "chat_history": chat_history})
print(f"A2: {a2}")

--- Conversational RAG Example ---
Q1: What is machine learning?
A1: Machine learning is a subset of artificial intelligence (AI) that allows systems to learn from data without being explicitly programmed. 

Q2: What is a subset of it?
A2: Based on the context provided, **Deep Learning** is a subset of machine learning. 
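
One limitation is worth noting: the retriever in this chain only ever sees the raw follow-up ("What is a subset of it?"), which contains no topic words. It works here because the corpus is tiny. A naive way to give retrieval more to work with is to prepend the previous user turn to the query, as sketched below. The helper `build_retrieval_query` and the `(role, text)` tuples are simplifications invented for this example. A more robust approach would ask the LLM to rewrite the follow-up as a standalone question.

```python
def build_retrieval_query(history, question, max_turns=1):
    """Prefix a follow-up with the most recent user turns (naive heuristic).

    history: list of (role, text) tuples, oldest first, where role is
    "human" or "ai".
    """
    previous_questions = [text for role, text in history if role == "human"]
    recent = previous_questions[-max_turns:] if max_turns > 0 else []
    return " ".join(recent + [question])


toy_history = [
    ("human", "What is machine learning?"),
    ("ai", "Machine learning is a subset of AI that learns from data."),
]

retrieval_query = build_retrieval_query(toy_history, "What is a subset of it?")
print(retrieval_query)
# → What is machine learning? What is a subset of it?
```

The combined string would be passed to the retriever, while the prompt would still receive the user's original question.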



### 🔑 Key Takeaways

* **FAISS is for Speed**: FAISS is a high-performance library for vector search, making it an excellent choice for applications requiring low-latency, in-memory retrieval.
* **In-Memory Index**: Unlike database servers, a FAISS index is an in-memory object that you must explicitly `.save_local()` and `.load_local()` to persist and reuse.
* **LangChain's Abstractions Shine**: The `VectorStore` interface in LangChain makes it trivial to switch from ChromaDB to FAISS. The methods for creating the store (`.from_documents`), searching (`.similarity_search`), and creating a retriever (`.as_retriever()`) are consistent.
* **LCEL is Universal**: The same LCEL chain structure we used with ChromaDB works unchanged with a FAISS retriever. Only the retriever differs; the simple, streaming, and conversational chains are built the same way.