# LangChain RAG Course

source: `https://blog.futuresmart.ai/langchain-rag-from-basics-to-production-ready-rag-chatbot#heading-the-process-typically-involves`

## 1. Introduction
### 1.1 What is Retrieval Augmented Generation (RAG)?

RAG is a technique that enhances language models by combining them with a retrieval system. It allows the model to access and utilize external knowledge when generating responses.

#### The process typically involves:
* Indexing a large corpus of documents
* Retrieving relevant information based on the input query
* Using the retrieved information to augment the prompt sent to the language model
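
The three steps above can be sketched with nothing but the standard library. This is a toy illustration, not how LangChain implements retrieval: the corpus, the word-overlap scoring, and the helper names (`build_index`, `retrieve`, `augment_prompt`) are all assumptions made for the example. A real system would use embeddings and a vector store, as covered later in this course.

```python
import re

# Toy corpus standing in for an indexed document collection (hypothetical data).
CORPUS = [
    "Chroma is a vector store that integrates with LangChain.",
    "RecursiveCharacterTextSplitter splits documents into chunks.",
    "SQLite can store chat history for a multi-user chatbot.",
]


def tokenize(text: str) -> set:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_index(corpus):
    """Step 1 (indexing): precompute a token set for every document."""
    return [(doc, tokenize(doc)) for doc in corpus]


def retrieve(index, query, k=1):
    """Step 2 (retrieval): rank documents by word overlap with the query."""
    query_tokens = tokenize(query)
    ranked = sorted(index, key=lambda item: len(item[1] & query_tokens), reverse=True)
    return [doc for doc, _ in ranked[:k]]


def augment_prompt(query, retrieved_docs):
    """Step 3 (augmentation): place the retrieved text in the prompt."""
    context = "\n".join(retrieved_docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


index = build_index(CORPUS)
query = "Which vector store works with LangChain?"
top_docs = retrieve(index, query)
print(augment_prompt(query, top_docs))
```

The augmented prompt is what would be sent to the language model in place of the bare question.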

### 1.2 What is LangChain?
LangChain is a framework for developing applications powered by language models. It provides a set of tools and abstractions that make it easier to build complex AI applications. Key features include:

* Modular components for common LLM tasks
* Built-in support for various LLM providers
* Tools for document loading, text splitting, and vector storage
* Abstractions for building conversational agents and question-answering systems

## 2. LangChain Components and Expression Language (LCEL)
LangChain Expression Language (LCEL) is a key feature that makes working with LangChain components flexible and powerful. Let's explore how LCEL is used with various components:

### 2.1 Large Language Model (LLM)
LCEL allows direct invocation of the LLM:

In [None]:
!pip install -qU "langchain[google-genai]"

In [None]:
!pip install -qU python-dotenv

In [None]:
from dotenv import load_dotenv
import os

load_dotenv()

gemini_key = os.getenv('GOOGLE_API_KEY')

In [None]:
from langchain.chat_models import init_chat_model

model = init_chat_model("gemini-2.0-flash", model_provider="google_genai", api_key=gemini_key)

In [None]:
model.invoke('How are you?')

### 2.2 Output Parsers
LCEL lets us chain the output parser directly to the LLM:

In [None]:
from langchain_core.output_parsers import StrOutputParser

output_parser = StrOutputParser()
chain = model | output_parser
llm_response = chain.invoke("Tell me a joke")
print(llm_response)

### 2.3 Structured Output
LCEL allows us to create structured output chains:

In [None]:
from typing import List
from pydantic import BaseModel, Field

In [None]:
class MobileReview(BaseModel):
    phone_model: str = Field(description="Name and model of the phone")
    rating: float = Field(description="Overall rating out of 5")
    pros: List[str] = Field(description="List of positive aspects")
    cons: List[str] = Field(description="List of negative aspects")
    summary: str = Field(description="Brief summary of the review")

review_text = """
    Just got my hands on the new Galaxy S21 and wow, this thing is slick! The screen is gorgeous,
    colors pop like crazy. Camera's insane too, especially at night - my Insta game's never been
    stronger. Battery life's solid, lasts me all day no problem.
    Not gonna lie though, it's pretty pricey. And what's with ditching the charger? C'mon Samsung.
    Also, still getting used to the new button layout, keep hitting Bixby by mistake.
    Overall, I'd say it's a solid 4 out of 5. Great phone, but a few annoying quirks keep it from
    being perfect. If you're due for an upgrade, definitely worth checking out!
    """

structured_llm = model.with_structured_output(MobileReview)
output = structured_llm.invoke(review_text)
print(output)
print(output.pros)
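
Conceptually, `with_structured_output` asks the model for output that matches the schema and hands it back as a validated `MobileReview` instance. The sketch below imitates only the validation half, using the standard library. `MobileReviewSketch`, `parse_review`, and the sample JSON are hypothetical stand-ins. The actual mechanism relies on Pydantic and the provider's structured-output support and is not reproduced here.

```python
import json
from dataclasses import dataclass, fields
from typing import List


@dataclass
class MobileReviewSketch:
    """Dataclass stand-in for the Pydantic MobileReview model."""
    phone_model: str
    rating: float
    pros: List[str]
    cons: List[str]
    summary: str


def parse_review(raw_json: str) -> MobileReviewSketch:
    """Check that every expected field is present, then coerce the types."""
    data = json.loads(raw_json)
    expected = {f.name for f in fields(MobileReviewSketch)}
    missing = expected - data.keys()
    if missing:
        raise ValueError(f"Missing fields: {sorted(missing)}")
    return MobileReviewSketch(
        phone_model=str(data["phone_model"]),
        rating=float(data["rating"]),
        pros=list(data["pros"]),
        cons=list(data["cons"]),
        summary=str(data["summary"]),
    )


# Hypothetical JSON that a model might return for the review text.
SAMPLE_JSON = json.dumps({
    "phone_model": "Galaxy S21",
    "rating": 4,
    "pros": ["Gorgeous screen", "Great camera"],
    "cons": ["Pricey", "No charger included"],
    "summary": "Great phone with a few annoying quirks.",
})

review = parse_review(SAMPLE_JSON)
print(review.rating, review.pros)
```

Because the result is a typed object rather than free text, downstream code can read fields such as `rating` and `pros` directly.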

### 2.4 Prompt Templates
LCEL shines when working with prompt templates, allowing easy chaining:

In [None]:
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_template("Tell me a short joke about {topic}")
chain = prompt | model | output_parser
result = chain.invoke({"topic": "programming"})

In [None]:
print(result)
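
To build intuition for what `prompt | model | output_parser` does, here is a minimal sketch of pipe-style composition in plain Python. The `Step` class and the three toy steps are assumptions made for illustration. LangChain's real runnables offer much more (batching, streaming, async), and this is not their implementation.

```python
class Step:
    """Wraps a function so that steps can be chained with the | operator."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # a | b builds a new step that runs a first, then feeds its output to b.
        return Step(lambda value: other.invoke(self.invoke(value)))


# Hypothetical stand-ins for a prompt template, a chat model, and an output parser.
toy_prompt = Step(lambda variables: "Tell me a short joke about {topic}".format(**variables))
toy_model = Step(lambda text: {"content": f"[model reply to: {text}]"})
toy_parser = Step(lambda message: message["content"])

toy_chain = toy_prompt | toy_model | toy_parser
print(toy_chain.invoke({"topic": "programming"}))
```

Each step only needs to accept the previous step's output, which is why a prompt, a model, and a parser can be chained in one expression.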

### 2.5 LLM Messages

LCEL allows flexible message composition:

In [None]:
from langchain_core.messages import HumanMessage, SystemMessage

messages = [
    SystemMessage(content="You are a helpful assistant that tells jokes."),
    HumanMessage(content="Tell me about programming"),
]

response = model.invoke(messages)
print(response)

In [None]:
template = ChatPromptTemplate([
        ("system", "You are a helpful assistant that tells jokes."),
        ("human", "Tell me about {topic}")
    ])
chain = template | model
response = chain.invoke({"topic": "programming"})
print(response)

## 3. Document Processing for RAG Systems
After setting up our LangChain components, the next crucial step in building a RAG system is processing our documents. This involves loading the documents and splitting them into manageable chunks.

### 3.1 Loading Documents
We start by loading documents from various file types:

In [None]:
from langchain_community.document_loaders import PyPDFLoader, Docx2txtLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from typing import List
from langchain_core.documents import Document
import os

In [None]:
def load_documents(folder_path: str) -> List[Document]:
    documents = []
    for filename in os.listdir(folder_path):
        file_path = os.path.join(folder_path, filename)
        if filename.endswith('.pdf'):
            loader = PyPDFLoader(file_path)
        elif filename.endswith('.docx'):
            loader = Docx2txtLoader(file_path)
        else:
            print(f"Unsupported file type: {filename}")
            continue
        documents.extend(loader.load())
    return documents

In [None]:
folder_path = "./examples"
documents = load_documents(folder_path=folder_path)

In [None]:
print(f"Loaded {len(documents)} documents from the folder.")

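### 3.2 Splitting Documents
Next, we split the loaded documents into overlapping chunks. Because `length_function=len`, `chunk_size` and `chunk_overlap` are measured in characters: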

In [None]:
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=100,
    length_function=len
)

splits = text_splitter.split_documents(documents)
print(f"Split the documents into {len(splits)} chunks.")
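
To see what `chunk_size` and `chunk_overlap` mean, the sketch below uses a plain sliding window over characters. This is a deliberate simplification: `RecursiveCharacterTextSplitter` also tries to break on separators such as paragraph and line boundaries, which this toy `chunk_text` helper (a hypothetical name) does not attempt.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list:
    """Split text into fixed-size character windows that overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1))
# → ['abcd', 'defg', 'ghij']
```

Each chunk repeats the last character of the previous one, so text that straddles a boundary still appears intact in at least one chunk when the overlap is large enough.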

In [None]:
print(documents[0])

In [None]:
print(splits[1])


In [None]:
print(splits[0].metadata)

This metadata helps us keep track of where each piece of information came from, which can be crucial when the AI is using this information to answer questions.

By processing our documents in this way, we're preparing the groundwork for our RAG system to efficiently retrieve relevant information when answering queries.

## 4. Creating Embeddings for RAG Systems

After processing our documents, the next crucial step is to create embeddings. Embeddings are vector representations of our text chunks that allow for efficient similarity search, which is key to the retrieval part of our RAG system.

### 4.1 Using HuggingFaceEmbeddings


In [75]:
from langchain_huggingface import HuggingFaceEmbeddings  # requires: pip install langchain-huggingface

embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

document_embeddings = embeddings.embed_documents([split.page_content for split in splits])

print(f"Created embeddings for {len(document_embeddings)} document chunks.")

Created embeddings for 6 document chunks.


### 4.2 Using SentenceTransformer

In [76]:
from langchain_community.embeddings.sentence_transformer import SentenceTransformerEmbeddings

embedding_function = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")
document_embeddings = embedding_function.embed_documents([split.page_content for split in splits])
print(document_embeddings[0][:5])

[-0.0812104195356369, -0.05342952534556389, -0.044474635273218155, 0.035381168127059937, -0.053680337965488434]


Like the previous approach, this method runs a pre-trained SentenceTransformer model locally, so creating embeddings requires no API calls.

The embeddings we've created are dense vector representations of our text chunks. With `all-MiniLM-L6-v2`, each vector has 384 dimensions (we print only the first 5 here). These vectors capture semantic meaning, allowing us to find similar chunks of text by comparing their embeddings.

In our RAG system, these embeddings will be crucial for quickly finding relevant information when answering queries. When a user asks a question, we'll create an embedding for that question and then find the most similar document chunks by comparing embeddings. This allows us to retrieve the most relevant information from our document collection efficiently.
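
The comparison step can be sketched as follows. The vectors are made-up 3-dimensional values rather than real embeddings, and cosine similarity is assumed here as a common choice. The metric a vector store actually uses depends on its configuration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vector, chunk_vectors, k=2):
    """Return the indices of the k chunks most similar to the query."""
    scores = [cosine_similarity(query_vector, v) for v in chunk_vectors]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


# Made-up vectors; real embeddings have far more dimensions.
chunk_vectors = [
    [1.0, 0.0, 0.0],
    [0.7, 0.7, 0.0],
    [0.0, 0.0, 1.0],
]
query_vector = [0.9, 0.1, 0.0]
print(top_k(query_vector, chunk_vectors))
# → [0, 1]
```

This brute-force scan compares the query with every chunk. Vector stores exist largely to make the same lookup fast on large collections.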

The next step will be to store these embeddings in a vector store, which will allow for fast similarity search during the retrieval phase of our RAG system.

## 5. Setting Up the Vector Store for RAG Systems

Now that we have our document embeddings, we need a way to store and efficiently search through them. This is where a vector store comes in. We'll use Chroma, a popular vector store that integrates well with LangChain.

In [83]:
from langchain_chroma import Chroma

collection_name = "my_collection"
vectorstore = Chroma.from_documents(
    collection_name=collection_name,
    documents=splits,
    embedding=embedding_function,
    persist_directory="./chroma_db"
)
print("Vector store created and persisted to './chroma_db'")


Vector store created and persisted to './chroma_db'


This code creates a Chroma vector store from our document splits. It uses the embedding function we defined earlier to create embeddings for each document chunk. The vector store is then persisted to disk, allowing us to reuse it in future sessions without recomputing the embeddings.

### 5.1 Performing Similarity Search
Now that our vector store is set up, we can perform similarity searches. This is a key component of the retrieval process in our RAG system:

In [85]:
query = "When did Octavian Marina work?"
search_results = vectorstore.similarity_search(query, k=2)
print(f"\nTop 2 most relevant chunks for the query: '{query}'\n")
for i, result in enumerate(search_results, 1):
    print(f"Result {i}:")
    print(f"Source: {result.metadata.get('source', 'Unknown')}")
    print(f"Content: {result.page_content}")
    print()


Top 2 most relevant chunks for the query: 'When did Octavian Marina work?'

Result 1:
Source: ./examples/Resume.pdf
Content: 0740853147
Cluj-Napoca, Romania
octamarina@gmail.com
Octavian Marina
Data Scientist
octavianmarina.com
github.com/OctaMarina
linkedin.com/egorhowell
TECHNICAL SKILLS
Languages Python, SQL, Java, JavaScript
Tech Stack Git, Bash/Zsh, Snowflake, AWS, Jupyter Notebook (Anaconda), LangChain, Flask, HuggingFace
EXPERIENCE
Software Engineer March 2023 — Present
Info World Cluj-Napoca, Romania

Result 2:
Source: ./examples/Resume.pdf
Content: EXPERIENCE
Software Engineer March 2023 — Present
Info World Cluj-Napoca, Romania
• Contributed to DICOM-compliant medical imaging software in the PACS suite, enabling efficient storage, retrieval,
and visualization of radiological data.
• Deployed an abdominal organ segmentation model for CT imaging within the web-based PACS platform, directly
supporting the successful outcome of a competitive bidding process.



This similarity search finds the most relevant document chunks based on our query. The vector store compares the embedding of our query with the embeddings of all document chunks, returning the most similar ones.

### 5.2 Creating a Retriever
We can also create a retriever from our vector store, which will be useful when we build our full RAG chain:

In [86]:
retriever = vectorstore.as_retriever(search_kwargs={"k": 2})
retriever_results = retriever.invoke("When did Octavian start working?")
print(retriever_results)

[Document(id='8e5590f5-8b77-4d78-a6fc-65ec613c4f30', metadata={'ptex.fullbanner': 'This is pdfTeX, Version 3.141592653-2.6-1.40.26 (TeX Live 2024) kpathsea version 6.4.0', 'keywords': '', 'producer': 'pdfTeX-1.40.26', 'trapped': '/False', 'subject': '', 'author': '', 'total_pages': 1, 'moddate': '2025-07-27T19:16:52+00:00', 'creator': 'LaTeX with hyperref', 'title': '', 'page_label': '', 'page': 0, 'source': './examples/Resume.pdf', 'creationdate': '2025-07-27T19:16:52+00:00'}, page_content='0740853147\nCluj-Napoca, Romania\noctamarina@gmail.com\nOctavian Marina\nData Scientist\noctavianmarina.com\ngithub.com/OctaMarina\nlinkedin.com/egorhowell\nTECHNICAL SKILLS\nLanguages Python, SQL, Java, JavaScript\nTech Stack Git, Bash/Zsh, Snowflake, AWS, Jupyter Notebook (Anaconda), LangChain, Flask, HuggingFace\nEXPERIENCE\nSoftware Engineer March 2023 — Present\nInfo World Cluj-Napoca, Romania'), Document(id='a2c73346-5709-46f4-bf60-bb5cedf2048e', metadata={'creator': 'LaTeX with hyperref', 'p


## 6. Building the RAG Chain

In [87]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

In [88]:
template = """Answer the question based only on the following context:
{context}
Question: {question}
Answer: """

In [89]:
prompt = ChatPromptTemplate.from_template(template)

In [90]:
def docs2str(docs):
    return "\n\n".join(doc.page_content for doc in docs)

In [91]:
rag_chain = (
    {"context": retriever | docs2str, "question": RunnablePassthrough()}
    | prompt
    | model
    | StrOutputParser()
)
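
The dictionary at the start of the chain feeds the same input string to both branches: one retrieves and formats the context, the other passes the question through unchanged. The sketch below mimics that data flow up to the prompt, without calling a model. `FakeDoc`, `fake_retriever`, and `build_prompt` are hypothetical stand-ins, and the returned chunks are hard-coded.

```python
from dataclasses import dataclass


@dataclass
class FakeDoc:
    """Minimal stand-in for a LangChain Document."""
    page_content: str


def fake_retriever(question: str):
    # Hypothetical retriever: always returns the same two chunks.
    return [
        FakeDoc("Software Engineer March 2023 - Present"),
        FakeDoc("Info World Cluj-Napoca, Romania"),
    ]


def docs2str(docs):
    return "\n\n".join(doc.page_content for doc in docs)


TEMPLATE = """Answer the question based only on the following context:
{context}
Question: {question}
Answer: """


def build_prompt(question: str) -> str:
    # Mirrors the dict step: both keys are computed from the same input string.
    inputs = {
        "context": docs2str(fake_retriever(question)),  # retriever | docs2str
        "question": question,                           # RunnablePassthrough()
    }
    return TEMPLATE.format(**inputs)


print(build_prompt("Where does Octavian work?"))
```

In the real chain, the formatted prompt then goes to the model and the reply is reduced to a string by `StrOutputParser`.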

### 6.1 Using the RAG Chain

In [92]:
question = "Where does Octavian work?"
response = rag_chain.invoke(question)
print(f"Question: {question}")
print(f"Answer: {response}")

Question: Where does Octavian work?
Answer: Info World


## 7. Handling Follow-Up Questions

To make our RAG system more conversational, we need to handle follow-up questions effectively. This involves creating a history-aware retriever that can understand context from previous interactions.

### 7.1 Creating a History-Aware Retriever
First, let's set up the components for our history-aware retriever:

In [101]:
from langchain_core.prompts import MessagesPlaceholder
from langchain.chains import create_history_aware_retriever
from langchain.chains.combine_documents import create_stuff_documents_chain

In [102]:
contextualize_q_system_prompt = """
Given a chat history and the latest user question
which might reference context in the chat history,
formulate a standalone question which can be understood
without the chat history. Do NOT answer the question,
just reformulate it if needed and otherwise return it as is.
"""

In [103]:
contextualize_q_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", contextualize_q_system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)

In [105]:
contextualize_chain = contextualize_q_prompt | model | StrOutputParser()
print(contextualize_chain.invoke({"input": "Where is it headquartered?", "chat_history": []}))

Where is the company in question headquartered?


In [98]:
from langchain.chains.retrieval import create_retrieval_chain

history_aware_retriever = create_history_aware_retriever(
    model, retriever, contextualize_q_prompt
)

qa_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful AI assistant. Use the following context to answer the user's question."),
    ("system", "Context: {context}"),
    MessagesPlaceholder(variable_name="chat_history"),
    ("human", "{input}")
])

question_answer_chain = create_stuff_documents_chain(model, qa_prompt)
rag_chain = create_retrieval_chain(history_aware_retriever, question_answer_chain)
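
The control flow of a history-aware retriever can be sketched in plain Python. As far as I understand its documented behaviour, `create_history_aware_retriever` sends the input straight to the retriever when the chat history is empty, and otherwise asks the LLM to rewrite the question first. Everything below (`history_aware_retrieve`, `toy_rewrite`, `toy_retrieve`) is a hypothetical stand-in, not LangChain code.

```python
def history_aware_retrieve(user_input, chat_history, rewrite_question, retrieve):
    """Rewrite the question only when there is history, then retrieve."""
    if not chat_history:
        search_query = user_input
    else:
        search_query = rewrite_question(user_input, chat_history)
    return retrieve(search_query)


# Hypothetical stand-ins: a real system would call the LLM and the vector store.
def toy_rewrite(user_input, chat_history):
    last_human_turn = chat_history[-2]["content"]
    return f"{user_input} (in the context of: {last_human_turn})"


def toy_retrieve(query):
    return [f"chunk matching '{query}'"]


history = [
    {"role": "human", "content": "Where does Octavian work?"},
    {"role": "ai", "content": "Info World"},
]
print(history_aware_retrieve("What skills does he use?", [], toy_rewrite, toy_retrieve))
print(history_aware_retrieve("What skills does he use?", history, toy_rewrite, toy_retrieve))
```

The second call searches with a standalone query, which is what lets a pronoun such as "he" resolve to the right person.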


In [100]:
from langchain_core.messages import HumanMessage, AIMessage

chat_history = []
question1 = "Where does Octavian work?"
answer1 = rag_chain.invoke({"input": question1, "chat_history": chat_history})['answer']
chat_history.extend([
    HumanMessage(content=question1),
    AIMessage(content=answer1)
])

print(f"Human: {question1}")
print(f"AI: {answer1}\n")

question2 = "What skills does he use at work?"
answer2 = rag_chain.invoke({"input": question2, "chat_history": chat_history})['answer']
chat_history.extend([
    HumanMessage(content=question2),
    AIMessage(content=answer2)
])

print(f"Human: {question2}")
print(f"AI: {answer2}")

H: Where does Octavian work?
AI: Octavian works as a Software Engineer at Info World in Cluj-Napoca, Romania.

H: What skills does he use at work?
AI: While the document doesn't explicitly state which skills Octavian uses at work, it lists the following technical skills:

*   **Languages:** Python, SQL, Java, JavaScript
*   **Tech Stack:** Git, Bash/Zsh, Snowflake, AWS, Jupyter Notebook (Anaconda), LangChain, Flask, HuggingFace


## 8. Building a Multi-User Chatbot with SQLite Storage
To make our RAG system more practical for real-world applications, we'll create a multi-user chatbot that stores conversation history in an SQLite database. This allows for persistent storage and retrieval of chat history across sessions.

### 8.1 Setting Up the SQLite Database
First, let's set up our SQLite database and create the necessary functions for logging:

In [107]:
import sqlite3
from datetime import datetime
import uuid

In [108]:
DB_NAME = "rag_app.db"

In [109]:
def get_db_connection():
    conn = sqlite3.connect(DB_NAME)
    conn.row_factory = sqlite3.Row
    return conn

In [110]:
def create_application_logs():
    conn = get_db_connection()
    conn.execute('''CREATE TABLE IF NOT EXISTS application_logs
    (id INTEGER PRIMARY KEY AUTOINCREMENT,
    session_id TEXT,
    user_query TEXT,
    gpt_response TEXT,
    model TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP)''')
    conn.close()

In [111]:
def insert_application_logs(session_id, user_query, gpt_response, model):
    conn = get_db_connection()
    conn.execute('INSERT INTO application_logs (session_id, user_query, gpt_response, model) VALUES (?, ?, ?, ?)',
                 (session_id, user_query, gpt_response, model))
    conn.commit()
    conn.close()

In [112]:
def get_chat_history(session_id):
    conn = get_db_connection()
    cursor = conn.cursor()
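    # created_at has one-second resolution, so id breaks ties between rows logged in the same second
    cursor.execute('SELECT user_query, gpt_response FROM application_logs WHERE session_id = ? ORDER BY created_at, id', (session_id,))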
    messages = []
    for row in cursor.fetchall():
        messages.extend([
            {"role": "human", "content": row['user_query']},
            {"role": "ai", "content": row['gpt_response']}
        ])
    conn.close()
    return messages
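
A usage sketch of the same logging pattern with two sessions, to show that each user only sees their own history. It is self-contained and writes to a throwaway database file. The helper names (`open_db`, `init_db`, `log_turn`, `load_history`), the sample questions, and the answers are assumptions made for this example, and rows are ordered by `id` so the result is deterministic.

```python
import os
import sqlite3
import tempfile
import uuid


def open_db(path):
    conn = sqlite3.connect(path)
    conn.row_factory = sqlite3.Row  # allows access to columns by name
    return conn


def init_db(path):
    conn = open_db(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS application_logs
        (id INTEGER PRIMARY KEY AUTOINCREMENT,
        session_id TEXT,
        user_query TEXT,
        gpt_response TEXT,
        model TEXT,
        created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP)"""
    )
    conn.commit()
    conn.close()


def log_turn(path, session_id, user_query, response, model):
    conn = open_db(path)
    conn.execute(
        "INSERT INTO application_logs (session_id, user_query, gpt_response, model) VALUES (?, ?, ?, ?)",
        (session_id, user_query, response, model),
    )
    conn.commit()
    conn.close()


def load_history(path, session_id):
    conn = open_db(path)
    rows = conn.execute(
        "SELECT user_query, gpt_response FROM application_logs WHERE session_id = ? ORDER BY id",
        (session_id,),
    ).fetchall()
    conn.close()
    messages = []
    for row in rows:
        messages.append({"role": "human", "content": row["user_query"]})
        messages.append({"role": "ai", "content": row["gpt_response"]})
    return messages


# Hypothetical two-user scenario in a throwaway database file.
db_path = os.path.join(tempfile.mkdtemp(), "rag_app_demo.db")
init_db(db_path)
alice, bob = str(uuid.uuid4()), str(uuid.uuid4())
log_turn(db_path, alice, "Where does Octavian work?", "Info World", "gemini-2.0-flash")
log_turn(db_path, bob, "What languages does he know?", "Python, SQL, Java, JavaScript", "gemini-2.0-flash")
log_turn(db_path, alice, "Since when?", "March 2023", "gemini-2.0-flash")
print(load_history(db_path, alice))
```

Generating the session ID with `uuid.uuid4()` keeps sessions separate without any coordination between users.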