- This code was written by Ahmadreza Attarpour (a.attarpour@mail.utoronto.ca)
- It's a practice exercise exploring how to use LangChain/LangGraph to build a RAG pipeline

-------------------------------------

# RAG

- Retrieval-Augmented Generation (RAG) is a method that improves AI responses by combining a language model with a search system. Instead of relying only on what the model "knows," RAG looks up relevant information (from documents, websites, or databases) and uses it to give more accurate and current answers. In LangChain, LCEL (LangChain Expression Language) helps connect these steps in a flexible and modular way.

- LCEL is designed to streamline the process of building useful apps with LLMs and combining related components.

- LangGraph, built on top of LCEL, allows for performant orchestration of application components while keeping the code concise and readable.

User Question

     ↓

[ Retriever → Fetch Relevant Info ]

     ↓

[ LLM → Use Retrieved Info to Generate Answer ]

     ↓

AI Response


- This basic RAG pipeline runs in a single direction: there is no loop or reasoning step (for example, it cannot re-query when the first retrieval misses), and that is its main disadvantage.

# Simplified I/O Flow:

**Input:** "What is llama3?"

⬇️

1. Convert question to embedding
2. Retrieve top-4 similar chunks
3. Plug into prompt:
   - context = [relevant chunk 1...4]
   - question = original question
4. Feed to LLM

⬇️

**Output:** Final answer generated using retrieved context
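The four steps above can be mimicked end to end without any API calls. The sketch below is a toy stand-in, assuming a bag-of-words word-count vector in place of the Gemini embedding model and a hard-coded list of chunks; `embed`, `cosine`, `retrieve_top_k`, and `build_prompt` are hypothetical helpers written for illustration, not LangChain functions.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve_top_k(question: str, chunks: list[str], k: int = 4) -> list[str]:
    """Steps 1-2: embed the question and return the k most similar chunks."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Step 3: plug the retrieved chunks and the original question into a prompt."""
    return (
        "Use the following pieces of context to answer the question at the end.\n"
        + "\n".join(context)
        + f"\n\nQuestion: {question}\n"
    )


chunks = [
    "Llama is a family of large language models.",
    "Models released by Meta AI starting in February 2023.",
    "Llama 3 was released in April 2024.",
    "Paris is the capital of France.",
]
question = "When was Llama 3 released?"
context = retrieve_top_k(question, chunks, k=2)
prompt = build_prompt(question, context)
print(prompt)  # step 4 would send this string to the LLM
```

A real embedding model captures meaning rather than exact word overlap, but the ranking-then-prompting structure is the same.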



In [5]:
import os

In [6]:
# load the environment variables from the .env file
from dotenv import load_dotenv
load_dotenv()

True

In [7]:
# Read the keys from the environment (populated by load_dotenv) and export them explicitly
GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY")
LANGCHAIN_API_KEY = os.getenv("LANGCHAIN_API_KEY")
LANGCHAIN_PROJECT = os.getenv("LANGCHAIN_PROJECT")
if GOOGLE_API_KEY:
    os.environ["GOOGLE_API_KEY"] = GOOGLE_API_KEY
if LANGCHAIN_API_KEY:
    os.environ["LANGCHAIN_API_KEY"] = LANGCHAIN_API_KEY
if LANGCHAIN_PROJECT:
    os.environ["LANGCHAIN_PROJECT"] = LANGCHAIN_PROJECT
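`load_dotenv()` already loads the `.env` entries into `os.environ` (without overriding variables that are already set), so the cell above mainly makes the expected keys explicit; it also continues silently if a key is missing. A small fail-fast check, sketched below with a hypothetical `require_env` helper (not part of `python-dotenv`), surfaces a missing key before the first API call.

```python
import os


def require_env(*names: str) -> dict[str, str]:
    """Return the requested environment variables, raising if any are missing or empty."""
    missing = [n for n in names if not os.environ.get(n)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {n: os.environ[n] for n in names}


os.environ["DEMO_KEY"] = "dummy-value"  # demo only; real keys would come from .env
print(require_env("DEMO_KEY"))
```

In the notebook this would be called as, e.g., `require_env("GOOGLE_API_KEY")` right after `load_dotenv()`.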

In [17]:
from langchain_community.document_loaders import TextLoader, DirectoryLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnableParallel, RunnablePassthrough, RunnableLambda
from langchain_core.output_parsers import StrOutputParser

In [14]:
# create the embedding model and the LLM from langchain_google_genai
from langchain_google_genai import GoogleGenerativeAIEmbeddings, ChatGoogleGenerativeAI

embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash")

In [None]:
# reading the txt files from the source directory
loader = DirectoryLoader('./data', glob="./llama*.txt", loader_cls=TextLoader)
docs = loader.load()
print(f"Number of documents loaded: {len(docs)}")
print(f"First document content: {docs[0].page_content[:100]}...")  # Print first 100 characters of the first document
# splitting the documents into smaller chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=50,
    chunk_overlap=10,
    length_function=len
)
new_docs = text_splitter.split_documents(docs)
doc_strings = [doc.page_content for doc in new_docs]
print(f"Number of documents after splitting: {len(doc_strings)}")
print(f"First document chunk: {doc_strings[0][:100]}...")  # Print first 100 characters of the first chunk

Number of documents loaded: 1
First document content: Llama (Large Language Model Meta AI) is a family of autoregressive large language models released by...
Number of documents after splitting: 39
First document chunk: Llama (Large Language Model Meta AI) is a family...
[Document(metadata={'source': 'data/llama3.txt'}, page_content='Llama (Large Language Model Meta AI) is a family'), Document(metadata={'source': 'data/llama3.txt'}, page_content='a family of autoregressive large language models'), Document(metadata={'source': 'data/llama3.txt'}, page_content='models released by Meta AI starting in February'), Document(metadata={'source': 'data/llama3.txt'}, page_content='February 2023.[2][3] The latest version is Llama'), Document(metadata={'source': 'data/llama3.txt'}, page_content='is Llama 3 released in April 2024.[4]'), Document(metadata={'source': 'data/llama3.txt'}, page_content='Model weights for the first version of Llama were'), Document(metadata={'source': 'data/llama3.txt'},
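With `chunk_size=50` and `chunk_overlap=10`, each chunk holds at most 50 characters and neighbouring chunks share some text, which is why fragments such as "a family" and "February" appear in two consecutive chunks in the output above. The sketch below is a simplified fixed-window splitter, assuming plain character windows; `split_with_overlap` is a hypothetical helper. `RecursiveCharacterTextSplitter` is smarter (it first tries to split on separators such as blank lines, newlines, and spaces, so boundaries fall between words and chunk lengths vary), so its counts will not match this sketch exactly.

```python
def split_with_overlap(text: str, chunk_size: int = 50, chunk_overlap: int = 10) -> list[str]:
    """Slide a fixed-size character window; each step advances chunk_size - chunk_overlap chars."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


text = "Llama (Large Language Model Meta AI) is a family of autoregressive large language models."
chunks = split_with_overlap(text)
for c in chunks:
    print(repr(c))
```

The overlap means a sentence cut at a chunk boundary still appears intact in at least one chunk more often, at the cost of storing some text twice.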

In [None]:
# creating a retriever using embeddings and Chroma
# Each chunk is turned into a vector (embedding) using the embeddings model.
# The Chroma vector store saves these embeddings for later retrieval.
# The retriever fetches the top-4 (k=4) chunks most similar to a query.
db = Chroma.from_documents(new_docs, embeddings)
retriever = db.as_retriever(search_kwargs={"k": 4})
print(f"Number of documents in the vector store: {len(db)}")

Number of documents in the vector store: 39


In [32]:
# creating prompt template
template = """Use the following pieces of context to answer the question at the end.
{context}

Question: {question}
"""
prompt = PromptTemplate.from_template(template)
print(f"Prompt template created: {prompt.template[:100]}...")  # Print first 100 characters of the prompt template


Prompt template created: Use the following pieces of context to answer the question at the end.
{context}

Question: {questio...


In [33]:
# LCEL
"""Question → [Retriever gets context]  
        → [PromptTemplate fills {context}, {question}]  
        → [LLM answers based on it]  
        → [OutputParser returns the response]"""

retrieval_chain = (

    RunnableParallel({"context": retriever, "question": RunnablePassthrough()})
    | prompt
    | llm
    | StrOutputParser()

)
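In the chain above, the retriever's output (a list of `Document` objects) goes straight into `{context}`, so the prompt receives the list's Python representation, metadata included. A common refinement is a small function that keeps only the chunk text. The sketch below uses a hypothetical `Doc` dataclass as a stand-in for LangChain's `Document` (same `page_content` and `metadata` fields) so it runs without LangChain installed.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for langchain_core.documents.Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs(docs: list[Doc]) -> str:
    """Join only the chunk text, dropping metadata, so the prompt gets clean context."""
    return "\n\n".join(doc.page_content for doc in docs)


docs = [
    Doc("models released by Meta AI starting in February", {"source": "data/llama3.txt"}),
    Doc("February 2023.[2][3] The latest version is Llama", {"source": "data/llama3.txt"}),
]
print(format_docs(docs))
```

In the notebook's chain, the `context` branch could then become `retriever | format_docs`; LCEL wraps a plain function on the right of `|` in a `RunnableLambda` automatically.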

In [34]:
question = "when did llama release for the first time?"
print(f"Question: {question}")
print(retrieval_chain.invoke(question))

Question: when did llama release for the first time?
Based on the provided text, the first release of Llama was in February 2023.


# Below are some hands-on examples of Runnable utilities

In [None]:
# with Runnable components we can create a chain
def string_to_uppercase(input_string: str) -> str:
    return input_string.upper()

chain = (
    RunnablePassthrough() # does nothing, just passes the input through
    | RunnableLambda(string_to_uppercase) # applies the string_to_uppercase function
)

print(chain.invoke("hello world"))  # Should print "HELLO WORLD"


HELLO WORLD
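LCEL's `|` operator chains runnables so that each step's output becomes the next step's input. The sketch below mimics that idea in plain Python by overloading `__or__`; `Step` is a hypothetical class written for illustration, not LangChain's `Runnable`, which additionally provides batching, streaming, and async execution.

```python
from typing import Any, Callable


class Step:
    """Hypothetical minimal 'runnable': wraps a function and supports `|` chaining."""

    def __init__(self, fn: Callable[[Any], Any]):
        self.fn = fn

    def invoke(self, x: Any) -> Any:
        return self.fn(x)

    def __or__(self, other: "Step") -> "Step":
        # Compose left to right: run self first, then feed its output to other.
        return Step(lambda x: other.invoke(self.invoke(x)))


passthrough = Step(lambda x: x)   # plays the role of RunnablePassthrough()
upper = Step(str.upper)           # plays the role of RunnableLambda(string_to_uppercase)
exclaim = Step(lambda s: s + "!")

chain = passthrough | upper | exclaim
print(chain.invoke("hello world"))  # → HELLO WORLD!
```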


In [None]:
# RunnableParallel runs several branches on the same input and returns a dict of their results

chain = RunnableParallel({
    "uppercase": RunnableLambda(string_to_uppercase),
    "lowercase": RunnableLambda(lambda x: x.lower())
})
print(chain.invoke("Hello World"))  # Should print {'uppercase': 'HELLO WORLD', 'lowercase': 'hello world'}

{'uppercase': 'HELLO WORLD', 'lowercase': 'hello world'}


In [39]:
# an example that takes a dictionary as input, passes it through unchanged, and extracts one field

chain = RunnableParallel({
    "x": RunnablePassthrough(),  # passes the whole input dict through unchanged
    "email address": lambda x: x['email_address']  # LCEL wraps this lambda in a RunnableLambda
})

print(chain.invoke({"email_address": "user@example.com", "name": "John Doe"}))  # prints the original dict under 'x' plus the extracted email


{'x': {'email_address': 'user@example.com', 'name': 'John Doe'}, 'email address': 'user@example.com'}
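`RunnableParallel` applies every branch to the same input and collects the results in a dict keyed by branch name. A plain-Python equivalent of that fan-out is sketched below with a hypothetical `run_parallel` helper; unlike LangChain's version, it runs the branches one after another rather than concurrently.

```python
from typing import Any, Callable


def run_parallel(branches: dict[str, Callable[[Any], Any]], x: Any) -> dict[str, Any]:
    """Apply every branch to the same input; return results keyed by branch name."""
    return {name: fn(x) for name, fn in branches.items()}


user = {"email_address": "user@example.com", "name": "John Doe"}
result = run_parallel(
    {
        "x": lambda d: d,                                # passthrough branch
        "email address": lambda d: d["email_address"],   # field-extraction branch
    },
    user,
)
print(result)
```

This is the same pattern the RAG chain uses to build `{"context": ..., "question": ...}` from a single question string.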


In [None]:
# we can download an embedding model from Hugging Face and use it with LangChain
# (disabled below; requires the sentence-transformers package)
"""
from langchain_community.embeddings import HuggingFaceBgeEmbeddings
model_name = "BAAI/bge-base-en-v1.5"
model_kwargs = {'device': 'cpu'}
encode_kwargs = {'normalize_embeddings': True}  # unit-length vectors, so dot product equals cosine similarity
embeddings = HuggingFaceBgeEmbeddings(
    model_name=model_name,
    model_kwargs=model_kwargs,
    encode_kwargs=encode_kwargs,
)
"""
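Setting `normalize_embeddings=True` rescales each vector to unit length, so a plain dot product between two embeddings equals their cosine similarity. The dependency-free check below illustrates that identity with made-up 3-dimensional vectors; `normalize`, `dot`, and `cosine` are illustrative helpers, not library functions.

```python
import math


def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit (L2) length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: list[float], b: list[float]) -> float:
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a, b = [3.0, 4.0, 0.0], [1.0, 2.0, 2.0]
print(cosine(a, b))                      # cosine of the raw vectors
print(dot(normalize(a), normalize(b)))   # dot product of unit vectors: same value
```

Because the norms are already 1, a vector store can rank by the cheaper dot product without changing the ordering that cosine similarity would give.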
