# Building a Retrieval-Augmented Generation (RAG) System with LangChain

**Introduction**

Retrieval-Augmented Generation (RAG) combines a retrieval step with a generative AI model: relevant passages are fetched from an external data source and passed to the model along with the question, which improves the accuracy and relevance of the generated response. In this post, we walk through the implementation of a RAG-based system using LangChain and OpenAI’s GPT models.

**Prerequisites**

Before diving into the implementation, ensure you have the following dependencies installed:

In [None]:
pip install langchain langchain-community langchain-openai faiss-cpu python-dotenv beautifulsoup4

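Additionally, store your OpenAI API key, your LangSmith API key (read from LANGCHAIN_API_KEY), and a LangSmith project name (LANGCHAIN_PROJECT) in a .env file so that they stay out of your source code.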

**Setting Up Environment Variables**

In [None]:
import os
from dotenv import load_dotenv

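# load_dotenv() copies the variables defined in .env into os.environ.
load_dotenv()

# Fail early with a clear message: assigning os.getenv(...) back into
# os.environ would raise a TypeError if a key were missing.
for key in ("OPENAI_API_KEY", "LANGCHAIN_API_KEY", "LANGCHAIN_PROJECT"):
    if not os.getenv(key):
        raise RuntimeError(f"{key} is not set; add it to your .env file")

# Enable LangSmith tracing for the runs below.
os.environ["LANGCHAIN_TRACING_V2"] = "true"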

**Setting Up the LLM**

We first initialize an OpenAI-based language model:

In [None]:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="o1-mini")
print(llm)

We can now invoke this model to generate responses:

In [None]:
result = llm.invoke("What is agentic AI?")
print(result.content)

**Creating a Chat Prompt**

To make interactions more structured, we define a prompt template:

In [None]:
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are an expert AI Engineer. Provide an answer based on the question."),
        ("user", "{input}")
    ]
)

This template assigns the model a fixed role through the system message and fills the {input} placeholder with the user's question. Next, we create a chain that connects the prompt to a language model (here we switch to gpt-4o):

In [None]:
llm = ChatOpenAI(model="gpt-4o")
chain = prompt | llm
response = chain.invoke({"input": "Can you tell me about LangSmith?"})
print(response.content)
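
To make the `prompt | llm` composition less opaque, here is a minimal pure-Python sketch of the idea: fill the template, then hand the resulting messages to the model. The names `render_messages`, `fake_llm`, and `run_chain` are hypothetical stand-ins for illustration, not LangChain APIs, and the stub model returns a canned string instead of calling OpenAI.

```python
# Illustrative only: mimics the shape of `prompt | llm` without LangChain.
TEMPLATE = [
    ("system", "You are an expert AI Engineer. Provide an answer based on the question."),
    ("user", "{input}"),
]

def render_messages(template, variables):
    """Fill each message template with the supplied variables."""
    return [(role, text.format(**variables)) for role, text in template]

def fake_llm(messages):
    """Stand-in for a chat model: reports the last user message it received."""
    user_turns = [text for role, text in messages if role == "user"]
    return f"(stub answer to: {user_turns[-1]})"

def run_chain(variables):
    """Step 1: render the prompt. Step 2: pass the messages to the model."""
    return fake_llm(render_messages(TEMPLATE, variables))

print(run_chain({"input": "Can you tell me about LangSmith?"}))
```

In the real chain, the pipe operator wires these two steps together, so a single `invoke` call runs both.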

**Implementing RAG**

To ground the model's answers in external data, we first need a source document. We begin by loading a web page:

In [None]:
from langchain_community.document_loaders import WebBaseLoader

loader = WebBaseLoader("https://python.langchain.com/docs/tutorials/llm_chain/")
document = loader.load()

**Splitting the Document into Chunks**

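To process large documents efficiently, we split them into chunks of up to 1,000 characters, with a 200-character overlap between neighbouring chunks so that context spanning a chunk boundary is not lost: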

In [None]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
documents = text_splitter.split_documents(document)
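
The effect of `chunk_size` and `chunk_overlap` is easiest to see on a toy string. The sketch below is a simplified fixed-window splitter, not the algorithm `RecursiveCharacterTextSplitter` uses (which prefers to break on separators such as paragraphs and newlines). The helper `sliding_chunks` is hypothetical.

```python
def sliding_chunks(text, chunk_size, chunk_overlap):
    """Cut text into windows of chunk_size characters, each sharing
    chunk_overlap characters with the previous window."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks

sample = "abcdefghijklmnopqrstuvwxyz"
for chunk in sliding_chunks(sample, chunk_size=10, chunk_overlap=4):
    print(chunk)
```

Each printed window repeats the last four characters of the one before it, which is the role the 200-character overlap plays in the splitter above.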

**Creating a Vector Store**

We then embed each chunk with an OpenAI embedding model and index the resulting vectors in a FAISS vector store:

In [None]:
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_documents(documents, embeddings)

**Implementing a Retriever**

A retriever embeds the user query and fetches the most similar document chunks from the vector store (by default, the top four):

In [None]:
retriever = vectorstore.as_retriever()
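
Under the hood, retrieval is a nearest-neighbour search over vectors. The following toy sketch uses hand-written three-dimensional vectors and cosine similarity. Real embeddings have hundreds or thousands of dimensions, and the FAISS index may use a different distance metric. `cosine_similarity`, `top_k`, and the sample vectors are all made up for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vector, indexed_chunks, k=2):
    """Return the k chunk texts whose vectors are most similar to the query."""
    ranked = sorted(
        indexed_chunks,
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]

# Toy index: (chunk text, made-up embedding) pairs.
index = [
    ("Chat models take messages as input.", [0.9, 0.1, 0.0]),
    ("FAISS stores vectors for similarity search.", [0.1, 0.9, 0.1]),
    ("Prompt templates format user input.", [0.7, 0.2, 0.3]),
]

print(top_k([1.0, 0.0, 0.1], index, k=2))
```

The query vector points in nearly the same direction as the first and third entries, so those two chunks are returned and the FAISS chunk is left out.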

**Creating the Retrieval Chain**

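The prompt defined earlier has no slot for retrieved text, so passing that chain to the retriever would ignore the documents entirely. Instead, we define a prompt with a {context} variable, build a document chain that fills it with the retrieved chunks, and combine that chain with the retriever:

In [None]:
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain

# The prompt needs a {context} slot; otherwise the retrieved chunks never reach the model.
rag_prompt = ChatPromptTemplate.from_template("Answer using only this context:\n\n{context}\n\nQuestion: {input}")
retrieval_chain = create_retrieval_chain(retriever, create_stuff_documents_chain(llm, rag_prompt))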
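
Conceptually, a retrieval chain performs two steps: it fetches the chunks relevant to the question, then places their text into the prompt so the model can ground its answer in them. The pure-Python sketch below illustrates that second "stuffing" step. `build_rag_prompt` and the template wording are hypothetical, not LangChain's internals.

```python
RAG_TEMPLATE = (
    "Answer the question using only the context below.\n\n"
    "Context:\n{context}\n\n"
    "Question: {input}"
)

def build_rag_prompt(question, retrieved_chunks):
    """Join the retrieved chunk texts and insert them into the prompt."""
    context = "\n\n".join(retrieved_chunks)
    return RAG_TEMPLATE.format(context=context, input=question)

chunks = [
    "Chat models take a list of messages as input.",
    "Prompt templates turn user input into messages.",
]
print(build_rag_prompt("What do chat models take as input?", chunks))
```

A prompt without a context placeholder has nowhere to put the retrieved text, which is why the chain's prompt needs one.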

**Running the RAG Pipeline**

Now, we can input a query and retrieve an answer:

In [None]:
result = retrieval_chain.invoke({"input": "Note that ChatModels receive message objects as input"})
print(result['answer'])
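
Besides `answer`, the dictionary returned by a chain built with `create_retrieval_chain` also carries the retrieved documents under the `context` key, which is handy for checking what the model was shown. The helper below is a hypothetical sketch. It uses `SimpleNamespace` objects as stand-ins for LangChain `Document` objects (which expose `page_content` and `metadata`), so it runs without an API key, and the URL and answer text are made up.

```python
from types import SimpleNamespace

def summarize_sources(result, preview_chars=40):
    """List the source and a short preview of every retrieved chunk."""
    lines = []
    for i, doc in enumerate(result["context"], start=1):
        source = doc.metadata.get("source", "unknown")
        preview = doc.page_content[:preview_chars].replace("\n", " ")
        lines.append(f"[{i}] {source}: {preview}")
    return lines

# Stand-in for the dictionary a retrieval chain returns.
fake_result = {
    "input": "What do chat models take as input?",
    "context": [
        SimpleNamespace(
            page_content="Chat models take a list of messages as input.",
            metadata={"source": "https://example.com/docs"},
        )
    ],
    "answer": "A list of messages.",
}

for line in summarize_sources(fake_result):
    print(line)
```

With the real pipeline, the same helper could be called on the output of `retrieval_chain.invoke(...)`, assuming the loader recorded a `source` entry in each document's metadata.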

**Conclusion**

This RAG pipeline improves the response quality of an LLM by grounding its answers in retrieved documents rather than relying on the model's training data alone. The approach is particularly useful for applications that require accurate, context-aware responses, such as customer support chatbots, research assistants, and AI-driven search engines.