# 🚀 LangChain RAG Model

## 🎯 Our Mission: Building a Smart Q&A System

Welcome, AI explorer! In this notebook, we build a Retrieval-Augmented Generation (RAG) model using LangChain. But what exactly are we building, and why? Let's break it down:

### 🧠 What is RAG?

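RAG stands for Retrieval-Augmented Generation. Before the language model answers, a retriever looks up passages relevant to the question and adds them to the prompt. The approach combines:
- 📚 The broad knowledge of large language models
- 🔍 The precision of information retrieval systems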

### 🛠️ What We're Building

We're creating a smart question-answering system that can:
1. 📥 Ingest and process web content
2. 🗄️ Store information efficiently in a vector database
3. 🔎 Retrieve relevant information for any given query
4. 🤖 Generate accurate, context-aware responses
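
To make the four steps concrete before any libraries come in, here is a minimal sketch in plain Python. It is only an illustration under loud assumptions: word overlap stands in for embeddings, the two-sentence knowledge base is made up, and the final prompt string stands in for the LLM call. The helper names (`tokenize`, `retrieve`, `build_prompt`) are hypothetical and not part of LangChain.

```python
# Toy end-to-end RAG flow using only the standard library.
# Word overlap stands in for embeddings; a prompt string stands in for the LLM call.

def tokenize(text: str) -> set[str]:
    """Lowercase a string and split it into a set of words."""
    return set(text.lower().split())

def retrieve(query: str, chunks: list[str], k: int = 1) -> list[str]:
    """Return the k chunks sharing the most words with the query."""
    q = tokenize(query)
    ranked = sorted(chunks, key=lambda c: len(q & tokenize(c)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, context: list[str]) -> str:
    """Place the retrieved context ahead of the question."""
    joined = "\n\n".join(context)
    return f"Context:\n{joined}\n\nQuestion: {query}"

# Hypothetical knowledge base (steps 1 and 2: ingest and store).
chunks = [
    "FAISS is a library for vector similarity search.",
    "An agent uses a language model to plan and act.",
]

query = "what is an agent"
context = retrieve(query, chunks)            # step 3: retrieve
prompt_text = build_prompt(query, context)   # step 4: this would be sent to the LLM
print(prompt_text)
```

The rest of the notebook replaces each toy piece with a real component: a web loader, an embedding model, a vector store, and a chat model.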

### 🌟 Why It Matters

This RAG model will be able to:
- 💡 Answer questions using relevant information from the pages we index
- 🎨 Provide creative, contextual responses (like related poems!)
- 🔄 Suggest intelligent follow-up questions

### 🚀 Let's Get Started!

Throughout this notebook, we'll walk through each step of building our RAG model. By the end, you'll have a powerful, flexible system for answering questions with the added context of a curated knowledge base.

---


## 📚 Importing Libraries for Document Processing

In this cell, we're importing essential libraries for loading and processing web documents:

- 🌐 `WebBaseLoader`: Our gateway to fetching web content
- 🍲 `bs4` (BeautifulSoup): Our trusted web scraping companion
- ✂️ `RecursiveCharacterTextSplitter`: Our text chunking expert

---

In [1]:
from langchain_community.document_loaders import WebBaseLoader 
import bs4 # beautifulsoup web scraper library
from langchain.text_splitter import RecursiveCharacterTextSplitter

USER_AGENT environment variable not set, consider setting it to identify your requests.
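
The message above asks for a `USER_AGENT` environment variable. A minimal sketch of one way to set it from Python follows. The identifier string is a made-up placeholder, and the assumption is that this needs to run before the loader import in order to take effect.

```python
import os

# Hypothetical identifier; replace it with something that describes your project.
# setdefault leaves an existing value untouched.
os.environ.setdefault("USER_AGENT", "rag-notebook-demo/0.1")

print(os.environ["USER_AGENT"])
```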


## 🧠 Importing Embedding and Vector Database Libraries

Here, we're bringing in the tools for creating and managing our knowledge base:

- 🔢 `OpenAIEmbeddings`: Transforming text into numerical representations
- 🗄️ `FAISS`: Our high-performance vector database

---

In [2]:
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS # vector database

## 🤖 Importing Language Model Libraries

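In this cell, we're importing the big guns: our language model interfaces.

- 🖋️ `ChatOpenAI`: Our bridge to OpenAI's powerful chat models
- 🧩 `ChatAnthropic`: The matching interface for Anthropic's chat models (imported here, but not used later in this notebook)

---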

In [3]:
from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic

## 🎭 Importing Prompt Template Library

Here, we're bringing in the tools to craft the perfect prompts:

- 📝 `ChatPromptTemplate`: Our secret sauce for guiding AI responses

---

In [4]:
from langchain_core.prompts import ChatPromptTemplate

## 🔧 Importing I/O and Runnable Libraries

Here, we're importing utilities for smooth input/output operations:

- 🔄 `RunnablePassthrough`: Our input processing wizard
- 📊 `StrOutputParser`: Ensuring our outputs are string-perfect

---

In [5]:
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

## 📋 Importing Response Formatting Libraries

In this cell, we're bringing in tools to structure our AI's responses:

- 🏗️ `BaseModel`, `Field`: Building blocks for well-organized outputs

---

In [6]:
from langchain_core.pydantic_v1 import BaseModel, Field

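## 🧰 Importing Standard Library Modules

In this cell, we're importing two standard library modules:

- 📦 `json`: Serializing and parsing our JSON responses
- 🔑 `os`: Reading the API key from the environment

---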

In [7]:
import json
import os 

## 🤖 Initializing Our Language Model

In this cell, we're firing up our main language model:

- 🔧 Model: gpt-3.5-turbo
- 🗣️ Verbose mode: On
- 🌡️ Temperature: 0.5 (balanced creativity)
- 🔑 API Key: Read from the `OPENAI_API_KEY` environment variable

---

In [8]:
chatgpt = ChatOpenAI(
    model='gpt-3.5-turbo',
    verbose=True,
    temperature=0.5,
    openai_api_key=os.getenv('OPENAI_API_KEY'),  # read the key from the environment
)
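
`os.getenv` returns `None` when the variable is missing, so it can help to check for the key up front. Here is a minimal sketch of such a check. The helper name `require_env` and the variable `DEMO_API_KEY` are hypothetical and used only so that no real key is touched.

```python
import os

def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear message."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value

# Demo with a hypothetical variable name so no real key is involved.
os.environ["DEMO_API_KEY"] = "sk-demo"
print(require_env("DEMO_API_KEY"))
```

In the notebook, the same helper could be called with `'OPENAI_API_KEY'` before creating the model.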

## 🌐 Defining Our Knowledge Sources

In this cell, we're specifying the URLs for our RAG model's knowledge base:

- 📚 AI Agents Blog Post
- 📘 LangChain RAG Tutorial

---

In [9]:
urls=["https://lilianweng.github.io/posts/2023-06-23-agent/",
      "https://python.langchain.com/v0.2/docs/tutorials/rag/"]

## 📥 Fetching and Processing Web Content

Here, we're loading and preprocessing our web data:

- 🕸️ Using WebBaseLoader to fetch content
- 🍲 Employing BeautifulSoup for parsing

---

In [10]:

# SoupStrainer() is given no filters here, so parsing is not restricted to specific tags
loader = WebBaseLoader(
    web_paths=urls,
    bs_kwargs=dict(parse_only=bs4.SoupStrainer()),
)

docs = loader.load()

## ✂️ Chunking Our Text Data

In this cell, we're breaking down our text into manageable pieces:

- 📏 Chunk size: 1000 characters
- 🔀 Overlap: 200 characters

---

In [11]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)

splits = text_splitter.split_documents(docs)
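
To see what chunk size and overlap mean, here is a simplified sketch with a plain sliding window over characters. It is an assumption-laden stand-in: `RecursiveCharacterTextSplitter` also tries to break on separators such as paragraphs and lines, so its real chunk boundaries will differ. The function `chunk_text` is hypothetical.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks

sample = "abcdefghijklmnopqrstuvwxyz"
for chunk in chunk_text(sample, chunk_size=10, chunk_overlap=4):
    print(chunk)
```

Each chunk starts with the last four characters of the previous one. That shared text is what the overlap setting buys: a sentence cut at a chunk boundary still appears whole in at least one chunk.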

## 🗄️ Creating Our Vector Database

Here, we're setting up our FAISS vector database:

- 🔢 Creating embeddings with OpenAI
- 📚 Storing document chunks for quick retrieval

---

In [12]:
# Embed every chunk with OpenAI and index the resulting vectors in FAISS
vectorstore = FAISS.from_documents(splits, OpenAIEmbeddings())
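
Conceptually, a vector store keeps one vector per chunk and returns the chunks whose vectors are closest to the query vector. The sketch below illustrates that idea with hand-made three-dimensional vectors and cosine similarity. Both are assumptions chosen for readability: real embeddings are far larger, and the distance metric a FAISS index actually uses may differ.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, store, k=2):
    """Return the k (text, score) pairs most similar to the query vector."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in store]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Hand-made 3-dimensional "embeddings", purely for illustration.
store = [
    ("chunk about agents", [0.9, 0.1, 0.0]),
    ("chunk about vector stores", [0.1, 0.9, 0.0]),
    ("chunk about prompts", [0.0, 0.2, 0.9]),
]

query_vec = [1.0, 0.0, 0.0]
results = top_k(query_vec, store, k=2)
for text, score in results:
    print(f"{score:.3f}  {text}")
```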

## 🔍 Setting Up Our Information Retriever

In this cell, we're creating a retriever from our vector database:

- 🕵️ Ready to fetch relevant information for user queries

---

In [13]:
retriever = vectorstore.as_retriever()

## 📝 Crafting Our System Prompt

Here, we're defining the behavior of our AI assistant:

- 🎭 Role: Question-answering assistant
- 📚 Context: Use retrieved information
- 🚫 Honesty: Admit when answer is unknown
- 📏 Conciseness: Max three sentences
- 🔧 Output: JSON format

---

In [14]:
system_prompt = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer "
    "the question. If you don't know the answer, say that you "
    "don't know. Use three sentences maximum and keep the "
    "answer concise."
    "\n\n"
    "{context}"
    "\n\n"  # keep the retrieved context separate from the final instruction
    "Your response should be a valid JSON object."
)

## 🧩 Building Our Prompt Template

In this cell, we're creating a structured prompt template:

- 🤖 System message: Defining AI behavior
- 👤 Human input: Placeholder for user queries

---

In [15]:
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("human", "{input}"),
    ],
)
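
The template holds two placeholders, `{context}` and `{input}`, that get filled in at call time. The sketch below imitates that filling step with plain `str.format`. It is a conceptual stand-in only, not how `ChatPromptTemplate` is implemented, and the shortened system text and the helper `render_messages` are hypothetical.

```python
# Plain-Python stand-in for a two-message chat template.
system_template = (
    "You are an assistant for question-answering tasks.\n\n"
    "{context}"
)
human_template = "{input}"

def render_messages(context: str, user_input: str) -> list[tuple[str, str]]:
    """Fill both templates and return (role, text) pairs."""
    return [
        ("system", system_template.format(context=context)),
        ("human", human_template.format(input=user_input)),
    ]

messages = render_messages(
    context="An agent uses an LLM to plan and act.",
    user_input="What is an agent?",
)
for role, text in messages:
    print(f"[{role}] {text}")
```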

## 📊 Defining Our Response Structure

Here, we're creating a format for our AI's responses:

- ✅ Answer: The main response to the query
- 🎵 Poem: A related poetic touch
- ❓ Follow-up: Suggested next question

---

In [16]:
class ResponseFormatter(BaseModel):
    answer: str = Field(description="The answer to the user's question")
    poem: str = Field(description='poem related to the user query')
    followup_question: str = Field(description="A followup question the user could ask")

## 🏗️ Enabling Structured Outputs

In this cell, we're ensuring our LLM provides structured responses:

- 🔧 Binding our ResponseFormatter to the LLM

---

In [17]:
llm=chatgpt.with_structured_output(ResponseFormatter)

## 📄 Creating Document Formatting Function

Here, we're defining a helper function to format retrieved documents:

- 🔗 Joining document contents, separated by blank lines

---

In [18]:
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)
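
A quick way to check the helper without hitting the retriever is to feed it stand-in objects. In this sketch, `FakeDoc` is a hypothetical class that exposes only the `page_content` attribute the function reads. The function itself is repeated so the example runs on its own.

```python
from dataclasses import dataclass

@dataclass
class FakeDoc:
    """Hypothetical stand-in exposing only the page_content attribute."""
    page_content: str

def format_docs(docs):
    # Same logic as the notebook cell: join contents with a blank line between them.
    return "\n\n".join(doc.page_content for doc in docs)

docs = [FakeDoc("First chunk."), FakeDoc("Second chunk.")]
formatted = format_docs(docs)
print(formatted)
```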

## 🔄 Defining JSON Conversion Function

In this cell, we're creating a function to convert responses to JSON:

- 📦 Transforming structured output to JSON string

---

In [19]:

def to_json_string(response_formatter: ResponseFormatter) -> str:
    return json.dumps(response_formatter.dict(), indent=2)
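
The round trip from structured object to JSON string and back can be tried without a model call. This is a sketch under assumptions: `DemoResponse` is a hypothetical dataclass with the same three fields as `ResponseFormatter`, and `asdict` plays the role that `.dict()` plays on the pydantic model.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class DemoResponse:
    """Hypothetical stand-in with the same three fields as ResponseFormatter."""
    answer: str
    poem: str
    followup_question: str

def to_json_string(response: DemoResponse) -> str:
    # asdict stands in for the .dict() call on the pydantic model.
    return json.dumps(asdict(response), indent=2)

demo = DemoResponse(
    answer="An example answer.",
    poem="An example poem.",
    followup_question="An example follow-up?",
)
encoded = to_json_string(demo)   # str, ready to pass along a chain
decoded = json.loads(encoded)    # back to a regular dict
print(list(decoded))
```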

## 🔗 Assembling Our RAG Chain

Here, we're putting all the pieces together:

1. 🔍 Retrieve relevant documents and join them into one context string
2. 📝 Apply our prompt template
3. 🤖 Process through our structured-output LLM
4. 🔄 Convert to JSON string
5. 📊 Pass the result through the string output parser

---

In [20]:
rag_chain = (
    {
        "context": retriever | format_docs, 
        "input": RunnablePassthrough()
    }
    | prompt
    | llm
    | to_json_string
    | StrOutputParser()
)
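
The shape of the chain, a dictionary that fans the same input out to two branches followed by a left-to-right pipeline, can be imitated in plain Python. The sketch below shows only the data flow. It is not LangChain's implementation, and every name in it (`fan_out`, `pipe`, and the `fake_*` functions) is a hypothetical stand-in.

```python
def fan_out(branches: dict):
    """Build a step that feeds the same input to every branch and collects a dict."""
    def step(value):
        return {key: fn(value) for key, fn in branches.items()}
    return step

def pipe(*steps):
    """Compose steps left to right, like chaining with the | operator."""
    def run(value):
        for step in steps:
            value = step(value)
        return value
    return run

# Hypothetical stand-ins for the retriever, the formatter, the prompt, and the model.
def fake_retriever(question):
    return ["chunk A", "chunk B"]

def join_chunks(chunks):
    return "\n\n".join(chunks)

def passthrough(question):
    return question

def fake_prompt(values):
    return f"Context:\n{values['context']}\n\nQuestion: {values['input']}"

def fake_llm(text):
    return {"answer": f"(model saw {len(text)} characters)"}

chain = pipe(
    fan_out({
        "context": pipe(fake_retriever, join_chunks),  # mirrors retriever | format_docs
        "input": passthrough,                          # mirrors RunnablePassthrough()
    }),
    fake_prompt,
    fake_llm,
)
result = chain("What is an agent?")
print(result)
```

The point to notice is that the question reaches the prompt twice: once transformed into context by the retrieval branch, and once unchanged as the user input.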


## 🚀 Testing Our RAG Model

In this cell, we're putting our model to the test:

- ❓ Query: "What is an AGENT?"
- 🧠 Generating a comprehensive response

---

In [25]:
query='What is an AGENT?'
response=rag_chain.invoke(query)
json_response=json.loads(response)

for key, value in json_response.items():
    print(f'{key}:')
    print(value)
    print("\n")

answer:
An agent is a virtual character controlled by a Large Language Model (LLM) in the context of Generative Agents Simulation. These agents exhibit human-like behavior and interact with each other in a sandbox environment.


poem:
An agent in the virtual space, controlled with grace, interacts in a simulated place.


followup_question:
Would you like to know more about how agents are designed and function in autonomous systems?




# 🎉 Congratulations! You've Built a RAG Model! 🚀

---

## 🌟 What We've Accomplished

In this notebook, we've embarked on an exciting journey through the world of Retrieval-Augmented Generation (RAG). Let's recap our amazing achievements:

1. 📚 Imported and organized a powerful set of libraries
2. 🧠 Initialized our language model (gpt-3.5-turbo)
3. 🌐 Loaded and processed web data to create our knowledge base
4. 🗄️ Built a vector database for efficient information retrieval
5. 🎨 Crafted an intelligent prompt system
6. 🔗 Assembled a comprehensive RAG chain
7. 🚀 Tested our model with a sample query



---

### Happy coding, and may your AI adventures be ever fruitful! 🌱