<a href="https://colab.research.google.com/github/ahsanrazi/LangChain/blob/main/08_RAG_APP.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Retrieval Augmented Generation (RAG) App

In [4]:
from google.colab import userdata
gemini_api_key = userdata.get('GEMINI_API_KEY').strip()

In [23]:
!pip install -qU langgraph
!pip install -qU langchain-text-splitters
!pip install -qU langchain-community
!pip install -qU langchain-google-genai
!pip install -qU langchain-pinecone


In [1]:
# One of the most powerful applications enabled by LLMs is sophisticated question-answering (Q&A) chatbots.
# These are applications that can answer questions about specific source information.
# These applications use a technique known as Retrieval Augmented Generation, or RAG.

# Overview

In [2]:
# A typical RAG application has two main components:
# Indexing: a pipeline for ingesting data from a source and indexing it. This usually happens offline.
# Retrieval and generation: The actual RAG chain, which takes the user query at run time and retrieves the relevant data from the index,
# then passes that to the model.

# Indexing

In [None]:
# Load: First we need to load our data. This is done with Document Loaders.

# Split: Text splitters break large Documents into smaller chunks. This is useful both for indexing data and passing it into a model,
# as large chunks are harder to search over and won't fit in a model's finite context window.

# Embed and store: We need somewhere to store and index our splits so that they can be searched later.
# This is usually done with an Embeddings model, which turns each split into a vector, and a VectorStore, which holds those vectors.
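The load, split, embed, and store steps above can be sketched with the standard library alone. This is a toy illustration, not the LangChain implementation: `split_text`, `embed`, and `build_index` are hypothetical helpers, the "embedding" is just a bag-of-words count, and the "vector store" is a plain Python list.

```python
# Toy indexing pipeline: load -> split -> embed -> store.
# All helpers below are illustrative stand-ins, not LangChain APIs.
import re
from collections import Counter


def split_text(text, chunk_size=80, chunk_overlap=20):
    """Cut text into fixed-size character windows that overlap slightly."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
        start += step
    return chunks


def embed(text):
    """Toy 'embedding': word counts. A real app would call an embeddings model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def build_index(text, chunk_size=80, chunk_overlap=20):
    """Toy 'vector store': a list of (chunk, vector) pairs."""
    return [(chunk, embed(chunk)) for chunk in split_text(text, chunk_size, chunk_overlap)]


# "Load": here the document is just an inline string.
document = (
    "Agents use planning to break tasks into steps. "
    "Memory lets an agent recall earlier observations. "
    "Tool use lets an agent call external APIs."
)
index = build_index(document, chunk_size=60, chunk_overlap=10)
print(f"{len(index)} chunks indexed")
```

The overlap means the end of one chunk is repeated at the start of the next, so a sentence cut at a chunk boundary still appears whole in at least one chunk when it is shorter than the overlap.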

# Retrieval and generation

In [None]:
# Retrieve: Given a user input, relevant splits are retrieved from storage using a Retriever.
# Generate: A ChatModel / LLM produces an answer using a prompt that includes both the question and the retrieved data.
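A minimal sketch of the retrieve and generate steps, again using only the standard library. The names `embed`, `cosine`, `retrieve`, and `build_prompt` are hypothetical, the similarity is computed over simple word counts rather than real embeddings, and the final model call is left out: the sketch stops at the prompt that would be sent to the chat model.

```python
# Toy retrieval and prompt assembly. Illustrative only, not LangChain APIs.
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': word counts standing in for a real embeddings model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question, context_chunks):
    """Combine the retrieved chunks and the question into one prompt string."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


chunks = [
    "Planning breaks a large task into smaller steps.",
    "Memory lets an agent recall earlier observations.",
    "Tool use lets an agent call external APIs.",
]
question = "How does an agent recall observations?"
top_chunks = retrieve(question, chunks, k=1)
prompt = build_prompt(question, top_chunks)
print(prompt)
```

In a real application the prompt string would be passed to the chat model, and the ranking would be done by the vector store rather than by sorting every chunk in memory.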

# Chat Model

In [11]:
from langchain_google_genai import ChatGoogleGenerativeAI

# The chat model must be a ChatGoogleGenerativeAI instance; the embeddings class cannot generate text.
llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash-exp", google_api_key=gemini_api_key)

# Embedding Model

In [13]:
from langchain_google_genai import GoogleGenerativeAIEmbeddings

embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001", google_api_key = gemini_api_key)

# Vector Store

In [24]:
from langchain_pinecone import PineconeVectorStore
from pinecone import Pinecone

index_name = "langchain"
namespace = "RAG"

pc = Pinecone(api_key=userdata.get('PINECONE_API'))
index = pc.Index(index_name)

vector_store = PineconeVectorStore(embedding=embeddings, index=index, namespace=namespace)

# 1. Indexing

### Loading documents

In [14]:
# We need to first load the blog post contents. We can use DocumentLoaders for this, which are objects
# that load in data from a source and return a list of Document objects.

# In this case we’ll use the WebBaseLoader, which uses urllib to load HTML from web URLs and BeautifulSoup to parse it to text.

In [19]:
import bs4
from langchain_community.document_loaders import WebBaseLoader

# Only keep post title, headers, and content from the full HTML.
bs4_strainer = bs4.SoupStrainer(class_=("post-title", "post-header", "post-content"))
loader = WebBaseLoader(web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",), bs_kwargs={"parse_only": bs4_strainer})
docs = loader.load()

assert len(docs) == 1
print(f"Total characters: {len(docs[0].page_content)}")

Total characters: 43130
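To illustrate what filtering by CSS class achieves, here is a rough standard-library sketch that keeps only the text inside elements carrying one of the wanted classes. It is not how `WebBaseLoader` or `SoupStrainer` work internally: `ClassFilterParser` is a hypothetical helper, the HTML is an inline made-up sample, and the depth counter assumes every opened tag is closed (so void tags such as a bare `<br>` are not handled).

```python
# Toy class-based HTML filter. Illustrative only, not the BeautifulSoup implementation.
from html.parser import HTMLParser


class ClassFilterParser(HTMLParser):
    """Collect text that sits inside elements having one of the kept classes."""

    def __init__(self, keep_classes):
        super().__init__()
        self.keep_classes = set(keep_classes)
        self.depth = 0   # greater than 0 while inside a kept element
        self.parts = []  # collected text fragments

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        # Enter a kept element, or go one level deeper inside one.
        if self.depth or self.keep_classes & set(classes):
            self.depth += 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.parts.append(data.strip())


sample_html = (
    '<html><body>'
    '<nav class="menu">Home</nav>'
    '<h1 class="post-title">LLM Agents</h1>'
    '<div class="post-content"><p>Agents plan.</p><p>Agents remember.</p></div>'
    '<footer>Copyright</footer>'
    '</body></html>'
)

parser = ClassFilterParser(("post-title", "post-header", "post-content"))
parser.feed(sample_html)
page_text = "\n".join(parser.parts)
print(page_text)
```

Navigation and footer text is dropped because it lies outside the kept classes, which is the same reason the loader above is restricted to the post title, header, and content.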


In [20]:
print(docs[0].page_content[:500])



      LLM Powered Autonomous Agents
    
Date: June 23, 2023  |  Estimated Reading Time: 31 min  |  Author: Lilian Weng


Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.
Agent System Overview#
In


### Splitting documents