In [1]:
# Required imports
import os
from dotenv import load_dotenv

In [2]:
# Load environment variables
load_dotenv()  # reads .env and adds its entries to os.environ

# Re-export each key explicitly; skip missing ones, because assigning None to os.environ raises TypeError
for key in (
    "GOOGLE_API_KEY", "GROQ_API_KEY", "OPENAI_API_KEY",
    "LANGCHAIN_API_KEY", "LANGCHAIN_PROJECT", "LANGCHAIN_TRACING_V2",
):
    value = os.getenv(key)
    if value is not None:
        os.environ[key] = value

In [3]:
os.environ["LANGCHAIN_PROJECT"]

'Agentic AI Project2'

In [19]:
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model='o1-mini')
result = llm.invoke("What is the difference between RAG and Agentic RAG?")
print(result)

content="**Retrieval-Augmented Generation (RAG)** and **Agentic RAG** are both approaches that enhance language models by incorporating external information retrieval processes. However, they differ in their scope, functionality, and the level of autonomy they grant to the system. Here's a breakdown of the two:\n\n### **1. Retrieval-Augmented Generation (RAG):**\n\n- **Definition:** RAG is a technique that combines pre-trained language models (like GPT) with a retrieval system. Instead of relying solely on the model's internal knowledge (which is static and limited to its training data), RAG fetches relevant documents or information from external sources in real-time to inform and improve the generation of responses.\n  \n- **How It Works:**\n  1. **Query Formulation:** When a user poses a question or prompt, the system formulates a query based on that input.\n  2. **Retrieval:** The system searches an external knowledge base, database, or document repository to find relevant informati

In [11]:
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model='o1-mini')
result = llm.invoke("What is the difference between RAG and Agentic RAG?")
print(result.content)


**Retrieval-Augmented Generation (RAG)** and **Agentic RAG** are both frameworks that enhance the capabilities of language models by integrating information retrieval mechanisms. However, they differ in complexity, functionality, and the scope of their applications. Here's a detailed breakdown of each and their key differences:

---

### **1. Retrieval-Augmented Generation (RAG)**

**Definition:**
RAG is a framework that combines retrieval systems with generative language models to produce more accurate and informed responses. Instead of relying solely on the knowledge embedded within the model (which has a fixed cutoff date), RAG retrieves relevant external documents or data in real-time to inform its generation process.

**Components:**
- **Retriever:** Searches a large corpus of documents to find those most relevant to the input query. Commonly uses techniques like dense retrieval (e.g., using embeddings) or sparse retrieval (e.g., TF-IDF).
  
- **Generator:** A language model (like

In [12]:
from langchain_groq import ChatGroq
llm = ChatGroq(model='deepseek-r1-distill-llama-70b')
result = llm.invoke("What is the difference between RAG and Agentic RAG?")
print(result.content)


<think>

</think>

RAG (Retrieval-Augmented Generation) and Agentic RAG are both AI models developed by Meta, but they serve different purposes and have distinct functionalities:

1. **RAG (Retrieval-Augmented Generation)**:
   - RAG is a framework that combines a retrieval system with a generative model. It retrieves relevant information from a database or knowledge base and uses this context to generate more accurate and informative responses.
   - It is particularly useful for tasks that require up-to-date or specific information, as it can pull data from external sources.

2. **Agentic RAG (Agentic Retrieval-Augmented Generation)**:
   - Agentic RAG is an advanced version of the RAG model, designed to perform more complex tasks that require reasoning, planning, and interaction with external systems.
   - It incorporates "agency," meaning it can take actions on behalf of the user, such as accessing the internet, running code, or interacting with other tools and services.
   - Agenti

In [13]:
from langchain_groq import ChatGroq
llm = ChatGroq(model='gemma2-9b-it')
result = llm.invoke("What is the difference between RAG and Agentic RAG?")
print(result.content)

You're asking about two different concepts in the world of AI and language models:

**RAG (Retrieval Augmented Generation)**

* **Core Idea:** RAG combines the strengths of traditional language models (like GPT-3) with the factual accuracy of external knowledge bases. 
* **How it Works:**
    1. A user asks a question.
    2. The model first searches a pre-indexed knowledge base for relevant information.
    3. It then uses this retrieved information to generate a more accurate and informative response.
* **Benefits:**
    * **Factual Accuracy:**  RAG models can access real-world data, reducing the risk of generating incorrect or hallucinated information.
    * **Comprehensiveness:** They can provide more complete answers by drawing upon a wider range of sources.
* **Example:** Imagine asking RAG, "What is the capital of France?". RAG would search its knowledge base, retrieve "Paris", and confidently answer the question.

**Agentic RAG**

* **Core Idea:** Agentic RAG takes RAG a step f

In [14]:
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages([
    ("system", "Act as a legal advisor to give legal advice to people in the USA. If somebody asks something outside of legal advice, tell them you have no idea because it's out of your scope."),
    ("user", "{input}"),
])


from langchain_groq import ChatGroq
model = ChatGroq(model='gemma2-9b-it')
chain = prompt | model
response = chain.invoke({"input":"What is the difference between RAG and Agentic RAG?"})
print(response.content)

I understand you're interested in the difference between RAG and Agentic RAG.  

While I can access and process information from the real world, I am not qualified to give advice on technical topics like these.  RAG and Agentic RAG seem to be related to artificial intelligence and natural language processing, which are areas outside my legal expertise. 

I recommend you consult with a specialist in artificial intelligence or computer science for more information on these topics. 

Remember, I'm here to provide legal advice within my scope of practice. If you have any legal questions, please don't hesitate to ask! 




In [15]:
from langchain_core.output_parsers import JsonOutputParser
output_parser = JsonOutputParser()
prompt = ChatPromptTemplate.from_messages([
    ("system", "Act as an expert travel planner and answer the user's queries about travel plans. Always respond to the user query in valid JSON format. If the user asks about anything else, reply that you don't know."),
    ("user", "{input}"),
])
     
from langchain_groq import ChatGroq
model = ChatGroq(model = "gemma2-9b-it")
chain = prompt | model | output_parser
response = chain.invoke({"input":"Can you give me a 5-day Udaipur plan for December under 30000? Return the response in JSON format with keys 'daynumber', 'location', 'detailedplan'."})
print(response)

[{'daynumber': 1, 'location': 'Udaipur City Palace', 'detailedplan': 'Explore the magnificent City Palace, a stunning blend of Rajput and Mughal architecture. Marvel at the intricate courtyards, royal apartments, and museums showcasing the rich history of Udaipur. Take a boat ride on Lake Pichola for breathtaking views of the palace.'}, {'daynumber': 2, 'location': 'Lake Pichola & Jag Mandir', 'detailedplan': 'Embark on a scenic boat ride across Lake Pichola, visiting the picturesque Jag Mandir, a beautiful island palace. Enjoy lunch at a lakeside restaurant and soak in the serene ambiance. In the evening, experience the captivating puppet show at Bagore Ki Haveli.'}, {'daynumber': 3, 'location': 'Saheliyon-ki-Bari & Fateh Sagar Lake', 'detailedplan': 'Explore the charming Saheliyon-ki-Bari, a garden built for the maids of honor of a princess. Admire the fountains, kiosks, and lotus pools. Take a stroll around Fateh Sagar Lake, offering panoramic views of the city. Enjoy a leisurely ev

# RAG Pipeline

## 1. Document Loader | Data Ingestion
- Types of document loaders: [LangChain document loader integrations](https://python.langchain.com/docs/integrations/document_loaders/)

In [24]:
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("human-rights-statement.pdf")
docs = loader.load()
docs





[Document(metadata={'producer': 'Microsoft® Word for Microsoft 365', 'creator': 'Microsoft® Word for Microsoft 365', 'creationdate': '2021-05-21T22:12:01+05:30', 'author': 'Tanuja Manohara', 'moddate': '2024-05-23T17:58:25+05:30', 'source': 'human-rights-statement.pdf', 'total_pages': 8, 'page': 0, 'page_label': '1'}, page_content='Infosys Limited Ver.Rev.1.3  Page 1 of 8 \n   \n  \n   \n \n \nQUALITY SYSTEM DOCUMENTATION \n \n \n \n \n \n \nHuman Rights Policy Statement \n2021 \n \n \n \n \n \n \n \n \n \nINFOSYS LIMITED \nBengaluru'),
 Document(metadata={'producer': 'Microsoft® Word for Microsoft 365', 'creator': 'Microsoft® Word for Microsoft 365', 'creationdate': '2021-05-21T22:12:01+05:30', 'author': 'Tanuja Manohara', 'moddate': '2024-05-23T17:58:25+05:30', 'source': 'human-rights-statement.pdf', 'total_pages': 8, 'page': 1, 'page_label': '2'}, page_content='Human Rights Policy Statement \nInfosys Limited Ver.Rev.1.3  Page 2 of 8 \n   \n  \n   \nCOPYRIGHT NOTICE  \nThis Quality S

In [26]:
from langchain_community.document_loaders import WebBaseLoader

loader = WebBaseLoader("https://www.infosys.com/about/esg/social/employee-wellbeing/human-rights.html")
docs = loader.load()
docs

[Document(metadata={'source': 'https://www.infosys.com/about/esg/social/employee-wellbeing/human-rights.html', 'title': 'Human Rights', 'description': 'No description found.', 'language': 'en'}, page_content="\n\n\n\n\n\nHuman Rights\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n\n \n\n\n\n Navigate your next \n Infosys Knowledge Institute \n Investors \n Careers \n\n\n\n\n\n\n\n\n\nsearch\n\n\n\ncross\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nSearch\nHit enter to search or ESC to close\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n   \n\n Navigate your\r\n                                next   \n Industries\r\n                                  \n Services   \n Platforms\r\n                                  \n Infosys\r\n                                Knowledge Institute  \n\n About Us   \n Investors   \n Careers  \n\n Newsroom   \n Contact Us  

## 2. Chunking
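Documents need to be broken into smaller blocks before they are indexed. Why?
- RAG flow: User query -> Semantic similarity matching -> Retrieve relevant context -> Pass to LLM for processing. Retrieval happens per chunk, so smaller chunks return more focused context.
- An LLM has a limited context window
- An LLM tends to perform better when the context is small and relevant ==> Less chance of hallucination
- A smaller context reduces processing cost (fewer tokens per call)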
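
The RAG flow in the first bullet can be sketched without any external library. This is only a toy under stated assumptions: word overlap stands in for semantic similarity, and `score`, `retrieve`, and `build_prompt` are hypothetical helper names, not LangChain APIs. A real pipeline would use an embedding model and a vector store instead.

```python
# Toy sketch of the RAG flow: query -> similarity matching -> retrieve context -> prompt for the LLM.
# Word overlap stands in for semantic similarity; all names here are hypothetical.

def score(query: str, chunk: str) -> int:
    """Count how many distinct query words also appear in the chunk."""
    query_words = set(query.lower().split())
    chunk_words = set(chunk.lower().split())
    return len(query_words & chunk_words)

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks with the highest overlap score."""
    return sorted(chunks, key=lambda c: score(query, c), reverse=True)[:k]

def build_prompt(query: str, context: list[str]) -> str:
    """Assemble the text that would be sent to the LLM."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"

chunks = [
    "chunking splits large documents into smaller pieces",
    "embeddings convert text into numeric vectors",
    "a vector store holds embeddings for similarity search",
]
query = "why split documents into smaller pieces"
top_chunks = retrieve(query, chunks, k=1)
prompt = build_prompt(query, top_chunks)
print(prompt)
```

Only the retrieved chunk reaches the prompt, which is why chunk size directly affects both cost and how focused the context is.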


**Chunking Approaches:** [LangChain Text Splitters](https://python.langchain.com/docs/concepts/text_splitters/)
1. **Length-based chunking:** Simplest approach; splits the text into fixed-size pieces (measured in characters or tokens).
    - CharacterTextSplitter
2. **Text-structure-based chunking:** Splits along natural boundaries, trying paragraphs first, then sentences, then words.
    - RecursiveCharacterTextSplitter
3. **Document-structure-based chunking:** For web pages, code, JSON files, etc. Splits on headings, HTML tags, and similar structural markers.
4. **Semantic-meaning-based chunking:** Uses an embedding model to measure the distance between adjacent sentences and splits where the distance exceeds a threshold. The kwarg `breakpoint_threshold_type` selects how that threshold is computed.
    - SemanticChunker(OpenAIEmbeddings(),breakpoint_threshold_type="percentile")
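
To make the roles of chunk size and overlap concrete, here is a minimal sketch of length-based chunking in plain Python. The function `chunk_text` is a hypothetical helper written for illustration; it is not how `CharacterTextSplitter` is implemented, which also takes separators into account.

```python
# Minimal length-based chunker: fixed-size character windows that share `chunk_overlap` characters.
# Illustration only; this is not the LangChain implementation.

def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks

sample = "abcdefghijklmnopqrstuvwxyz"
pieces = chunk_text(sample, chunk_size=10, chunk_overlap=3)
print(pieces)
# ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Each chunk repeats the last three characters of the previous one, so a sentence cut at a boundary still appears whole in at least one chunk when the overlap is large enough.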


In [28]:
# RecursiveCharacter Chunking
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("human-rights-statement.pdf")
pdf_docs = loader.load()

from langchain_text_splitters import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
final_docs = text_splitter.split_documents(pdf_docs)
final_docs

[Document(metadata={'producer': 'Microsoft® Word for Microsoft 365', 'creator': 'Microsoft® Word for Microsoft 365', 'creationdate': '2021-05-21T22:12:01+05:30', 'author': 'Tanuja Manohara', 'moddate': '2024-05-23T17:58:25+05:30', 'source': 'human-rights-statement.pdf', 'total_pages': 8, 'page': 0, 'page_label': '1'}, page_content='Infosys Limited Ver.Rev.1.3  Page 1 of 8 \n   \n  \n   \n \n \nQUALITY SYSTEM DOCUMENTATION \n \n \n \n \n \n \nHuman Rights Policy Statement \n2021 \n \n \n \n \n \n \n \n \n \nINFOSYS LIMITED \nBengaluru'),
 Document(metadata={'producer': 'Microsoft® Word for Microsoft 365', 'creator': 'Microsoft® Word for Microsoft 365', 'creationdate': '2021-05-21T22:12:01+05:30', 'author': 'Tanuja Manohara', 'moddate': '2024-05-23T17:58:25+05:30', 'source': 'human-rights-statement.pdf', 'total_pages': 8, 'page': 1, 'page_label': '2'}, page_content='Human Rights Policy Statement \nInfosys Limited Ver.Rev.1.3  Page 2 of 8 \n   \n  \n   \nCOPYRIGHT NOTICE  \nThis Quality S

In [30]:
print(f"No. of PDF docs: {len(pdf_docs)}")
print(f"No. of Chunk docs: {len(final_docs)}")

No. of PDF docs: 8
No. of Chunk docs: 26


In [31]:
from langchain_community.document_loaders import TextLoader
loader = TextLoader("random_text.txt")
text_docs = loader.load()
text_docs

[Document(metadata={'source': 'random_text.txt'}, page_content="5 Levels Of Text Splitting\nIn this tutorial we are reviewing the 5 Levels Of Text Splitting. This is an unofficial list put together for fun and educational purposes.\n\nEver try to put a long piece of text into ChatGPT but it tells you it’s too long? Or you're trying to give your application better long term memory, but it’s still just not quite working.\n\nOne of the most effective strategies to improve performance of your language model applications is to split your large data into smaller pieces. This is call splitting or chunking (we'll use these terms interchangeably). In the world of multi-modal, splitting also applies to images.\n\nWe are going to cover a lot, but if you make it to the end, I guarantee you’ll have a solid grasp on chunking theory, strategies, and resources to learn more.\n\nLevels Of Text Splitting\n\nLevel 1: Character Splitting - Simple static character chunks of data\nLevel 2: Recursive Charact

In [36]:
# Length based Chunking

from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import CharacterTextSplitter

# Load the documents
loader = TextLoader("random_text.txt")
text_docs = loader.load()  # list of Document objects

# Extract text content from documents
full_text = "\n".join(doc.page_content for doc in text_docs)

# Split the text; with from_tiktoken_encoder, chunk_size and chunk_overlap are counted in tokens
text_splitter = CharacterTextSplitter.from_tiktoken_encoder(
    encoding_name="cl100k_base", chunk_size=50, chunk_overlap=3
)
texts = text_splitter.split_text(full_text)
texts


Created a chunk of size 53, which is longer than the specified 50
Created a chunk of size 141, which is longer than the specified 50
Created a chunk of size 53, which is longer than the specified 50
Created a chunk of size 75, which is longer than the specified 50


['5 Levels Of Text Splitting\nIn this tutorial we are reviewing the 5 Levels Of Text Splitting. This is an unofficial list put together for fun and educational purposes.',
 "Ever try to put a long piece of text into ChatGPT but it tells you it’s too long? Or you're trying to give your application better long term memory, but it’s still just not quite working.",
 "One of the most effective strategies to improve performance of your language model applications is to split your large data into smaller pieces. This is call splitting or chunking (we'll use these terms interchangeably). In the world of multi-modal, splitting also applies to images.",
 'We are going to cover a lot, but if you make it to the end, I guarantee you’ll have a solid grasp on chunking theory, strategies, and resources to learn more.\n\nLevels Of Text Splitting',
 'Level 1: Character Splitting - Simple static character chunks of data\nLevel 2: Recursive Character Text Splitting - Recursive chunking based on a list of 

In [37]:
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

loader = TextLoader("random_text.txt")
text_doc = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size = 500 , chunk_overlap = 50)
text_new = text_splitter.split_documents(text_doc)
text_new


[Document(metadata={'source': 'random_text.txt'}, page_content="5 Levels Of Text Splitting\nIn this tutorial we are reviewing the 5 Levels Of Text Splitting. This is an unofficial list put together for fun and educational purposes.\n\nEver try to put a long piece of text into ChatGPT but it tells you it’s too long? Or you're trying to give your application better long term memory, but it’s still just not quite working."),
 Document(metadata={'source': 'random_text.txt'}, page_content="One of the most effective strategies to improve performance of your language model applications is to split your large data into smaller pieces. This is call splitting or chunking (we'll use these terms interchangeably). In the world of multi-modal, splitting also applies to images.\n\nWe are going to cover a lot, but if you make it to the end, I guarantee you’ll have a solid grasp on chunking theory, strategies, and resources to learn more.\n\nLevels Of Text Splitting"),
 Document(metadata={'source': 'ra

In [39]:
print(f"First Chunk: {text_new[0]}")
print(f"Second Chunk: {text_new[1]}")

First Chunk: page_content='5 Levels Of Text Splitting
In this tutorial we are reviewing the 5 Levels Of Text Splitting. This is an unofficial list put together for fun and educational purposes.

Ever try to put a long piece of text into ChatGPT but it tells you it’s too long? Or you're trying to give your application better long term memory, but it’s still just not quite working.' metadata={'source': 'random_text.txt'}
Second Chunk: page_content='One of the most effective strategies to improve performance of your language model applications is to split your large data into smaller pieces. This is call splitting or chunking (we'll use these terms interchangeably). In the world of multi-modal, splitting also applies to images.

We are going to cover a lot, but if you make it to the end, I guarantee you’ll have a solid grasp on chunking theory, strategies, and resources to learn more.

Levels Of Text Splitting' metadata={'source': 'random_text.txt'}


## 3. Embedding
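- Converts text into numeric vectors so that the similarity between texts can be computed
- Uses an embedding model (for example, OpenAI `text-embedding-3-large` or Hugging Face `all-MiniLM-L6-v2`)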
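
The idea of turning text into numbers can be shown with a toy word-count vector. This is an assumption-laden stand-in, not a real embedding model: `build_vocabulary` and `toy_embed` are hypothetical helpers, and real models produce dense learned vectors that capture meaning rather than raw counts.

```python
# Toy "embedding": count how often each vocabulary word occurs in the text.
# Real embedding models learn dense vectors; this only shows the text -> numbers idea.

def build_vocabulary(texts: list[str]) -> list[str]:
    """Collect the sorted set of lowercase words across all texts."""
    return sorted({word for text in texts for word in text.lower().split()})

def toy_embed(text: str, vocabulary: list[str]) -> list[int]:
    """Return one count per vocabulary word, so every text maps to a same-length vector."""
    words = text.lower().split()
    return [words.count(term) for term in vocabulary]

corpus = ["the cat sat", "the dog sat down"]
vocabulary = build_vocabulary(corpus)
vectors = [toy_embed(text, vocabulary) for text in corpus]
print(vocabulary)  # ['cat', 'dog', 'down', 'sat', 'the']
print(vectors)     # [[1, 0, 0, 1, 1], [0, 1, 1, 1, 1]]
```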

In [41]:
from langchain_openai import OpenAIEmbeddings
embeddings = OpenAIEmbeddings(model='text-embedding-3-large')
text = "This is a tutorial on openai embedding"
query_result = embeddings.embed_query(text)
query_result


[0.0012720703380182385,
 0.034759171307086945,
 -0.012477999553084373,
 -0.043530430644750595,
 0.031500499695539474,
 -0.0003948170051444322,
 0.016877207905054092,
 0.053143516182899475,
 -0.008234936743974686,
 -0.0005728135001845658,
 -0.020950548350811005,
 0.004901586566120386,
 -0.007793657947331667,
 -0.012294699437916279,
 0.013367345556616783,
 0.01968781277537346,
 0.022063927724957466,
 0.04879862070083618,
 -0.025118932127952576,
 -0.02351675182580948,
 0.020692570134997368,
 -0.0003814513620454818,
 -0.04385630041360855,
 -0.031636279076337814,
 0.02725064754486084,
 -0.02107274904847145,
 0.014677603729069233,
 0.04165669530630112,
 -0.018805256113409996,
 0.051296934485435486,
 0.013530279509723186,
 -0.019945790991187096,
 0.0013178953668102622,
 -0.010570318438112736,
 -0.015750249847769737,
 0.02611011266708374,
 0.03788206726312637,
 0.04467096924781799,
 -0.0501292422413826,
 0.016578495502471924,
 0.03084876574575901,
 0.0011244117049500346,
 -0.002299740212038159

In [18]:
# Use a free embedding model from Hugging Face

from langchain_huggingface import HuggingFaceEmbeddings
embeddings = HuggingFaceEmbeddings(model_name='all-MiniLM-L6-v2')
text = "This is a tutorial on openai embedding"
query_result = embeddings.embed_query(text)
query_result

  return forward_call(*args, **kwargs)


RuntimeError: Numpy is not available

In [17]:
import numpy as np

## 4. Vector DB
Storage Choices:
- In memory (RAM) - Data is lost when the process or system restarts (FAISS, ChromaDB)
- Local (disk) - DB is stored as a file and survives restarts (FAISS, ChromaDB)
- Cloud hosted - Advantages: scaling, multi-team access, APIs (Pinecone, Weaviate, Milvus)
    - AWS --> OpenSearch (based on Elasticsearch)
    - Azure --> Azure AI Search
    - GCP --> Vertex AI Vector Search
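
The difference between the in-memory and local options can be sketched with a plain dictionary and a JSON file. This is a hypothetical toy store, not the FAISS or ChromaDB API; the file name `toy_store.json` and the vectors are made up for illustration.

```python
import json
import os
import tempfile

# Hypothetical toy store: maps a chunk of text to its vector.
# An in-memory store is just a Python object; a local store also writes it to a file.
store = {
    "chunking splits documents": [0.1, 0.9],
    "embeddings are vectors": [0.8, 0.2],
}

# Persist to disk so the data survives a restart.
path = os.path.join(tempfile.mkdtemp(), "toy_store.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(store, f)

# Simulate a restart: drop the in-memory copy, then reload from the file.
del store
with open(path, encoding="utf-8") as f:
    restored = json.load(f)

print(restored)
```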


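Search Algorithms:
- Exact - Flat index (compares the query with every stored vector)
- Approximate Nearest Neighbor (ANN) - Trades a little accuracy for much faster search
- Hierarchical Navigable Small World (HNSW) - Graph-based ANN index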
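
Exact search over a flat index is simple enough to sketch in plain Python. The helper names `euclidean` and `flat_search` and the sample vectors are assumptions made for illustration; vector databases implement the same idea with optimized native code.

```python
import math

# Exact (flat) search: compare the query against every stored vector and keep the k closest.
# Cost grows linearly with the number of vectors, which is the cost ANN indexes try to avoid.

def euclidean(a: list[float], b: list[float]) -> float:
    """Straight-line distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def flat_search(query: list[float], vectors: list[list[float]], k: int = 2) -> list[int]:
    """Return the indices of the k vectors nearest to the query."""
    ranked = sorted(range(len(vectors)), key=lambda i: euclidean(query, vectors[i]))
    return ranked[:k]

vectors = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.9, 1.2]]
nearest = flat_search([1.0, 1.0], vectors, k=2)
print(nearest)  # [1, 3]
```

Approximate methods skip most of these comparisons by navigating an index structure, accepting that the true nearest neighbor is occasionally missed.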

Similarity Matching:
- Cosine similarity
- Euclidean distance
- Jaccard similarity
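
The three measures listed above can be written in a few lines each. This is a minimal sketch with hypothetical function names; returning 0.0 for a zero-length vector or for two empty sets is a convention chosen here, not a standard.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between a and b: 1.0 = same direction, 0.0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen for this sketch
    return dot / (norm_a * norm_b)

def euclidean_distance(a: list[float], b: list[float]) -> float:
    """Straight-line distance: 0.0 = identical, larger = less similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def jaccard_similarity(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union of two sets."""
    union = a | b
    if not union:
        return 0.0  # convention chosen for this sketch
    return len(a & b) / len(union)

print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))    # orthogonal vectors
print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))   # 3-4-5 triangle
print(jaccard_similarity({"rag", "agent", "llm"}, {"rag", "llm", "vector"}))
```

Cosine similarity ignores vector length and compares direction only, Euclidean distance is sensitive to magnitude, and Jaccard similarity works on sets such as token sets rather than on dense vectors.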


