In [66]:
print("all ok")

all ok


## Model Configuration

Imports and initializes the Google Gemini LLM (gemini-1.5-flash) for text generation. Tests the model with a greeting.

In [67]:
# Configure the chat model
from langchain_google_genai import ChatGoogleGenerativeAI
model=ChatGoogleGenerativeAI(model='gemini-1.5-flash')
output=model.invoke("hi")
print(output.content)

Hi there! How can I help you today?


## Embedding Model Setup

Imports and initializes the HuggingFace embedding model (BAAI/bge-small-en) for vector search. Tests embedding output size.

In [68]:
# Configure the embedding model
from langchain_huggingface import HuggingFaceEmbeddings
embeddings = HuggingFaceEmbeddings(model_name="BAAI/bge-small-en")
len(embeddings.embed_query("hi"))

384

## Import Document and Vector Store Utilities

Imports document loaders, vector store (Chroma), and text splitter for document processing.

In [69]:
# Load some data, embed it, and store it in a vector database (VDB)
from langchain_community.document_loaders import TextLoader, DirectoryLoader
from langchain_community.vectorstores import Chroma
from langchain.text_splitter import RecursiveCharacterTextSplitter

## Directory Loader Setup

Initializes a DirectoryLoader to load all .txt files from the data2 directory.

In [70]:
loader=DirectoryLoader("/Users/kausik/Desktop/Agentic-AI/3-langgraph/data2",glob="./*.txt",loader_cls=TextLoader)

## Load Documents

Loads the documents from the specified directory and displays the loaded Document objects.

In [71]:
docs=loader.load()
docs

[Document(metadata={'source': '/Users/kausik/Desktop/Agentic-AI/3-langgraph/data2/usa.txt'}, page_content="🇺🇸 Overview of the U.S. Economy\nThe United States of America possesses the largest economy in the world in terms of nominal GDP, making it the most powerful economic force globally. It operates under a capitalist mixed economy, where the private sector dominates, but the government plays a significant regulatory and fiscal role. With a population of over 335 million people and a high level of technological advancement, the U.S. economy thrives on a foundation of consumer spending, innovation, global trade, and financial services. It has a highly diversified structure with strong sectors in technology, healthcare, finance, real estate, defense, and agriculture.\n\nU.S. GDP – Size, Composition, and Global Share\nAs of 2024, the United States’ nominal GDP is estimated to be around $28 trillion USD, accounting for approximately 25% of the global economy. It ranks #1 in the world by n

## Inspect Document Content

Displays the content of the first loaded document for verification.

In [72]:
docs[0].page_content

"🇺🇸 Overview of the U.S. Economy\nThe United States of America possesses the largest economy in the world in terms of nominal GDP, making it the most powerful economic force globally. It operates under a capitalist mixed economy, where the private sector dominates, but the government plays a significant regulatory and fiscal role. With a population of over 335 million people and a high level of technological advancement, the U.S. economy thrives on a foundation of consumer spending, innovation, global trade, and financial services. It has a highly diversified structure with strong sectors in technology, healthcare, finance, real estate, defense, and agriculture.\n\nU.S. GDP – Size, Composition, and Global Share\nAs of 2024, the United States’ nominal GDP is estimated to be around $28 trillion USD, accounting for approximately 25% of the global economy. It ranks #1 in the world by nominal GDP, far ahead of China (which ranks 2nd). The U.S. GDP per capita is also among the highest, hover

## Text Splitter Configuration

Configures a RecursiveCharacterTextSplitter to break documents into overlapping chunks for better retrieval.

In [73]:
text_splitter=RecursiveCharacterTextSplitter(
    chunk_size=200,
    chunk_overlap=50
)
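
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of a fixed-width sliding window. It is a simplification, not the algorithm `RecursiveCharacterTextSplitter` uses: the real splitter prefers to break on separators such as paragraphs, lines, and spaces, so its chunks end on natural boundaries. The function name `sliding_window_chunks` and the sample text are hypothetical.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-width windows that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_window_chunks(sample, chunk_size=10, chunk_overlap=4)
print(chunks)
# ['abcdefghij', 'ghijklmnop', 'mnopqrstuv', 'stuvwxyz']
```

Each chunk repeats the last four characters of the previous one, so a sentence cut at a chunk boundary still appears intact in at least one chunk.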

## Split Documents

Splits the loaded documents into smaller chunks using the configured text splitter.

In [74]:
new_docs=text_splitter.split_documents(documents=docs)

## Inspect Chunks

Displays the resulting document chunks to verify splitting.

In [75]:
new_docs

[Document(metadata={'source': '/Users/kausik/Desktop/Agentic-AI/3-langgraph/data2/usa.txt'}, page_content='🇺🇸 Overview of the U.S. Economy'),
 Document(metadata={'source': '/Users/kausik/Desktop/Agentic-AI/3-langgraph/data2/usa.txt'}, page_content='The United States of America possesses the largest economy in the world in terms of nominal GDP, making it the most powerful economic force globally. It operates under a capitalist mixed economy,'),
 Document(metadata={'source': '/Users/kausik/Desktop/Agentic-AI/3-langgraph/data2/usa.txt'}, page_content='It operates under a capitalist mixed economy, where the private sector dominates, but the government plays a significant regulatory and fiscal role. With a population of over 335 million people and a'),
 Document(metadata={'source': '/Users/kausik/Desktop/Agentic-AI/3-langgraph/data2/usa.txt'}, page_content='a population of over 335 million people and a high level of technological advancement, the U.S. economy thrives on a foundation of cons

## Extract Chunk Strings

Creates a list of chunk strings from the split documents for further processing.

In [76]:
doc_string=[doc.page_content for doc in new_docs]

## Inspect Chunk Strings

Displays the list of chunk strings to verify extraction.

In [77]:
doc_string

['🇺🇸 Overview of the U.S. Economy',
 'The United States of America possesses the largest economy in the world in terms of nominal GDP, making it the most powerful economic force globally. It operates under a capitalist mixed economy,',
 'It operates under a capitalist mixed economy, where the private sector dominates, but the government plays a significant regulatory and fiscal role. With a population of over 335 million people and a',
 'a population of over 335 million people and a high level of technological advancement, the U.S. economy thrives on a foundation of consumer spending, innovation, global trade, and financial services.',
 'innovation, global trade, and financial services. It has a highly diversified structure with strong sectors in technology, healthcare, finance, real estate, defense, and agriculture.',
 'U.S. GDP – Size, Composition, and Global Share',
 'As of 2024, the United States’ nominal GDP is estimated to be around $28 trillion USD, accounting for approximately 

## Count Chunks

Counts the number of document chunks created.

In [78]:
len(doc_string)

55

## Create Vector Database

Creates a Chroma vector database from the document chunks and embeddings.

In [79]:
db=Chroma.from_documents(new_docs,embeddings)

## Create Retriever

Initializes a retriever from the Chroma vector database for semantic search.

In [80]:
retriever=db.as_retriever(search_kwargs={"k": 3})
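
The idea behind `k=3` can be sketched in plain Python: score every stored vector against the query vector and keep the three best. This is only an illustration with hypothetical three-dimensional toy vectors. Cosine similarity is assumed here for simplicity; the metric the vector store actually uses depends on how it is configured.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, stored, k=3):
    """Return the texts of the k stored vectors most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in stored]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Hypothetical toy store: (text, embedding) pairs
toy_store = [
    ("gdp", [1.0, 0.0, 0.0]),
    ("trade", [0.8, 0.6, 0.0]),
    ("weather", [0.0, 0.0, 1.0]),
    ("industry", [0.6, 0.8, 0.0]),
]
results = top_k([1.0, 0.1, 0.0], toy_store, k=3)
print(results)
# ['gdp', 'trade', 'industry']
```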

## Test Retriever

Tests the retriever with the query 'industrial growth of usa?' and displays the three best-matching chunks (k=3).

In [81]:
retriever.invoke("industrial growth of usa?")

[Document(metadata={'source': '/Users/kausik/Desktop/Agentic-AI/3-langgraph/data2/usa.txt'}, page_content='Looking forward, the U.S. economy is expected to grow at a moderate pace, powered by innovation in AI, green energy, robotics, biotech, and quantum computing. The Biden administration’s Inflation'),
 Document(metadata={'source': '/Users/kausik/Desktop/Agentic-AI/3-langgraph/data2/usa.txt'}, page_content='Looking forward, the U.S. economy is expected to grow at a moderate pace, powered by innovation in AI, green energy, robotics, biotech, and quantum computing. The Biden administration’s Inflation'),
 Document(metadata={'source': '/Users/kausik/Desktop/Agentic-AI/3-langgraph/data2/usa.txt'}, page_content='🇺🇸 Overview of the U.S. Economy')]
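
The first two results above are identical. One plausible cause, which the notebook does not confirm, is that the cell building the vector store ran more than once against the same collection, so each chunk was stored twice. Whatever the cause, duplicates can be filtered after retrieval. The sketch below assumes only that each result has a `page_content` attribute; `FakeDoc` and `dedupe_by_content` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDoc:
    """Hypothetical stand-in for a retrieved Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def dedupe_by_content(docs):
    """Keep the first occurrence of each distinct page_content, preserving order."""
    seen = set()
    unique = []
    for doc in docs:
        if doc.page_content not in seen:
            seen.add(doc.page_content)
            unique.append(doc)
    return unique


hits = [
    FakeDoc("Looking forward, the U.S. economy is expected to grow at a moderate pace"),
    FakeDoc("Looking forward, the U.S. economy is expected to grow at a moderate pace"),
    FakeDoc("Overview of the U.S. Economy"),
]
unique_hits = dedupe_by_content(hits)
print(len(hits), "->", len(unique_hits))
# 3 -> 2
```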

## Import State and Prompt Utilities

Imports Pydantic, LangChain, and LangGraph utilities for state management, prompt templates, and output parsing.

In [82]:
# Imports for the Pydantic schema, prompts, graph state, and output parsing
import operator
from typing import Annotated, List, Sequence, TypedDict
from pydantic import BaseModel, Field
from langchain_core.messages import AIMessage, BaseMessage, HumanMessage
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate, PromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langgraph.graph import StateGraph, END

## Define TopicSelectionParser

Defines a Pydantic model for structured output from the topic classification step.

In [83]:
class TopicSelectionParser(BaseModel):
    Topic:str=Field(description="selected topic")
    Reasoning:str=Field(description='Reasoning behind topic selection')

## Import PydanticOutputParser

Imports the PydanticOutputParser for parsing model outputs into the defined schema.

In [84]:
from langchain.output_parsers import PydanticOutputParser

## Initialize Output Parser

Initializes the output parser with the TopicSelectionParser schema.

In [85]:
parser=PydanticOutputParser(pydantic_object=TopicSelectionParser)

In [86]:
parser.get_format_instructions()

'The output should be formatted as a JSON instance that conforms to the JSON schema below.\n\nAs an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}\nthe object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.\n\nHere is the output schema:\n```\n{"properties": {"Topic": {"description": "selected topic", "title": "Topic", "type": "string"}, "Reasoning": {"description": "Reasoning behind topic selection", "title": "Reasoning", "type": "string"}}, "required": ["Topic", "Reasoning"]}\n```'
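
The format instructions ask the model for a JSON object with `Topic` and `Reasoning` keys. As a rough picture of what parsing such a reply involves, here is a standard-library sketch. It is not how `PydanticOutputParser` is implemented; `TopicSelection` and `parse_topic_selection` are hypothetical stand-ins, and the sample reply is made up for illustration.

```python
import json
from dataclasses import dataclass


@dataclass
class TopicSelection:
    """Hypothetical stand-in mirroring the fields of TopicSelectionParser."""
    Topic: str
    Reasoning: str


def parse_topic_selection(raw: str) -> TopicSelection:
    """Parse a JSON reply and check that both required keys are present."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = {"Topic", "Reasoning"} - set(data)
    if missing:
        raise ValueError(f"missing required keys: {sorted(missing)}")
    return TopicSelection(Topic=str(data["Topic"]), Reasoning=str(data["Reasoning"]))


reply = '{"Topic": "USA", "Reasoning": "The query asks about the GDP of the USA."}'
parsed = parse_topic_selection(reply)
print(parsed.Topic)
# USA
```

A reply that omits a required key raises `ValueError`, which is the kind of failure a structured parser surfaces instead of silently passing bad data downstream.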

In [87]:
# The plain dict below is only for explanation: it shows how state accumulates messages

Agentstate={}

In [88]:
Agentstate["messages"]=[]

In [89]:
Agentstate

{'messages': []}

In [90]:
Agentstate["messages"].append("hi how are you?")

In [91]:
Agentstate

{'messages': ['hi how are you?']}

In [92]:
Agentstate["messages"].append("what are you doing?")

In [93]:

Agentstate

{'messages': ['hi how are you?', 'what are you doing?']}

In [94]:
Agentstate["messages"].append("i hope everything fine")

In [95]:
Agentstate

{'messages': ['hi how are you?',
  'what are you doing?',
  'i hope everything fine']}

In [96]:
Agentstate["messages"][0]

'hi how are you?'

In [97]:
# This AgentState class is the state schema you pass to the StateGraph
class AgentState(TypedDict):
    messages: Annotated[Sequence[BaseMessage], operator.add]
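
The `operator.add` annotation tells the graph how to combine a node's returned `messages` with the ones already in the state: the lists are concatenated rather than replaced. The sketch below imitates that merge step in plain Python. `apply_update` is a hypothetical helper, not part of LangGraph.

```python
import operator


def apply_update(state: dict, update: dict) -> dict:
    """Merge a node's partial update into the state by list concatenation."""
    merged = dict(state)
    for key, value in update.items():
        merged[key] = operator.add(merged.get(key, []), value)
    return merged


state = {"messages": ["hi"]}
state = apply_update(state, {"messages": ["Not Related"]})  # classifier node
state = apply_update(state, {"messages": ["Hi there!"]})    # answering node
print(state)
# {'messages': ['hi', 'Not Related', 'Hi there!']}
```

This is why each node only returns the new message: the earlier ones stay in the state, with the original question at index 0 and the latest reply at index -1.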

In [98]:
state={"messages":["hi"]}

In [99]:
def function_1(state:AgentState):
    
    question=state["messages"][-1]
    
    print("Question",question)
    
    template="""
    Your task is to classify the given user query into one of the following categories: [USA,Not Related]. 
    Only respond with the category name and nothing else.

    User query: {question}
    {format_instructions}
    """
    
    prompt= PromptTemplate(
        template=template,
        input_variables=["question"],
        partial_variables={"format_instructions": parser.get_format_instructions()}
    )
    
    
    chain= prompt | model | parser
    
    response = chain.invoke({"question":question})
    
    print("Parsed response:", response)
    
    return {"messages": [response.Topic]}

In [100]:
state={"messages":["what is a today weather?"]}

In [101]:
state={"messages":["what is a GDP of usa??"]}

In [102]:
function_1(state)

Question what is a GDP of usa??
Parsed response: Topic='USA' Reasoning='The query explicitly asks for the GDP of the USA.'


{'messages': ['USA']}

In [104]:
def router(state:AgentState):
    print("-> ROUTER ->")
    
    last_message=state["messages"][-1]
    print("last_message:", last_message)
    
    if "usa" in last_message.lower():
        return "RAG Call"
    else:
        return "LLM Call"
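
The router's decision is a case-insensitive substring test on the last message. The sketch below reproduces that test without the print statements and contrasts it with a stricter exact-match variant. Both `route` and `route_exact` are hypothetical names, and the exact-match version is an alternative for comparison, not what the notebook uses.

```python
def route(state: dict) -> str:
    """Substring test: any last message containing 'usa' goes to RAG."""
    last_message = state["messages"][-1]
    return "RAG Call" if "usa" in last_message.lower() else "LLM Call"


def route_exact(state: dict) -> str:
    """Stricter variant: only the exact label 'USA' (any case) goes to RAG."""
    last_message = state["messages"][-1]
    return "RAG Call" if last_message.strip().lower() == "usa" else "LLM Call"


print(route({"messages": ["what is a gdp of usa?", "USA"]}))  # RAG Call
print(route({"messages": ["hi", "Not Related"]}))             # LLM Call
print(route({"messages": ["q", "Usage"]}))                    # RAG Call
print(route_exact({"messages": ["q", "Usage"]}))              # LLM Call
```

The substring test tolerates extra text around the label, but it also matches unrelated words such as "Usage", so the choice depends on how tightly the classifier's output is constrained.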

In [105]:
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

In [106]:
# RAG Function
def function_2(state:AgentState):
    print("-> RAG Call ->")
    
    question = state["messages"][0]
    
    prompt=PromptTemplate(
        template="""You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.\nQuestion: {question} \nContext: {context} \nAnswer:""",
        
        input_variables=['context', 'question']
    )
    
    rag_chain = (
        {"context": retriever | format_docs, "question": RunnablePassthrough()}
        | prompt
        | model
        | StrOutputParser()
    )
    result = rag_chain.invoke(question)
    return  {"messages": [result]}
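
The first step of `rag_chain` is a dict whose two branches both receive the question: one sends it through the retriever and `format_docs`, the other passes it through unchanged. The plain-Python sketch below shows the same data flow up to the prompt. `fake_retriever` is a hypothetical stub with made-up chunks, the template is shortened, and no model is called.

```python
from types import SimpleNamespace


def fake_retriever(question: str):
    """Hypothetical stand-in for retriever.invoke(question)."""
    return [
        SimpleNamespace(page_content="US nominal GDP is around $28 trillion."),
        SimpleNamespace(page_content="That is roughly 25% of the global economy."),
    ]


def format_docs(docs):
    """Join chunk texts with blank lines, as in the notebook."""
    return "\n\n".join(doc.page_content for doc in docs)


# Shortened version of the notebook's prompt template
TEMPLATE = "Question: {question} \nContext: {context} \nAnswer:"


def build_rag_prompt(question: str) -> str:
    # Dict step: both keys are computed from the same input question
    inputs = {
        "context": format_docs(fake_retriever(question)),
        "question": question,
    }
    return TEMPLATE.format(**inputs)


prompt_text = build_rag_prompt("what is a gdp of usa?")
print(prompt_text)
```

In the real chain this text would then go to the model, and `StrOutputParser` would return the reply as a plain string.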

In [107]:

# LLM Function
def function_3(state:AgentState):
    print("-> LLM Call ->")
    question = state["messages"][0]
    
    # Normal LLM call
    complete_query = "Answer the following question with your knowledge of the real world. Following is the user question: " + question
    response = model.invoke(complete_query)
    return {"messages": [response.content]}

In [108]:
from langgraph.graph import StateGraph,END

In [109]:
workflow=StateGraph(AgentState)

In [110]:
workflow.add_node("Supervisor",function_1)

<langgraph.graph.state.StateGraph at 0x16942a510>

In [111]:
workflow.add_node("RAG",function_2)

<langgraph.graph.state.StateGraph at 0x16942a510>

In [112]:

workflow.add_node("LLM",function_3)

<langgraph.graph.state.StateGraph at 0x16942a510>

In [113]:

workflow.set_entry_point("Supervisor")

<langgraph.graph.state.StateGraph at 0x16942a510>

In [114]:
workflow.add_conditional_edges(
    "Supervisor",
    router,
    {
        "RAG Call": "RAG",
        "LLM Call": "LLM",
    }
)

<langgraph.graph.state.StateGraph at 0x16942a510>

In [115]:
workflow.add_edge("RAG",END)
workflow.add_edge("LLM",END)

<langgraph.graph.state.StateGraph at 0x16942a510>

In [116]:
app=workflow.compile()
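
To see the control flow the compiled graph encodes, here is a small plain-Python imitation: run the entry node, ask the router which branch to take, run that branch, and stop at the end marker. It is a sketch of the wiring only, not of LangGraph internals. The node bodies are hypothetical stubs (a keyword check replaces the LLM classifier), and `run`, `NODES`, `CONDITIONAL`, and `EDGES` are made-up names.

```python
END = "__end__"  # hypothetical end marker for this sketch


def supervisor(state):
    # Stub classifier: a keyword check stands in for the LLM call
    question = state["messages"][-1]
    return {"messages": ["USA" if "usa" in question.lower() else "Not Related"]}


def rag(state):
    return {"messages": ["(answer from retrieved context)"]}


def llm(state):
    return {"messages": ["(answer from the model alone)"]}


def router(state):
    return "RAG Call" if "usa" in state["messages"][-1].lower() else "LLM Call"


NODES = {"Supervisor": supervisor, "RAG": rag, "LLM": llm}
CONDITIONAL = {"Supervisor": (router, {"RAG Call": "RAG", "LLM Call": "LLM"})}
EDGES = {"RAG": END, "LLM": END}


def run(state, entry="Supervisor"):
    """Walk the graph from entry until END, concatenating each node's messages."""
    state = {"messages": list(state["messages"])}
    path, current = [], entry
    while current != END:
        path.append(current)
        update = NODES[current](state)
        state["messages"] = state["messages"] + update["messages"]
        if current in CONDITIONAL:
            route_fn, mapping = CONDITIONAL[current]
            current = mapping[route_fn(state)]
        else:
            current = EDGES[current]
    return state, path


final_state, path = run({"messages": ["what is a gdp of usa?"]})
print(path)
# ['Supervisor', 'RAG']
```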

In [117]:
state={"messages":["hi"]}

In [118]:
app.invoke(state)

Question hi
Parsed response: Topic='Not Related' Reasoning="The query 'hi' is a greeting and does not relate to the USA."
-> ROUTER ->
last_message: Not Related
-> LLM Call ->


{'messages': ['hi', 'Not Related', 'Hi there!']}

In [119]:
state={"messages":["what is a gdp of usa?"]}

app.invoke(state)

Question what is a gdp of usa?
Parsed response: Topic='USA' Reasoning='The query explicitly asks for the GDP of the USA.'
-> ROUTER ->
last_message: USA
-> RAG Call ->


{'messages': ['what is a gdp of usa?',
  'USA',
  "The nominal GDP of the USA is estimated to be around $28 trillion USD as of 2024.  This represents approximately 25% of the global economy.  The US has the world's largest nominal GDP."]}

In [120]:
state={"messages":["can you tell me the industrial growth of world's most powerful economy?"]}
app.invoke(state)

Question can you tell me the industrial growth of world's most powerful economy?
Parsed response: Topic='USA' Reasoning="The query asks about the industrial growth of the world's most powerful economy, which is generally considered to be the USA."
-> ROUTER ->
last_message: USA
-> RAG Call ->


{'messages': ["can you tell me the industrial growth of world's most powerful economy?",
  'USA',
  "The U.S. economy, the world's largest with a $28 trillion GDP, is a major driver of global growth.  Its strength is attributed to innovation, financial power, and a robust institutional framework.  I don't have specific industrial growth figures."]}

In [121]:
state={"messages":["can you tell me the industrial growth of world's poor economy?"]}
result=app.invoke(state)

Question can you tell me the industrial growth of world's poor economy?
Parsed response: Topic='Not Related' Reasoning="The query asks about the industrial growth of the world's poor economies, which is a global economic issue not specifically related to the USA."
-> ROUTER ->
last_message: Not Related
-> LLM Call ->


In [122]:
result["messages"][-1]

'There\'s no single, easily quantifiable measure of "industrial growth of the world\'s poor economies."  The term "poor economies" itself is vague and encompasses a vast and diverse range of countries with vastly different levels of development, industrial bases, and growth trajectories.  However, we can make some general observations:\n\n* **Uneven Growth:** Industrial growth in poorer economies is highly uneven. Some countries have experienced significant industrialization and economic growth (e.g., parts of Southeast Asia, particularly Vietnam), while others remain largely agrarian with limited industrial sectors.  Factors like political stability, access to resources, infrastructure, education, and global trade policies heavily influence this.\n\n* **Shifting Manufacturing:**  A significant portion of manufacturing has shifted from developed nations to developing nations in recent decades, often driven by lower labor costs. This has led to industrial growth in some poorer economies