# 📘 USA Classifier Workflow with LangGraph

This notebook demonstrates a simple decision-based agent workflow using LangGraph. It includes:
- A supervisor node that classifies each question as `USA` or `Not Related`
- A router function that sends the question down the RAG path or the plain LLM path
- Two tool nodes (RAG and LLM) that produce the final answer

We'll walk through the data preparation, model setup, and LangGraph orchestration.

## ✅ Step 1: Load Model and Embeddings

In [17]:
from langchain_google_genai import ChatGoogleGenerativeAI
model = ChatGoogleGenerativeAI(model='gemini-1.5-flash')
output = model.invoke("Hi")
output

AIMessage(content='Hi there! How can I help you today?', additional_kwargs={}, response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'STOP', 'model_name': 'gemini-1.5-flash', 'safety_ratings': []}, id='run--4f72d9c3-58af-4406-8935-117addebafd0-0', usage_metadata={'input_tokens': 1, 'output_tokens': 11, 'total_tokens': 12, 'input_token_details': {'cache_read': 0}})

In [18]:
from langchain_huggingface import HuggingFaceEmbeddings
embeddings = HuggingFaceEmbeddings(model_name="BAAI/bge-small-en")
len(embeddings.embed_query("hi"))

384

## 📄 Step 2: Load and Embed Documents

In [8]:
from langchain_community.document_loaders import TextLoader, DirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

loader = DirectoryLoader("/Users/ankita/Documents/Krish Naik Academy/Agentic Batch 2/3-Langraph/data", glob="./*.txt", loader_cls=TextLoader)
docs = loader.load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=200, chunk_overlap=50)
new_docs = text_splitter.split_documents(docs)
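
To make `chunk_size=200` and `chunk_overlap=50` concrete, here is a simplified fixed-window chunker. It is only a sketch of the overlap idea: `RecursiveCharacterTextSplitter` prefers to split on separators such as paragraphs and spaces, so its real chunks will differ. The helper name `sliding_chunks` is made up for this illustration.

```python
def sliding_chunks(text: str, chunk_size: int = 200, chunk_overlap: int = 50) -> list[str]:
    """Cut text into fixed-size windows that share `chunk_overlap` characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # with the defaults, each window starts 150 chars later
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks

sample = "".join(str(i % 10) for i in range(500))  # 500 characters of dummy text
chunks = sliding_chunks(sample)
print([len(c) for c in chunks])            # [200, 200, 200]
print(chunks[0][-50:] == chunks[1][:50])   # True: neighbours share 50 characters
```

The overlap means a sentence cut at a chunk boundary still appears whole in at least one neighbouring chunk, which tends to help retrieval on small chunks like these.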

In [9]:
from langchain_community.vectorstores import Chroma
db = Chroma.from_documents(new_docs, embeddings)
retriever = db.as_retriever(search_kwargs={"k": 3})
retriever.invoke("industrial growth of usa?")

[Document(metadata={'source': '/Users/ankita/Documents/Krish Naik Academy/Agentic Batch 2/3-Langraph/data/usa.txt'}, page_content='Looking forward, the U.S. economy is expected to grow at a moderate pace, powered by innovation in AI, green energy, robotics, biotech, and quantum computing. The Biden administration’s Inflation'),
 Document(metadata={'source': '/Users/ankita/Documents/Krish Naik Academy/Agentic Batch 2/3-Langraph/data/usa.txt'}, page_content='🇺🇸 Overview of the U.S. Economy'),
 Document(metadata={'source': '/Users/ankita/Documents/Krish Naik Academy/Agentic Batch 2/3-Langraph/data/usa.txt'}, page_content='The U.S. economy remains the engine of global growth, backed by unmatched innovation, financial dominance, and a strong institutional framework. Its $28 trillion GDP and influence over global')]

## 🧱 Step 3: Define Agent State and Pydantic Parser

In [10]:
import operator
from pydantic import BaseModel, Field
from typing import TypedDict, Annotated, Sequence
from langchain.output_parsers import PydanticOutputParser
from langchain_core.messages import BaseMessage

class TopicSelectionParser(BaseModel):
    Topic: str = Field(description="selected topic")
    Reasoning: str = Field(description="Reasoning behind topic selection")

parser = PydanticOutputParser(pydantic_object=TopicSelectionParser)

class AgentState(TypedDict):
    messages: Annotated[Sequence[str], operator.add]
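
The `operator.add` annotation asks LangGraph to merge each node's returned `messages` into the existing list by concatenation rather than overwriting it. The plain-Python sketch below mimics that merge; `apply_update` is a hypothetical helper for illustration, not part of the LangGraph API.

```python
import operator

def apply_update(state: dict, update: dict) -> dict:
    """Merge a node's partial update into the state, concatenating `messages`."""
    merged = dict(state)
    merged["messages"] = operator.add(list(state["messages"]), list(update["messages"]))
    return merged

state = {"messages": ["What is the GDP of USA?"]}
state = apply_update(state, {"messages": ["USA"]})            # shape of the supervisor's return value
state = apply_update(state, {"messages": ["final answer"]})   # shape of the RAG/LLM node's return value
print(state["messages"][0])    # What is the GDP of USA?
print(state["messages"][-1])   # final answer
```

This is why, later in the notebook, the router reads `messages[-1]` (the topic label) while the RAG and LLM nodes read `messages[0]` (the original question).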

## 🧠 Step 4: Supervisor Node (Topic Classifier)

In [11]:
from langchain_core.prompts import PromptTemplate

def function_1(state: AgentState):
    question = state["messages"][-1]
    print("Question:", question)

    template = '''
    Your task is to classify the given user query into one of the following categories: [USA, Not Related].
    Set Topic to the category name only, and explain the choice briefly in Reasoning.

    User query: {question}
    {format_instructions}
    '''

    prompt = PromptTemplate(
        template=template,
        input_variables=["question"],
        partial_variables={"format_instructions": parser.get_format_instructions()}
    )
    chain = prompt | model | parser
    response = chain.invoke({"question": question})
    print("Parsed response:", response)

    return {"messages": [response.Topic]}

## 🔀 Step 5: Router Function

In [12]:
def router(state: AgentState):
    print("-> ROUTER ->")
    last_message = state["messages"][-1]
    if "usa" in last_message.lower():
        return "RAG Call"
    else:
        return "LLM Call" 
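
Because the router uses a substring check, any label that merely contains "usa" goes to the RAG path. The sketch below exercises the same logic on hand-written states and compares it with a stricter variant. `strict_router` is a hypothetical alternative, not something the notebook uses.

```python
def router(state: dict) -> str:
    """Same substring check as the router above, without the print."""
    last_message = state["messages"][-1]
    return "RAG Call" if "usa" in last_message.lower() else "LLM Call"

def strict_router(state: dict) -> str:
    """Hypothetical variant: route to RAG only when the label is exactly 'USA'."""
    label = state["messages"][-1].strip().lower()
    return "RAG Call" if label == "usa" else "LLM Call"

for label in ["USA", "Not Related", "Usage statistics"]:
    state = {"messages": ["some question", label]}
    print(f"{label!r}: {router(state)} / {strict_router(state)}")
# 'USA': RAG Call / RAG Call
# 'Not Related': LLM Call / LLM Call
# 'Usage statistics': RAG Call / LLM Call
```

The supervisor prompt restricts the label to two categories, so the difference only matters if the model strays from them.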

## 📚 Step 6: RAG Node

In [13]:
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

def function_2(state: AgentState):
    print("-> RAG Call ->")
    question = state["messages"][0]

    prompt = PromptTemplate(
        template="""You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum.\nQuestion: {question} \nContext: {context} \nAnswer:""",
        input_variables=["context", "question"]
    )

    rag_chain = (
        {"context": retriever | format_docs, "question": RunnablePassthrough()}
        | prompt
        | model
        | StrOutputParser()
    )
    result = rag_chain.invoke(question)
    return {"messages": [result]}
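
The dict at the start of `rag_chain` sends the same question to two branches: one retrieves and formats context, the other passes the question through unchanged. A rough plain-Python equivalent is sketched below. `Doc`, `fake_retriever`, and `fan_out` are stand-ins invented for this illustration, and the returned chunks are dummy text.

```python
from dataclasses import dataclass

@dataclass
class Doc:
    page_content: str  # the only attribute format_docs reads

def fake_retriever(question: str) -> list[Doc]:
    # Hypothetical stand-in for `retriever`: always returns two dummy chunks.
    return [Doc("chunk one"), Doc("chunk two")]

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

def fan_out(question: str) -> dict:
    """Feed the same question to both branches, like the dict step in rag_chain."""
    return {
        "context": format_docs(fake_retriever(question)),  # retriever | format_docs
        "question": question,                              # RunnablePassthrough()
    }

inputs = fan_out("industrial growth of usa?")
print(inputs["question"])        # industrial growth of usa?
print(repr(inputs["context"]))   # 'chunk one\n\nchunk two'
```

The resulting dict has exactly the two keys the prompt template expects, `context` and `question`.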

## 🤖 Step 7: LLM Node

In [14]:
def function_3(state: AgentState):
    print("-> LLM Call ->")
    question = state["messages"][0]
    complete_query = "Answer the following question with your real-world knowledge: " + question
    response = model.invoke(complete_query)
    return {"messages": [response.content]}

## 🔁 Step 8: Build LangGraph Workflow

In [15]:
from langgraph.graph import StateGraph, END

workflow = StateGraph(AgentState)
workflow.add_node("Supervisor", function_1)
workflow.add_node("RAG", function_2)
workflow.add_node("LLM", function_3)
workflow.set_entry_point("Supervisor")

workflow.add_conditional_edges("Supervisor", router, {
    "RAG Call": "RAG",
    "LLM Call": "LLM"
})

workflow.add_edge("RAG", END)
workflow.add_edge("LLM", END)

app = workflow.compile()
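
The control flow this graph encodes is: run the supervisor, let the router pick a branch through the mapping, run that one node, then stop. The dependency-free sketch below traces that path with stub nodes. All names in it (`supervisor_stub`, `rag_stub`, `llm_stub`, `run_once`) are hypothetical, and it makes no model or retriever calls.

```python
def supervisor_stub(state):
    # Stand-in for the LLM classifier: a keyword check instead of a model call.
    question = state["messages"][-1].lower()
    return {"messages": ["USA" if "usa" in question else "Not Related"]}

def rag_stub(state):
    return {"messages": [f"[RAG] answer to: {state['messages'][0]}"]}

def llm_stub(state):
    return {"messages": [f"[LLM] answer to: {state['messages'][0]}"]}

def router(state):
    return "RAG Call" if "usa" in state["messages"][-1].lower() else "LLM Call"

NODES = {"RAG": rag_stub, "LLM": llm_stub}
EDGES = {"RAG Call": "RAG", "LLM Call": "LLM"}  # same mapping passed to add_conditional_edges

def run_once(question: str) -> list[str]:
    state = {"messages": [question]}
    state["messages"] = state["messages"] + supervisor_stub(state)["messages"]
    next_node = EDGES[router(state)]           # the conditional edge
    state["messages"] = state["messages"] + NODES[next_node](state)["messages"]
    return state["messages"]                   # both branches then go to END

print(run_once("What is the GDP of USA?")[-1])   # [RAG] answer to: What is the GDP of USA?
print(run_once("Who wrote Hamlet?")[-1])         # [LLM] answer to: Who wrote Hamlet?
```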

## 🚀 Step 9: Run Sample Queries

In [16]:
state = {"messages": ["What is the GDP of USA?"]}
result = app.invoke(state)
print("✅ Final Answer:", result["messages"][-1])

Question: What is the GDP of USA?
Parsed response: Topic='USA' Reasoning='The query explicitly asks for the GDP of the USA.'
-> ROUTER ->
-> RAG Call ->
✅ Final Answer: The nominal GDP of the USA is estimated to be around $28 trillion USD as of 2024.  This represents about 25% of the global economy.  The US has the world's largest nominal GDP.


## ✅ Summary

In this notebook, you:
- Created a LangGraph state machine with Supervisor, RAG, and LLM nodes
- Used a router to decide paths based on classification
- Connected the RAG node to a Chroma vector store for retrieval

### ❓ Is it an Agent?
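Yes ✅. It qualifies as a **basic decision-based agent**: the supervisor classifies the query, the router picks a path, and one tool node produces the answer. It is not yet a fully reactive agent, because the graph runs each node once and ends without looping back over its own results, and it keeps no memory between invocations.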

### We use PydanticOutputParser in LangChain when:
* You want structured output from an LLM
* You want type-safe access to fields (like Topic, Reasoning, Summary, etc.)
* You want to validate the output against a schema
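
To illustrate what "validate the output against a schema" means, here is a standard-library sketch of the same idea: parse the model's JSON text, check the expected fields, and return a typed object. It is a simplified stand-in, not how `PydanticOutputParser` is implemented. `TopicSelection` and `parse_topic` are hypothetical names.

```python
import json
from dataclasses import dataclass, fields

@dataclass
class TopicSelection:
    Topic: str
    Reasoning: str

def parse_topic(raw: str) -> TopicSelection:
    """Parse a JSON string and check that it has the expected string fields."""
    data = json.loads(raw)  # malformed JSON raises JSONDecodeError, a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    expected = [f.name for f in fields(TopicSelection)]
    missing = [name for name in expected if name not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    for name in expected:
        if not isinstance(data[name], str):
            raise ValueError(f"{name} must be a string")
    return TopicSelection(**{name: data[name] for name in expected})

good = parse_topic('{"Topic": "USA", "Reasoning": "Asks about US GDP."}')
print(good.Topic)  # USA

try:
    parse_topic('{"Topic": "USA"}')
except ValueError as err:
    print("rejected:", err)  # rejected: missing fields: ['Reasoning']
```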