# Final Agent
Okay, now we're cooking. Our agent can access some data, and that's super cool! But it's not the best yet, is it? It can access data, but it loads the ENTIRE dataset into the chat! That's not practical, especially if our dataset is very large.

What we'll do next is to QUERY data SMARTLY! We'll have agents write the SQL to query our data for us. And that will be the final form of our agent.

---

In this module, we'll make up a large mock dataset and have the agent query it as needed, writing whatever SQL query fits the user's request.
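
To make the motivation concrete, here is a small sketch of why a query beats loading everything: an aggregate over many rows comes back as a handful of rows, which is all the chat model needs to see. It uses the standard library's `sqlite3` purely as a stand-in so it runs without extra installs (the notebook itself uses `duckdb` on a pandas DataFrame). The table contents and the `rows` and `summary` names are made up for illustration.

```python
import random
import sqlite3

random.seed(0)  # reproducible fake data
clients = ["Carl", "Blockbuster", "VolksWagen"]

# Build 1,000 fake (client, quantity, price) rows in an in-memory database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_data (client TEXT, quantity INTEGER, price REAL)")
rows = [
    (random.choice(clients), random.randint(0, 99), random.random() * 1000)
    for _ in range(1000)
]
conn.executemany("INSERT INTO sales_data VALUES (?, ?, ?)", rows)

# One aggregate query condenses the whole table into one row per client
summary = conn.execute(
    "SELECT client, SUM(quantity * price) AS revenue "
    "FROM sales_data GROUP BY client ORDER BY revenue DESC"
).fetchall()

print(f"{len(rows)} rows in the table, {len(summary)} rows sent to the chat model")
for client, revenue in summary:
    print(f"{client}: {revenue:,.2f}")
```

The point is the ratio: the model only ever sees the small `summary`, not the full table.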

## 1 Configs

### 1.1 Installs
We'll be installing `duckdb` so we can run SQL queries on pandas DataFrames. Again, this is for educational purposes.

In [0]:
%pip install "langgraph==0.0.36" "langchain>=0.1.20,<0.2.0" "langchain-core>=0.1.20,<0.2.0" requests duckdb

Note: you may need to restart the kernel using %restart_python or dbutils.library.restartPython() to use updated packages.


In [0]:
dbutils.library.restartPython() # Necessary for clearing cache and whatnots



### 1.2 Imports
Importing `pandas`, `duckdb`, and `numpy` to generate and query the data.

In [0]:
from langgraph.graph import StateGraph, END
from langchain_core.runnables import RunnableLambda
from typing import Dict, List, Optional, TypedDict
import requests, json, textwrap, datetime
import pandas as pd # <-- New Import
import duckdb       # <-- New Import
import numpy as np  # <-- New Import

### 1.3 Config Variables

In [0]:
# Chat Model.
CHAT_ENDPOINT = "databricks-llama-4-maverick"
# Instruct Model.
INSTRUCT_ENDPOINT = "databricks-meta-llama-3-1-8b-instruct" 
# The Base URL at the top, this is my URL, it won't work for you.
DATABRICKS_URL = "https://dbc-864a442b-39b8.cloud.databricks.com" 
# Your own token. To get one, go to your profile (top right) -> Settings -> Developer -> Access Tokens (Manage) -> Generate new token.
DATABRICKS_TOKEN = "<MY_DATABRICKS_TOKEN>" 
# This is my token, it won't work for you.
DATABRICKS_TOKEN = "dapi763c08facfcf240733ac46730443c6cf"

# Global toggle to see hidden debugging outputs
VERBOSE = True  
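
Hardcoding a token in a notebook makes it easy to leak. As a hedged alternative, here is a minimal sketch that reads the URL and token from environment variables instead. The helper name `load_databricks_config` is hypothetical, and `DATABRICKS_HOST` / `DATABRICKS_TOKEN` are assumed variable names; use whatever your own setup defines.

```python
import os

def load_databricks_config(env=os.environ):
    """Read the workspace URL and token from environment variables.

    DATABRICKS_HOST and DATABRICKS_TOKEN are assumed names, not something
    this notebook defines.
    """
    url = env.get("DATABRICKS_HOST", "").rstrip("/")
    token = env.get("DATABRICKS_TOKEN", "")
    if not url or not token:
        raise RuntimeError("Set DATABRICKS_HOST and DATABRICKS_TOKEN before running the notebook")
    return url, token

# Usage with a fake environment, so nothing secret appears in the notebook
url, token = load_databricks_config({
    "DATABRICKS_HOST": "https://example.cloud.databricks.com/",
    "DATABRICKS_TOKEN": "dapi-not-a-real-token",
})
print(url)
```

In a real run you would call `load_databricks_config()` with no arguments and assign the result to `DATABRICKS_URL` and `DATABRICKS_TOKEN`.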

### 1.4 - Mock Data
Some fake sales data we can play with.

In [0]:
def make_up_mock_data(n = 1000):
    clients = ["Grandma's Cookies", "VolksWagen", "The President of the United States", "Carl", "Blockbuster"]
    products = ["Cookies", "Jet Engine", "Enriched Uranium", "Dirt", "Fake Promises"]
    df = pd.DataFrame({
        "client": np.random.choice(clients, n),
        "product": np.random.choice(products, n),
        "quantity": np.random.randint(0, 100, n),
        # Float between 0 and 1000
        "price": np.random.rand(n) * 1000,
        "date": pd.date_range(start="2024-01-01", end="2024-12-31", periods=n)
    })
    return df
sales_data = make_up_mock_data(1000)

schema_catalog = {
    "sales_data": (
        "The 'sales_data' table has the following columns:\n"
        "- client (string): the customer name\n"
        "- product (string): the item sold\n"
        "- quantity (int): number of units sold\n"
        "- price (float): price per unit\n"
        "- date (datetime): date of transaction\n"
    )
}

sales_data

| | client | product | quantity | price | date |
|---|---|---|---|---|---|
| 0 | The President of the United States | Jet Engine | 36 | 791.771551 | 2024-01-01 00:00:00.000000000 |
| 1 | VolksWagen | Jet Engine | 28 | 431.421164 | 2024-01-01 08:46:07.567567567 |
| 2 | Blockbuster | Cookies | 82 | 872.171541 | 2024-01-01 17:32:15.135135135 |
| 3 | Carl | Cookies | 91 | 469.004449 | 2024-01-02 02:18:22.702702702 |
| 4 | VolksWagen | Fake Promises | 24 | 301.048121 | 2024-01-02 11:04:30.270270270 |
| ... | ... | ... | ... | ... | ... |
| 995 | Carl | Jet Engine | 74 | 966.105294 | 2024-12-29 12:55:29.729729728 |
| 996 | VolksWagen | Jet Engine | 44 | 99.508595 | 2024-12-29 21:41:37.297297296 |
| 997 | Grandma's Cookies | Enriched Uranium | 75 | 979.062153 | 2024-12-30 06:27:44.864864864 |
| 998 | Carl | Cookies | 11 | 105.744224 | 2024-12-30 15:13:52.432432432 |
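
The schema description in `schema_catalog` is written by hand, which is fine for one table. If you end up with more tables, one option is to generate that text from a small column spec. This is only a sketch: `describe_table` and `sales_columns` are hypothetical names, and the output is meant to match the hand-written string in the cell above.

```python
def describe_table(table_name, columns):
    """Render a schema description in the same style as the hand-written one.

    `columns` maps column name -> (type label, description).
    """
    lines = [f"The '{table_name}' table has the following columns:"]
    for name, (type_label, description) in columns.items():
        lines.append(f"- {name} ({type_label}): {description}")
    return "\n".join(lines) + "\n"

sales_columns = {
    "client": ("string", "the customer name"),
    "product": ("string", "the item sold"),
    "quantity": ("int", "number of units sold"),
    "price": ("float", "price per unit"),
    "date": ("datetime", "date of transaction"),
}

generated_catalog = {"sales_data": describe_table("sales_data", sales_columns)}
print(generated_catalog["sales_data"])
```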


## 2 Defining Functions and Classes

Now here we'll change some things. 

The AgentState will include a new field called `schema_catalog`, so that any node can look at the data catalog and know how it's organized.

Instead of the router routing the state to a tool, it will route it to another agent (`sales_data_agent`), which will look at the state, write a proper SQL query for the data, and pass it to the `sales_data_tool` to run it.

### 2.1 Classes

In [0]:
class AgentState(TypedDict, total=False):
    """Conversation state passed between graph nodes."""
    chat_history: List[Dict[str, str]]   # chat history in OpenAI‑style format
    verbose: bool                    # toggle debug prints
    output: Optional[str]            # assistant response
    available_tools: Optional[Dict[str, str]]   # names and descriptions of tools the router can pick
    tool_context: Optional[str]                 # extra context (cleared each turn)
    schema_catalog: Optional[Dict[str, str]]    # catalog of schemas and their descriptions

### 2.2 - Connection Function

This one remains unchanged, nothing new

In [0]:
def databricks_llm(chat_history, model_endpoint, verbose=False):
    """Call a Databricks serving endpoint that follows the OpenAI chat format."""
    if verbose:
        print("\n=== LLM CALL →", model_endpoint, "===")
        for m in chat_history:
            print(f"{m['role'].upper()}: {m['content']}")

    headers = {
        "Authorization": f"Bearer {DATABRICKS_TOKEN}",
        "Content-Type":  "application/json"
    }
    body = {
        "messages":   chat_history,
        "temperature": 0.7,
        "max_tokens":  1000
    }

    resp = requests.post(f"{DATABRICKS_URL}/serving-endpoints/{model_endpoint}/invocations", headers=headers, json=body)
    resp.raise_for_status()
    content = resp.json()["choices"][0]["message"]["content"]

    if verbose: print("LLM RESPONSE:", content[:300] + ("…" if len(content) > 300 else ""))
    if verbose: print("=== LLM CALL END ===")
    return content
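
If the "OpenAI chat format" part feels abstract, here is a small sketch of the two shapes involved: the request body we send and the response we dig the reply out of. The helper names `build_chat_body` and `extract_reply` are hypothetical, and `mock_response` is a hand-made dictionary, not a real endpoint reply.

```python
import json

def build_chat_body(chat_history, temperature=0.7, max_tokens=1000):
    """Assemble the JSON body for an OpenAI-style chat request."""
    return {
        "messages": chat_history,
        "temperature": temperature,
        "max_tokens": max_tokens,
    }

def extract_reply(response_json):
    """Pull the assistant text out of an OpenAI-style chat response."""
    return response_json["choices"][0]["message"]["content"]

# A hand-made response with the same nesting the function above relies on
mock_response = {
    "choices": [
        {"message": {"role": "assistant", "content": "Good morning!"}}
    ]
}

body = build_chat_body([{"role": "user", "content": "Good morning"}])
print(json.dumps(body, indent=2))
print(extract_reply(mock_response))
```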


### 2.3 - Defining Agents and Tools as Functions

In [0]:
def router_agent(state: AgentState) -> AgentState:
    if state["verbose"]: print("\n--- ROUTER AGENT NODE ---")

    # Build tool list with descriptions
    tool_lines = [
        f"- {name}: {desc}"
        for name, desc in (state["available_tools"] or {}).items()
    ]
    tool_catalog = "\n".join(tool_lines) or "none"

    # The router agent has its own system prompt
    router_system_prompt = (
        "You are an AI router. Choose the single best tool for answering the user's latest message.\n\n"
        f"Available tools:\n{tool_catalog}\n\n"
        "Return ONLY a JSON object like {\"tool\": \"chat\"} or {\"tool\": \"sales_data\"}."
    )

    # Ignores all system prompts from the chat history
    modified_chat_history = [{"role": "system", "content": router_system_prompt}] + [m for m in state["chat_history"] if m["role"] != "system"]

    # Getting the response from the LLM, should be something like: {"tool": "chat"}
    llm_response = databricks_llm(
        modified_chat_history,
        model_endpoint=INSTRUCT_ENDPOINT, # Using the instruct endpoint
        verbose=state["verbose"]
    )

    start = llm_response.rfind("{")
    end = llm_response.rfind("}")
    decision_json = llm_response[start:end + 1]
    decision = json.loads(decision_json)

    if state["verbose"]: print(f"Extracted decision: {decision}")

    # Stash the JSON string in output; graph edges will parse it
    state["output"] = json.dumps(decision)

    if state["verbose"]: print("\n--- ROUTER AGENT NODE END ---")

    # Returns updated version of state
    return state
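
The router above trusts the LLM to return valid JSON with a known tool name. Small instruct models don't always cooperate, so a more defensive parse may be worth having. This is a minimal sketch; `parse_router_decision` is a hypothetical helper, and falling back to `"chat"` is an assumption about what a sensible default is.

```python
import json

def parse_router_decision(llm_response, allowed_tools, default="chat"):
    """Return a tool name from the router's reply, or `default` if the reply is unusable."""
    start = llm_response.rfind("{")
    end = llm_response.rfind("}")
    if start == -1 or end < start:
        return default  # no JSON object in the reply at all
    try:
        decision = json.loads(llm_response[start:end + 1])
    except json.JSONDecodeError:
        return default  # braces were there, but not valid JSON
    tool = decision.get("tool") if isinstance(decision, dict) else None
    if isinstance(tool, str) and tool in allowed_tools:
        return tool
    return default  # unknown or missing tool name

allowed = {"chat", "sales_data"}
print(parse_router_decision('Sure! {"tool": "sales_data"}', allowed))    # sales_data
print(parse_router_decision("I think the chat tool fits best.", allowed))  # chat
print(parse_router_decision('{"tool": "weather"}', allowed))              # chat
```

With something like this, a malformed router reply degrades to a normal chat turn instead of crashing the loop.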


In [0]:
def sales_data_agent(state):
    if state["verbose"]: print("\n--- SALES DATA AGENT NODE ---")

    schema = state["schema_catalog"]["sales_data"]

    sales_system_prompt = (
        "You are an assistant that writes SQL queries to analyze sales data.\n\n"
        f"{schema}\n\n"
        "Based on the conversation below, write a SQL query that answers the user's request.\n"
        "Respond with ONLY the query, wrapped like this:\n"
        "```sql\nSELECT ...\n```"
    )

    # Ignores all system prompts from the chat history
    modified_chat_history = [{"role": "system", "content": sales_system_prompt}] + [m for m in state["chat_history"] if m["role"] != "system"]

    llm_response = databricks_llm(
        modified_chat_history,
        model_endpoint=CHAT_ENDPOINT,
        verbose=state.get("verbose", False),
    )

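    # Pull the query out of the code fence; if the model skipped the fence, use the raw reply
    start = llm_response.find("```")
    end = llm_response.rfind("```")
    if start != -1 and end > start:
        sql = llm_response[start + 3:end].strip()
    else:
        sql = llm_response.strip()
    if sql.lower().startswith("sql"):
        sql = sql[3:].strip()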

    if state["verbose"]: print("Generated SQL:\n", sql)

    state["output"] = sql

    if state["verbose"]: print("\n--- SALES DATA AGENT NODE END ---")

    # Returns updated version of state
    return state
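
The fence-stripping step can also be written with a regular expression, which some people find easier to read. This is just an alternative sketch, not what the notebook runs; `extract_sql` is a hypothetical helper name.

```python
import re

FENCE = "`" * 3  # three backticks, built this way so the example is easy to paste

# Optional "sql" tag after the opening fence, then everything up to the closing fence
SQL_BLOCK = re.compile(FENCE + r"(?:sql)?\s*(.*?)" + FENCE, re.DOTALL | re.IGNORECASE)

def extract_sql(llm_response):
    """Return the query inside the first fenced block, or the raw reply if there is no fence."""
    match = SQL_BLOCK.search(llm_response)
    sql = match.group(1) if match else llm_response
    return sql.strip()

reply = "Here you go:\n" + FENCE + "sql\nSELECT DISTINCT client FROM sales_data\n" + FENCE
print(extract_sql(reply))
print(extract_sql("SELECT 1"))
```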

In [0]:
def sales_data_tool(state):
    if state["verbose"]: print("\n--- SALES DATA TOOL NODE ---")

    query = state["output"]

    if state["verbose"]: print("Executing SQL with DuckDB:\n", query)
        
    # run against the in-memory DataFrame
    result_df = duckdb.query_df(sales_data, "sales_data", query).to_df()
    context = result_df.to_string(index=False)

    state["tool_context"] = context
    state["output"]       = context

    if state["verbose"]: print("SQL result:\n", context)

    if state["verbose"]: print("\n--- SALES DATA TOOL NODE END ---")

    # Returns updated version of state
    return state
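
Since the SQL comes straight from an LLM, it may be worth a quick sanity check before running it. Below is a rough, hypothetical guard (`is_safe_select` is not part of the notebook) that only lets through a single read-only statement. It is keyword-based, so it can reject harmless queries (for example, one with the word "update" inside a string literal), and it is no substitute for real database permissions.

```python
import re

# Keywords that suggest the statement does more than read data (an illustrative list, not exhaustive)
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|create|attach|copy|pragma|install|load)\b",
    re.IGNORECASE,
)

def is_safe_select(sql):
    """Very rough check: one statement, starting with SELECT or WITH, no write keywords."""
    statement = sql.strip().rstrip(";").strip()
    if ";" in statement:
        return False  # more than one statement
    if not re.match(r"(?is)^(select|with)\b", statement):
        return False  # not a read query
    return FORBIDDEN.search(statement) is None

print(is_safe_select("SELECT client, SUM(quantity * price) FROM sales_data GROUP BY client;"))
print(is_safe_select("DROP TABLE sales_data"))
print(is_safe_select("SELECT 1; DROP TABLE sales_data"))
```

In the tool node, you could call this before executing the query and put an error message into `tool_context` when it returns `False`.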

In [0]:
# The chat agent will be the same as before
def chat_agent(state):
    if state["verbose"]: print("\n--- CHAT AGENT NODE ---")

    # We'll create a new variable for the chat history: the regular history, plus whatever context we get from the tools.
    appended_chat_history = state["chat_history"] + [{"role":"user", "content":f"TOOLS CONTEXT:\n{state['tool_context']}"}]

    reply = databricks_llm(
        appended_chat_history,
        model_endpoint=CHAT_ENDPOINT,
        verbose=state["verbose"]
    )

    state["chat_history"].append({"role": "assistant", "content": reply})
    state["output"]   = reply

    if state["verbose"]: print("\n--- CHAT AGENT NODE END ---")

    # Returns updated version of state
    return state
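
One small wrinkle: the chat agent always appends a "TOOLS CONTEXT" message, so on plain chat turns the model sees `TOOLS CONTEXT:` followed by `None` (you can spot it in the run output further down). A possible tweak, sketched here with a hypothetical `build_chat_messages` helper, is to add that message only when a tool actually produced something.

```python
def build_chat_messages(chat_history, tool_context=None):
    """Return a new message list, adding the tool context only if there is one."""
    messages = list(chat_history)  # copy, so the stored history is not modified
    if tool_context:
        messages.append({"role": "user", "content": f"TOOLS CONTEXT:\n{tool_context}"})
    return messages

history = [{"role": "user", "content": "Good morning"}]

plain = build_chat_messages(history)
with_tool = build_chat_messages(history, "client\nCarl\nBlockbuster")

print(len(plain), len(with_tool))
print(with_tool[-1]["content"])
```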

## 3 Initializing Chat

### 3.1 - Defining Graph

Similar logic as last time, but now we'll connect the `sales_data_agent` to `sales_data_tool`, which will then go to `chat_agent`

In [0]:
g = StateGraph(AgentState)

g.add_node("router_agent", RunnableLambda(router_agent))
g.add_node("sales_data_agent", RunnableLambda(sales_data_agent))
g.add_node("sales_data_tool", RunnableLambda(sales_data_tool))
g.add_node("chat_agent", RunnableLambda(chat_agent))

g.set_entry_point("router_agent")

def pick_next(state: AgentState) -> str:
    return json.loads(state["output"])["tool"]

g.add_conditional_edges(
    "router_agent",
    pick_next,
    {
        "chat": "chat_agent",
        "sales_data": "sales_data_agent"
    },
)

g.add_edge("sales_data_agent", "sales_data_tool")
g.add_edge("sales_data_tool", "chat_agent")
g.add_edge("chat_agent", END)

assistant_graph = g.compile()
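
To see the wiring without LangGraph in the way, the same control flow can be sketched as plain function calls. The stub nodes below just record their names; they stand in for the real agents and are not the notebook's functions. The sketch also assumes the router has already written its decision into `state["output"]`.

```python
import json

def make_stub(name):
    """Create a stand-in node that only records that it ran."""
    def node(state):
        state.setdefault("trace", []).append(name)
        return state
    return node

nodes = {
    name: make_stub(name)
    for name in ["sales_data_agent", "sales_data_tool", "chat_agent"]
}

# Mirrors the conditional edge mapping and the fixed edges of the graph
routes = {"chat": "chat_agent", "sales_data": "sales_data_agent"}
next_node = {
    "sales_data_agent": "sales_data_tool",
    "sales_data_tool": "chat_agent",
    "chat_agent": None,  # None plays the role of END
}

def run(state):
    current = routes[json.loads(state["output"])["tool"]]  # same lookup as pick_next
    while current is not None:
        state = nodes[current](state)
        current = next_node[current]
    return state

print(run({"output": '{"tool": "sales_data"}'})["trace"])
print(run({"output": '{"tool": "chat"}'})["trace"])
```

Every path ends at `chat_agent`, which is why the user always gets a natural-language answer whether or not a query ran.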

### 3.2 - Chat Loop

In [0]:
chat_history = [
    {'role': 'system', 'content': 'You are a helpful AI Agent. You have access to sales data if needed.'}
]

available_tools = {
    "chat": "Continue the conversation naturally",
    "sales_data": "Query the internal sales database using SQL"
}

state = AgentState(
    chat_history=chat_history,
    verbose=VERBOSE,
    output=None,
    available_tools=available_tools,
    tool_context=None,
    schema_catalog=schema_catalog
)

while True:
    # Gets the user's prompt
    user_text = input("You: ").strip()
    # Exit strategy
    if user_text == "exit":
        break

    # Append the user's message to the chat history of the state
    state["chat_history"].append({"role": "user", "content": user_text})

    # Updates the state after going through the graph
    state = assistant_graph.invoke(state)

    # Resets the tool context once it's no longer needed
    state["tool_context"] = None

    print("Assistant:", state["output"])


You:  Good morning


--- ROUTER AGENT NODE ---

=== LLM CALL → databricks-meta-llama-3-1-8b-instruct ===
SYSTEM: You are an AI router. Choose the single best tool for answering the user's latest message.

Available tools:
- chat: Continue the conversation naturally
- sales_data: Query the internal sales database using SQL

Return ONLY a JSON object like {"tool": "chat"} or {"tool": "sales_data"}.
USER: Good morning
LLM RESPONSE: {"tool": "chat"}
=== LLM CALL END ===
Extracted decision: {'tool': 'chat'}

--- ROUTER AGENT NODE END ---

--- CHAT AGENT NODE ---

=== LLM CALL → databricks-llama-4-maverick ===
SYSTEM: You are a helpful AI Agent. You have access to sales data if needed.
USER: Good morning
USER: TOOLS CONTEXT:
None
LLM RESPONSE: Good morning! How can I help you today?
=== LLM CALL END ===

--- CHAT AGENT NODE END ---
Assistant: Good morning! How can I help you today?


You:  What are all my clients in my database?


--- ROUTER AGENT NODE ---

=== LLM CALL → databricks-meta-llama-3-1-8b-instruct ===
SYSTEM: You are an AI router. Choose the single best tool for answering the user's latest message.

Available tools:
- chat: Continue the conversation naturally
- sales_data: Query the internal sales database using SQL

Return ONLY a JSON object like {"tool": "chat"} or {"tool": "sales_data"}.
USER: Good morning
ASSISTANT: Good morning! How can I help you today?
USER: What are all my clients in my database?
LLM RESPONSE: {"tool": "sales_data"}
=== LLM CALL END ===
Extracted decision: {'tool': 'sales_data'}

--- ROUTER AGENT NODE END ---

--- SALES DATA AGENT NODE ---

=== LLM CALL → databricks-llama-4-maverick ===
SYSTEM: You are an assistant that writes SQL queries to analyze sales data.

The 'sales_data' table has the following columns:
- client (string): the customer name
- product (string): the item sold
- quantity (int): number of units sold
- price (float): price per unit
- date (datetime): date 

You:  How much money did I make from the president?


--- ROUTER AGENT NODE ---

=== LLM CALL → databricks-meta-llama-3-1-8b-instruct ===
SYSTEM: You are an AI router. Choose the single best tool for answering the user's latest message.

Available tools:
- chat: Continue the conversation naturally
- sales_data: Query the internal sales database using SQL

Return ONLY a JSON object like {"tool": "chat"} or {"tool": "sales_data"}.
USER: Good morning
ASSISTANT: Good morning! How can I help you today?
USER: What are all my clients in my database?
ASSISTANT: It looks like I have access to your client database. According to the data, your clients are:

1. Carl
2. The President of the United States
3. VolksWagen
4. Blockbuster
5. Grandma's Cookies

Is there anything specific you'd like to know or do with this information?
USER: How much money did I make from the president?
LLM RESPONSE: {"tool": "sales_data"}
=== LLM CALL END ===
Extracted decision: {'tool': 'sales_data'}

--- ROUTER AGENT NODE END ---

--- SALES DATA AGENT NODE ---

=== LLM CA

You:  Awesome, what was the most popular product sold to the president?


--- ROUTER AGENT NODE ---

=== LLM CALL → databricks-meta-llama-3-1-8b-instruct ===
SYSTEM: You are an AI router. Choose the single best tool for answering the user's latest message.

Available tools:
- chat: Continue the conversation naturally
- sales_data: Query the internal sales database using SQL

Return ONLY a JSON object like {"tool": "chat"} or {"tool": "sales_data"}.
USER: Good morning
ASSISTANT: Good morning! How can I help you today?
USER: What are all my clients in my database?
ASSISTANT: It looks like I have access to your client database. According to the data, your clients are:

1. Carl
2. The President of the United States
3. VolksWagen
4. Blockbuster
5. Grandma's Cookies

Is there anything specific you'd like to know or do with this information?
USER: How much money did I make from the president?
ASSISTANT: It seems like I can access some sales data. According to the data, the total revenue generated from transactions related to "The President of the United States" is

You:  exit