# Final Agent
Okay, now we're cooking. Our agent can access some data, that's super cool! But it's not the best now, is it? It can access data, but it loads the ENTIRE dataset into the chat! That's not practical, especially if our dataset is very large.

What we'll do next is to QUERY data SMARTLY! We'll have agents write the SQL to query our data for us. And that will be the final form of our agent.

---

In this module, we'll make up a large mock dataset and have the agent query it as needed, using whatever SQL query seems appropriate.
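To see why querying beats dumping, here's a quick sketch that uses Python's built-in `sqlite3` as a stand-in for DuckDB (an assumption for illustration only; the notebook itself uses DuckDB). It builds a 1,000-row table shaped like our mock sales data and compares how much text each approach would paste into the prompt.

```python
import random
import sqlite3

# In-memory table shaped like the mock sales data (sqlite3 stands in for DuckDB here)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_data (client TEXT, product TEXT, quantity INTEGER, price REAL)")

rng = random.Random(0)  # fixed seed so the sketch is reproducible
clients = ["Grandma's Cookies", "VolksWagen", "Carl", "Blockbuster"]
products = ["Cookies", "Jet Engine", "Dirt"]
rows = [
    (rng.choice(clients), rng.choice(products), rng.randint(0, 99), rng.random() * 1000)
    for _ in range(1000)
]
conn.executemany("INSERT INTO sales_data VALUES (?, ?, ?, ?)", rows)

# Approach 1: paste the whole table into the prompt
full_dump = "\n".join(str(row) for row in conn.execute("SELECT * FROM sales_data"))

# Approach 2: let SQL do the work and paste only the answer
(count,) = conn.execute(
    "SELECT COUNT(*) FROM sales_data WHERE client = 'Blockbuster'"
).fetchone()
answer = f"Blockbuster sales: {count}"

print(f"{len(full_dump):,} characters vs {len(answer)} characters")
```

The dump grows with every row you add; the aggregate answer stays a single line no matter how big the table gets.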

## 1 Configs

### 1.1 Installs
We'll be installing `duckdb` so we can run SQL queries on pandas DataFrames (again, for educational purposes).

In [0]:
%pip install "langgraph==0.0.36" "langchain>=0.1.20,<0.2.0" "langchain-core>=0.1.20,<0.2.0" requests duckdb

Note: you may need to restart the kernel using %restart_python or dbutils.library.restartPython() to use updated packages.


In [0]:
dbutils.library.restartPython() # Necessary for clearing cache and whatnot



### 1.2 Imports
Importing `pandas`, `duckdb`, and `numpy` to generate and query the data.

In [0]:
from langgraph.graph import StateGraph, END
from langchain_core.runnables import RunnableLambda
from typing import Dict, List, Optional, TypedDict
import requests, json, textwrap, datetime
import pandas as pd # <-- New Import
import duckdb # <-- New Import
import numpy as np # <-- New Import

### 1.3 Config Variables

In [0]:
CHAT_ENDPOINT = "databricks-llama-4-maverick" # Chat Model
INSTRUCT_ENDPOINT = "databricks-meta-llama-3-1-8b-instruct" # Instruct Model
DATABRICKS_URL = "https://dbc-864a442b-39b8.cloud.databricks.com" # The Base URL at the top
DATABRICKS_TOKEN = "<your-databricks-token>" # Your own personal access token (never share or commit a real one)
VERBOSE = True  # global toggle to see hidden outputs

### 1.4 - Mock Data
Some fake sales data we can play with.

In [0]:
def make_up_mock_data(n = 1000):
    clients = ["Grandma's Cookies", "VolksWagen", "The President of the United States", "Carl", "Blockbuster"]
    products = ["Cookies", "Jet Engine", "Enriched Uranium", "Dirt", "Fake Promises"]
    df = pd.DataFrame({
        "client": np.random.choice(clients, n),
        "product": np.random.choice(products, n),
        "quantity": np.random.randint(0, 100, n),
        # Float between 0 and 1000
        "price": np.random.rand(n) * 1000,
        "date": pd.date_range(start="2024-01-01", end="2024-12-31", periods=n)
    })
    return df
sales_data = make_up_mock_data(1000)

schema_catalog = {
    "sales_data": (
        "The 'sales_data' table has the following columns:\n"
        "- client (string): the customer name\n"
        "- product (string): the item sold\n"
        "- quantity (int): number of units sold\n"
        "- price (float): price per unit\n"
        "- date (datetime): date of transaction\n"
    )
}
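Since `price` is a per-unit price, the dollar value of a row is `quantity * price`. The schema has no cost column, so a "profit" question can really only be answered as revenue. Here's a sketch of the kind of aggregate query the SQL agent could reasonably produce, again with `sqlite3` standing in for DuckDB and a handful of made-up rows.

```python
import sqlite3

# A few made-up rows (sqlite3 stands in for DuckDB in this sketch)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_data (client TEXT, product TEXT, quantity INTEGER, price REAL)")
conn.executemany(
    "INSERT INTO sales_data VALUES (?, ?, ?, ?)",
    [
        ("Blockbuster", "Dirt", 10, 2.5),      # 10 * 2.5  = 25.0
        ("Blockbuster", "Cookies", 4, 10.0),   # 4 * 10.0  = 40.0
        ("Carl", "Jet Engine", 1, 900.0),      # 1 * 900.0 = 900.0
    ],
)

# Revenue per client = sum over that client's rows of quantity * per-unit price
revenue_sql = """
SELECT client, SUM(quantity * price) AS revenue
FROM sales_data
GROUP BY client
ORDER BY revenue DESC
"""
revenue = dict(conn.execute(revenue_sql).fetchall())
print(revenue)  # {'Carl': 900.0, 'Blockbuster': 65.0}
```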

## 2 Defining Functions and Classes

Now here we'll change some things. 

The AgentState will include a new field called `schema_catalog`, so that any node can look at the data catalog and know how the data is organized.

Instead of routing the state straight to a tool, the router will route it to another agent (`sales_data_agent`), which looks at the state, writes a SQL query against the data, and sends it to `sales_data_tool` to run.
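Before wiring this into LangGraph, it can help to see the flow as plain function calls. The sketch below stubs out both LLM calls (`fake_router_llm` and `fake_sql_llm` are hypothetical placeholders, not part of this notebook) and runs the SQL with `sqlite3` as a stand-in for DuckDB, just to show how each step reads and writes the shared state dict.

```python
import json
import sqlite3

# Hypothetical stand-ins for the two LLM calls; the real nodes call the model instead
def fake_router_llm(messages):
    return '{"tool": "sales_data"}'

def fake_sql_llm(messages):
    return "```sql\nSELECT COUNT(*) FROM sales_data WHERE client = 'Carl'\n```"

# Tiny in-memory table (sqlite3 standing in for DuckDB)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_data (client TEXT, quantity INTEGER)")
conn.executemany("INSERT INTO sales_data VALUES (?, ?)",
                 [("Carl", 3), ("Carl", 1), ("Blockbuster", 7)])

def route(state):
    state["output"] = fake_router_llm(state["messages"])  # e.g. {"tool": "sales_data"}
    return state

def write_sql(state):
    reply = fake_sql_llm(state["messages"])
    # Keep only the text between the opening ```sql fence and the closing fence
    state["output"] = reply.split("```sql", 1)[1].rsplit("```", 1)[0].strip()
    return state

def run_sql(state):
    rows = conn.execute(state["output"]).fetchall()
    state["tool_context"] = str(rows)  # only the small result moves on, not the table
    return state

def answer(state):
    state["output"] = f"(chat model would answer using: {state.get('tool_context')})"
    return state

def run(state):
    state = route(state)
    if json.loads(state["output"])["tool"] == "sales_data":
        state = run_sql(write_sql(state))
    return answer(state)

final = run({"messages": [{"role": "user", "content": "How many sales to Carl?"}]})
print(final["tool_context"])  # [(2,)]
```

LangGraph does the same hand-offs for us, with the router's choice deciding which branch runs.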

### 2.1 Classes

In [0]:
class AgentState(TypedDict, total=False):
    """Conversation state passed between graph nodes."""
    messages: List[Dict[str, str]]   # chat history in OpenAI‑style format
    verbose: bool                    # toggle debug prints
    output: Optional[str]            # assistant response
    available_tools: Optional[Dict[str, str]]   # names and descriptions of tools the router can pick
    tool_context: Optional[str]                 # extra context (cleared each turn)
    schema_catalog: Optional[Dict[str, str]]    # catalog of schemas and their descriptions

### 2.2 - Connection Function

This one is unchanged; nothing new here.

In [0]:
# The Databricks function we know well. I added some type hints so there's no confusion, though Python doesn't enforce them at runtime
def databricks_llm(messages: List[Dict[str, str]], *, model_endpoint: str = CHAT_ENDPOINT, verbose: bool = False) -> str:
    """Call a Databricks serving endpoint that follows the OpenAI chat format."""
    if verbose:
        print("\n=== LLM CALL →", model_endpoint)
        for m in messages:
            print(f"{m['role'].upper()}: {textwrap.shorten(m['content'], width=120)}")

    headers = {
        "Authorization": f"Bearer {DATABRICKS_TOKEN}",
        "Content-Type":  "application/json"
    }
    body = {
        "messages":   messages,
        "temperature": 0.7,
        "max_tokens":  1000
    }

    resp = requests.post(f"{DATABRICKS_URL}/serving-endpoints/{model_endpoint}/invocations", headers=headers, json=body)
    resp.raise_for_status()
    content = resp.json()["choices"][0]["message"]["content"]

    if verbose:
        print("LLM RESPONSE:", content[:300] + ("…" if len(content) > 300 else ""))
    return content
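For reference, here's the shape of the payload `databricks_llm` sends and the path it reads the reply from. `build_body` and `extract_content` are hypothetical helpers for illustration, and the response dict is a hand-written example in the OpenAI chat-completions format that the function's indexing assumes, not a captured Databricks response.

```python
import json

def build_body(messages, temperature=0.7, max_tokens=1000):
    # Same fields databricks_llm puts in its POST body
    return {"messages": messages, "temperature": temperature, "max_tokens": max_tokens}

def extract_content(response_json):
    # Same path databricks_llm reads: choices[0].message.content
    return response_json["choices"][0]["message"]["content"]

body = build_body([{"role": "user", "content": "Hi"}])
print(json.dumps(body))

# Hand-written example of an OpenAI-style chat response (illustrative only)
fake_response = {
    "choices": [
        {"index": 0, "message": {"role": "assistant", "content": "Hello!"}, "finish_reason": "stop"}
    ]
}
print(extract_content(fake_response))  # Hello!
```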

### 2.3 - Defining Agents and Tools as Functions

In [0]:
# The chat agent will be the same as before, except we'll append the tool context to the chat prompt
def chat_agent(state: AgentState) -> AgentState:
    if state.get("verbose"): print("\n--- CHAT AGENT NODE ---")

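    msgs = state["messages"]
    # Only attach tool context when a tool actually produced some (it's None on the plain-chat route)
    if state.get("tool_context"):
        msgs = msgs + [{"role": "user", "content": f"TOOLS CONTEXT:\n{state['tool_context']}"}]

    reply = databricks_llm(
        msgs,
        model_endpoint=CHAT_ENDPOINT,
        verbose=state.get("verbose", False),
    )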

    state["messages"] = state["messages"] + [{"role": "assistant", "content": reply}]
    state["output"]   = reply
    return state

In [0]:
def sales_data_agent(state: AgentState) -> AgentState:
    if state.get("verbose"): print("\n--- SALES DATA AGENT NODE ---")

    schema = state.get("schema_catalog", {}).get("sales_data", "")

    system_prompt = (
        "You are an assistant that writes SQL queries to analyze sales data.\n\n"
        f"{schema}\n\n"
        "Based on the conversation below, write a SQL query that answers the user's request.\n"
        "Respond with ONLY the query, wrapped like this:\n"
        "```sql\nSELECT ...\n```"
    )

    messages = [{"role": "system", "content": system_prompt}] + [
        m for m in state["messages"] if m["role"] != "system"
    ]

    llm_response = databricks_llm(
        messages,
        model_endpoint=CHAT_ENDPOINT,
        verbose=state.get("verbose", False),
    )

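    # find()/rfind() return -1 instead of raising when there's no fence, so check before slicing
    start = llm_response.find("```")
    end = llm_response.rfind("```")
    if start == -1 or end <= start:
        sql = "(invalid SQL format)"
    else:
        sql = llm_response[start + 3:end].strip()
        if sql.lower().startswith("sql"):
            sql = sql[3:].strip()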

    if state.get("verbose"): print("Generated SQL:\n", sql)

    state["output"] = sql
    return state


In [0]:
def sales_data_tool(state: AgentState) -> AgentState:
    if state.get("verbose"):
        print("\n--- SALES DATA TOOL NODE ---")

    raw_sql = str(state["output"]).strip("`").strip()
    if raw_sql.lower().startswith("sql"):
        raw_sql = raw_sql[3:].strip()

    # DuckDB happily ignores a trailing semicolon
    query = raw_sql.rstrip(";")

    if state.get("verbose"):
        print("Executing SQL with DuckDB:\n", query)

    # run against the in-memory DataFrame
    try:
        result_df = duckdb.query_df(sales_data, "sales_data", query).to_df()
        context = result_df.to_string(index=False)
    except Exception as e:
        raise RuntimeError(f"SQL execution failed: {e}")  # halt on error

    if state.get("verbose"):
        print("SQL result:\n", context)

    state["tool_context"] = context
    state["output"]       = context
    return state
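The tool above raises and halts the whole graph when the generated SQL fails. One alternative, sketched here with `sqlite3` standing in for DuckDB, is to catch the error and pass it along as the tool context so the chat agent can tell the user what went wrong. `run_generated_sql` is a hypothetical helper, and `max_rows` is an assumed cap that also keeps big results from flooding the prompt. In the notebook you'd keep the broad `except Exception` around `duckdb.query_df` instead of catching `sqlite3.Error`.

```python
import sqlite3

def run_generated_sql(conn, sql, max_rows=50):
    """Run model-written SQL and return a text context instead of raising."""
    query = sql.strip().rstrip(";")
    try:
        cur = conn.execute(query)
    except sqlite3.Error as e:
        # Hand the failure to the chat agent rather than halting the graph
        return f"SQL execution failed: {e}\nQuery was:\n{query}"
    header = [d[0] for d in cur.description] if cur.description else []
    rows = cur.fetchmany(max_rows + 1)  # fetch one extra row to detect truncation
    lines = [" | ".join(header)] + [" | ".join(map(str, r)) for r in rows[:max_rows]]
    if len(rows) > max_rows:
        lines.append(f"... (truncated to {max_rows} rows)")
    return "\n".join(lines)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_data (client TEXT, quantity INTEGER)")
conn.executemany("INSERT INTO sales_data VALUES (?, ?)", [("Carl", 3), ("Blockbuster", 7)])

print(run_generated_sql(conn, "SELECT client, quantity FROM sales_data ORDER BY client;"))
print(run_generated_sql(conn, "SELECT profit FROM sales_data"))  # no such column
```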


In [0]:
def router_agent(state: AgentState) -> AgentState:
    if state.get("verbose"): print("\n--- ROUTER NODE ---")

    tool_lines = [
        f"- {name}: {desc}"
        for name, desc in (state.get("available_tools") or {}).items()
    ]
    tool_catalog = "\n".join(tool_lines) or "none"

    system_prompt = (
        "You are an AI router. Choose the single best tool for answering the user's latest message.\n\n"
        f"Available tools:\n{tool_catalog}\n\n"
        "Return ONLY a JSON object like {\"tool\": \"chat\"} or {\"tool\": \"sales_data\"}."
    )

    llm_response = databricks_llm(
        [{"role": "system", "content": system_prompt}] +
        [m for m in state["messages"] if m["role"] != "system"],
        model_endpoint=INSTRUCT_ENDPOINT,
        verbose=state.get("verbose", False),
    )

    try:
        start = llm_response.rfind("{")
        end = llm_response.rfind("}")
        decision_json = llm_response[start:end + 1]
        decision = json.loads(decision_json)
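    except Exception:
        decision = {"tool": "chat"}

    # pick_next maps "tool" straight to a node, so anything unrecognized falls back to chat
    if not isinstance(decision, dict) or decision.get("tool") not in ("chat", "sales_data"):
        decision = {"tool": "chat"}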

    if state.get("verbose"): print(f"Routing decision: {decision}")
    state["output"] = json.dumps(decision)
    return state


## 3 Initializing Chat

### 3.1 - Defining Graph

Similar logic as last time, but now the router can send requests to `sales_data_agent`, which connects to `sales_data_tool`, which then hands off to `chat_agent`.

In [0]:
g = StateGraph(AgentState)

g.add_node("router_agent", RunnableLambda(router_agent))
g.add_node("sales_data_agent", RunnableLambda(sales_data_agent))
g.add_node("sales_data_tool", RunnableLambda(sales_data_tool))
g.add_node("chat_agent", RunnableLambda(chat_agent))

g.set_entry_point("router_agent")

def pick_next(state: AgentState) -> str:
    return json.loads(state["output"])["tool"]

g.add_conditional_edges(
    "router_agent",
    pick_next,
    {
        "chat": "chat_agent",
        "sales_data": "sales_data_agent"
    },
)

g.add_edge("sales_data_agent", "sales_data_tool")
g.add_edge("sales_data_tool", "chat_agent")
g.add_edge("chat_agent", END)

assistant_graph = g.compile()

### 3.2 - Chat Loop

In [0]:
chat_history = [
    {'role': 'system', 'content': 'You are a helpful AI Agent. You have access to sales data if needed.'}
]
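while True:
    user_text = input("You: ").strip()
    # Type "exit"/"quit" (or just press Enter) to leave the loop instead of cancelling the cell
    if user_text.lower() in {"", "exit", "quit"}:
        break
    chat_history.append({"role": "user", "content": user_text})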

    state: AgentState = {
        "messages": chat_history,
        "verbose": VERBOSE,
        "available_tools": {
            "chat": "Continue the conversation naturally",
            "sales_data": "Query the internal sales database using SQL"
        },
        "tool_context": None,
        "schema_catalog": schema_catalog
    }

    result = assistant_graph.invoke(state)
    chat_history.append({"role": "assistant", "content": result["output"]})
    if VERBOSE: print("\n---\n")
    print("Assistant:", result["output"])


You:  How many sales to Blockbuster do I have?


--- ROUTER NODE ---

=== LLM CALL → databricks-meta-llama-3-1-8b-instruct
SYSTEM: You are an AI router. Choose the single best tool for answering the user's latest message. Available tools: - [...]
USER: How many sales to Blockbuster do I have?
LLM RESPONSE: {"tool": "sales_data"}
Routing decision: {'tool': 'sales_data'}

--- SALES DATA AGENT NODE ---

=== LLM CALL → databricks-llama-4-maverick
SYSTEM: You are an assistant that writes SQL queries to analyze sales data. The 'sales_data' table has the following [...]
USER: How many sales to Blockbuster do I have?
LLM RESPONSE: ```sql
SELECT COUNT(*) 
FROM sales_data 
WHERE client = 'Blockbuster';
```
Generated SQL:
 SELECT COUNT(*) 
FROM sales_data 
WHERE client = 'Blockbuster';

--- SALES DATA TOOL NODE ---
Executing SQL with DuckDB:
 SELECT COUNT(*) 
FROM sales_data 
WHERE client = 'Blockbuster'

--- CHAT AGENT NODE ---

=== LLM CALL → databricks-llama-4-maverick
SYSTEM: You are a helpful AI Agent. You have access to sales data if nee

You:  Awesome, can you give me the total profit for those sales?


--- ROUTER NODE ---

=== LLM CALL → databricks-meta-llama-3-1-8b-instruct
SYSTEM: You are an AI router. Choose the single best tool for answering the user's latest message. Available tools: - [...]
USER: How many sales to Blockbuster do I have?
ASSISTANT: You have 211 sales to Blockbuster.
USER: Awesome, can you give me the total profit for those sales?
LLM RESPONSE: {"tool": "sales_data"}
Routing decision: {'tool': 'sales_data'}

--- SALES DATA AGENT NODE ---

=== LLM CALL → databricks-llama-4-maverick
SYSTEM: You are an assistant that writes SQL queries to analyze sales data. The 'sales_data' table has the following [...]
USER: How many sales to Blockbuster do I have?
ASSISTANT: You have 211 sales to Blockbuster.
USER: Awesome, can you give me the total profit for those sales?
LLM RESPONSE: To get the total profit from sales to Blockbuster, we first need to identify the total number of sales to Blockbuster and then calculate the total revenue. 

Let's directly write the SQL query fo

You:  What are all the clients I have?


--- ROUTER NODE ---

=== LLM CALL → databricks-meta-llama-3-1-8b-instruct
SYSTEM: You are an AI router. Choose the single best tool for answering the user's latest message. Available tools: - [...]
USER: How many sales to Blockbuster do I have?
ASSISTANT: You have 211 sales to Blockbuster.
USER: Awesome, can you give me the total profit for those sales?
ASSISTANT: The total profit for your 211 sales to Blockbuster is approximately $4,596,505.
USER: What are all the clients I have?
LLM RESPONSE: {"tool": "sales_data"}
Routing decision: {'tool': 'sales_data'}

--- SALES DATA AGENT NODE ---

=== LLM CALL → databricks-llama-4-maverick
SYSTEM: You are an assistant that writes SQL queries to analyze sales data. The 'sales_data' table has the following [...]
USER: How many sales to Blockbuster do I have?
ASSISTANT: You have 211 sales to Blockbuster.
USER: Awesome, can you give me the total profit for those sales?
ASSISTANT: The total profit for your 211 sales to Blockbuster is approximately 

You:  What's the most popular item sold to the president?


--- ROUTER NODE ---

=== LLM CALL → databricks-meta-llama-3-1-8b-instruct
SYSTEM: You are an AI router. Choose the single best tool for answering the user's latest message. Available tools: - [...]
USER: How many sales to Blockbuster do I have?
ASSISTANT: You have 211 sales to Blockbuster.
USER: Awesome, can you give me the total profit for those sales?
ASSISTANT: The total profit for your 211 sales to Blockbuster is approximately $4,596,505.
USER: What are all the clients I have?
ASSISTANT: You have the following clients: 1. Carl 2. Grandma's Cookies 3. The President of the United States 4. VolksWagen [...]
USER: What's the most popular item sold to the president?
LLM RESPONSE: {"tool": "sales_data"}
Routing decision: {'tool': 'sales_data'}

--- SALES DATA AGENT NODE ---

=== LLM CALL → databricks-llama-4-maverick
SYSTEM: You are an assistant that writes SQL queries to analyze sales data. The 'sales_data' table has the following [...]
USER: How many sales to Blockbuster do I have?
AS

You:  
