# Evaluate and Trace LangGraph MCP Tool Selection

This notebook demonstrates how to use TruLens to trace and evaluate LangGraph applications that use Model Context Protocol (MCP) tools.

## Overview

We'll build a health research agent that uses MCP tools to query:
- PubMed for medical literature
- Clinical trials databases for trial information

TruLens will automatically trace all tool calls, showing:
- Which MCP tools are being called
- Input arguments and outputs
- Execution time and errors
- Full conversation flow in the dashboard

## Setup

First, update your account settings to create an allow-all network policy and to enable cross-region inference for calling `claude-sonnet-4-5`. To do this, copy `alter_account_settings.sql` into a Snowflake SQL worksheet and run it.


## Get AI-ready data

Then, in your Snowflake account, access the Clinical Trials and PubMed listings and load them into your account by following the steps below:

1. Sign in to Snowsight.

2. In the navigation menu, select Marketplace.

3. Search or browse to the listing you want to access.

4. Select Get to access a listing already available in your region. A dialog opens with details about the listing. If you have to request the listing to be replicated to your region, select Request.

5. (Optional) Specify a database name for the data in the listing.

6. (Optional) Add roles to grant access to the database created from the listing.

7. Select Get.

8. In the confirmation dialog that appears, select Open to open a Snowsight worksheet with an example query in a new tab, or select Done.

In [None]:
import os
# Configure API keys and MCP server connection
os.environ["OPENAI_API_KEY"] = "sk-proj-..."
os.environ["SNOWFLAKE_PAT"] = "ey..."
os.environ["SNOWFLAKE_ACCOUNT"] = "..."
os.environ["SNOWFLAKE_USER"] = "..."
os.environ["SNOWFLAKE_MCP_SERVER_URL"] = "https://<account-id>.snowflakecomputing.com/api/v2/databases/health_db/schemas/mcp/mcp-servers/health_mcp_server"

In [None]:
from snowflake.snowpark import Session
import os

snowflake_connection_parameters = {
    "account": os.getenv("SNOWFLAKE_ACCOUNT"),
    "user": os.getenv("SNOWFLAKE_USER"),
    "password": os.getenv("SNOWFLAKE_PAT"),
    "database": "health_db",
    "schema": "public",
}

snowpark_session = Session.builder.configs(
    snowflake_connection_parameters
).create()
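
A missing or placeholder environment variable tends to surface only as an opaque connection error, so a small pre-flight check can save time. The following is a minimal sketch, not part of TruLens or Snowpark: `missing_env_vars` is a hypothetical helper, and treating values that end in `...` as placeholders is an assumption based on the sample values in this notebook.

```python
import os

# Variables this notebook reads when connecting to Snowflake and the MCP server.
REQUIRED_VARS = (
    "SNOWFLAKE_PAT",
    "SNOWFLAKE_ACCOUNT",
    "SNOWFLAKE_USER",
    "SNOWFLAKE_MCP_SERVER_URL",
)


def missing_env_vars(names=REQUIRED_VARS, env=None):
    """Return the names that are unset, empty, or still a '...' placeholder."""
    env = os.environ if env is None else env
    missing = []
    for name in names:
        value = env.get(name, "").strip()
        # Assumption: sample values such as "ey..." were never replaced.
        if not value or value.endswith("..."):
            missing.append(name)
    return missing
```

Calling `missing_env_vars()` before `Session.builder` and stopping when the list is non-empty gives a clearer failure than a rejected login.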

## Create MCP Client and Get Tools

We'll use the `MultiServerMCPClient` from `langchain_mcp_adapters` to connect to the health research MCP server and retrieve available tools.


In [None]:
from langchain_mcp_adapters.client import MultiServerMCPClient
from langchain_snowflake import ChatSnowflake
from langchain_openai import ChatOpenAI
from langgraph.graph import START
from langgraph.graph import MessagesState
from langgraph.graph import StateGraph
from langgraph.prebuilt import ToolNode
from langgraph.prebuilt import tools_condition

client = MultiServerMCPClient({
    "health_research": {
        "transport": "streamable_http",
        "url": os.environ["SNOWFLAKE_MCP_SERVER_URL"],
        "headers": {"Authorization": f"Bearer {os.environ['SNOWFLAKE_PAT']}"},
    }
})
tools = await client.get_tools()
model = ChatOpenAI(model="gpt-4o")
# model = ChatSnowflake(model="claude-sonnet-4-5", snowpark_session=snowpark_session)
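
Before wiring the tools into a graph, it can help to check what the server actually exposed. The sketch below assumes only that each tool has `name` and `description` attributes, as LangChain tools do. `summarize_tools` is a hypothetical helper, and the stand-in objects exist only so the example runs without an MCP server.

```python
from types import SimpleNamespace


def summarize_tools(tools, width=80):
    """Return one 'name: description' line per tool, truncated to `width`."""
    lines = []
    for tool in tools:
        description = (getattr(tool, "description", "") or "").strip()
        first_line = (
            description.splitlines()[0] if description else "(no description)"
        )
        lines.append(f"{tool.name}: {first_line}"[:width])
    return lines


# Stand-in tools for illustration; replace with the `tools` list from the client.
demo_tools = [
    SimpleNamespace(name="pubmed_search", description="Search PubMed abstracts."),
    SimpleNamespace(name="clinical_trials_search", description=None),
]
for line in summarize_tools(demo_tools):
    print(line)
```

In the notebook itself, `summarize_tools(tools)` would list the tools returned by `client.get_tools()`.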

## Build the LangGraph Agent

Now we'll create a LangGraph application with:
1. **call_model** node - The LLM that decides which tools to use
2. **tools** node - Executes the selected MCP tools
3. **tools_condition** - Routes between the model and tools

The graph will loop between the model and tools until the agent has enough information to answer the question.
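
The routing rule behind that loop can be sketched in plain Python. This is a simplified stand-in for what `tools_condition` decides, not the library's implementation: if the latest message requests tool calls, control goes to the `tools` node, otherwise the graph ends. `route_after_model` and the stand-in messages are hypothetical names used only for illustration.

```python
from types import SimpleNamespace


def route_after_model(state):
    """Return "tools" if the last message requests tool calls, else "__end__"."""
    last_message = state["messages"][-1]
    if getattr(last_message, "tool_calls", None):
        return "tools"
    return "__end__"


# Stand-in messages: one that asks for a tool, one that is a final answer.
wants_tool = SimpleNamespace(
    content="",
    tool_calls=[{"name": "pubmed_search", "args": {"query": "semaglutide"}, "id": "1"}],
)
final_reply = SimpleNamespace(content="Here is a summary.", tool_calls=[])

print(route_after_model({"messages": [wants_tool]}))
print(route_after_model({"messages": [wants_tool, final_reply]}))
```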


In [None]:
import asyncio

# Define the call_model function
async def call_model(state: MessagesState):
    response = await model.bind_tools(tools).ainvoke(state["messages"])
    return {"messages": response}


# Create the StateGraph
builder = StateGraph(MessagesState)
builder.add_node(call_model)
builder.add_node(ToolNode(tools))
builder.add_edge(START, "call_model")
builder.add_conditional_edges(
    "call_model",
    tools_condition,
)
builder.add_edge("tools", "call_model")
graph = builder.compile()

class Agent:
    def __init__(self, graph):
        import nest_asyncio
        nest_asyncio.apply()
        self.graph = graph

    async def ainvoke(self, messages):
        """Async version"""
        return await self.graph.ainvoke({"messages": messages})
    
    def invoke(self, messages):
        """Sync wrapper around async method"""
        return asyncio.run(self.ainvoke(messages))

agent = Agent(graph)

In [None]:
# response = agent.invoke("How do semaglutide and tirzepatide compare in published studies, and what head-to-head clinical trials are recruiting patients?")
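
The graph returns its full state, so the answer is the content of the last message. The sketch below is a hypothetical helper (`final_answer`) that assumes message objects expose a `content` attribute, which may be either a string or a list of content blocks depending on the chat model.

```python
from types import SimpleNamespace


def final_answer(response):
    """Return the text of the last message in a graph response dict."""
    messages = response.get("messages", [])
    if not messages:
        return ""
    content = messages[-1].content
    if isinstance(content, str):
        return content
    # Some chat models return a list of blocks; keep only the text parts.
    parts = []
    for block in content:
        if isinstance(block, str):
            parts.append(block)
        elif isinstance(block, dict) and block.get("type") == "text":
            parts.append(block.get("text", ""))
    return "".join(parts)


# Stand-in response for illustration; a real one comes from agent.invoke(...).
demo_response = {
    "messages": [
        SimpleNamespace(content="What trials are recruiting?"),
        SimpleNamespace(content=[{"type": "text", "text": "Two trials match."}]),
    ]
}
print(final_answer(demo_response))
```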

## Initialize TruLens Session

Create a `SnowflakeConnector` from the Snowpark session so that TruLens stores traces and evaluations in Snowflake.

In [None]:
from snowflake.snowpark import Session
from trulens.connectors.snowflake import SnowflakeConnector

sf_connector = SnowflakeConnector(snowpark_session=snowpark_session)

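## Create Tool Selection Evaluations

The two metrics in this section, Tool Selection and Tool Calling, are judged by a Cortex-hosted `claude-sonnet-4-5` model and evaluated over the whole trace (`Selector(trace_level=True)`) rather than over a single span.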

In [None]:
from trulens.core.feedback.custom_metric import MetricConfig
from trulens.core.feedback.selector import Selector
from trulens.providers.cortex import Cortex

provider = Cortex(
    model_engine="claude-sonnet-4-5", snowpark_session=snowpark_session
)
f_tool_selection = MetricConfig(
    metric_name="Tool Selection",
    metric_implementation=provider.tool_selection_with_cot_reasons,
    selectors={
        "trace": Selector(trace_level=True),
    },
)

f_tool_calling = MetricConfig(
    metric_name="Tool Calling",
    metric_implementation=provider.tool_calling_with_cot_reasons,
    selectors={
        "trace": Selector(trace_level=True),
    },
)

metrics_to_compute = [
    f_tool_selection,
    f_tool_calling,
]

## Record Agent Execution with TruLens

Wrap the LangGraph application with `TruGraph` to automatically instrument and trace all executions.

TruLens will capture:
- Each node execution in the graph
- MCP tool calls with their names (e.g., `pubmed_search`, `clinical_trials_search`)
- Input/output states at each step
- LLM generation calls
- Tool routing decisions

The trace will show the complete flow of the agent's reasoning and tool usage.


In [None]:
from trulens.apps.langgraph import TruGraph

tru_app = TruGraph(
    app=agent,
    app_name="healthagent8",
    app_version="base",
    main_method=agent.invoke,
    connector=sf_connector,
)

In [None]:
import pandas as pd

queries = [
    "How do semaglutide and tirzepatide compare in published studies, and what head-to-head clinical trials are recruiting patients?",
    "What are the latest clinical trials for Alzheimer's disease?",
    "What is the primary indicator for the drug Xeljanz?",
]

queries_df = pd.DataFrame(queries, columns=["query"])

In [None]:
import uuid

from trulens.core.run import Run
from trulens.core.run import RunConfig

run_name = f"health_queries_run_{uuid.uuid4()}"

run_config = RunConfig(
    run_name=run_name,
    dataset_name="health_research_queries",
    source_type="DATAFRAME",
    dataset_spec={"RECORD_ROOT.INPUT": "query"},
)

run: Run = tru_app.add_run(run_config=run_config)
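
The `dataset_spec` above maps the record input to the `query` column, so the DataFrame passed to the run needs a column with exactly that name. The following is a minimal sketch of a check for that, where `missing_dataset_columns` is a hypothetical helper and not a TruLens function.

```python
def missing_dataset_columns(dataset_spec, columns):
    """Return the column names used in dataset_spec that `columns` lacks."""
    available = set(columns)
    return sorted(
        column for column in dataset_spec.values() if column not in available
    )


spec = {"RECORD_ROOT.INPUT": "query"}
print(missing_dataset_columns(spec, ["query"]))
print(missing_dataset_columns(spec, ["question"]))
```

In this notebook the second argument would be `queries_df.columns`. An empty result means every mapped column is present.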

In [None]:
run.start(input_df=queries_df)

## Compute Metrics

In [None]:
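import time

# Poll until invocation finishes. The 10-minute limit is an arbitrary safeguard
# so that a failed run cannot block the notebook forever; adjust as needed.
deadline = time.monotonic() + 600
while run.get_status() != "INVOCATION_COMPLETED":
    if time.monotonic() > deadline:
        raise TimeoutError(f"Run still in status {run.get_status()!r}")
    time.sleep(3)

run.compute_metrics(metrics_to_compute)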

In [None]:
run.get_status()

## View the Agent Response

Let's see what the agent found:
