## Summary: Observability with MLflow and LangGraph  
Overview  
This demo introduces MLflow observability into LangGraph workflows. By tracing and logging each step of a workflow—including LLM invocations and tool usage—developers can monitor, debug, and analyze their pipelines directly from the MLflow UI.

Key Steps Covered  
1. MLflow Setup  
a. Tracking Configuration
A local MLflow server is assumed to be running at http://127.0.0.1:5000.

This address is set as the MLflow tracking URI.


mlflow.set_tracking_uri("http://127.0.0.1:5000")
The experiment is named "udacity".

mlflow.set_experiment("udacity")
b. Manual Trace Example
A simple add() function is traced with @mlflow.trace.

Its inputs and outputs are recorded automatically on each call and can be inspected in the MLflow UI.


@mlflow.trace
def add(a, b):
    return a + b

add(1, 2)
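
To make the idea behind the decorator concrete, here is a minimal standard-library sketch of a tracing decorator. It is not MLflow's implementation: the `trace` helper and the in-memory `TRACES` list are hypothetical stand-ins, and MLflow sends comparable data (name, inputs, output, timing) to the tracking server instead of a Python list.

```python
import functools
import inspect
import time

# In-memory stand-in for a trace store (assumption: illustration only).
TRACES = []


def trace(func):
    """Record each call's inputs, output, and duration (hypothetical helper)."""
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Map positional and keyword arguments to their parameter names.
        bound = signature.bind(*args, **kwargs)
        bound.apply_defaults()
        start = time.perf_counter()
        result = func(*args, **kwargs)
        TRACES.append(
            {
                "name": func.__name__,
                "inputs": dict(bound.arguments),
                "output": result,
                "duration_s": time.perf_counter() - start,
            }
        )
        return result

    return wrapper


@trace
def add(a, b):
    return a + b


add(1, 2)
print(TRACES[0]["name"], TRACES[0]["inputs"], TRACES[0]["output"])
# prints: add {'a': 1, 'b': 2} 3
```

The decorated function behaves exactly like the original; the record is a side effect, which is why tracing can be added without touching the function body.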

In [1]:
import mlflow
import os
from typing import Dict
from tavily import TavilyClient
from langchain_openai import ChatOpenAI
from langchain_core.tools import tool
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import MessagesState
from langgraph.graph import START, END, StateGraph
from langgraph.prebuilt import ToolNode
from langchain_core.messages import HumanMessage, SystemMessage
from IPython.display import Image, display

In [2]:
tracking_uri = "http://127.0.0.1:5000"
mlflow.set_tracking_uri(tracking_uri)

In [3]:
experiment = mlflow.set_experiment("udacity")

# Note: this call needs a running MLflow tracking server at the tracking URI; when nothing is listening there, it fails with:

MlflowException: API request to http://127.0.0.1:5000/api/2.0/mlflow/experiments/get-by-name failed with exception HTTPConnectionPool(host='127.0.0.1', port=5000): Max retries exceeded with url: /api/2.0/mlflow/experiments/get-by-name?experiment_name=udacity (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x000001D5892D2030>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it'))
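
The connection error above means nothing accepted a connection on 127.0.0.1:5000. One way to fail earlier and with a clearer message is to probe the port before calling `mlflow.set_experiment`. The following is a minimal sketch using only the standard library; `server_reachable` is a hypothetical helper, not part of MLflow, and it only checks that a TCP connection succeeds, not that the listener is actually an MLflow server.

```python
import socket
from urllib.parse import urlparse


def server_reachable(uri: str, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to the URI's host and port succeeds."""
    parsed = urlparse(uri)
    host = parsed.hostname
    # Fall back to the scheme's default port when the URI does not name one.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


tracking_uri = "http://127.0.0.1:5000"
if server_reachable(tracking_uri):
    print(f"Tracking server is reachable at {tracking_uri}")
else:
    print(f"Nothing is listening at {tracking_uri}; start the server first")
```

If the check fails, starting a local server in a separate terminal (for example with `mlflow server --host 127.0.0.1 --port 5000`) and re-running the cell should resolve the error, assuming MLflow is installed in that environment.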

In [None]:
@mlflow.trace
def add(a, b):
    return a + b

In [None]:
add(1, 2)

## 3. LangChain Autologging Integration  
mlflow.langchain.autolog() is called to enable automatic logging of LangChain events:

LLM inputs and outputs

Tool call traces

Message sequences

Token usage and performance metrics


mlflow.langchain.autolog()

In [None]:
from dotenv import load_dotenv
load_dotenv()
llm = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0.0,
)

In [None]:
@tool
def web_search(question: str) -> Dict:
    """
    Return top search results for a given search query
    """
    tavily_client = TavilyClient(api_key=os.getenv("TAVILY_API_KEY"))
    response = tavily_client.search(question)
    return response

In [None]:
class State(MessagesState):
    question: str
    answer: str

In [None]:
llm_with_tools = llm.bind_tools([web_search])

In [None]:
def entry_point(state: State):
    question = state["question"]
    system_message = SystemMessage("You conduct web search to respond to user's questions")
    human_message = HumanMessage(question)
    messages = [system_message, human_message]
    return {"messages": messages}

In [None]:
def agent(state: State):
    messages = state["messages"]
    ai_message = llm_with_tools.invoke(messages)
    return {"messages": ai_message, "answer": ai_message.content}

In [None]:
def router(state: MessagesState):
    last_message = state["messages"][-1]
    if last_message.tool_calls:
        return "tools"

    return END
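
The routing decision can be checked in isolation, without calling the LLM. The sketch below repeats the router logic against a hypothetical `FakeAIMessage` class that carries only the fields the router reads; the local `END` string is a stand-in for LangGraph's sentinel so the example runs without LangGraph installed.

```python
from dataclasses import dataclass, field

# Stand-in for langgraph's END sentinel (assumption: illustration only).
END = "__end__"


@dataclass
class FakeAIMessage:
    """Hypothetical minimal message with only the fields the router reads."""
    content: str = ""
    tool_calls: list = field(default_factory=list)


def router(state: dict) -> str:
    last_message = state["messages"][-1]
    # A non-empty tool_calls list means the model asked for a tool.
    if last_message.tool_calls:
        return "tools"
    return END


wants_tool = FakeAIMessage(
    tool_calls=[{"name": "web_search", "args": {"question": "capital of Brazil"}}]
)
final_answer = FakeAIMessage(content="The capital of Brazil is Brasília.")

print(router({"messages": [wants_tool]}))    # prints: tools
print(router({"messages": [final_answer]}))  # prints: __end__
```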

In [None]:
workflow = StateGraph(State)
workflow.add_node("entry_point", entry_point)
workflow.add_node("agent", agent)
workflow.add_node("tools", ToolNode([web_search]))

workflow.add_edge(START, "entry_point")
workflow.add_edge("entry_point", "agent")
workflow.add_conditional_edges(
    source="agent", 
    path=router, 
    path_map=["tools", END]
)
workflow.add_edge("tools", "agent")

In [None]:
memory = MemorySaver()
graph = workflow.compile(
    interrupt_before=["tools"], 
    checkpointer=memory
)

In [None]:
display(
    Image(
        graph.get_graph().draw_mermaid_png()
    )
)

In [None]:
mlflow.langchain.autolog()

## 4. Execution Example  
Input question: "what's the capital of Brazil?"

Initial invocation:  

System and human messages are appended.

The agent node recognizes the need for a tool call (no direct answer yet).

MLflow logs the trace up to the tool call breakpoint.
  
a. Tool Node Execution  
The tool node (Tavily web search) is executed.

Output: Top results related to the question are logged.

b. Final Agent Node Execution  
With the web search response in memory, the agent generates a complete answer:

"The capital of Brazil is Brasília..."

Includes citations or source links when available.
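
The pause-and-resume flow described above can be sketched with a toy graph runner. This is a simplified illustration, not LangGraph's implementation: the node functions, the `run` helper, and the `CHECKPOINTS` dictionary are all hypothetical, and the messages are placeholder strings rather than real LLM or search output.

```python
# Each toy node takes the state dict and returns (updated_state, next_node).
def entry_point(state):
    return {**state, "messages": [f"user: {state['question']}"]}, "agent"


def agent(state):
    # First pass: no tool result yet, so request the tool. Second pass: answer.
    if not any(m.startswith("tool:") for m in state["messages"]):
        return {**state, "messages": state["messages"] + ["agent: call web_search"]}, "tools"
    return {**state, "messages": state["messages"] + ["agent: final answer"]}, None


def tools(state):
    return {**state, "messages": state["messages"] + ["tool: search results"]}, "agent"


NODES = {"entry_point": entry_point, "agent": agent, "tools": tools}
CHECKPOINTS = {}  # thread_id -> (state, next_node); stand-in for a checkpointer


def run(thread_id, state=None, interrupt_before=("tools",)):
    """Run until the graph finishes or reaches a node listed in interrupt_before."""
    if state is None:
        # Resume: load the saved state and step past the breakpoint once.
        state, node = CHECKPOINTS[thread_id]
        resuming = True
    else:
        node, resuming = "entry_point", False
    while node is not None:
        if node in interrupt_before and not resuming:
            break  # pause before executing this node
        resuming = False
        state, node = NODES[node](state)
    CHECKPOINTS[thread_id] = (state, node)
    return state, node


state, next_node = run(1, {"question": "what's the capital of Brazil?"})
print(next_node)              # prints: tools
state, next_node = run(1)     # passing no state resumes from the checkpoint
print(next_node)              # prints: None
print(state["messages"][-1])  # prints: agent: final answer
```

The saved `next_node` plays the role of `state.next` in the notebook: it names the node that will run when execution resumes.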

In [None]:
input_question = {"question": "what's the capital of Brazil?"}
config = {"configurable": {"thread_id": 1}}

In [None]:
for event in graph.stream(input=input_question, config=config, stream_mode="values"):
    if not event['messages']:
        continue
    event['messages'][-1].pretty_print()

In [None]:
state = graph.get_state(config=config)

In [None]:
state.next

In [None]:
for event in graph.stream(input=None, config=config, stream_mode="values"):
    if not event['messages']:
        continue
    event['messages'][-1].pretty_print()

## 5. Reviewing the Trace in MLflow UI  
Each node’s inputs and outputs are logged:

Entry point: initial user input and message formatting.

Agent: LLM messages and tool call info.

Tool: external API invocation and response data.

Final agent call: formatted answer to the user.

MLflow panels show:

Run timeline

Artifact logs

Token counts

Inputs/outputs per node

## 6. Key Concepts Highlighted  
Breakpoints provide pause-and-inspect control.

Traces log the full context of decision-making and tool use.

The combination of MLflow and LangChain brings transparency and observability to LLM-based workflows.

This makes it easier to debug issues, understand performance, and trace final outputs back to the original prompts or tool responses.

## 7. Conclusion  
Integrating MLflow with LangGraph and LangChain gives developers critical insight into how AI workflows behave.

Each component’s behavior becomes traceable, auditable, and optimizable.

This observability is crucial for safe, production-grade LLM applications.