# MLflow 08: Advanced Function-Calling and Agent-to-Agent Protocols in LLM Apps

Welcome to Notebook 8 of our MLflow series! In [Notebook 7](MLflow_07_Tool_Calling_Agents_with_LangGraph_Ollama_and_MLflow.ipynb), we built our first tool-calling agent using LangGraph and Ollama, tracing its actions with MLflow. Now, we're taking it a step further by exploring:

1.  **Advanced Function Calling:** Using Pydantic models to define robust schemas for our tools, enabling LLMs to generate more structured and reliable arguments for function execution.
2.  **Agent-to-Agent (A2A) Interaction Concepts:** Designing systems where multiple specialized agents (or agentic components) collaborate to achieve a common goal. We'll implement a hierarchical agent system using LangGraph.

We'll continue to use **LangGraph** for building these sophisticated agentic workflows, **Ollama** for local LLM capabilities (with models like `phi3:mini` or `Qwen3-0.6B`), and **MLflow Tracing** to meticulously capture and visualize these complex interactions.

![Agent Collaboration Concept](https://www.researchgate.net/publication/344421802/figure/fig1/AS:942773028655106@1601794726089/Conceptual-model-of-a-multi-agent-system-MAS-architecture-The-MAS-is-composed-of.png)

Let's dive into building more intelligent and collaborative LLM applications!

---

## Table of Contents

1. [Recap: Tool-Calling Agents](#recap-tool-calling)
2. [Advanced Function Calling with Pydantic](#advanced-function-calling-pydantic)
    - [Why Pydantic for Tool Schemas?](#why-pydantic)
    - [Defining Tools with Pydantic Schemas](#defining-pydantic-tools)
    - [Integrating Pydantic Tools with LangChain/LangGraph](#integrating-pydantic-tools)
3. [Setting Up the Multi-Agent Environment](#setting-up-multi-agent-env)
    - [Installing Libraries](#installing-libraries-multi-agent)
    - [Ollama and LLM Setup](#ollama-llm-setup-multi-agent)
    - [MLflow Configuration (Tracing Focus)](#mlflow-config-multi-agent)
4. [Building a Hierarchical Multi-Agent System with LangGraph](#building-hierarchical-mas)
    - [Scenario: Automated Content Generation Assistant](#scenario-content-gen)
    - [Defining Agent Roles and Tools (with Pydantic)](#defining-agent-roles-tools)
        - Tool 1: Market Research Tool (mock)
        - Tool 2: Slogan Generation Tool (mock LLM call)
    - [Defining the Overall Agent State](#defining-overall-agent-state)
    - [Supervisor Agent Logic (Router)](#supervisor-agent-logic)
    - [Worker Agent Nodes (Tool Executor, specialized LLM calls)](#worker-agent-nodes)
    - [Constructing the Multi-Agent Graph](#constructing-multi-agent-graph)
5. [Running and Tracing the Multi-Agent System](#running-tracing-multi-agent)
    - [Invoking the Multi-Agent System](#invoking-multi-agent)
    - [Analyzing Multi-Agent Traces in MLflow UI](#analyzing-multi-agent-traces)
6. [Conceptual Overview: Other Agent-to-Agent Protocols](#overview-a2a-protocols)
    - [Message Passing, Shared State, Blackboard Systems](#message-passing-shared-state)
    - [Protocols like MCP (Model Context Protocol)](#mcp-concept)
7. [Key Takeaways](#key-takeaways-multi-agent)
8. [Engaging Resources and Further Reading](#resources-further-reading-multi-agent)

---

## 1. Recap: Tool-Calling Agents

In [Notebook 7](MLflow_07_Tool_Calling_Agents_with_LangGraph_Ollama_and_MLflow.ipynb), we built an agent that could decide to use simple tools based on user queries. Key components included:
- An LLM (from Ollama) for reasoning and tool selection.
- Tools defined as Python functions (e.g., weather, calculator).
- LangGraph to orchestrate the flow: LLM call -> (optional) Tool Execution -> LLM call for final response.
- MLflow to trace these interactions.

Now, we'll enhance the robustness of tool definitions and explore how multiple agent-like components can collaborate.

---

## 2. Advanced Function Calling with Pydantic

LLMs that support function calling (or tool calling) are trained to generate a structured JSON object containing the name of the function to call and the arguments to pass to it. Providing a clear and precise schema for these functions is crucial for reliable performance.

### Why Pydantic for Tool Schemas?
**Pydantic** is a data validation and settings management library using Python type annotations. Using Pydantic models to define the expected arguments for your tools offers several advantages:
- **Clear Schemas:** Type hints define the expected data types and whether fields are required or optional, while `Field(description=...)` supplies the per-argument descriptions the LLM sees.
- **Automatic JSON Schema Generation:** Pydantic models can automatically generate JSON Schemas, which is the format most LLMs expect for tool definitions.
- **Data Validation:** When the LLM generates arguments, Pydantic can validate them against your schema before your tool code even runs, catching errors early.
- **IDE Support:** Better autocompletion and type checking in your development environment.

LangChain has excellent integration with Pydantic for defining tool arguments [1, 2].
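
To make the schema-generation and validation points above concrete, here is a minimal sketch that uses Pydantic on its own, without LangChain. It assumes Pydantic v2 (`model_json_schema`, `model_validate`); the `DemoSearchArgs` model and its `ge`/`le` bounds are hypothetical and exist only for this illustration.

```python
from pydantic import BaseModel, Field, ValidationError


class DemoSearchArgs(BaseModel):
    """Hypothetical argument schema, used only for this illustration."""

    query: str = Field(description="The search query string.")
    num_results: int = Field(default=3, ge=1, le=10, description="Maximum results to return.")


# 1. Pydantic derives a JSON Schema from the type hints and Field metadata.
schema = DemoSearchArgs.model_json_schema()
required_fields = schema["required"]  # only fields without a default are required

# 2. Well-formed arguments are parsed into a typed object.
valid_args = DemoSearchArgs.model_validate({"query": "AI benefits", "num_results": 2})

# 3. Malformed arguments are rejected before any tool code would run.
try:
    DemoSearchArgs.model_validate({"num_results": "many"})  # missing query, non-integer count
    error_fields = []
except ValidationError as exc:
    error_fields = sorted(str(err["loc"][0]) for err in exc.errors())

print(required_fields)
print(valid_args)
print(error_fields)
```

The third step is the one that matters most with small local models: a malformed argument payload surfaces as a `ValidationError` that names the offending fields, rather than as an obscure failure inside the tool body.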

### Defining Tools with Pydantic Schemas
Let's redefine a simple tool using Pydantic for its arguments.

In [None]:
from pydantic import BaseModel, Field
from langchain_core.tools import tool # Re-import for clarity if needed

class SearchToolInput(BaseModel):
    query: str = Field(description="The search query string for information retrieval.")
    num_results: int = Field(default=3, description="The maximum number of search results to return.")

@tool(args_schema=SearchToolInput)
def web_search_mock(query: str, num_results: int = 3) -> str:
    """Simulates a web search for a given query and returns a specified number of mock results."""
    print(f"--- Tool Called: web_search_mock(query='{query}', num_results={num_results}) ---")
    # In a real scenario, this would call a search API (Google, Bing, Tavily, etc.)
    mock_results = [
        f"Mock result 1 for '{query}': The importance of AI in modern technology.",
        f"Mock result 2 for '{query}': Recent advancements in renewable energy sources.",
        f"Mock result 3 for '{query}': A guide to effective project management.",
        f"Mock result 4 for '{query}': The history of the internet.",
        f"Mock result 5 for '{query}': Understanding climate change impacts."
    ]
    return f"Found {min(num_results, len(mock_results))} results for '{query}':\n" + "\n".join(mock_results[:num_results])

print("Web search tool with Pydantic schema defined.")
# You can inspect the JSON schema LangChain generates for the LLM:
# from langchain_core.utils.function_calling import convert_to_openai_tool
# print(convert_to_openai_tool(web_search_mock))

When this tool is provided to an LLM capable of function calling, the LLM will receive the JSON schema derived from `SearchToolInput`. If it decides to call `web_search_mock`, it will attempt to generate arguments matching this schema (e.g., `{"query": "AI benefits", "num_results": 2}`).
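
The round trip described above (the model emits a tool name plus JSON arguments, and the application executes the matching function) can be sketched with the standard library alone. This is an illustration of the idea, not LangChain's implementation: `demo_search`, `TOOL_REGISTRY`, and `dispatch_tool_call` are hypothetical names, and the payload shape simply mirrors the `name`/`args`/`id` keys used later in this notebook.

```python
import json
from typing import Any, Callable, Dict


def demo_search(query: str, num_results: int = 3) -> str:
    """Hypothetical stand-in for a real tool function."""
    return f"{num_results} results for {query!r}"


# Map tool names (as the LLM would emit them) to Python callables.
TOOL_REGISTRY: Dict[str, Callable[..., str]] = {"demo_search": demo_search}


def dispatch_tool_call(tool_call: Dict[str, Any]) -> str:
    """Look up the requested tool and call it with the generated arguments."""
    name = tool_call["name"]
    if name not in TOOL_REGISTRY:
        return f"Error: unknown tool {name!r}"
    try:
        return TOOL_REGISTRY[name](**tool_call["args"])
    except TypeError as exc:  # e.g. an argument name the function does not accept
        return f"Error: bad arguments for {name!r}: {exc}"


# A tool call as it might arrive from the model, serialized as JSON.
raw_call = '{"name": "demo_search", "args": {"query": "AI benefits", "num_results": 2}, "id": "call_1"}'
result = dispatch_tool_call(json.loads(raw_call))
unknown = dispatch_tool_call({"name": "no_such_tool", "args": {}, "id": "call_2"})

print(result)
print(unknown)
```

Returning an error string instead of raising keeps the agent loop alive, so the message can be fed back to the model as a tool result and give it a chance to correct the call.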

---

## 3. Setting Up the Multi-Agent Environment

### Installing Libraries

In [None]:
!pip install --quiet mlflow langchain langgraph langchain_community langchain_core langchain_ollama pydantic tiktoken

import mlflow
import os
import operator
from typing import TypedDict, Annotated, List, Union, Optional, Sequence
import json # For pretty printing tool calls

from langchain_core.messages import BaseMessage, HumanMessage, AIMessage, ToolMessage, SystemMessage
# from langchain_core.tools import tool # Already imported
from langchain_ollama.chat_models import ChatOllama
from langgraph.graph import StateGraph, END
from langgraph.checkpoint.memory import MemorySaver # If needed for more complex state persistence

print(f"MLflow Version: {mlflow.__version__}")
import langchain
print(f"Langchain Version: {langchain.__version__}") 
import langgraph
print(f"LangGraph Version: {langgraph.__version__}")

### Ollama and LLM Setup
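We'll use `phi3:mini` again because it is small and fast. Tool-calling support in Ollama varies by model and version, so confirm that the model you pull advertises tool support; if calls through `bind_tools` fail, switch to a model that does (for example `qwen3:0.6b`). Ensure Ollama is running and the model is pulled.

In [None]:
ollama_model_name = "phi3:mini" # Or another pulled model with tool support, e.g. "qwen3:0.6b" / "llama3.1:8b"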
llm = None
try:
    llm = ChatOllama(
        model=ollama_model_name, 
        temperature=0, # Deterministic for agent decisions
        keep_alive="5m"
    )
    response_test = llm.invoke("Test connection to Ollama.")
    print(f"Ollama ({ollama_model_name}) connected. Test response: {response_test.content[:50]}...")
except Exception as e:
    print(f"Error connecting to Ollama ({ollama_model_name}): {e}. Ensure Ollama is running and model is pulled.")

### MLflow Configuration (Tracing Focus)

In [None]:
mlflow.set_tracking_uri('mlruns')
experiment_name = "LangGraph_Advanced_FunctionCalling_A2A"
mlflow.set_experiment(experiment_name)

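# Enable LangChain/LangGraph autologging; traces are captured by default.
# The accepted keyword arguments differ between MLflow releases, so the call
# is kept argument-free here; run-level tags are set on the run in Section 5.
mlflow.langchain.autolog()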

print(f"MLflow Experiment set to: {experiment_name}. Autologging enabled.")

---

## 4. Building a Hierarchical Multi-Agent System with LangGraph

We'll simulate a multi-agent system where a "Supervisor" agent coordinates tasks between specialized "Worker" agents. This is a common and powerful pattern for building complex applications.

### Scenario: Automated Content Generation Assistant
**User Request:** "Draft an email to potential investors about our new AI-powered recipe generation app. Highlight its benefits, mention the current market size for food tech apps (use web search), and include a short, catchy slogan for the app."

**Agent Roles:**
- **Supervisor Agent:** Receives the main request, decides which worker agent needs to act next (researcher or slogan writer), routes tasks, and aggregates results for final drafting.
- **Market Researcher Agent (Worker 1):** Uses the `web_search_mock` tool to find market size information.
- **Slogan Writer Agent (Worker 2):** Generates a catchy slogan (simulated here with the `generate_slogan_mock` tool; a real system could use a dedicated LLM call).
- **Email Drafter (Final Step):** An LLM call that takes all gathered information and drafts the final email.

### Defining Agent Roles and Tools (with Pydantic)

#### Tool 1: Market Research Tool (already defined `web_search_mock`)

#### Tool 2: Slogan Generation Tool

In [None]:
class SloganToolInput(BaseModel):
    product_name: str = Field(description="The name of the product or app.")
    product_description: str = Field(description="A brief description of the product or app.")

@tool(args_schema=SloganToolInput)
def generate_slogan_mock(product_name: str, product_description: str) -> str:
    """Generates a catchy slogan for a given product name and description. (Mock implementation)"""
    print(f"--- Tool Called: generate_slogan_mock(product_name='{product_name}', description='{product_description[:30]}...') ---")
    # In a real scenario, this could be another LLM call with a specific prompt for slogan generation.
    if "recipe" in product_description.lower() or "recipe" in product_name.lower():
        return f"'{product_name}: Your Kitchen's AI Companion!' or '{product_name}: Cook Smarter, Not Harder!'"
    else:
        return f"'{product_name}: Innovation Delivered!'"

all_tools = [web_search_mock, generate_slogan_mock]
if llm:
    llm_with_all_tools = llm.bind_tools(all_tools)
    print("LLM bound with all defined tools.")
else:
    llm_with_all_tools = None

### Defining the Overall Agent State
The state needs to hold the initial request, intermediate results from worker agents (market research, slogan), and the final drafted email.

In [None]:
class MasterAgentState(TypedDict):
    original_request: str
    messages: Annotated[List[BaseMessage], operator.add] # Conversation history, tool calls/responses
    # TypedDict fields cannot declare default values; keys missing from the
    # initial state are read with state.get(...), which returns None.
    market_research_data: Optional[str]
    slogan_text: Optional[str]
    drafted_email: Optional[str]
    next_action: Optional[str] # Next routing decision, recorded by the supervisor

print("MasterAgentState defined.")
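
The `Annotated[..., operator.add]` annotation on `messages` tells LangGraph to combine a node's update with the existing value instead of replacing it. The following standard-library sketch mimics that merge rule to show the effect. It is a simplified illustration under that assumption, not LangGraph's actual implementation, and `DemoState` and `apply_update` are hypothetical names.

```python
import operator
from typing import Annotated, Any, Dict, List, Optional, TypedDict, get_type_hints


class DemoState(TypedDict, total=False):
    messages: Annotated[List[str], operator.add]  # reducer: append new items
    slogan_text: Optional[str]                    # no reducer: last write wins


def apply_update(state: Dict[str, Any], update: Dict[str, Any]) -> Dict[str, Any]:
    """Merge a node's partial update into the state, honoring annotated reducers."""
    hints = get_type_hints(DemoState, include_extras=True)
    merged = dict(state)
    for key, value in update.items():
        # Annotated types expose their extra arguments via __metadata__.
        reducer = getattr(hints[key], "__metadata__", (None,))[0]
        if reducer is not None and key in merged:
            merged[key] = reducer(merged[key], value)
        else:
            merged[key] = value
    return merged


state = {"messages": ["user request"]}
state = apply_update(state, {"messages": ["ai reply"], "slogan_text": "Cook Smarter"})
state = apply_update(state, {"slogan_text": "Cook Smarter, Not Harder"})
print(state)
```

This is why each worker node in the notebook returns only the messages it produced: the reducer appends them to the history, while plain fields such as `slogan_text` are simply overwritten.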

### Supervisor Agent Logic (Router)
The supervisor decides the next step based on the current state.

In [None]:
def supervisor_router_logic(state: MasterAgentState) -> str:
    """Decides the next action based on what information is still needed."""
    print("--- Node: supervisor_router_logic ---")
    if state.get("market_research_data") is None:
        print("  Decision: Need market research.")
        return "route_to_researcher"
    elif state.get("slogan_text") is None:
        print("  Decision: Need slogan.")
        return "route_to_slogan_writer"
    elif state.get("drafted_email") is None:
        print("  Decision: Ready to draft email.")
        return "route_to_drafter"
    else:
        print("  Decision: All tasks complete. Finish.")
        return END # All done

print("Supervisor router logic defined.")
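
Before wiring the router into a graph, it can help to see the supervisor loop in isolation. The sketch below simulates the same "route, run a worker, route again" cycle with plain functions and stub workers. The names (`route`, `WORKERS`, `run_supervisor_loop`, `DONE`) and the state keys are hypothetical stand-ins for the notebook's router, nodes, and LangGraph's `END`.

```python
from typing import Callable, Dict, List, Optional

State = Dict[str, Optional[str]]
DONE = "done"  # hypothetical sentinel standing in for LangGraph's END


def route(state: State) -> str:
    """Pick the first piece of information that is still missing."""
    if state.get("research") is None:
        return "researcher"
    if state.get("slogan") is None:
        return "slogan_writer"
    if state.get("email") is None:
        return "drafter"
    return DONE


# Stub workers: each returns a partial state update, like a graph node.
WORKERS: Dict[str, Callable[[State], State]] = {
    "researcher": lambda s: {"research": "mock market data"},
    "slogan_writer": lambda s: {"slogan": "mock slogan"},
    "drafter": lambda s: {"email": f"Email using {s['research']} and {s['slogan']}"},
}


def run_supervisor_loop(state: State, max_steps: int = 10) -> List[str]:
    """Alternate between routing and working until the router reports DONE."""
    visited: List[str] = []
    for _ in range(max_steps):
        decision = route(state)
        if decision == DONE:
            return visited
        state.update(WORKERS[decision](state))
        visited.append(decision)
    raise RuntimeError("step limit reached without finishing")


state: State = {}
visited = run_supervisor_loop(state)
print(visited)
print(state["email"])
```

Note the design consequence: a worker must always write a non-`None` value for its field, even on failure, or the router will keep sending work to it until the step limit is hit. The worker nodes in this notebook do this by writing an error string.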

### Worker Agent Nodes

In [None]:
def market_researcher_node(state: MasterAgentState) -> dict:
    """Worker agent node that performs market research using the web_search_mock tool."""
    print("--- Node: market_researcher_node ---")
    if not llm_with_all_tools:
        return {"market_research_data": "Error: LLM not initialized for researcher."}
        
    # Craft a specific prompt for the LLM to call the web_search_mock tool
    research_prompt = HumanMessage(content=f"Find market size information for food tech apps. Original request: {state['original_request']}")
    print(f"  Researcher sending to LLM: {research_prompt.content}")
    
    # LLM decides to call the tool
    ai_response = llm_with_all_tools.invoke([research_prompt])
    print(f"  Researcher LLM response (tool call expected): {ai_response}")
    
    if ai_response.tool_calls:
        tool_call = ai_response.tool_calls[0] # Assume first tool call is the relevant one
        if tool_call['name'] == web_search_mock.name:
            tool_output = web_search_mock.invoke(tool_call['args'])
            print(f"  Market research tool output: {tool_output}")
            return {"market_research_data": str(tool_output), "messages": [ai_response, ToolMessage(content=str(tool_output), tool_call_id=tool_call['id'])]}
    return {"market_research_data": "Market research failed or tool not called.", "messages": [ai_response]}

def slogan_writer_node(state: MasterAgentState) -> dict:
    """Worker agent node that generates a slogan using the generate_slogan_mock tool."""
    print("--- Node: slogan_writer_node ---")
    if not llm_with_all_tools:
        return {"slogan_text": "Error: LLM not initialized for slogan writer."}

    slogan_prompt = HumanMessage(content=f"Generate a catchy slogan for an AI-powered recipe generation app. Original request: {state['original_request']}")
    print(f"  Slogan writer sending to LLM: {slogan_prompt.content}")
    
    ai_response = llm_with_all_tools.invoke([slogan_prompt])
    print(f"  Slogan writer LLM response (tool call expected): {ai_response}")

    if ai_response.tool_calls:
        tool_call = ai_response.tool_calls[0]
        if tool_call['name'] == generate_slogan_mock.name:
            tool_output = generate_slogan_mock.invoke(tool_call['args'])
            print(f"  Slogan generation tool output: {tool_output}")
            return {"slogan_text": str(tool_output), "messages": [ai_response, ToolMessage(content=str(tool_output), tool_call_id=tool_call['id'])]}
    return {"slogan_text": "Slogan generation failed or tool not called.", "messages": [ai_response]}

def email_drafter_node(state: MasterAgentState) -> dict:
    """Node that drafts the final email using all gathered information."""
    print("--- Node: email_drafter_node ---")
    if not llm: # Use the base LLM without tools for drafting, or llm_with_all_tools if it's general purpose
        return {"drafted_email": "Error: LLM not initialized for drafting."}

    draft_prompt_text = (
        f"Draft a compelling email to potential investors about our new AI-powered recipe generation app. "
        f"Original user request: '{state['original_request']}'.\n"
        f"Key Benefits: (The app is AI-powered, helps generate recipes, etc. - infer from original request or add more context here).\n"
        f"Market Size Information: {state['market_research_data']}\n"
        f"Catchy Slogan: {state['slogan_text']}\n\n"
        f"Please write the full email now based on this information."
    )
    print(f"  Email drafter sending to LLM: {draft_prompt_text[:200]}...")
    
    # Direct LLM call for drafting
    final_email_response = llm.invoke([HumanMessage(content=draft_prompt_text)])
    drafted_email = final_email_response.content
    print(f"  Drafted Email: {drafted_email}")
    return {"drafted_email": drafted_email, "messages": [final_email_response]}

print("Worker agent nodes defined.")

### Constructing the Multi-Agent Graph

In [None]:
multi_agent_workflow = StateGraph(MasterAgentState)

def supervisor_node(state: MasterAgentState) -> dict:
    """Supervisor node: a node must return a state update, so record the routing decision."""
    return {"next_action": supervisor_router_logic(state)}

multi_agent_workflow.add_node("supervisor", supervisor_node)
multi_agent_workflow.add_node("research_worker", market_researcher_node)
multi_agent_workflow.add_node("slogan_worker", slogan_writer_node)
multi_agent_workflow.add_node("drafting_worker", email_drafter_node)

multi_agent_workflow.set_entry_point("supervisor")

# Conditional edges from supervisor to workers or END.
# The path function reads the decision that supervisor_node just wrote to 'next_action';
# the mapping translates that decision into the name of the next node.
multi_agent_workflow.add_conditional_edges(
    "supervisor",
    lambda state: state["next_action"],
    {
        "route_to_researcher": "research_worker",
        "route_to_slogan_writer": "slogan_worker",
        "route_to_drafter": "drafting_worker",
        END: END,
    },
)

# Edges from workers back to the supervisor to re-evaluate state
multi_agent_workflow.add_edge("research_worker", "supervisor")
multi_agent_workflow.add_edge("slogan_worker", "supervisor")
multi_agent_workflow.add_edge("drafting_worker", "supervisor") # After drafting, supervisor will see all fields are filled and END

if llm: # Only compile if LLM is available
    multi_agent_app = multi_agent_workflow.compile()
    print("Multi-agent LangGraph app compiled.")
    # from IPython.display import Image, display
    # try: display(Image(multi_agent_app.get_graph().draw_mermaid_png()))
    # except Exception: print("Could not render the graph image, skipping graph draw.")
else:
    print("LLM not initialized, skipping multi-agent graph compilation.")
    multi_agent_app = None

![MLFlow Workflow](https://mlflow.org/docs/latest/assets/images/learn-core-components-b2c38671f104ca6466f105a92ed5aa68.png)

---

## 5. Running and Tracing the Multi-Agent System

### Invoking the Multi-Agent System
We'll provide the initial user request. MLflow autologging should capture the entire trace.

In [None]:
user_query_complex = "Draft an email to potential investors about our new AI-powered recipe generation app called 'ChefMate'. Highlight its benefits like personalized meal plans and easy grocery list generation. Mention the current market size for food tech apps (use web search for 'food tech app market size'), and include a short, catchy slogan for ChefMate."

if multi_agent_app:
    print(f"\n--- Running Multi-Agent System for Query: '{user_query_complex[:50]}...' ---")
    initial_state = {
        "original_request": user_query_complex,
        "messages": [HumanMessage(content=user_query_complex)]
        # market_research_data, slogan_text, drafted_email will be filled by the graph
    }
    
    # Each invoke of the compiled LangGraph app will be traced by MLflow
    # We wrap it in a parent MLflow run for overall task context
    with mlflow.start_run(run_name="MultiAgent_ContentGeneration_Run") as run:
        mlflow.log_param("initial_user_query", user_query_complex)
        mlflow.log_param("ollama_model_used", ollama_model_name)
        mlflow.set_tag("system_type", "Hierarchical Multi-Agent")
        
        try:
            final_state_multi_agent = multi_agent_app.invoke(initial_state, config={"recursion_limit": 15})
            
            print("\n--- Multi-Agent System Final State ---")
            print(f"Original Request: {final_state_multi_agent.get('original_request')}")
            print(f"Market Research: {final_state_multi_agent.get('market_research_data')}")
            print(f"Slogan: {final_state_multi_agent.get('slogan_text')}")
            print(f"Drafted Email:\n{final_state_multi_agent.get('drafted_email')}")
            
            mlflow.log_text(str(final_state_multi_agent.get('market_research_data', '')), "market_research_output.txt")
            mlflow.log_text(str(final_state_multi_agent.get('slogan_text', '')), "slogan_output.txt")
            mlflow.log_text(str(final_state_multi_agent.get('drafted_email', '')), "final_drafted_email.txt")
            mlflow.set_tag("overall_outcome", "success")

        except Exception as e:
            print(f"Error invoking multi-agent system: {e}")
            mlflow.log_text(str(e), "multi_agent_error.txt")
            mlflow.set_tag("overall_outcome", "error")
else:
    print("Skipping multi-agent system run as the app was not compiled (likely LLM issue).")

### Analyzing Multi-Agent Traces in MLflow UI
Go to the MLflow UI (`mlflow ui`):
- Open the `LangGraph_Advanced_FunctionCalling_A2A` experiment.
- Find the run `MultiAgent_ContentGeneration_Run` (and its child trace from autologging).
- **Inspect the Trace:** Observe how the `supervisor` node routes to `research_worker`, then `slogan_worker`, then `drafting_worker`. 
- Click on individual spans (e.g., `research_worker`'s LLM call) to see its specific prompt, the tool call it generated (with Pydantic-structured args), and the tool's output.
- See how the state (market research data, slogan) gets populated and used by downstream nodes.

![MLFlow UI](https://blog.min.io/content/images/2025/03/Screenshot-2025-03-10-at-3.30.33-PM.png)

This detailed tracing is crucial for debugging and understanding the flow in such complex, multi-step, multi-component systems.

---

## 6. Conceptual Overview: Other Agent-to-Agent Protocols

While our hierarchical system demonstrates one way for agents to collaborate (coordinated by a supervisor), other A2A patterns and protocols exist:

- **Message Passing:** Agents communicate by sending explicit messages to each other (e.g., via a message bus or direct calls if they expose APIs). The content and format of messages are key.
- **Shared State / Blackboard Systems:** Agents read from and write to a common, shared data structure (the "blackboard"). This allows for more decoupled interaction, as agents react to changes in the shared state.
- **Peer-to-Peer Collaboration:** Agents might negotiate tasks, share partial results, or critique each other's work without a central supervisor.
- **Formal Protocols (e.g., FIPA, MCP):**
    - **FIPA (Foundation for Intelligent Physical Agents):** Defines standards for agent communication languages (ACL), interaction protocols, and architectures.
    - **MCP (Model Context Protocol):** An open protocol, introduced by Anthropic, that standardizes how LLM applications connect to external tools and data sources using JSON-RPC 2.0 messages. It targets model-to-tool integration rather than agent-to-agent messaging, but it is a specification (not a library) that agent frameworks can adopt to expose tools in a uniform way.

LangGraph's flexibility allows you to model many of these patterns. For example, the `MasterAgentState` in this notebook acts as a form of shared state, and the graph structure defines the message flow and control logic.
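
To contrast the supervisor pattern with the shared-state style listed above, here is a minimal blackboard sketch in plain Python. There is no central router: each agent inspects the shared board and contributes only when its preconditions are met. All names (`researcher`, `slogan_writer`, `drafter`, `run_blackboard`) and the board keys are hypothetical and chosen to echo this notebook's scenario.

```python
from typing import Callable, Dict, List, Optional

Blackboard = Dict[str, str]
Agent = Callable[[Blackboard], Optional[Blackboard]]


def researcher(board: Blackboard) -> Optional[Blackboard]:
    if "market_data" not in board:
        return {"market_data": "mock market size"}
    return None  # nothing to contribute


def slogan_writer(board: Blackboard) -> Optional[Blackboard]:
    if "slogan" not in board:
        return {"slogan": "mock slogan"}
    return None


def drafter(board: Blackboard) -> Optional[Blackboard]:
    # Acts only once both inputs have appeared on the board.
    if "email" not in board and {"market_data", "slogan"} <= board.keys():
        return {"email": f"{board['slogan']} | {board['market_data']}"}
    return None


def run_blackboard(agents: List[Agent], board: Blackboard, max_rounds: int = 5) -> List[str]:
    """Poll every agent each round; stop when a full round adds nothing."""
    log: List[str] = []
    for _ in range(max_rounds):
        progressed = False
        for agent in agents:
            contribution = agent(board)
            if contribution:
                board.update(contribution)
                log.append(agent.__name__)
                progressed = True
        if not progressed:
            break
    return log


board: Blackboard = {}
# Registration order is deliberately "wrong": the drafter still waits for its inputs.
log = run_blackboard([drafter, slogan_writer, researcher], board)
print(log)
print(board["email"])
```

Because ordering emerges from the preconditions rather than from a router, agents can be added or removed without touching any central control logic, at the cost of a less explicit control flow to trace.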

---

## 7. Key Takeaways

This notebook has equipped you with more advanced agent-building techniques:

- **Robust Function Calling with Pydantic:** Using Pydantic models for tool argument schemas improves reliability and clarity in how LLMs interact with your tools.
- **Hierarchical Multi-Agent Systems:** You've built a system where a supervisor agent coordinates tasks among specialized worker agents using LangGraph. This pattern is scalable and modular.
- **Complex Workflow Orchestration:** LangGraph provides the control flow (nodes, conditional edges) needed to manage intricate interactions between multiple LLM calls, tool executions, and state updates.
- **Deep Tracing with MLflow:** MLflow's autologging for LangChain/LangGraph provides essential visibility into these complex multi-step and multi-component executions, crucial for debugging and optimization.
- **Foundation for Advanced A2A:** While we implemented a hierarchical system, the concepts extend to more decentralized or formal A2A protocols.

Building effective multi-agent systems requires careful design of agent roles, responsibilities, communication pathways (state updates in our LangGraph case), and robust tool interfaces.

---

## 8. Engaging Resources and Further Reading

- **LangGraph & LangChain Documentation:**
    - [LangGraph Multi-Agent Collaboration Examples](https://python.langchain.com/docs/langgraph#multi-agent-collaboration) (and other examples like Agent Supervisor)
    - [LangChain Pydantic Tools](https://python.langchain.com/docs/modules/agents/tools/custom_tools#structuredtool-and-pydantic)
- **Pydantic:**
    - [Pydantic Documentation](https://docs.pydantic.dev/latest/)
- **Multi-Agent Systems Theory & Practice:**
    - "Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations" by Yoav Shoham and Kevin Leyton-Brown (Comprehensive textbook).
    - Search for recent research papers on LLM-based multi-agent systems.
- **Agent Communication Protocols:**
    - [FIPA Standards](http://www.fipa.org/specs/index.html)
    - Follow the Model Context Protocol (MCP) specification as it evolves.

--- 

Fantastic work on completing this notebook! You're now well on your way to designing and tracing sophisticated, collaborative LLM applications.

**Coming Up Next (Notebook 9):** We'll look at implementing custom metrics and more nuanced evaluation strategies for generative AI tasks, building on our earlier evaluation work but focusing on the unique challenges of generative models.

![Keep Learning](https://memento.epfl.ch/image/23136/1440x810.jpg)