# Building a LangGraph Agent that Uses A2A Protocol

## Overview

This notebook demonstrates how to build a **LangGraph-based agent** that communicates with an A2A-compliant agent service. This is an example of **agent-to-agent (A2A) communication** where one agent can use another agent as a tool.

### What You'll Learn

1. **A2A Protocol Integration**: How to connect to and discover A2A agent capabilities
2. **Tool Creation**: How to wrap an A2A client as a LangChain tool
3. **LangGraph Agent**: How to build an agent that autonomously decides when to delegate tasks
4. **Async Execution**: Proper async/await patterns for non-blocking I/O
5. **Multi-Turn Conversations**: How to maintain context across multiple interactions

### Architecture

```
User Query
    ↓
LangGraph Agent (Client)
    ↓
  Decision: Use tool or answer directly?
    ↓
A2A Protocol (HTTP/JSON-RPC)
    ↓
A2A Agent Server (localhost:10000)
    ├─ Web Search
    ├─ ArXiv Search
    └─ RAG Search
    ↓
Response to User
```

### Prerequisites

**Before running this notebook:**

1. **Start the A2A agent service:**
   ```bash
   cd app && uv run python -m app
   ```

2. **Verify it's running:**
   - The server should be accessible at http://localhost:10000
   - You should see log output indicating the server started successfully

3. **Environment setup:**
   - Ensure your `.env` file contains `OPENAI_API_KEY`
   - Optionally set `TOOL_LLM_NAME` (defaults to `gpt-4o-mini`)
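
One quick way to confirm the server is reachable before running the cells is to request its AgentCard. The sketch below uses only the standard library. `agent_card_url` and `server_is_up` are hypothetical helper names, and the base URL and card path are the ones this notebook uses.

```python
import urllib.error
import urllib.request

AGENT_CARD_PATH = "/.well-known/agent-card.json"


def agent_card_url(base_url: str) -> str:
    """Join the base URL and the AgentCard path without doubling slashes."""
    return base_url.rstrip("/") + AGENT_CARD_PATH


def server_is_up(base_url: str, timeout: float = 3.0) -> bool:
    """Return True if the AgentCard endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(agent_card_url(base_url), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False


if __name__ == "__main__":
    print(server_is_up("http://localhost:10000"))
```

If this prints `False`, start the service as described in step 1 before continuing.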

## Step 1: Import Required Libraries

We'll import all the libraries needed to build our agent. The A2A-specific imports are described below.

### A2A Protocol Imports
- **`httpx`**: Async HTTP client (required for non-blocking I/O)
- **`A2ACardResolver`**: Discovers agent capabilities via AgentCards
- **`ClientFactory, ClientConfig`**: Creates properly configured A2A clients

### Why Async?
The A2A client uses `httpx.AsyncClient` for async HTTP requests. This means:
- Our tool must be async (`async def`)
- We use `await` for async operations
- The graph must be run with `astream()` or `ainvoke()` rather than the sync `stream()` or `invoke()`
- The event loop stays free while waiting on HTTP responses (non-blocking I/O)
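
To make the non-blocking point concrete, here is a small standard-library sketch. `fake_request` is a hypothetical stand-in for an HTTP call (it only sleeps). Two simulated requests run concurrently, so the batch takes about as long as the slower one rather than the sum of both.

```python
import asyncio
import time


async def fake_request(name: str, delay: float) -> str:
    """Stand-in for an HTTP call: sleeps without blocking the event loop."""
    await asyncio.sleep(delay)
    return f"{name} done"


async def run_concurrently() -> tuple[list[str], float]:
    """Run two simulated requests at once and time the whole batch."""
    start = time.perf_counter()
    results = await asyncio.gather(
        fake_request("web_search", 0.2),
        fake_request("arxiv_search", 0.2),
    )
    return list(results), time.perf_counter() - start


if __name__ == "__main__":
    # In a notebook cell, use `await run_concurrently()` instead of asyncio.run().
    results, elapsed = asyncio.run(run_concurrently())
    print(results, f"{elapsed:.2f}s")
```

Notebook kernels already run an event loop, which is why the cells in this notebook use top-level `await` rather than `asyncio.run()`.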

In [2]:
import os
import logging
from uuid import uuid4

from langchain_core.messages import HumanMessage, AIMessage, ToolMessage
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, MessagesState, START, END
from langgraph.prebuilt import ToolNode
from langgraph.checkpoint.memory import MemorySaver

import httpx
from a2a.client import A2ACardResolver, ClientFactory, ClientConfig
from dotenv import load_dotenv

load_dotenv()
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

print("✅ All imports successful!")

✅ All imports successful!


## Step 2: Initialize the A2A Client

This step establishes a connection to the A2A agent and discovers its capabilities.

### What Happens Here:

1. **Create Async HTTP Client**
   ```python
   httpx_client = httpx.AsyncClient(timeout=60.0)
   ```
   - Timeout set to 60 seconds (LLMs can take time for complex queries)
   - Async client allows non-blocking operations

2. **Create AgentCard Resolver**
   ```python
   resolver = A2ACardResolver(...)
   ```
   - AgentCard = machine-readable description of agent capabilities (like OpenAPI spec)
   - Located at `/.well-known/agent-card.json` (standard A2A endpoint)

3. **Fetch the AgentCard**
   ```python
   agent_card = await resolver.get_agent_card()
   ```
   - Makes HTTP GET request to discover agent
   - Returns structured data about:
     - Agent name and description
     - Available skills (web search, arxiv, RAG)
     - Capabilities (streaming, notifications)
     - Protocol version and transport method

4. **Create A2A Client**
   ```python
   factory = ClientFactory(...)
   a2a_client = factory.create(card=agent_card)
   ```
   - Factory pattern creates properly configured client
   - Client knows how to communicate with this specific agent
   - Ready to send messages and receive responses

### Expected Output:
You should see the agent name and its available skills:
- Web Search Tool
- Academic Paper Search
- Document Retrieval
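
The AgentCard is plain JSON, so its contents can also be inspected without the SDK. The sketch below assumes the card has already been parsed into a `dict`. `summarize_card` is a hypothetical helper, and the sample dict is a trimmed illustration containing only a few fields and two of the skills reported by the server in this notebook.

```python
def summarize_card(card: dict) -> str:
    """Build a one-line summary of an AgentCard given as a plain dict."""
    skills = [skill.get("name", "<unnamed>") for skill in card.get("skills", [])]
    streaming = card.get("capabilities", {}).get("streaming", False)
    return (
        f"{card.get('name', '<unknown>')} "
        f"(streaming={streaming}): {', '.join(skills)}"
    )


# Trimmed sample for illustration; a real card carries more fields and skills.
sample_card = {
    "name": "General Purpose Agent",
    "capabilities": {"pushNotifications": True, "streaming": True},
    "skills": [
        {"id": "web_search", "name": "Web Search Tool"},
        {"id": "arxiv_search", "name": "Academic Paper Search"},
    ],
}

print(summarize_card(sample_card))
```

Using `.get()` with defaults keeps the helper from raising if a card omits an optional field.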

In [3]:
# Initialize A2A client
A2A_URL = 'http://localhost:10000'
httpx_client = httpx.AsyncClient(timeout=httpx.Timeout(60.0))
resolver = A2ACardResolver(httpx_client=httpx_client, base_url=A2A_URL)
agent_card = await resolver.get_agent_card()
factory = ClientFactory(ClientConfig(httpx_client=httpx_client))
a2a_client = factory.create(card=agent_card)

print(f"✅ Connected to: {agent_card.name}")
print(f"   Skills: {[s.name for s in agent_card.skills]}")

INFO:httpx:HTTP Request: GET http://localhost:10000/.well-known/agent-card.json "HTTP/1.1 200 OK"
INFO:a2a.client.card_resolver:Successfully fetched agent card data from http://localhost:10000/.well-known/agent-card.json: {'capabilities': {'pushNotifications': True, 'streaming': True}, 'defaultInputModes': ['text', 'text/plain'], 'defaultOutputModes': ['text', 'text/plain'], 'description': 'A helpful AI assistant with web search, academic paper search, and document retrieval capabilities', 'name': 'General Purpose Agent', 'preferredTransport': 'JSONRPC', 'protocolVersion': '0.3.0', 'skills': [{'description': 'Search the web for current information', 'examples': ['What are the latest news about AI?'], 'id': 'web_search', 'name': 'Web Search Tool', 'tags': ['search', 'web', 'internet']}, {'description': 'Search for academic papers on arXiv', 'examples': ['Find recent papers on large language models'], 'id': 'arxiv_search', 'name': 'Academic Paper Search', 'tags': ['research', 'papers', '

✅ Connected to: General Purpose Agent
   Skills: ['Web Search Tool', 'Academic Paper Search', 'Document Retrieval']


## Step 3: Create the A2A Tool

This is the **most important step** - we wrap the A2A client as a LangChain tool that our LangGraph agent can use.

### The `@tool` Decorator

The `@tool` decorator automatically:
- Converts the function into a LangChain tool
- Generates tool schema from function signature and docstring
- Makes it available to the LLM for autonomous use

The LLM sees a schema roughly like this (simplified):
```json
{
  "name": "query_a2a_agent",
  "description": "Query the A2A agent (web search, arxiv, RAG).",
  "parameters": {
    "query": {"type": "string", "description": "Question to ask"}
  }
}
```
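
To build intuition for how a schema can be derived from a signature and docstring, here is a conceptual sketch using only `inspect`. It is not LangChain's implementation. `describe_function` is a hypothetical helper, and the type mapping covers only a handful of basic types.

```python
import inspect


def describe_function(fn) -> dict:
    """Derive a tool-like schema from a function's signature and docstring."""
    type_names = {str: "string", int: "integer", float: "number", bool: "boolean"}
    doc = inspect.getdoc(fn) or ""
    params = {}
    for name, param in inspect.signature(fn).parameters.items():
        # Unknown or missing annotations fall back to a generic "object".
        params[name] = {"type": type_names.get(param.annotation, "object")}
    return {
        "name": fn.__name__,
        "description": doc.splitlines()[0] if doc else "",
        "parameters": params,
    }


async def query_a2a_agent(query: str) -> str:
    """Query the A2A agent (web search, arxiv, RAG)."""
    return ""  # body is irrelevant here; only the metadata is inspected


print(describe_function(query_a2a_agent))
```

This is why a clear docstring and accurate type hints matter: they are the only description of the tool the LLM gets.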

### How It Works:

1. **Construct A2A Message**
   - Format: `{role, parts, message_id}`
   - Each message needs unique ID (A2A protocol requirement)

2. **Send Message and Stream Response**
   - `async for chunk in a2a_client.send_message(message)`
   - Processes streaming chunks as they arrive

3. **Parse Response Structure** (Critical!)
   - A2A responses: `(task, payload)` tuples
   - Text location: `payload.artifact.parts[0].root.text`
   
   Structure:
   ```
   chunk (tuple)
     ├─ task (Task metadata)
     └─ payload (TaskArtifactUpdateEvent)
         └─ artifact (Artifact object)
             └─ parts (List[Part])
                 └─ [0].root.text ← THE ACTUAL TEXT!
   ```

4. **Return Result**
   - Accumulated text from all chunks
   - LangGraph passes this back to the LLM
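
The steps above can be tried without a running server. In this sketch, `build_user_message` and `extract_text` are hypothetical helpers, and `fake_chunk` builds stand-in objects with `SimpleNamespace` that only imitate the attribute layout described above. They are not the SDK's types.

```python
from types import SimpleNamespace
from uuid import uuid4


def build_user_message(query: str) -> dict:
    """Build a message dict in the {role, parts, message_id} shape."""
    return {
        "role": "user",
        "parts": [{"kind": "text", "text": query}],
        "message_id": uuid4().hex,  # fresh ID for every message
    }


def extract_text(chunks) -> str:
    """Concatenate text found at payload.artifact.parts[*].root.text."""
    pieces = []
    for chunk in chunks:
        if not (isinstance(chunk, tuple) and len(chunk) == 2):
            continue  # not a (task, payload) pair
        _task, payload = chunk
        artifact = getattr(payload, "artifact", None)
        for part in getattr(artifact, "parts", None) or []:
            text = getattr(getattr(part, "root", None), "text", None)
            if text:
                pieces.append(text)
    return "".join(pieces).strip()


def fake_chunk(text: str):
    """Hypothetical stand-in for one streamed (task, payload) pair."""
    part = SimpleNamespace(root=SimpleNamespace(text=text))
    payload = SimpleNamespace(artifact=SimpleNamespace(parts=[part]))
    return (SimpleNamespace(id="task-1"), payload)


chunks = [
    fake_chunk("Hello, "),
    (SimpleNamespace(id="task-1"), None),  # update with no artifact
    fake_chunk("world. "),
]
print(extract_text(chunks))  # Hello, world.
```

Chunks without an artifact contribute nothing, so status-only updates are skipped instead of raising.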

### Error Handling:
- Catches network errors, parsing errors, etc.
- Returns error as string (LLM can see what went wrong)
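
The "return the error as a string" idea can be isolated into a small wrapper. This is a sketch under assumptions: `call_safely`, `flaky_agent` and `slow_agent` are hypothetical names, and the explicit `asyncio.wait_for` timeout is an addition for illustration (the tool in this notebook relies on the HTTP client's own timeout).

```python
import asyncio


async def call_safely(coro_fn, *args, timeout: float = 5.0) -> str:
    """Await coro_fn(*args); on failure return the error as a string."""
    try:
        return await asyncio.wait_for(coro_fn(*args), timeout=timeout)
    except asyncio.TimeoutError:
        return f"Error: timed out after {timeout}s"
    except Exception as exc:  # broad on purpose: the LLM should see any failure
        return f"Error: {type(exc).__name__}: {exc}"


async def flaky_agent(query: str) -> str:
    """Hypothetical stand-in for a remote call that fails."""
    raise ConnectionError("server unreachable")


async def slow_agent(query: str) -> str:
    """Hypothetical stand-in for a remote call that hangs."""
    await asyncio.sleep(1.0)
    return "too late"


if __name__ == "__main__":
    print(asyncio.run(call_safely(flaky_agent, "hi")))
    print(asyncio.run(call_safely(slow_agent, "hi", timeout=0.05)))
```

Returning a string instead of raising keeps the graph running and lets the LLM decide whether to retry or explain the failure to the user.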

In [4]:
@tool
async def query_a2a_agent(query: str) -> str:
    """
    Query the A2A agent (web search, arxiv, RAG).
    
    Args:
        query: Question to ask
    Returns:
        Response from A2A agent
    """
    try:
        message = {
            "role": "user",
            "parts": [{"kind": "text", "text": query}],
            "message_id": uuid4().hex,
        }
        
        logger.info(f"Querying A2A: {query}")
        response_text = ""
        
        async for chunk in a2a_client.send_message(message):
            # A2A returns (task, payload) tuples
            if isinstance(chunk, tuple) and len(chunk) == 2:
                task, payload = chunk
                
                # Check for artifact updates (where the response is)
                if payload and hasattr(payload, 'artifact'):
                    artifact = payload.artifact
                    if hasattr(artifact, 'parts'):
                        for part in artifact.parts:
                            # Each part has a 'root' attribute with the TextPart
                            if hasattr(part, 'root') and hasattr(part.root, 'text'):
                                response_text += part.root.text
        
        result = response_text.strip()
        logger.info(f"Got response: {result[:100]}...")
        return result if result else "No response from A2A agent"
        
    except Exception as e:
        logger.error(f"Error: {e}")
        return f"Error: {e}"

print(f"✅ Tool created: {query_a2a_agent.name}")

✅ Tool created: query_a2a_agent


## Step 4: Build the LangGraph Agent

Now we create the **decision-making agent** that autonomously chooses when to use the A2A tool.

### Components:

#### 1. **Initialize the LLM**
```python
llm = ChatOpenAI(model='gpt-4o-mini', temperature=0)
```
- Temperature 0 = most repeatable responses (sampling randomness is minimized, though outputs are not guaranteed to be identical)
- This is the "brain" that makes decisions

#### 2. **Bind Tools to LLM**
```python
llm_with_tools = llm.bind_tools([query_a2a_agent])
```
- LLM can now "see" the A2A tool
- Decides: "Should I use this tool or answer directly?"

#### 3. **Create Memory**
```python
memory = MemorySaver()
```
- Stores conversation history
- Each conversation has unique `thread_id`
- Enables multi-turn context

### Graph Architecture:

```
START
  ↓
agent (LLM analyzes & decides)
  ↓
  ├─→ tools (execute A2A query)
  │      ↓
  │    agent (process results)
  │      ↓
  └────→ END (return answer)
```

### Decision Flow Example:

**User asks:** "What are the latest AI developments?"

1. **Agent node**: LLM thinks "I need current info → use tool"
2. **Tools node**: Execute `query_a2a_agent("latest AI developments")`
3. **Agent node**: LLM formats the response nicely
4. **END**: Return formatted answer

**User asks:** "What is 2+2?"

1. **Agent node**: LLM thinks "I know this → answer directly"
2. **END**: Return "2 + 2 equals 4"

### Routing Logic:

The `should_continue` function decides the next step:
- **Has tool_calls?** → Go to "tools" node
- **No tool_calls?** → Go to "end" (we're done)
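
The routing rule can be exercised on its own. In this sketch, `FakeAIMessage` is a hypothetical stand-in for an AI message (not the LangChain class), so the function can be checked without an LLM call.

```python
from dataclasses import dataclass, field


@dataclass
class FakeAIMessage:
    """Hypothetical stand-in for an AI message; not the LangChain class."""
    content: str = ""
    tool_calls: list = field(default_factory=list)


def should_continue(state: dict) -> str:
    """Route to "tools" if the last message requests a tool, else "end"."""
    last = state["messages"][-1]
    return "tools" if getattr(last, "tool_calls", None) else "end"


needs_tool = {"messages": [FakeAIMessage(tool_calls=[{"name": "query_a2a_agent"}])]}
direct = {"messages": [FakeAIMessage(content="2 + 2 equals 4.")]}

print(should_continue(needs_tool))  # tools
print(should_continue(direct))      # end
```

Only the last message matters: an empty `tool_calls` list is falsy, so a plain answer routes to `"end"`.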

In [5]:
# Build LangGraph agent
llm = ChatOpenAI(model=os.getenv('TOOL_LLM_NAME', 'gpt-4o-mini'), temperature=0)
tools = [query_a2a_agent]
llm_with_tools = llm.bind_tools(tools)
memory = MemorySaver()

def agent_node(state: MessagesState) -> MessagesState:
    """Agent node: LLM analyzes messages and decides what to do."""
    return {"messages": [llm_with_tools.invoke(state["messages"])]}

def should_continue(state: MessagesState) -> str:
    """Routing logic: Continue to tools or end?"""
    last = state["messages"][-1]
    return "tools" if (hasattr(last, "tool_calls") and last.tool_calls) else "end"

# Build the graph
workflow = StateGraph(MessagesState)
workflow.add_node("agent", agent_node)
workflow.add_node("tools", ToolNode(tools))
workflow.add_edge(START, "agent")
workflow.add_conditional_edges("agent", should_continue, {"tools": "tools", "end": END})
workflow.add_edge("tools", "agent")

# Compile with memory for multi-turn conversations
graph = workflow.compile(checkpointer=memory)
print("✅ LangGraph agent ready!")

✅ LangGraph agent ready!


## Step 5: Create Helper Function for Testing

The `chat()` function simplifies running queries and displaying results.

### Key Features:

1. **Async Streaming**: Uses `graph.astream()` (required for async tools)
2. **Thread Management**: Each conversation has a `thread_id` for context
3. **Pretty Display**: Shows what's happening at each step

### What You'll See:

- 🔧 **Tool calls**: When the agent decides to use the A2A tool
- 📊 **Tool results**: Response from the A2A agent (truncated if long)
- 🤖 **Final answer**: LLM's formatted response to the user

### Important: Why `astream()`?

We use `graph.astream()` instead of `graph.stream()` because:
- Our A2A tool is async (uses `async def`)
- Sync `stream()` would cause: `NotImplementedError: StructuredTool does not support sync invocation`
- Async execution allows non-blocking HTTP requests
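
The consumption pattern itself can be tried with a plain async generator. `fake_astream` below is a hypothetical stand-in that imitates a stream of state snapshots in which the message list grows by one entry per step. It does not call LangGraph.

```python
import asyncio


async def fake_astream(query: str):
    """Hypothetical stand-in for a stream of state snapshots.

    Yields a growing list of messages, one snapshot per step.
    """
    messages = [("human", query)]
    yield {"messages": list(messages)}
    steps = [("tool_call", "query_a2a_agent"), ("tool", "raw result"), ("ai", "final answer")]
    for step in steps:
        await asyncio.sleep(0)  # hand control back to the event loop
        messages.append(step)
        yield {"messages": list(messages)}


async def collect_latest(query: str) -> list:
    """Consume the stream with `async for`, keeping the newest message of each snapshot."""
    latest = []
    async for snapshot in fake_astream(query):
        latest.append(snapshot["messages"][-1])
    return latest


if __name__ == "__main__":
    # In a notebook cell, use `await collect_latest("hi")` instead.
    print(asyncio.run(collect_latest("hi")))
```

A plain `for` loop over an async generator raises `TypeError`, which is why the consumer has to be an `async def` that uses `async for`.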

In [7]:
# Helper function to run tests
async def chat(query: str, thread_id: str = "default"):
    """Send a query and display results."""
    print(f"\n{'='*60}")
    print(f"👤 User: {query}")
    print('='*60)
    
    config = {"configurable": {"thread_id": thread_id}}
    input_msg = {"messages": [HumanMessage(content=query)]}
    
    # IMPORTANT: Use astream() for async tools!
    async for chunk in graph.astream(input_msg, config, stream_mode="values"):
        if "messages" in chunk:
            msg = chunk["messages"][-1]
            if isinstance(msg, AIMessage) and msg.content:
                print(f"\n🤖 Agent: {msg.content}")
            elif isinstance(msg, AIMessage) and hasattr(msg, 'tool_calls') and msg.tool_calls:
                print(f"\n🔧 Using: {[tc['name'] for tc in msg.tool_calls]}")
            elif isinstance(msg, ToolMessage):
                preview = msg.content[:150] + "..." if len(msg.content) > 150 else msg.content
                print(f"\n📊 Result: {preview}")

print("✅ Chat function ready")

✅ Chat function ready


## Test 1: Web Search Query

This test demonstrates the agent using the A2A tool for web search.

### What Should Happen:

1. **User query**: "What are the latest developments in large language models?"
2. **LLM decision**: "This needs current information → I should use the A2A tool"
3. **Tool execution**: A2A agent performs web search
4. **Response**: Recent articles and news about LLMs
5. **LLM formatting**: Presents results in a clear, organized way

### Behind the Scenes:

```
User query
  ↓
LangGraph Agent analyzes
  ↓
Calls query_a2a_agent("latest developments in large language models 2023")
  ↓
A2A Protocol → A2A Agent → Web Search Tool → Tavily API
  ↓
Returns: Recent LLM news/articles
  ↓
LLM formats into bullet points
  ↓
Display to user
```

### Key Observation:
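Notice that the LLM rewrites the query before calling the tool, here appending "2023". That year most likely comes from the model's training data rather than the current date, so it can skew results toward older material. Stating the current date in a system prompt is one way to avoid this.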

In [8]:
await chat("What are the latest developments in large language models?")


👤 User: What are the latest developments in large language models?


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:__main__:Querying A2A: latest developments in large language models 2023
INFO:httpx:HTTP Request: POST http://localhost:10000/ "HTTP/1.1 200 OK"



🔧 Using: ['query_a2a_agent']


INFO:a2a.client.client_task_manager:New task created with id: 7a9ccdc0-c995-4c0c-92e0-48a5f1fa4602
INFO:__main__:Got response: Here are some of the latest developments in large language models (LLMs) as of 2023:

1. **Ethical C...



📊 Result: Here are some of the latest developments in large language models (LLMs) as of 2023:

1. **Ethical Considerations in Psychotherapy**: A systematic rev...


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"



🤖 Agent: Here are some of the latest developments in large language models (LLMs) as of 2023:

1. **Ethical Considerations in Psychotherapy**: A systematic review has examined the opportunities and risks of using LLMs in mental health, discussing their potential benefits and harms in digital mental health applications. This includes the design of AI-based conversational agents aimed at promoting mental health and well-being.

2. **Backdoor Vulnerabilities**: Research from Anthropic and other institutions has revealed that LLMs can develop backdoor vulnerabilities from a surprisingly small number of malicious documents. This study indicates that even large models can be compromised with as few as 250 corrupted documents, raising concerns about the security of training data.

3. **Cognitive Decline from Low-Quality Data**: A study found that LLMs trained on low-quality, high-engagement social media content experience a decline in cognitive abilities, akin to 'brain rot'. This suggest

## Test 2: Academic Paper Search

This test shows the agent using ArXiv search via the A2A agent.

### What Should Happen:

1. **User query**: "Find recent papers on transformer architectures"
2. **LLM decision**: "This needs academic papers → use A2A tool"
3. **Tool execution**: A2A agent queries ArXiv
4. **Response**: List of recent research papers with:
   - Title
   - Authors
   - Publication date
   - Abstract/summary
5. **LLM formatting**: Organizes papers into readable list

### Behind the Scenes:

```
User query
  ↓
query_a2a_agent("recent papers on transformer architectures 2023")
  ↓
A2A Agent → ArXiv Tool → ArXiv API
  ↓
Returns: Paper metadata (titles, authors, abstracts)
  ↓
LLM creates formatted list with key details
```

### Different Tool, Same Agent:
The A2A agent internally decides to use ArXiv instead of web search based on the query!

In [9]:
await chat("Find recent papers on transformer architectures")


👤 User: Find recent papers on transformer architectures


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:__main__:Querying A2A: recent papers on transformer architectures 2023
INFO:httpx:HTTP Request: POST http://localhost:10000/ "HTTP/1.1 200 OK"



🔧 Using: ['query_a2a_agent']


INFO:a2a.client.client_task_manager:New task created with id: 4fa8feaa-809d-4fc7-ab23-174eab8bd737
INFO:__main__:Got response: Here are some recent papers on transformer architectures published in 2023:

1. **Title:** TurboViT:...



📊 Result: Here are some recent papers on transformer architectures published in 2023:

1. **Title:** TurboViT: Generating Fast Vision Transformers via Generativ...


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"



🤖 Agent: Here are some recent papers on transformer architectures published in 2023:

1. **TurboViT: Generating Fast Vision Transformers via Generative Architecture Search**  
   - **Authors:** Alexander Wong, Saad Abbasi, Saeejith Nair  
   - **Published:** 2023-08-22  
   - **Summary:** This paper explores the generation of fast vision transformer architectures using generative architecture search (GAS) to balance accuracy and computational efficiency. The TurboViT architecture is introduced, achieving significant reductions in architectural complexity and improved performance compared to existing models.

2. **Transformers are Universal Predictors**  
   - **Authors:** Sourya Basu, Moulik Choraria, Lav R. Varshney  
   - **Published:** 2023-07-15  
   - **Summary:** This study investigates the limits of the Transformer architecture for language modeling, demonstrating its universal prediction capabilities.

3. **A survey of the Vision Transformers and their CNN-Transformer based

## Test 3: Simple Query (No Tool Call)

This test demonstrates **intelligent routing** - the agent answers directly without using tools.

### What Should Happen:

1. **User query**: "What is 2 + 2?"
2. **LLM decision**: "I know this answer → no tools needed"
3. **Direct response**: "2 + 2 equals 4."
4. **No tool calls**: Notice no 🔧 or 📊 symbols!

### Why This Matters:

This shows the agent is **truly autonomous**:
- It doesn't blindly call tools for every query
- It evaluates whether a tool is actually needed
- It can answer directly when appropriate
- This saves time and API costs

### Decision Flow:

```
"What is 2 + 2?"
  ↓
Agent analyzes: "Simple math question"
  ↓
Decision: No external knowledge needed
  ↓
Returns AIMessage with content="2 + 2 equals 4."
  ↓
No tool_calls in message
  ↓
should_continue() returns "end"
  ↓
Skips tools node entirely
```

### Contrast with Test 1 & 2:
- Test 1: LLM needed current info → used tool
- Test 2: LLM needed academic papers → used tool
- Test 3: LLM has knowledge → answered directly

In [10]:
await chat("What is 2 + 2?")


👤 User: What is 2 + 2?


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"



🤖 Agent: 2 + 2 equals 4.


## Test 4: Multi-Turn Conversation

This test demonstrates **context maintenance** across multiple turns.

### What Should Happen:

**Turn 1:**
1. User: "Tell me about RAG"
2. Agent uses A2A to get RAG explanation
3. Response stored in `thread_id="conversation1"`

**Turn 2:**
1. User: "Find papers on this topic"
2. Agent reads conversation history
3. Understands "this topic" = RAG (from Turn 1!)
4. Queries for RAG papers

### How Context Works:

```python
# Turn 1 - stored in memory
await chat("Tell me about RAG", "conversation1")
# Memory now contains:
# - HumanMessage("Tell me about RAG")
# - AIMessage(tool_call to A2A)
# - ToolMessage(RAG explanation)
# - AIMessage(formatted explanation)

# Turn 2 - reads from same memory
await chat("Find papers on this topic", "conversation1")
# LLM sees ALL previous messages
# Knows "this topic" refers to RAG
```

### Key Mechanism:

1. **Same thread_id**: Both calls use `"conversation1"`
2. **MemorySaver**: Keeps messages in process memory between calls (history is lost when the kernel restarts)
3. **LLM context**: Full conversation history sent to LLM
4. **Reference resolution**: LLM resolves "this topic" → "RAG"

### What If We Used Different thread_ids?

```python
await chat("Tell me about RAG", "thread_A")
await chat("Find papers on this topic", "thread_B")  # ❌ No context!
```

The second query would not raise an error, but the LLM would have no history to resolve "this topic" against, so it would likely ask for clarification or guess.
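
The isolation between thread IDs can be illustrated with a toy store. `ThreadMemory` is a hypothetical class that only mimics the idea of keying history by `thread_id`. It is not how `MemorySaver` is implemented.

```python
from collections import defaultdict


class ThreadMemory:
    """Toy per-thread message store; illustrates the idea, not MemorySaver itself."""

    def __init__(self) -> None:
        self._threads = defaultdict(list)

    def append(self, thread_id: str, message: str) -> None:
        self._threads[thread_id].append(message)

    def history(self, thread_id: str) -> list:
        # Return a copy so callers cannot mutate stored history.
        return list(self._threads.get(thread_id, []))


memory = ThreadMemory()
memory.append("thread_A", "Tell me about RAG")
memory.append("thread_B", "Find papers on this topic")
memory.append("conversation1", "Tell me about RAG")
memory.append("conversation1", "Find papers on this topic")

print(memory.history("thread_B"))       # only the follow-up, no RAG context
print(memory.history("conversation1"))  # both turns, so "this topic" is resolvable
```

With separate IDs the follow-up arrives with an empty history, which is exactly the situation the LLM faces in the `thread_B` example.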

In [11]:
await chat("Tell me about RAG", "conversation1")
await chat("Find papers on this topic", "conversation1")


👤 User: Tell me about RAG


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:__main__:Querying A2A: What is RAG (Retrieval-Augmented Generation)?
INFO:httpx:HTTP Request: POST http://localhost:10000/ "HTTP/1.1 200 OK"



🔧 Using: ['query_a2a_agent']


INFO:a2a.client.client_task_manager:New task created with id: adedea68-b7ae-433c-afb4-aced5cb2dc92
INFO:__main__:Got response: Retrieval-Augmented Generation (RAG) is a framework that combines retrieval-based methods with gener...



📊 Result: Retrieval-Augmented Generation (RAG) is a framework that combines retrieval-based methods with generative models to enhance the performance of natural...


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"



🤖 Agent: Retrieval-Augmented Generation (RAG) is a framework that enhances natural language processing tasks by combining retrieval-based methods with generative models. The main idea is to leverage external knowledge sources, such as large databases or documents, to improve the quality and relevance of generated responses.

A typical RAG setup involves two main components:

1. **Retrieval**: The model first retrieves relevant documents or information from a large corpus based on the input query. This can be done using techniques like dense retrieval or traditional keyword-based search.

2. **Generation**: After retrieving the relevant information, a generative model (often a transformer-based model) uses this information to produce a coherent and contextually appropriate response.

RAG models are particularly useful in scenarios where the knowledge required to answer a question is not contained within the model's training data, allowing them to access up-to-date or specialized inf

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:__main__:Querying A2A: Retrieval-Augmented Generation RAG research papers
INFO:httpx:HTTP Request: POST http://localhost:10000/ "HTTP/1.1 200 OK"



🔧 Using: ['query_a2a_agent']


INFO:a2a.client.client_task_manager:New task created with id: e6cc2fb2-a49c-43f2-9af0-5bc0512d4c5c
INFO:__main__:Got response: Here are some recent research papers on Retrieval-Augmented Generation (RAG):

1. **RAG-Stack: Co-Op...



📊 Result: Here are some recent research papers on Retrieval-Augmented Generation (RAG):

1. **RAG-Stack: Co-Optimizing RAG Quality and Performance From the Vect...


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"



🤖 Agent: Here are some recent research papers on Retrieval-Augmented Generation (RAG):

1. **RAG-Stack: Co-Optimizing RAG Quality and Performance From the Vector Database Perspective**  
   - **Authors:** Wenqi Jiang  
   - **Published:** 2025-10-23  
   - **Summary:** This paper discusses the integration of documents retrieved from a vector database into the prompts of large language models (LLMs) to enhance content generation. It presents RAG-Stack, a blueprint for optimizing both system performance and generation quality in RAG systems, comprising three components: RAG-IR, RAG-CM, and RAG-PE.

2. **Modular RAG: Transforming RAG Systems into LEGO-like Reconfigurable Frameworks**  
   - **Authors:** Yunfan Gao, Yun Xiong, Meng Wang, Haofen Wang  
   - **Published:** 2024-07-26  
   - **Summary:** This paper introduces a modular framework for RAG systems, allowing for a more flexible and reconfigurable design. It decomposes complex RAG systems into independent modules and explores 

## Cleanup

Always close the async HTTP client to free resources.

### Why This Matters:

- **Resource cleanup**: Closes TCP connections
- **Best practice**: Prevents resource leaks
- **Async lifecycle**: An `httpx.AsyncClient` created without `async with` is not closed automatically, so call `aclose()` explicitly
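
A `try`/`finally` block is one way to make sure the client is closed even when a request fails. In this sketch, `FakeAsyncClient` is a hypothetical stand-in exposing an `aclose()` method so the pattern can run without a network.

```python
import asyncio


class FakeAsyncClient:
    """Hypothetical stand-in for an async HTTP client with an aclose() method."""

    def __init__(self) -> None:
        self.closed = False

    async def get(self, url: str) -> str:
        if "bad" in url:
            raise RuntimeError(f"request to {url} failed")
        return f"200 OK from {url}"

    async def aclose(self) -> None:
        self.closed = True


async def fetch_and_close(client: FakeAsyncClient, url: str) -> str:
    """Use the client, then close it even if the request raises."""
    try:
        return await client.get(url)
    finally:
        await client.aclose()


if __name__ == "__main__":
    client = FakeAsyncClient()
    print(asyncio.run(fetch_and_close(client, "http://localhost:10000")))
    print(client.closed)
```

With the real library, `async with httpx.AsyncClient() as client:` gives the same guarantee when the client's lifetime fits inside a single block. In a notebook the client spans several cells, so an explicit `aclose()` at the end is the practical choice.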

In [12]:
# Cleanup
await httpx_client.aclose()
print("✅ Done!")

✅ Done!


## Summary and Key Learnings

### What This Notebook Demonstrated:

1. **A2A Protocol Integration**
   - Connecting to A2A-compliant agents
   - Using AgentCards for service discovery
   - Parsing A2A response structure (`payload.artifact.parts[0].root.text`)

2. **LangGraph Agent Architecture**
   - Building a graph with nodes (agent, tools) and edges (flow)
   - Autonomous decision-making (when to use tools)
   - State management with MessagesState

3. **Async Execution Patterns**
   - Using `async def` for async operations
   - Using `await` for async calls
   - Using `graph.astream()` instead of `graph.stream()`
   - Using `async for` for streaming

4. **Tool Abstraction**
   - Converting Python functions to tools with `@tool`
   - Automatic schema generation
   - LLM-driven tool selection

5. **Multi-Turn Context**
   - Using `thread_id` to maintain conversations
   - MemorySaver for in-memory persistence
   - Reference resolution across turns

### Key Technical Points:

| Aspect | Implementation | Why It Matters |
|--------|----------------|----------------|
| **A2A Response** | `(task, payload)` tuples | Shape yielded by the A2A Python client |
| **Text Location** | `payload.artifact.parts[0].root.text` | Where actual content lives |
| **Async Tool** | `async def query_a2a_agent()` | Required for async client |
| **Graph Execution** | `graph.astream()` | Works with async tools |
| **Memory** | `MemorySaver()` + `thread_id` | Enables multi-turn |
| **Routing** | `should_continue()` function | Controls flow |


### Resources:

- [A2A Protocol Specification](https://github.com/missingstudio/a2a-protocol)
- [LangGraph Documentation](https://langchain-ai.github.io/langgraph/)
- [LangChain Tools Guide](https://python.langchain.com/docs/modules/tools/)
- [httpx Async Client](https://www.python-httpx.org/async/)