# Working with MCP Servers

In this section, we will explore the **Model Context Protocol (MCP)** and how Llama Stack enables the seamless integration of external services as powerful tools for your agents. You will learn how to bridge the gap between your LLM applications and domain-specific functionalities or real-time data sources.

By the end of this section, you will be able to:

* **Understand MCP Servers:** Learn what MCP is and how it allows external services to expose their functionalities to Llama Stack agents.
* **Run an MCP-enabled container:** Deploy a pre-built MCP server container that exposes a weather API.
* **Register MCP tools:** Register the weather-related functionalities (like `get_alerts` and `get_forecast`) from the MCP server with Llama Stack.
* **Utilize MCP tools with a Llama Stack Agent:** Observe how a Llama Stack agent can discover and use these newly registered MCP tools to answer questions requiring external data.

This section will demonstrate how Llama Stack extends the capabilities of your AI agents by allowing them to interact with a diverse ecosystem of specialized services, bringing dynamic and real-world data into their decision-making process.

**Large Language Models (LLMs)** are incredibly powerful for understanding and generating human-like text. However, by themselves, they are limited to the knowledge they were trained on and cannot interact with the real world, access real-time data, or perform specific actions like looking up weather forecasts or checking inventory.

This is where the **Model Context Protocol (MCP)** comes in. Think of MCP as a standardized way for *any external service* to describe its functionalities and make them available to an LLM or an AI agent. It works like a common language: your AI application can say, "I need the current temperature in London," and a separate weather service can understand that request and return the data, regardless of how that weather service is built internally.

Traditionally, connecting an LLM to an external tool involved writing custom wrappers and integration code for each tool and often for each specific LLM framework. This becomes cumbersome and difficult to scale as you add more tools or switch between different AI models or platforms.

**MCP addresses this by providing a clear, discoverable interface.** External services, called MCP servers, expose their capabilities (like a `get_forecast` or `check_inventory` function) in a structured, machine-readable format: each tool has a name, a description, and a JSON Schema describing its inputs. A client such as Llama Stack can consume these definitions and learn what each tool does, what inputs it needs, and how to call it.
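
To make the "structured, machine-readable format" concrete, here is a minimal sketch of the JSON-RPC 2.0 messages MCP uses for tool discovery (`tools/list`) and invocation (`tools/call`), written as plain Python dicts. The `get_forecast` schema is illustrative only: it is modeled on the arguments the agent sends later in this lab (`latitude`, `longitude`), and the real `mcp-weather` server's schema may differ. The helpers `missing_arguments` and `make_call_request` are hypothetical, shown only to illustrate how a client can use `inputSchema` before calling a tool.

```python
# Illustrative tool definition, in the shape an MCP server returns from `tools/list`.
# ASSUMPTION: this schema is modeled on the arguments seen in this lab's logs,
# not copied from the mcp-weather server.
GET_FORECAST = {
    "name": "get_forecast",
    "description": "Get the weather forecast for a location.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "latitude": {"type": "number", "description": "Latitude of the location"},
            "longitude": {"type": "number", "description": "Longitude of the location"},
        },
        "required": ["latitude", "longitude"],
    },
}

# Client -> server: ask which tools exist (a JSON-RPC 2.0 request).
list_request = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}

# Server -> client: the result lists every tool together with its input schema.
list_response = {"jsonrpc": "2.0", "id": 1, "result": {"tools": [GET_FORECAST]}}


def missing_arguments(tool: dict, arguments: dict) -> list[str]:
    """Hypothetical helper: return required schema fields absent from `arguments`."""
    required = tool["inputSchema"].get("required", [])
    return [name for name in required if name not in arguments]


def make_call_request(request_id: int, tool: dict, arguments: dict) -> dict:
    """Hypothetical helper: build a `tools/call` request after checking required arguments."""
    missing = missing_arguments(tool, arguments)
    if missing:
        raise ValueError(f"missing required arguments: {missing}")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool["name"], "arguments": arguments},
    }
```

For example, `make_call_request(2, GET_FORECAST, {"latitude": 40.0113, "longitude": -105.2729})` yields a `tools/call` message naming `get_forecast`. In practice Llama Stack builds and sends these messages for you; the sketch only shows what travels over the wire.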

This standardized approach offers significant benefits for building scalable AI agents and applications:

* **Plug-and-Play Tooling:** Just as you can plug different peripherals into a computer, MCP allows you to "plug in" various services as tools for your agents without extensive custom coding for each.
* **Enhanced Agent Intelligence:** Agents can transcend their static training data, performing real-time actions, fetching live information, and interacting with enterprise systems, making them far more dynamic and useful.
* **Modularity and Maintainability:** Your specialized services (like a weather API or an inventory management system) can evolve independently, and as long as they keep exposing the same tools over MCP, your AI agents can continue to use them without disruption.
* **Scalability:** As your application grows, you can easily add more MCP-enabled services, expanding the agent's capabilities by simply registering the new tools with Llama Stack, rather than rewriting large portions of your AI application.

In essence, MCP unlocks the full potential of AI agents by giving them standardized access to the vast world of external data and services, making your AI applications more powerful, adaptable, and easier to manage at scale.

### Setting up the environment for our experiment
We will now start our first MCP Server by running the `mcp-weather` container. This server will expose weather-related functionalities that our Llama Stack agent can use as tools.

In [1]:
!podman run -d --replace --name mcp-weather --network=host quay.dev.demo.redhat.com/rhdp/mcp-weather:latest --port 8005 

!podman ps

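
Even after `podman run -d` returns, the server inside the container may need a moment before it accepts connections. A minimal sketch, using only the Python standard library, that polls until a TCP port accepts connections. `wait_for_port` is a hypothetical helper; the host `localhost` and port `8005` come from the `podman run` command above.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 0.5) -> bool:
    """Hypothetical helper: poll until host:port accepts a TCP connection.

    Returns True as soon as a connection succeeds, or False once `timeout`
    seconds have elapsed without a successful connection.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError (e.g. ConnectionRefusedError) if nothing is listening.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

In the notebook you would call `wait_for_port("localhost", 8005)` before the `curl` check. A successful TCP connection only shows that something is listening; the `/sse` check confirms it is the MCP server.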

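Once the MCP server is running, we can quickly verify that it is reachable by connecting to its `/sse` (Server-Sent Events) endpoint. An SSE connection stays open, so `--max-time 1` makes `curl` give up after one second and `2>/dev/null` hides the resulting timeout error. If the server is live, you should see the first event it sends before the cutoff.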


In [None]:
!curl --max-time 1 http://localhost:8005/sse 2>/dev/null
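
The same check can be done from Python. Because an SSE endpoint keeps the connection open, this sketch reads only until the first complete event (terminated by a blank line) and then closes the connection. Under MCP's HTTP+SSE transport, the server's first event should be an `endpoint` event whose `data` is the URL the client should POST its messages to. The helper names `parse_sse_event` and `first_sse_event` are hypothetical.

```python
import urllib.request


def parse_sse_event(lines: list[str]) -> dict:
    """Hypothetical helper: parse one SSE event's lines into {'event': ..., 'data': ...}.

    Follows the SSE format: comment lines start with ':', multiple `data:` lines
    are joined with newlines, and the event type defaults to "message".
    """
    event_type = "message"
    data_lines = []
    for line in lines:
        if line.startswith(":"):
            continue  # comment or keep-alive line, ignored
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # one leading space after the colon is not part of the value
        if field == "event":
            event_type = value
        elif field == "data":
            data_lines.append(value)
    return {"event": event_type, "data": "\n".join(data_lines)}


def first_sse_event(url: str, timeout: float = 5.0) -> dict:
    """Hypothetical helper: open an SSE stream, return its first event, then disconnect."""
    request = urllib.request.Request(url, headers={"Accept": "text/event-stream"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        lines = []
        for raw in response:  # HTTPResponse yields the body line by line
            line = raw.decode("utf-8").rstrip("\r\n")
            if line == "":
                if lines:
                    return parse_sse_event(lines)
                continue  # skip leading blank lines
            lines.append(line)
    raise ConnectionError("stream closed before a complete event arrived")
```

Usage: `first_sse_event("http://localhost:8005/sse")` should return a dict with `event` set to `"endpoint"` if the server behaves as the MCP transport describes.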


### Setting up Llama Stack Agent (Stuff we already know)
Now, let's set up our Llama Stack client. This involves importing necessary libraries, configuring the Llama Stack server URL, and selecting the language model our agent will use.


In [2]:
!pip install -U llama-stack-client==0.2.7 dotenv > /dev/null 2>&1 && echo "pip Python Prerequisites installed successfully"

import os
from src.utils import step_printer
from termcolor import cprint
import rich
import uuid

from llama_stack_client import LlamaStackClient
from llama_stack_client.lib.agents.event_logger import EventLogger



stream=False 
#allowed_models_list=["granite3.2:8b"]
allowed_models_list=["meta-llama/Llama-3.2-3B-Instruct"]

LLAMA_STACK_SERVER='http://localhost:8321'
client = LlamaStackClient(
    base_url=LLAMA_STACK_SERVER,
)

selected_model = None
models = client.models.list()
print("--- Available models: ---")
for m in models:
    print(f"{m.identifier} - {m.provider_id} - {m.provider_resource_id}")
    # Pick the first model whose identifier contains an allowed substring.
    # There is deliberately no `break`, so every available model is still printed.
    if selected_model is None and any(substring in m.identifier for substring in allowed_models_list):
        selected_model = m.identifier

if selected_model is None:
    print("No allowed model found in the list.")

print(f"Selected model (from allowed list): {selected_model}")
SELECTED_MODEL = selected_model

# RAG settings; not used by the MCP examples in this section
vector_db_id = "Our_Parks_DB"
query_config = {
    "query_generator_config": {
        "type": "default",
        "separator": " "
    },
    "max_tokens_in_context": 300,
    "max_chunks": 2
}


pip Python Prerequisites installed successfully
--- Available models: ---
meta-llama/Llama-3.2-3B-Instruct - ollama - llama3.2:3b-instruct-fp16
all-MiniLM-L6-v2 - ollama - all-minilm:latest
granite3.2:8b - ollama - granite3.2:8b
Selected model (from allowed list): meta-llama/Llama-3.2-3B-Instruct


### Registering the new MCP Server as a tool
Before registering our new MCP tools, let's inspect the tools currently available to Llama Stack. These are typically built-in tools or tools from previously registered providers.


In [3]:
registered_tools = client.tools.list()
registered_toolgroups = [t.toolgroup_id for t in registered_tools]

for tool in registered_tools:
    rich.print(tool)


This is a crucial step: we are now registering our MCP Weather server as a 'toolgroup' with Llama Stack. This makes the functionalities exposed by the `mcp-weather` container (`get_alerts`, `get_forecast`) available for our agents to use.

In [4]:
client.toolgroups.register(
    toolgroup_id="mcp::mcp-weather",
    provider_id="model-context-protocol",
    mcp_endpoint={"uri": "http://localhost:8005/sse"},
)
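
Re-running the registration cell asks Llama Stack to register the same toolgroup again. To avoid depending on how the server treats a duplicate, here is a minimal sketch of an idempotent variant that checks `client.toolgroups.list()` first. It assumes each returned toolgroup exposes an `identifier` attribute. `ensure_toolgroup` is a hypothetical helper, and `FakeToolgroups` is an offline stand-in (not the real client API) so the sketch runs without a server.

```python
from types import SimpleNamespace


def ensure_toolgroup(client, toolgroup_id: str, provider_id: str, uri: str) -> bool:
    """Hypothetical helper: register an MCP toolgroup only if it is not registered yet.

    Returns True if a registration call was made, False if the toolgroup already existed.
    ASSUMPTION: toolgroup objects from client.toolgroups.list() carry `.identifier`.
    """
    existing = {group.identifier for group in client.toolgroups.list()}
    if toolgroup_id in existing:
        return False  # already registered: skip the duplicate call
    client.toolgroups.register(
        toolgroup_id=toolgroup_id,
        provider_id=provider_id,
        mcp_endpoint={"uri": uri},
    )
    return True


class FakeToolgroups:
    """Offline stand-in for `client.toolgroups` that records registrations in memory."""

    def __init__(self):
        self._groups = []

    def list(self):
        return self._groups.copy()

    def register(self, toolgroup_id, provider_id, mcp_endpoint):
        self._groups.append(
            SimpleNamespace(identifier=toolgroup_id, provider_id=provider_id, mcp_endpoint=mcp_endpoint)
        )


fake_client = SimpleNamespace(toolgroups=FakeToolgroups())
```

Against the real server you would call `ensure_toolgroup(client, "mcp::mcp-weather", "model-context-protocol", "http://localhost:8005/sse")`.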

After registration, let's list the available tools again. You should now see the weather-related tools (`get_alerts`, `get_forecast`) exposed by our MCP server.


In [5]:
registered_tools = client.tools.list()
registered_toolgroups = [t.toolgroup_id for t in registered_tools]

for tool in registered_tools:
    print("\n")
    rich.print(tool)
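
Rather than scanning the printout by eye, you can check programmatically that the MCP server's tools were discovered. A minimal sketch: `missing_tools` is a hypothetical helper, it uses the `toolgroup_id` attribute from the cell above plus an `identifier` attribute that we assume the tool objects carry, and the expected names are the two tools this lab relies on. The `example_tools` list is an offline stand-in for `client.tools.list()`.

```python
from types import SimpleNamespace


def missing_tools(tools, toolgroup_id: str, expected: set[str]) -> set[str]:
    """Hypothetical helper: return the expected tool names not registered under `toolgroup_id`.

    ASSUMPTION: each tool object exposes `.identifier` and `.toolgroup_id`.
    """
    found = {tool.identifier for tool in tools if tool.toolgroup_id == toolgroup_id}
    return set(expected) - found


# Offline stand-ins; in the notebook pass client.tools.list() instead.
example_tools = [
    SimpleNamespace(identifier="get_alerts", toolgroup_id="mcp::mcp-weather"),
    SimpleNamespace(identifier="get_forecast", toolgroup_id="mcp::mcp-weather"),
]
```

In the notebook: `assert not missing_tools(client.tools.list(), "mcp::mcp-weather", {"get_alerts", "get_forecast"})`. An empty set means both tools are available to agents.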

Now, we'll define our Llama Stack agent. We'll instruct it to be a helpful agent and specifically grant it access to the `mcp::mcp-weather` toolgroup we just registered.


In [6]:
from llama_stack_client.lib.agents.agent import Agent

agent = Agent(
    client,
    model=SELECTED_MODEL,
    instructions="You are a helpful agent with access to tools. Use the weather tools to answer questions.",
    tools=["mcp::mcp-weather"],  # the MCP toolgroup registered above
)

Finally, let's test our agent! We'll provide it with a few prompts that require it to use the newly available weather tools to retrieve information.



In [7]:
stream = False
user_prompts = [
    "what is the weather in boulder colorado",
    "are there any weather alerts for Boston at the moment?",
]

for prompt in user_prompts:
    # Start a fresh session (with a unique name) for each prompt,
    # so no conversation history carries over between queries
    new_uuid = uuid.uuid4()
    session_id = agent.create_session(f"mcp-session-{new_uuid}")

    cprint(f"\n{'='*100}\nProcessing user query: {prompt}\n{'='*100}", "blue")

    response = agent.create_turn(
        messages=[
            {
                "role": "user",
                "content": prompt,
            }
        ],
        session_id=session_id,
        stream=stream,
    )
    if stream:
        for log in EventLogger().log(response):
            log.print()
    else:
        step_printer(response.steps)  # print the steps of the agent's response in a formatted way

Processing user query: what is the weather in boulder colorado

---------- 📍 Step 1: InferenceStep ----------
🛠️ Tool call Generated:
Tool call: get_forecast, Arguments: {'latitude': '40.0113', 'longitude': '-105.2729'}

---------- 📍 Step 2: ToolExecutionStep ----------
🔧 Executing tool...

---------- 📍 Step 3: InferenceStep ----------
🤖 Model Response:
The current weather in Boulder, Colorado is mostly cloudy with a temperature around 70°F and a chance of showers and thunderstorms after 1pm. Tonight, it will be partly cloudy with a low of 46°F. On Sunday, there's a 70% chance of showers and thunderstorms, with temperatures ranging from 63-69°F.

Processing user query: are there any weather alerts for Boston at the moment?

---------- 📍 Step 1: InferenceStep ----------
🛠️ Tool call Generated:
Tool call: get_alerts, Arguments: {'state': 'MA'}

---------- 📍 Step 2: ToolExecutionStep ----------
🔧 Executing tool...

---------- 📍 Step 3: InferenceStep ----------
🤖 Model Response:
There are no current weather alerts for Massachusetts.



### Analyzing the Agent's Execution

Let's break down the output you just saw to understand how our Llama Stack agent, now empowered by the MCP Weather tools, processed your queries:

For the query "**what is the weather in boulder colorado**":

* **Step 1 (InferenceStep):** The agent receives the query. The model determines that answering it requires weather *forecast* data and selects the `get_forecast` tool (exposed via MCP). Because that tool takes coordinates rather than a city name, the model supplies approximate latitude and longitude for "Boulder, Colorado" from its own knowledge as the tool call arguments (`{'latitude': '40.0113', 'longitude': '-105.2729'}`).
* **Step 2 (ToolExecutionStep):** Llama Stack executes the `get_forecast` tool by calling our running `mcp-weather` server. The **Observation** is the detailed forecast text the MCP server returns, with temperatures and conditions for each forecast period (today, tonight, and so on). The raw observation is not printed in the log above.
* **Step 3 (InferenceStep):** The model reads the raw forecast data (the Observation from Step 2) and condenses it into a concise, human-readable **Model Response** that answers the user's question about the weather in Boulder, including the outlook for the coming days.

Similarly, for the query "**are there any weather alerts for Boston at the moment?**":

* **Step 1 (InferenceStep):** The agent recognizes that the request is for weather *alerts* and selects the `get_alerts` tool (also exposed via MCP). Because this tool takes a two-letter US state code rather than a city, the model maps "Boston" to 'MA' (Massachusetts) and constructs the tool call (`{'state': 'MA'}`). The answer therefore covers the whole state, not just Boston.
* **Step 2 (ToolExecutionStep):** The `get_alerts` tool is executed, again via the `mcp-weather` server. The **Observation** in this case indicates `{'type': 'text', 'text': 'No active alerts for this state.'}`.
* **Step 3 (InferenceStep):** Based on the tool's output, the agent formulates a direct and clear **Model Response**, stating that there are no current weather alerts for Massachusetts.
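
The three-step pattern above (the model proposes a tool call, the runtime executes it, the model writes the answer from the result) can be sketched without any LLM or server. In this toy version, a hard-coded `fake_model` stands in for the LLM and a plain Python function stands in for the MCP `get_alerts` tool. All names here are hypothetical; only the control flow mirrors what the agent did in the Boston run.

```python
from typing import Callable, Optional


# HYPOTHETICAL local stand-in for the MCP server's get_alerts tool.
def get_alerts(state: str) -> str:
    return "No active alerts for this state."


# Tool registry: the runtime dispatches tool calls by name.
TOOLS: dict[str, Callable[..., str]] = {"get_alerts": get_alerts}


def fake_model(prompt: str, observation: Optional[str] = None) -> dict:
    """Stand-in for the LLM: request a tool first, then answer from the observation."""
    if observation is None:
        return {"tool_call": {"name": "get_alerts", "arguments": {"state": "MA"}}}
    return {"answer": f"Massachusetts: {observation}"}


def run_turn(prompt: str) -> tuple[list[str], str]:
    """Run one turn and record the step kinds, like the agent's step log."""
    steps = ["inference"]
    call = fake_model(prompt)["tool_call"]  # Step 1: model picks the tool and its arguments
    steps.append("tool_execution")
    observation = TOOLS[call["name"]](**call["arguments"])  # Step 2: runtime executes the tool
    steps.append("inference")
    answer = fake_model(prompt, observation)["answer"]  # Step 3: model answers from the observation
    return steps, answer
```

The real agent differs in that the model's choices are generated rather than hard-coded, and Step 2 sends a `tools/call` request to the MCP server instead of calling a local function.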

This flow clearly illustrates how the Llama Stack agent dynamically selected the correct MCP tool based on the user's intent, executed it, and then synthesized the results from the external service into a natural language response. This dynamic interaction with external services is a core capability enabled by the Model Context Protocol.
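
`step_printer` comes from the lab's local `src.utils` module and only prints. To work with the tool calls as data (for example, to check that a prompt triggers the expected tool), here is a minimal sketch that walks `response.steps`. It assumes, based on the step names in the output, that tool-execution steps have `step_type == "tool_execution"` and a `tool_calls` list whose entries carry `tool_name` and `arguments`; verify these attribute names against your `llama-stack-client` version. `tool_calls_from_steps` is a hypothetical helper, and `example_steps` is an offline stand-in mirroring the Boston run.

```python
from types import SimpleNamespace


def tool_calls_from_steps(steps) -> list[tuple[str, dict]]:
    """Hypothetical helper: collect (tool_name, arguments) pairs from a turn's steps, in order.

    ASSUMPTION: tool-execution steps expose `step_type == "tool_execution"` and
    `tool_calls` entries with `tool_name` and `arguments` attributes.
    """
    calls = []
    for step in steps:
        if getattr(step, "step_type", None) != "tool_execution":
            continue  # skip inference and any other step kinds
        for call in getattr(step, "tool_calls", []):
            calls.append((call.tool_name, call.arguments))
    return calls


# Offline stand-in mirroring the Boston query above.
example_steps = [
    SimpleNamespace(step_type="inference"),
    SimpleNamespace(
        step_type="tool_execution",
        tool_calls=[SimpleNamespace(tool_name="get_alerts", arguments={"state": "MA"})],
    ),
    SimpleNamespace(step_type="inference"),
]
```

In the non-streaming branch of the loop you could call `tool_calls_from_steps(response.steps)` next to `step_printer`.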

## Lab Summary: Working with MCP Servers

In this lab, you gained hands-on experience with the **Model Context Protocol (MCP)** and its integration within the Llama Stack framework. You learned how MCP serves as a vital bridge, allowing your Llama Stack agents to leverage external, specialized services as integral tools in their problem-solving workflows.

Through the exercises, you learned to:

* **Deploy an MCP Server:** You ran an `mcp-weather` container acting as an external service that provides weather-related functionalities.
* **Connect MCP to Llama Stack:** You registered the server's `/sse` endpoint with Llama Stack through `client.toolgroups.register`, using the `model-context-protocol` provider.
* **Integrate External Tools:** Llama Stack discovered the server's weather tools (`get_alerts` and `get_forecast`) and exposed them as the `mcp::mcp-weather` toolgroup, with no per-tool integration code.
* **Empower Agents with External Data:** You observed a Llama Stack agent dynamically using these newly available MCP tools to retrieve real-time weather forecasts and alerts, directly responding to user queries.

This experience highlights how Llama Stack simplifies the process of extending agent capabilities by integrating domain-specific tools via MCP. It showcases the power of a unified API that abstracts away the complexities of diverse external services, enabling developers to build sophisticated, context-aware AI applications with remarkable ease and flexibility.