# Level 4: Agents & MCP Tools

This notebook is for developers who are already familiar with [basic agent workflows](Level2_simple_agent_with_websearch.ipynb). 
Here, we will highlight more advanced use cases for agents where a single tool call is insufficient to complete the required task.

We will also use [MCP tools](https://github.com/modelcontextprotocol/servers) (which can be deployed onto an OpenShift cluster) throughout this demo. They show how to extend agents beyond Llama Stack's current builtin tools and connect them to many different services and data sources to build custom agents.

### Agent Example:

This notebook will walk through how to build a system that can answer the following question via an agent built with Llama Stack:

- *"Review OpenShift logs for the failing-pod. Categorize each as either ‘Normal’ or ‘Error’. If it's an error search for a solution. Summarize any errors found."*

### MCP Tools:

#### OpenShift MCP Server

Throughout this notebook we will rely on the [kubernetes-mcp-server](https://github.com/manusa/kubernetes-mcp-server) by [manusa](https://github.com/manusa) to interact with our OpenShift cluster. Please see the installation instructions below if you do not already have it deployed in your environment.

* [OpenShift MCP application installation](https://github.com/eformat/rhoai-policy-collection/tree/main/gitops/applications/mcp-openshift)


## Pre-Requisites

Before starting this notebook, ensure that you have:
- Followed the instructions in the [Setup Guide](./Level0_getting_started_with_Llama_Stack.ipynb) notebook.
- Access to an OpenShift cluster with a deployment of the [OpenShift MCP server](https://github.com/eformat/rhoai-policy-collection/tree/main/gitops/applications/mcp-openshift).
- A Tavily API key. You can register for one at https://app.tavily.com/home.

Add `TAVILY_SEARCH_API_KEY="tvly-dev-your-key"` (with your own key) to the `env.example` file.
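Before running the setup cell, it can help to confirm that the environment variables this notebook reads are actually set. The following is a minimal sketch, not part of the original notebook: the helper name `missing_env_vars` is hypothetical, and the variable names are the ones read by the cells below.

```python
import os

# Variable names read by this notebook's setup and tool-registration cells.
REQUIRED_VARS = ("REMOTE_BASE_URL", "REMOTE_OCP_MCP_URL", "TAVILY_SEARCH_API_KEY")


def missing_env_vars(names=REQUIRED_VARS, env=None):
    """Return the names that are unset or empty in `env` (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing environment variables: " + ", ".join(missing))
```

Reporting every missing variable at once avoids fixing them one error at a time.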

## Setting Up this Notebook
We will initialize our environment as described in detail in our ["Getting Started" notebook](./Level1_getting_started_with_Llama_Stack.ipynb). Please refer to it for additional explanations.

In [1]:
# for accessing the environment variables
import os
from dotenv import load_dotenv
load_dotenv(override=True)

# for communication with Llama Stack
from llama_stack_client import LlamaStackClient
from llama_stack_client import Agent
from llama_stack_client.lib.agents.react.agent import ReActAgent
from llama_stack_client.lib.agents.react.tool_parser import ReActOutput
from llama_stack_client.lib.agents.event_logger import EventLogger

# pretty print of the results returned from the model/agent
from termcolor import cprint
import sys
sys.path.append('.')
from src.utils import step_printer

base_url = os.getenv("REMOTE_BASE_URL")


# Tavily search API key is required for some of our demos and must be provided to the client upon initialization.
# We will cover it in the agentic demos that use the respective tool. Please ignore this parameter for all other demos.
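tavily_search_api_key = os.getenv("TAVILY_SEARCH_API_KEY")
# os.getenv returns None when the variable is unset, so check for that before calling len()
if not tavily_search_api_key or len(tavily_search_api_key) != 41:
    raise ValueError("TAVILY_SEARCH_API_KEY is missing or does not look like a valid Tavily Search key.")
provider_data = {"tavily_search_api_key": tavily_search_api_key}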


client = LlamaStackClient(
    base_url=base_url,
    provider_data=provider_data
)

print("Connected to Llama Stack server")

# model_id for the model you wish to use that is configured with the Llama Stack server
model_id = "llama3-2-3b" # "deepseek-r1-0528-qwen3-8b-bnb-4bit"

temperature = float(os.getenv("TEMPERATURE", 0.0))
if temperature > 0.0:
    top_p = float(os.getenv("TOP_P", 0.95))
    strategy = {"type": "top_p", "temperature": temperature, "top_p": top_p}
else:
    strategy = {"type": "greedy"}

max_tokens = 5000

# sampling_params will later be used to pass the parameters to Llama Stack Agents/Inference APIs
sampling_params = {
    "strategy": strategy,
    "max_tokens": max_tokens,
}

stream = False

print(f"Inference Parameters:\n\tModel: {model_id}\n\tSampling Parameters: {sampling_params}\n\tstream: {stream}")

Connected to Llama Stack server
Inference Parameters:
	Model: llama3-2-3b
	Sampling Parameters: {'strategy': {'type': 'greedy'}, 'max_tokens': 5000}
	stream: False


## Validate tools are available in our Llama Stack instance

When a Llama Stack instance is redeployed, its tools may need to be re-registered. Conversely, if a toolgroup is already registered with a Llama Stack instance, trying to register another one with the same `toolgroup_id` raises an error.

For this reason, it is recommended to validate your tools and toolgroups first. The following code checks whether the `mcp::openshift` toolgroup is registered and, if not, registers it using its MCP endpoint. The `builtin::websearch` toolgroup, which we use later in this notebook, should also appear in the printed list.

In [2]:
ocp_mcp_url = os.getenv("REMOTE_OCP_MCP_URL")

registered_tools = client.tools.list()
registered_toolgroups = [t.toolgroup_id for t in registered_tools]
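if "mcp::openshift" not in registered_toolgroups:
    client.toolgroups.register(
        toolgroup_id="mcp::openshift",
        provider_id="model-context-protocol",
        mcp_endpoint={"uri": ocp_mcp_url},
    )
    # keep the local list in sync so the summary printed below includes the new toolgroup
    registered_toolgroups.append("mcp::openshift")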

print(f"Your Llama Stack server is registered with the following tool groups @ {set(registered_toolgroups)} \n")

Your Llama Stack server is registered with the following tool groups @ {'mcp::weather', 'builtin::rag', 'mcp::openshift', 'builtin::websearch', 'mcp::fast-mcp-tools', 'mcp::github'} 
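If you register several MCP servers, the check-then-register pattern above can be wrapped in a small helper. This is a sketch under the assumption that the client exposes `tools.list()` and `toolgroups.register(...)` exactly as used in the cell above. The helper name `ensure_toolgroup` is hypothetical.

```python
def ensure_toolgroup(client, toolgroup_id, provider_id, mcp_uri):
    """Register `toolgroup_id` only if no registered tool already belongs to it.

    Returns True if a registration call was made, False if it was skipped.
    """
    registered = {tool.toolgroup_id for tool in client.tools.list()}
    if toolgroup_id in registered:
        return False
    client.toolgroups.register(
        toolgroup_id=toolgroup_id,
        provider_id=provider_id,
        mcp_endpoint={"uri": mcp_uri},
    )
    return True


# Example call with the values used in this notebook:
# ensure_toolgroup(client, "mcp::openshift", "model-context-protocol", ocp_mcp_url)
```

Because the helper skips the call when the toolgroup is already present, it can be re-run safely after a redeployment.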



## Defining our Agent - Prompt Chaining

In [3]:
model_prompt= """You are a helpful assistant. You have access to a number of tools.
Whenever a tool is called, be sure to return the Response in a friendly and helpful tone.
"""

### Deploy a namespace

Let's first create a namespace on the OpenShift cluster.

In [4]:
# Create simple agent with tools
agent = Agent(
    client,
    model=model_id,  # model_id is set in the setup cell above
    instructions=model_prompt,  # update the system prompt based on the model you are using
    tools=["mcp::openshift"],
    tool_config={"tool_choice":"auto"},
    sampling_params=sampling_params
)

user_prompts = ["Create namespace called test in our cluster"]
session_id = agent.create_session(session_name="OCP_Slack_demo")

for i, prompt in enumerate(user_prompts):
    response = agent.create_turn(
        messages=[
            {
                "role":"user",
                "content": prompt
            }
        ],
        session_id=session_id,
        stream=stream,
    )
    if stream:
        for log in EventLogger().log(response):
            log.print()
    else:
        step_printer(response.steps) # print the steps of an agent's response in a formatted way. 


---------- 📍 Step 1: InferenceStep ----------
🛠️ Tool call Generated:
Tool call: namespace_create, Arguments: {'namespace': 'test'}

---------- 📍 Step 2: ToolExecutionStep ----------
🔧 Executing tool...



---------- 📍 Step 3: InferenceStep ----------
🤖 Model Response:
The namespace "test" has been successfully created in your cluster. You can now use this namespace for your Kubernetes resources. If you need to delete the namespace, you can use the `namespace_delete` function.



### Deploy a pod with simulated error logs

For the purpose of testing and retrieving logs from a pod exhibiting errors, we will deploy a pod on an OpenShift cluster that produces simulated error logs. We have a pre-built container image available for this "fake" error pod that you can use. With the help of the agent and the OpenShift MCP server we can deploy the pod as follows.

In [5]:
# Create simple agent with tools
agent = Agent(
    client,
    model=model_id,  # model_id is set in the setup cell above
    instructions=model_prompt,  # update the system prompt based on the model you are using
    tools=["mcp::openshift"],
    tool_config={"tool_choice":"auto"},
    sampling_params=sampling_params
)

user_prompts = ["Run a pod called slack-test in namespace test using the quay.io/redhat-et/failing-test-pod:latest image"]
session_id = agent.create_session(session_name="OCP_Slack_demo")

for i, prompt in enumerate(user_prompts):
    response = agent.create_turn(
        messages=[
            {
                "role":"user",
                "content": prompt
            }
        ],
        session_id=session_id,
        stream=stream,
    )
    if stream:
        for log in EventLogger().log(response):
            log.print()
    else:
        step_printer(response.steps) # print the steps of an agent's response in a formatted way. 


---------- 📍 Step 1: InferenceStep ----------
🛠️ Tool call Generated:
Tool call: pods_run, Arguments: {'image': 'quay.io/redhat-et/failing-test-pod:latest', 'name': 'slack-test', 'namespace': 'test'}

---------- 📍 Step 2: ToolExecutionStep ----------
🔧 Executing tool...



---------- 📍 Step 3: InferenceStep ----------
🤖 Model Response:
The pod "slack-test" has been successfully created in the namespace "test" using the "quay.io/redhat-et/failing-test-pod:latest" image. The pod is currently in the "Pending" phase and will be executed once it is ready. You can check the status of the pod by running the command `kubectl get pod slack-test -n test`.



You should see a pod `slack-test` successfully deployed in your namespace on the OpenShift cluster. If you view the logs of the pod, you should see the simulated error message as follows:
```
Starting container...
Failure: Unknown Error
Error details: Container failed due to an unexpected issue during startup.
Potential cause: Missing dependencies, configuration errors, or permission issues.
```
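As a rough sanity check on the agent's categorization in the next section, you could compare it against a simple keyword baseline. This is only an illustrative sketch and not how the agent decides: the marker list and the `categorize` helper are assumptions introduced here.

```python
# Substrings that mark a log line as an error in this toy baseline (an assumption).
ERROR_MARKERS = ("error", "fail")


def categorize(line):
    """Label a log line 'Error' if it contains any marker, otherwise 'Normal'."""
    lowered = line.lower()
    return "Error" if any(marker in lowered for marker in ERROR_MARKERS) else "Normal"


# The simulated log lines shown above.
sample_logs = [
    "Starting container...",
    "Failure: Unknown Error",
    "Error details: Container failed due to an unexpected issue during startup.",
    "Potential cause: Missing dependencies, configuration errors, or permission issues.",
]

for line in sample_logs:
    print(f"{categorize(line):6} | {line}")
```

A baseline like this misses errors that do not use the chosen keywords, which is one reason to hand the categorization to the LLM.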

### Retrieve and categorize logs for the failing pod

Now that we have a simulated failing pod running on the OpenShift cluster, we can task the agent with retrieving its logs, categorizing them, and summarizing the result.

In [6]:
# Create simple agent with tools
agent = Agent(
    client,
    model=model_id,  # model_id is set in the setup cell above
    instructions=model_prompt,  # update the system prompt based on the model you are using
    tools=["mcp::openshift"],
    tool_config={"tool_choice":"auto"},
    sampling_params=sampling_params
)

user_prompts = ["View the logs for the pod slack-test which has a single container slack-test in the test namespace. Categorize it as normal or error.",
               "Summarize the results with the pod name, category along with a brief explanation as to why you categorized it as normal or error."]
session_id = agent.create_session(session_name="OCP_Slack_demo")

for i, prompt in enumerate(user_prompts):
    response = agent.create_turn(
        messages=[
            {
                "role":"user",
                "content": prompt
            }
        ],
        session_id=session_id,
        stream=stream,
    )
    if stream:
        for log in EventLogger().log(response):
            log.print()
    else:
        step_printer(response.steps) # print the steps of an agent's response in a formatted way. 


---------- 📍 Step 1: InferenceStep ----------
🛠️ Tool call Generated:
Tool call: pods_log, Arguments: {'container': 'slack-test', 'name': 'slack-test', 'namespace': 'test'}

---------- 📍 Step 2: ToolExecutionStep ----------
🔧 Executing tool...



---------- 📍 Step 3: InferenceStep ----------
🤖 Model Response:
The logs for the pod "slack-test" in the "test" namespace have been retrieved. Unfortunately, the logs indicate that the container "slack-test" experienced an error during startup, categorized as a normal error. The error message suggests that there may be missing dependencies, configuration errors, or permission issues. If you'd like, I can try to help you investigate further or provide guidance on how to resolve the issue.


---------- 📍 Step 1: InferenceStep ----------
🛠️ Tool call Generated:
Tool call: pods_log, Arguments: {'container': 'slack-test', 'name': 'slack-test', 'namespace': 'test'}

---------- 📍 Step 2: ToolExecutionStep ----------
🔧 Executing tool...



---------- 📍 Step 3: InferenceStep ----------
🤖 Model Response:
The pod "slack-test" in the "test" namespace experienced an error during startup, categorized as an error. The error message suggests that there may be missing dependencies, configuration errors, or permission issues.



### Output Analysis

Let's step through the output to better understand what's happening in this notebook.

1. First, the LLM generated a tool call for the `pods_log` tool included in the **OpenShift MCP server**.
2. The tool successfully retrieved the logs for the specified pod.
3. The LLM then received the logs from the tool call, along with the original query.
4. With this context, the LLM ran a final inference that produced a summary of the pod logs along with a category of 'Normal' or 'Error'.
5. The second prompt in the chain went through the same steps (the agent fetched the logs again) and produced the shorter summary.
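The prompt-chaining loop repeated in the cells above can be factored into a small helper. This is a minimal sketch that assumes only the `create_session` and `create_turn` calls already used in this notebook. The helper name `run_prompt_chain` is hypothetical, and it always uses non-streaming turns.

```python
def run_prompt_chain(agent, prompts, session_name):
    """Send each prompt as its own turn within one session and collect the responses."""
    session_id = agent.create_session(session_name=session_name)
    responses = []
    for prompt in prompts:
        response = agent.create_turn(
            messages=[{"role": "user", "content": prompt}],
            session_id=session_id,  # same session for every turn in the chain
            stream=False,
        )
        responses.append(response)
    return responses


# Example call with the objects defined in this notebook:
# for response in run_prompt_chain(agent, user_prompts, "OCP_Slack_demo"):
#     step_printer(response.steps)
```

Every turn reuses the same `session_id`, which is what lets a later prompt such as "Summarize the results" build on the earlier turn.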

## Defining our Agent - ReAct

Now that we've shown that we can accomplish this multi-step task using prompt chaining, let's see if we can give our agent more autonomy to perform the same task with a single prompt instead of a chain. To do this, we will instantiate a **ReAct agent** (which is included in the Llama Stack Python client by default). The ReAct agent is a variant of the simple agent that can loop through "Reason then Act" iterations, thinking through the problem and then using tools until it determines that its task has been completed successfully.

Unlike prompt chaining which follows fixed steps, ReAct dynamically breaks down tasks and adapts its approach based on the results of each step. This makes it more flexible and capable of handling complex, real-world queries effectively.
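To make the "Reason then Act" loop concrete, here is a toy, self-contained sketch of its control flow. It is not the `ReActAgent` implementation: the `decide` callable stands in for the LLM, the tools are fakes, and the JSON fields (`thought`, `action`, `answer`, `tool_name`, `tool_params`) are a simplified assumption for illustration.

```python
import json


def react_loop(decide, tools, question, max_steps=5):
    """Alternate reasoning and tool calls until the policy returns an answer."""
    observations = []
    for _ in range(max_steps):
        step = json.loads(decide(question, observations))  # reason
        if step.get("answer") is not None:
            return step["answer"], observations
        action = step["action"]  # act
        result = tools[action["tool_name"]](**action["tool_params"])
        observations.append((action["tool_name"], result))
    raise RuntimeError("no answer within max_steps")


# Fake tools standing in for the MCP and web search tools.
def fake_pods_log(name, namespace):
    return "Failure: Unknown Error"


def fake_web_search(query):
    return "Check image dependencies and pod permissions."


def scripted_decide(question, observations):
    """Scripted stand-in for the LLM: fetch logs, search, then answer."""
    if not observations:
        action = {"tool_name": "pods_log",
                  "tool_params": {"name": "slack-test", "namespace": "test"}}
        return json.dumps({"thought": "Fetch the logs first.", "action": action, "answer": None})
    if len(observations) == 1:
        action = {"tool_name": "web_search", "tool_params": {"query": observations[0][1]}}
        return json.dumps({"thought": "The logs show an error; search for a fix.",
                           "action": action, "answer": None})
    answer = "Error: " + observations[0][1] + " Suggested fix: " + observations[1][1]
    return json.dumps({"thought": "Enough information.", "action": None, "answer": answer})


answer, trace = react_loop(
    scripted_decide,
    {"pods_log": fake_pods_log, "web_search": fake_web_search},
    "Review the logs for slack-test",
)
print(answer)
```

The number of tool calls is not fixed in advance: the loop ends only when the policy returns an answer or `max_steps` is exhausted, which is the main difference from a fixed prompt chain.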

Below you will see the slight differences in the agent definition and the prompt used to accomplish our task.

In [7]:
model_id = "deepseek-r1-0528-qwen3-8b-bnb-4bit"
stream = True

#model_id = "llama3-2-3b"
#stream = False

agent = ReActAgent(
    client=client,
    model=model_id,
    tools=["mcp::openshift", "builtin::websearch"],
    response_format={
        "type": "json_schema",
        "json_schema": ReActOutput.model_json_schema(),
    },
    sampling_params={"max_tokens": 512},
)
user_prompts = ["""Review the OpenShift logs for the pod 'slack-test' with a container of the same name, in the 'test' namespace.
If the logs indicate an error, search for the top OpenShift solution. Create a summary message with the category and explanation of the error."""]
session_id = agent.create_session("web-session")
for prompt in user_prompts:
    print("\n"+"="*50)
    cprint(f"Processing user query: {prompt}", "blue")
    print("="*50)
    response = agent.create_turn(
        messages=[
            {
                "role": "user",
                "content": prompt,
            }
        ],
        session_id=session_id,
        stream=stream
    )
    if stream:
        for log in EventLogger().log(response):
            log.print()
    else:
        step_printer(response.steps) # print the steps of an agent's response in a formatted way.


Processing user query: Review the OpenShift logs for the pod 'slack-test' with a container of the same name,in the 'test' namespace."
                If the logs indicate an error search for the top OpenShift solution. Create a summary message with the category and explanation of the error.
inference> {
    "thought": "I need to review the logs of the pod 'slack-test' in the 'test' namespace. First, I should get the logs from the pod. Then, if there's an error, I need to search for the top Open
[output truncated]

### Output Analysis

Above, we can see that the ReAct agent took a nearly identical approach to the prompt chaining method, but using a single prompt instead of a chain.

1. First, the LLM generated a tool call for the `pods_log` tool included in the **OpenShift MCP server**.
2. The tool successfully retrieved the logs for the specified pod.
3. The LLM then received the logs from the tool call, along with the original query.
4. This context was then passed back to the LLM for the next inference step, which summarized the pod logs and categorized them as 'Normal' or 'Error'.
5. Next, the LLM generated a tool call for the `brave_search` tool, which was not available. This is because the models were trained with Brave Search as a built-in tool.
6. The LLM then generated a tool call for the `web_search` tool instead, looking for the top answer to the error.
7. Finally, the agent produced a summary of the pod log error and suggested possible next steps to help solve the problem.

## Key Takeaways

This notebook demonstrated how to build agentic MCP applications with Llama Stack. We did this by initializing an agent with access to the OpenShift MCP server and the builtin web search tool, both registered with our Llama Stack server, and then invoking the agent on our queries. We showed that this can be done with more directed prompt chaining or with the more open-ended ReAct pattern.