## Initialize the Agent

We initialize Llama Stack as a library, which starts a Llama Stack server alongside our main application instead of connecting to a separately deployed one. Running a separate Llama Stack server is the better approach for real use cases, but this setup is sufficient for our simple example.

We use the model `llama3.2:3b-instruct-fp16`, served by a local Ollama instance. We need to define a set of instructions so that the LLM can correctly infer when and how to use the tools we provide. We also configure the temperature parameter, which controls the balance between strictness and creativity of the model: lower values make the output more predictable, higher values make it more varied.
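To make the effect of `temperature` and `top_p` more concrete, here is a minimal sketch of temperature scaling followed by top-p (nucleus) filtering over a toy list of logits. It is a generic illustration of the sampling idea, not the code that Llama Stack or Ollama actually run; the function name `top_p_distribution` and the example logits are assumptions made for this sketch, and it assumes `temperature > 0`.

```python
import math


def top_p_distribution(logits, temperature=0.2, top_p=0.9):
    """Return the renormalized token distribution after temperature
    scaling and top-p filtering (hypothetical helper, illustration only)."""
    # Temperature scaling: dividing by a small T sharpens the distribution.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]  # subtract max for stability
    total = sum(exps)
    probs = [e / total for e in exps]

    # Top-p: keep the most likely tokens until their cumulative mass >= top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # Renormalize over the kept tokens; everything else gets probability 0.
    result = [0.0] * len(probs)
    for i in kept:
        result[i] = probs[i] / mass
    return result


toy_logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
low_temp = top_p_distribution(toy_logits, temperature=0.2, top_p=0.9)
high_temp = top_p_distribution(toy_logits, temperature=1.0, top_p=0.9)
print(low_temp)
print(high_temp)
```

With a low temperature such as 0.2, almost all probability mass lands on the top token, which is why it suits tool calling where we want consistent, repeatable behavior.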

Finally, we add our tools, which are implemented in the files [maths.py](./tools/maths.py) and [uppercaser.py](./tools/uppercaser.py).
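The tool files themselves are not reproduced in this notebook. As a rough idea of what they might contain, the sketch below defines the five functions imported in the next cell as plain Python functions with type hints and docstrings. The bodies, the docstring wording, and the zero-division check are assumptions; the real files may differ. As far as we understand, the client builds the tool description shown to the model from the function signature and docstring, so both are worth writing carefully.

```python
def add(a: float, b: float) -> float:
    """
    Add two numbers.

    :param a: first operand
    :param b: second operand
    :returns: the sum of a and b
    """
    return a + b


def subtract(a: float, b: float) -> float:
    """
    Subtract one number from another.

    :param a: number to subtract from
    :param b: number to subtract
    :returns: the difference a - b
    """
    return a - b


def multiply(a: float, b: float) -> float:
    """
    Multiply two numbers.

    :param a: first factor
    :param b: second factor
    :returns: the product of a and b
    """
    return a * b


def divide(a: float, b: float) -> float:
    """
    Divide one number by another.

    :param a: dividend
    :param b: divisor, must not be zero
    :returns: the quotient a / b
    """
    if b == 0:
        # Assumed behavior: fail loudly instead of returning a sentinel.
        raise ValueError("Cannot divide by zero")
    return a / b


def to_uppercase(text: str) -> str:
    """
    Convert text to uppercase.

    :param text: the text to convert
    :returns: the uppercased text
    """
    return text.upper()
```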

In [6]:
import os

from llama_stack_client import Agent, AgentEventLogger
from llama_stack.distribution.library_client import LlamaStackAsLibraryClient
from tools.uppercaser import to_uppercase
from tools.maths import add, divide, multiply, subtract


model = "llama3.2:3b-instruct-fp16"
os.environ["INFERENCE_MODEL"] = model
client = LlamaStackAsLibraryClient('ollama')
client.initialize()

instructions = """
You are a smart assistant that helps users with their questions.
You have different tools available and you must use them to retrieve information or perform actions.
You must not generate code or scripts.
Do not simulate the output; perform all operations with the available tools.
Multiple tool calls may need to happen serially.
You may only call one tool at a time.
Remember you may need to pass the output of the first tool to subsequent tool calls in some cases
"""

agent = Agent(
    client,
    model=model,
    instructions=instructions,
    sampling_params={
        "strategy": {"type": "top_p", "temperature": 0.2, "top_p": 0.9},
    },
    tools=[to_uppercase, add, divide, multiply, subtract],
)
session_id = agent.create_session('new_session')

## Interaction of model with tools

Now we can ask the model to perform tasks that require the tools. When the input contains keywords related to one of the provided tools, the model infers that it needs to call that specific tool.

If we ask it to uppercase some text, the presence of the word "uppercase" signals that it should use the `to_uppercase` tool. Additionally, if the input involves multiple tasks requiring different tools, the model should call each tool sequentially and use the output of the previous tool as the input for the next one. This is not default behavior, so we request it explicitly in the model instructions:
"Remember you may need to pass the output of the first tool to subsequent tool calls in some cases."

In [2]:


prompt = "Can you to uppercase 'hello world!'?"
print("prompt>", prompt)

response = agent.create_turn(
  messages=[{"role": "user", "content": prompt}],
  session_id=session_id,
)

for log in AgentEventLogger().log(response):
    log.print()

prompt> Can you to uppercase 'hello world!'?
inference> [to_uppercase(text='hello world!')]
tool_execution> Tool:to_uppercase Args:{'text': 'hello world!'}
tool_execution> Tool:to_uppercase Response:"HELLO WORLD!"
inference> How can I assist you further?

In [3]:
prompt = "First, add 3 + 3. Then divide the result by 2. Finally, multiply that result by 6. What is the final answer?"
print("prompt>", prompt)

response = agent.create_turn(
  messages=[{"role": "user", "content": prompt}],
  session_id=session_id,
)

for log in AgentEventLogger().log(response):
    log.print()

prompt> First, add 3 + 3. Then divide the result by 2. Finally, multiply that result by 6. What is the final answer?
inference> [add(a=3, b=3), divide(result=add(a=3, b=3), b=2), multiply(result=divide(a=3, b=3), b=6)]
tool_execution> Tool:add Args:{'a': 3.0, 'b': 3.0}
tool_execution> Tool:add Response:6
inference> [divide(a=6, b=2)]
tool_execution> Tool:divide Args:{'a': 6.0, 'b': 2.0}
tool_execution> Tool:divide Response:3.0
inference> [
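The captured log ends before the final step, so it is useful to work the chain by hand and see what answer to expect. The sketch below replays the three operations from the prompt, feeding each result into the next call, which is the behavior we ask for in the instructions. The `TOOLS` table and `run_chain` helper are hypothetical stand-ins built on Python's `operator` module; they do not call the agent or the real tools.

```python
import operator

# Stand-ins for the real tools, keyed by the tool names used in the log.
TOOLS = {
    "add": operator.add,
    "divide": operator.truediv,
    "multiply": operator.mul,
}


def run_chain(first, rest):
    """Run the first call with both arguments, then pass each result
    as the first argument of the following call."""
    name, a, b = first
    result = TOOLS[name](a, b)
    trace = [(name, a, b, result)]
    for name, b in rest:
        previous = result
        result = TOOLS[name](previous, b)
        trace.append((name, previous, b, result))
    return result, trace


final, trace = run_chain(("add", 3, 3), [("divide", 2), ("multiply", 6)])
for name, a, b, out in trace:
    print(f"{name}(a={a}, b={b}) -> {out}")
print("final answer:", final)
```

The expected final answer for the prompt is therefore 18: (3 + 3) / 2 = 3, and 3 * 6 = 18.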

## Using MCP server to provide tools to Agent


Now we will create a new agent that uses an MCP server to register tools. These tools are the same as the ones we used in the previous section, but this time they are provided via the MCP server.

Using an MCP server lets us serve tools over HTTP to connected clients. This centralizes the tools in one place so that different AI applications can share them.

To do this, we need to run our MCP server, which is implemented with FastMCP and can be found in [mcp_server](./mcp_server.py). To start the server, simply run: `python mcp_server.py`
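The contents of `mcp_server.py` are not shown in this notebook. The sketch below is one way such a server could look, assuming the `mcp` Python package and its `FastMCP` class. The tool names `call_to_uppercase`, `call_add`, and `call_divide` and the `{"success": ..., "result": ...}` payload shape are taken from the logs later in this section; the server name `custom-tools`, the error payload, and the `build_server` helper are assumptions. The remaining tools would follow the same pattern.

```python
def call_to_uppercase(text: str) -> dict:
    """Convert text to uppercase."""
    return {"success": True, "result": text.upper()}


def call_add(a: float, b: float) -> dict:
    """Add two numbers."""
    return {"success": True, "result": a + b}


def call_divide(a: float, b: float) -> dict:
    """Divide a by b."""
    if b == 0:
        # Assumed error shape; the real server may report errors differently.
        return {"success": False, "error": "Cannot divide by zero"}
    return {"success": True, "result": a / b}


def build_server(port: int = 5000):
    """Create a FastMCP server exposing the handlers above.

    The import is done lazily so the handlers can be used and tested
    without the `mcp` package installed.
    """
    from mcp.server.fastmcp import FastMCP

    server = FastMCP("custom-tools", port=port)
    for handler in (call_to_uppercase, call_add, call_divide):
        server.add_tool(handler)
    return server


# To serve the tools over SSE (blocks until interrupted):
#     build_server().run(transport="sse")
```

Port 5000 and the SSE transport are chosen here to match the `http://127.0.0.1:5000/sse` endpoint that we register with Llama Stack.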

When you execute the script below, you will see the registered tools: both the default tools and those served by the MCP server. Finally, we query the model with the same prompts as before, and the results should be the same.


In [4]:
from llama_stack_client.types.toolgroup_register_params import McpEndpoint
from pprint import pprint

client.toolgroups.register(
    toolgroup_id="mcp::custom",
    provider_id="model-context-protocol",
    mcp_endpoint=McpEndpoint(uri="http://127.0.0.1:5000/sse"),
)

mcp_agent = Agent(
    client,
    model=model,
    instructions=instructions,
    tools=["mcp::custom"],
)


tools = client.tools.list()
pprint(tools)

[Tool(description='Insert documents into memory', identifier='insert_into_memory', parameters=[], provider_id='rag-runtime', tool_host=None, toolgroup_id='builtin::rag', type='tool', metadata=None, provider_resource_id=None),
 Tool(description='Search for information in a database.', identifier='knowledge_search', parameters=[Parameter(description='The query to search for. Can be a natural language sentence or keywords.', name='query', parameter_type='string', required=True, default=None)], provider_id='rag-runtime', tool_host=None, toolgroup_id='builtin::rag', type='tool', metadata=None, provider_resource_id=None),
 Tool(description='Search the web for information', identifier='web_search', parameters=[Parameter(description='The query to search for', name='query', parameter_type='string', required=True, default=None)], provider_id='tavily-search', tool_host=None, toolgroup_id='builtin::websearch', type='tool', metadata=None, provider_resource_id=None),
 Tool(description='Query Wolfram
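The full listing mixes built-in tools with the ones served by our MCP server, which makes it hard to confirm that the registration worked. A small helper that filters on the `toolgroup_id` attribute (visible in the output above) can narrow it down. The helper name `tools_in_group` is hypothetical, and the sample objects below are stand-ins built with `SimpleNamespace` so the sketch runs without a Llama Stack client.

```python
from types import SimpleNamespace


def tools_in_group(tools, toolgroup_id):
    """Return the identifiers of the tools that belong to one toolgroup."""
    return [tool.identifier for tool in tools if tool.toolgroup_id == toolgroup_id]


# Stand-in objects exposing only the two attributes the helper reads.
sample_tools = [
    SimpleNamespace(identifier="web_search", toolgroup_id="builtin::websearch"),
    SimpleNamespace(identifier="knowledge_search", toolgroup_id="builtin::rag"),
    SimpleNamespace(identifier="call_to_uppercase", toolgroup_id="mcp::custom"),
]

mcp_tool_names = tools_in_group(sample_tools, "mcp::custom")
print(mcp_tool_names)
```

In the notebook, the same helper could be applied to the real listing with `tools_in_group(tools, "mcp::custom")`.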

In [5]:
prompts = [
    "Can you to uppercase 'hello world!'?",
    "First, add 3 + 3. Then divide the result by 2. Finally, multiply that result by 6. What is the final answer?",
]

mcp_session_id = mcp_agent.create_session('new_session')

for prompt in prompts:
    response = mcp_agent.create_turn(
        messages=[
            {
                "role": "user",
                "content": prompt,
            }
        ],
        session_id=mcp_session_id,
    )

    print("prompt>", prompt)
    for log in AgentEventLogger().log(response):
        log.print()

prompt> Can you to uppercase 'hello world!'?
inference> [call_to_uppercase(text='hello world!')]
tool_execution> Tool:call_to_uppercase Args:{'text': 'hello world!'}
tool_execution> Tool:call_to_uppercase Response:[TextContentItem(text='{\n  "success": true,\n  "result": "HELLO WORLD!"\n}', type='text')]
inference> How can I assist you further?
prompt> First, add 3 + 3. Then divide the result by 2. Finally, multiply that result by 6. What is the final answer?
inference> [call_add(a=3, b=3, session_id="1234"), call_divide(a=9, b=
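Unlike the client-side tools, the MCP tool does not return a bare value: the `Response:` line for `call_to_uppercase` shows a text item whose content is a JSON document with `success` and `result` fields. If we want the plain value in our own code, we can parse that text. The sketch below does this for the exact string from the log; the helper name `extract_mcp_result` and the choice to raise `RuntimeError` on failure are assumptions.

```python
import json


def extract_mcp_result(text):
    """Parse the JSON text of an MCP tool response and return its result."""
    payload = json.loads(text)
    if not payload.get("success"):
        raise RuntimeError(f"Tool call failed: {payload}")
    return payload["result"]


# The text of the TextContentItem from the log above.
raw_text = '{\n  "success": true,\n  "result": "HELLO WORLD!"\n}'
value = extract_mcp_result(raw_text)
print(value)
```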