# Level 2: Simple Agent with Web Search

This notebook introduces how to build a simple agent using Llama Stack's agent framework, enhanced with a single tool: the builtin web search tool. This capability allows the agent to retrieve up-to-date external information beyond the limits of its training data, an important step toward developing a more capable and autonomous agent.

## Overview

This tutorial walks you through building your own AI agent that can search the web:

1. Configure a Llama Stack agent.
2. Enhance the agent by giving it access to a specific tool.
3. Interact with the agent and test its use of the web search tool.

## Prerequisites

Before starting this notebook, ensure that you have:
- Followed the instructions in the [Setup Guide](./Level0_getting_started_with_Llama_Stack.ipynb) notebook. 
- A Tavily API key. The web search tool depends on it, so this notebook will not run correctly without one. You can register for one at [https://tavily.com/](https://tavily.com/).

## 1. Setting Up this Notebook
We will start with a few imports.

In [7]:
from llama_stack_client import Agent
from llama_stack_client.lib.agents.event_logger import EventLogger

Next, we will initialize our environment as described in detail in our ["Getting Started" notebook](./Level0_getting_started_with_Llama_Stack.ipynb). Please refer to it for additional explanations.

In [8]:
# for accessing the environment variables
import os
from dotenv import load_dotenv
load_dotenv()

# for communication with Llama Stack
from llama_stack_client import LlamaStackClient

# pretty print of the results returned from the model/agent
import sys
sys.path.append('..')  
from src.utils import step_printer
from termcolor import cprint

base_url = os.getenv("REMOTE_BASE_URL")

# Tavily search API key is required for some of our demos and must be provided to the client upon initialization.
# We will cover it in the agentic demos that use the respective tool. Please ignore this parameter for all other demos.
tavily_search_api_key = os.getenv("TAVILY_SEARCH_API_KEY")
if tavily_search_api_key is None:
    provider_data = None
else:
    provider_data = {"tavily_search_api_key": tavily_search_api_key}


client = LlamaStackClient(
    base_url=base_url,
    provider_data=provider_data
)
    
print("Connected to Llama Stack server")

# model_id for the model you wish to use that is configured with the Llama Stack server
# model_id = "granite"
model_id = "qwen"

temperature = float(os.getenv("TEMPERATURE", 0.0))
if temperature > 0.0:
    top_p = float(os.getenv("TOP_P", 0.95))
    strategy = {"type": "top_p", "temperature": temperature, "top_p": top_p}
else:
    strategy = {"type": "greedy"}

max_tokens = int(os.getenv("MAX_TOKENS", 4096))

# sampling_params will later be used to pass the parameters to Llama Stack Agents/Inference APIs
sampling_params = {
    "strategy": strategy,
    "max_tokens": max_tokens,
}

stream_env = os.getenv("STREAM", "False")
# the Boolean 'stream' parameter will later be passed to Llama Stack Agents/Inference APIs
# any value not equal to 'False' (including 'false' or '0') is treated as True
stream = (stream_env != "False")

print(f"Inference Parameters:\n\tModel: {model_id}\n\tSampling Parameters: {sampling_params}\n\tstream: {stream}")

Connected to Llama Stack server
Inference Parameters:
	Model: qwen
	Sampling Parameters: {'strategy': {'type': 'greedy'}, 'max_tokens': 2048}
	stream: False
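The cell above derives all of its configuration from environment variables. As a sketch (not part of the notebook's own code), the same logic can be pulled into small pure functions so it can be checked without a running Llama Stack server. The function names are hypothetical. `parse_stream_flag` is a stricter alternative to the notebook's `stream_env != "False"` check, and `build_provider_data` treats an empty key as missing, unlike the `is None` check above.

```python
from typing import Mapping, Optional


def build_sampling_params(env: Mapping[str, str]) -> dict:
    """Hypothetical helper mirroring the cell above: greedy decoding unless TEMPERATURE > 0."""
    temperature = float(env.get("TEMPERATURE", "0.0"))
    if temperature > 0.0:
        top_p = float(env.get("TOP_P", "0.95"))
        strategy = {"type": "top_p", "temperature": temperature, "top_p": top_p}
    else:
        strategy = {"type": "greedy"}
    return {"strategy": strategy, "max_tokens": int(env.get("MAX_TOKENS", "4096"))}


def parse_stream_flag(value: Optional[str]) -> bool:
    """Hypothetical stricter parser: only common truthy strings enable streaming."""
    if value is None:
        return False
    return value.strip().lower() in {"1", "true", "yes", "on"}


def build_provider_data(env: Mapping[str, str]) -> Optional[dict]:
    """Hypothetical helper: pass the Tavily key to the client only if it is set and non-empty."""
    key = env.get("TAVILY_SEARCH_API_KEY")
    return {"tavily_search_api_key": key} if key else None
```

In the notebook you would call these after `load_dotenv()`, for example `build_sampling_params(os.environ)`. With the stricter parser, `STREAM=false` disables streaming instead of enabling it.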


## 2. Configure an agent for tool use.

- **Agent Initialization**: First we create an `Agent` instance with the desired LLM model, agent instructions and tools.

- **Instructions**: The `instructions` parameter, also referred to as the system prompt, specifies the agent's role and behavior. In this example, the agent is configured as a helpful web search assistant. It is instructed to use a tool whenever a web search is required and to respond in a friendly and helpful tone.

- **Tools**: The `tools` parameter defines the tools available to the agent. In this case, the `builtin::websearch` tool is used, which enables the agent to perform web searches. This tool is essential for retrieving up-to-date information from the web.

- **How It Works**: When a user query is provided, the model decides whether a tool is needed to fulfill the request. If the query requires information from the web, the agent invokes the `builtin::websearch` tool, which queries Tavily Search for current results. The results are passed back to the model, which uses them to compose its answer in a friendly and helpful tone. Queries that do not need fresh information can be answered directly, without a tool call.

For more details on the `builtin::websearch` tool and its capabilities, refer to the [Llama Stack tools documentation](https://llama-stack.readthedocs.io/en/latest/building_applications/tools.html#web-search-providers).
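To make the workflow above concrete, here is a framework-free sketch of the decide, call tool, answer loop. It is not Llama Stack's implementation: `fake_model`, `fake_web_search`, `run_turn`, and the tool name `web_search` are hypothetical stand-ins for the LLM, the Tavily-backed search, and the agent loop that the server runs for you.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class ToolCall:
    tool_name: str
    arguments: Dict[str, str]


@dataclass
class ModelOutput:
    text: str = ""
    tool_call: Optional[ToolCall] = None


def fake_model(messages: List[dict]) -> ModelOutput:
    """Hypothetical LLM stand-in: requests a search for 'latest' questions, then answers from the tool result."""
    last = messages[-1]
    if last["role"] == "tool":
        return ModelOutput(text=f"Here is what I found: {last['content']}")
    if "latest" in last["content"].lower():
        return ModelOutput(tool_call=ToolCall("web_search", {"query": last["content"]}))
    return ModelOutput(text="I can answer that without searching.")


def fake_web_search(query: str) -> str:
    """Hypothetical stand-in for the search provider (Tavily, in this notebook)."""
    return f"[search results for: {query}]"


def run_turn(
    user_prompt: str,
    model: Callable[[List[dict]], ModelOutput],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> Tuple[List[str], str]:
    """Alternate inference and tool execution until the model returns plain text."""
    messages: List[dict] = [{"role": "user", "content": user_prompt}]
    steps: List[str] = []
    for _ in range(max_steps):
        output = model(messages)
        steps.append("inference")
        if output.tool_call is None:
            # No tool requested: this is the final answer for the turn.
            return steps, output.text
        # Tool requested: execute it and feed the result back to the model.
        result = tools[output.tool_call.tool_name](output.tool_call.arguments["query"])
        steps.append("tool_execution")
        messages.append({"role": "tool", "content": result})
    raise RuntimeError("no final answer within max_steps")


steps, answer = run_turn("What is the latest in OpenShift?", fake_model, {"web_search": fake_web_search})
print(steps)
print(answer)
```

For a "latest" question the recorded steps are inference, tool execution, inference, the same shape as the InferenceStep, ToolExecutionStep, InferenceStep sequence in the agent output further down. A question that does not trigger a search finishes after a single inference step.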

In [9]:
agent = Agent(
    client, 
    model=model_id,
    instructions="""You are a helpful web search assistant. When you are asked about the latest information, you must use a tool.
            Whenever a tool is called, be sure to return the response in a friendly and helpful tone.
            """,
    tools=["builtin::websearch"],
    sampling_params=sampling_params
)

INFO:httpx:HTTP Request: POST http://llamastack:8321/v1/agents "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: GET http://llamastack:8321/v1/tools?toolgroup_id=builtin%3A%3Awebsearch "HTTP/1.1 200 OK"


## 3. Run the agent.
- Populate `user_prompts` with questions that you would like to ask the agent.
- Create a new agent session for each query so that the Llama Stack server can store its metadata and context history.
- Finally, display the agent's responses for each query.
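The loop in the next cell creates a fresh session for every prompt, so each query is answered independently. The toy store below is a hypothetical, in-memory illustration of why that matters: turns in the same session share history, while a new session starts empty. `ToySessionStore` is not part of `llama_stack_client`; in the notebook, the Llama Stack server keeps this state for you.

```python
import uuid
from typing import Dict, List


class ToySessionStore:
    """Hypothetical in-memory stand-in for server-side agent sessions."""

    def __init__(self) -> None:
        self._history: Dict[str, List[dict]] = {}

    def create_session(self, name: str) -> str:
        # Each session gets a unique id and an empty history; `name` is only a label here.
        session_id = f"{name}-{uuid.uuid4()}"
        self._history[session_id] = []
        return session_id

    def add_turn(self, session_id: str, prompt: str) -> List[dict]:
        """Record a user message and return the context the model would see for this turn."""
        history = self._history[session_id]  # raises KeyError for unknown sessions
        history.append({"role": "user", "content": prompt})
        return list(history)


store = ToySessionStore()
shared = store.create_session("web-session")
store.add_turn(shared, "What is the latest in OpenShift?")
follow_up = store.add_turn(shared, "How do I upgrade to it?")  # sees both messages

fresh = store.create_session("web-session")
isolated = store.add_turn(fresh, "How do I upgrade to it?")  # sees only this message
```

If you want follow-up questions that refer back to earlier answers, you would create the session once, before the loop, and reuse its `session_id` for every turn.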

In [10]:
user_prompts = [
    "What’s latest in OpenShift?",
]
for prompt in user_prompts:
    print("\n"+"="*50)
    cprint(f"Processing user query: {prompt}", "blue")
    print("="*50)
    session_id = agent.create_session("web-session")
    response = agent.create_turn(
        messages=[
            {
                "role": "user",
                "content": prompt,
            }
        ],
        session_id=session_id,
        stream=stream
    )
    if stream:
        for log in EventLogger().log(response):
            log.print()
    else:
        step_printer(response.steps) # print the steps of an agent's response in a formatted way. 

INFO:httpx:HTTP Request: POST http://llamastack:8321/v1/agents/3fc00b3b-9d83-4861-95b2-590b60faaf92/session "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://llamastack:8321/v1/agents/3fc00b3b-9d83-4861-95b2-590b60faaf92/session/9768fb31-1968-4010-888e-4fabe795620d/turn "HTTP/1.1 200 OK"



Processing user query: What’s latest in OpenShift?

---------- 📍 Step 1: InferenceStep ----------
🛠️ Tool call Generated:
Tool call: brave_search, Arguments: {'query': 'latest updates in OpenShift 2023 new features and releases'}

---------- 📍 Step 2: ToolExecutionStep ----------
🔧 Executing tool...



---------- 📍 Step 3: InferenceStep ----------
🤖 Model Response:
<think>
Okay, the user asked for the latest in OpenShift. I need to check the tool response to see what's new.

First, the Brave search results show a YouTube video from June 2023 about the OpenShift Roadmap Update, mentioning OpenShift 4.19 with key updates. That's probably the latest version. Then there's another video from January 2023 on OpenShift 4.12, but that's older. The IBM Cloud docs mention a release date of December 13, 2023, which aligns with OpenShift 4.19. The Whizlabs blog lists new features like Cluster Installation, KEDA, and Service Mesh. The Kasten release notes mention Azure Federated Identity support for OpenShift on Azure.

So, putting this together, the latest version is OpenShift 4.19 released in December 2023, with features from the roadmap update and the IBM Cloud info. The new features include Kubernetes-based auto-scaling (KEDA), OpenShift Service Mesh, and improved cluster installation. 

## Output Analysis
Here, we can observe that the `builtin::websearch` tool is used to perform a web search. The outputs are displayed in the notebook with color-coded text to help interpret the process:

- **Blue text**: the user's input query.
- **Magenta text**: the model's output from each inference step, including the generated tool call (the tool name and the query sent to the web search API) and the model's response.
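The colors come from ANSI escape sequences: helpers such as `termcolor.cprint` wrap text in codes like `\033[34m` (blue) and `\033[35m` (magenta), followed by the reset code `\033[0m`. The sketch below is a hypothetical, minimal version of a step printer in that style. It is not the notebook's `step_printer` from `src.utils`, and it uses plain dicts as stand-ins for the step objects that `llama_stack_client` actually returns.

```python
from typing import List

# ANSI SGR codes: 34 = blue, 35 = magenta, 0 = reset.
BLUE, MAGENTA, RESET = "\033[34m", "\033[35m", "\033[0m"


def colorize(text: str, color: str) -> str:
    return f"{color}{text}{RESET}"


def format_steps(query: str, steps: List[dict]) -> List[str]:
    """Hypothetical printer: a header per step, with the query and model output colored."""
    lines = [colorize(f"Processing user query: {query}", BLUE)]
    for i, step in enumerate(steps, start=1):
        lines.append(f"---------- Step {i}: {step['step_type']} ----------")
        if step.get("tool_call"):
            call = step["tool_call"]
            lines.append(colorize(f"Tool call: {call['tool_name']}, Arguments: {call['arguments']}", MAGENTA))
        elif step.get("response"):
            lines.append(colorize(step["response"], MAGENTA))
    return lines


# Plain-dict stand-ins for the three steps seen in the output above.
example_steps = [
    {"step_type": "InferenceStep",
     "tool_call": {"tool_name": "brave_search", "arguments": {"query": "latest updates in OpenShift"}}},
    {"step_type": "ToolExecutionStep"},
    {"step_type": "InferenceStep", "response": "(final answer text)"},
]
lines = format_steps("What's latest in OpenShift?", example_steps)
print("\n".join(lines))
```

When a notebook is exported to plain text, the invisible escape character is often lost, which is why stray fragments such as `[35m` can appear in copied output.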

Great! 
We can see that the model answered with information about OpenShift taken from the web search results rather than from its training data alone. Every model has a fixed knowledge cutoff, so answers about recent releases like this one depend on the agent's ability to call tools such as web search to retrieve current data.

## Key Takeaways

- We've demonstrated how to set up Llama Stack agents and extended them with builtin tools (like web search) that come prepackaged with Llama Stack.
- We've shown that this simple approach can significantly extend the functionality of existing open source LLMs.
- This serves as a foundational example for the more advanced examples to come, involving Agentic RAG, External Tools, and complex agentic patterns.

Continue to the [next notebook](./Level3_advanced_agent_with_Prompt_Chaining_and_ReAct.ipynb) to learn how we can upgrade our agents to solve even more complex and multi-step tasks using advanced agentic patterns. 


#### Any Feedback?

If you have any feedback on this or any other notebook in this demo series we'd love to hear it! Please go to https://www.feedback.redhat.com/jfe/form/SV_8pQsoy0U9Ccqsvk and help us improve our demos.