# 🛠️🦙 Build with Llama Stack and Haystack


This notebook demonstrates how to use the `LlamaStackChatGenerator` component with tools to enable function calling capabilities. We'll create a simple weather tool that the model can call to provide dynamic, up-to-date information.


## Installation

In [None]:
%%bash

pip install llama-stack-haystack

## Setup

Before running this example, you need to:

1. Set up a Llama Stack Server backed by an inference provider (for example, Ollama)
2. Have a model available (e.g., `llama3.2:3b`)

For a quick start on how to set up the server with Ollama, see the [Llama Stack documentation](https://llama-stack.readthedocs.io/en/latest/getting_started/index.html).

Once you have the server running, it will typically be available at `http://localhost:8321/v1/openai/v1`.
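Before moving on, it can help to confirm that the server is actually reachable. The following is a minimal sketch using only the standard library. It assumes the OpenAI-compatible base URL exposes a `/models` listing route, which is common for OpenAI-style servers but is an assumption here, and `models_url` and `server_is_reachable` are hypothetical helper names rather than part of Haystack or Llama Stack.

```python
import urllib.error
import urllib.request

# Base URL from the text above; adjust it if your server runs elsewhere.
BASE_URL = "http://localhost:8321/v1/openai/v1"


def models_url(base_url: str) -> str:
    """Build the model-listing URL (assumes an OpenAI-style `/models` route)."""
    return base_url.rstrip("/") + "/models"


def server_is_reachable(base_url: str, timeout: float = 3.0) -> bool:
    """Return True if the server answers with a 2xx status, False otherwise."""
    try:
        with urllib.request.urlopen(models_url(base_url), timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error status, or malformed URL.
        return False
```

Calling `server_is_reachable(BASE_URL)` returns `False` when nothing is listening on that address, which is a cheap check to run before creating the generator.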


## Importing Components

In [2]:
from haystack.components.tools import ToolInvoker
from haystack.dataclasses import ChatMessage
from haystack.tools import Tool
from haystack_integrations.components.generators.llama_stack import LlamaStackChatGenerator


## Defining a Tool

Tools in Haystack allow models to call functions to get real-time information or perform actions. Let's create a simple weather tool that the model can use to provide weather information.


In [3]:
# Define a tool that models can call
def weather(city: str):
    """Return mock weather info for the given city."""
    return f"The weather in {city} is sunny and 32°C"

# Define the tool parameters schema
tool_parameters = {
    "type": "object", 
    "properties": {
        "city": {"type": "string"}
    }, 
    "required": ["city"]
}

# Create the weather tool
weather_tool = Tool(
    name="weather",
    description="Useful for getting the weather in a specific city",
    parameters=tool_parameters,
    function=weather,
)
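The parameters schema describes the arguments the model is expected to send with a tool call. As a rough illustration of what that contract means, the sketch below checks an arguments dict against the schema before calling the function. It uses only the standard library, `check_arguments` is a hypothetical helper (not a Haystack API), and it covers only required keys and string-typed properties, far less than a full JSON Schema validator.

```python
def weather(city: str) -> str:
    """Return mock weather info for the given city (same function as above)."""
    return f"The weather in {city} is sunny and 32°C"


tool_parameters = {
    "type": "object",
    "properties": {"city": {"type": "string"}},
    "required": ["city"],
}


def check_arguments(schema: dict, arguments: dict) -> list[str]:
    """Hypothetical helper: list the problems found in `arguments` (empty if none)."""
    problems = []
    for name in schema.get("required", []):
        if name not in arguments:
            problems.append(f"missing required argument: {name}")
    for name, value in arguments.items():
        spec = schema.get("properties", {}).get(name)
        if spec is None:
            # Rejected here because weather() would not accept the extra keyword.
            problems.append(f"unexpected argument: {name}")
        elif spec.get("type") == "string" and not isinstance(value, str):
            problems.append(f"argument {name} should be a string")
    return problems


arguments = {"city": "Tokyo"}  # shaped like the arguments of a model tool call
if not check_arguments(tool_parameters, arguments):
    print(weather(**arguments))
```

A clear, specific `description` and a tight schema make it more likely that the model sends arguments that pass a check like this one.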


## Setting Up Components

Now let's create the `ToolInvoker` and `LlamaStackChatGenerator` components.


In [12]:
# Create a tool invoker with the weather tool
tool_invoker = ToolInvoker(tools=[weather_tool])

# Create the LlamaStackChatGenerator
chat_generator = LlamaStackChatGenerator(
    # The model name depends on the inference provider behind the Llama Stack Server
    model="ollama/llama3.2:3b",
    api_base_url="http://localhost:8321/v1/openai/v1",
)


## Using Tools with the Chat Generator

Now let's ask `LlamaStackChatGenerator` a question about the weather and give it access to the tool.


In [None]:
# Create a message asking about the weather
messages = [ChatMessage.from_user("What's the weather in Tokyo?")]

# Generate a response from the model with access to tools
response = chat_generator.run(messages=messages, tools=[weather_tool])["replies"]

print(f"Assistant message: {response[0]}")
print(f"Tool calls: {response[0].tool_calls}")


Assistant message: ChatMessage(_role=<ChatRole.ASSISTANT: 'assistant'>, _content=[TextContent(text=''), ToolCall(tool_name='weather', arguments={'city': 'Tokyo'}, id='call_52l1vdot')], _name=None, _meta={'model': 'llama3.2:3b', 'index': 0, 'finish_reason': 'tool_calls', 'usage': {'completion_tokens': 17, 'prompt_tokens': 162, 'total_tokens': 179, 'completion_tokens_details': None, 'prompt_tokens_details': None}})
Tool calls: [ToolCall(tool_name='weather', arguments={'city': 'Tokyo'}, id='call_52l1vdot')]


In [13]:
# If the assistant message contains a tool call, run the tool invoker
if response[0].tool_calls:
    print("\n🛠️ Executing tool call...")
    tool_messages = tool_invoker.run(messages=response)["tool_messages"]
    print(f"Tool results: {tool_messages}")
    
    # Continue the conversation with the tool results
    all_messages = messages + response + tool_messages
    final_response = chat_generator.run(messages=all_messages, tools=[weather_tool])["replies"]
    print(f"\nFinal response: {final_response[0].text}")
else:
    print("No tool calls were made.")



🛠️ Executing tool call...
Tool results: [ChatMessage(_role=<ChatRole.TOOL: 'tool'>, _content=[ToolCallResult(result='The weather in Tokyo is sunny and 32°C', origin=ToolCall(tool_name='weather', arguments={'city': 'Tokyo'}, id='call_52l1vdot'), error=False)], _name=None, _meta={})]

Final response: According to the weather data, the temperature in Tokyo is currently 32°C (0°F), with lots of sunshine. It's a beautiful day to explore the city! Would you like more information on the current weather conditions or forecasts for Tokyo?
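Conceptually, the tool-invocation step maps a tool call (a name plus arguments) to a Python function and wraps the result so it can be sent back to the model. The sketch below illustrates that idea with plain dicts and the standard library only. It is not how `ToolInvoker` is implemented, and the `TOOLS` registry, the `run_tool_call` helper, and the record layout are all hypothetical.

```python
def weather(city: str) -> str:
    """Return mock weather info for the given city (same function as above)."""
    return f"The weather in {city} is sunny and 32°C"


# Hypothetical registry mapping tool names to Python callables.
TOOLS = {"weather": weather}


def run_tool_call(tool_call: dict) -> dict:
    """Look up the requested tool, call it, and return a result record."""
    name = tool_call["tool_name"]
    function = TOOLS.get(name)
    if function is None:
        return {"tool_name": name, "result": f"unknown tool: {name}", "error": True}
    try:
        result = function(**tool_call["arguments"])
    except TypeError as exc:
        # Raised when the arguments do not match the function signature.
        return {"tool_name": name, "result": str(exc), "error": True}
    return {"tool_name": name, "result": result, "error": False}


# A plain-dict stand-in for the ToolCall shown in the output above.
call = {"tool_name": "weather", "arguments": {"city": "Tokyo"}}
print(run_tool_call(call))
```

Returning an error record instead of raising lets the conversation continue, so the model has a chance to correct a bad call on the next turn.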


Now let's create a simple loop for chatting with `LlamaStackChatGenerator`.

In [15]:
messages = []

while True:
    msg = input("Enter your message or Q to exit\n🧑 ")
    if msg == "Q":
        break
    # Append the user turn, then send the whole history so the model keeps context
    messages.append(ChatMessage.from_user(msg))
    response = chat_generator.run(messages=messages)
    assistant_resp = response["replies"][0]
    print(f"🤖 {assistant_resp.text}")
    # Store the reply so later turns can refer back to it
    messages.append(assistant_resp)

🤖 The main character in The Witcher series, also known as the eponymous figure, is Geralt of Rivia, a monster hunter with supernatural abilities and mutations that allow him to control the elements. He was created by Polish author_and_polish_video_game_development_company_(CD Projekt).
🤖 One of the most fascinating aspects of dolphin behavior is their ability to produce complex, context-dependent vocalizations that are unique to each individual, similar to human language. They also exhibit advanced social behaviors, such as cooperation, empathy, and self-awareness.


If you want to switch your model provider, you can reuse the same `LlamaStackChatGenerator` code with different providers. Simply run the desired inference provider on the Llama Stack Server and update the model name during the initialization of `LlamaStackChatGenerator`.

For more details on available inference providers, see the [Llama Stack docs](https://llama-stack.readthedocs.io/en/latest/providers/inference/index.html).
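Since only the model name (and possibly the server URL) changes between providers, one option is to keep both values out of the notebook code. The sketch below reads them from environment variables with the defaults used in this notebook. The variable names `LLAMA_STACK_MODEL` and `LLAMA_STACK_BASE_URL` and the `generator_settings` helper are hypothetical, chosen for illustration; neither Haystack nor Llama Stack is assumed to read them.

```python
import os

# Defaults taken from this notebook.
DEFAULT_MODEL = "ollama/llama3.2:3b"
DEFAULT_BASE_URL = "http://localhost:8321/v1/openai/v1"


def generator_settings(env=None) -> dict:
    """Collect keyword arguments for LlamaStackChatGenerator from a mapping.

    `env` defaults to os.environ; the variable names are hypothetical.
    """
    env = os.environ if env is None else env
    return {
        "model": env.get("LLAMA_STACK_MODEL", DEFAULT_MODEL),
        "api_base_url": env.get("LLAMA_STACK_BASE_URL", DEFAULT_BASE_URL),
    }


settings = generator_settings()
print(settings)
# With the integration installed, the generator could then be created as:
# chat_generator = LlamaStackChatGenerator(**settings)
```

Switching providers then means exporting a different model name before starting the notebook, with no change to the cells themselves.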