# Llama Stack Quick Start Demo

This notebook demonstrates how to use Llama Stack to run an agent with tools.

## 1. Prepare Llama Stack Server (Prerequisites)

Before running this notebook, you need to deploy and start the Llama Stack Server.

### Install Llama Stack and Dependencies

This section demonstrates how to start a local Llama Stack Server. Alternatively, you can deploy a server in a Kubernetes cluster by following the instructions in the [Deploy Llama Stack Server via Operator](../en/solutions/How_to_Create_an_AI_Agent_with_LlamaStack.md#deploy-llama-stack-server-via-operator) section.

If you have not installed Llama Stack yet, install it together with the required provider package:

```bash
pip install llama-stack sqlite_vec
```

The `llama-stack` package will automatically install its core dependencies. Since the configuration file uses the `sqlite-vec` provider for vector storage, you also need to install the `sqlite_vec` package.
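
To confirm that both packages landed in the active environment, a small import check can help. The following is a minimal sketch, assuming the import names are `llama_stack` and `sqlite_vec`. The helper `missing_modules` is a hypothetical name introduced here for illustration.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


# Assumed import names for the two packages installed above.
missing = missing_modules(["llama_stack", "sqlite_vec"])
print("Missing modules:", ", ".join(missing) if missing else "none")
```

If the list is not empty, re-run the `pip install` command in the same environment that will start the server.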

### Start the Local Server

Set the required environment variable for the API key (used by the DeepSeek provider in the config):

```bash
export API_KEY=your-deepseek-api-key
```

**Note:** Replace `your-deepseek-api-key` with your actual DeepSeek API key.

Run the following command in your terminal to start the server:

```bash
llama stack run llama_stack_config.yaml --port 8321
```

**Note:** The server must be running before you can connect to it from this notebook.
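
One way to verify that the server is reachable before continuing is to probe it over HTTP. The sketch below uses only the standard library and assumes the server exposes a health endpoint at `/v1/health`. That path and the helper name `is_server_healthy` are assumptions made for illustration, so adjust them to match your deployment.

```python
import urllib.request


def is_server_healthy(base_url="http://localhost:8321", path="/v1/health", timeout=3.0):
    """Return True if a GET on base_url + path answers with HTTP 200.

    The default path is an assumption; connection errors, timeouts,
    HTTP error statuses, and malformed URLs all count as "not healthy".
    """
    url = base_url.rstrip("/") + path
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, ValueError):
        # URLError and HTTPError are subclasses of OSError;
        # ValueError covers URLs without a recognizable scheme.
        return False
```

Calling `is_server_healthy()` after `llama stack run` has finished starting should return `True`. A `False` result usually means the server is still starting, is listening on a different port, or is not running.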

## 2. Install Dependencies

**Note:** `llama-stack-client` requires Python 3.12 or higher. If your Python version does not meet this requirement, refer to the FAQ section in the documentation: **How to prepare Python 3.12 in Notebook**.
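
The kernel's Python version can be checked from inside the notebook before installing anything. This is a minimal sketch. The helper `python_is_supported` and the constant `MIN_PYTHON` are hypothetical names, and the minimum of 3.12 is taken from the note above.

```python
import sys

MIN_PYTHON = (3, 12)  # minimum version stated in the note above


def python_is_supported(version=None, minimum=MIN_PYTHON):
    """Compare a (major, minor, ...) version tuple against the minimum."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= minimum


print(f"Running Python {sys.version_info.major}.{sys.version_info.minor}")
print("Supported:", python_is_supported())
```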

In [None]:
!pip install "llama-stack-client>=0.4" "requests" "fastapi" "uvicorn" --target ~/packages

## 3. Import Libraries

In [None]:
import sys
from pathlib import Path

user_site_packages = Path.home() / "packages"
if str(user_site_packages) not in sys.path:
    sys.path.insert(0, str(user_site_packages))

import os
import requests
from typing import Dict, Any
from llama_stack_client import LlamaStackClient, Agent
from llama_stack_client.lib.agents.client_tool import client_tool
from llama_stack_client.lib.agents.event_logger import AgentEventLogger

print('Libraries imported successfully')

## 4. Define a Tool

Use the `@client_tool` decorator to define a weather query tool.
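
The type hints and the `:param` lines in the docstring matter because a decorator of this kind can read them to describe the tool to the model. The sketch below is not the library's implementation. It only illustrates, with the standard `inspect` module, what metadata is available on a function written in that style. `describe_tool` and `echo_city` are hypothetical names.

```python
import inspect
import re
from typing import Any, Dict


def describe_tool(func):
    """Build a simple description from a function's signature and docstring."""
    signature = inspect.signature(func)
    doc = inspect.getdoc(func) or ""
    # Map each ":param name: text" line to its description.
    param_docs = dict(re.findall(r":param (\w+):\s*(.+)", doc))
    return {
        "name": func.__name__,
        "description": doc.split("\n", 1)[0],  # first docstring line
        "parameters": {
            name: {
                "type": getattr(param.annotation, "__name__", str(param.annotation)),
                "description": param_docs.get(name, ""),
            }
            for name, param in signature.parameters.items()
        },
    }


def echo_city(city: str) -> Dict[str, Any]:
    """Return the city name unchanged.

    :param city: City name, e.g., Paris
    :returns: Dictionary with the city name
    """
    return {"city": city}


spec = describe_tool(echo_city)
print(spec)
```

A function with missing type hints or undocumented parameters would produce a sparser description here, which is one reason to keep the docstring complete.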

In [None]:
@client_tool
def get_weather(city: str) -> Dict[str, Any]:
    """Get current weather information for a specified city.

    Uses the wttr.in free weather API to fetch weather data.

    :param city: City name, e.g., Beijing, Shanghai, Paris
    :returns: Dictionary containing weather information including city, temperature and humidity
    """
    try:
        url = f'https://wttr.in/{city}?format=j1'
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        data = response.json()

        current = data['current_condition'][0]
        return {
            'city': city,
            'temperature': f"{current['temp_C']}°C",
            'humidity': f"{current['humidity']}%",
        }
    except Exception as e:
        return {'error': f'Failed to get weather information: {str(e)}'}

print('Weather tool defined successfully')

## 5. Connect to Server and Create Agent

Use `LlamaStackClient` to connect to the running server, create an `Agent`, and execute tool calls.

In [None]:
base_url = os.getenv('LLAMA_STACK_URL', 'http://localhost:8321') # or change it to your server's URL
print(f'Connecting to Server: {base_url}')

client = LlamaStackClient(base_url=base_url)

# Get available models
print('Getting available models...')
try:
    models = client.models.list()
    if not models:
        raise Exception('No models found')

    print(f'Found {len(models)} available models:')
    for model in models[:5]:  # Show only first 5
        model_type = model.custom_metadata.get('model_type', 'unknown') if model.custom_metadata else 'unknown'
        print(f'  - {model.id} ({model_type})')

    # Select first LLM model
    llm_model = next(
        (m for m in models
            if m.custom_metadata and m.custom_metadata.get('model_type') == 'llm'),
        None
    )
    if not llm_model:
        raise Exception('No LLM model found')

    model_id = llm_model.id
    print(f'Using model: {model_id}\n')

except Exception as e:
    print(f'Failed to get model list: {e}')
    print('Make sure the server is running')
    raise


# Create Agent
print('Creating Agent...')
agent = Agent(
    client,
    model=model_id,
    instructions='You are a helpful weather assistant. When users ask about weather, use the get_weather tool to query weather information, then answer based on the query results.',
    tools=[get_weather],
)

print('Agent created successfully')

## 6. Run the Agent

In [None]:
# Create session
session_id = agent.create_session('weather-agent-session')
print(f'✓ Session created: {session_id}\n')

# First query
print('=' * 60)
print('User> What is the weather like in Beijing today?')
print('-' * 60)

response_stream = agent.create_turn(
    messages=[{'role': 'user', 'content': 'What is the weather like in Beijing today?'}],
    session_id=session_id,
    stream=True,
)

### Display the Result

In [None]:
logger = AgentEventLogger()
for printable in logger.log(response_stream):
    print(printable, end='', flush=True)
print('\n')

### Try Different Queries

In [None]:
# Second query
print('=' * 60)
print('User> What is the weather in Shanghai?')
print('-' * 60)

response_stream = agent.create_turn(
    messages=[{'role': 'user', 'content': 'What is the weather in Shanghai?'}],
    session_id=session_id,
    stream=True,
)

logger = AgentEventLogger()
for printable in logger.log(response_stream):
    print(printable, end='', flush=True)
print('\n')

## 7. FastAPI Service Example

You can also serve the agent from a FastAPI web service, which exposes its functionality through HTTP API endpoints.

In [None]:
# Import FastAPI components
from fastapi import FastAPI
from pydantic import BaseModel
from threading import Thread
import time

# Create a simple FastAPI app
api_app = FastAPI(title="Llama Stack Agent API")

class ChatRequest(BaseModel):
    message: str


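@api_app.post("/chat")
def chat(request: ChatRequest):
    """Chat endpoint that uses the Llama Stack Agent."""
    # Declared with plain `def` so FastAPI runs it in a worker thread;
    # the agent calls below are blocking and would stall the event loop
    # inside an `async def` handler.
    session_id = agent.create_session('fastapi-weather-session')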

    # Create turn and collect response
    response_stream = agent.create_turn(
        messages=[{'role': 'user', 'content': request.message}],
        session_id=session_id,
        stream=True,
    )

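    # Collect the full response. str() guards against the logger yielding
    # printable event objects rather than plain strings.
    full_response = ""
    logger = AgentEventLogger()
    for printable in logger.log(response_stream):
        full_response += str(printable)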

    return {"response": full_response}

print("FastAPI app created. Use the next cell to start the server.")

### Start the FastAPI Server

**Note:** In a notebook, you can start the server in a background thread. For production, run it as a separate process using `uvicorn`.
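
The cell below waits a fixed two seconds for the background server to come up. An alternative is to poll a readiness condition with a timeout. The following is a minimal sketch of that pattern. `wait_until` is a hypothetical helper, not part of `uvicorn`.

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.1):
    """Poll predicate() until it returns True or the timeout expires.

    Returns True as soon as the predicate holds, otherwise the result
    of one final check after the deadline.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return bool(predicate())
```

With the objects from the next cell, `wait_until(lambda: server.started)` could replace the fixed `time.sleep(2)` and would report `False` if the server never started.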

In [None]:
# Start server in background thread (for notebook demonstration)
from uvicorn import Config, Server

# Create a server instance that can be controlled
config = Config(api_app, host="127.0.0.1", port=8000, log_level="info")
server = Server(config)

def run_server():
    server.run()

# Use daemon=True so the thread stops automatically when the kernel restarts
# This is safe for notebook demonstrations
# For production, use process managers instead of threads
server_thread = Thread(target=run_server, daemon=True)
server_thread.start()

# Wait a moment for the server to start
time.sleep(2)
print("✓ FastAPI server started at http://127.0.0.1:8000")

### Test the API

Now you can call the API using HTTP requests:

In [None]:
# Test the API endpoint
response = requests.post(
    "http://127.0.0.1:8000/chat",
    json={"message": "What's the weather in Shanghai?"},
    timeout=60
)

print(f"Status Code: {response.status_code}")
print("Response:")
print(response.json().get('response'))

### Stop the Server

You can stop the server by setting its `should_exit` flag, which asks the server loop to shut down:
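
Setting a flag is a cooperative way to stop a thread: the loop notices the flag on its next iteration and returns on its own. The sketch below shows the pattern with a stand-in class, since it is meant to run without a real server. `FlagServer` is hypothetical and only mimics a loop that checks `should_exit`.

```python
import threading
import time


class FlagServer:
    """Hypothetical stand-in for a server whose loop checks should_exit."""

    def __init__(self):
        self.should_exit = False
        self.ticks = 0

    def run(self):
        # Keep working until another thread sets the flag.
        while not self.should_exit:
            self.ticks += 1
            time.sleep(0.01)


demo = FlagServer()
worker = threading.Thread(target=demo.run, daemon=True)
worker.start()

time.sleep(0.05)         # let the loop run a few iterations
demo.should_exit = True  # request shutdown
worker.join(timeout=2)   # wait for the loop to notice and return

stopped = not worker.is_alive()
print("Worker stopped:", stopped)
```

In this notebook, the same idea would be to call `server_thread.join(timeout=5)` after setting `server.should_exit = True` if you want to wait for the shutdown to finish.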

In [None]:
# Stop the server
if 'server' in globals() and server.started:
    server.should_exit = True
    print("✓ Server shutdown requested. It will stop after handling current requests.")
    print("  Note: The server will also stop automatically when you restart the kernel.")
else:
    print("Server is not running or has already stopped.")

## 8. More Resources

For more resources on developing AI Agents with Llama Stack, see:

### Official Documentation
- [Llama Stack Documentation](https://llamastack.github.io/docs) - The official Llama Stack documentation covering all usage-related topics, API providers, and core concepts.
- [Llama Stack Core Concepts](https://llamastack.github.io/docs/concepts) - Deep dive into Llama Stack architecture, API stability, and resource management.

### Code Examples and Projects
- [Llama Stack GitHub Repository](https://github.com/llamastack/llama-stack) - Source code, example applications, distribution configurations, and how to add new API providers.
- [Llama Stack Example Apps](https://github.com/llamastack/llama-stack-apps/) - Official examples demonstrating how to use Llama Stack in various scenarios.

### Community and Support
- [Llama Stack GitHub Issues](https://github.com/llamastack/llama-stack/issues) - Report bugs, ask questions, and contribute to the project.
