# Some ways to use the Responses API in Llama Stack

The development of this notebook was assisted by Google Gemini and Cursor using Claude 4 Sonnet.

Before getting started, follow the following steps.

First install Llama Stack and all of the other dependencies for this notebook.
One way to do that is:

- First install Python 3.12 or later (do not try this with older versions of Python: it will not work).
- Then make a Python virtual environment.
- Then within that virtual environment run:

```
pip install -r requirements.txt
```

Once everything is installed, run the Llama Stack server:

```
llama stack run run.yaml --image-type venv --port 8321
```

Also run the National Parks Service Model Context Protocol (MCP) server:

```
python nps_mcp_server.py --transport sse --port 3005
```

See [README_NPS](../README_NPS.md) for more information about this server.

## Configuration

Here we point to the locations of the servers we just started up above.  Also, we provide the model ID for the model we want to use.  The model ID should be one that that is specified in [run.yaml](./run.yaml).  In the [run.yaml](./run.yaml) included here, we have the following models defined:

- `openai/gpt-3.5-turbo` and `openai/gpt4o` are models from OpenAI.  They will only work if you have OPENAI_API_KEY set in your environment to a [valid OpenAI API key](https://help.openai.com/en/articles/4936850-where-do-i-find-my-openai-api-key).
- `llama-openai-compat/Llama-3.3-70B-Instruct` is a a model from Meta Llama.  This will only work if you have LLAMA_API_KEY set in your environment to a valid API key for the hosted [Llama API](https://www.llama.com/products/llama-api/).
- `watsonx/Llama-3.3-70B-Instruct` is the same model running on watsonx.ai (which has a somewhat different style for model IDs).  This will only work if you have a WATSONX_API_KEY and WATSONX_PROJECT_ID set (which requires an IBM Cloud account).  You may also need to set WATSONX_BASE_URL set if your watsonx.ai instance is running anywhere other than US South (which is the default).  Note that the watsonx provider in Llama Stack was [not working](https://github.com/llamastack/llama-stack/issues/3165) when this notebook was created, but hopefully it will work by the time you read this.

If you can't or don't want to get any of those API keys, you can update [run.yaml](./run.yaml) to use [another inference provider](https://llama-stack.readthedocs.io/en/latest/providers/inference/index.html#overview).  Llama Stack includes numerous providers for calling hosted models like the ones above.  It also includes providers to call models that you deploy and run yourself using a model serving capability, e.g., the [vLLM provider](https://llama-stack.readthedocs.io/en/latest/providers/inference/remote_vllm.html) or the [ollama provider](https://llama-stack.readthedocs.io/en/latest/providers/inference/remote_ollama.html).

Debugging Tip: If you get a connection error, make sure the port used in the definition of LLAMA_STACK_URL above, is the same port shown in the output of the `llama stack run` command in your terminal window.

In [1]:
LLAMA_STACK_URL = "http://localhost:8321/"
NPS_MCP_URL = "http://localhost:3005/sse/"
LLAMA_STACK_MODEL_IDS = [
    "openai/gpt-3.5-turbo",
    "openai/gpt-4o",
    "llama-openai-compat/Llama-3.3-70B-Instruct",
    "watsonx-Llama-3.3-70B-Instruct"
]

# Using gpt-4o for this demo, but feel free to try one of the others or add more to run.yaml.
LLAMA_STACK_MODEL_ID = LLAMA_STACK_MODEL_IDS[1]

## Using the Llama Stack client

The most obvious way to use the Responses API in Llama Stack is via the Llama Stack client.  That's the way that is most seamlessly integrated with Llama Stack itself, so we would recommend it for most beginning users who don't already have a commitment to another client library.  Here are some examples using this approach.

In [2]:
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url=LLAMA_STACK_URL)

In [3]:
client.chat.completions.create(
    model=LLAMA_STACK_MODEL_ID,
    messages=[{"role": "user", "content": "What is the capital of France?"}]
)

INFO:httpx:HTTP Request: POST http://localhost:8321/v1/openai/v1/chat/completions "HTTP/1.1 200 OK"


OpenAIChatCompletion(id='chatcmpl-CMLJhAtAGksNpIMMBISKQ90crFP1P', choices=[OpenAIChatCompletionChoice(finish_reason='stop', index=0, message=OpenAIChatCompletionChoiceMessageOpenAIAssistantMessageParam(role='assistant', content='The capital of France is Paris.', name=None, tool_calls=None, refusal=None, annotations=[], audio=None, function_call=None), logprobs=None)], created=1759441193, model='gpt-4o-2024-08-06', object='chat.completion', service_tier='default', system_fingerprint='fp_f33640a400', usage={'completion_tokens': 7, 'prompt_tokens': 14, 'total_tokens': 21, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, metrics=[{'trace_id': 'd3422a86f79d36ad25298cb79dd6d52b', 'span_id': 'b433b6a19a608413', 'timestamp': '2025-10-02T21:39:53.647856Z', 'attributes': {'model_id': 'openai/gpt-4o', 'provider_id': 'openai'}, 'type': 'metric

You should see `content='The capital of France is Paris.'` in the `OpenAIChatCompletion` printed by the previous cell.

### Plain model inference

We will start with a simple but powerful use case in which the generative AI model provides a response directly without needing extra content or tools:

In [4]:
simple_llama_stack_client_response = client.responses.create(
    model=LLAMA_STACK_MODEL_ID,
    input="What is the capital of France?"
)

simple_llama_stack_client_response

INFO:httpx:HTTP Request: POST http://localhost:8321/v1/openai/v1/responses "HTTP/1.1 200 OK"


ResponseObject(id='resp-a1457dab-b966-43a9-ae61-827844800b0d', created_at=1759441193, model='openai/gpt-4o', object='response', output=[OutputOpenAIResponseMessage(content=[OutputOpenAIResponseMessageContentUnionMember2(annotations=[], text='The capital of France is Paris.', type='output_text')], role='assistant', type='message', id='msg_e553afaa-53e0-4219-845e-4830f289812e', status='completed')], parallel_tool_calls=False, status='completed', text=Text(format=TextFormat(type='text', description=None, name=None, schema_=None, strict=None)), error=None, previous_response_id=None, temperature=None, top_p=None, truncation=None, user=None)

The response object is a little complex to read, so we provide a function to print it out in a more readable format:

In [5]:
def print_simple_response(response):
    print(f"ID: {response.id}")
    print(f"Status: {response.status}")
    print(f"Model: {response.model}")
    print(f"Created at: {response.created_at}")
    print(f"Output type: {response.output[0].type}")
    print(f"Response content: {response.output[0].content[0].text}")

In [6]:
print_simple_response(simple_llama_stack_client_response)

ID: resp-a1457dab-b966-43a9-ae61-827844800b0d
Status: completed
Model: openai/gpt-4o
Created at: 1759441193
Output type: message
Response content: The capital of France is Paris.


### Retrieval-Augmented Generation

Next we will expand out to enabling the model to use context from a document to complement its own internal knowledge.

To keep the example simple, we will just get one document, specifically a PDF file describing the US National Parks.

In [7]:
import requests
from pathlib import Path

# Download a sample PDF for demonstration
def download_sample_pdf(url: str, filename: str) -> str:
    """Download a PDF from URL and save it locally"""
    print(f"Downloading PDF from: {url}")
    response = requests.get(url)
    response.raise_for_status()
    
    filepath = Path(filename)
    with open(filepath, 'wb') as f:
        f.write(response.content)
    
    print(f"PDF saved as: {filepath}")
    return str(filepath)

pdf_url = "https://www.nps.gov/aboutus/upload/NPIndex2012-2016.pdf"
pdf_path = download_sample_pdf(pdf_url, "NPIndex2012-2016.pdf")
pdf_title = "The National Parks: Index 2012-2016"

Downloading PDF from: https://www.nps.gov/aboutus/upload/NPIndex2012-2016.pdf
PDF saved as: NPIndex2012-2016.pdf


Then we create a vector store and load the PDF file into that vector store.

In [8]:
import uuid

vector_store_name= f"vec_{str(uuid.uuid4())[0:8]}"

vector_store = client.vector_stores.create(name=vector_store_name)
vector_store_id = vector_store.id

vector_store_id

INFO:httpx:HTTP Request: POST http://localhost:8321/v1/openai/v1/vector_stores "HTTP/1.1 200 OK"


'vs_66b06035-9dc1-490b-b656-5a55e1ca9b2a'

In [9]:
file_create_response = client.files.create(file=Path(pdf_path), purpose="assistants")
file_create_response

INFO:httpx:HTTP Request: POST http://localhost:8321/v1/openai/v1/files "HTTP/1.1 200 OK"


File(id='file-27ca5281b1cb49e59811252ac08331e7', bytes=6540612, created_at=1759441196, expires_at=1790977196, filename='NPIndex2012-2016.pdf', object='file', purpose='assistants')

In [10]:
file_ingest_response = client.vector_stores.files.create(
    vector_store_id=vector_store_id,
    file_id=file_create_response.id,
)
file_ingest_response

INFO:httpx:HTTP Request: POST http://localhost:8321/v1/openai/v1/vector_stores/vs_66b06035-9dc1-490b-b656-5a55e1ca9b2a/files "HTTP/1.1 200 OK"


VectorStoreFile(id='file-27ca5281b1cb49e59811252ac08331e7', attributes={}, chunking_strategy=ChunkingStrategyVectorStoreChunkingStrategyAuto(type='auto'), created_at=1759441196, object='vector_store.file', status='completed', usage_bytes=0, vector_store_id='vs_66b06035-9dc1-490b-b656-5a55e1ca9b2a', last_error=None)

Now we can uses the Responses API with that vector store to get answers from that PDF file.

In [11]:
rag_llama_stack_client_response = client.responses.create(
    model=LLAMA_STACK_MODEL_ID,
    input="When did the Bering Land Bridge become a national preserve?",
    tools=[
        {
            "type": "file_search",
            "vector_store_ids": [vector_store_id],
        }
    ]
)

# Here and below, we've commented out the responses to avoid cluttering the notebook.
# You can uncomment them to see the response objects.

#rag_llama_stack_client_response

INFO:httpx:HTTP Request: POST http://localhost:8321/v1/openai/v1/responses "HTTP/1.1 200 OK"


This one is a little more complicated, so we will need a more powerful print method to see it all nicely formatted:

In [12]:
def print_rag_response(response):
    print(f"ID: {response.id}")
    print(f"Status: {response.status}")
    print(f"Model: {response.model}")
    print(f"Created at: {response.created_at}")
    print(f"Output items: {len(response.output)}")
    
    for i, output_item in enumerate(response.output):
        if len(response.output) > 1:
            print(f"\n--- Output Item {i+1} ---")
        print(f"Output type: {output_item.type}")
        
        if output_item.type in ("text", "message"):
            print(f"Response content: {output_item.content[0].text}")
        elif output_item.type == "file_search_call":
            print(f"  Tool Call ID: {output_item.id}")
            print(f"  Tool Status: {output_item.status}")
            # 'queries' is a list, so we join it for clean printing
            print(f"  Queries: {', '.join(output_item.queries)}")
            # Display results if they exist, otherwise note they are empty
            print(f"  Results: {output_item.results if output_item.results else 'None'}")
        else:
            print(f"Response content: {output_item.content}")

In [13]:
print_rag_response(rag_llama_stack_client_response)

ID: resp-10aab1cc-b48e-438a-8a99-b587055f3f6c
Status: completed
Model: openai/gpt-4o
Created at: 1759441227
Output items: 2

--- Output Item 1 ---
Output type: file_search_call
  Tool Call ID: call_jEknSMM7r1vHwEvMRUKMAgNh
  Tool Status: completed
  Queries: Bering Land Bridge national preserve designation date
  Results: [OutputOpenAIResponseOutputMessageFileSearchToolCallResult(attributes={}, file_id='', filename='', score=3.040066682370848, text=' for abundant wildlife and sport\n \nfishing for five species of salmon.\n \nEstablished Dec. 2, 1980. Length: 67 miles.\n \nAcreage‚Äî30,664.79 Federal: 26,417.85 \nNonfederal: 4,246.94. \nAniakchak  \nN\national Monument  and \nAniakchak  \nNational Preserve \n1000 Silver Street, Bldg.603 \nP\nO Box 245 \nKing Salmon, AK 99613 \n907\xad246\xad3305 \nwww.nps.gov/ania \nThe Aniakchak Caldera, covering some 30 square miles, is \no\nne of the great dry calderas in the world. Located in the \nvolcanically active Aleutian Mountains, the Aniakch

### MCP tool calling

Now we will move on to a more substantial example using the NPS MCP server you set up at the top of this notebook.

In [14]:
mcp_llama_stack_client_response = client.responses.create(
    model=LLAMA_STACK_MODEL_ID,
    input="Tell me about some parks in Rhode Island, and let me know if there are any upcoming events at them.",
    tools=[
        {
            "type": "mcp",
            "server_url": NPS_MCP_URL,
            "server_label": "National Parks Service tools",
        }
    ]
)

#mcp_llama_stack_client_response

INFO:httpx:HTTP Request: POST http://localhost:8321/v1/openai/v1/responses "HTTP/1.1 200 OK"


The output for this one is even more complicated, so we'll add some more clauses to the print method to come up with a version that handles both RAG and MCP tool calling:

In [15]:
def print_response(response):
    print(f"ID: {response.id}")
    print(f"Status: {response.status}")
    print(f"Model: {response.model}")
    print(f"Created at: {response.created_at}")
    print(f"Output items: {len(response.output)}")
    
    for i, output_item in enumerate(response.output):
        if len(response.output) > 1:
            print(f"\n--- Output Item {i+1} ---")
        print(f"Output type: {output_item.type}")
        
        if output_item.type in ("text", "message"):
            print(f"Response content: {output_item.content[0].text}")
        elif output_item.type == "file_search_call":
            print(f"  Tool Call ID: {output_item.id}")
            print(f"  Tool Status: {output_item.status}")
            # 'queries' is a list, so we join it for clean printing
            print(f"  Queries: {', '.join(output_item.queries)}")
            # Display results if they exist, otherwise note they are empty
            print(f"  Results: {output_item.results if output_item.results else 'None'}")
        elif output_item.type == "mcp_list_tools":
            print_mcp_list_tools(output_item)
        elif output_item.type == "mcp_call":
            print_mcp_call(output_item)
        else:
            print(f"Response content: {output_item.content}")

def print_mcp_call(mcp_call):
    """Print MCP call in a nicely formatted way"""
    print(f"\nüõ†Ô∏è  MCP Tool Call: {mcp_call.name}")
    print(f"   Server: {mcp_call.server_label}")
    print(f"   ID: {mcp_call.id}")
    print(f"   Arguments: {mcp_call.arguments}")
    
    if mcp_call.error:
        print("Error: {mcp_call.error}")
    elif mcp_call.output:
        print("Output:")
        # Try to format JSON output nicely
        try:
            import json
            parsed_output = json.loads(mcp_call.output)
            print(json.dumps(parsed_output, indent=4))
        except:
            # If not valid JSON, print as-is
            print(f"   {mcp_call.output}")
    else:
        print("   ‚è≥ No output yet")

def print_mcp_list_tools(mcp_list_tools):
    """Print MCP list tools in a nicely formatted way"""
    print(f"\nüîß MCP Server: {mcp_list_tools.server_label}")
    print(f"   ID: {mcp_list_tools.id}")
    print(f"   Available Tools: {len(mcp_list_tools.tools)}")
    print("=" * 80)
    
    for i, tool in enumerate(mcp_list_tools.tools, 1):
        print(f"\n{i}. {tool.name}")
        print(f"   Description: {tool.description}")
        
        # Parse and display input schema
        schema = tool.input_schema
        if schema and 'properties' in schema:
            properties = schema['properties']
            required = schema.get('required', [])
            
            print("   Parameters:")
            for param_name, param_info in properties.items():
                param_type = param_info.get('type', 'unknown')
                param_desc = param_info.get('description', 'No description')
                required_marker = " (required)" if param_name in required else " (optional)"
                print(f"     ‚Ä¢ {param_name} ({param_type}){required_marker}")
                if param_desc:
                    print(f"       {param_desc}")
        
        if i < len(mcp_list_tools.tools):
            print("-" * 40)

In [16]:
print_response(mcp_llama_stack_client_response)

ID: resp-b9b69b27-0c75-4c46-90dc-13675e78501e
Status: completed
Model: openai/gpt-4o
Created at: 1759441230
Output items: 3

--- Output Item 1 ---
Output type: mcp_list_tools

üîß MCP Server: National Parks Service tools
   ID: mcp_list_0c00d5f2-4822-4f4f-a228-26bf5e1c8e00
   Available Tools: 5

1. search_parks
   Description: Search for national parks by state, park code, or query string.

Args:
    state_code: Two-letter state code (e.g., 'CA', 'NY')
    park_code: Four-letter park code (e.g., 'yell', 'acad')
    query: Search query for park names or descriptions
    limit: Maximum number of results to return (default: 10)

Returns:
    JSON string with park information including name, description, website, and location
   Parameters:
     ‚Ä¢ state_code (string) (required)
     ‚Ä¢ park_code (string) (required)
     ‚Ä¢ query (string) (required)
     ‚Ä¢ limit (integer) (required)
----------------------------------------

2. get_park_alerts
   Description: Get current alerts for a 

You can see the full output from the call to the Responses API above asking "Tell me about some events at some parks in Rhode Island.".  Here are some of the highlights:

* Output Item 1 shows that it called the NPS server to list all of that server's tools.
* Output Item 2 shows that it then called the `search_parks` tool from that server with arguments `{"state_code":"RI","limit":5}` and got back a list of 4 national parks in Rhode Island.  Since the limit it specified was 5 and it only got 4, presumably that's all there are in that state.
* Output Item 3 shows that it called `get_park_events` with arguments `{"park_code": "blrv", "limit": 5}`.  Notice that the value for `park_code` is the same as the value for `code` in the first output from the `search_parks` tool.  So the model recognized that the `code` field in the output of `search_parks` corresponds to the `park_code` field in the input of `get_park_events`.  It gets back a list of events for that park.
* Output Items 4-6 also call `get_park_events` with the other three park codes that had been returned by the `search_parks` tool.
* Output Item 7 then provides the actual response to be sent to the user.  It describes the outputs of all four of the calls to `get_park_events` in a human-friendly form.

One important thing to note is that all 7 output items came from *one* call to the Responses API.  This illustrates one of the advantages of this API and its implementation in Llama Stack: Llama Stack handles all of the coordination between all of these steps and calls the model and to the MCP server.  You could accomplish the same thing using basic "completions" API that allow you to specify tools, but it would then be up to you in the client to do all that coordinating.

## Using the OpenAI client

In the examples above we used the Llama Stack client to connect to Llama Stack.  However, since we're calling an OpenAI-compatible API (Responses), we can also use the OpenAI client to do the same thing.  The setup is a little bit clunkier than it is using the Llama Stack client.  With the OpenAI client, you have to add some extra stuff at the end of the URL to point the client at the OpenAI-compatible part of the overall Llama Stack.  However, once you do that, you can do the same things with the OpenAI client as we did with the Llama Stack client above.

Using the OpenAI client may be the best fit if you are writing code for a project that already uses that client for other purposes and you don't want to add any more dependencies to that project.

We start with the slightly clunky setup:

In [17]:

from openai import OpenAI

# Direct OpenAI client instantiation with Llama Stack
openai_client = OpenAI(
    api_key="no-key-needed",  # Llama Stack typically doesn't require a real key
    base_url=LLAMA_STACK_URL + "v1/openai/v1", # This suffix gets to the part of the URL that is specific to the OpenAI API, which you need here since you are using the OpenAI client.
)

Notice that while this is the OpenAI client, the URL we pointed it to is our Llama Stack server.  So even though we're using a different client here, we're still using the same server to handle the request.  Here we see that the simple example we used above works the same way as it did with the Llama Stack client:

In [18]:

simple_openai_client_response = openai_client.responses.create(
    model=LLAMA_STACK_MODEL_ID,
    input="What is the capital of France?"
)

print_response(simple_openai_client_response)

INFO:httpx:HTTP Request: POST http://localhost:8321/v1/openai/v1/responses "HTTP/1.1 200 OK"


ID: resp-0b7c5a98-6022-40c9-92a5-50c3a4b381a4
Status: completed
Model: openai/gpt-4o
Created at: 1759441234.0
Output items: 1
Output type: message
Response content: The capital of France is Paris.


The RAG example also works the same way it did with the Llama Stack client:

In [19]:
import uuid

openai_client_vector_store_name= f"vec_{str(uuid.uuid4())[0:8]}"

openai_client_vector_store = openai_client.vector_stores.create(name=vector_store_name)
openai_client_vector_store_id = vector_store.id

openai_client_vector_store_id

INFO:httpx:HTTP Request: POST http://localhost:8321/v1/openai/v1/vector_stores "HTTP/1.1 200 OK"


'vs_66b06035-9dc1-490b-b656-5a55e1ca9b2a'

In [20]:
openai_client_file_create_response = openai_client.files.create(file=Path(pdf_path), purpose="assistants")
#openai_client_file_create_response

INFO:httpx:HTTP Request: POST http://localhost:8321/v1/openai/v1/files "HTTP/1.1 200 OK"


In [21]:
openai_client_file_ingest_response = openai_client.vector_stores.files.create(
    vector_store_id=vector_store_id,
    file_id=file_create_response.id,
)
#openai_client_file_ingest_response

INFO:httpx:HTTP Request: POST http://localhost:8321/v1/openai/v1/vector_stores/vs_66b06035-9dc1-490b-b656-5a55e1ca9b2a/files "HTTP/1.1 200 OK"


In [22]:
rag_openai_client_response = openai_client.responses.create(
    model=LLAMA_STACK_MODEL_ID,
    input="When did the Bering Land Bridge become a national preserve?",
    tools=[
        {
            "type": "file_search",
            "vector_store_ids": [vector_store_id],
        }
    ]
)

print_response(rag_openai_client_response)

INFO:httpx:HTTP Request: POST http://localhost:8321/v1/openai/v1/responses "HTTP/1.1 200 OK"


ID: resp-45551697-5f5b-4ef5-85e3-d2b66e46547b
Status: completed
Model: openai/gpt-4o
Created at: 1759441261.0
Output items: 2

--- Output Item 1 ---
Output type: file_search_call
  Tool Call ID: call_zQJRvx4sCE8uqHGwAqGErewO
  Tool Status: completed
  Queries: Bering Land Bridge national preserve establishment date
  Results: [Result(attributes={}, file_id='', filename='', score=3.2072613128248655, text=' for abundant wildlife and sport\n \nfishing for five species of salmon.\n \nEstablished Dec. 2, 1980. Length: 67 miles.\n \nAcreage‚Äî30,664.79 Federal: 26,417.85 \nNonfederal: 4,246.94. \nAniakchak  \nN\national Monument  and \nAniakchak  \nNational Preserve \n1000 Silver Street, Bldg.603 \nP\nO Box 245 \nKing Salmon, AK 99613 \n907\xad246\xad3305 \nwww.nps.gov/ania \nThe Aniakchak Caldera, covering some 30 square miles, is \no\nne of the great dry calderas in the world. Located in the \nvolcanically active Aleutian Mountains, the Aniakchak last \nerupted in 1931. The crater includes

And similarly, the complex example using the NPS MCP server also works the same way as it did with the Llama Stack client:

In [23]:
complex_openai_client_response = openai_client.responses.create(
    model=LLAMA_STACK_MODEL_ID,
    input="Tell me about some parks in Rhode Island, and let me know if there are any upcoming events at them.",
    tools=[
        {
            "type": "mcp",
            "server_url": NPS_MCP_URL,
            "server_label": "National Parks Service tools",
        }
    ]
)

INFO:httpx:HTTP Request: POST http://localhost:8321/v1/openai/v1/responses "HTTP/1.1 200 OK"


Because both the OpenAI client and the Llama Stack client are wrapping the same underlying server API, the structures of their outputs wind up being the same too, so the complicated print_response method that prints out the response objects for the Llama Stack client also prints out the response objects for the OpenAI client.

In [24]:
print_response(complex_openai_client_response)

ID: resp-08b1b77a-ab7d-43de-865b-affb3f00c30e
Status: completed
Model: openai/gpt-4o
Created at: 1759441266.0
Output items: 3

--- Output Item 1 ---
Output type: mcp_list_tools

üîß MCP Server: National Parks Service tools
   ID: mcp_list_38dc8e2a-69bd-483d-aec0-19c17cdeb87d
   Available Tools: 5

1. search_parks
   Description: Search for national parks by state, park code, or query string.

Args:
    state_code: Two-letter state code (e.g., 'CA', 'NY')
    park_code: Four-letter park code (e.g., 'yell', 'acad')
    query: Search query for park names or descriptions
    limit: Maximum number of results to return (default: 10)

Returns:
    JSON string with park information including name, description, website, and location
   Parameters:
     ‚Ä¢ state_code (string) (required)
     ‚Ä¢ park_code (string) (required)
     ‚Ä¢ query (string) (required)
     ‚Ä¢ limit (integer) (required)
----------------------------------------

2. get_park_alerts
   Description: Get current alerts for 

As you can see, we get the same results here that we did with the Llama Stack client, as you would expect since we're calling the same Llama Stack server with the same arguments.

## LangChain

Both of the examples above are using clients that directly mirror the Responses API.  That means you have complete control over and visibility into exactly what's going into the API and exactly what's coming out of the API.  That can be useful and powerful, but it can also be a little complicated and tedious.  A lot of users prefer to use a framework that provides higher level abstractions.  Examples of such frameworks include LangChain, LangGraph, LlamaIndex, Haystack, and many more.

Here we configure LangChain with `use_responses_api` so it will use the same Responses API that we use above.  However, it hides a lot of the details of that API making the code simpler.

In [25]:
from langchain_openai import ChatOpenAI

# Pointing to your local Responses API server
llm = ChatOpenAI(
    model=LLAMA_STACK_MODEL_ID,
    base_url=LLAMA_STACK_URL + "v1/openai/v1", # This suffix gets to the part of the URL that is specific to the OpenAI API, which you need here since you are using the OpenAI client.,
    use_responses_api=True
)

In [26]:
import json

def simple_print(obj) -> None:
    if hasattr(obj, "__dict__"):
        print(json.dumps(obj.__dict__, indent=2, default=str))
    else:
        print(json.dumps(obj, indent=2, default=str))

In [27]:
langchain_simple_response = llm.invoke("What is the capital of France?")
simple_print(langchain_simple_response)

INFO:httpx:HTTP Request: POST http://localhost:8321/v1/openai/v1/responses "HTTP/1.1 200 OK"


{
  "content": [
    {
      "type": "text",
      "text": "The capital of France is Paris.",
      "annotations": []
    }
  ],
  "additional_kwargs": {},
  "response_metadata": {
    "id": "resp-0ebbecfb-1a28-462d-bd41-0c10c2af3bf4",
    "created_at": 1759441270.0,
    "model": "openai/gpt-4o",
    "object": "response",
    "status": "completed",
    "model_name": "openai/gpt-4o"
  },
  "type": "ai",
  "name": null,
  "id": "msg_a7f94b46-98b3-491c-b37b-f55b853018ff",
  "example": false,
  "tool_calls": [],
  "invalid_tool_calls": [],
  "usage_metadata": null
}


We can then use the vector store that we populated earlier to do basic RAG answering:

In [28]:
llm_with_rag = llm.bind_tools([
  {
    "type": "file_search",
    "vector_store_ids": [vector_store_id],
  },
])

langchain_rag_response = llm_with_rag.invoke("When did the Bering Land Bridge become a national preserve?")
simple_print(langchain_rag_response)


INFO:httpx:HTTP Request: POST http://localhost:8321/v1/openai/v1/responses "HTTP/1.1 200 OK"


{
  "content": [
    {
      "type": "text",
      "text": "The Bering Land Bridge National Preserve was established on December 2, 1980.",
      "annotations": []
    }
  ],
  "additional_kwargs": {
    "tool_outputs": [
      {
        "id": "call_w7YVfzllDlk3qvhjPZLpercH",
        "queries": [
          "Bering Land Bridge national preserve established date"
        ],
        "status": "completed",
        "type": "file_search_call",
        "results": [
          {
            "attributes": {},
            "file_id": "",
            "filename": "",
            "score": 3.2836690894105725,
            "text": " for abundant wildlife and sport\n \nfishing for five species of salmon.\n \nEstablished Dec. 2, 1980. Length: 67 miles.\n \nAcreage\u201430,664.79 Federal: 26,417.85 \nNonfederal: 4,246.94. \nAniakchak  \nN\national Monument  and \nAniakchak  \nNational Preserve \n1000 Silver Street, Bldg.603 \nP\nO Box 245 \nKing Salmon, AK 99613 \n907\u00ad246\u00ad3305 \nwww.nps.gov/ania 

Similarly, we can use the same MCP server and query as in the earlier examples:

In [29]:
llm_with_mcp = llm.bind_tools([
  {
    "type": "mcp",
    "server_label": "NPS",
    "server_url": NPS_MCP_URL,
    "require_approval": "never",
  },
])

langchain_mcp_response = llm_with_mcp.invoke("Tell me about some parks in Rhode Island, and let me know if there are any upcoming events at them.")
simple_print(langchain_mcp_response)

INFO:httpx:HTTP Request: POST http://localhost:8321/v1/openai/v1/responses "HTTP/1.1 200 OK"


{
  "content": [
    {
      "type": "text",
      "text": "It seems that I am currently unable to access the information about parks in Rhode Island due to a rate limit issue. However, I can provide general information. Rhode Island, though small in size, is home to several beautiful parks, including:\n\n1. **Roger Williams National Memorial**: This park commemorates the life of Roger Williams, the founder of Rhode Island and a pioneer of religious freedom. It's located in Providence and offers educational exhibits related to Williams' life and legacy.\n\n2. **Blackstone River Valley National Historical Park**: Shared with Massachusetts, this park celebrates the early American Industrial Revolution, focusing on the role of the Blackstone River Valley.\n\nWhile I can't retrieve specific upcoming events at the moment, these parks often host educational tours, historical reenactments, and seasonal events. You might want to visit their official websites for the most up-to-date event infor

There is a lot more that can be done with LangChain, but a deeper dive into LangChain is beyond the scope of this document.  The key point here is that if you want a higher level framework with more abstract concepts, LangChain can be a good fit for that purpose, and you can point it at your Llama Stack server and have it use the Responses API.