# Llama Stack API Testing

This notebook demonstrates how to connect to a Llama Stack server and perform basic operations using the **Llama Stack API**.

Let's start by querying the Llama Stack server to check that it is healthy and ready to accept requests. First, install the `llama-stack-client` package using `pip`:

In [1]:
%pip install llama-stack-client==0.3.0 rich

Collecting llama-stack-client==0.3.0
  Downloading llama_stack_client-0.3.0-py3-none-any.whl.metadata (18 kB)
Collecting rich
  Downloading rich-14.2.0-py3-none-any.whl.metadata (18 kB)
Collecting click (from llama-stack-client==0.3.0)
  Downloading click-8.3.1-py3-none-any.whl.metadata (2.6 kB)
Collecting distro<2,>=1.7.0 (from llama-stack-client==0.3.0)
  Downloading distro-1.9.0-py3-none-any.whl.metadata (6.8 kB)
Collecting fire (from llama-stack-client==0.3.0)
  Downloading fire-0.7.1-py3-none-any.whl.metadata (5.8 kB)
Collecting pandas (from llama-stack-client==0.3.0)
  Downloading pandas-2.3.3-cp312-cp312-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl.metadata (91 kB)
Collecting pyaml (from llama-stack-client==0.3.0)
  Downloading pyaml-25.7.0-py3-none-any.whl.metadata (12 kB)
Collecting pydantic<3,>=1.9.0 (from llama-stack-client==0.3.0)
  Downloading pydantic-2.12.5-py3-none-any.whl.metadata (90 kB)
Collecting termcolor (from llama-stack-client==0.3.0)
  Downloading termcolor-

Now import the `LlamaStackClient` class and set the base URL of the Llama Stack server. (You can get the service URL by running `oc get svc -n competitor-analysis`.)

In [2]:
from llama_stack_client import LlamaStackClient
import rich

# For access from Notebooks within the cluster
LS_URL = "http://llama-stack-dist-service:8321"

# For access from Notebooks external to the cluster, use the route URL instead (oc get route -n competitor-analysis)
#LS_URL = "https://llama-stack-ext-competitor-analysis.apps.ocp.sx7qw.sandbox2219.opentlc.com/"

# Initialize the client
client = LlamaStackClient(base_url=LS_URL)
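
The introduction mentions checking that the server is healthy, so here is a minimal sketch of such a check using only the standard library. It assumes the server exposes a `GET /v1/health` endpoint next to the `/v1/models` endpoint seen in the request logs; the path and the helper names `health_url` and `is_healthy` are assumptions for illustration, not part of the original notebook.

```python
import json
import urllib.request


def health_url(base_url: str) -> str:
    """Build the health endpoint URL, tolerating a trailing slash on the base URL."""
    # Assumed path; verify it against your Llama Stack server version.
    return base_url.rstrip("/") + "/v1/health"


def is_healthy(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if the health endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(health_url(base_url), timeout=timeout) as resp:
            # An empty body is treated as an empty JSON object.
            body = json.loads(resp.read().decode("utf-8") or "{}")
            print(f"Health response: {body}")
            return resp.status == 200
    except (OSError, ValueError) as exc:
        # OSError covers connection errors, HTTP errors, and timeouts;
        # ValueError covers a response body that is not valid JSON.
        print(f"Health check failed: {exc}")
        return False
```

Calling `is_healthy(LS_URL)` before running the remaining cells gives an early, readable failure if the service name or route is wrong.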

List the models available in this Llama Stack instance.

In [3]:
models = client.models.list()
rich.print(models)

INFO:httpx:HTTP Request: GET http://llama-stack-dist-service:8321/v1/models "HTTP/1.1 200 OK"


Now, list the providers available in this Llama Stack instance.

In [4]:
providers = client.providers.list()
rich.print(providers[:3])  # Print only the first 3 providers
remaining = max(len(providers) - 3, 0)  # Avoid a negative count when fewer than 3 providers exist
print(f"\n... and {remaining} more providers")

INFO:httpx:HTTP Request: GET http://llama-stack-dist-service:8321/v1/providers "HTTP/1.1 200 OK"



... and 15 more providers
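
Printing the first three providers gives only a partial view. As a rough sketch, the list can be grouped by the API each provider serves. This assumes each provider object carries `api` and `provider_id` attributes. The helper name `group_by_api` and the sample provider IDs are hypothetical stand-ins, so the block runs without a server.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_by_api(providers):
    """Map each API name to the sorted list of provider IDs that serve it."""
    grouped = defaultdict(list)
    for p in providers:
        api = getattr(p, "api", "unknown")
        grouped[api].append(getattr(p, "provider_id", "unknown"))
    return {api: sorted(ids) for api, ids in sorted(grouped.items())}


# Stand-in objects for illustration only; in the notebook, pass `providers` instead.
sample_providers = [
    SimpleNamespace(api="inference", provider_id="vllm-inference"),
    SimpleNamespace(api="vector_io", provider_id="milvus"),
    SimpleNamespace(api="inference", provider_id="sentence-transformers"),
]

for api, ids in group_by_api(sample_providers).items():
    print(f"{api}: {', '.join(ids)}")
```

With the real client, `group_by_api(providers)` would summarize all providers in a few lines instead of a long object dump.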


## Chat Completion (OpenAI Compatible API)

In the latest versions of Llama Stack, use `client.chat.completions.create()` instead of the deprecated `client.inference.chat_completion()`.

In [5]:
# Get the LLM model for inference
inference_model_id = next(m.identifier for m in models if m.model_type == "llm")
print(f"Using model: {inference_model_id}")

prompt = "What is the capital of Mongolia?"

# LlamaStack 0.3.0: Use OpenAI-compatible chat.completions.create() API
response = client.chat.completions.create(
    messages=[{"role": "user", "content": prompt}],
    model=inference_model_id  # Note: 'model' not 'model_id'
)
rich.print(response)

# Extract the answer
if response.choices and len(response.choices) > 0:
    answer = response.choices[0].message.content
    print(f"\n✅ Answer: {answer}")

Using model: vllm-inference/granite-3-3-8b-instruct


INFO:httpx:HTTP Request: POST http://llama-stack-dist-service:8321/v1/chat/completions "HTTP/1.1 200 OK"



✅ Answer: The capital of Mongolia is Ulaanbaatar, a city that serves as the political, economic, and cultural hub of the country. Ulaanbaatar is located in the north-central part of Mongolia and is the largest city in the nation, with a population of over 1.4 million people (as of 2021 estimates). It has a rich history, blending traditional Mongolian culture with modern urban developments. The city is home to essential landmarks, such as the Genghis Khan Statue Complex, the National Museum of Mongolia, and Sukhbaatar Square, which has historical significance as the site where Mongolia declared independence from China in 1921.

Ulaanbaatar is also known for experiencing extreme seasonal temperature variations, with long, cold winters and short, mild summers. The city faces environmental challenges related to air pollution, primarily due to the extensive use of coal for heating during winter. Nonetheless, Ulaanbaatar remains a vital center for commerce, education, and tourism in Mongo
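
The cell above selects the model with `next(...)`, which raises `StopIteration` if no LLM is registered. The following is a minimal sketch of a more forgiving selection, assuming model objects expose `identifier` and `model_type` as they do in the cell. The helper name `pick_llm` and the embedding model in the sample data are hypothetical.

```python
from types import SimpleNamespace


def pick_llm(models, preferred=None):
    """Return the preferred LLM identifier if registered, else the first LLM, else None."""
    llms = [m.identifier for m in models if getattr(m, "model_type", None) == "llm"]
    if not llms:
        return None
    return preferred if preferred in llms else llms[0]


# Stand-in objects for illustration only; in the notebook, pass `models` instead.
sample_models = [
    SimpleNamespace(identifier="example-embedding-model", model_type="embedding"),
    SimpleNamespace(identifier="vllm-inference/granite-3-3-8b-instruct", model_type="llm"),
]

model_id = pick_llm(sample_models)
if model_id is None:
    print("No LLM is registered on this server")
else:
    print(f"Using model: {model_id}")
```

Returning `None` lets the notebook print a clear message instead of failing with a bare `StopIteration` traceback.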

## List Vector Stores

Llama Stack integrates natively with vector databases, which the client exposes as `client.vector_stores`. In our case, we are using Milvus. The database was populated by the Kubeflow Pipelines (KFP) pipeline that you ran earlier.


In [6]:
# List all vector stores
vector_stores = list(client.vector_stores.list())

if not vector_stores:
    print("❌ No vector stores found!")
    print("Run the KFP pipeline to ingest documents first")
else:
    print(f"✅ Found {len(vector_stores)} vector store(s)\n")
    
    for vs in vector_stores:
        vs_id = getattr(vs, 'id', None)
        vs_name = getattr(vs, 'name', None)
        file_counts = getattr(vs, 'file_counts', None)
        
        print(f"Vector Store: {vs_name}")
        print(f"  ID: {vs_id}")
        if file_counts:
            print(f"  Files: {file_counts.total} total, {file_counts.completed} completed, {file_counts.failed} failed")
        print()
        
        # Keep a reference for later cells (if several stores exist, the last one listed is kept)
        target_vector_store = vs


INFO:httpx:HTTP Request: GET http://llama-stack-dist-service:8321/v1/vector_stores "HTTP/1.1 200 OK"


✅ Found 1 vector store(s)

Vector Store: competitor-docs
  ID: vs_a8fba93a-3efe-4c36-a43d-0cb25edca697
  Files: 11 total, 11 completed, 0 failed
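
The loop above keeps whichever vector store is listed last. When more than one store exists, selecting by name is more predictable. This is a small sketch under the assumption that each store exposes `name` and `id`, as read in the cell. The helper name `find_vector_store` and the second sample store are hypothetical.

```python
from types import SimpleNamespace


def find_vector_store(vector_stores, name):
    """Return the first vector store whose name matches, or None if there is none."""
    return next((vs for vs in vector_stores if getattr(vs, "name", None) == name), None)


# Stand-in objects for illustration only; in the notebook, pass `vector_stores` instead.
sample_stores = [
    SimpleNamespace(id="vs_a8fba93a-3efe-4c36-a43d-0cb25edca697", name="competitor-docs"),
    SimpleNamespace(id="vs_example", name="another-store"),
]

store = find_vector_store(sample_stores, "competitor-docs")
print(store.id if store else "Vector store not found")
```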



## RAG Query: Search Vector Store

Use `client.vector_stores.search()` to perform semantic search on the indexed documents. Llama Stack does a semantic similarity search in the background and fetches the relevant documents from the Milvus vector database.


In [7]:
# Define your search query
query_text = "What was the standalone Profit After Tax (PAT) for HDFC Bank in Q2 FY26?"

print(f"🔍 Query: {query_text}\n")

# Perform semantic search using the vector store API
try:
    search_response = client.vector_stores.search(
        vector_store_id=target_vector_store.id,
        query=query_text,
        max_num_results=5
    )
    
    # Display results
    if search_response.data and len(search_response.data) > 0:
        print(f"✅ Found {len(search_response.data)} results:\n")
        
        for i, item in enumerate(search_response.data):
            # Extract content
            content = ""
            if hasattr(item, 'content'):
                if isinstance(item.content, list):
                    content = " ".join([
                        c.text if hasattr(c, 'text') else str(c) 
                        for c in item.content
                    ])
                else:
                    content = str(item.content)
            
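            score = getattr(item, 'score', None)
            # Format the score only when it is numeric; a missing score is shown as N/A
            score_text = f"{score:.4f}" if isinstance(score, (int, float)) else "N/A"
            
            print(f"[{i+1}] Score: {score_text}")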
            print(f"    {content[:300]}...")
            print()
    else:
        print("❌ No results found for your query")
        
except Exception as e:
    print(f"❌ Search failed: {e}")
    import traceback
    traceback.print_exc()


INFO:httpx:HTTP Request: POST http://llama-stack-dist-service:8321/v1/vector_stores/vs_a8fba93a-3efe-4c36-a43d-0cb25edca697/search "HTTP/1.1 200 OK"


🔍 Query: What was the standalone Profit After Tax (PAT) for HDFC Bank in Q2 FY26?

✅ Found 5 results:

[1] Score: 0.9163
    ; down 22% YoY
- Net profit after tax of ₹ 1.8 bn compared to profit of ₹ 2.0 bn in the prior year
- EPS of ₹ 2.52
- Solvency Ratio at 210% as of September 30, 2025

## Subsidiaries - Q2FY26 update - HDFC Securities Ltd

- 94.11% stake held by the Bank as of September 30, 2025
- 7.4 million customer...

[2] Score: 0.9029
     - 400 013. CIN:  L65920MH1994PLC080618

<!-- image -->

## NEWS RELEASE

HDFC Bank Ltd. HDFC Bank House, Senapati Bapat Marg, Lower Parel, Mumbai - 400 013. CIN:  L65920MH1994PLC080618

herein below are in accordance with the accounting standards used in their standalone reporting under the applica...

[3] Score: 0.8999
     exchange &amp; derivatives revenue of ₹ 15.9 billion (₹ 14.6 billion in the corresponding quarter of the previous year), net trading and mark to market gain of ₹ 23.9 billion (₹ 2.9 billion in the corr
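
The content-extraction logic in the search cell can be factored into small helpers, which also makes it testable without a server. This sketch assumes each hit exposes `score` and `content` as in the cell, and that higher scores mean closer matches, as the descending order in the output suggests. The helper names `hit_text` and `filter_hits`, the threshold, and the sample hits are hypothetical.

```python
from types import SimpleNamespace


def hit_text(item):
    """Flatten a hit's content (a list of parts or a plain value) into one string."""
    content = getattr(item, "content", "")
    if isinstance(content, list):
        return " ".join(getattr(c, "text", str(c)) for c in content)
    return str(content)


def filter_hits(items, min_score=0.0):
    """Return (score, text) pairs with a numeric score >= min_score, best first."""
    pairs = [
        (item.score, hit_text(item))
        for item in items
        if isinstance(getattr(item, "score", None), (int, float)) and item.score >= min_score
    ]
    return sorted(pairs, key=lambda pair: pair[0], reverse=True)


# Stand-in objects for illustration only; in the notebook, pass `search_response.data`.
sample_hits = [
    SimpleNamespace(score=0.8999, content=[SimpleNamespace(text="exchange and derivatives revenue")]),
    SimpleNamespace(score=0.9163, content=[SimpleNamespace(text="Net profit"), SimpleNamespace(text="after tax")]),
    SimpleNamespace(score=None, content="hit without a score"),
]

for score, text in filter_hits(sample_hits, min_score=0.9):
    print(f"{score:.4f}  {text[:300]}")
```

A score threshold is only a coarse filter. In the output above, the top-ranked chunks do not obviously contain the requested PAT figure, so inspecting the text still matters.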

In the next notebook, you will implement a Retrieval-Augmented Generation (RAG) pipeline with Llama Stack.
