# Semantic Kernel Tool Use Example

This document provides an overview and explanation of the code used to create a Semantic Kernel-based tool that integrates with ChromaDB for Retrieval-Augmented Generation (RAG). The example demonstrates how to build an AI agent that retrieves travel documents from a ChromaDB collection, augments user queries with semantic search results, and streams detailed travel recommendations.

## Initializing the Environment

### SQLite Version Fix
If you encounter the error:
```
RuntimeError: Your system has an unsupported version of sqlite3. Chroma requires sqlite3 >= 3.35.0
```

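Uncomment this code block and run it at the start of your notebook, before `chromadb` is imported. The `__import__` call is needed so that `pysqlite3` is present in `sys.modules` before it is swapped in for `sqlite3`:

In [None]:
# %pip install pysqlite3-binary
# __import__('pysqlite3')
# import sys
# sys.modules['sqlite3'] = sys.modules.pop('pysqlite3')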

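
Before applying the workaround, it can help to confirm that the SQLite library linked into Python is actually older than the version named in the error. The following is a minimal sketch using only the standard library. The helper name `sqlite_needs_fix` is illustrative and not part of Chroma.

```python
import sqlite3

# Minimum SQLite version quoted in the Chroma error message above.
MIN_SQLITE = (3, 35, 0)

def sqlite_needs_fix(version_info=sqlite3.sqlite_version_info):
    """Return True when the linked SQLite library is older than MIN_SQLITE."""
    return tuple(version_info) < MIN_SQLITE

print(f"SQLite {sqlite3.sqlite_version}: fix needed = {sqlite_needs_fix()}")
```

If the check reports that no fix is needed, the commented-out block can stay commented out.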

### Importing Packages
The following code imports the necessary packages:

In [5]:
import json
import os
import chromadb
from typing import Annotated, TYPE_CHECKING
from dotenv import load_dotenv

from IPython.display import display, HTML

from azure.identity import DefaultAzureCredential

from semantic_kernel.agents import ChatCompletionAgent, ChatHistoryAgentThread
from semantic_kernel.connectors.ai.open_ai import AzureChatCompletion
from semantic_kernel.contents import FunctionCallContent, FunctionResultContent, StreamingTextContent
from semantic_kernel.functions import kernel_function

if TYPE_CHECKING:
    from chromadb.api.models.Collection import Collection

### Creating the Semantic Kernel and AI Service

An Azure OpenAI chat completion service is created from settings loaded with `load_dotenv()`. The service is later passed to the agent, which uses it to generate responses.

In [None]:
load_dotenv()

# Option 1: Using API Key (recommended for development)
chat_completion_service = AzureChatCompletion(
    deployment_name=os.environ.get("AZURE_OPENAI_DEPLOYMENT_NAME", "gpt-4o-mini"),
    endpoint=os.environ.get("AZURE_OPENAI_ENDPOINT"),
    api_version=os.environ.get("AZURE_OPENAI_API_VERSION", "2024-02-01"),
    api_key=os.environ.get("AZURE_OPENAI_API_KEY")
)

# Option 2: Using Azure AD Authentication (uncomment to use)
# Create Azure credential 
credential = DefaultAzureCredential()

# Create a token provider function
def get_azure_ad_token():
    """Function to get Azure AD token for OpenAI."""
    token = credential.get_token("https://cognitiveservices.azure.com/.default")
    return token.token

# Note: ad_token passes a token fetched once, so it is not refreshed when it expires
# chat_completion_service = AzureChatCompletion(
#     deployment_name=os.environ.get("AZURE_OPENAI_DEPLOYMENT_NAME", "gpt-4o-mini"),
#     endpoint=os.environ.get("AZURE_OPENAI_ENDPOINT"),
#     api_version=os.environ.get("AZURE_OPENAI_API_VERSION", "2024-02-01"),
#     ad_token=get_azure_ad_token()
# )
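
The API-key option reads the endpoint and key from the environment without a default, so a missing `.env` entry only shows up later as a connection or authentication error. The sketch below checks for them up front. The helper `missing_settings` and the `REQUIRED_SETTINGS` tuple are hypothetical names introduced here, not part of Semantic Kernel.

```python
import os

# Variables that the API-key option above reads without a fallback value.
REQUIRED_SETTINGS = ("AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_API_KEY")

def missing_settings(env=os.environ):
    """Return the names of required settings that are unset or empty."""
    return [name for name in REQUIRED_SETTINGS if not env.get(name)]

missing = missing_settings()
if missing:
    print("Missing settings:", ", ".join(missing))
```

Running this right after `load_dotenv()` gives a readable message before any service object is constructed.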

### Defining the Prompt Plugin

The PromptPlugin is a native plugin with two functions: one retrieves context from the ChromaDB collection, and the other builds an augmented prompt from the user's query and that context.

In [None]:
class PromptPlugin:
    """
    This plugin implements Retrieval-Augmented Generation (RAG) by combining:
    1. Information retrieval from ChromaDB vector database
    2. Prompt augmentation with retrieved context
    3. Generation using the augmented prompt
    """

    def __init__(self, collection: "Collection"):
        # Store reference to ChromaDB collection for semantic search
        self.collection = collection

    @kernel_function(
        name="build_augmented_prompt",
        description="Build an augmented prompt using retrieval context."
    )
    def build_augmented_prompt(self, query: str, retrieval_context: str) -> str:
        """
        RAG Step 2: AUGMENTATION
        Takes the user's original query and retrieved context, then creates
        a structured prompt that instructs the LLM to base its response on
        the provided context rather than just its training data.
        
        This is the "Augmented" part of RAG - we're augmenting the user's
        query with relevant retrieved information.
        """
        return (
            f"Retrieved Context:\n{retrieval_context}\n\n"
            f"User Query: {query}\n\n"
            "Based ONLY on the above context, please provide your answer."
        )
    
    @kernel_function(name="retrieve_context", description="Retrieve context from the database.")
    def get_retrieval_context(self, query: str) -> str:
        """
        RAG Step 1: RETRIEVAL
        This function performs semantic search in the ChromaDB vector database:
        
        1. Takes the user's query as input
        2. Uses ChromaDB's built-in embedding model to convert the query to vectors
        3. Searches for the most similar documents in the vector database
        4. Returns the top 2 most relevant documents with their metadata
        
        This is the "Retrieval" part of RAG - we're retrieving relevant
        information from our knowledge base before generating a response.
        """
        # Perform semantic search in ChromaDB
        # ChromaDB automatically converts the query text to embeddings and
        # finds the most semantically similar documents
        results = self.collection.query(
            query_texts=[query],              # Convert user query to vector embedding
            include=["documents", "metadatas"], # Return both text content and metadata
            n_results=2                       # Retrieve top 2 most similar documents
        )
        
        # Process and format the retrieved results
        context_entries = []
        if results and results.get("documents") and results["documents"][0]:
            # Combine each document with its metadata for richer context
            for doc, meta in zip(results["documents"][0], results["metadatas"][0]):
                context_entries.append(f"Document: {doc}\nMetadata: {meta}")
        
        # Return formatted context or fallback message
        # This retrieved context will be used in the augmentation step
        return "\n\n".join(context_entries) if context_entries else "No retrieval context found."

# RAG Process Summary:
# 1. RETRIEVAL: get_retrieval_context() searches ChromaDB for relevant documents
# 2. AUGMENTATION: build_augmented_prompt() combines query + retrieved context  
# 3. GENERATION: The LLM generates a response based on the augmented prompt
#
# This approach ensures the AI's responses are grounded in your specific
# knowledge base rather than just relying on its training data.
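
To see the retrieval and augmentation steps without a running database, the same logic can be exercised against a stub. This is a sketch under stated assumptions: `FakeCollection` is a hypothetical stand-in that only imitates the nested-list result shape the plugin reads, and `retrieve` and `augment` are undecorated copies of the plugin methods.

```python
class FakeCollection:
    """Hypothetical stand-in returning results in the shape the plugin reads."""

    def query(self, query_texts, include, n_results):
        docs = ["Contoso's travel insurance covers medical emergencies, trip cancellations, and lost baggage."]
        metas = [{"source": "training", "type": "explanation"}]
        # One inner list per query text, mirroring results["documents"][0] above.
        return {"documents": [docs[:n_results]], "metadatas": [metas[:n_results]]}

def retrieve(collection, query):
    """Same formatting logic as get_retrieval_context, without the decorator."""
    results = collection.query(
        query_texts=[query], include=["documents", "metadatas"], n_results=2
    )
    entries = [
        f"Document: {doc}\nMetadata: {meta}"
        for doc, meta in zip(results["documents"][0], results["metadatas"][0])
    ]
    return "\n\n".join(entries) if entries else "No retrieval context found."

def augment(query, retrieval_context):
    """Same template as build_augmented_prompt."""
    return (
        f"Retrieved Context:\n{retrieval_context}\n\n"
        f"User Query: {query}\n\n"
        "Based ONLY on the above context, please provide your answer."
    )

question = "What does the travel insurance cover?"
prompt = augment(question, retrieve(FakeCollection(), question))
print(prompt)
```

The printed prompt is the text the model would receive in the generation step.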

### Defining Weather Information Plugin

The WeatherInfoPlugin is a native plugin that provides temperature information for specific travel destinations.

In [8]:
class WeatherInfoPlugin:
    """A Plugin that provides the average temperature for a travel destination."""

    def __init__(self):
        # Dictionary of destinations and their average temperatures
        self.destination_temperatures = {
            "maldives": "82°F (28°C)",
            "swiss alps": "45°F (7°C)",
            "african safaris": "75°F (24°C)"
        }

    @kernel_function(description="Get the average temperature for a specific travel destination.")
    def get_destination_temperature(self, destination: str) -> Annotated[str, "Returns the average temperature for the destination."]:
        """Get the average temperature for a travel destination."""
        # Normalize the input destination (lowercase)
        normalized_destination = destination.lower()

        # Look up the temperature for the destination
        if normalized_destination in self.destination_temperatures:
            return f"The average temperature in {destination} is {self.destination_temperatures[normalized_destination]}."
        else:
            return f"Sorry, I don't have temperature information for {destination}. Available destinations are: Maldives, Swiss Alps, and African safaris."
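
The lookup above lowercases the input and does nothing else, so inputs such as "The Maldives" or " Swiss Alps " miss the dictionary. One possible normalization is sketched below as an assumption, not as part of the original plugin. The helpers `normalize_destination` and `lookup_temperature` are hypothetical.

```python
DESTINATION_TEMPERATURES = {
    "maldives": "82°F (28°C)",
    "swiss alps": "45°F (7°C)",
    "african safaris": "75°F (24°C)",
}

def normalize_destination(destination: str) -> str:
    """Lowercase, collapse whitespace, and drop a leading 'the '."""
    normalized = " ".join(destination.lower().split())
    return normalized.removeprefix("the ")  # requires Python 3.9+

def lookup_temperature(destination: str):
    """Return the stored temperature string, or None if the destination is unknown."""
    return DESTINATION_TEMPERATURES.get(normalize_destination(destination))

print(lookup_temperature("The Maldives"))
```

Whether this is worth adding depends on how consistently the model passes destination names to the function.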

### Defining Destinations Information Plugin

The DestinationsPlugin is a native plugin that provides detailed information about popular travel destinations.
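
The lookup in `get_destination_info` is a plain substring test on the lowercased query. The standalone sketch here mirrors that test on a reduced copy of the data to show what it matches. `DESTINATION_NAMES` and `match_destinations` are illustrative names, not part of the plugin.

```python
# Reduced copy of the DESTINATIONS keys and display names used by the plugin.
DESTINATION_NAMES = {
    "maldives": "The Maldives",
    "swiss alps": "The Swiss Alps",
    "safari": "African Safari",
    "bali": "Bali, Indonesia",
    "santorini": "Santorini, Greece",
}

def match_destinations(query: str) -> list[str]:
    """Return display names whose key or full name appears in the query."""
    query_lower = query.lower()
    return [
        name
        for key, name in DESTINATION_NAMES.items()
        if key in query_lower or name.lower() in query_lower
    ]

print(match_destinations("Should I pick Bali or Santorini in May?"))
print(match_destinations("Is Balikpapan worth visiting?"))
```

The second call also returns Bali, because "bali" is a substring of "balikpapan". Substring matching is simple but can produce false positives like this one.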

In [9]:
class DestinationsPlugin:
    # Destination data store with rich details about popular travel locations
    DESTINATIONS = {
        "maldives": {
            "name": "The Maldives",
            "description": "An archipelago of 26 atolls in the Indian Ocean, known for pristine beaches and overwater bungalows.",
            "best_time": "November to April (dry season)",
            "activities": ["Snorkeling", "Diving", "Island hopping", "Spa retreats", "Underwater dining"],
            "avg_cost": "$400-1200 per night for luxury resorts"
        },
        "swiss alps": {
            "name": "The Swiss Alps",
            "description": "Mountain range spanning across Switzerland with picturesque villages and world-class ski resorts.",
            "best_time": "December to March for skiing, June to September for hiking",
            "activities": ["Skiing", "Snowboarding", "Hiking", "Mountain biking", "Paragliding"],
            "avg_cost": "$250-500 per night for alpine accommodations"
        },
        "safari": {
            "name": "African Safari",
            "description": "Wildlife viewing experiences across various African countries including Kenya, Tanzania, and South Africa.",
            "best_time": "June to October (dry season) for optimal wildlife viewing",
            "activities": ["Game drives", "Walking safaris", "Hot air balloon rides", "Cultural village visits"],
            "avg_cost": "$400-800 per person per day for luxury safari packages"
        },
        "bali": {
            "name": "Bali, Indonesia",
            "description": "Island paradise known for lush rice terraces, beautiful temples, and vibrant culture.",
            "best_time": "April to October (dry season)",
            "activities": ["Surfing", "Temple visits", "Rice terrace trekking", "Yoga retreats", "Beach relaxation"],
            "avg_cost": "$100-500 per night depending on accommodation type"
        },
        "santorini": {
            "name": "Santorini, Greece",
            "description": "Stunning volcanic island with white-washed buildings and blue domes overlooking the Aegean Sea.",
            "best_time": "Late April to early November",
            "activities": ["Sunset watching in Oia", "Wine tasting", "Boat tours", "Beach hopping", "Ancient ruins exploration"],
            "avg_cost": "$200-600 per night for caldera view accommodations"
        }
    }

    @kernel_function(
        name="get_destination_info",
        description="Provides detailed information about specific travel destinations."
    )
    def get_destination_info(self, query: str) -> str:
        # Find which destination is being asked about
        query_lower = query.lower()
        matching_destinations = []

        for key, details in DestinationsPlugin.DESTINATIONS.items():
            if key in query_lower or details["name"].lower() in query_lower:
                matching_destinations.append(details)

        if not matching_destinations:
            return (f"User Query: {query}\n\n"
                    f"I couldn't find specific destination information in our database. "
                    f"Please use the general retrieval system for this query.")

        # Format destination information
        destination_info = "\n\n".join([
            f"Destination: {dest['name']}\n"
            f"Description: {dest['description']}\n"
            f"Best time to visit: {dest['best_time']}\n"
            f"Popular activities: {', '.join(dest['activities'])}\n"
            f"Average cost: {dest['avg_cost']}" for dest in matching_destinations
        ])

        return (f"Destination Information:\n{destination_info}\n\n"
                f"User Query: {query}\n\n"
                "Based on the above destination details, provide a helpful response "
                "that addresses the user's query about this location.")

## Setting Up ChromaDB

To facilitate Retrieval-Augmented Generation, a persistent ChromaDB client is instantiated and a collection named `"travel_documents"` is created (or retrieved if it exists). This collection is then populated with sample travel documents and metadata.

In [None]:
# STEP 1: Create or connect to a persistent ChromaDB vector database
# ChromaDB automatically handles:
# - Converting text documents to vector embeddings using a default embedding model
# - Storing these embeddings in a searchable vector index
# - Persisting the database to disk for reuse across sessions
collection = chromadb.PersistentClient(path="./chroma_db").create_collection(
    name="travel_documents",                    # Unique collection name for our travel knowledge base
    metadata={"description": "travel_service"}, # Optional metadata about this collection
    get_or_create=True,                        # Reuse existing collection if it already exists
)

# STEP 2: Define our knowledge base documents
# These are the documents that will be available for retrieval during RAG
# In a real application, these might come from:
# - PDF files, web scraping, databases, APIs, etc.
# - Company documentation, FAQs, product manuals, etc.
documents = [
    "Contoso Travel offers luxury vacation packages to exotic destinations worldwide.",
    "Our premium travel services include personalized itinerary planning and 24/7 concierge support.",
    "Contoso's travel insurance covers medical emergencies, trip cancellations, and lost baggage.",
    "Popular destinations include the Maldives, Swiss Alps, and African safaris.",
    "Contoso Travel provides exclusive access to boutique hotels and private guided tours.",
]

# STEP 3: Add documents to the vector database
# ChromaDB will automatically:
# 1. Convert each document text into high-dimensional vector embeddings
# 2. Store these vectors in an efficient searchable index
# 3. Associate each vector with its original text and metadata
collection.add(
    documents=documents,                                           # The actual text content
    ids=[f"doc_{i}" for i in range(len(documents))],             # Unique ID for each document
    metadatas=[{"source": "training", "type": "explanation"} for _ in documents]  # Metadata for each doc
)

# What happens behind the scenes:
# - Each document is processed by an embedding model (like sentence-transformers)
# - Text is converted to a vector of numbers (e.g., 384 or 768 dimensions)
# - Similar documents will have similar vector representations
# - When we query later, ChromaDB will find documents with vectors most similar to the query vector
#
# This vector database is now ready for semantic search during RAG retrieval!
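
The idea that similar texts have similar vectors can be illustrated with a toy ranking. This sketch uses made-up three-dimensional vectors and cosine similarity, which is one common measure. The metric ChromaDB actually applies depends on how the collection is configured, so this shows the concept rather than ChromaDB's internals.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up 3-dimensional "embeddings"; real models produce hundreds of dimensions.
toy_embeddings = {
    "insurance": [0.9, 0.1, 0.0],
    "destinations": [0.1, 0.9, 0.2],
}
query_vector = [0.8, 0.2, 0.1]  # pretend this encodes an insurance question

# Rank document labels from most to least similar to the query vector.
ranked = sorted(
    toy_embeddings,
    key=lambda name: cosine_similarity(query_vector, toy_embeddings[name]),
    reverse=True,
)
print(ranked)
```

The "insurance" vector points in nearly the same direction as the query, so it ranks first.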

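### Creating the Agent

The agent is created with the chat completion service and the three plugins. Its instructions tell the model to call `retrieve_context` and then `build_augmented_prompt` before answering.

In [None]: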
agent = ChatCompletionAgent(
    service=chat_completion_service,
    plugins=[DestinationsPlugin(), WeatherInfoPlugin(), PromptPlugin(collection)],
    name="TravelAgent",
    instructions="""You are a travel agent assistant. For ANY travel-related query:

1. ALWAYS first call 'retrieve_context' to get relevant information from the knowledge base
2. ALWAYS then call 'build_augmented_prompt' with the user's query and the retrieved context
3. Use the augmented prompt to provide your final response

For temperature questions, also call 'get_destination_temperature'.
For destination details, also call 'get_destination_info' if applicable.

Never say 'I have no context for that' - always attempt to retrieve context first.""",
)

### Running the Agent with Streaming Chat History
The main asynchronous loop sends each user input to the agent with `invoke_stream`, reusing the same `ChatHistoryAgentThread` so that the conversation history carries across turns. The agent decides which plugin functions to call (retrieval, prompt augmentation, temperature lookup, destination details). The loop collects the streamed function calls, function results, and text, then renders each turn as HTML with the function calls in a collapsible section.

In [12]:
async def main():
    thread: ChatHistoryAgentThread | None = None

    user_inputs = [
        "Can you explain Contoso's travel insurance coverage?",
        "What is the average temperature of the Maldives?",
        "What is a good cold destination offered by Contoso and what is its average temperature?",
    ]

    for user_input in user_inputs:
        html_output = (
            f"<div style='margin-bottom:10px'>"
            f"<div style='font-weight:bold'>User:</div>"
            f"<div style='margin-left:20px'>{user_input}</div></div>"
        )

        agent_name = None
        full_response: list[str] = []
        function_calls: list[str] = []

        # Buffer to reconstruct streaming function call
        current_function_name = None
        argument_buffer = ""

        async for response in agent.invoke_stream(
            messages=user_input,
            thread=thread,
        ):
            thread = response.thread
            agent_name = response.name
            content_items = list(response.items)

            for item in content_items:
                if isinstance(item, FunctionCallContent):
                    if item.function_name:
                        current_function_name = item.function_name

                    # Accumulate arguments (streamed in chunks)
                    if isinstance(item.arguments, str):
                        argument_buffer += item.arguments
                elif isinstance(item, FunctionResultContent):
                    # Finalize any pending function call before showing result
                    if current_function_name:
                        formatted_args = argument_buffer.strip()
                        try:
                            parsed_args = json.loads(formatted_args)
                            formatted_args = json.dumps(parsed_args)
                        except Exception:
                            pass  # leave as raw string

                        function_calls.append(f"Calling function: {current_function_name}({formatted_args})")
                        current_function_name = None
                        argument_buffer = ""

                    function_calls.append(f"\nFunction Result:\n\n{item.result}")
                elif isinstance(item, StreamingTextContent) and item.text:
                    full_response.append(item.text)

        if function_calls:
            html_output += (
                "<div style='margin-bottom:10px'>"
                "<details>"
                "<summary style='cursor:pointer; font-weight:bold; color:#0066cc;'>Function Calls (click to expand)</summary>"
                "<div style='margin:10px; padding:10px; background-color:#f8f8f8; "
                "border:1px solid #ddd; border-radius:4px; white-space:pre-wrap; font-size:14px; color:#333;'>"
                f"{chr(10).join(function_calls)}"
                "</div></details></div>"
            )

        html_output += (
            "<div style='margin-bottom:20px'>"
            f"<div style='font-weight:bold'>{agent_name or 'Assistant'}:</div>"
            f"<div style='margin-left:20px; white-space:pre-wrap'>{''.join(full_response)}</div></div><hr>"
        )

        display(HTML(html_output))

await main()
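
The loop buffers function-call arguments because they arrive as string fragments during streaming and only form valid JSON once joined. That step can be sketched in isolation as follows. The helper `finalize_call` and the sample fragments are hypothetical, and catching `json.JSONDecodeError` instead of a bare `Exception` is a choice made for this sketch.

```python
import json

def finalize_call(function_name, argument_chunks):
    """Join streamed argument fragments and normalize them if they form valid JSON."""
    raw = "".join(argument_chunks).strip()
    try:
        formatted = json.dumps(json.loads(raw))
    except json.JSONDecodeError:
        formatted = raw  # leave as raw string
    return f"Calling function: {function_name}({formatted})"

# Hypothetical fragments, split the way a streaming response might deliver them.
chunks = ['{"desti', 'nation": ', '"Maldives"}']
print(finalize_call("get_destination_temperature", chunks))
```

Incomplete or non-JSON arguments fall through unchanged, which matches the fallback behavior in the loop above.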
