# Tool Calling Overview
Tool calling (also known as function calling) lets LLMs interact with your code and external systems in real time. It powers the classic agent loop: **Observe → Decide → Act → Respond**.

Instead of just generating text from internal knowledge, the model can:
- Fetch up-to-date data (weather, documents, search results)
- Take actions (call APIs, run calculations, trigger backend workflows)

This is essential for building agents and digital humans that:
- Respond accurately using real-world data
- Perform tasks through APIs or services
- Act in context, not just chat

You define the available tools using structured JSON, and the model decides when and how to call them. This notebook implements a simple tool-calling agent that you can expand upon in your digital human project.
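
Before touching any API, the Observe → Decide → Act → Respond loop can be sketched offline. The snippet below is only an illustration, not the implementation used later in this notebook: `fake_model`, `TOOLS`, and `add` are hypothetical stand-ins, and the "decision" is a hard-coded keyword check rather than an LLM.

```python
import json

# Hypothetical tool registry: maps tool names to plain Python callables.
def add(a, b):
    return a + b

TOOLS = {"add": add}

def fake_model(user_text):
    """Stand-in for an LLM (assumption for illustration): decide whether a tool is needed."""
    if "add" in user_text.lower():
        # Like a real tool call, the arguments are delivered as a JSON string.
        return {"tool": "add", "arguments": json.dumps({"a": 2, "b": 3})}
    return {"tool": None, "content": "No tool needed."}

def run_agent(user_text):
    decision = fake_model(user_text)            # Observe + Decide
    if decision["tool"] is None:
        return decision["content"]              # Respond without a tool
    args = json.loads(decision["arguments"])    # decode the JSON-string arguments
    result = TOOLS[decision["tool"]](**args)    # Act
    return f"The result is {result}."           # Respond

print(run_agent("Please add 2 and 3"))  # The result is 5.
print(run_agent("Hello"))               # No tool needed.
```

In the real agent, the model replaces `fake_model` for both the decision and the final wording, while the "Act" step stays ordinary Python that you control.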

## Prerequisites
Prior to getting started, you will need an NVIDIA API Key from the NVIDIA API Catalog to access the models used in this notebook.  

Need an API Key? It's Free!
  1. Navigate to **[NVIDIA API Catalog](https://build.nvidia.com/explore/discover)**.
  2. Select any model, such as `llama-3.3-70b-instruct`.
  3. On the right panel above the sample code snippet, click on "Get API Key". This will prompt you to log in if you have not already.

In [1]:
import os
import getpass
from dotenv import load_dotenv

# Load environment variables from a .env file, if one exists
load_dotenv()

if not os.environ.get("NVIDIA_API_KEY", "").startswith("nvapi-"):
    nvapi_key = getpass.getpass("Enter your NVIDIA API key: ")
    assert nvapi_key.startswith("nvapi-"), f"{nvapi_key[:5]}... is not a valid key"
    os.environ["NVIDIA_API_KEY"] = nvapi_key

## Tool Calling with NIM: A Simple Weather Agent

In this walkthrough, we'll build a simple tool-using agent on NVIDIA's NIM endpoint with OpenAI-compatible syntax. Our Digital Human will:
1. Process a user query about weather conditions
2. Determine whether external weather data is needed
3. Call the appropriate weather service API
4. Generate a natural language response integrating the retrieved data

This demonstration highlights the complete agent loop, showing both the decision-making process and response generation phases.

First, let's define a function to retrieve weather data:

In [13]:
from openai import OpenAI
import os
import json
import requests

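def get_weather(latitude, longitude):
    """Return the current temperature in Celsius at the given coordinates (Open-Meteo)."""
    response = requests.get(
        "https://api.open-meteo.com/v1/forecast",
        params={
            "latitude": latitude,
            "longitude": longitude,
            # Only the current values are used, so the hourly fields are not requested
            "current": "temperature_2m,wind_speed_10m",
        },
        timeout=10,  # avoid hanging the notebook if the service is unreachable
    )
    response.raise_for_status()  # surface HTTP errors instead of failing on a missing key
    return response.json()["current"]["temperature_2m"]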

Next, we'll initialize our client to connect to the NIM endpoint:

In [14]:
# Client setup for NIM endpoint
client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key=os.environ.get("NVIDIA_API_KEY")
)

#### 📄 Function Calling Schema Overview
For an agent to use tools effectively, we need to provide detailed specifications of the functions available to it. This **schema** defines what each function does and what parameters it requires.

Functions are passed in the `tools` parameter of an API request. Each function definition is made up of the following fields (`type`, `properties`, and `required` are nested inside `parameters`):

| Parameter   | Description                                                                                   | Required |
|-------------|-----------------------------------------------------------------------------------------------|----------|
| `name`      | The name of the function the model may call. Must be a valid function name (a-z, A-Z, 0-9, underscores). | Yes   |
| `description` | A short description of what the function does. Helps the model decide when to use it.      | No    |
| `parameters` | The parameters the function accepts, written in JSON Schema format.                         | Yes   |
| `type`      | Should always be `"object"` at the top level of parameters.                                  | Yes   |
| `properties` | Defines each input field with a name, description, and type.                                | Yes   |
| `required`  | A list of required fields inside properties.                                                 | Yes   |

In [29]:
# Here's how we define a weather function:
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current temperature for provided coordinates in celsius.",
        "parameters": {
            "type": "object",
            "properties": {
                "latitude": {"type": "number"},
                "longitude": {"type": "number"}
            },
            "required": ["latitude", "longitude"],
            "additionalProperties": False
        }
    }
}]
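
A schema mistake, such as a `required` entry that has no matching property, can be easy to miss. The sketch below checks one `tools` entry against the rules in the table above. `check_tool_schema` is a hypothetical helper written for this notebook, not part of the OpenAI or NIM client, and it covers only those table rules rather than full JSON Schema validation.

```python
import re

def check_tool_schema(tool):
    """Return a list of problems found in one `tools` entry (empty list if none)."""
    problems = []
    fn = tool.get("function", {})
    if tool.get("type") != "function":
        problems.append('top-level "type" should be "function"')
    # Name rule from the table: a-z, A-Z, 0-9, and underscores only
    if not re.fullmatch(r"[A-Za-z0-9_]+", fn.get("name", "")):
        problems.append("name must use only a-z, A-Z, 0-9, underscores")
    params = fn.get("parameters", {})
    if params.get("type") != "object":
        problems.append('parameters "type" should be "object"')
    props = params.get("properties", {})
    # Every required field must be declared in properties
    missing = [r for r in params.get("required", []) if r not in props]
    if missing:
        problems.append(f"required fields not in properties: {missing}")
    return problems

weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current temperature for provided coordinates in celsius.",
        "parameters": {
            "type": "object",
            "properties": {
                "latitude": {"type": "number"},
                "longitude": {"type": "number"},
            },
            "required": ["latitude", "longitude"],
            "additionalProperties": False,
        },
    },
}

print(check_tool_schema(weather_tool))  # []
```

Running a check like this once before the first API request surfaces schema typos locally instead of as a rejected request or a malformed tool call.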

Let's process a weather-related query.  
First, we'll define our user input, then let the model evaluate the query and decide whether to use our weather tool:

In [30]:
# User input
user_question = "What’s the weather like in San Francisco?"

In [20]:
# Step 1: Let the model decide whether to call a tool
response = client.chat.completions.create(
    model="meta/llama-3.3-70b-instruct",
    messages=[{"role": "user", "content": user_question}],
    tools=tools,
    tool_choice="auto",
)

message = response.choices[0].message

Let's examine the model's decision:

In [31]:
# Let's view the output: the model responds with a tool call
print(json.dumps(message.model_dump(), indent=2))

{
  "content": null,
  "refusal": null,
  "role": "assistant",
  "audio": null,
  "function_call": null,
  "tool_calls": [
    {
      "id": "chatcmpl-tool-d41214755ffd4b26b8ea5aa94db133e7",
      "function": {
        "arguments": "{\"latitude\": 37.7749, \"longitude\": -122.4194}",
        "name": "get_weather"
      },
      "type": "function"
    }
  ]
}
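
Note that `arguments` in the output above is a JSON-encoded string, not a dictionary, and a model is not guaranteed to produce valid or complete JSON. The sketch below decodes and checks the arguments before they reach a tool. `parse_tool_arguments` is a hypothetical helper name, and the numeric check assumes the `latitude`/`longitude` schema from this notebook.

```python
import json

def parse_tool_arguments(raw_arguments, required=("latitude", "longitude")):
    """Decode the JSON string from `function.arguments` and check required numeric fields."""
    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError as exc:
        raise ValueError(f"arguments are not valid JSON: {exc}") from exc
    if not isinstance(args, dict):
        raise ValueError("arguments must decode to a JSON object")
    for key in required:
        if key not in args:
            raise ValueError(f"missing required argument: {key}")
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(args[key], bool) or not isinstance(args[key], (int, float)):
            raise ValueError(f"{key} must be a number")
    return args

# The arguments string copied from the output above
raw = "{\"latitude\": 37.7749, \"longitude\": -122.4194}"
print(parse_tool_arguments(raw))  # {'latitude': 37.7749, 'longitude': -122.4194}
```

If parsing fails, one option is to return the error text to the model as the tool result so it can retry with corrected arguments.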


Now we'll execute the weather function with the parameters the model provided above, then send the results back to the LLM for the final response:

In [32]:
# Step 2: If the model decides to call a tool, execute it
if message.tool_calls:
    for call in message.tool_calls:
        
        if call.function.name == "get_weather":
            args = json.loads(call.function.arguments)
            tool_result = get_weather(args["latitude"], args["longitude"])
            
            # Step 3: Send tool result back to the model for final response
            followup = client.chat.completions.create(
                model="meta/llama-3.3-70b-instruct",
                messages=[
                    {"role": "user", "content": user_question},
                    message,
                    {
                        "role": "tool",
                        "tool_call_id": call.id,
                        "name": "get_weather",
                        "content": json.dumps(tool_result),
                    },
                ],
                temperature=0.2,
                top_p=0.7,
                max_tokens=1024,
                stream=True
            )

            for chunk in followup:
                # Some chunks may carry no choices or no content, so guard both
                if chunk.choices and chunk.choices[0].delta.content:
                    print(chunk.choices[0].delta.content, end="")
else:
    # Fallback: no tool call, just print model response
    print(message.content)

The current temperature in San Francisco is 21.4°C (70.5°F).
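
The cell above sends one follow-up request per tool call, which works when the model requests a single tool. If a model returns several tool calls in one message, a common pattern is to run all of them first and then send a single follow-up containing one `tool` message per call. The sketch below shows that step in isolation. `build_tool_messages`, `fake_get_weather`, the call ids, and the `get_time` tool name are hypothetical, and `SimpleNamespace` objects stand in for the SDK's tool call objects so the example runs offline.

```python
import json
from types import SimpleNamespace

def build_tool_messages(tool_calls, registry):
    """Run each requested tool and return one `role: tool` message per call."""
    messages = []
    for call in tool_calls:
        func = registry.get(call.function.name)
        if func is None:
            # Report unknown tools back to the model instead of raising
            content = json.dumps({"error": f"unknown tool: {call.function.name}"})
        else:
            args = json.loads(call.function.arguments)
            content = json.dumps(func(**args))
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "name": call.function.name,
            "content": content,
        })
    return messages

def fake_get_weather(latitude, longitude):
    """Offline stand-in for get_weather; returns a fixed temperature."""
    return 21.4

calls = [
    SimpleNamespace(id="call_1", function=SimpleNamespace(
        name="get_weather",
        arguments='{"latitude": 37.7749, "longitude": -122.4194}')),
    SimpleNamespace(id="call_2", function=SimpleNamespace(
        name="get_time", arguments="{}")),
]

tool_messages = build_tool_messages(calls, {"get_weather": fake_get_weather})
for m in tool_messages:
    print(m["tool_call_id"], m["content"])
```

With the real client, the follow-up request would then be sent once, with `messages` set to the user message, the assistant `message`, and every entry of `tool_messages`, in that order.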

#### Key Takeaways

**Tool Integration**: Digital Humans can seamlessly incorporate external data and services into their responses

**Autonomous Decision-Making**: The model independently determines when to use tools based on user queries

**Complete Reasoning Loop**: We've demonstrated the full observe → decide → act → respond cycle

**Natural Interaction**: The final response integrates technical data into a natural, human-like answer

This pattern can be extended to create sophisticated Digital Humans that leverage multiple tools and services to provide rich, contextual interactions across diverse domains and use cases.

> ℹ️ **Note: This notebook demonstrates a direct API approach to tool calling.**  
> It makes the API calls manually to carry out function (tool) calling in a simple, script-based way.

We start with this method because it helps illustrate the core concepts behind tool calling without introducing the complexity of full pipelines or real-time components.

In the **larger nvidia-pipecat notebook**, you'll see how these same principles are extended using a production-ready, asynchronous pipeline that supports speech, streaming, and multi-modal integration.

Here's a comparison of both approaches for context:

| **Aspect**         | **Direct API Approach**          | **Pipecat Approach**                      |
|--------------------|----------------------------------|-------------------------------------------|
| Architecture        | Script-based, synchronous        | Pipeline-based, asynchronous              |
| Integration         | Manual API client setup          | Modular component connections             |
| Speech Support      | Text-only                        | Full speech I/O (STT and TTS)             |
| Streaming           | Basic chunking                   | Full bi-directional streaming             |
| Interruptibility    | Limited                          | Built-in support                          |
| Scalability         | Limited                          | Designed for complex applications         |
| Deployment          | Simple scripts                   | Production-ready architecture             |