# Tool Calling with Nemotron-Super-49B-v1.5 on NVIDIA NIM

This notebook demonstrates how to call the `nvidia/llama-3.3-nemotron-super-49b-v1.5` large language model through NVIDIA NIM (NVIDIA Inference Microservices) using the OpenAI-compatible Chat Completions API, and walks through a minimal tool-calling loop.

- Model card: [NVIDIA Llama-3.3 Nemotron Super 49B v1.5](https://build.nvidia.com/nvidia/llama-3_3-nemotron-super-49b-v1_5)
- NIM Docs: [https://docs.nvidia.com/nim/](https://docs.nvidia.com/nim/)


## Table of Contents
- [Prerequisites](#Prerequisites)
- [Setup](#Setup)
  - [API Key Generation](#API-Key-Generation)
  - [Initialize Client](#Initialize-Client)
- [Define Tools](#Define-Tools)
- [Run Tool Call](#Run-Tool-Call)
- [Troubleshooting](#Troubleshooting)
- [Conclusion](#Conclusion)


## Prerequisites

- Python 3.10 or later with `pip`
- An NVIDIA API key with access to the NIM endpoints (set `NVIDIA_API_KEY` in your environment)
- Network access to reach `https://integrate.api.nvidia.com`
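
The checklist above can also be verified in code before running the rest of the notebook. The sketch below is a minimal, hypothetical helper (`check_prerequisites` is not part of any library) that reports unmet requirements. It checks only the Python version and the presence of the key, not network reachability or whether the key is valid.

```python
import os
import sys


def check_prerequisites(env=None, version_info=None):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    # Both parameters exist so the checks can be exercised without touching the real environment.
    env = os.environ if env is None else env
    version_info = sys.version_info if version_info is None else version_info

    problems = []
    if tuple(version_info[:2]) < (3, 10):
        problems.append("Python 3.10 or later is required")
    if not env.get("NVIDIA_API_KEY"):
        problems.append("NVIDIA_API_KEY is not set")
    return problems


for problem in check_prerequisites():
    print("Missing prerequisite:", problem)
```

If nothing is printed, both checks passed.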


## Setup

Run the following cell once per environment to install the Python dependencies. If your workspace already has compatible versions, you can skip this step.


In [None]:
%pip install --upgrade openai python-dotenv --quiet

### API Key Generation
Before getting started, generate an API key for accessing the model through the NVIDIA NIM microservice.

To generate `NVIDIA_API_KEY`:

1. Create a free account with [NVIDIA](https://build.nvidia.com/nvidia/nvidia-nemotron-nano-9b-v2).
2. Under Input, select the Python tab, click **Get API Key**, and then click **Generate Key**.
3. Copy the generated key and save it as `NVIDIA_API_KEY`, either as an environment variable or in a local `.env` file. With the key in place, you should have access to the endpoints.

### Initialize Client

The next cell initializes an OpenAI-compatible client configured for the Nemotron NIM endpoint. It reads your NVIDIA API key from the `NVIDIA_API_KEY` environment variable (or from a local `.env` file loaded by `python-dotenv`) and raises a `RuntimeError` if the key is not set.


In [None]:
import json
import os
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()

NIM_BASE_URL = "https://integrate.api.nvidia.com/v1"
MODEL_ID = "nvidia/llama-3.3-nemotron-super-49b-v1.5"

api_key = os.getenv("NVIDIA_API_KEY")
if not api_key:
    raise RuntimeError("Set NVIDIA_API_KEY in your environment or .env before running this cell.")

client = OpenAI(api_key=api_key, base_url=NIM_BASE_URL)
print(f"Client ready for {MODEL_ID}")


Client ready for nvidia/llama-3.3-nemotron-super-49b-v1.5
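
The cell above stops with an error when the key is missing. In an interactive session, one alternative is to prompt for the key without echoing it. The following is a minimal sketch under that assumption: `ensure_api_key` is a hypothetical helper, not part of the `openai` or `python-dotenv` packages, and it keeps the key only in the current process environment.

```python
import getpass
import os


def ensure_api_key(env=None, prompt=getpass.getpass):
    """Return NVIDIA_API_KEY from env, prompting for it (hidden input) when it is unset."""
    env = os.environ if env is None else env
    key = env.get("NVIDIA_API_KEY")
    if not key:
        # getpass hides the typed characters so the key does not appear in notebook output.
        key = prompt("Enter your NVIDIA API key: ").strip()
        if not key:
            raise RuntimeError("No NVIDIA API key provided.")
        # Store the key for the rest of this session only; nothing is written to disk.
        env["NVIDIA_API_KEY"] = key
    return key
```

To use it, replace the `os.getenv` lookup with `api_key = ensure_api_key()` before constructing the client.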


## Define Tools

We'll expose a single local function that returns mock weather data. The model will call this tool when it needs structured information and we'll feed the result back into the conversation.


In [7]:
def get_current_weather(location: str, unit: str = "celsius") -> dict:
    """Return dummy weather data for the requested location."""
    forecast = {
        "temperature_c": 19,
        "temperature_f": 66,
        "condition": "clear skies",
        "humidity": 0.72,
        "wind_kph": 8.0,
    }
    if unit.lower().startswith("f"):
        temperature = forecast["temperature_f"]
        unit_label = "°F"
    else:
        temperature = forecast["temperature_c"]
        unit_label = "°C"

    return {
        "location": location,
        "summary": forecast["condition"],
        "temperature": temperature,
        "unit": unit_label,
        "humidity": forecast["humidity"],
        "wind_kph": forecast["wind_kph"],
    }

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Look up the current weather for a city and return structured conditions.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City and state, e.g. San Francisco, CA",
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature unit to return.",
                    },
                },
                "required": ["location"],
            },
        },
    }
]
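
With a single tool, comparing the requested name against a string is enough. Once several tools are exposed, a name-to-function registry keeps the lookup in one place. The sketch below illustrates that idea; `dispatch_tool_call` and `demo_registry` are hypothetical names, and the demo uses a stand-in tool so the block runs on its own.

```python
import json


def dispatch_tool_call(registry, name, arguments_json):
    """Look up a tool by name and call it with JSON-encoded keyword arguments."""
    if name not in registry:
        raise KeyError(f"Unknown tool: {name}")
    # Tool-call arguments arrive as a JSON string; an empty string means no arguments.
    arguments = json.loads(arguments_json or "{}")
    return registry[name](**arguments)


# Stand-in tool for demonstration. In this notebook the registry would be
# {"get_current_weather": get_current_weather}.
demo_registry = {"echo_location": lambda location: {"location": location}}
print(dispatch_tool_call(demo_registry, "echo_location", '{"location": "San Francisco, CA"}'))
```

The registry keys should match the `name` fields in the `tools` list so that every tool the model can request has a local implementation.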


## Run Tool Call

The snippet below sends a chat completion request with the tool definition, inspects the tool call returned by the model, executes the local function, and then submits the tool result back to the model to obtain a final natural-language answer.


In [8]:
messages = [
    {
        "role": "system",
        "content": "You are a helpful weather assistant. Call tools when you need real-world data before replying to the user.",
    },
    {
        "role": "user",
        "content": "What's the weather like in San Francisco this afternoon?",
    },
]

first_completion = client.chat.completions.create(
    model=MODEL_ID,
    messages=messages,
    tools=tools,
    tool_choice="auto",
    temperature=0.2,
    max_tokens=512,
)
assistant_message = first_completion.choices[0].message
print("Assistant requested tool:")
print(assistant_message)

assistant_dict = {
    "role": assistant_message.role,
    "content": assistant_message.content or "",
}
if assistant_message.tool_calls:
    assistant_dict["tool_calls"] = [
        {
            "id": call.id,
            "type": call.type,
            "function": {
                "name": call.function.name,
                "arguments": call.function.arguments,
            },
        }
        for call in assistant_message.tool_calls
    ]

messages.append(assistant_dict)

for call in assistant_message.tool_calls or []:
    if call.function.name != "get_current_weather":
        continue
    call_args = json.loads(call.function.arguments or "{}")
    tool_response = get_current_weather(**call_args)
    print("Tool response:")
    print(tool_response)
    messages.append(
        {
            "role": "tool",
            "tool_call_id": call.id,
            "name": call.function.name,
            "content": json.dumps(tool_response),
        }
    )


final_completion = client.chat.completions.create(
    model=MODEL_ID,
    messages=messages,
    temperature=0.2,
    max_tokens=512,
)
final_message = final_completion.choices[0].message
print("\nFinal model answer:\n")
print(final_message.content)


Assistant requested tool:
ChatCompletionMessage(content='<think>\nOkay, the user is asking about the weather in San Francisco this afternoon. Let me check what tools I have available. There\'s the get_current_weather function, which requires a location and optionally a unit. The user didn\'t specify the unit, so I\'ll default to celsius or fahrenheit? The function\'s parameters have an enum for unit, but since it\'s not required, maybe I should just proceed without it. Wait, the required field is only location. So I can call get_current_weather with location as "San Francisco, CA". The user mentioned "this afternoon", but the function is for current weather. Hmm, but maybe the current weather can give an idea of what it\'s like now, which might be similar to the afternoon. Alternatively, if the function can provide a forecast, but looking at the description, it says "current weather". So the function might not provide a forecast. But the user is asking for this afternoon. However, sinc
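
The printed message above shows the model's reasoning inside a `<think>` block in `content`. To display only the answer, one option is to strip that block before printing. The sketch below assumes the reasoning is delimited by `<think>` and `</think>` tags, as the opening tag in the output suggests; `strip_reasoning` is a hypothetical helper, not part of the OpenAI client.

```python
import re

# Non-greedy match so each <think>...</think> block is removed separately.
_THINK_BLOCK = re.compile(r"<think>.*?</think>", flags=re.DOTALL)


def strip_reasoning(content):
    """Return message content with any <think>...</think> reasoning removed."""
    if not content:
        return ""
    cleaned = _THINK_BLOCK.sub("", content)
    # If generation stopped mid-reasoning, the closing tag is missing:
    # drop everything from the unterminated opening tag onward.
    cleaned = cleaned.split("<think>", 1)[0]
    return cleaned.strip()
```

For example, `print(strip_reasoning(final_message.content))` would show the final answer without the reasoning text. An empty result can indicate that the response was cut off before the reasoning finished.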

## Troubleshooting

- Ensure `NVIDIA_API_KEY` is set and has access to the Nemotron NIM deployment; the API returns HTTP 401 if the key is missing or invalid.
- When running from a remote environment, verify that outbound HTTPS traffic to `integrate.api.nvidia.com` is allowed.
- If the model replies directly without calling a tool, adjust the system prompt or send a follow-up user message requesting structured data to encourage tool use.
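
Beyond prompt changes, the OpenAI Chat Completions API also accepts a named `tool_choice` that requires a call to one specific function. Whether the NIM endpoint for this model honors that setting is an assumption to verify against the NIM documentation. The sketch below only builds the request arguments; `forced_tool_request` is a hypothetical helper.

```python
def forced_tool_request(model, messages, tools, tool_name):
    """Build Chat Completions keyword arguments that require a call to one named tool."""
    known_names = {tool["function"]["name"] for tool in tools}
    if tool_name not in known_names:
        raise ValueError(f"{tool_name!r} is not defined in tools")
    return {
        "model": model,
        "messages": messages,
        "tools": tools,
        # Named tool_choice, as defined by the OpenAI Chat Completions API.
        "tool_choice": {"type": "function", "function": {"name": tool_name}},
    }
```

In this notebook it could be used as `client.chat.completions.create(**forced_tool_request(MODEL_ID, messages, tools, "get_current_weather"))`.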


## Conclusion

This notebook demonstrated a minimal end-to-end tool-calling workflow with the `Nemotron-Super-49B-v1.5` model served through NVIDIA NIM. By defining tools and handling the model's tool-call requests, you can connect LLMs to external data and APIs. From here, you can extend the example with more complex tools, more robust error handling, or real-world API integrations.
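
As one possible starting point for the more robust error handling mentioned above, a tool can be wrapped so that malformed arguments or a failing call produce an error payload for the model instead of an exception in the notebook. This is a sketch under that assumption; `run_tool_safely` is a hypothetical helper, and the `{"error": ...}` shape is an arbitrary choice rather than a format the API requires.

```python
import json


def run_tool_safely(func, arguments_json):
    """Run a tool and return a JSON string; failures become an error payload instead of raising."""
    try:
        arguments = json.loads(arguments_json or "{}")
        if not isinstance(arguments, dict):
            raise TypeError("tool arguments must be a JSON object")
        result = func(**arguments)
    except (TypeError, ValueError) as exc:
        # json.JSONDecodeError is a ValueError; unexpected keyword arguments raise TypeError.
        return json.dumps({"error": f"{type(exc).__name__}: {exc}"})
    return json.dumps(result)
```

The returned string can be used directly as the `content` of a `tool` message, which lets the model see the failure and either retry with corrected arguments or explain the problem to the user.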

