[![ Click here to deploy.](https://brev-assets.s3.us-west-1.amazonaws.com/nv-lb-dark.svg)](https://brev.nvidia.com/launchable/deploy?launchableID=env-32qPA9uwn9WF7aFkmtZrPeltQJL) <a href="https://colab.research.google.com/github/anniesurla/GenerativeAIExamples/blob/dev%2Fasurla-qwen-next/Building_a_Simple_AI_Agent_with_Qwen3_Next_powered_by_NVIDIA_NIM.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

![image](https://imgur.com/FfbspFL.png)
# Building an AI Agent with Qwen3 Next Powered by NVIDIA NIM

In this notebook, we'll cover two things:

1. How to use the NVIDIA-hosted NIM API to run inference on the model through the [ChatCompletions API](https://platform.openai.com/docs/api-reference/chat).
2. How to build a simple note-taking agent powered by the `qwen3-next` NIM using the [OpenAI Agents SDK](https://github.com/openai/openai-agents-python/tree/main).

Let's get right into it!

## Getting an API Key from build.nvidia.com

To use the NVIDIA-hosted NIM API, we'll need an API key from build.nvidia.com. Luckily, this is very straightforward.

First, let's navigate to the model on [build.nvidia.com](https://build.nvidia.com/qwen/qwen3-next-80b-a3b-thinking)!

> NOTE: You'll need to ensure you're logged in before moving to the next steps!

Once there, you can click on the green button that says: `View Code`.

![image](https://i.imgur.com/mSGQPfC.png)

A new modal should appear on your screen, where you can click the `Generate API Key` text to obtain your API key!

![image](https://imgur.com/sDlK4eO.png)

Once you have that API key, you're ready for the next step, which captures it as an environment variable.

In [2]:
import os
import getpass

os.environ["NVIDIA_API_KEY"] = getpass.getpass("Enter your NVIDIA API key (found at build.nvidia.com)")
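If you re-run the notebook often, being prompted every time gets tedious. Here is a minimal sketch that reuses a key already present in the environment and prompts only when it is missing. `get_api_key` and its `prompt_fn` parameter are hypothetical helpers for illustration, not part of any NVIDIA or OpenAI library:

```python
import getpass
import os


def get_api_key(env=os.environ, prompt_fn=getpass.getpass):
    """Return NVIDIA_API_KEY from `env`, prompting only when it is missing.

    `get_api_key` and `prompt_fn` are illustrative names, not a library API.
    """
    key = env.get("NVIDIA_API_KEY", "").strip()
    if not key:
        key = prompt_fn("Enter your NVIDIA API key (found at build.nvidia.com)").strip()
        if not key:
            raise ValueError("No API key provided.")
        env["NVIDIA_API_KEY"] = key  # cache the key for later cells
    return key
```

Calling `get_api_key()` with no arguments reads and updates `os.environ`, so the later cells that reference `os.environ["NVIDIA_API_KEY"]` keep working unchanged.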

## Using the OpenAI Library with the NVIDIA-hosted NIM API

We will be using the Python [OpenAI SDK](https://github.com/openai/openai-python) to access the `qwen3-next` model on build.nvidia.com.

Let's start with a classic `pip install`.

In [3]:
! pip install -qU openai

Once we've installed the `openai` library - we can use it to create an OpenAI client, which we'll point at the NVIDIA-hosted NIM API endpoint.

We'll also be sure to provide the API key we entered above by referencing the environment variable.

In [4]:
from openai import OpenAI

client = OpenAI(
  base_url = "https://integrate.api.nvidia.com/v1",
  api_key = os.environ["NVIDIA_API_KEY"]
)

This latest model from Qwen is well suited to agentic tasks, where multi-step reasoning over long contexts is crucial.

Let's get acquainted with how you can interact with this model at the API level.

We will use a math question from AIME25, a dataset designed to test reasoning models' mathematical capabilities.

The example is as follows.

In [5]:
math_prompt = """
The 9 members of a baseball team went to an ice cream parlor after their game. Each player had a single-scoop cone of chocolate, vanilla, or strawberry ice cream. At least one player chose each flavor, and the number of players who chose chocolate was greater than the number of players who chose vanilla, which was greater than the number of players who chose strawberry. Let $N$ be the number of different assignments of flavors to players that meet these conditions. Find the remainder when $N$ is divided by 1000.

Please provide your answer in boxed format.
"""

response = client.chat.completions.create(
  model="qwen/qwen3-next-80b-a3b-thinking",
  messages=[{"role":"user","content":math_prompt}],
  top_p=0.7,
  temperature=0.6,
  stream=True
)

for chunk in response:
  reasoning = getattr(chunk.choices[0].delta, "reasoning_content", None)
  if reasoning:
    print(reasoning, end="")
  if chunk.choices[0].delta.content is not None:
    print(chunk.choices[0].delta.content, end="")

This is a complex or challenging question, and it is difficult to provide a direct and correct answer. I need to think about it.
Well, so we have 9 baseball players, each getting a single-scoop ice cream: chocolate (C), vanilla (V), or strawberry (S). Conditions: at least one of each flavor, so no flavor is zero. Also, number of chocolate > vanilla > strawberry. Let's denote the counts as c, v, s for chocolate, vanilla, strawberry respectively. So we have c + v + s = 9, with c > v > s ≥ 1 (since at least one each, so s ≥ 1, then v > s implies v ≥ 2, c > v implies c ≥ 3, but maybe better to just write inequalities strictly).

First step: find all possible triples (c, v, s) of positive integers satisfying c > v > s and c + v + s = 9. Then for each such triple, compute the number of assignments, which is the multinomial coefficient: 9! / (c! v! s!), since we're assigning flavors to distinct players (wait, yes, the problem says "different assignments of flavors to players", so each player 
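The reasoning trace above is cut off, but the approach it outlines is easy to check independently: enumerate the triples with c > v > s ≥ 1 and c + v + s = 9, then sum the multinomial coefficients. The following small sketch is not part of the original notebook, and `count_assignments` is a hypothetical helper name:

```python
from math import factorial


def count_assignments(players=9):
    """Sum multinomial coefficients over all (c, v, s) with c > v > s >= 1."""
    triples = []
    total = 0
    for s in range(1, players + 1):
        for v in range(s + 1, players + 1):
            c = players - v - s  # remaining players chose chocolate
            if c > v:
                triples.append((c, v, s))
                total += factorial(players) // (
                    factorial(c) * factorial(v) * factorial(s)
                )
    return triples, total


triples, n = count_assignments()
print(triples)   # [(6, 2, 1), (5, 3, 1), (4, 3, 2)]
print(n)         # 2016
print(n % 1000)  # 16
```

The three valid splits contribute 252, 504, and 1260 assignments respectively, so a correct model response should end with a boxed 16.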

## Building a Simple Note-Taking Agent Powered by the `Qwen3 Next` NIM

Next, we'll look at a very simple example of how to build agents on top of the NVIDIA-hosted NIM API.

To get started, we need to install the `openai-agents` SDK.

> NOTE: Instructions on how you can download and run the NIM are available [here](https://build.nvidia.com/qwen/qwen3-next-80b-a3b-thinking/deploy)!

In [6]:
!pip install -qU "openai-agents[litellm]"

Next, we can create our `AsyncOpenAI` client through the same process we used for our `OpenAI` client.

In [7]:
from openai import AsyncOpenAI

client = AsyncOpenAI(
  base_url = "https://integrate.api.nvidia.com/v1",
  api_key = os.environ["NVIDIA_API_KEY"]
)

To streamline note-taking, we will create two function tools:
- Write Notes (`write_file`): takes a filename and text, and appends the text to that file in plain text format.
- Display Notes (`display_file`): returns the stored content, making it easy to review notes at any time.

In [8]:
from agents import function_tool
import os

@function_tool
async def display_file(filename: str) -> str:
    """Read and return the contents of a file."""
    if not os.path.exists(filename):
        return f"File '{filename}' not found."
    print(f"[INFO] Reading file: {filename}")
    with open(filename, "r", encoding="utf-8") as f:
        return f.read()

@function_tool
def write_file(filename: str, content: str) -> str:
    """Write content to a file (append)."""
    print(f"[INFO] Writing to file: {filename}")
    with open(filename, "a", encoding="utf-8") as f:
        f.write(content + '\n')
    return f"Content written to '{filename}'."
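Because `@function_tool` wraps these functions for the agent, it can be convenient to keep the file logic in plain helpers that you can test on their own. The sketch below assumes that split. `append_note` and `read_note` are hypothetical names, and the decorated tools would simply call them:

```python
import os
import tempfile


def append_note(filename: str, content: str) -> str:
    """Append one line of text to the file, creating it if needed."""
    with open(filename, "a", encoding="utf-8") as f:
        f.write(content + "\n")
    return f"Content written to '{filename}'."


def read_note(filename: str) -> str:
    """Return the file contents, or a readable message if it is missing."""
    if not os.path.exists(filename):
        return f"File '{filename}' not found."
    with open(filename, "r", encoding="utf-8") as f:
        return f.read()


# Quick check in a temporary directory so no real notes are touched.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "notes.txt")
    print(read_note(path))           # reports that the file is missing
    append_note(path, "first note")
    append_note(path, "second note")
    print(read_note(path))           # both notes, one per line
```

Returning a message for a missing file, rather than raising, gives the model something it can relay to the user.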


Next up, we'll create the Agent itself. For more details, you can explore the OpenAI Agents SDK [here](https://github.com/openai/openai-agents-python/tree/main). In this notebook, we'll focus on integrating with the NVIDIA-hosted NIM API.

The key component for smooth integration is
```
OpenAIChatCompletionsModel(model="<model_name>", ...)
```

Since NVIDIA NIM exposes the ChatCompletions API for the model, you can integrate it the same way whether the NIM runs locally or you use the NVIDIA-hosted NIM API.
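One way to make the hosted/local switch explicit is to resolve the endpoint from the environment. This minimal sketch rests on several assumptions: the local URL `http://localhost:8000/v1` used in the comments is only an example value, so check your own deployment, and the variable name `NIM_BASE_URL` and the helper `resolve_endpoint` are illustrative, not defined by NVIDIA or the SDK:

```python
import os

HOSTED_BASE_URL = "https://integrate.api.nvidia.com/v1"


def resolve_endpoint(env=os.environ):
    """Return (base_url, api_key) for the hosted API or a self-hosted NIM.

    Set NIM_BASE_URL (hypothetical variable name), for example to
    http://localhost:8000/v1, to target a local deployment.
    """
    base_url = env.get("NIM_BASE_URL", HOSTED_BASE_URL)
    if base_url == HOSTED_BASE_URL:
        api_key = env["NVIDIA_API_KEY"]  # the hosted API requires a key
    else:
        # Assumption: the local endpoint does not validate the key, but the
        # OpenAI client still expects a non-empty string.
        api_key = env.get("NVIDIA_API_KEY", "not-used")
    return base_url, api_key


# Usage (requires the openai package):
# from openai import AsyncOpenAI
# base_url, api_key = resolve_endpoint()
# client = AsyncOpenAI(base_url=base_url, api_key=api_key)
```

The rest of the agent code stays the same in either case. Only the client's `base_url` and key change.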

In [9]:
from agents import Agent, OpenAIChatCompletionsModel, ModelSettings

agent = Agent(
        name="Notes Assistant",
        instructions="You're a helpful assistant. You take notes and save them to notes.txt. You can also read from notes.txt.",
        model=OpenAIChatCompletionsModel(model="qwen/qwen3-next-80b-a3b-thinking", openai_client=client),
        model_settings=ModelSettings(temperature=0.6),
        tools=[display_file, write_file],
)

Now that we have our Agent, let's *disable* automated tracing, since we're not providing an OpenAI key and are only communicating with the NVIDIA-hosted NIM API on build.nvidia.com.


In [10]:
from agents import set_tracing_disabled
set_tracing_disabled(disabled=True)

Finally, we can run our agent. Let's ask it to jot something down, and then ask it to read the note back.

In [11]:
from agents import Runner

result = await Runner.run(agent, "I am excited to use Qwen3 Next with NVIDIA NIM, can you take note of this?")

[INFO] Writing to file: notes.txt


In [12]:
result = await Runner.run(agent, "Can you read me what i have in my notes?")
print(result.final_output)

[INFO] Reading file: notes.txt
Your notes currently contain:  
"I am excited to use Qwen3 Next with NVIDIA NIM"
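The `await Runner.run(...)` calls above rely on the notebook's already-running event loop. A plain Python script has no top-level `await`, so the call needs a wrapper there. In this sketch, `ask` and `ask_sync` are hypothetical helpers, and `runner` is expected to be the SDK's `Runner` class (it is passed in so the helper can be exercised with a stand-in):

```python
import asyncio


async def ask(runner, agent, prompt: str) -> str:
    """Run one agent turn and return its final text output."""
    result = await runner.run(agent, prompt)
    return result.final_output


def ask_sync(runner, agent, prompt: str) -> str:
    """Blocking wrapper for scripts; do not call from a running event loop."""
    return asyncio.run(ask(runner, agent, prompt))


# Usage in a script (assumes `agent` is defined as in the cells above):
# from agents import Runner
# print(ask_sync(Runner, agent, "Can you read me what I have in my notes?"))
```

Inside a notebook, keep using `await` directly, because `asyncio.run` raises an error when an event loop is already running.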


Now that you've built a simple agent using NVIDIA NIM, head over to the [model page](https://build.nvidia.com/qwen/qwen3-next-80b-a3b-thinking) and try it out, or follow the [deployment instructions](https://build.nvidia.com/qwen/qwen3-next-80b-a3b-thinking/deploy) to start building locally!