In [15]:
!pip install dotenv

Collecting dotenv
  Downloading dotenv-0.9.9-py2.py3-none-any.whl.metadata (279 bytes)
Collecting python-dotenv (from dotenv)
  Downloading python_dotenv-1.1.1-py3-none-any.whl.metadata (24 kB)
Downloading dotenv-0.9.9-py2.py3-none-any.whl (1.9 kB)
Downloading python_dotenv-1.1.1-py3-none-any.whl (20 kB)
Installing collected packages: python-dotenv, dotenv
Successfully installed dotenv-0.9.9 python-dotenv-1.1.1


In [1]:
import requests
import json
import os
import dotenv
from utils import get_price
dotenv.load_dotenv()


True

In [2]:
headers = {
    "Authorization": f"Bearer {os.getenv('OPENROUTER_API')}",
    "Content-Type": "application/json"
}
chat_url = "https://openrouter.ai/api/v1/chat/completions"
models_url = "https://openrouter.ai/api/v1/models"
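If `OPENROUTER_API` is missing from the environment, `os.getenv` returns `None` and the header above silently becomes `Bearer None`. A minimal sketch of an early check follows; `build_headers` is a hypothetical helper, not part of the notebook or of any library.

```python
import os


def build_headers(api_key):
    """Return OpenRouter request headers, failing early if the key is missing.

    Hypothetical helper for illustration only.
    """
    if not api_key:
        raise ValueError("OPENROUTER_API is not set; check your .env file")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }


# Usage in the notebook (same environment variable as the cell above):
# headers = build_headers(os.getenv("OPENROUTER_API"))
```

Failing at this point gives a clearer message than an authentication error returned later by the API.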

**Get Available Models**


In [3]:
response = requests.get(models_url)
models_data = response.json()

# Save the full model list for offline inspection.
with open("models_response.json", "w") as f:
    json.dump(models_data, f, indent=2)

# Find one model by id and print the fields of interest.
nemotron_info = None
for model in models_data.get("data", []):
    if model.get("id") == "nvidia/nemotron-nano-9b-v2":
        nemotron_info = model
        print("NVIDIA Nemotron Nano 9B V2 model info:")
        for key in ("id", "description", "architecture", "pricing", "top_provider"):
            print(json.dumps(nemotron_info.get(key), indent=2))
        break


NVIDIA Nemotron Nano 9B V2 model info:
"nvidia/nemotron-nano-9b-v2"
"NVIDIA-Nemotron-Nano-9B-v2 is a large language model (LLM) trained from scratch by NVIDIA, and designed as a unified model for both reasoning and non-reasoning tasks. It responds to user queries and tasks by first generating a reasoning trace and then concluding with a final response. \n\nThe model's reasoning capabilities can be controlled via a system prompt. If the user prefers the model to provide its final answer without intermediate reasoning traces, it can be configured to do so."
{
  "modality": "text->text",
  "input_modalities": [
    "text"
  ],
  "output_modalities": [
    "text"
  ],
  "tokenizer": "Other",
  "instruct_type": null
}
{
  "prompt": "0",
  "completion": "0",
  "request": "0",
  "image": "0",
  "web_search": "0",
  "internal_reasoning": "0"
}
{
  "context_length": 128000,
  "max_completion_tokens": null,
  "is_moderated": false
}
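The lookup above can be factored into small reusable functions. The sketch below assumes the payload shape shown in the output (a top-level `data` list whose entries carry `id` and a `pricing` dict of numeric strings); `find_model` and `is_free` are hypothetical names introduced here.

```python
def find_model(models_data, model_id):
    """Return the entry whose "id" matches model_id, or None if absent."""
    for model in models_data.get("data", []):
        if model.get("id") == model_id:
            return model
    return None


def is_free(model):
    """True if the prompt, completion, and request prices are all zero.

    Prices arrive as strings (for example "0"), so convert before comparing.
    """
    pricing = model.get("pricing", {})
    return all(
        float(pricing.get(key, "0")) == 0.0
        for key in ("prompt", "completion", "request")
    )


# Trimmed copy of the entry printed above, used as sample data.
sample = {
    "data": [
        {
            "id": "nvidia/nemotron-nano-9b-v2",
            "pricing": {"prompt": "0", "completion": "0", "request": "0"},
        }
    ]
}
model = find_model(sample, "nvidia/nemotron-nano-9b-v2")
print(model["id"], "free:", is_free(model))
```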


**Minimal Implementation of Completions API**

In [4]:
response = requests.post(chat_url, headers=headers, json={
    "model": "nvidia/nemotron-nano-9b-v2",
    "messages": [
        {
            "role": "user",
            "content": "What is the meaning of life? Answer in one sentence."
        }
    ]
})
# The reply text lives in the first choice's message.
print(response.json()["choices"][0]["message"]["content"])


The meaning of life is to find purpose, connection, and fulfillment through personal growth, relationships, and contributing to something greater than oneself.
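The one-line print above assumes the request succeeded. A slightly more defensive sketch follows, assuming that failed requests come back as JSON with a top-level `error` key; that shape is an assumption here and is not shown in this notebook. `extract_content` is a hypothetical helper.

```python
def extract_content(payload):
    """Return the first choice's message text from a chat completion payload.

    Assumption: error responses carry a top-level "error" key.
    """
    if "error" in payload:
        err = payload["error"]
        message = err.get("message", err) if isinstance(err, dict) else err
        raise RuntimeError(f"OpenRouter request failed: {message}")
    choices = payload.get("choices") or []
    if not choices:
        raise RuntimeError("Response contained no choices")
    return choices[0]["message"]["content"]


# Usage in the notebook:
# print(extract_content(response.json()))
```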



**Taking Advantage of OpenRouter's Routing**


OpenRouter can route each request to a model and provider according to a chosen objective, such as price, throughput, or latency.

We can also list fallback models, which guards against downtime of the primary model.


In [5]:

# Route to the cheapest provider, with a fallback model in case the first is unavailable
response = requests.post(chat_url, headers=headers, json={
    # Models listed in priority order: the first is preferred, the rest are fallbacks
    'models': ['meta-llama/llama-3.1-70b-instruct', 'nvidia/nemotron-nano-9b-v2'],
    'messages': [
      {
        'role': 'user',
        'content': 'What is the meaning of life? Answer in 5 words.'
      }
    ],

    # Sort providers by price; other options are "throughput" and "latency".
    # Also exclude providers that collect request data.
    'provider': {
      'sort': 'price',
      'data_collection': 'deny'
    }
})


In [6]:
result = response.json()  # parse the body once and reuse it
print(f"Routed to: {result.get('provider')}")
print(f"Model: {result.get('model')}")
print(f"Response: {result['choices'][0]['message']['content']}")
print("Price:", get_price(response))


Routed to: DeepInfra
Model: meta-llama/llama-3.1-70b-instruct
Response: To find your own purpose.
Price: $0.00000398
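The `get_price` helper is imported from a local `utils` module that is not shown here, so its implementation is unknown. As a rough sketch of how such a figure could be derived, the function below multiplies token counts by per-token prices. It assumes the response body includes a `usage` dict with `prompt_tokens` and `completion_tokens`, and that `pricing` values are USD-per-token strings like those returned by the models endpoint. `estimate_cost` is a hypothetical name, and the prices in the example are made up for illustration.

```python
from decimal import Decimal


def estimate_cost(usage, pricing):
    """Estimate the request cost in USD from token counts and per-token prices.

    Not the notebook's get_price; a hypothetical stand-in for illustration.
    Decimal avoids float rounding noise on very small amounts.
    """
    prompt_cost = Decimal(pricing["prompt"]) * usage["prompt_tokens"]
    completion_cost = Decimal(pricing["completion"]) * usage["completion_tokens"]
    return prompt_cost + completion_cost


# Illustrative numbers only, not real provider prices.
example_usage = {"prompt_tokens": 10, "completion_tokens": 5}
example_pricing = {"prompt": "0.000001", "completion": "0.000002"}
print(f"Price: ${estimate_cost(example_usage, example_pricing):.8f}")
```

Because a request may be routed to a fallback model, the pricing entry should be looked up using the `model` field of the response, not the first model in the request.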
