[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/aurelio-labs/semantic-router/blob/main/docs/encoders/nvidia_nim-encoder.ipynb) [![Open nbviewer](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/nbviewer-shield.svg)](https://nbviewer.org/github/aurelio-labs/semantic-router/blob/main/docs/encoders/nvidia_nim-encoder.ipynb)

# Using Nvidia NIM Models

Nvidia NIM embedding models, such as `nvidia/nv-embedqa-e5-v5`, can be used with our `NimEncoder` (which calls the models through LiteLLM). Usage is largely the same as with other embedding models. These models produce embeddings suited to semantic search, classification, and other text-similarity tasks.
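Text-similarity tasks like these compare embedding vectors, most commonly with cosine similarity. The minimal sketch below is plain Python and makes no semantic-router or NIM calls. It shows that computation on toy vectors. It illustrates the idea only and is not the code `NimEncoder` or semantic-router runs internally.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine similarity of two equal-length vectors (range -1 to 1)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors (hypothetical; real embeddings are much longer).
print(cosine_similarity([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]))  # 1.0 (same direction)
print(cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # 0.0 (orthogonal)
```

Cosine similarity ignores vector length and compares only direction. Texts with similar meaning should therefore score close to 1 regardless of how long the input was.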

## Getting Started

We start by installing semantic-router. Support for the new `nim` encoder was added in `semantic-router==0.1.8`.

In [None]:
!pip install -qU semantic-router==0.1.8

Next, we define a `Route` object containing example phrases (utterances) that should trigger it.

In [None]:
from semantic_router import Route

politics = Route(
    name="politics",
    utterances=[
        "isn't politics the best thing ever",
        "why don't you tell me about your political opinions",
        "don't you just love the president",
        "don't you just hate the president",
        "they're going to destroy this country!",
        "they will save the country!",
    ],
)

Let's define another for good measure:

In [2]:
chitchat = Route(
    name="chitchat",
    utterances=[
        "how's the weather today?",
        "how are things going?",
        "lovely weather today",
        "the weather is horrendous",
        "let's go to the chippy",
    ],
)

routes = [politics, chitchat]

Now we initialize our embedding model. We will use Nvidia's `nv-embedqa-e5-v5` model, accessed through the `NimEncoder`.

In [3]:
import os
from getpass import getpass
from semantic_router.encoders import NimEncoder

os.environ["NVIDIA_NIM_API_KEY"] = os.getenv("NVIDIA_NIM_API_KEY") or getpass(
    "Enter Nvidia NIM API Key: "
)

encoder = NimEncoder(
    name="nvidia_nim/nvidia/nv-embedqa-e5-v5",
    score_threshold=0.4,
    api_key=os.environ["NVIDIA_NIM_API_KEY"],
)

Now we define the `SemanticRouter`. When called, the router consumes text (a query) and outputs the category (`Route`) it belongs to. To initialize it, we need our `encoder` model and a list of `routes`.
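The routing step can be pictured as a nearest-neighbour comparison with a cutoff. Below is a simplified, hypothetical sketch in plain Python. `route_query` and the toy 2-d vectors are illustrative assumptions. Each route is scored by its single most similar utterance. semantic-router's actual scoring and aggregation may differ. The `threshold` argument plays a role similar to the `score_threshold` passed to the encoder.

```python
import math
from typing import Optional


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def route_query(
    query_vec: list[float],
    route_vectors: dict[str, list[list[float]]],
    threshold: float,
) -> Optional[str]:
    """Return the best-matching route name, or None if no route clears the threshold.

    Simplification (assumption): each route is scored by its single most
    similar utterance embedding.
    """
    best_name, best_score = None, -math.inf
    for name, vectors in route_vectors.items():
        score = max(cosine_similarity(query_vec, v) for v in vectors)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


# Hypothetical 2-d "embeddings" standing in for real encoder output.
toy_routes = {
    "politics": [[1.0, 0.1], [0.9, 0.2]],
    "chitchat": [[0.1, 1.0], [0.2, 0.9]],
}
print(route_query([0.95, 0.15], toy_routes, threshold=0.4))  # politics
print(route_query([-1.0, -1.0], toy_routes, threshold=0.4))  # None
```

A query whose best similarity falls below the threshold gets `None`, meaning no route matched. This is the behaviour you would want for off-topic input.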

In [4]:
from semantic_router.routers import SemanticRouter

rl = SemanticRouter(encoder=encoder, routes=routes, auto_sync="local")

2025-04-09 15:30:29 - httpx - INFO - _client.py:1013 - _send_single_request() - HTTP Request: POST https://integrate.api.nvidia.com/v1/embeddings "HTTP/1.1 200 OK"

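We can also call the encoder directly on a batch of texts. It returns one embedding vector per input:

In [5]: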
result = encoder(
    [
        "I'm interested in learning about llama 2",
        "Don't you love politics?",
        "How's the weather today?",
        "I love the politics",
    ]
)

2025-04-09 15:30:32 - httpx - INFO - _client.py:1013 - _send_single_request() - HTTP Request: POST https://integrate.api.nvidia.com/v1/embeddings "HTTP/1.1 200 OK"

We can check the dimensionality of our vectors by looking at the `index` attribute of the `SemanticRouter`.

In [6]:
rl.index.dimensions

1024

We have 1024-dimensional vectors. Now let's test them:
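Cosine similarity is only defined between vectors of the same length. Every query embedding must therefore match the index's dimensionality (1024 in the output above). A small guard can catch a mismatch early, for example after switching to a different embedding model. The hypothetical `check_dimensions` helper below is a sketch of such a guard and is not part of semantic-router.

```python
def check_dimensions(vectors: list[list[float]], expected_dim: int) -> bool:
    """Raise ValueError if any vector's length differs from expected_dim."""
    for i, vec in enumerate(vectors):
        if len(vec) != expected_dim:
            raise ValueError(
                f"vector {i} has {len(vec)} dimensions, expected {expected_dim}"
            )
    return True


# Hypothetical stand-ins: in the notebook you might pass the encoder output
# (e.g. `result`) and `rl.index.dimensions` instead.
fake_embeddings = [[0.0] * 1024 for _ in range(4)]
print(check_dimensions(fake_embeddings, 1024))  # True
```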

In [7]:
rl("don't you love politics?")

2025-04-09 15:30:37 - httpx - INFO - _client.py:1013 - _send_single_request() - HTTP Request: POST https://integrate.api.nvidia.com/v1/embeddings "HTTP/1.1 200 OK"

RouteChoice(name='politics', function_call=None, similarity_score=None)

In [8]:
rl("how's the weather today?")

2025-04-09 15:30:39 - httpx - INFO - _client.py:1013 - _send_single_request() - HTTP Request: POST https://integrate.api.nvidia.com/v1/embeddings "HTTP/1.1 200 OK"

RouteChoice(name='chitchat', function_call=None, similarity_score=None)

Both are classified accurately. What happens if we send a query that is unrelated to our existing `Route` objects?

In [9]:
rl("I'm interested in learning about llama 2")

2025-04-09 15:30:41 - httpx - INFO - _client.py:1013 - _send_single_request() - HTTP Request: POST https://integrate.api.nvidia.com/v1/embeddings "HTTP/1.1 200 OK"

RouteChoice(name='politics', function_call=None, similarity_score=None)

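Here the query is routed to `politics`, even though it is unrelated to both of our routes. Ideally the router would return `None` (no matching route) for a query like this. The misclassification suggests that the `score_threshold` of `0.4` is too permissive for this model. We recommend optimizing your route thresholds for better performance. You can see how in [this notebook](https://github.com/aurelio-labs/semantic-router/blob/main/docs/06-threshold-optimization.ipynb).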
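Here is a hypothetical, library-independent sketch of threshold tuning, which is not necessarily how semantic-router does it. Each labelled query has its top-matching route and similarity score. A grid search picks the threshold that maximizes accuracy, where a query scoring below the threshold is routed to `None`. The sample scores are invented for illustration and are not outputs from the NIM model.

```python
# Hypothetical labelled data: (top-matching route, its similarity score,
# route the query *should* get). None marks queries that belong to no route.
# Scores are illustrative only.
samples = [
    ("politics", 0.82, "politics"),
    ("chitchat", 0.77, "chitchat"),
    ("politics", 0.46, None),  # an off-topic query, like the llama 2 one
    ("chitchat", 0.44, None),
    ("politics", 0.61, "politics"),
]


def accuracy(threshold: float, data) -> float:
    """Fraction of samples routed correctly at the given threshold."""
    correct = 0
    for top_route, score, expected in data:
        predicted = top_route if score >= threshold else None
        correct += predicted == expected
    return correct / len(data)


def best_threshold(data, candidates: list[float]) -> float:
    # Simple grid search; max() keeps the first (lowest) candidate among ties.
    return max(candidates, key=lambda t: accuracy(t, data))


candidates = [round(0.30 + 0.05 * i, 2) for i in range(13)]  # 0.30 .. 0.90
t = best_threshold(samples, candidates)
print(t, accuracy(t, samples))  # 0.5 1.0
```

With these toy numbers, a threshold of `0.5` sends both off-topic queries to `None` while keeping the on-topic ones. The original `0.4` only scores `0.6`. Resolving ties toward the lowest candidate favours matching over rejecting. Picking the middle of the best range is another reasonable choice.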

---