# NVIDIA NIM embeddings

Connect to NVIDIA's NIM embedding service using the `NVIDIAEmbedding` class.

If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.

In [None]:
%pip install llama-index-embeddings-nvidia

In [None]:
!pip install llama-index

## Start an NVIDIA NIM embedding microservice

In [None]:
# download the model you want to deploy from NVIDIA NGC
$ ngc registry model download-version "ohlfw0olaadg/ea-participants/nv-embed-qa:4"

In [None]:
# pull and tag the NIM embeddings container
$ docker pull nvcr.io/ohlfw0olaadg/ea-participants/embedding-ms:cb2bba4eaf58b622acd3ac856dbf03d284cd770d
$ docker tag nvcr.io/ohlfw0olaadg/ea-participants/embedding-ms:cb2bba4eaf58b622acd3ac856dbf03d284cd770d embedding-ms:0.0.1

In [None]:
# run the container (the host side of -v must be an absolute path)
$ docker run -it -p 12345:12345 --gpus='"device=0"' -v /path/to/model.nemo:/path/to/model.nemo --name embedding-ms embedding-ms:0.0.1 /bin/bash

In [None]:
# update the model config at the following path with the path to your model:
# /app/model_config_templates/NV-Embed-QA_template.yaml
# then build the Triton model store
$ model_repo_generator /app/model_config_templates/NV-Embed-QA_template.yaml

In [None]:
# start the Triton server and the API server
$ /app/bin/web -m /model-store -p 12345 -n 1
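
Before connecting a client, it can help to confirm that something is listening on the published port. The following is a minimal sketch using only the Python standard library. The helper name `wait_for_port` and its defaults are hypothetical and not part of the NIM tooling or LlamaIndex. A successful TCP connection only shows that the port is open; it does not prove that the model has finished loading.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 30.0, interval_s: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper for illustration; not part of any NVIDIA or LlamaIndex API.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # create_connection raises OSError (e.g. ConnectionRefusedError) if nothing listens
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)
```

With the container started as above, `wait_for_port("localhost", 12345)` would be the corresponding call, since 12345 is the port published by `docker run`.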

## Connect to the embedding microservice with LlamaIndex

In [None]:
# imports
from llama_index.embeddings.nvidia import NVIDIAEmbedding

In [None]:
# set parameters
batch_size = 16
model_name = "NV-Embed-QA"
api_endpoint_url = "http://localhost:12345/v1/embeddings"

In [None]:
embedding_model = NVIDIAEmbedding(
    batch_size=batch_size,
    model_name=model_name,
    api_endpoint_url=api_endpoint_url,
)
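
To make the `batch_size` parameter concrete, here is a rough sketch of what batching generally means: splitting a list of texts into groups of at most `batch_size` items, so that each group can be sent in one request. This assumes the parameter caps the number of texts per request. The `chunked` helper is hypothetical and is not the library's actual implementation.

```python
from typing import Iterator, List, Sequence


def chunked(texts: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `texts`, each with at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(texts), batch_size):
        yield list(texts[start:start + batch_size])


# 40 placeholder passages split with the batch size used in this notebook
passages = [f"passage {i}" for i in range(40)]
batch_sizes = [len(batch) for batch in chunked(passages, 16)]
print(batch_sizes)  # [16, 16, 8]
```

Under that assumption, a larger batch size means fewer round trips to the service, at the cost of larger individual requests.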

In [None]:
# get embedding for a query
embedding_model.get_query_embedding("Hello world")

In [None]:
# get embeddings for multiple passages in batches
embedding_model.get_text_embedding_batch(["Hello", "World"])
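
A common next step is to rank passages against a query by cosine similarity. The sketch below assumes that each embedding is a plain list of floats and uses toy 3-dimensional vectors in place of real model output, so it runs without the microservice. The helper names `cosine_similarity` and `rank_passages` are illustrative and not part of LlamaIndex.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_passages(query_vec: Sequence[float], passage_vecs: Sequence[Sequence[float]]) -> List[int]:
    """Return passage indices ordered from most to least similar to the query."""
    scores = [cosine_similarity(query_vec, p) for p in passage_vecs]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Toy vectors standing in for real embeddings (assumption: real ones are lists of floats)
query_vec = [1.0, 0.0, 0.0]
passage_vecs = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0], [-1.0, 0.0, 0.0]]
print(rank_passages(query_vec, passage_vecs))  # [1, 0, 2]
```

In practice, `query_vec` would come from `get_query_embedding` and `passage_vecs` from `get_text_embedding_batch`.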