<a href="https://colab.research.google.com/github/StrategicalIT/PipedPiperAI/blob/main/Lab04.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Lab 4: Using an embedding model from the NIM API
In this lab we are going to leverage Nvidia's NIM API to generate embeddings. There are multiple embedding models available. You can check them out at [https://build.nvidia.com/explore/retrieval](https://build.nvidia.com/explore/retrieval).

You can interact with the models online from your web browser, but we will do so programmatically using Python. You can also use the API reference site to get more information about each embedding model as well as the actual endpoints exposed by the REST API. It can be accessed at [https://docs.api.nvidia.com/nim/reference/retrieval-apis](https://docs.api.nvidia.com/nim/reference/retrieval-apis). You will find the models by scrolling down to the "Retrieval" section.

### Install dependencies

The first step is to install the necessary libraries. In this case we will install the openai Python library. Its API has become a de facto industry standard, and many providers, including Nvidia NIM, expose endpoints that are compatible with it.

In [None]:
!pip install openai

We are also going to need NumPy to help us compute the cosine similarity a bit later.

In [None]:
!pip install numpy

Let's import a few things from openai and numpy

In [None]:
from openai import OpenAI
import numpy as np

## Connect to the model

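Next we read the NIM API key from a Google Colab secret named "apikey" and store it in a variable of the same name for future use. The commented-out lines show the alternative of reading it from an environment variable. You can uncomment the "print" command if you want to validate that the key has been read correctly.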

In [None]:
# Alternative outside Colab: read the key from an environment variable
#import os
#apikey = os.environ["NVIDIA_API_KEY"]
# In Colab, read the key from a notebook secret named "apikey"
from google.colab import userdata
apikey = userdata.get('apikey')
#print(apikey)
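
If you also want to run this notebook outside Colab, one option is to fall back to an environment variable when the Colab module is not available. The sketch below is an assumption rather than part of the lab: `load_api_key` is a hypothetical helper, and it simply reuses the secret name `apikey` and the variable name `NVIDIA_API_KEY` from the cell above.

```python
import os

def load_api_key(secret_name="apikey", env_name="NVIDIA_API_KEY"):
    """Return the key from a Colab secret, or from the environment otherwise."""
    try:
        # google.colab can only be imported inside a Colab runtime
        from google.colab import userdata
    except ImportError:
        # Not running in Colab: read the environment variable (None if unset)
        return os.environ.get(env_name)
    return userdata.get(secret_name)
```

With this helper, `apikey = load_api_key()` would work in both environments, provided the secret or the variable has been set.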

Let's create a client instance. This client will be able to access all models, so there is no need for a separate client connection for each model. Notice how we are specifying the API key. Make sure you use your own API key.

In [None]:
client = OpenAI(
  base_url = "https://integrate.api.nvidia.com/v1",
  api_key = apikey
)

We can now use the client connection to compute the embedding for any input string

In [None]:
response = client.embeddings.create(
    input=["I like good weather"],
    model="nvidia/nv-embedqa-e5-v5",
    encoding_format="float",
    extra_body={"input_type": "query", "truncate": "NONE"}
)

Notice how we are using the "model" parameter to request embeddings from a specific model. Model "nv-embedqa-e5-v5" is part of Nvidia's NeMo Retriever.

Let's find out how many dimensions this model uses and examine the embedding of our input string. For brevity we will show only the first 6 dimensions.

In [None]:
print("The vector size of this embedding model is :", len(response.data[0].embedding))
print(response.data[0].embedding[:6])

## Calculate similarity between vectors

We can use a similarity measure to compare a query sentence to the embeddings in our corpus. Here we will use cosine similarity, which is the dot product of two vectors divided by the product of their lengths.

We will see in a later lesson how similarity searches are typically performed by the vector database, but for now we can define our own cosine similarity using NumPy functions.

In [None]:
def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
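
To get a feel for the range of values before using real embeddings, the function can be tried on a few hand-picked vectors. This is a minimal sketch: the 2-dimensional vectors are made up for illustration, and `cosine` is repeated only so that the block runs on its own.

```python
import numpy as np

def cosine(u, v):
    # Dot product divided by the product of the two vector lengths
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

same_direction = cosine([1.0, 2.0], [2.0, 4.0])   # parallel vectors
orthogonal = cosine([1.0, 0.0], [0.0, 1.0])       # perpendicular vectors
opposite = cosine([1.0, 2.0], [-1.0, -2.0])       # opposite directions

print(same_direction, orthogonal, opposite)
```

Up to floating-point rounding, the three values are 1, 0 and -1. Note that the length of the vectors does not matter, only their direction.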

We can now use a second sentence to obtain a vector

In [None]:
query = client.embeddings.create(
    input=["Today is going to be sunny"],
    model="nvidia/nv-embedqa-e5-v5",
    encoding_format="float",
    extra_body={"input_type": "query", "truncate": "NONE"}
)

Let's calculate the similarity between both embeddings

In [None]:
sim = cosine(query.data[0].embedding, response.data[0].embedding)
print("The similarity is : ", sim)

The closer the similarity score is to 1, the closer the query is semantically to that sentence. Does the result make sense?
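
To compare one query against several sentences at once, the same formula can be applied to every row of a matrix of embeddings. The sketch below uses made-up 3-dimensional vectors in place of real embeddings, and `rank_by_cosine` is a hypothetical helper, not a function from NumPy or the NIM API.

```python
import numpy as np

def rank_by_cosine(query_vec, corpus_vecs):
    """Return (indices sorted by decreasing similarity, similarity scores)."""
    q = np.asarray(query_vec, dtype=float)
    m = np.asarray(corpus_vecs, dtype=float)
    # Cosine similarity of the query against every row of the corpus matrix
    scores = (m @ q) / (np.linalg.norm(m, axis=1) * np.linalg.norm(q))
    order = np.argsort(-scores)
    return order, scores

# Toy vectors for illustration only; real embeddings have many more dimensions
corpus = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.1]]
query_vec = [1.0, 0.0, 0.0]

order, scores = rank_by_cosine(query_vec, corpus)
for i in order:
    print(i, round(float(scores[i]), 3))
```

The first index printed is the corpus entry that points in the direction closest to the query. A vector database performs essentially this kind of ranking, usually with indexing techniques that avoid scoring every row.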

You can also experiment with other embedding models, as shown in the next section.

## Try with a different embedding model
An important part of a data scientist's role is to find the components of the solution that produce the best results for a given use case. For example, in order to obtain more accurate results from a similarity search, they might evaluate multiple embedding models. Let's run the same two sentences through a different embedding model and compare the results.

In [None]:
response = client.embeddings.create(
    input=["I like good weather"],
    model="nvidia/nv-embed-v1",
    encoding_format="float",
    extra_body={"input_type": "query", "truncate": "NONE"}
)

query = client.embeddings.create(
    input=["Today is going to be sunny"],
    model="nvidia/nv-embed-v1",
    encoding_format="float",
    extra_body={"input_type": "query", "truncate": "NONE"}
)

sim = cosine(query.data[0].embedding, response.data[0].embedding)
print("The similarity is : ", sim)
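
When evaluating more than two models, repeating these cells quickly becomes tedious. One possible approach is to wrap the request and the similarity computation in small helpers. The names `embed` and `compare_models` are hypothetical, and the sketch assumes every model accepts the same request parameters as the cells above, which may not hold for all models in the catalog.

```python
import numpy as np

def embed(client, text, model):
    """Request one embedding, using the same parameters as the cells above."""
    result = client.embeddings.create(
        input=[text],
        model=model,
        encoding_format="float",
        extra_body={"input_type": "query", "truncate": "NONE"},
    )
    return np.array(result.data[0].embedding)

def compare_models(client, models, sentence_a, sentence_b):
    """Return {model name: cosine similarity of the two sentences}."""
    scores = {}
    for model in models:
        a = embed(client, sentence_a, model)
        b = embed(client, sentence_b, model)
        scores[model] = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    return scores
```

In the notebook this could be called as `compare_models(client, ["nvidia/nv-embedqa-e5-v5", "nvidia/nv-embed-v1"], "I like good weather", "Today is going to be sunny")`. Each call sends two requests per model. Keep in mind that scores from different models are not necessarily on the same scale, so testing several sentence pairs is likely to be more informative than comparing two single numbers.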

In your opinion, which model produced better results?

Feel free to do further testing with other sentences.

### End of Lab 4