# Using Embedding Models Hosted on the NVIDIA API Catalog

This guide shows how to use embedding models hosted on the NVIDIA API Catalog for the retrieval step of retrieval-augmented generation (RAG). It starts from the [ABC Bot configuration](../../../../examples/bots/abc) and modifies it to use the NVIDIA `nvidia/nv-embed-v1` model to compute embeddings.

## Prerequisites

1. Install the [langchain-nvidia-ai-endpoints](https://github.com/langchain-ai/langchain-nvidia/tree/main/libs/ai-endpoints) package:

In [1]:
!pip install -U --quiet langchain-nvidia-ai-endpoints


2. An NVIDIA NGC account to access AI Foundation Models. To create a free account, go to the [NVIDIA NGC website](https://ngc.nvidia.com/).

3. An API key from the NVIDIA API Catalog:
    -  Generate an API key by navigating to the AI Foundation Models section on the NVIDIA NGC website, selecting a model with an API endpoint, and generating a key for it.
    -  Export the NVIDIA API key as an environment variable:

In [2]:
!export NVIDIA_API_KEY=$NVIDIA_API_KEY # Replace with your own key
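
Note that in IPython and Jupyter a `!` command runs in a temporary subshell, so a variable exported that way is not visible to the Python kernel afterwards. A minimal sketch of setting the key from Python instead is shown below. The helper name `ensure_nvidia_api_key` and its parameters are hypothetical, and the `nvapi-` prefix check is an assumption about the key format issued by the API Catalog.

```python
import getpass
import os


def ensure_nvidia_api_key(env=os.environ, ask=getpass.getpass):
    """Return the NVIDIA API key, prompting for it only if it is not set.

    Hypothetical helper: `env` and `ask` are parameters so that the function
    can be exercised without touching the real environment or the terminal.
    """
    key = env.get("NVIDIA_API_KEY", "").strip()
    if not key:
        key = ask("Enter your NVIDIA API key: ").strip()
    # Assumption: API Catalog keys start with "nvapi-".
    if not key.startswith("nvapi-"):
        raise ValueError("NVIDIA_API_KEY does not look like an API Catalog key")
    env["NVIDIA_API_KEY"] = key  # visible to code running in this process
    return key
```

Calling `ensure_nvidia_api_key()` in a cell before loading the configuration also keeps the key out of the notebook source.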

4. If you're running this inside a notebook, patch the asyncio event loop:

In [3]:
import nest_asyncio

nest_asyncio.apply()
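
The patch is needed because a notebook already runs an asyncio event loop, and starting a second one from inside it fails. The following sketch reproduces that situation with the standard library only. The function names are illustrative, and the sketch does not call `nest_asyncio` itself.

```python
import asyncio


async def answer():
    return 42


async def call_from_running_loop():
    """Try to start a nested event loop, as synchronous code in a notebook would."""
    coro = answer()
    try:
        return asyncio.run(coro)  # fails: a loop is already running here
    except RuntimeError as err:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return str(err)


message = asyncio.run(call_from_running_loop())
print(message)
```

`nest_asyncio.apply()` patches asyncio so that nested calls of this kind are allowed.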

## Configuration

To get started, copy the ABC bot configuration into a subdirectory called `config`:

In [4]:
!cp -r ../../../../examples/bots/abc config

Update the `models` section of `config.yml` as follows. Here we change the model used for generation (the entry with `type: main`) to `meta/llama3-70b-instruct` and add an entry with `type: embeddings` that uses `nvidia/nv-embed-v1`:

```yaml
...
models:
  - type: main
    engine: nvidia_ai_endpoints
    model: meta/llama3-70b-instruct
  - type: embeddings
    engine: nvidia_ai_endpoints
    model: nvidia/nv-embed-v1
...
```

In [5]:
# Hide from documentation page.
with open("config/config.yml") as f:
  content = f.read()

content = content.replace("""
  - type: main
    engine: openai
    model: gpt-3.5-turbo-instruct""",
"""
  - type: main
    engine: nvidia_ai_endpoints
    model: meta/llama3-70b-instruct
  - type: embeddings
    engine: nvidia_ai_endpoints
    model: nvidia/nv-embed-v1
""")

with open("config/config.yml", "w") as f:
  f.write(content)
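
One caveat with `str.replace` is that it returns the text unchanged when the target block is not found, for example if the ABC configuration in your checkout differs from the snippet above. A small sketch of a guarded variant follows. `replace_exactly_once` is a hypothetical helper, not part of NeMo Guardrails.

```python
def replace_exactly_once(text: str, old: str, new: str) -> str:
    """Replace `old` with `new`, failing loudly unless `old` occurs exactly once."""
    count = text.count(old)
    if count != 1:
        raise ValueError(f"expected exactly one match, found {count}")
    return text.replace(old, new)


# Example on an in-memory string instead of config/config.yml.
sample = "models:\n  - type: main\n    engine: openai\n"
patched = replace_exactly_once(sample, "engine: openai", "engine: nvidia_ai_endpoints")
print(patched)
```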

## Usage

Load the guardrails configuration:

In [6]:
from nemoguardrails import LLMRails, RailsConfig

config = RailsConfig.from_path("./config")
rails = LLMRails(config)

Test response generation:

In [9]:
response = rails.generate(
    messages=[{"role": "user", "content": "How many personal days off do I have?"}]
)
print(response["content"])

In addition to vacation and sick leave, employees also have two personal days per year. Please refer to the employee handbook for more information.
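
Because `generate` takes the whole message list, a follow-up question can be asked by appending the previous turns to it. The sketch below assumes, as in the call above, that the response is a dict with a `content` key. `ask` is a hypothetical helper, and `fake_generate` merely stands in for `rails.generate` so that the example runs without an API key.

```python
def ask(generate, history, question):
    """Send `question` together with the prior turns and record both sides."""
    history.append({"role": "user", "content": question})
    response = generate(messages=history)
    history.append({"role": "assistant", "content": response["content"]})
    return response["content"]


def fake_generate(messages):
    # Stand-in for rails.generate: reports how many messages it received.
    return {"role": "assistant", "content": f"received {len(messages)} message(s)"}


history = []
ask(fake_generate, history, "How many personal days off do I have?")
reply = ask(fake_generate, history, "Do they carry over to next year?")
print(reply)
```

With a real configuration, pass `rails.generate` as the first argument in place of `fake_generate`.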


In [10]:
print(rails.explain_info.llm_calls[3].prompt)

"""
Below is a conversation between a user and a bot called the ABC Bot.
The bot is designed to answer employee questions about the ABC Company.
The bot is knowledgeable about the employee handbook and company policies.
If the bot does not know the answer to a question, it truthfully says it does not know.

"""

# This is how a conversation between a user and the bot can go:
User message: "Hi there. Can you help me with some questions I have about the company?"
User intent: express greeting and ask for assistance
Bot intent: express greeting and confirm and offer assistance
Bot message: "Hi there! I'm here to help answer any questions you may have about the ABC Company. What would you like to know?"
User message: "What's the company policy on paid time off?"
User intent: ask question about benefits
Bot intent: respond to question about benefits
Bot message: "The ABC Company provides eligible employees with up to two weeks of paid vacation time per year, as well as five paid sick days p