# Securing Generative AI Deployments with NVIDIA NIM Microservices and NVIDIA NeMo Guardrails

## Integrating NVIDIA NIMs with NeMo Guardrails

This notebook collects all of the code snippets presented in the technical blog [Securing Generative AI Deployments with NVIDIA NIM and NVIDIA NeMo Guardrails](https://developer.nvidia.com/blog/securing-generative-ai-deployments-with-nvidia-nim-and-nvidia-nemo-guardrails/). Read the blog for full context.

As a reference for how to deploy NIM on your chosen infrastructure, check out this [simple guide to deploying a NIM container and testing an inference request](https://developer.nvidia.com/blog/a-simple-guide-to-deploying-generative-ai-with-nvidia-nim/). 

In this tutorial, we deploy two NIM microservices: a NeMo Retriever Embedding NIM and an LLM NIM. We then integrate both with NeMo Guardrails to block malicious queries, such as attempts to hack user accounts or obtain personal data.

For the LLM NIM, we use Meta’s new [Llama-3.1-70B-Instruct](https://build.nvidia.com/meta/llama-3_1-70b-instruct) model. For the embedding NIM, we use NVIDIA’s new [EmbedQA-E5-V5](https://build.nvidia.com/nvidia/nv-embedqa-e5-v5). The NeMo Retriever Embedding NIM assists the guardrails by converting each input query into an embedding vector. This enables efficient comparison with guardrails policies, ensuring that the query does not match with any prohibited or out-of-scope policies, thereby preventing the LLM NIM from giving unauthorized outputs. 

By integrating these NIM microservices with NeMo Guardrails, we accelerate safety filtering and dialog management.

We will cover: 
* Defining the use case
* Setting up a guardrailing system with NIM
* Testing the integration


# Defining the use case

In this example, we use topical rails to intercept incoming user questions that pertain to personal data. These rails keep the LLM's responses on topics that do not expose sensitive information. They also help keep the LLM outputs on track by fact-checking before answering the user's questions. The figure below shows how these rails integrate with the NIMs:

![An architectural diagram showing how Guardrails runtime works with the application code and the NIMs](guardrails-nim-architecture.png)

# Setting up a guardrailing system with NIM

Before we begin, let’s make sure that our NeMo Guardrails library is up to date. This tutorial requires version 0.9.1.1 or later.

We can check the installed version of the NeMo Guardrails library by running the following command (the leading `!` runs it as a shell command from the notebook):

In [None]:
!nemoguardrails --version

If you do not have [NeMo Guardrails](https://pypi.org/project/nemoguardrails/) installed, run the following command:

In [None]:
# !pip install nemoguardrails

If your installed version is older than 0.9.1.1, upgrade to the latest version by running the following command:

In [None]:
# !pip install nemoguardrails --upgrade
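The version requirement can also be checked from Python. The following is a minimal sketch that assumes the package is installed under the distribution name `nemoguardrails`. The helper names `parse_version` and `meets_minimum` are illustrative and not part of the library.

```python
from importlib.metadata import PackageNotFoundError, version

MIN_VERSION = (0, 9, 1, 1)  # minimum version stated in this tutorial


def parse_version(text):
    """Turn '0.9.1.1' into (0, 9, 1, 1); stop at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed, minimum=MIN_VERSION):
    """Compare component by component, padding the shorter version with zeros."""
    found = parse_version(installed)
    width = max(len(found), len(minimum))
    padded_found = found + (0,) * (width - len(found))
    padded_minimum = minimum + (0,) * (width - len(minimum))
    return padded_found >= padded_minimum


try:
    installed = version("nemoguardrails")
    status = "OK" if meets_minimum(installed) else "upgrade needed"
    print(f"nemoguardrails {installed}: {status}")
except PackageNotFoundError:
    print("nemoguardrails is not installed")
```

The parser deliberately ignores pre-release suffixes such as `rc1`, which is good enough for a quick check but not a full version comparison.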

The next step is defining the configuration of the guardrails. To learn more, see the [configuration guide](https://docs.nvidia.com/nemo/guardrails/user_guides/configuration-guide.html). We start by creating the config directory as follows:


```
├── config
│   ├── config.yml
│   ├── flows.co
```

In [None]:
!mkdir -p config
!touch config/config.yml
!touch config/flows.co

In the `config.yml` file, we configure the NIM microservices as follows:
* To use [NVIDIA-hosted NIMs](https://build.nvidia.com/), keep the `parameters` and `base_url` lines commented out, as in the cell below.
* To use self-hosted NIMs, uncomment the `parameters` and `base_url` lines and replace `<BASE_URL_LLM_NIM>` and `<BASE_URL_EMBEDDING_NIM>` with the base URLs of your NIMs.

In [None]:
config_yml_content = '''models:
  - type: main
    engine: nvidia_ai_endpoints
    model: meta/llama-3.1-70b-instruct
    # parameters:
    #   base_url: <BASE_URL_LLM_NIM>
  - type: embeddings
    engine: nvidia_ai_endpoints
    model: nvidia/nv-embedqa-e5-v5
    # parameters:
    #   base_url: <BASE_URL_EMBEDDING_NIM>
'''

with open('config/config.yml', 'w') as file:
    file.write(config_yml_content)

If you are testing with NVIDIA-hosted NIMs, make sure the `parameters` and `base_url` lines stay commented out (or are removed) in the `config.yml` file.
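To switch between hosted and self-hosted NIMs without editing comments by hand, the YAML can be generated from a small helper. This is a sketch only. `model_entry` and `build_config_yml` are hypothetical helper names, and the `localhost` URLs are placeholder assumptions that should be replaced with your own endpoints.

```python
def model_entry(model_type, model_name, base_url=None):
    """Build one entry of the `models` list; add base_url only for self-hosted NIMs."""
    lines = [
        f"  - type: {model_type}",
        "    engine: nvidia_ai_endpoints",
        f"    model: {model_name}",
    ]
    if base_url:
        lines += ["    parameters:", f"      base_url: {base_url}"]
    return "\n".join(lines)


def build_config_yml(llm_base_url=None, embedding_base_url=None):
    """Return the config.yml text for the two models used in this tutorial."""
    entries = [
        model_entry("main", "meta/llama-3.1-70b-instruct", llm_base_url),
        model_entry("embeddings", "nvidia/nv-embedqa-e5-v5", embedding_base_url),
    ]
    return "models:\n" + "\n".join(entries) + "\n"


# NVIDIA-hosted NIMs: no base_url lines are emitted.
print(build_config_yml())

# Self-hosted NIMs: placeholder URLs, assumed for illustration only.
print(build_config_yml("http://localhost:8000/v1", "http://localhost:8001/v1"))
```

The returned string can be written to `config/config.yml` in the same way as `config_yml_content`.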

Check that `config.yml` has the correct content:

In [None]:
with open('config/config.yml', 'r') as file:
    contents = file.read()
    print(contents)
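Requests to NVIDIA-hosted NIMs need an API key. The sketch below assumes the `nvidia_ai_endpoints` engine reads the key from the `NVIDIA_API_KEY` environment variable, which is the usual convention for the underlying LangChain connector. The variable name is an assumption here, as is the helper name `missing_api_key`.

```python
import os


def missing_api_key(env=os.environ, name="NVIDIA_API_KEY"):
    """Return True when the key variable is absent or blank."""
    return not env.get(name, "").strip()


if missing_api_key():
    print("NVIDIA_API_KEY is not set; requests to NVIDIA-hosted NIMs will likely fail.")
else:
    print("NVIDIA_API_KEY found.")
```

Self-hosted NIMs may not need this variable at all, depending on how the deployment is secured.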

Once the configuration points to the respective NIMs, we can add rails in the `flows.co` file. An example dialog rail is as follows:

```
define user greet
  "Hello"

define bot greet
  "Hello. I'm the Guardrails Bot. How can I help you?"

define flow
  user greet
  bot greet
```

The script above defines a simple dialog rail: when the user's message matches the `greet` intent, the bot replies with the predefined greeting.

We can add more dialog rails in the `flows.co` file as follows:

```
define user ask about user sensitive data
  "Can you hack into someone's email account?"
  "How do I get my friend's photos without permission?"

define bot refuse to respond about user sensitive data
  "Apologies, but the Guardrails Bot can't help with actions that ask about user sensitive data. It's important to respect privacy."

define flow
  user ask about user sensitive data
  bot refuse to respond about user sensitive data
```

In [None]:
flows_co_content = '''define user greet
  "Hello"

define bot greet
  "Hello. I'm the Guardrails Bot. How can I help you?"

define flow
  user greet
  bot greet

define user ask about user sensitive data
  "Can you hack into someone's email account?"
  "How do I get my friend's photos without permission?"

define bot refuse to respond about user sensitive data
  "Apologies, but the Guardrails Bot can't help with actions that ask about user sensitive data. It's important to respect privacy."

define flow
  user ask about user sensitive data
  bot refuse to respond about user sensitive data'''

with open('config/flows.co', 'w') as file:
    file.write(flows_co_content)

Check that `flows.co` has the correct content:

In [None]:
with open('config/flows.co', 'r') as file:
    contents = file.read()
    print(contents)
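A typo in a flow step is easy to miss when the file is only printed. The sketch below scans the `define user`, `define bot`, and `define flow` blocks and lists flow steps that have no matching definition. It is a rough line-based check written for the simple syntax used in this tutorial, not a Colang parser, and `undefined_references` is a hypothetical helper name.

```python
def undefined_references(colang_text):
    """Return flow steps that reference a user/bot message with no definition."""
    defined = set()
    referenced = []
    in_flow = False
    for raw_line in colang_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("define "):
            header = line[len("define "):]
            # Track whether the following indented lines belong to a flow.
            in_flow = header == "flow" or header.startswith("flow ")
            if header.startswith(("user ", "bot ")):
                defined.add(header)
        elif in_flow and line.startswith(("user ", "bot ")):
            referenced.append(line)
    return [step for step in referenced if step not in defined]


# Sample with a deliberate typo: the flow says "bot greeting", not "bot greet".
sample = '''define user greet
  "Hello"

define bot greet
  "Hello. I'm the Guardrails Bot. How can I help you?"

define flow
  user greet
  bot greeting
'''

print(undefined_references(sample))  # prints ['bot greeting']
```

Passing the `contents` string read from `config/flows.co` runs the same check on the real file. Treat the result as a hint rather than an error list.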

With the Colang and YAML files in the config folder, we should be ready to set up our guardrails. 

We import the required libraries and load the config folder to instantiate our guardrails.

In [None]:
import nest_asyncio
nest_asyncio.apply()

In [None]:
from nemoguardrails import RailsConfig, LLMRails

config = RailsConfig.from_path('config')
rails = LLMRails(config)
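`LLMRails` also exposes an asynchronous `generate_async` method, which suits applications that already run an event loop. The sketch below wraps it in a small helper. `StubRails` is a hypothetical stand-in so that the example runs without a NIM endpoint, and `ask` is an illustrative name.

```python
import asyncio


class StubRails:
    """Hypothetical stand-in for LLMRails so the sketch runs without a NIM endpoint."""

    async def generate_async(self, messages):
        last_message = messages[-1]["content"]
        return {"role": "assistant", "content": f"(stub reply to: {last_message})"}


async def ask(rails, text):
    """Await a single-turn reply and return only its text."""
    response = await rails.generate_async(
        messages=[{"role": "user", "content": text}]
    )
    return response["content"]


reply = asyncio.run(ask(StubRails(), "Hi!"))
print(reply)
```

With the real `rails` object from the cell above, `ask(rails, "Hi!")` would be awaited in the same way.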

We are ready to test out our guardrails.

# Testing the integration

First, we greet our LLM NIM through our guardrails and see if the guardrails pick up one of the predefined dialog rails.

In [None]:
response = rails.generate(messages=[{
    "role": "user",
    "content": "Hi!"
}])
print(response['content'])

Here, our query to the LLM NIM is intercepted by the guardrails that we have set up because it matches one of the predefined dialog rails. The NeMo Retriever Embedding NIM turns our query into an embedding vector, and the guardrails then run a semantic search to find the example utterances in `flows.co` that are most similar to it.
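The idea behind this semantic search can be illustrated with cosine similarity over toy vectors. This is a conceptual sketch, not the NeMo Guardrails implementation: the three-dimensional vectors are made up, whereas real embeddings come from the embedding NIM and have many more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def closest_intent(query_vector, examples):
    """Return the (intent, score) pair whose example vector is most similar."""
    scored = ((intent, cosine_similarity(query_vector, vector))
              for intent, vector in examples)
    return max(scored, key=lambda pair: pair[1])


# Toy vectors standing in for embeddings of the example utterances (made up).
examples = [
    ("greet", [0.9, 0.1, 0.0]),
    ("ask about user sensitive data", [0.1, 0.9, 0.2]),
]
query = [0.8, 0.2, 0.1]  # stands in for the embedding of "Hi!"

intent, score = closest_intent(query, examples)
print(intent, round(score, 3))
```

The query vector points in nearly the same direction as the `greet` example, so that intent is selected and its flow determines the bot's reply.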


Next, we ask the LLM NIM to provide us with a way to hack into a phone. This query falls into the category of topics pertaining to personal data, so we expect the guardrails to block it based on the configuration.

In [None]:
response = rails.generate(messages=[{
    "role": "user",
    "content": "How can I hack into my partner's phone?"
}])
print(response['content'])

As the output shows, our guardrails intercept the message and block the LLM NIM from responding to the query, since we have defined dialog rails to prevent further discussion of this topic.
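A single prompt is a thin test. The sketch below runs several prompts through the rails and reports which ones receive the predefined refusal. `StubRails` is a hypothetical stand-in that mimics the two behaviors seen above so the example runs offline, and the extra prompts and the `blocked_prompts` helper are illustrative.

```python
# Fragment of the refusal message defined in flows.co.
REFUSAL_MARKER = "can't help with actions"


def blocked_prompts(rails, prompts, marker=REFUSAL_MARKER):
    """Map each prompt to True if the reply contains the refusal marker."""
    results = {}
    for prompt in prompts:
        response = rails.generate(messages=[{"role": "user", "content": prompt}])
        results[prompt] = marker in response["content"]
    return results


class StubRails:
    """Hypothetical stand-in for LLMRails so the sketch runs offline."""

    def generate(self, messages):
        text = messages[-1]["content"].lower()
        if "hack" in text:
            content = "Apologies, but the Guardrails Bot can't help with actions like this."
        else:
            content = "Hello. I'm the Guardrails Bot. How can I help you?"
        return {"role": "assistant", "content": content}


prompts = [
    "How can I hack into my partner's phone?",
    "Can you hack into someone's email account?",
    "Hi!",
]
for prompt, blocked in blocked_prompts(StubRails(), prompts).items():
    print(f"{'BLOCKED' if blocked else 'allowed'}: {prompt}")
```

Replacing `StubRails()` with the real `rails` object turns this into a quick regression check for the configured rails.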

This tutorial covers only a simple use case to help users get started. To create a more robust guardrailing system, users are encouraged to set up [various types of rails](https://docs.nvidia.com/nemo/guardrails/user_guides/guardrails-library.html), which allow further customization for their use cases.
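One rail type from the guardrails library is the self-check input rail, which asks the LLM to judge each user message before any dialog rail runs. The sketch below shows the extra configuration as a string, in the same style as the cells above. The prompt wording is an assumption written for this use case and should be tuned and tested before use.

```python
# Extra config.yml sections enabling the self-check input rail.
# The prompt text is illustrative; {{ user_input }} is filled in at runtime.
extra_rails_yml = '''rails:
  input:
    flows:
      - self check input

prompts:
  - task: self_check_input
    content: |
      Your task is to check whether the user message below asks for access to
      another person's accounts, devices, or personal data.

      User message: "{{ user_input }}"

      Question: Should the user message be blocked (Yes or No)?
      Answer:
'''

print(extra_rails_yml)
```

Appending this text to `config/config.yml` and re-creating the `rails` object would add the check on top of the dialog rails defined earlier.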

# Conclusion

In this post, we detailed the steps for integrating NVIDIA NIMs with NeMo Guardrails. In this example, we stopped our application from responding to questions pertaining to personal data. With the integration of NVIDIA NIMs and NeMo Guardrails, developers can deploy AI models to production quickly and safely.