# Fortified RAG Pipeline: Enhancing AI with NVIDIA NIMs, NeMo Guardrails, LlamaGuard, and AlignScore

In this tutorial, we learn how to integrate LlamaGuard, a safeguard LLM, with [NeMo Guardrails](https://docs.nvidia.com/nemo/guardrails/index.html) into our custom [NVIDIA NIM](https://www.nvidia.com/en-us/ai/)-powered RAG pipeline to produce more secure, moderated responses to user queries. We also learn how to integrate the AlignScore metric to check the factual consistency of the LLM response against the chunks retrieved from the vector database.

The notebook gives a simple walkthrough of downloading the dataset (as an example, we use the NVIDIA AI Enterprise User Guide) and loading it into our RAG pipeline. We then move step by step through building our rails with NeMo Guardrails.


## AI Workflow

The architecture diagram for the NIM-powered RAG pipeline that we build in this tutorial using NVIDIA NIM and NeMo Guardrails can be found in the `notebooks/llamaguard_alignscore_nim` directory.

## Prerequisites

If needed, you can install the environment from `requirements.txt`.

### Installation

#### Using pip

In [36]:
!pip install -q -r requirements.txt


#### AsyncIO
Since you’re running this inside a notebook, patch the AsyncIO loop.

In [27]:
import nest_asyncio
nest_asyncio.apply()
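
To see why the patch is needed: a notebook already runs an event loop, and plain `asyncio` refuses to start a second one inside it. The following standard-library sketch reproduces that situation outside a notebook. The `outer` coroutine is a hypothetical stand-in for a notebook cell, not part of the tutorial's pipeline; `nest_asyncio.apply()` is what lifts this restriction.

```python
import asyncio


async def outer() -> str:
    """Stand-in for code running inside an already-active event loop."""
    inner = asyncio.sleep(0)  # any coroutine would do here
    try:
        # Starting a second loop from inside a running one is rejected.
        asyncio.run(inner)
    except RuntimeError as exc:
        inner.close()  # avoid a "coroutine was never awaited" warning
        return f"RuntimeError: {exc}"
    return "no error"


message = asyncio.run(outer())
print(message)
```

Without the patch, synchronous helpers that internally start an event loop would likely hit the same error when called from a notebook cell.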

## Setting up the RAG pipeline
### Adding a new custom dataset

You can add your own PDF to the knowledge base located at `llamaguard_alignscore_nim/kb`, or to any other location of your choice. In this example, the PDF of the NVIDIA AI Enterprise User Guide is stored in that location. In the following cells, specify the file path as needed.

### Load the dataset

In [28]:
!pwd

/root/verb-workspace/examples/notebooks/llamaguard_alignscore_nim


In [29]:
from langchain_community.document_loaders import PyPDFLoader

# Add path to the data downloaded
file_path = "kb/data.pdf"

loader = PyPDFLoader(file_path)
document = loader.load()

### Setup the Retriever
#### Add the environment variables as follows

In [30]:
import os
os.environ["NVIDIA_API_KEY"]="..."
os.environ["OPENAI_API_KEY"]="dummy"

In [31]:
from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings, ChatNVIDIA
from langchain_community.vectorstores import FAISS
from langchain.chains import RetrievalQA
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.runnables import RunnablePassthrough
from langchain_core.prompts import ChatPromptTemplate

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=10
)
texts = text_splitter.split_documents(document)

# Insert the Documents in FAISS Vectorstore
embeddings = NVIDIAEmbeddings(
    model="nvidia/nv-embedqa-e5-v5",
    truncate="NONE",
)
db = FAISS.from_documents(texts, embeddings)

retriever = db.as_retriever()
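
To make the `chunk_size` and `chunk_overlap` parameters above more concrete, here is a minimal sketch of overlapping chunking. It is a simplified fixed-window stand-in, not the algorithm used by `RecursiveCharacterTextSplitter` (which splits on separators recursively); the `chunk_text` helper and the sample string are hypothetical.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size character windows that overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = chunk_text(sample, chunk_size=10, chunk_overlap=2)
print(chunks)  # ['abcdefghij', 'ijklmnopqr', 'qrstuvwxyz']
```

Each chunk repeats the last two characters of the previous one, so text that straddles a boundary still appears whole in at least one chunk when the overlap is large enough.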

### Setup the LLM

In [32]:
llm = ChatNVIDIA(
    model="meta/llama-3.1-70b-instruct",
    temperature=0.2,
    top_p=0.7,
    max_tokens=1024,
)

prompt_template = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Use three sentences maximum and keep the answer as concise as possible.

{context}

Question: {question}

Helpful Answer:"""

prompt = ChatPromptTemplate.from_template(prompt_template)
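
As a rough illustration of what the template does at query time, the sketch below fills the `{context}` and `{question}` placeholders with plain string formatting. It uses only the standard library rather than `ChatPromptTemplate`; the `build_prompt` helper and the chunk strings are hypothetical placeholders, not real retriever output.

```python
PROMPT_TEMPLATE = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Use three sentences maximum and keep the answer as concise as possible.

{context}

Question: {question}

Helpful Answer:"""


def build_prompt(chunks: list[str], question: str) -> str:
    """Join retrieved chunks into one context block and fill the template."""
    context = "\n\n".join(chunks)
    return PROMPT_TEMPLATE.format(context=context, question=question)


# Placeholder chunks standing in for what a retriever might return.
filled = build_prompt(
    ["First retrieved chunk.", "Second retrieved chunk."],
    "What is NVIDIA AI Enterprise?",
)
print(filled)
```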

## Define Guardrails Configuration

We first create a folder called `config` to hold the configuration.

In [33]:
%mkdir config

### 1. Create a `config.yml` file
Unless specified otherwise, the configuration defaults to Colang 1.0. We start by adding the LLM NIM and a sample conversation to `config.yml` as follows:

In [34]:
%%writefile config/config.yml
instructions:
      - type: general
        content: |
          Below is a conversation between a user and a bot called the NVIDIA AI Bot.
          The bot is designed to answer questions about the NVIDIA AI Enterprise.
          The bot is knowledgeable about the company policies.
          If the bot does not know the answer to a question, it truthfully says it does not know.

sample_conversation: |
    user action: user said "Hi there. Can you help me with some questions I have about NVIDIA AI Enterprise?"
    user intent: user express greeting and ask for assistance
    bot intent: bot express greeting and confirm and offer assistance
    bot action: bot say "Hi there! I'm here to help answer any questions you may have about NVIDIA AI Enterprise. What would you like to know?"
    user action: user said "What is offered by NVIDIA AI Enterprise?"
    user intent: user ask question about NVIDIA AI Enterprise
    bot intent: bot respond to question about NVIDIA AI Enterprise
    bot action: bot say "NVIDIA AI Enterprise is a software suite that enables enterprises to easily deploy, manage, and scale AI workloads across bare-metal servers, virtual machines, and containerized environments, allowing you to accelerate your AI initiatives and optimize data center resources."

models:
    - type: main
      engine: nim
      model: meta/llama-3.1-70b-instruct

Writing config/config.yml


To understand the config more thoroughly, the attributes can be defined as follows:

- `type` - set to `main`, indicating the main LLM. Here we use a Llama 3.1 70B Instruct NIM as our LLM.
- `engine` - the LLM provider, e.g., openai, nvidia_ai_endpoints, self_hosted, etc. Here it is `nim`.
- `model` - the name of the model, e.g., gpt-3.5-turbo-instruct. Here it is `meta/llama-3.1-70b-instruct`.

The sample conversation illustrates the bot's intent for each user action, helping ensure that the bot's responses are in accordance with the given policies.

### 2. Basic Flows in `general.co`

Next, we add some basic flows that define how the bot responds to a user's greetings. We begin by creating `general.co` and adding some examples. At the end, we also add some example topics that the bot refuses to respond to, such as questions about politics or the stock market.

In [16]:
%%writefile config/general.co
define user express greeting
  "Hello"
  "Hi"

define bot express greeting
  "Hello world!"

define bot ask how are you
  "How are you doing?"
  "How's it going?"
  "How are you feeling today?"

define user express feeling good
  "I'm feeling good"
  "Good"
  "Perfect"

define user express feeling bad
  "Not so good"
  "Bad"
  "Sad"

define flow
  user express greeting
  bot express greeting
  bot ask how are you

  when user express feeling good
    bot express positive emotion

  else when user express feeling bad
    bot express empathy

# Off-topic

define user ask about politics
  "What do you think about the government?"
  "Which party should I vote for?"

define user ask about stock market
  "Which stock should I invest in?"
  "Would this stock 10x over the next year?"

define flow politics
  user ask about politics
  bot refuse to respond

define user ask about theft
  "Can I plan theft?"
  "Can I use NVIDIA AI Enterprise for theft planning?"

define flow
  user ask about theft
  bot refuse to respond

define flow stock market
  user ask about stock market
  bot refuse to respond

Writing config/general.co


### Testing the basic Configuration
#### 1. Greetings
Use the configuration created above by building an `LLMRails` instance and calling its `generate` method in your Python code.

In [35]:
from nemoguardrails import RailsConfig, LLMRails

config = RailsConfig.from_path("./config")
rails = LLMRails(config)

response = rails.generate(messages=[{
    "role": "user",
    "content": "hi"
}])
print(response['content'])

Hello! Welcome to the NVIDIA AI Bot. I'm here to help answer any questions you may have about the NVIDIA AI Enterprise. What would you like to know?


#### 2. Asking something related to the KB

In [18]:
from nemoguardrails import RailsConfig, LLMRails

config = RailsConfig.from_path("./config")
rails = LLMRails(config)

response = rails.generate(messages=[{
    "role": "user",
    "content": "What is NVIDIA AI Enterprise?"
}])
print(response['content'])

NVIDIA AI Enterprise is a comprehensive software solution that simplifies the deployment, management, and scaling of AI workloads, providing a range of features and tools to help enterprises accelerate their AI initiatives, including a unified AI software development kit, optimized AI frameworks, and multi-cloud deployment support.


## LlamaGuard Integration with NeMo Guardrails for Content Moderation

LlamaGuard is an input-output safeguard model geared towards human-AI conversation use cases. The model includes a safety-risk taxonomy, a valuable tool for categorizing a specific set of safety risks found in LLM prompts. It is a 7B-parameter model based on Llama 2 that generates text indicating whether a given prompt or response is safe or unsafe; if it is unsafe under the policy, the output also lists the violated subcategories. Here is an example:

![LlamaGuard-7b as a Judge](https://huggingface.co/meta-llama/LlamaGuard-7b/resolve/main/Llama-Guard_example.png)
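
The verdict format described above (a first line reading `safe` or `unsafe`, optionally followed by a line listing the violated categories) is simple enough to parse by hand. The following is a minimal sketch under that assumption; NeMo Guardrails performs its own parsing internally, and the `SafetyVerdict` class and `parse_llama_guard_output` helper are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SafetyVerdict:
    allowed: bool
    violated_categories: list[str] = field(default_factory=list)


def parse_llama_guard_output(raw: str) -> SafetyVerdict:
    """Parse a 'safe' / 'unsafe' verdict with optional category line."""
    lines = [line.strip() for line in raw.strip().splitlines() if line.strip()]
    if not lines:
        # Treat an empty reply as unsafe so failures are not silently allowed.
        return SafetyVerdict(allowed=False)
    if lines[0].lower() == "safe":
        return SafetyVerdict(allowed=True)
    categories: list[str] = []
    if len(lines) > 1:
        # Second line is a comma-separated list such as "O3, O7".
        categories = [c.strip() for c in lines[1].split(",") if c.strip()]
    return SafetyVerdict(allowed=False, violated_categories=categories)


print(parse_llama_guard_output("safe"))
print(parse_llama_guard_output("unsafe\nO3"))
```

Anything other than an explicit `safe` is treated as unsafe here; that fail-closed choice is a design assumption of this sketch rather than documented behavior.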

[NeMo Guardrails](https://github.com/NVIDIA/NeMo-Guardrails/tree/main) provides out-of-the-box support for content moderation using Meta's Llama Guard model. We observe significantly improved input and output content moderation performance compared to the self-check method. We start by [self-hosting the LlamaGuard-7b model using vLLM](https://docs.nvidia.com/nemo/guardrails/user_guides/advanced/llama-guard-deployment.html).

### Update the Guardrails configuration with LlamaGuard Integration
#### 1. Re-write the `config.yml`

We add the endpoint for the LlamaGuard model in the `config.yml` file with `vllm_openai` as the LLM provider. If you are asked for an OpenAI API key, simply set the `OPENAI_API_KEY` environment variable to 'dummy', as was done when setting up the retriever.

In [19]:
%%writefile config/config.yml
instructions:
  - type: general
    content: |
      Below is a conversation between a user and a bot called the NVIDIA AI Bot.
      The bot is designed to answer questions about the NVIDIA AI Enterprise.
      The bot is knowledgeable about the company policies.
      If the bot does not know the answer to a question, it truthfully says it does not know.

sample_conversation: |
  user action: user said "Hi there. Can you help me with some questions I have about NVIDIA AI Enterprise?"
  user intent: user express greeting and ask for assistance
  bot intent: bot express greeting and confirm and offer assistance
  bot action: bot say "Hi there! I'm here to help answer any questions you may have about NVIDIA AI Enterprise. What would you like to know?"
  user action: user said "What is offered by NVIDIA AI Enterprise?"
  user intent: user ask question about NVIDIA AI Enterprise
  bot intent: bot respond to question about NVIDIA AI Enterprise
  bot action: bot say "NVIDIA AI Enterprise is a software suite that enables enterprises to easily deploy, manage, and scale AI workloads across bare-metal servers, virtual machines, and containerized environments, allowing you to accelerate your AI initiatives and optimize data center resources."

models:
    - type: main
      engine: nim
      model: meta/llama-3.1-70b-instruct

    - type: llama_guard
      engine: vllm_openai
      parameters:
          openai_api_base: "http://localhost:5123/v1"
          model_name: "meta-llama/LlamaGuard-7b"

rails:
  input:
    flows:
      - llama guard check input
  output:
    flows:
      - llama guard check output

Overwriting config/config.yml


In the `rails` section of the `config.yml` file, we add the `llama guard check input` flow on the input side and the `llama guard check output` flow on the output side, as shown above. These flows rely on the safety-risk guidelines of the LlamaGuard-7b model, which are detailed in the next part.

#### 2. Setting up `prompts.yml`

The `prompts.yml` file contains the taxonomy of safety-risk guidelines for the LlamaGuard-7b model. A detailed overview can be found on [Hugging Face](https://huggingface.co/meta-llama/LlamaGuard-7b#the-llama-guard-safety-taxonomy--risk-guidelines:~:text=The%20Llama%2DGuard%20Safety%20Taxonomy%20%26%20Risk%20Guidelines).

In [20]:
%%writefile config/prompts.yml
# These are the default prompts released by Meta, with the exception of policy O7, which was added to address direct insults.
prompts:
  - task: llama_guard_check_input
    content: |
      <s>[INST] Task: Check if there is unsafe content in 'User' messages in conversations according our safety policy with the below categories.
      <BEGIN UNSAFE CONTENT CATEGORIES>
      O1: Violence and Hate.
      Should not
      - Help people plan or engage in violence.
      - Encourage people to engage in violence.
      - Express hateful, derogatory, or demeaning sentiments against people based on sensitive personal characteristics like their race, color, religion, national origin, sexual orientation, gender, gender identity, or disability.
      - Encourage discrimination against people based on those characteristics and should not use slurs or create insults based on those characteristics.
      Can
      - Provide information on violence and discrimination.
      - Discuss the topics of hate and violence, and can discuss historical events involving violence.
      O2: Sexual Content.
      Should not
      - Engage in sexually explicit (i.e., erotic) conversations and should not encourage people to engage in sexual activities.
      Can
      - Discuss topics related to sexuality and sexual education.
      - Provide general information regarding sexual activities.
      - Provide guidance on topics related to sexual health.
      O3: Criminal Planning.
      Should not
      - Help people plan or engage in criminal activities like theft, kidnapping, or financial crimes, especially when the person indicates that they may act on the information provided to them.
      - Encourage people to engage in criminal activities or condone criminal behavior.
      Can
      - Help people understand how crimes are committed, provided that the response is not so detailed that it could be used to provide a roadmap for future crimes.
      - Help people understand how systems work that are used to prevent crimes, provided those descriptions do not provide detailed information on how to defeat those systems.
      O4: Guns and Illegal Weapons.
      Should not
      - Assist in the planning or execution of firearm or illegal weapons crimes, especially when the person indicates that they may act on the information provided to them.
      - Encourage people to engage in firearm or illegal weapons crimes or condone such behavior.
      Can
      - Discuss firearms and the arguments for and against firearm ownership.
      O5: Regulated or Controlled Substances.
      Should not
      - Encourage or assist people in the trafficking, consumption, or use of regulated or controlled substances, including but not limited to illegal drugs, tobacco, alcohol, and cannabis.
      - Assist or encourage people to create such substances when it is illegal to do so.
      Can
      - Discuss the use of regulated or controlled substances, and can provide general information on those substances (ex: what their chemical composition is and what their effects are).
      - Provide information on how those substances are manufactured, provided that the information is not so detailed as to provide step-by-step instructions for their creation.
      O6: Self-Harm.
      Should not
      - Encourage people to harm themselves.
      - Romanticize or condone self-harm or suicide.
      - Provide information regarding the methods of suicide or self-harm.
      - Provide any information or advice, other than information regarding appropriate health resources, to people who express or imply that they wish to harm themselves.
      Can
      - Provide general information regarding self-harm and suicide (ex: statistics, the mental health challenges faced by those who take these actions).
      Should
      - Provide information regarding appropriate health resources to people who imply or express the desire to harm themselves.
      O7: Offensive Language and Insults.
      Should not
      - Address direct insults
      - Use or encourage the use of swear words, bad language, or vulgar expressions.
      - Insult, offend, or demean individuals or groups based on any personal characteristics or choices.
      - Encourage or support the use of offensive or derogatory language.
      Can
      - Discuss the impact and societal perceptions of offensive language and insults.
      - Provide information on communication strategies that avoid the use of offensive language and promote respectful dialogue.
      - Offer guidance on how to address or respond to offensive language or insults in a constructive manner.
      <END UNSAFE CONTENT CATEGORIES>

      <BEGIN CONVERSATION>
      User: {{ user_input }}
      <END CONVERSATION>

      Provide your safety assessment for User in the above conversation:
      - First line must read 'safe' or 'unsafe'.
      - If unsafe, a second line must include a comma-separated list of violated categories. [/INST]


  # Currently, the unsafe content categories are the same for both llama_guard_check_input and
  # llama_guard_check_output prompts. Only the conversation placeholder is different.
  # Still, we keep them separate to keep the taxonomies for input and output independently modifiable.
  - task: llama_guard_check_output
    content: |
      <s>[INST] Task: Check if there is unsafe content in 'Agent' messages in conversations according our safety policy with the below categories.
      <BEGIN UNSAFE CONTENT CATEGORIES>
      O1: Violence and Hate.
      Should not
      - Help people plan or engage in violence.
      - Encourage people to engage in violence.
      - Express hateful, derogatory, or demeaning sentiments against people based on sensitive personal characteristics like their race, color, religion, national origin, sexual orientation, gender, gender identity, or disability.
      - Encourage discrimination against people based on those characteristics and should not use slurs or create insults based on those characteristics.
      Can
      - Provide information on violence and discrimination.
      - Discuss the topics of hate and violence, and can discuss historical events involving violence.
      O2: Sexual Content.
      Should not
      - Engage in sexually explicit (i.e., erotic) conversations and should not encourage people to engage in sexual activities.
      Can
      - Discuss topics related to sexuality and sexual education.
      - Provide general information regarding sexual activities.
      - Provide guidance on topics related to sexual health.
      O3: Criminal Planning.
      Should not
      - Help people plan or engage in criminal activities like theft, kidnapping, or financial crimes, especially when the person indicates that they may act on the information provided to them.
      - Encourage people to engage in criminal activities or condone criminal behavior.
      Can
      - Help people understand how crimes are committed, provided that the response is not so detailed that it could be used to provide a roadmap for future crimes.
      - Help people understand how systems work that are used to prevent crimes, provided those descriptions do not provide detailed information on how to defeat those systems.
      O4: Guns and Illegal Weapons.
      Should not
      - Assist in the planning or execution of firearm or illegal weapons crimes, especially when the person indicates that they may act on the information provided to them.
      - Encourage people to engage in firearm or illegal weapons crimes or condone such behavior.
      Can
      - Discuss firearms and the arguments for and against firearm ownership.
      O5: Regulated or Controlled Substances.
      Should not
      - Encourage or assist people in the trafficking, consumption, or use of regulated or controlled substances, including but not limited to illegal drugs, tobacco, alcohol, and cannabis.
      - Assist or encourage people to create such substances when it is illegal to do so.
      Can
      - Discuss the use of regulated or controlled substances, and can provide general information on those substances (ex: what their chemical composition is and what their effects are).
      - Provide information on how those substances are manufactured, provided that the information is not so detailed as to provide step-by-step instructions for their creation.
      O6: Self-Harm.
      Should not
      - Encourage people to harm themselves.
      - Romanticize or condone self-harm or suicide.
      - Provide information regarding the methods of suicide or self-harm.
      - Provide any information or advice, other than information regarding appropriate health resources, to people who express or imply that they wish to harm themselves.
      Can
      - Provide general information regarding self-harm and suicide (ex: statistics, the mental health challenges faced by those who take these actions).
      Should
      - Provide information regarding appropriate health resources to people who imply or express the desire to harm themselves.
      O7: Offensive Language and Insults.
      Should not
      - Address direct insults
      - Use or encourage the use of swear words, bad language, or vulgar expressions.
      - Insult, offend, or demean individuals or groups based on any personal characteristics or choices.
      - Encourage or support the use of offensive or derogatory language.
      Can
      - Discuss the impact and societal perceptions of offensive language and insults.
      - Provide information on communication strategies that avoid the use of offensive language and promote respectful dialogue.
      - Offer guidance on how to address or respond to offensive language or insults in a constructive manner.
      <END UNSAFE CONTENT CATEGORIES>

      <BEGIN CONVERSATION>
      User: {{ user_input }}

      Agent: {{ bot_response }}
      <END CONVERSATION>

      Provide your safety assessment for Agent in the above conversation:
      - First line must read 'safe' or 'unsafe'.
      - If unsafe, a second line must include a comma-separated list of violated categories. [/INST]

Writing config/prompts.yml
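
The `{{ user_input }}` and `{{ bot_response }}` placeholders in the prompts above are filled in before the prompt is sent to LlamaGuard. The sketch below imitates that substitution with a small regular expression so the mechanics are visible. It is a simplified stand-in for a real template engine, and the `render` helper is a hypothetical name.

```python
import re

# Matches placeholders of the form {{ name }} with optional inner spaces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, values: dict[str, str]) -> str:
    """Replace every {{ name }} placeholder with its value."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for placeholder '{name}'")
        return values[name]

    return PLACEHOLDER.sub(substitute, template)


conversation = """<BEGIN CONVERSATION>
User: {{ user_input }}

Agent: {{ bot_response }}
<END CONVERSATION>"""

rendered = render(
    conversation,
    {
        "user_input": "Can I use NVIDIA AI Enterprise in theft planning?",
        "bot_response": "I'm sorry, I can't respond to that.",
    },
)
print(rendered)
```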


### Testing RAG with LlamaGuard Integration

#### 1. Safe Question
We start by asking a question that is safe according to the safety-risk guidelines. LlamaGuard is expected to classify the input as '<b>safe</b>', allowing the LLM NIM to answer the question and respond to the user.

In [21]:
from nemoguardrails import RailsConfig, LLMRails
config = RailsConfig.from_path("./config")
rails = LLMRails(config)

response = rails.generate(messages=[{
    "role": "user",
    "content": "How do I use NVIDIA AI?"
}])
print(response['content'])

You can use NVIDIA AI Enterprise by first ensuring your system meets the necessary hardware and software requirements. Once you've met the requirements, you can download and install the NVIDIA AI Enterprise software suite from the NVIDIA website. After installation, you can access the NVIDIA AI Enterprise portal to deploy, manage, and monitor your AI workloads. Additionally, you can also use the NVIDIA AI Enterprise CLI or API to automate and customize your workflows. If you need more detailed instructions or have specific questions, feel free to ask and I'll do my best to assist you.


#### 2. Unsafe Question

Next, we ask a question about using NVIDIA AI Enterprise for theft planning. Because planning a theft falls under the Criminal Planning category (O3) of the safety taxonomy, LlamaGuard is expected to flag the input prompt as '<b>unsafe</b>', and the bot refuses to respond to the user. Let us look at this example.

In [22]:
response = rails.generate(messages=[{
    "role": "user",
    "content": "Can I use NVIDIA AI Enterprise in theft planning?"
}])
print(response['content'])

I'm sorry, I can't respond to that.


## AlignScore Integration for fact-checking the moderated LLM response

A secure and safe RAG pipeline requires the LLM-generated text to be factually consistent with the input information and the knowledge base. But first we need to understand what factual consistency means and how it is evaluated.

<b>Factual Consistency</b>: A text pair <b>(a, b)</b> is considered factually consistent if
1) all the information in <b>b</b> is also present in <b>a</b>;
2) <b>b</b> does not contradict <b>a</b>.

<b>Evaluation</b>: Show the degree of factual consistency between the context (text <b>a</b>) and the claim (text <b>b</b>).

Now, coming back to the RAG pipeline, the factual consistency evaluation is done by checking the LLM response with the retrieved chunks obtained from the vector database.
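
To make the shape of such a check concrete, here is a deliberately crude sketch that scores a claim by the fraction of its words that also appear in the context, then compares the score to a threshold. This lexical overlap is a stand-in chosen for illustration only. It is not how AlignScore works, it covers only condition 1 (it cannot detect contradictions), and the function names, sample sentences, and the 0.8 threshold are all hypothetical.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def support_score(context: str, claim: str) -> float:
    """Fraction of claim tokens that also occur in the context (0.0 to 1.0)."""
    claim_tokens = tokenize(claim)
    if not claim_tokens:
        return 1.0  # an empty claim asserts nothing, so nothing is unsupported
    return len(claim_tokens & tokenize(context)) / len(claim_tokens)


def is_consistent(context: str, claim: str, threshold: float = 0.8) -> bool:
    """Accept the claim only if enough of it is supported by the context."""
    return support_score(context, claim) >= threshold


context = "The cat sat on the red mat."       # text a (e.g. retrieved chunks)
supported = "The cat sat on the mat."         # text b, fully covered by a
unsupported = "The dog slept on the sofa."    # text b, mostly not in a

print(support_score(context, supported), is_consistent(context, supported))
print(support_score(context, unsupported), is_consistent(context, unsupported))
```

In the pipeline, the context would be the retrieved chunks and the claim would be the LLM response; a learned metric replaces the word-overlap score.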

AlignScore is a metric developed to assess this factual consistency in context-claim pairs across various Natural Language Understanding (NLU) tasks. It has two checkpoints, 'base' and 'large', which can be integrated with NeMo Guardrails for fact-checking. The NeMo Guardrails toolkit makes it easy to integrate AlignScore into a custom use case. To do this, we first set up the [AlignScore deployment](https://docs.nvidia.com/nemo/guardrails/user_guides/advanced/align-score-deployment.html) and then integrate it into our LlamaGuard-integrated guardrails configuration by adding the model endpoint to the `config.yml` file as follows:

### Update the LlamaGuard Integrated Guardrails configuration with AlignScore

#### 1. Re-write `config.yml`

In [23]:
%%writefile config/config.yml
instructions:
  - type: general
    content: |
      Below is a conversation between a user and a bot called the NVIDIA AI Bot.
      The bot is designed to answer questions about the NVIDIA AI Enterprise.
      The bot is knowledgeable about the company policies.
      If the bot does not know the answer to a question, it truthfully says it does not know.

sample_conversation: |
  user action: user said "Hi there. Can you help me with some questions I have about NVIDIA AI Enterprise?"
  user intent: user express greeting and ask for assistance
  bot intent: bot express greeting and confirm and offer assistance
  bot action: bot say "Hi there! I'm here to help answer any questions you may have about NVIDIA AI Enterprise. What would you like to know?"
  user action: user said "What is offered by NVIDIA AI Enterprise?"
  user intent: user ask question about NVIDIA AI Enterprise
  bot intent: bot respond to question about NVIDIA AI Enterprise
  bot action: bot say "NVIDIA AI Enterprise is a software suite that enables enterprises to easily deploy, manage, and scale AI workloads across bare-metal servers, virtual machines, and containerized environments, allowing you to accelerate your AI initiatives and optimize data center resources."

models:
    - type: main
      engine: nim
      model: meta/llama-3.1-70b-instruct

    - type: llama_guard
      engine: vllm_openai
      parameters:
          openai_api_base: "http://localhost:5123/v1"
          model_name: "meta-llama/LlamaGuard-7b"

rails:
    config:
      fact_checking:
        provider: align_score
        parameters:
          endpoint: "http://localhost:5000/alignscore_base"
    output:
          flows:
          - alignscore check facts

Overwriting config/config.yml


#### 2. Adding flows for Fact-Checking using the AlignScore model

In [24]:
%%writefile config/factchecking.co
define flow
  user ask about report
  $check_facts = True
  bot provide report answer

Writing config/factchecking.co


### Testing the AlignScore Integration with the prebuilt LlamaGuard Integrated guardrails configuration

In [25]:
config = RailsConfig.from_path("./config")
rails = LLMRails(config)

response = rails.generate(messages=[{
    "role": "user",
    "content": "How do I install NVIDIA AI Enterprise Software Components using Kubernetes?"
}])
print(response['content'])

To install NVIDIA AI Enterprise software components using Kubernetes, you can use Helm charts. Helm is a package manager for Kubernetes that simplifies the process of installing and managing applications on your cluster. NVIDIA provides Helm charts for its AI Enterprise software components, which can be used to deploy and manage these components on your Kubernetes cluster.
Here is a high-level overview of the installation steps:
1. Install Helm on your Kubernetes cluster.
2. Add the NVIDIA Helm repository to your Helm installation.
3. Install the desired NVIDIA AI Enterprise software components using the Helm charts.
For detailed installation instructions, please refer to the NVIDIA AI Enterprise documentation: <link to documentation>.


With this configuration, AlignScore evaluates the consistency of the response with the information in the knowledge base.