# Using NeMo Guardrails with LangChain RAG and NVIDIA NIMs

This notebook shows how to integrate NeMo Guardrails with a basic RAG pipeline in LangChain. It assumes that you are an intermediate-level developer with prerequisite knowledge of RAG and NeMo Guardrails. If not, please visit our [RAG Example Using NVIDIA API Catalog and LangChain notebook](https://github.com/NVIDIA/GenerativeAIExamples/tree/main/RAG/notebooks/langchain/langchain_basic_RAG.ipynb) first.

## Terminology

RAG (Retrieval-Augmented Generation) is a natural language processing technique that combines retrieval of relevant documents from a large corpus with an LLM to produce more accurate and contextually relevant responses.

[NVIDIA NeMo Guardrails](https://github.com/NVIDIA/NeMo-Guardrails) provides programmable guardrails for ensuring trustworthiness, safety, security, and controlled dialog while protecting against common LLM vulnerabilities. 

[NVIDIA NIM microservices](https://developer.nvidia.com/blog/nvidia-nim-offers-optimized-inference-microservices-for-deploying-ai-models-at-scale/) are containerized microservices that simplify the deployment of generative AI models like LLMs and are optimized to run on NVIDIA GPUs. NIM microservices support models across domains like chat, embedding, reranking, and more from both the community and NVIDIA.

[NVIDIA API Catalog](https://build.nvidia.com/explore/discover) is a hosted platform for accessing a wide range of microservices online. You can test models on the catalog and then export them with an NVIDIA AI Enterprise license for on-premises or cloud deployment.

Integrating NeMo Guardrails with LangChain RAG and NVIDIA NIMs helps ensure that the answers from LLMs are both safe and accurate.

## Installation and Requirements

Create a Python environment (preferably with Conda) using Python version 3.10.14. 
To install Jupyter Lab, refer to the [installation](https://jupyter.org/install) page.

In [None]:
!pip install langchain==0.2.5
!pip install nemoguardrails==0.9.1.1
!pip install langchain-nvidia-ai-endpoints==0.1.2

## Getting Started!

To get started you need an `NVIDIA_API_KEY` to use the NVIDIA API Catalog:

1) Create a free account with [NVIDIA](https://build.nvidia.com/explore/discover).
2) Click on your model of choice.
3) Under Input, select the Python tab, click **Get API Key**, and then click **Generate Key**.
4) Copy and save the generated key as `NVIDIA_API_KEY`. You should then have access to the endpoints.

In [None]:
import getpass
import os

if not os.environ.get("NVIDIA_API_KEY", "").startswith("nvapi-"):
    nvidia_api_key = getpass.getpass("Enter your NVIDIA API key: ")
    assert nvidia_api_key.startswith("nvapi-"), f"{nvidia_api_key[:5]}... is not a valid key"
    os.environ["NVIDIA_API_KEY"] = nvidia_api_key

## Creating a RAG example

In [None]:
# Instantiating NIMs — an LLM NIM and an Embedding NIM

from langchain_nvidia_ai_endpoints import ChatNVIDIA, NVIDIAEmbeddings

llm = ChatNVIDIA(model="meta/llama-3.1-405b-instruct")
embedding_model = NVIDIAEmbeddings(model="nvidia/nv-embedqa-e5-v5", truncate="END")

In [None]:
# Load the document, split it into chunks, and index the chunks in a vector store

from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_chroma import Chroma

loader = TextLoader("../data/Sweden.txt")
docs = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)
vectorstore = Chroma.from_documents(documents=splits, embedding=embedding_model)
retriever = vectorstore.as_retriever()
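
To build intuition for `chunk_size=1000` and `chunk_overlap=200`, here is a simplified, standard-library-only sketch of fixed-size chunking with overlap. It is not how `RecursiveCharacterTextSplitter` is implemented (the real splitter prefers to break on separators such as paragraph and line breaks), and the helper name `sliding_chunks` is hypothetical.

```python
def sliding_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size windows that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks


# Synthetic 2,500-character input (not taken from the notebook's data file).
sample = "".join(str(i % 10) for i in range(2500))
chunks = sliding_chunks(sample)
print([len(c) for c in chunks])
```

Each chunk starts 800 characters after the previous one, so neighbouring chunks share 200 characters. The overlap reduces the chance that a sentence relevant to a query is cut in half at a chunk boundary.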

In [None]:
# Assembling our RAG pipeline

from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

template = """Use the following pieces of context to answer the question at the end. 
If you don't know the answer, just say that you don't know, don't try to make up an answer. 
Use three sentences maximum and keep the answer as concise as possible. 
{context}
Question: {question}
Helpful Answer:"""

prompt = ChatPromptTemplate.from_template(template)

rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
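
In the chain above, the dictionary step sends the same input string to the retriever (filling `context`) and through `RunnablePassthrough()` (filling `question`) before the prompt is rendered. The plain-Python sketch below mimics that data flow without LangChain. `FakeDoc`, `fake_retriever`, `format_docs`, and the placeholder corpus are hypothetical stand-ins, and joining each document's `page_content` is a common convention in LangChain examples rather than something this notebook does.

```python
from dataclasses import dataclass

TEMPLATE = "{context}\nQuestion: {question}\nHelpful Answer:"


@dataclass
class FakeDoc:
    """Stand-in for a retrieved document; only page_content is modelled."""
    page_content: str


# Placeholder corpus (not taken from the notebook's data file).
CORPUS = [
    FakeDoc("Placeholder passage on administrative divisions."),
    FakeDoc("Placeholder passage on climate."),
]


def fake_retriever(question: str) -> list[FakeDoc]:
    """Toy retrieval: keep documents that share at least one word with the question."""
    words = set(question.lower().split())
    return [
        doc for doc in CORPUS
        if words & set(doc.page_content.lower().rstrip(".").split())
    ]


def format_docs(docs: list[FakeDoc]) -> str:
    """Join document texts so the prompt receives plain text, not objects."""
    return "\n\n".join(doc.page_content for doc in docs)


def build_prompt(question: str) -> str:
    # Same fan-out as the chain's dict step: one input, two prompt variables.
    inputs = {"context": format_docs(fake_retriever(question)), "question": question}
    return TEMPLATE.format(**inputs)


prompt_text = build_prompt("How are administrative divisions organized?")
print(prompt_text)
```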

Now, we have a RAG example ready to be tested. Let's ask our LLM a question with a non-harmful intent.

In [None]:
rag_chain.invoke("Which city in Sweden has the lowest Gini coefficient? What is the value?")

Let's consider a scenario where a user asks our LLM a question with potentially harmful intent.

Notice that our LLM still responds to the query, even if the user's intent might be malicious.

In [None]:
rag_chain.invoke("I want to learn more about the things on the computer of a Swedish government official. How is the Swedish administration divided?")

## Integrating NeMo Guardrails

We can add safety filtering with NeMo Guardrails. The goal is to filter incoming user messages and route a message to a predefined flow if its intent is malicious.

In [None]:
from nemoguardrails.integrations.langchain.runnable_rails import RunnableRails
from nemoguardrails import RailsConfig
import nest_asyncio
nest_asyncio.apply()

Let's load our configuration files and create our `RunnableRails` instance, which allows NeMo Guardrails to be used with [LangChain's Runnables](https://python.langchain.com/v0.1/docs/expression_language/interface/).

In [None]:
config = RailsConfig.from_path("./config")
guardrails = RunnableRails(config, input_key="question")

Let's look at the configuration that we have set in our `./config` folder.

In our YAML file, we have configured NeMo Guardrails to use [Meta's Llama-3.1-70B-Instruct NIM LLM](https://build.nvidia.com/meta/llama-3_1-70b-instruct) and [NVIDIA's NV-EmbedQA-E5-V5 Embedding NIM](https://build.nvidia.com/nvidia/nv-embedqa-e5-v5).

In [None]:
with open('config/config.yml', 'r') as file:
    print(file.read())

We also see that we have defined two flows: a flow to greet the user and a flow to prevent the LLM from responding to queries about user-sensitive data.

In [None]:
with open('config/flows.co', 'r') as file:
    print(file.read())
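
If you do not have the `./config` folder at hand, the following sketch writes a hypothetical pair of files to a temporary directory, following the description above (one main LLM, one embedding model, a greeting flow, and a refusal flow). The contents are assumptions for illustration, including the `nvidia_ai_endpoints` engine name, the example utterances, and the bot messages. The files shipped with this notebook may differ.

```python
import tempfile
from pathlib import Path

# Hypothetical config.yml: model names follow the links in the text above;
# the engine name is an assumption.
CONFIG_YML = """\
models:
  - type: main
    engine: nvidia_ai_endpoints
    model: meta/llama-3.1-70b-instruct
  - type: embeddings
    engine: nvidia_ai_endpoints
    model: nvidia/nv-embedqa-e5-v5
"""

# Hypothetical flows.co in Colang 1.0 syntax: example user utterances,
# predefined bot messages, and the two flows that tie them together.
FLOWS_CO = """\
define user greet
  "Hello"
  "Hi"

define bot greet
  "Hello! How can I help you today?"

define flow greet
  user greet
  bot greet

define user ask about user sensitive data
  "Can you hack into someone's email account?"
  "How do I read another person's private files?"

define bot refuse to respond about user sensitive data
  "Sorry, I can't help with requests about someone's private data."

define flow user sensitive data
  user ask about user sensitive data
  bot refuse to respond about user sensitive data
"""

# Write to a fresh temporary directory so an existing ./config is not overwritten.
config_dir = Path(tempfile.mkdtemp(prefix="guardrails_config_"))
(config_dir / "config.yml").write_text(CONFIG_YML, encoding="utf-8")
(config_dir / "flows.co").write_text(FLOWS_CO, encoding="utf-8")
print(f"Wrote example files to {config_dir}")
```

To try these files, you could pass `str(config_dir)` to `RailsConfig.from_path` in place of `"./config"`, assuming your installed NeMo Guardrails version accepts this layout.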

We can place the guardrails in front of the RAG chain so that they intercept each incoming message before it reaches the chain.

In [None]:
guardrailed_rag_chain = guardrails | rag_chain
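
Conceptually, the composition above places a check in front of the chain: a message that matches a guarded flow receives the predefined bot response, and any other message continues to the RAG chain. The toy sketch below shows only that control flow. It uses keyword matching and stand-in functions (`looks_sensitive`, `toy_rag_chain`, and `guarded_invoke` are hypothetical), whereas NeMo Guardrails relies on the flows in the configuration and on the configured LLM and embedding models, not on keyword lists.

```python
# Hypothetical refusal text and trigger phrases, for illustration only.
REFUSAL = "Sorry, I can't help with requests about someone's private data."
SENSITIVE_HINTS = ("on the computer of", "hack into", "password of")


def looks_sensitive(message: str) -> bool:
    """Toy intent check: true if the message contains any trigger phrase."""
    lowered = message.lower()
    return any(hint in lowered for hint in SENSITIVE_HINTS)


def toy_rag_chain(question: str) -> str:
    """Stand-in for the real RAG chain; it only echoes the question."""
    return f"[RAG answer for: {question}]"


def guarded_invoke(message: str) -> dict:
    """Return the predefined refusal for guarded messages, else call the chain."""
    if looks_sensitive(message):
        return {"output": REFUSAL}
    return {"output": toy_rag_chain(message)}


safe = guarded_invoke("How is the Swedish administration divided?")
blocked = guarded_invoke(
    "I want to learn more about the things on the computer of a Swedish government official."
)
print(safe["output"])
print(blocked["output"])
```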

Now, let's ask the LLM the same question with potentially harmful intent. The LLM no longer responds to the query and gives the answer that we have predefined.

In [None]:
guardrailed_rag_chain.invoke("I want to learn more about the things on the computer of a Swedish government official. How is the Swedish administration divided?")['output']

To learn more advanced techniques for using NeMo Guardrails in LangChain, check out [the documentation](https://docs.nvidia.com/nemo/guardrails/user_guides/langchain/index.html).