<a href="https://colab.research.google.com/github/datastax/ragstack-ai/blob/main/examples/notebooks/nvidia.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# NVIDIA NeMo Guardrails

This notebook demonstrates how to set up and use NVIDIA NeMo Guardrails. [NVIDIA NeMo](https://www.nvidia.com/en-us/ai-data-science/generative-ai/nemo-framework/) is a cloud-native framework designed for AI models. [NeMo Guardrails](https://github.com/NVIDIA/NeMo-Guardrails) is a toolkit for adding programmable guardrails to LLM-based conversational applications. In this notebook, you will create a RAG-based application featuring retrieval rails that specifically guard against retrieval of chunks with potentially sensitive content.

## Prerequisites

You will need a vector-enabled Astra database. This notebook uses OpenAI for the guardrails model and NVIDIA models for embeddings and chat completion. You can swap providers, as NeMo Guardrails [supports all LLM providers supported by LangChain](https://github.com/NVIDIA/NeMo-Guardrails/blob/develop/docs/user_guides/configuration-guide.md#supported-llm-models).

* Create an [Astra vector database](https://docs.datastax.com/en/astra-serverless/docs/getting-started/create-db-choices.html).
* Create an [OpenAI account](https://openai.com/) and an API key.
* Get an NVIDIA API key (`NVIDIA_API_KEY`) for the NVIDIA AI endpoints.
* Within your database, create an [Astra DB Access Token](https://docs.datastax.com/en/astra-serverless/docs/manage/org/manage-tokens.html) with Database Administrator permissions.
* Get your Astra DB Endpoint: 
  * `https://<ASTRA_DB_ID>-<ASTRA_DB_REGION>.apps.astra.datastax.com`

See the [Prerequisites](https://docs.datastax.com/en/ragstack/docs/prerequisites.html) page for more details.
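
The endpoint format listed above can be assembled and sanity-checked before you paste it into the notebook. The following is a minimal sketch: `build_endpoint` and `looks_like_endpoint` are hypothetical helpers, and the pattern assumes a UUID-style database ID and a lowercase region name.

```python
import re

# Pattern for the endpoint shape shown above. The UUID-style ID and the
# region character set are assumptions made for illustration.
ENDPOINT_PATTERN = re.compile(
    r"^https://[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
    r"-[a-z0-9-]+\.apps\.astra\.datastax\.com$"
)


def build_endpoint(db_id: str, region: str) -> str:
    """Assemble the API endpoint from a database ID and a region."""
    return f"https://{db_id}-{region}.apps.astra.datastax.com"


def looks_like_endpoint(url: str) -> bool:
    """Return True if the URL matches the expected endpoint shape."""
    return ENDPOINT_PATTERN.match(url) is not None


# Placeholder values, not a real database.
example = build_endpoint("01234567-89ab-cdef-0123-456789abcdef", "us-east1")
print(example)
print(looks_like_endpoint(example))
```

A check like this only catches formatting mistakes; it does not confirm that the database exists or that your token can reach it.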

## Setup
`ragstack-ai` includes all the packages you need to build a RAG pipeline. 

`nemoguardrails` provides NeMo Guardrails.

`langchain-nvidia-ai-endpoints` provides the LangChain integrations for NVIDIA models.

`datasets` is used to import a sample dataset.

In [1]:
! pip install -qU ragstack-ai nemoguardrails langchain-nvidia-ai-endpoints datasets

In [2]:
import os
from getpass import getpass

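# Enter your settings for Astra DB, NVIDIA, and OpenAI.
# OPENAI_API_KEY is needed because the guardrails configuration below uses the "openai" engine.
keys = [
    "ASTRA_DB_APPLICATION_TOKEN",
    "ASTRA_DB_API_ENDPOINT",
    "NVIDIA_API_KEY",
    "OPENAI_API_KEY",
]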
for key in keys:
    if key not in os.environ:
        os.environ[key] = getpass(f"Enter {key}: ")
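
The loop above stores whatever is typed, including an empty response, so it can help to confirm that every setting actually has a value before continuing. This is a minimal sketch; `missing_settings` is a hypothetical helper, and `demo_env` is a made-up environment used only for the demonstration.

```python
import os
from typing import Iterable, List, Mapping


def missing_settings(
    required: Iterable[str], env: Mapping[str, str] = os.environ
) -> List[str]:
    """Return the names in `required` that are unset or empty in `env`."""
    return [name for name in required if not env.get(name)]


required = ["ASTRA_DB_APPLICATION_TOKEN", "ASTRA_DB_API_ENDPOINT", "NVIDIA_API_KEY"]

# A hypothetical environment: one value set, one empty, one absent.
demo_env = {"ASTRA_DB_APPLICATION_TOKEN": "AstraCS:example", "NVIDIA_API_KEY": ""}
print(missing_settings(required, demo_env))
```

In the notebook you would call `missing_settings(keys)` with no second argument so that it reads from `os.environ`.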

In [4]:
# Collections are where documents are stored. ex: test
collection = input("Collection: ")

## Create Guardrails

In [2]:
# Colang is a modeling language enabling the design of guardrails.
# In it, you define user and bot behaviors, as well as the flow of conversations.

# https://github.com/NVIDIA/NeMo-Guardrails/blob/feature/runnable-rails/docs/user_guides/colang-language-syntax-guide.md
COLANG_CONFIG = """
define user express greeting
    "hi"
    "hello"
    "how are you?"

define bot express greeting
    "Hi! I'm RAGBot"
    "Hello, I am RAGBot"

define bot ask how to help
    "What information do you need?"

define flow greeting
    user express greeting
    bot express greeting
    bot ask how to help

define flow qa
    $last_user_message = user ...
    $answer = execute qa_chain(query=$last_user_message)
    bot $answer

"""
# Note: qa_chain is an action that must be registered with the rails configuration at runtime.
# It is not defined in this notebook.

In [None]:
from nemoguardrails import RailsConfig
from nemoguardrails.integration.langchain.runnable_rails import RunnableRails


def yaml_config(engine, model):
    return f"""
    models:
      - type: main
        engine: {engine}
        model: {model}
    """


engine = "openai"
model_name = "gpt-3.5-turbo-16k"
yaml = yaml_config(engine, model_name)
config = RailsConfig.from_content(
    COLANG_CONFIG,
    yaml,
)
guardrails = RunnableRails(config)
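
To see exactly what text `yaml_config` hands to `RailsConfig.from_content`, you can render it on its own. The sketch below repeats the template from the cell above so that it runs independently; the `textwrap.dedent` call is only there to make the printed result easier to read.

```python
import textwrap


def yaml_config(engine: str, model: str) -> str:
    # Same template as in the cell above, repeated so this block runs alone.
    return f"""
    models:
      - type: main
        engine: {engine}
        model: {model}
    """


# Strip the common leading indentation and the surrounding blank lines.
rendered = textwrap.dedent(yaml_config("openai", "gpt-3.5-turbo-16k")).strip()
print(rendered)
```

Printing the rendered configuration is a quick way to catch a misspelled engine or model name before the rails are constructed.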

## Create RAG Pipeline

### Embedding Model and Vector Store

In [5]:
from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings

embedding = NVIDIAEmbeddings(model="nvolveqa_40k")

In [6]:
from langchain.vectorstores.astradb import AstraDB

vstore = AstraDB(
    collection_name=collection,
    embedding=embedding,
    token=os.getenv("ASTRA_DB_APPLICATION_TOKEN"),
    api_endpoint=os.getenv("ASTRA_DB_API_ENDPOINT"),
)
print("Astra vector store configured")

Astra vector store configured


In [12]:
from datasets import load_dataset

# Load a sample dataset
philo_dataset = load_dataset("datastax/philosopher-quotes")["train"]
print("An example entry:")
print(philo_dataset[16])

An example entry:
{'author': 'aristotle', 'quote': 'Love well, be loved and do something of value.', 'tags': 'love;ethics'}


In [8]:
from langchain.schema import Document

# Constructs a set of documents from your data. Documents can be used as inputs to your vector store.
docs = []
for entry in philo_dataset:
    metadata = {"author": entry["author"]}
    if entry["tags"]:
        # Add metadata tags to the metadata dictionary
        for tag in entry["tags"].split(";"):
            metadata[tag] = "y"
    # Create a LangChain document with the quote and metadata tags
    doc = Document(page_content=entry["quote"], metadata=metadata)
    docs.append(doc)
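
The loop above turns each semicolon-separated tag string into one metadata key per tag. The following sketch isolates that step in plain Python so you can see the resulting dictionary; `build_metadata` is a hypothetical helper name, and the sample values come from the example entry printed earlier.

```python
from typing import Dict, Optional


def build_metadata(author: str, tags: Optional[str]) -> Dict[str, str]:
    """Flatten a semicolon-separated tag string into one metadata key per tag."""
    metadata = {"author": author}
    if tags:
        for tag in tags.split(";"):
            # Each tag becomes its own key with the marker value "y".
            metadata[tag] = "y"
    return metadata


print(build_metadata("aristotle", "love;ethics"))
print(build_metadata("aristotle", None))
```

Storing one key per tag, rather than a single delimited string, presumably makes it possible to match a tag later with a simple equality condition on that key.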

In [9]:
# Create embeddings by inserting your documents into the vector store.
inserted_ids = vstore.add_documents(docs)
print(f"\nInserted {len(inserted_ids)} documents.")


Inserted 450 documents.


### QA Retrieval

Retrieve context from your vector database and pass it, together with the prompt, through the guardrails to the NVIDIA model.

In [11]:
from langchain.prompts import ChatPromptTemplate
from langchain.schema.output_parser import StrOutputParser
from langchain.schema.runnable import RunnablePassthrough
from langchain_nvidia_ai_endpoints import ChatNVIDIA

retriever = vstore.as_retriever(search_kwargs={"k": 3})

prompt_template = """
Answer the question based only on the supplied context. If you don't know the answer, say you don't know the answer.
Context: {context}
Question: {question}
Your answer:
"""
prompt = ChatPromptTemplate.from_template(prompt_template)
model = ChatNVIDIA(model="mixtral_8x7b")

chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | (guardrails | model)
    | StrOutputParser()
)

chain.invoke("In the given context, what subject are philosophers most concerned with?")

'In the given context, philosophers are most concerned with the subject of philosophy itself. Aristotle discusses philosophy as starting with wonder and also mentions that it can make people sick. Hegel, on the other hand, discusses the relationship between knowledge and education. Therefore, it can be inferred that philosophers are concerned with understanding the world, knowledge, and its implications on individuals and society.'
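
The first two steps of the chain can be hard to picture: the dictionary sends the same input to the retriever and passes it through unchanged, and the prompt then substitutes both values into the template. Here is a plain-Python analogue of that data flow, not LangChain itself; `fake_retriever`, `fan_out`, and `fill_prompt` are hypothetical stand-ins, and the retriever returns fixed text instead of querying Astra DB.

```python
from typing import Callable, Dict, List

PROMPT_TEMPLATE = """
Answer the question based only on the supplied context. If you don't know the answer, say you don't know the answer.
Context: {context}
Question: {question}
Your answer:
"""


def fake_retriever(question: str) -> List[str]:
    # Hypothetical stand-in for vstore.as_retriever(); returns fixed text.
    return ["Love well, be loved and do something of value."]


def fan_out(question: str, retriever: Callable[[str], List[str]]) -> Dict[str, object]:
    # Analogue of {"context": retriever, "question": RunnablePassthrough()}:
    # the same input feeds the retriever and is also passed through as is.
    return {"context": retriever(question), "question": question}


def fill_prompt(inputs: Dict[str, object]) -> str:
    # Analogue of the prompt step: substitute both values into the template.
    return PROMPT_TEMPLATE.format(**inputs)


prompt_text = fill_prompt(fan_out("What did Aristotle say about love?", fake_retriever))
print(prompt_text)
```

In the real chain, the filled prompt is what reaches `(guardrails | model)`, and `StrOutputParser` then extracts the answer text from the model's response.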

In [None]:
# Add your questions here!
# chain.invoke("<your question>")

## Cleanup

In [None]:
# WARNING: This will delete the collection and all documents in the collection
# vstore.delete_collection()

You now have a functional RAG pipeline powered by NVIDIA and wrapped with NeMo Guardrails! NVIDIA offers many different model types suited for different problems. Check out the [catalog](https://catalog.ngc.nvidia.com) for more.