# Fact-Checking Guardrails

In this use case, the LLM is asked to re-check whether its output is consistent with the context. In other words, we ask the LLM whether the response is supported by the documents retrieved from the knowledge base.

This notebook is based on the **Grounding Rail** example in the official NeMo Guardrails [repo](https://github.com/NVIDIA/NeMo-Guardrails/tree/main/examples/grounding_rail). The chatbot is asked several questions about a report, and we use Guardrails to prevent it from stating facts that are not contained in the document. We implement this rail with both OpenAI and Llama2 models.

<div align="center">
<img src="./docs/imgs/fact_checking_workflow.png" width="600"/>
</div>
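
The check itself can be pictured as a second LLM call that receives the retrieved evidence and the draft answer and returns a yes/no verdict. The following is a minimal sketch of that idea, not the NeMo Guardrails implementation: the prompt wording, the names `build_fact_check_prompt` and `is_grounded`, and the `llm` callable are all assumptions made for illustration.

```python
from typing import Callable

# Hypothetical prompt wording; the library ships its own fact-checking prompt.
FACT_CHECK_TEMPLATE = (
    "Evidence:\n{evidence}\n\n"
    "Statement:\n{response}\n\n"
    "Is the statement fully supported by the evidence? Answer yes or no."
)


def build_fact_check_prompt(evidence: str, response: str) -> str:
    """Fill the template with the retrieved evidence and the draft response."""
    return FACT_CHECK_TEMPLATE.format(evidence=evidence, response=response)


def is_grounded(evidence: str, response: str, llm: Callable[[str], str]) -> bool:
    """Ask the model for a verdict and map it to a boolean."""
    verdict = llm(build_fact_check_prompt(evidence, response))
    # Anything that does not clearly start with "yes" is treated as unsupported.
    return verdict.strip().lower().startswith("yes")
```

Treating every ambiguous verdict as "unsupported" is a conservative choice: it can reject valid answers, but it does not let an unverified answer through.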

In [2]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [3]:
cd /content/drive/MyDrive/GuardRAILS LLAMA2/llama2-nemo-guardrails

/content/drive/MyDrive/GuardRAILS LLAMA2/llama2-nemo-guardrails


In [4]:
%%capture
!pip install python-dotenv
!pip install transformers==4.33.1 --upgrade
!pip install nemoguardrails --upgrade
!pip install langchain --upgrade
!pip install accelerate --upgrade
!pip install spacy --upgrade #Optional
!pip install datasets bitsandbytes einops  -Uqqq

In [5]:
## Load environment

import os
from dotenv import load_dotenv

load_dotenv()

False
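
The `False` returned by `load_dotenv()` suggests that no variables were loaded from a `.env` file. A small check like the one below can confirm that the credentials are present before building the rails. It is only a sketch: the helper name `missing_env_vars` is hypothetical, and `OPENAI_API_KEY` is assumed to be the variable the OpenAI section relies on.

```python
import os
from typing import Iterable, List, Mapping, Optional


def missing_env_vars(
    names: Iterable[str], env: Optional[Mapping[str, str]] = None
) -> List[str]:
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


# Example use in the notebook (variable name is an assumption):
# missing = missing_env_vars(["OPENAI_API_KEY"])
# if missing:
#     raise RuntimeError(f"Missing environment variables: {missing}")
```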

# OpenAI

Load the Guardrails configuration files located under `fact_check_config/openai`, with the `fact_check.co` file removed.

In [None]:
# from nemoguardrails import LLMRails, RailsConfig

# # initialize rails config
# config = RailsConfig.from_path("./fact_check_config/openai")

# # create rails
# app = LLMRails(config, verbose=True)
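
Since the experiment compares the same configuration with and without `fact_check.co`, it can help to toggle the file programmatically instead of deleting it by hand. The sketch below moves the file between the configuration folder and a stash folder. The helper name `set_rail_enabled` and the stash folder are hypothetical; the stash folder is assumed to live outside the configuration folder so that the file is not loaded from there.

```python
import shutil
from pathlib import Path


def set_rail_enabled(
    config_dir: str, stash_dir: str, filename: str, enabled: bool
) -> Path:
    """Move a rail file into the config folder (enabled) or out of it (disabled).

    Returns the path where the file should now be located.
    """
    active = Path(config_dir) / filename
    stashed = Path(stash_dir) / filename
    source, target = (stashed, active) if enabled else (active, stashed)
    # If the file is already in the requested place, there is nothing to move.
    if source.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(source), str(target))
    return target


# Example use (paths are assumptions):
# set_rail_enabled("./fact_check_config/openai", "./stash", "fact_check.co", enabled=False)
```

After toggling the file, the configuration has to be reloaded with `RailsConfig.from_path` for the change to take effect.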



When asked questions about the document contained in the `fact_check_config/openai/kb` folder, the chatbot answers accurately:

In [None]:
# history = [{"role": "user", "content": "What was the unemployment rate in March 2023?"}]
# bot_message = await app.generate_async(messages=history)
# print(bot_message['content'])

According to the US Bureau of Labor Statistics, the unemployment rate in March 2023 was 3.5 percent.


In [None]:
# history.append(bot_message)
# history.append(
#     {"role": "user", "content": "What was the unemployment rate for teenagers?"}
# )
# bot_message = await app.generate_async(messages=history)
# print(bot_message['content'])

According to the US Bureau of Labor Statistics, the unemployment rate for teenagers in March 2023 was 9.8 percent.
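
The cells above repeat the same pattern: append the user turn, call the rails, then append the bot turn. A small helper can keep that bookkeeping in one place. This is a sketch under the assumption that `app` exposes `generate_async(messages=...)` and returns a dict with a `content` key, as in the cells above; the helper name `ask` is hypothetical.

```python
from typing import Any, Dict, List


async def ask(app: Any, history: List[Dict[str, str]], question: str) -> str:
    """Append the question, call the rails app, and record its reply in the history."""
    history.append({"role": "user", "content": question})
    bot_message = await app.generate_async(messages=history)
    # Keep the reply so that follow-up questions can refer to earlier turns.
    history.append(bot_message)
    return bot_message["content"]


# Example use in a notebook cell:
# history = []
# print(await ask(app, history, "What was the unemployment rate in March 2023?"))
# print(await ask(app, history, "What was the unemployment rate for teenagers?"))
```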


When asked a question that is not covered by the document, it gives a false statement:

In [None]:
# history.append(bot_message)
# history.append(
#     {"role": "user", "content": "What was the unemployment rate for senior citizens?"}
# )
# bot_message = await app.generate_async(messages=history)
# print(bot_message['content'])


According to the US Bureau of Labor Statistics, the unemployment rate for senior citizens in March 2023 was 4.7 percent.


## Adding the fact check rail

Adding the `fact_check.co` file back to the configuration prevents the chatbot from hallucinating facts.

In [None]:
# # initialize rails config
# config = RailsConfig.from_path("./fact_check_config/openai")

# # create rails
# app = LLMRails(config, verbose=True)

When asked questions about the document contained in the `fact_check_config/openai/kb` folder, the chatbot still answers accurately:

In [None]:
# history = [{"role": "user", "content": "What was the unemployment rate in March 2023?"}]
# bot_message = await app.generate_async(messages=history)
# print(bot_message['content'])

According to the March 2023 US jobs report, the unemployment rate in March 2023 was 3.5 percent.


In [None]:
# history.append(bot_message)
# history.append(
#     {"role": "user", "content": "What was the unemployment rate for teenagers?"}
# )
# bot_message = await app.generate_async(messages=history)
# print(bot_message['content'])

According to the March 2023 US jobs report, the unemployment rate for teenagers was 9.8 percent.


When asked a question that is not covered by the document, it now replies that there is not enough information to answer:

In [None]:
# history.append(bot_message)
# history.append(
#     {"role": "user", "content": "What was the unemployment rate for senior citizens?"}
# )
# bot_message = await app.generate_async(messages=history)
# print(bot_message['content'])

I don't have enough information to answer
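
The behaviour observed in this section can be summarised as a simple control flow: draft an answer, check it against the evidence, and replace it with a fallback message when the check fails. The sketch below illustrates that flow in plain Python. It is not the content of `fact_check.co`; the function name and the `generate` and `is_supported` callables are hypothetical stand-ins.

```python
from typing import Callable

# Fallback text taken from the notebook output above.
FALLBACK = "I don't have enough information to answer"


def answer_with_fact_check(
    question: str,
    evidence: str,
    generate: Callable[[str, str], str],
    is_supported: Callable[[str, str], bool],
) -> str:
    """Return the draft answer only if the checker accepts it."""
    draft = generate(question, evidence)
    if is_supported(evidence, draft):
        return draft
    # The unsupported draft is discarded rather than shown to the user.
    return FALLBACK
```

The quality of the whole rail depends on the checker: a checker that rejects too much blocks valid answers, and one that accepts too much lets hallucinations through.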


# Llama2

Load the HuggingFace model and create a pipeline:

In [1]:
# Run this in its own cell, before the other imports
import nest_asyncio
nest_asyncio.apply()

# Useful for debugging
import logging
logging.basicConfig(level=logging.DEBUG)

import accelerate
import bitsandbytes
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline, BitsAndBytesConfig


In [2]:
import os
from getpass import getpass
# Prompt for the token instead of hardcoding a secret in the notebook
os.environ['HF_TOKEN'] = getpass("Hugging Face token: ")

In [3]:
MODEL_NAME = "meta-llama/Llama-2-7b-chat-hf"


bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # load the model weights in 4-bit precision
    bnb_4bit_quant_type="nf4",              # quantize with the 4-bit NormalFloat (NF4) data type
    bnb_4bit_use_double_quant=True,         # double quantization, as described in the QLoRA paper
    bnb_4bit_compute_dtype=torch.bfloat16,  # run computations in bfloat16
)

model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    quantization_config = bnb_config,
    device_map = 'auto',
    use_cache=True,
    trust_remote_code=True,
#     use_flash_attention_2 = True
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_length=4096,
    do_sample=True,
    temperature=0.2,
    top_p=0.95,
    logprobs=None,
    top_k=40,
    repetition_penalty=1.1
)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]
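
To see why 4-bit loading matters on a Colab GPU, a rough estimate of the memory taken by the weights alone is enough. The arithmetic below is a back-of-the-envelope sketch: the parameter count is an assumed round figure for a "7B" model, and the estimate ignores quantization constants, activations, the KV cache, and any layers kept in higher precision.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate memory for the model weights alone, in gigabytes (10**9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


N_PARAMS = 7e9  # assumed round figure for a "7B" model

fp16_gb = weight_memory_gb(N_PARAMS, 16)  # 16-bit weights
nf4_gb = weight_memory_gb(N_PARAMS, 4)    # 4-bit weights

print(f"16-bit weights: ~{fp16_gb:.1f} GB, 4-bit weights: ~{nf4_gb:.1f} GB")
```

Under these assumptions the weights shrink from roughly 14 GB to roughly 3.5 GB, a fourfold reduction.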

Wrap the pipeline with the `HuggingFacePipelineCompatible` class. Then wrap it again with NeMo Guardrails' `get_llm_instance_wrapper` function and register it with `register_llm_provider`.

In [39]:
from nemoguardrails.llm.helpers import get_llm_instance_wrapper
from nemoguardrails import LLMRails, RailsConfig

from nemoguardrails.llm.providers import (
    HuggingFacePipelineCompatible,
    register_llm_provider,
)

hf_llm = HuggingFacePipelineCompatible(pipeline=pipe)
provider = get_llm_instance_wrapper(
    llm_instance=hf_llm, llm_type="hf_pipeline_llama2"
)
register_llm_provider("hf_pipeline_llama2", provider)

Load the Guardrails configuration files located under `fact_check_config/llama2`, with the `fact_check.co` file removed.

In [41]:
# initialize rails config
config = RailsConfig.from_path("/content/drive/MyDrive/GuardRAILS LLAMA2/llama2-nemo-guardrails/fact_check_config/llama2_step")

# create rails
app = LLMRails(config, verbose = True)

When asked about the document contained in the `fact_check_config/llama2/kb` folder, the chatbot answers a covered question accurately, but it invents an answer to a question that the document does not cover:

In [42]:
await app.generate_async(prompt="What was the unemployment rate in March 2023?")



'The unemployment rate in March 2023 was 3.5%.'

In [20]:
await app.generate_async(prompt="What was the unemployment rate for senior citizens?")



'According to the latest data available, the unemployment rate for senior citizens (aged 55 and older) was 3.1% in March 2023.'

## Adding the fact check rail

We add the `fact_check.co` file back in the configuration to prevent the chatbot from hallucinating facts.

In [45]:
from nemoguardrails import LLMRails, RailsConfig

# initialize rails config
config = RailsConfig.from_path("/content/drive/MyDrive/GuardRAILS LLAMA2/llama2-nemo-guardrails/fact_check_config/llama2")

# create rails
app = LLMRails(config, verbose = True)

When asked a question about the document contained in the `fact_check_config/llama2/kb` folder, the chatbot now refuses to answer even though the document covers it. The rail is blocking a valid answer, so something is not working properly.

In [46]:
await app.generate_async(prompt="What was the unemployment rate in March 2023?")



"I don't have enough information to answer"

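A rail can fail in two directions: it can refuse a question that the document covers, or it can answer a question that the document does not cover. A small audit over labelled questions makes both failure modes visible. The sketch below assumes a synchronous `answer` callable that wraps the chatbot and assumes that refusals use the exact fallback text shown above; the name `audit_rail` is hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Fallback text taken from the notebook output above.
REFUSAL = "I don't have enough information to answer"


def audit_rail(
    cases: Iterable[Tuple[str, bool]],
    answer: Callable[[str], str],
) -> Dict[str, List[str]]:
    """Collect covered questions that were refused and uncovered ones that were answered."""
    report: Dict[str, List[str]] = {"over_blocked": [], "under_blocked": []}
    for question, covered in cases:
        refused = answer(question).strip() == REFUSAL
        if covered and refused:
            report["over_blocked"].append(question)
        elif not covered and not refused:
            report["under_blocked"].append(question)
    return report


# Each case pairs a question with whether the document covers it.
CASES = [
    ("What was the unemployment rate in March 2023?", True),
    ("What was the unemployment rate for senior citizens?", False),
]
```

With a chatbot that always refuses, the first question would land in `over_blocked`, which matches the symptom observed above.
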
## Adding the fact check rail + few shot in prompt

After adding a few few-shot examples to the prompt in the `general.yml` file, located in the library's `nemoguardrails/llm/prompts/` folder, the rail starts working.
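
Few-shot prompting here means placing a handful of solved evidence/hypothesis pairs before the real query, so that the model sees the expected answer format. The sketch below shows the general shape of such a prompt. It does not reproduce the entries of `general.yml`: the instruction text, the demonstrations, and the function name `build_few_shot_prompt` are all hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical demonstrations: (evidence, hypothesis, expected verdict).
EXAMPLES: List[Tuple[str, str, str]] = [
    (
        "The unemployment rate was 3.5 percent in March.",
        "The unemployment rate in March was 3.5 percent.",
        "yes",
    ),
    (
        "The unemployment rate was 3.5 percent in March.",
        "The unemployment rate for senior citizens was 4.7 percent.",
        "no",
    ),
]


def build_few_shot_prompt(
    evidence: str,
    hypothesis: str,
    examples: Sequence[Tuple[str, str, str]] = EXAMPLES,
) -> str:
    """Prepend solved examples to the query and leave the final answer open."""
    parts = ["Answer yes if the hypothesis is supported by the evidence, otherwise no.\n"]
    for ex_evidence, ex_hypothesis, verdict in examples:
        parts.append(
            f"Evidence: {ex_evidence}\nHypothesis: {ex_hypothesis}\nAnswer: {verdict}\n"
        )
    # The prompt ends with an open "Answer:" so the model completes the verdict.
    parts.append(f"Evidence: {evidence}\nHypothesis: {hypothesis}\nAnswer:")
    return "\n".join(parts)
```

Including one positive and one negative demonstration is a deliberate choice: it shows the model both verdicts in the exact format that the rail needs to parse.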

In [47]:
from nemoguardrails.llm.helpers import get_llm_instance_wrapper
from nemoguardrails import LLMRails, RailsConfig

from nemoguardrails.llm.providers import (
    HuggingFacePipelineCompatible,
    register_llm_provider,
)

hf_llm = HuggingFacePipelineCompatible(pipeline=pipe)
provider = get_llm_instance_wrapper(
    llm_instance=hf_llm, llm_type="hf_pipeline_llama2"
)
register_llm_provider("hf_pipeline_llama2", provider)

# initialize rails config
config = RailsConfig.from_path("/content/drive/MyDrive/GuardRAILS LLAMA2/llama2-nemo-guardrails/fact_check_config/llama2")

# create rails
app = LLMRails(config, verbose = True)

When asked a question that is not covered by the document, it replies that there is not enough information to answer:

In [48]:
await app.generate_async(prompt="What was the unemployment rate for senior citizens?")



"I don't have enough information to answer"