# Installation

In [1]:
! pip install langchain_community
! pip install tiktoken
! pip install chromadb
! pip install langchain
! pip install nemoguardrails
! pip install pypdf
! pip install openai
! pip install langchain-openai
! pip install langchain_chroma

Collecting langchain_community
  Downloading langchain_community-0.3.19-py3-none-any.whl.metadata (2.4 kB)
Collecting dataclasses-json<0.7,>=0.5.7 (from langchain_community)
  Downloading dataclasses_json-0.6.7-py3-none-any.whl.metadata (25 kB)
Collecting pydantic-settings<3.0.0,>=2.4.0 (from langchain_community)
  Downloading pydantic_settings-2.8.1-py3-none-any.whl.metadata (3.5 kB)
Collecting httpx-sse<1.0.0,>=0.4.0 (from langchain_community)
  Downloading httpx_sse-0.4.0-py3-none-any.whl.metadata (9.0 kB)
Collecting marshmallow<4.0.0,>=3.18.0 (from dataclasses-json<0.7,>=0.5.7->langchain_community)
  Downloading marshmallow-3.26.1-py3-none-any.whl.metadata (7.3 kB)
Collecting typing-inspect<1,>=0.4.0 (from dataclasses-json<0.7,>=0.5.7->langchain_community)
  Downloading typing_inspect-0.9.0-py3-none-any.whl.metadata (1.5 kB)
Collecting python-dotenv>=0.21.0 (from pydantic-settings<3.0.0,>=2.4.0->langchain_community)
  Downloading python_dotenv-1.0.1-py3-none-any.whl.metadata (23 kB

Collecting pypdf
  Downloading pypdf-5.4.0-py3-none-any.whl.metadata (7.3 kB)
Downloading pypdf-5.4.0-py3-none-any.whl (302 kB)
Installing collected packages: pypdf
Successfully installed pypdf-5.4.0
Collecting langchain-openai
  Downloading langchain_openai-0.3.8-py3-none-any.whl.metadata (2.3 kB)
Downloading langchain_openai-0.3.8-py3-none-any.whl (55 kB)
Installing collected packages: langchain-openai
Successfully installed langchain-openai-0.3.8


In [16]:
from importlib.metadata import version
from tabulate import tabulate

# List of package names
packages = [
    "langchain_community",
    "tiktoken",
    "chromadb",
    "langchain",
    "nemoguardrails",
    "pypdf",
    "openai",
    "langchain-openai",
    "langchain_chroma",
]

# Collect package versions
data = []
for package in packages:
    try:
        pkg_version = version(package)
    except Exception:  # package not installed or its metadata is unavailable
        pkg_version = "Version not found"
    data.append([package, pkg_version])

# Print the formatted table
print(tabulate(data, headers=["Package", "Version"], tablefmt="github"))


| Package             | Version   |
|---------------------|-----------|
| langchain_community | 0.3.19    |
| tiktoken            | 0.9.0     |
| chromadb            | 0.6.3     |
| langchain           | 0.3.20    |
| nemoguardrails      | 0.12.0    |
| pypdf               | 5.4.0     |
| openai              | 1.61.1    |
| langchain-openai    | 0.3.8     |
| langchain_chroma    | 0.2.2     |


# Loading the data

In [2]:
from langchain_community.document_loaders import PyPDFLoader

file_path = r"/content/Health-insurance.pdf"
loader = PyPDFLoader(file_path)

In [3]:
docs = loader.load()
docs[0]

Document(metadata={'producer': 'Adobe Acrobat Pro DC (32-bit) 21 Paper Capture Plug-in', 'creator': 'Acrobat PDFMaker 9.0 for Word', 'creationdate': '2011-06-22T12:40:00+05:30', 'author': '', 'company': '', 'moddate': '2021-07-01T02:49:34+05:30', 'sourcemodified': 'D:20110616065908', 'title': 'Terms & condition – Heart Beat                                                                               (A Preliminary Draft for Discussion Purpose only )', 'source': '/content/Health-insurance.pdf', 'total_pages': 25, 'page': 0, 'page_label': '1'}, page_content='Health Companion – Health Insurance Plan  POLICY DOCUMENT \n \n \nPage 1 of 25 \nPolicy Document – Part II \n1. Terms & Conditions \nThe insurance cover provided under this Policy to the Insured Person up to the Sum Insured is and shall \nbe subject to (a) the terms and conditions of this Policy and (b) the receipt of premium, and (c) the \ninformation You provided to Us (including by way of the Proposal or Information Summary Sheet

The snippet above creates a PyPDFLoader for the PDF at file_path and calls load(), which returns a list of Document objects, one per page. Each Document carries the page text (page_content) and metadata such as the source path, page number, and total page count.

In [4]:
yaml_content = """
models:
  - type: main
    engine: openai
    model: gpt-4

instructions:
  - type: general
    content: |
      You are a bot called the Health Policy Assistant.
      The bot is designed to answer user questions about the Health Insurance Policy.
      The bot is knowledgeable about the terms, conditions, benefits, and exclusions mentioned in the document.
      If the bot does not know the answer to a question, it truthfully says it does not know.
      The bot will only answer questions related to the Health Insurance Policy and will decline politely for off-topic queries.

prompts:
  - task: self_check_input
    content: |
      Your task is to check if the user message is appropriate and relevant to health insurance policy information.

      The user messages:
      - should not contain harmful or abusive language.
      - should not ask the bot to provide false or misleading policy details.
      - should not request personal, sensitive, or confidential information.
      - should not ask for financial advice beyond the scope of the document.
      - should not include explicit, offensive, or unethical content.

      User message: "{{ user_input }}"

      Question: Should the user message be blocked (Yes or No)?
      Answer:

  - task: self_check_output
    content: >
      Model_output: {{ bot_response }}

      Is this output incorrect, misleading, or contains unauthorized financial advice?

      Answer [Yes/No]:

rails:
  input:
    flows:
      - self check input

  output:
    flows:
      - self check output
"""

rag_colang_content = """
# Define allowed user intents
define user greeting
    "hi"
    "hello"
    "how are you"

define bot greeting
    "Hello, I am the Health Policy Assistant. How can I assist you with your health insurance queries?"

define flow greet
    user greeting
    bot greeting

# Define bot response to off-topic questions
define user off topic
    "What do you like?"
    "What is the time?"
    "Tell me a joke?"
    "What is your favorite food?"

define bot response off topic
    "I'm sorry, but I can only provide information related to the Health Insurance Policy."

define flow
    user off topic
    bot response off topic

# Define bot response to policy-related queries
define bot response_refusal
    "I'm sorry, I can't provide that information."

define flow self check input
    $allowed = execute self_check_input
    if not $allowed
        bot response_refusal
        stop

define flow self check output
    $allowed = execute self_check_output
    if not $allowed
        bot response_refusal
        stop

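# Example utterances for the policy-question intent (illustrative; extend as needed)
define user asks about coverage
    "What does the policy cover?"
    "What are the exclusions under this policy?"
    "Tell me about the terms and conditions"

define flow
    user asks about coverage
    $answer = execute qa_chain(query=$last_user_message)
    bot $answer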
"""

YAML Content:

    models:
      - type: main
        engine: openai
        model: gpt-4

This section declares the main model in the configuration: OpenAI's GPT-4, which is responsible for generating responses.


    instructions:
      - type: general
        content: |
          You are a bot called the Health Policy Assistant...

These instructions define the bot's role and guidelines. They state that the bot assists users by providing information about a health insurance policy, including its terms, conditions, benefits, and exclusions. The bot is also instructed to acknowledge when it doesn't know an answer and to politely decline off-topic questions.

    prompts:
      - task: self_check_input
        content: |
          Your task is to check if the user message is appropriate and relevant...

This prompt asks the model to decide (Yes or No) whether a user's message should be blocked, so that harmful, sensitive, or irrelevant requests never reach the main conversation.

      - task: self_check_output
        content: >
          Model_output: {{ bot_response }}

          Is this output incorrect, misleading, or contains unauthorized financial advice?

          Answer [Yes/No]:

The second prompt validates the bot's response: the model is asked whether the output is incorrect, misleading, or contains unauthorized financial advice.

    rails:
      input:
        flows:
          - self check input

      output:
        flows:
          - self check output

This configuration ensures that both input messages and bot responses go through defined "rails" or checks (self_check_input and self_check_output) to maintain quality and relevance.
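
The order in which the two rails wrap the model call can be sketched in plain Python. This is only a conceptual illustration, not how NeMo Guardrails is implemented: the function run_with_rails and the three toy_* helpers are hypothetical stand-ins, and the real rails ask an LLM to answer the self-check prompts rather than matching keywords.

```python
from typing import Callable

REFUSAL = "I'm sorry, I can't provide that information."


def run_with_rails(
    user_message: str,
    input_allowed: Callable[[str], bool],
    generate: Callable[[str], str],
    output_allowed: Callable[[str], bool],
) -> str:
    """Run one turn through an input check, the model, then an output check."""
    if not input_allowed(user_message):
        return REFUSAL  # input rail blocked the message; the model is never called
    answer = generate(user_message)
    if not output_allowed(answer):
        return REFUSAL  # output rail blocked the generated answer
    return answer


# Hypothetical stand-ins for the LLM-based checks and the main model.
def toy_input_check(message: str) -> bool:
    return "idiot" not in message.lower()


def toy_generate(message: str) -> str:
    return f"Policy answer for: {message}"


def toy_output_check(answer: str) -> bool:
    return "guaranteed returns" not in answer.lower()


print(run_with_rails("What is the waiting period?", toy_input_check, toy_generate, toy_output_check))
```

A blocked input returns the refusal without ever calling the model, which is the main practical benefit of placing a check before generation.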

Colang Content (rag_colang_content):

    # Define allowed user intents
    define user greeting
        "hi"
        "hello"
        "how are you"

This section sets up recognized user intents like greetings and the appropriate bot responses, ensuring the interaction starts smoothly.

Define bot response to off-topic questions

    define user off topic
        "What do you like?"
        ...

Here, the bot is configured to recognize off-topic questions and to answer them with a fixed, polite message that keeps the conversation on health insurance topics.

Define bot response to policy-related queries

    define flow self check input
        $allowed = execute self_check_input
        if not $allowed
            bot response_refusal
            stop

This part sets up a flow where the bot checks if the input is allowed based on earlier configurations. If not, it refuses to proceed, thus ensuring only relevant and appropriate interactions.
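
To make the idea of "example utterances map to an intent, and a flow maps the intent to a bot message" concrete, here is a rough sketch. It assumes plain string similarity from the standard library as the matching rule, which is a simplification chosen for illustration; NeMo Guardrails itself relies on embeddings and the LLM to pick the intent. The names INTENT_EXAMPLES, FLOWS, closest_intent, and respond are hypothetical.

```python
from difflib import SequenceMatcher

# Example utterances per intent, taken from the Colang content.
INTENT_EXAMPLES = {
    "greeting": ["hi", "hello", "how are you"],
    "off topic": [
        "What do you like?",
        "What is the time?",
        "Tell me a joke?",
        "What is your favorite food?",
    ],
}

# Each flow pairs a user intent with a fixed bot message.
FLOWS = {
    "greeting": "Hello, I am the Health Policy Assistant. How can I assist you with your health insurance queries?",
    "off topic": "I'm sorry, but I can only provide information related to the Health Insurance Policy.",
}


def closest_intent(message: str) -> str:
    """Return the intent whose best-matching example is most similar to the message."""
    def score(example: str) -> float:
        return SequenceMatcher(None, message.lower(), example.lower()).ratio()

    return max(INTENT_EXAMPLES, key=lambda intent: max(score(e) for e in INTENT_EXAMPLES[intent]))


def respond(message: str) -> str:
    """Look up the bot message of the flow triggered by the matched intent."""
    return FLOWS[closest_intent(message)]


print(respond("Tell me a joke?"))
```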

In [5]:
from nemoguardrails import LLMRails, RailsConfig
config = RailsConfig.from_content(rag_colang_content, yaml_content)

The snippet above builds a RailsConfig from the Colang and YAML content defined earlier. This configuration holds the model settings, prompts, and flows that constrain the responses the assistant generates.

In [6]:
import os
os.environ["OPENAI_API_KEY"] = "sk-proj-key"

The OpenAI client reads the 'OPENAI_API_KEY' environment variable to authenticate requests to the OpenAI API. The value shown here is a placeholder; substitute your own key.
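
As an alternative to writing the key into the notebook, one option is to prompt for it only when the environment variable is not already set. The helper below is a minimal sketch; ensure_api_key is a hypothetical name, not part of any library used here.

```python
import os
from getpass import getpass


def ensure_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the key from the environment, prompting once if it is missing."""
    key = os.environ.get(env_var)
    if not key:
        key = getpass(f"Enter {env_var}: ")  # typed input is not echoed
        os.environ[env_var] = key
    return key


# Call ensure_api_key() once, before creating any OpenAI client.
```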

In [7]:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-4o"
)

In the above code snippet, you create a ChatOpenAI client configured for OpenAI's GPT-4o model ("gpt-4o"). This instance (llm) sends text inputs to the model and returns its responses, making it useful for applications involving natural language understanding and generation.

In [8]:
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(
    model="text-embedding-3-large"
)

- This instance 'embeddings' is set up to use the "text-embedding-3-large" model.
- This model is specifically designed to generate embeddings for text.
- Embeddings are numerical representations of text that capture semantic meaning, which makes them useful for comparing and analyzing texts based on their content.

In [9]:
app = LLMRails(config,llm=llm)

In this snippet, you are initializing an LLMRails object named app, which integrates your previously defined AI model (llm) with the configuration rules (config). Because llm is passed explicitly, it serves as the main model in place of the gpt-4 entry in the YAML configuration. This integration enables the application to process user inputs and generate responses that adhere to the specified guidelines and constraints.

In [10]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
all_splits = text_splitter.split_documents(docs)

This snippet sets up a text splitter that breaks the loaded pages into smaller chunks of at most 500 characters each (chunk_size=500), with no text shared between consecutive chunks (chunk_overlap=0).
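
The effect of chunk_size and chunk_overlap is easiest to see on a tiny string. The function below is a simplified fixed-window sketch written for illustration; split_text is a hypothetical helper, and RecursiveCharacterTextSplitter additionally tries to break on separators such as paragraphs and spaces, so its chunk boundaries will differ.

```python
def split_text(text: str, chunk_size: int, chunk_overlap: int = 0) -> list[str]:
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # distance between consecutive window starts
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


sample = "abcdefghij"
print(split_text(sample, chunk_size=4))                   # ['abcd', 'efgh', 'ij']
print(split_text(sample, chunk_size=4, chunk_overlap=2))  # ['abcd', 'cdef', 'efgh', 'ghij', 'ij']
```

With an overlap of zero, a sentence that straddles a boundary is cut in two; a small overlap trades some duplicated text for a better chance that each chunk contains the complete sentence.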

In [12]:
from langchain_chroma import Chroma
vectorstore = Chroma.from_documents(documents=all_splits, embedding=embeddings)

In the above code snippet, you're setting up a Chroma vector store, which is essentially a database of text embeddings. The Chroma.from_documents method takes two key inputs:

- documents: A collection of text chunks (all_splits) that have been split from larger documents to manage size and processing constraints.
- embedding: The embeddings object you created earlier, which generates numerical representations (embeddings) of text using a specific OpenAI model.

The purpose of this setup is to store these embeddings in a format that allows for efficient retrieval and comparison. This is particularly useful in applications involving semantic search, where you can quickly find documents that are semantically similar to a query based on the proximity of their embeddings.
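
The retrieval idea described above can be sketched without any external service. The example assumes a toy "embedding" made of word counts and ranks texts by cosine similarity; ToyVectorStore, embed, and cosine are hypothetical names, the sample sentences are invented placeholders rather than text from the policy, and Chroma with OpenAI embeddings works on learned dense vectors instead.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: word counts. A stand-in for a learned embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    def __init__(self, texts: list[str]):
        # Store each text next to its vector, as a vector store does.
        self.items = [(t, embed(t)) for t in texts]

    def search(self, query: str, k: int = 2) -> list[str]:
        """Return the k stored texts most similar to the query."""
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


store = ToyVectorStore([
    "premium must be paid before the due date",
    "claims are settled within thirty days",
    "the policy excludes cosmetic surgery",
])
print(store.search("when is the premium due", k=1))
```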

In [13]:
# Import necessary classes from langchain.chains and langchain.prompts modules.
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate

# Define a string template for generating prompts for the AI model.
# This template instructs the AI to act as a Health Policy Assistant and sets rules for how it should answer questions.
prompt_template = """You are a Health Policy Assistant.
Strictly answer the user questions only based on the given contents of the context and do not rely on external knowledge.

- If you don't know the answer or can't find the answer in the context, just say that you don't know, don't try to make up an answer.
- Provide responses that are factually correct and relevant to the health insurance policy.
- Do not give financial or legal advice beyond the policy's stated terms.
- Keep responses concise and accurate.

{context}

{question}
"""

# Create an instance of PromptTemplate, configuring it with the prompt_template.
# This object will be used to generate formatted prompts for the AI model based on provided context and questions.
PROMPT = PromptTemplate(
    template=prompt_template, input_variables=["context", "question"]
)

# Define a function to create a RetrievalQA chain.
# This function sets up a chain for question answering using the specified LLM and a vector store as a retriever.
def get_qa_chain(llm):
    qa_chain = RetrievalQA.from_chain_type(
        llm=llm, chain_type="stuff", retriever=vectorstore.as_retriever(),
        chain_type_kwargs={"prompt": PROMPT},
    )
    return qa_chain

# Create a QA chain instance by calling get_qa_chain with the LLM from the app instance.
qa_chain = get_qa_chain(app.llm)

# Register this QA chain as an action in the app, allowing it to be invoked with the specified name "qa_chain".
app.register_action(qa_chain, name="qa_chain")
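
The "stuff" chain type places all retrieved chunks into a single prompt. A minimal sketch of that step, assuming the chunks are simply joined with blank lines, is shown below; build_stuff_prompt and the shortened template are hypothetical, and the sample chunks are invented placeholders rather than text from the policy.

```python
PROMPT_TEMPLATE = """Answer only from the context below.

{context}

{question}
"""


def build_stuff_prompt(chunks: list[str], question: str) -> str:
    """Join every retrieved chunk into one context string and fill the template."""
    context = "\n\n".join(chunks)
    return PROMPT_TEMPLATE.format(context=context, question=question)


# Placeholder chunks standing in for what the retriever would return.
retrieved = ["Premium is payable annually.", "Grace period is 30 days."]
print(build_stuff_prompt(retrieved, "How long is the grace period?"))
```

Because everything is sent in one prompt, this approach is simple but only works while the retrieved chunks fit in the model's context window.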


In [14]:
user_message=input("Ask your Question: ")
bot_message = await app.generate_async(messages=[{"role": "user", "content": user_message}])
print(bot_message['content'])

Ask your Question: Tell me about terms and condition?
Certainly! The terms and conditions of a health insurance policy typically outline the rules and guidelines that govern the policy. This can include details about premium payments, coverage periods, information on the claims process, and any exclusions or limitations of the policy. If you have specific questions or need clarification on certain aspects, feel free to ask!


- Input: The script starts by capturing a question from the user via the standard input function.
- Async Generation: It then calls the asynchronous method generate_async on the app instance. This method runs the message through the configured rails: the input check, flow matching and response generation, and the output check. Only the current message is passed, so no earlier conversation turns are included.
- Output: Finally, the script prints out the bot's response to the console.
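
The top-level await used in the cell works because a notebook already runs an event loop. In a plain Python script the coroutine would need to be started explicitly, for example with asyncio.run. The sketch below illustrates this with FakeRails, a hypothetical stand-in that only mimics the call shape of app.generate_async; ask is likewise an illustrative helper.

```python
import asyncio


class FakeRails:
    """Hypothetical stand-in exposing the same generate_async call shape as app."""

    async def generate_async(self, messages):
        last = messages[-1]["content"]
        return {"role": "assistant", "content": f"(echo) {last}"}


async def ask(app, question: str) -> str:
    """Send one user message and return the text of the reply."""
    reply = await app.generate_async(messages=[{"role": "user", "content": question}])
    return reply["content"]


# In a script there is no running event loop, so start one explicitly.
print(asyncio.run(ask(FakeRails(), "What is covered?")))
```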

In [15]:
user_message=input("Ask your Question: ")
bot_message = await app.generate_async(messages=[{"role": "user", "content": user_message}])
print(bot_message['content'])

Ask your Question: What is your favorite food?
I'm sorry, but I can only provide information related to the Health Insurance Policy.
