<a href="https://colab.research.google.com/github/Decoding-Data-Science/zain/blob/main/Copy_of_demo_insurance.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# LlamaParse - Fast Checking of an Insurance Contract for Coverage

<a href="https://colab.research.google.com/github/run-llama/llama_parse/blob/main/examples/demo_insurance.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In this notebook we will look at how LlamaParse can be used to extract structured coverage information from an insurance policy.

## Installation of required packages

In [None]:
!pip install llama-index llama-parse


## Download an insurance policy from IRDAI

The Insurance Regulatory and Development Authority of India (IRDAI) maintains a great resource, https://policyholder.gov.in/web/guest/non-life-insurance-products, where all insurance policies offered in India are publicly available for download. Let's download a complex health insurance policy as an example.

In [None]:
!wget "https://policyholder.gov.in/documents/37343/931203/NBHTGBP22011V012223.pdf/c392bcc1-f6a8-cadd-ab84-495b3273d2c3?version=1.0&t=1669350459879&download=true" -O "./policy.pdf"

--2025-05-13 19:23:12--  https://policyholder.gov.in/documents/37343/931203/NBHTGBP22011V012223.pdf/c392bcc1-f6a8-cadd-ab84-495b3273d2c3?version=1.0&t=1669350459879&download=true
Resolving policyholder.gov.in (policyholder.gov.in)... 13.107.246.73
Connecting to policyholder.gov.in (policyholder.gov.in)|13.107.246.73|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1341586 (1.3M) [application/pdf]
Saving to: ‘./policy.pdf’


2025-05-13 19:23:14 (1.59 MB/s) - ‘./policy.pdf’ saved [1341586/1341586]



## Initializing LlamaIndex and LlamaParse

In [None]:
# llama-parse is async-first; running its sync code in a notebook requires nest_asyncio
import nest_asyncio
nest_asyncio.apply()

In [None]:
import os
from google.colab import userdata
os.environ["LLAMA_CLOUD_API_KEY"] = userdata.get('llama')
os.environ["OPENAI_API_KEY"] = userdata.get('openai')
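The cell above only runs inside Colab, because `google.colab` is not importable elsewhere. Below is a minimal sketch of a fallback, assuming the keys may instead be exported as environment variables. `load_secret` is a hypothetical helper, not part of Colab, LlamaIndex, or LlamaParse.

```python
import os


def load_secret(colab_name: str, env_name: str) -> str:
    """Return a secret from Colab's userdata store, falling back to os.environ."""
    try:
        # Only importable inside a Colab runtime.
        from google.colab import userdata
        value = userdata.get(colab_name)
    except Exception:
        # Not in Colab, or the secret is missing or not accessible.
        value = None
    value = value or os.environ.get(env_name)
    if not value:
        raise RuntimeError(
            f"Missing secret: set {env_name} or the Colab secret {colab_name!r}"
        )
    return value
```

With this helper, the assignment could be written as `os.environ["LLAMA_CLOUD_API_KEY"] = load_secret("llama", "LLAMA_CLOUD_API_KEY")`, and likewise for the OpenAI key.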

In [None]:
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core import VectorStoreIndex
from llama_index.core import Settings

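# For this example, we use the small embedding model and GPT-3.5.
embed_model = OpenAIEmbedding(model="text-embedding-3-small")
llm = OpenAI(model="gpt-3.5-turbo-0125")

Settings.llm = llm
# Register the embedding model too; otherwise the default embedding model is used.
Settings.embed_model = embed_model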

## Vanilla Approach - Parse the Policy with LlamaParse into Markdown

In [None]:
from llama_parse import LlamaParse

documents = LlamaParse(result_type="markdown").load_data("./policy.pdf")

Started parsing the file under job_id 558c92e5-d8a6-45a4-a6dc-91c14e2ed75b


In [None]:
print(documents[0].text[0:1000])

# Bupa niva Health Insurance

# 1. Preamble

This ‘Travel Infinity’ Policy is a contract of insurance between You and Us which is subject to payment of full premium in advance and the terms, conditions and exclusions of this Policy. Expense incurred outside the policy period will NOT be covered. Unutilized Sum Insured will expire at the end of policy year. All applicable benefits, details and limits are mentioned in your Certificate of insurance. We will cover only allopathic treatments in this policy.

# 2. Defined Terms

The terms listed below in this Section and used elsewhere in the Policy in Initial Capitals shall have the meaning set out against them in this Section.

# Standard Definitions

# 2.1 Accident or Accidental

means sudden, unforeseen and involuntary event caused by external, visible and violent means.

# 2.2 Co-payment

means a cost sharing requirement under a health insurance policy that provides that the policyholder/insured will bear a specified percentage of the a

### Markdown Element Node Parser
Our markdown element node parser works well for parsing the markdown output of LlamaParse into a set of table and text nodes.
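To make the table/text split concrete, here is a simplified, pure-Python sketch of the idea. It is not how `MarkdownElementNodeParser` is implemented: `split_markdown_elements` is a hypothetical helper that simply treats consecutive lines starting with `|` as one table and everything else as text.

```python
from typing import List, Optional, Tuple


def split_markdown_elements(markdown: str) -> List[Tuple[str, str]]:
    """Group markdown lines into ("table", text) and ("text", text) elements."""
    elements: List[Tuple[str, str]] = []
    current_kind: Optional[str] = None
    buffer: List[str] = []

    def flush() -> None:
        # Emit the buffered lines as one element, skipping empty buffers.
        text = "\n".join(buffer).strip()
        if text:
            elements.append((current_kind, text))
        buffer.clear()

    for line in markdown.splitlines():
        kind = "table" if line.lstrip().startswith("|") else "text"
        if kind != current_kind:
            # The element type changed: close the previous element.
            flush()
            current_kind = kind
        buffer.append(line)
    flush()
    return elements
```

Calling `split_markdown_elements(documents[0].text)` would give a rough view of where tables sit in the parsed policy. The real parser additionally uses the LLM to summarize each table.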

In [None]:
from llama_index.core.node_parser import MarkdownElementNodeParser

node_parser = MarkdownElementNodeParser(llm=OpenAI(model="gpt-3.5-turbo-0125"), num_workers=8)

In [None]:
nodes = node_parser.get_nodes_from_documents(documents)


In [None]:
base_nodes, objects = node_parser.get_nodes_and_objects(nodes)

recursive_index = VectorStoreIndex(nodes=base_nodes+objects)

In [None]:
query_engine = recursive_index.as_query_engine(similarity_top_k=25)
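The `similarity_top_k=25` argument asks the retriever for the 25 nodes most similar to the query. The sketch below illustrates that idea with cosine similarity over plain Python lists. It is a toy under stated assumptions, not LlamaIndex's implementation: `cosine` and `top_k` are hypothetical helpers, and in the real pipeline the vectors come from the embedding model.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query_vec: Sequence[float],
    node_vecs: Sequence[Sequence[float]],
    k: int,
) -> List[Tuple[int, float]]:
    """Return (node index, score) pairs for the k most similar nodes, best first."""
    scored = [(i, cosine(query_vec, vec)) for i, vec in enumerate(node_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

A larger `k` passes more context to the LLM, which can help when the relevant clauses are scattered across the policy, at the cost of longer prompts.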

### Querying the model for coverage

In [None]:
query_1 = "My trip was delayed and I paid 45, how much am I covered for?"

response_1 = query_engine.query(query_1)
print(str(response_1))

You are covered for the amount mentioned in the Certificate of Insurance for Trip Delay, up to the maximum limit specified.


In [None]:
query_1 = "is my dental checkup covered?"

response_1 = query_engine.query(query_1)
print(str(response_1))

Yes, your dental checkup is covered under the Emergency Dental Treatment benefit of the policy.


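The relevant information is spread across the document, which leads to retrieval issues: the first answer, for example, only points back to the Certificate of Insurance instead of giving an amount. Let's try some parsing instructions to improve the results.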

In [None]:
documents_with_instruction = LlamaParse(result_type="markdown", parsing_instruction="""
This document is an insurance policy.
When a benefit/coverage/exclusion is described in the document, append a line in the following benefits string format (where coverage could be an exclusion).

For {nameofrisk} and in this condition {whenDoesThecoverageApply} the coverage is {coverageDescription}.

If the document contains a benefits TABLE that describes coverage amounts, do not output it as a table, but instead as a list of benefits strings.

""").load_data("./policy.pdf")

Started parsing the file under job_id 2bc99d1f-d902-4435-b5c8-dd582547e45f
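To illustrate the target format, the sketch below renders one benefits-table row as a benefits string, using the template from the instruction. The row values are made up for illustration and are not taken from the policy, and `to_benefit_string` is a hypothetical helper. The actual rewriting is done by LlamaParse from the natural-language instruction, not by code like this.

```python
BENEFIT_TEMPLATE = (
    "For {nameofrisk} and in this condition {whenDoesThecoverageApply} "
    "the coverage is {coverageDescription}."
)


def to_benefit_string(row: dict) -> str:
    """Render one benefits-table row using the template from the parsing instruction."""
    return BENEFIT_TEMPLATE.format(**row)


# Hypothetical row, for illustration only (not taken from the policy).
example_row = {
    "nameofrisk": "Trip Delay",
    "whenDoesThecoverageApply": "the trip is delayed beyond the deductible period",
    "coverageDescription": "the amount stated in the Certificate of Insurance",
}
print(to_benefit_string(example_row))
```

Flattening each table row into a self-contained sentence like this keeps the risk, the condition, and the coverage together in a single chunk, which is what makes it easier to retrieve.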


Let's see how the two parsing results compare (change `target_page` to explore).

In [None]:
pages_vanilla = documents[0].text.split("\n---\n")

In [None]:
print(len(pages_vanilla))  # How many pages are there?

1


In [None]:
pages_vanilla[0]

'# Bupa niva Health Insurance\n\n# 1. Preamble\n\nThis ‘Travel Infinity’ Policy is a contract of insurance between You and Us which is subject to payment of full premium in advance and the terms, conditions and exclusions of this Policy. Expense incurred outside the policy period will NOT be covered. Unutilized Sum Insured will expire at the end of policy year. All applicable benefits, details and limits are mentioned in your Certificate of insurance. We will cover only allopathic treatments in this policy.\n\n# 2. Defined Terms\n\nThe terms listed below in this Section and used elsewhere in the Policy in Initial Capitals shall have the meaning set out against them in this Section.\n\n# Standard Definitions\n\n# 2.1 Accident or Accidental\n\nmeans sudden, unforeseen and involuntary event caused by external, visible and violent means.\n\n# 2.2 Co-payment\n\nmeans a cost sharing requirement under a health insurance policy that provides that the policyholder/insured will bear a specified 

In [None]:
target_page = 0
pages_vanilla = documents[0].text.split("\n---\n")
pages_with_instructions = documents_with_instruction[0].text.split("\n---\n")

print(pages_vanilla[target_page])
print("\n\n=========================================================\n\n")
print(pages_with_instructions[target_page])

# Bupa niva Health Insurance

# 1. Preamble

This ‘Travel Infinity’ Policy is a contract of insurance between You and Us which is subject to payment of full premium in advance and the terms, conditions and exclusions of this Policy. Expense incurred outside the policy period will NOT be covered. Unutilized Sum Insured will expire at the end of policy year. All applicable benefits, details and limits are mentioned in your Certificate of insurance. We will cover only allopathic treatments in this policy.

# 2. Defined Terms

The terms listed below in this Section and used elsewhere in the Policy in Initial Capitals shall have the meaning set out against them in this Section.

# Standard Definitions

# 2.1 Accident or Accidental

means sudden, unforeseen and involuntary event caused by external, visible and violent means.

# 2.2 Co-payment

means a cost sharing requirement under a health insurance policy that provides that the policyholder/insured will bear a specified percentage of the a
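Reading two long printouts side by side is tedious. A minimal sketch of an alternative, using the standard library's `difflib` to show only the lines that differ between the two parses; `diff_parses` is a hypothetical helper introduced here for illustration.

```python
import difflib


def diff_parses(vanilla: str, with_instructions: str, context: int = 1) -> str:
    """Return a unified diff between two parsed versions of the same page."""
    diff = difflib.unified_diff(
        vanilla.splitlines(),
        with_instructions.splitlines(),
        fromfile="vanilla",
        tofile="with_instructions",
        n=context,      # number of unchanged context lines around each change
        lineterm="",    # lines were split without newlines, so add none here
    )
    return "\n".join(diff)
```

For example, `print(diff_parses(pages_vanilla[target_page], pages_with_instructions[target_page]))` would list the lines added by the parsing instruction with a leading `+`.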

In [None]:
node_parser_instruction = MarkdownElementNodeParser(llm=OpenAI(model="gpt-3.5-turbo-0125"), num_workers=8)
nodes_instruction = node_parser_instruction.get_nodes_from_documents(documents_with_instruction)
base_nodes_instruction, objects_instruction = node_parser_instruction.get_nodes_and_objects(nodes_instruction)

recursive_index_instruction = VectorStoreIndex(nodes=base_nodes_instruction+objects_instruction)
query_engine_instruction = recursive_index_instruction.as_query_engine(similarity_top_k=25)


## Comparing Instruction-Augmented Parsing vs. Vanilla Parsing

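When we parse the document with natural-language instructions that add context on insurance coverage, the RAG pipeline can give more specific answers to some queries. In the trip-delay example below, the vanilla pipeline only refers back to the policy schedule, while the instruction-augmented pipeline returns a concrete amount. For other queries, such as the baby food and gauze examples, both pipelines give similar answers.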

In [None]:
query_1 = "My trip was delayed and I paid 45, how much am I covered for?"

response_1 = query_engine.query(query_1)
print("Vanilla:")
print(response_1)

print("With instructions:")
response_1_i = query_engine_instruction.query(query_1)
print(response_1_i)


Vanilla:
You are covered for the amount you paid for the delayed trip, as mentioned in the policy schedule.
With instructions:
You are covered for USD 45 for the delay of your trip.


The policy states in List I that baby food is one of the expenses not covered.

In [None]:
query_2 = "I just had a baby, is baby food covered?"

response_2 = query_engine.query(query_2)
print("Vanilla:")
print(response_2)

print("With instructions:")
response_2_i = query_engine_instruction.query(query_2)
print(response_2_i)

Vanilla:
Baby food is not covered under the policy as it falls under the category of expenses not linked to treatment, such as food and beverages, toiletries, and cosmetics.
With instructions:
Baby food is not covered based on the provided context information.


In [None]:
query_3 = "How is gauze used in my operation covered?"

response_3 = query_engine.query(query_3)
print("Vanilla:")
print(response_3)

print("With instructions:")
response_3_i = query_engine_instruction.query(query_3)
print(response_3_i)

Vanilla:
Gauze used in your operation is covered as part of the medical items and corresponding charges in a healthcare setting. The gauze falls under the category of items that are provided during medical procedures and treatments, and the charges for these items are included in the overall coverage provided by the insurance policy.
With instructions:
Gauze used in your operation would typically fall under the category of medical supplies or materials required for the surgical procedure. The coverage for gauze used in your operation would be included as part of the expenses incurred on treatment, which are covered under the policy.
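The three comparisons above repeat the same pattern. A small helper could run one query against both engines and print the labelled answers. This is a minimal sketch: `compare_engines` is a hypothetical helper, and it only assumes that each engine exposes a `.query(str)` method, as `query_engine` and `query_engine_instruction` do above.

```python
from typing import Any, Dict


def compare_engines(query: str, engines: Dict[str, Any]) -> Dict[str, str]:
    """Run one query against several query engines and print the labelled answers."""
    answers: Dict[str, str] = {}
    for label, engine in engines.items():
        # Each engine only needs a .query(str) method.
        answers[label] = str(engine.query(query))
        print(f"{label}:")
        print(answers[label])
    return answers
```

Usage would look like `compare_engines(query_3, {"Vanilla": query_engine, "With instructions": query_engine_instruction})`. Returning the answers as a dictionary also makes it easy to collect results for several queries.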
