## Extract Text from PDF

In [1]:
import fitz  # PyMuPDF

# Open the PDF file; the context manager closes it once extraction finishes
pdf_document = "policy-booklet-0923.pdf"
with fitz.open(pdf_document) as document:
    # Extract text from each page and join it into a single string
    text = "".join(page.get_text() for page in document)

# Save the text to a file for further processing
with open("extracted_text.txt", "w", encoding="utf-8") as file:
    file.write(text)

print("Text extraction completed successfully.")


Text extraction completed successfully.
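
Text pulled out of a PDF often carries layout residue such as words hyphenated across line breaks, repeated spaces, and long runs of blank lines. The following is a minimal sketch of a cleanup pass that could run before the "further processing" mentioned above. The helper name `clean_extracted_text` and the sample string are hypothetical and not part of the original notebook. Note that re-joining hyphenated line breaks can also merge genuinely hyphenated compounds that happen to fall at a line end.

```python
import re


def clean_extracted_text(raw: str) -> str:
    """Normalize whitespace in text extracted from a PDF (hypothetical helper)."""
    # Re-join words hyphenated across a line break, e.g. "insur-\nance".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)
    # Collapse runs of spaces and tabs into a single space.
    text = re.sub(r"[ \t]+", " ", text)
    # Strip leading and trailing spaces on every line.
    text = "\n".join(line.strip() for line in text.split("\n"))
    # Collapse three or more newlines into a single paragraph break.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()


# Made-up sample input, standing in for the contents of extracted_text.txt.
sample = "Your  policy\tcovers insur-\nance claims.\n\n\n\nSection 1"
cleaned = clean_extracted_text(sample)
print(cleaned)
```

In the notebook itself, the function would presumably be applied to `text` before writing `extracted_text.txt`.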


## Constructing the Dataset

In [2]:
import pandas as pd

# Expanded dataset structure with 30 query-response pairs
data = {
    "query": [
        "What is covered under Section 1?",
        "How to make a claim?",
        "What is DriveSure?",
        "Who is covered to drive other cars?",
        "What is the cover for windscreen damage?",
        "What is not included in my cover?",
        "Does Churchill have approved repairers?",
        "What is the difference between commuting and business use?",
        "Can I use my car abroad?",
        "Are my electric car’s charging cables covered?",
        "Is my electric car battery covered?",
        "What should I do if I need to claim?",
        "How does Churchill handle repairs?",
        "What is the coverage for fire and theft?",
        "What does Motor Legal Cover include?",
        "What is the Guaranteed Hire Car Plus?",
        "What is a courtesy car?",
        "What happens if my car is written off?",
        "What is covered under Personal Benefits?",
        "What is the Uninsured Driver Promise?",
        "What does Vandalism Promise cover?",
        "How are medical expenses covered?",
        "What is the new car replacement policy?",
        "What is the cover for personal belongings?",
        "What should I do if I’m prosecuted for a motoring offence?",
        "How does Churchill handle motor contract disputes?",
        "What does accidental damage cover?",
        "What should I do if my car keys are lost or damaged?",
        "What is the coverage for misfuelling?",
        "What are the territorial limits of the policy?"
    ],
    "response": [
        "Section 1 covers liability to other people, including injuries and property damage caused by an accident involving your car.",
        "To make a claim, call 0345 878 6261. You'll need your personal details, policy number, car registration number, and a description of the loss or damage.",
        "DriveSure is a telematics insurance product that captures driving data to provide feedback and potentially lower premiums based on driving behavior.",
        "Your certificate of motor insurance will show who has cover to drive other cars. This cover is usually limited to third-party liability only.",
        "The policy covers windscreen damage under Section 5. If you use an approved supplier, the cost of repair or replacement is covered.",
        "The policy does not cover mechanical or electrical failure, wear and tear, damage to tyres caused by braking, punctures, cuts or bursts, and breakdowns.",
        "Yes, Churchill customers have access to a national network of approved repairers who handle all aspects of the repair.",
        "Business use covers driving in connection with a business or employment, while commuting covers driving to and from a permanent place of work.",
        "You can use your car abroad, but cover depends on the policy type and destination. You may need a Green Card.",
        "Yes, home chargers and charging cables for electric cars are covered under Section 2 (Fire and Theft) and Section 4 (Accidental Damage).",
        "Your car’s battery is covered if it’s damaged as a result of an insured incident, regardless of whether it's owned or leased.",
        "If you need to claim, call the relevant number provided and have your personal details, policy number, and a description of the incident ready.",
        "Repairs are handled by approved repairers with a 5-year guarantee, or you can choose your own repairer with prior approval from Churchill.",
        "Fire and theft coverage includes repair or replacement of your car if it's damaged by fire, theft, or attempted theft, up to its market value.",
        "Motor Legal Cover includes legal costs for accidents, motor contract disputes, and motoring offences up to £100,000 if included in your policy.",
        "Guaranteed Hire Car Plus provides a hire car similar to yours if your car is damaged, written off, or stolen, up to 21 days.",
        "A courtesy car is a small hatchback provided temporarily while your car is being repaired by an approved repairer.",
        "If your car is written off, Churchill will settle the claim and take ownership of the car. You must provide the registration document.",
        "Personal Benefits cover new car replacement, personal belongings, medical expenses, and personal accident benefits.",
        "The Uninsured Driver Promise ensures that if an uninsured driver hits you, your No Claim Discount is unaffected and excess is refunded.",
        "The Vandalism Promise covers damage caused by vandalism and does not affect your No Claim Discount. A police report is required.",
        "Medical expenses for injuries from an accident are covered up to specified limits, provided no other policy covers these costs.",
        "New car replacement is offered if your car is stolen or written off within a year (or two years for Comprehensive Plus) of purchase.",
        "Personal belongings in the car are covered for loss or damage due to fire, theft, or accident up to specified limits.",
        "Motor Legal Cover provides legal representation for motoring offences, but does not cover parking, obstruction, or waiting offences.",
        "Churchill handles motor contract disputes involving buying or selling your car, or hiring goods or services for your car.",
        "Accidental damage cover includes repair or replacement of your car if it's accidentally damaged, up to its market value.",
        "If your car keys are lost or damaged, Churchill will cover the cost of repair or replacement, including locksmith charges.",
        "Coverage for misfuelling includes damage caused by using the wrong fuel but does not cover the cost of draining and flushing the fuel.",
        "The territorial limits of the policy include Great Britain, Northern Ireland, the Channel Islands, and the Isle of Man."
    ]
}

# Create DataFrame and save to CSV
df = pd.DataFrame(data)
df.to_csv("dataset.csv", index=False)

print("Dataset created and saved successfully.")


Dataset created and saved successfully.
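
Because the query and response lists are typed by hand, it may be worth checking them before building the DataFrame. pandas raises a `ValueError` when the two lists differ in length, but it does not flag empty strings or repeated queries. The sketch below is one possible check. The function name `validate_pairs` and the toy data are hypothetical and not part of the original notebook.

```python
from collections import Counter


def validate_pairs(data: dict) -> list:
    """Return a list of problems found in a {"query": [...], "response": [...]} dict."""
    problems = []
    queries, responses = data["query"], data["response"]

    # The two columns must line up one-to-one.
    if len(queries) != len(responses):
        problems.append(
            f"length mismatch: {len(queries)} queries vs {len(responses)} responses"
        )

    # Flag entries that are empty or contain only whitespace.
    for column, values in (("query", queries), ("response", responses)):
        for i, value in enumerate(values):
            if not value.strip():
                problems.append(f"empty {column} at index {i}")

    # Flag queries that appear more than once (exact matches only).
    for query, count in Counter(queries).items():
        if count > 1:
            problems.append(f"duplicate query ({count}x): {query}")

    return problems


# Deliberately broken toy data to show what the check reports.
toy = {
    "query": ["What is DriveSure?", "What is DriveSure?", " "],
    "response": ["A telematics product.", "A telematics product."],
}
toy_problems = validate_pairs(toy)
for problem in toy_problems:
    print(problem)
```

Calling `validate_pairs(data)` on the dictionary above should return an empty list if the 30 pairs are consistent. The check only catches exact duplicates, so near-duplicate questions (for example, two differently worded claim queries) would still pass.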


## Implementing the RAG Model

In [4]:
from transformers import RagTokenizer, RagRetriever, RagSequenceForGeneration
import torch

# Load the pretrained tokenizer, retriever, and model.
# RagRetriever needs the `datasets` and `faiss` packages to be installed.
# use_dummy_dataset=True loads a small dummy index rather than the full Wikipedia index.
tokenizer = RagTokenizer.from_pretrained("facebook/rag-sequence-nq")
retriever = RagRetriever.from_pretrained(
    "facebook/rag-sequence-nq", index_name="exact", use_dummy_dataset=True
)
# Attach the retriever so the model can fetch documents itself during generation
model = RagSequenceForGeneration.from_pretrained("facebook/rag-sequence-nq", retriever=retriever)

# Example for a single query
query = "What is DriveSure?"
input_dict = tokenizer(query, return_tensors="pt")
generated = model.generate(input_ids=input_dict["input_ids"])
response = tokenizer.batch_decode(generated, skip_special_tokens=True)

print(response)


The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization. 
The tokenizer class you load from this checkpoint is 'RagTokenizer'. 
The class this function is called from is 'DPRQuestionEncoderTokenizer'.
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization. 
The tokenizer class you load from this checkpoint is 'RagTokenizer'. 
The class this function is called from is 'DPRQuestionEncoderTokenizerFast'.
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization. 
The tokenizer class you load from this checkpoint is 'RagTokenizer'. 
The class this function is called from is 'BartTokenizer'.
The tokenizer class you load from this checkpoint is not the

ImportError: 
RagRetriever requires the 🤗 Datasets library but it was not found in your environment. You can install it with:
```
pip install datasets
```
In a notebook or a colab, you can install it by executing a cell with
```
!pip install datasets
```
then restarting your kernel.

Note that if you have a local folder named `datasets` or a local python file named `datasets.py` in your current
working directory, python may try to import this instead of the 🤗 Datasets library. You should rename this folder or
that python file if that's the case. Please note that you may need to restart your runtime after installation.

RagRetriever requires the faiss library but it was not found in your environment. Checkout the instructions on the
installation page of its repo: https://github.com/facebookresearch/faiss/blob/master/INSTALL.md and follow the ones
that match your environment. Please note that you may need to restart your runtime after installation.
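
The traceback above points to two missing packages. A small pre-flight check can surface this before any model weights are loaded. This is a minimal sketch using only the standard library. The helper name `missing_modules` is hypothetical. It also assumes that the import name `faiss` is provided by the `faiss-cpu` (or `faiss-gpu`) package on PyPI.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names required by RagRetriever, according to the error message above.
required = ["datasets", "faiss"]
missing = missing_modules(required)

if missing:
    print("Missing modules:", ", ".join(missing))
    print("Install them, then restart the kernel before re-running the cell.")
else:
    print("All retriever dependencies were found.")
```

As the error message itself warns, a local folder or file named `datasets` would also satisfy this check, so a passing result does not guarantee that the 🤗 Datasets library is the module being found.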
