## Ingesting PDF

In [1]:
%pip install --q unstructured langchain
%pip install --q "unstructured[all-docs]"

Note: you may need to restart the kernel to use updated packages.


In [1]:
from langchain_community.document_loaders import UnstructuredPDFLoader
from langchain_community.document_loaders import OnlinePDFLoader

In [2]:
local_path = "policy-booklet-0923.pdf"

# Load the local PDF if a path is set
if local_path:
    loader = UnstructuredPDFLoader(file_path=local_path)
    data = loader.load()
else:
    print("Upload a PDF file")
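
The cell above only checks that `local_path` is a non-empty string, so a mistyped file name would surface later as a loader error. A minimal sketch of an earlier check, assuming a hypothetical helper named `resolve_pdf_path` (not part of LangChain or the original notebook):

```python
from pathlib import Path


def resolve_pdf_path(path_str: str) -> Path:
    """Return the path if it names an existing .pdf file, otherwise raise."""
    path = Path(path_str)
    # Reject anything that does not look like a PDF by extension.
    if path.suffix.lower() != ".pdf":
        raise ValueError(f"Expected a .pdf file, got: {path.name}")
    # Fail early with a clear message if the file is missing.
    if not path.is_file():
        raise FileNotFoundError(f"PDF not found: {path}")
    return path
```

The returned path could then be passed to the loader as `str(resolve_pdf_path(local_path))`.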

In [3]:
# Preview the text of the first loaded document
data[0].page_content

'Your car insurance policy booklet\n\nWelcome to Churchill\n\nThis booklet tells you about your car insurance\n\nAbout the policy\n\nThe policy is made up of: > This booklet. > Your car insurance details. > Your certificate (or certificates)\n\nof motor insurance.\n\nIf the policy includes Green Flag breakdown cover: > Your breakdown cover and your car\n\ninsurance are part of the same policy. > The policy also includes the Green Flag\n\npolicy booklet we’ve given you.\n\nIf you have a policy that includes DriveSure: > The policy also includes the DriveSure terms\n\nand conditions we’ve given you.\n\nPlease read all these documents carefully and keep them safe in case you need them.\n\nContents\n\nFAQs\n\n3\n\nGlossary\n\n4\n\nMaking a claim\n\n6\n\nWhat your cover includes\n\n8\n\nSection 1: Liability\n\n11\n\nSection 2: Fire and theft\n\n14\n\nSection 3: Courtesy car\n\n17\n\nSection 4: Accidental damage\n\n18\n\nSection 5: Windscreen damage\n\n20\n\nSection 6: Personal benefits\n\n2

## Vector Embeddings

In [4]:
%pip install --q ollama

Note: you may need to restart the kernel to use updated packages.


In [17]:
!ollama pull mistral


In [5]:
!ollama pull nomic-embed-text


In [6]:
!ollama list

NAME                   	ID          	SIZE  	MODIFIED      
nomic-embed-text:latest	0a109f422b47	274 MB	7 seconds ago	
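
The listing above shows only `nomic-embed-text`, while later cells also use `mistral`. A small sketch for checking which required models are absent, assuming the listing text has the same column layout as the output above. `installed_models` and `missing_models` are hypothetical helper names, not part of the Ollama client.

```python
def installed_models(listing: str) -> set[str]:
    """Extract model names (without the ':tag' suffix) from `ollama list` text."""
    names = set()
    # Skip the header row; the first whitespace-separated field is NAME.
    for line in listing.strip().splitlines()[1:]:
        fields = line.split()
        if fields:
            names.add(fields[0].split(":")[0])
    return names


def missing_models(listing: str, required: list[str]) -> list[str]:
    """Return the required models that do not appear in the listing."""
    have = installed_models(listing)
    return [name for name in required if name.split(":")[0] not in have]
```

With the listing shown above, `missing_models(listing, ["mistral", "nomic-embed-text"])` would report `mistral` as still to be pulled.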


In [7]:
%pip install --q chromadb
%pip install --q langchain-text-splitters

Note: you may need to restart the kernel to use updated packages.


In [8]:
from langchain_community.embeddings import OllamaEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma

In [9]:
# Split the loaded documents into overlapping chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=7500, chunk_overlap=100)
chunks = text_splitter.split_documents(data)
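
To make `chunk_size` and `chunk_overlap` concrete, here is a simplified sketch of overlapping chunking. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter prefers to break on separators such as paragraphs and lines); `chunk_text` is a hypothetical fixed-window function used only to show how the two parameters interact.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into fixed-size windows that share `chunk_overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1))
# prints ['abcd', 'defg', 'ghij']
```

Each chunk repeats the last character of the previous one, which is the role the 100-character overlap plays in the cell above.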

In [10]:
# Embed the chunks and add them to the vector database
vector_db = Chroma.from_documents(
    documents=chunks,
    embedding=OllamaEmbeddings(model="nomic-embed-text", show_progress=True),
    collection_name="local-rag",
)

OllamaEmbeddings: 100%|██████████| 15/15 [04:50<00:00, 19.35s/it]
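
Once the chunks are embedded, retrieval comes down to finding the stored vectors closest to the query vector. The sketch below illustrates that idea with cosine similarity over plain Python lists. The choice of cosine similarity and the helper names are assumptions for illustration; Chroma uses its own index and configurable distance metric.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], doc_vecs: list[list[float]], k: int = 2) -> list[int]:
    """Return the indices of the k vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), i) for i, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:k]]


# Toy 2-dimensional "embeddings"; real embedding vectors have many more dimensions.
print(top_k([1.0, 0.0], [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]], k=2))
# prints [1, 2]
```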


## Retrieval

In [11]:
from langchain.prompts import ChatPromptTemplate, PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_community.chat_models import ChatOllama
from langchain_core.runnables import RunnablePassthrough
from langchain.retrievers.multi_query import MultiQueryRetriever

In [18]:
# LLM from Ollama
local_model = "mistral"
llm = ChatOllama(model=local_model)

In [19]:
QUERY_PROMPT = PromptTemplate(
    input_variables=["question"],
    template="""You are an AI language model assistant. Your task is to generate five
    different versions of the given user question to retrieve relevant documents from
    a vector database. By generating multiple perspectives on the user question, your
    goal is to help the user overcome some of the limitations of the distance-based
    similarity search. Provide these alternative questions separated by newlines.
    Original question: {question}""",
)
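
The prompt above asks the model for several rewordings of one question. A retriever built on it can then search once per rewording and merge the hits without duplicates. The sketch below shows that merge step in isolation, assuming a hypothetical `search` callable that maps a question to a list of chunk texts. It is an illustration of the idea, not the `MultiQueryRetriever` source.

```python
from typing import Callable


def multi_query_retrieve(
    questions: list[str],
    search: Callable[[str], list[str]],
) -> list[str]:
    """Run every question through `search` and return the unique hits in order."""
    seen = set()
    merged = []
    for question in questions:
        for chunk in search(question):
            # Keep the first occurrence of each chunk, drop later repeats.
            if chunk not in seen:
                seen.add(chunk)
                merged.append(chunk)
    return merged


# Toy stand-in for a vector store lookup (hypothetical data).
fake_index = {
    "What is excluded?": ["chunk A", "chunk B"],
    "What does the policy not cover?": ["chunk B", "chunk C"],
}
print(multi_query_retrieve(list(fake_index), lambda q: fake_index[q]))
# prints ['chunk A', 'chunk B', 'chunk C']
```

Merging this way lets a differently worded question pull in chunks that a single similarity search would have missed.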

In [20]:
retriever = MultiQueryRetriever.from_llm(
    vector_db.as_retriever(), 
    llm,
    prompt=QUERY_PROMPT
)

# RAG prompt
template = """Answer the question based ONLY on the following context:
{context}
Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

In [21]:
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
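
In the chain above, the retriever's output (a list of document objects) is passed into `{context}` as is, so the prompt likely receives the list's string representation, metadata included. A common refinement is to join only the text of each document first. The sketch below uses a hypothetical `Doc` dataclass as a stand-in for LangChain's document type so that it runs on its own.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document with a text field."""
    page_content: str


def format_docs(docs) -> str:
    """Join the text of each retrieved document, separated by blank lines."""
    return "\n\n".join(doc.page_content for doc in docs)


print(format_docs([Doc("Section 1: Liability"), Doc("Section 2: Fire and theft")]))
```

In the notebook this could be wired in as `{"context": retriever | format_docs, "question": RunnablePassthrough()}`, assuming LangChain coerces the plain function into a runnable step as its expression language normally does.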

In [23]:
chain.invoke("What’s not included in my cover?")

OllamaEmbeddings: 100%|██████████| 1/1 [00:05<00:00,  5.52s/it]
OllamaEmbeddings: 100%|██████████| 1/1 [00:02<00:00,  2.23s/it]
OllamaEmbeddings: 100%|██████████| 1/1 [00:02<00:00,  2.17s/it]
OllamaEmbeddings: 100%|██████████| 1/1 [00:02<00:00,  2.30s/it]
OllamaEmbeddings: 100%|██████████| 1/1 [00:02<00:00,  2.36s/it]


" Here are some things that are not covered under your policy:\n\n1. Losses that aren't directly due to your keys being lost or damaged, such as loss of use or earnings (Section 4: Accidental damage)\n2. Any other losses covered under another section of this policy for the same incident, such as dents to the bodywork (Windscreen damage)\n3. Damage caused by vandalism (this may be covered under Section 4: Accidental Damage)\n4. Reduction in your car’s market value because of lost keys (Windscreen damage)\n5. Losses not directly due to a car accident, such as medical expenses that are already covered by another insurance policy (Personal benefits)\n6. Any losses while the policy is under investigation for an accident caused by an uninsured driver (Uninsured Driver Promise)\n7. Draining, flushing or replacing the fuel if the wrong fuel is put in your car (Misfuelling)\n8. Losses that are not covered within the sections of this policy (Section 6: Personal benefits)"

In [25]:
chain.invoke("How much am I covered for?")

OllamaEmbeddings: 100%|██████████| 1/1 [00:05<00:00,  5.27s/it]
OllamaEmbeddings: 100%|██████████| 1/1 [00:02<00:00,  2.16s/it]
OllamaEmbeddings: 100%|██████████| 1/1 [00:02<00:00,  2.15s/it]
OllamaEmbeddings: 100%|██████████| 1/1 [00:02<00:00,  2.26s/it]
OllamaEmbeddings: 100%|██████████| 1/1 [00:02<00:00,  2.19s/it]


" In the Guaranteed Hire Car Plus section, if your car can be repaired and is driveable, coverage begins when your car goes in for repair:\n\n1. If you use their approved repairer, until they have repaired your car.\n2. If you use your own repairer, for up to 21 days in a row while they're repairing your car.\n\nIf your car can be repaired and is not driveable, coverage begins as soon as you've confirmed that they can start the repair:\n\n1. If you use their approved repairer, until they have repaired your car.\n2. If you use your own repairer, for up to 21 days in a row while they're repairing your car.\n\nIf your car is written off or stolen and not recovered, coverage will be provided for the shorter of these two periods:\n\n1. Up to 21 days in a row.\n2. Up to 5 days after their first (or only) payment has been issued to settle your claim.\n\nIn case they cannot provide you with a hire car, they'll repay your travel costs up to £50 per day, up to a total of £500 per claim if:\n\n1.