***

## Extractive Question Answering with Ollama and Chroma

This tutorial guides you through building an extractive question answering system. Given a local PDF document, the system retrieves the passages most relevant to your question and answers from them. It combines large language models (LLMs) and vector embeddings:

* **LLMs:** We'll use Ollama, a tool for running LLMs locally, to generate multiple variations of your question and to answer it based on the retrieved passages.
* **Vector Embeddings:** These are numerical representations of text that capture semantic relationships between words. We'll create embeddings for the PDF content using Ollama models, allowing for efficient retrieval of relevant passages.
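To make the idea of embeddings concrete, here is a minimal sketch of cosine similarity, a measure commonly used to compare embedding vectors. The three-dimensional vectors are made up for illustration and were not produced by any model; real embedding models produce vectors with hundreds of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings (hand-written, not produced by any real model).
toy_embeddings = {
    "fetal heart rate": [0.9, 0.1, 0.0],
    "heartbeat of the fetus": [0.8, 0.2, 0.1],
    "stock market report": [0.0, 0.1, 0.9],
}

query = toy_embeddings["fetal heart rate"]
for text, vector in toy_embeddings.items():
    print(f"{text!r}: {cosine_similarity(query, vector):.3f}")
```

Texts with related meaning should score close to 1, while unrelated texts score close to 0.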

**Steps:**

1. **Environment Setup and Library Installation:** We'll install the necessary Python libraries for working with text data, interacting with Ollama, and managing vector embeddings.
2. **Loading the PDF Document:** We'll load the PDF document you want to extract information from using a dedicated library.
3. **Vector Embedding Generation:** We'll break down the PDF text into smaller chunks and create numerical embeddings (using Ollama) that capture the meaning of the content.
4. **Building a Vector Database:** We'll store these embeddings in a Chroma vector database for efficient retrieval later.
5. **Multi-Query Retrieval with Ollama:** To enhance the retrieval process, we'll use an LLM to generate multiple versions of your question, aiming to capture different aspects of your intent. These variations will then be used to search the vector database for potentially relevant passages.
6. **Answering User Queries:** Finally, we'll leverage an LLM to answer your questions based on the retrieved passages from the PDF document.

**Implementation Details:**

In the following sections, we'll delve into the code for each step, explaining the purpose of each line and how it contributes to building the overall system. Feel free to follow along and experiment with the code to customize it for your specific needs!

***


## Ollama: A User-Friendly Tool for Local Large Language Models (LLMs)

Ollama is a tool that makes it easier to run powerful large language models (LLMs) directly on your own computer. It offers a streamlined way to download, run, and manage these models in your local environment, and it can be used from Python through client libraries and integrations such as the LangChain classes used in this tutorial.

**Key Features of Ollama:**

* **Local Execution:** Run LLMs directly on your machine, which improves data privacy and reduces reliance on cloud services. Once a model is downloaded, you can also use it offline.
* **Flexible Model Handling:** Ollama supports a range of open-source LLMs and makes it easy to switch between them in your projects.
* **Simple API:** Ollama's clear interface lets you download, run, and manage models with minimal code.
* **Extensibility:** Ollama is designed to accommodate additional models and functionality in the future.

**Typical Use Cases of Ollama:**

* **Text Generation:** Create various forms of text like poems, code, scripts, etc.
* **Text Summarization:** Condense long text passages into concise summaries.
* **Question Answering:** Provide answers based on a knowledge base.
* **Machine Translation:** Translate text between languages.
* **Research and Experimentation:** Explore LLM capabilities and applications.

**Getting Started with Ollama:**

1. **Visit the Ollama website:** Go to https://ollama.com/
2. **Download and Installation:**  Find the download section and follow the specific instructions for your operating system (Windows, Linux, macOS). 
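Once Ollama is installed and running, it can be queried over HTTP on the local machine. The following is a minimal sketch using only the Python standard library. It assumes the server listens on the default address `http://localhost:11434`, that the `/api/generate` endpoint is available, and that a model named `phi3` (the one used later in this tutorial) has already been pulled. The helper names `build_payload` and `generate` are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed default local address

def build_payload(model, prompt):
    """Build the JSON body for a single, non-streaming generation request."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model, prompt, url=OLLAMA_URL, timeout=120):
    """Send the prompt to a running Ollama server and return the generated text."""
    body = json.dumps(build_payload(model, prompt)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]

if __name__ == "__main__":
    try:
        print(generate("phi3", "Explain vector embeddings in one sentence."))
    except OSError as error:  # raised, for example, when the server is not running
        print(f"Could not reach Ollama: {error}")
```

The rest of this tutorial does not call the HTTP API directly; it goes through LangChain's Ollama integrations instead.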

**Advantages of Using Ollama:**

* **Reduced Costs:** Local execution can be more cost-effective than cloud-based LLM services.
* **Offline Capabilities:** Work with LLMs without an internet connection.
* **Greater Control:** Increased control over the model's environment and data processing for customization and security.

**Additional Resources:**

* **Ollama GitHub Repository:** [https://github.com/ollama/ollama](https://github.com/ollama/ollama)



## Step 1: Environment Setup and Library Installation

In this first step, we'll get our environment ready and install the necessary libraries. The main tools we'll be using are:

* **unstructured and langchain:** For working with unstructured text data (like PDFs) and building language processing pipelines.
* **chromadb:** For creating and managing the vector database to store text embeddings.
* **langchain-text-splitters:** For dividing long text (from the PDF) into smaller chunks for efficient processing.
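* **ollama:** For running the embedding model and the LLM locally. In this tutorial it is used through the `ollama` command line and LangChain's Ollama integrations.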

In [1]:
# Uncomment the lines below to install the dependencies from within the notebook.
# %pip install -q unstructured langchain langchain-community
# %pip install -q "unstructured[all-docs]"
# %pip install -q chromadb
# %pip install -q langchain-text-splitters

In [2]:
from langchain_community.document_loaders import UnstructuredPDFLoader
from langchain_community.document_loaders import OnlinePDFLoader
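If an import fails, a quick check like the following can show which modules are missing. This is a minimal sketch; the module list is an assumption based on the imports used in this tutorial, and `missing_modules` is a hypothetical helper.

```python
import importlib.util

# Import names (not pip package names) that this tutorial relies on.
REQUIRED_MODULES = [
    "unstructured",
    "langchain",
    "langchain_community",
    "langchain_text_splitters",
    "chromadb",
]

def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are importable.")
```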

## Step 2: Loading the PDF Document 

In this step, we'll load the PDF document that we want to extract information from. We'll use `UnstructuredPDFLoader` from `langchain_community`, which relies on the `unstructured` package we installed earlier to parse the file.



In [3]:
local_path = "./data/paper.pdf"

# Load the PDF from the local file system
if local_path:
    loader = UnstructuredPDFLoader(file_path=local_path)
    data = loader.load()
else:
    print("Set local_path to a PDF file before continuing")

In [4]:
# Preview the extracted text (by default the loader returns the whole PDF as one document)
data[0].page_content
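Before embedding, it can help to check how much text was actually extracted. The sketch below summarizes any list of objects that expose `page_content` and `metadata`, as loaded documents do. The `summarize_documents` helper and the stand-in document are hypothetical and only there so the example runs without a PDF.

```python
from types import SimpleNamespace

def summarize_documents(docs, preview_chars=200):
    """Return one summary dict per document: length, source, and a short preview."""
    summaries = []
    for index, doc in enumerate(docs):
        text = doc.page_content
        summaries.append({
            "index": index,
            "characters": len(text),
            "source": doc.metadata.get("source", "unknown"),
            "preview": text[:preview_chars],
        })
    return summaries

# Stand-in object with the same two attributes as a loaded document.
fake_data = [
    SimpleNamespace(
        page_content="Example text extracted from a PDF.",
        metadata={"source": "./data/paper.pdf"},
    )
]
for summary in summarize_documents(fake_data):
    print(summary)
```

In the notebook, the same function could be called as `summarize_documents(data)`.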



## Step 3: Vector Embedding Generation

Now we'll transform the text content of the PDF into vector embeddings. These numerical representations capture the semantic meaning of the text, allowing us to efficiently find relevant passages when answering questions.



In [32]:
## Download the Ollama embedding model
!ollama pull nomic-embed-text 


In [23]:
!ollama list

NAME                   	ID          	SIZE  	MODIFIED      
nomic-embed-text:latest	0a109f422b47	274 MB	2 minutes ago	


In [7]:
from langchain_community.embeddings import OllamaEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma

In [8]:
# Split the text into smaller chunks (the chunk size needs tuning by trial and error)
text_splitter = RecursiveCharacterTextSplitter(chunk_size=7500, chunk_overlap=100)
chunks = text_splitter.split_documents(data)
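To illustrate what `chunk_size` and `chunk_overlap` mean, here is a simplified sketch that cuts a string into fixed-size character windows. It is not how `RecursiveCharacterTextSplitter` works internally: that splitter first tries to break on separators such as paragraph breaks and newlines, so its chunks will generally differ. The `chunk_text` helper is hypothetical.

```python
def chunk_text(text, chunk_size, chunk_overlap):
    """Split text into fixed-size character windows that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    pieces = []
    for start in range(0, len(text), step):
        pieces.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return pieces

sample = "abcdefghijklmnopqrstuvwxyz"
for piece in chunk_text(sample, chunk_size=10, chunk_overlap=3):
    print(piece)
```

Each window repeats the last three characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk when the overlap is large enough.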

In [9]:
# Generate embeddings and add them to the vector database
vector_db = Chroma.from_documents(
    documents=chunks, 
    embedding=OllamaEmbeddings(model="nomic-embed-text", show_progress=True),
    collection_name="local-rag"
)

OllamaEmbeddings: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:13<00:00,  2.78s/it]
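Conceptually, a vector database stores each chunk alongside its embedding and returns the chunks whose embeddings are closest to the query embedding. The sketch below shows that idea with a brute-force scan over hand-written three-dimensional vectors. `ToyVectorStore` is a hypothetical illustration, not Chroma's implementation; a real vector database typically uses an index rather than comparing against every stored vector.

```python
import math

class ToyVectorStore:
    """A tiny in-memory stand-in for a vector database (illustration only)."""

    def __init__(self):
        self._items = []  # list of (unit-length vector, text) pairs

    @staticmethod
    def _normalize(vector):
        length = math.sqrt(sum(x * x for x in vector))
        return [x / length for x in vector]

    def add(self, vector, text):
        self._items.append((self._normalize(vector), text))

    def search(self, query_vector, k=2):
        """Return the k stored texts whose vectors are most similar to the query."""
        query = self._normalize(query_vector)
        scored = [
            (sum(q * v for q, v in zip(query, vector)), text)
            for vector, text in self._items
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in scored[:k]]

store = ToyVectorStore()
store.add([0.9, 0.1, 0.0], "chunk about heart rate")
store.add([0.1, 0.9, 0.0], "chunk about contractions")
store.add([0.0, 0.1, 0.9], "chunk about references")
print(store.search([1.0, 0.2, 0.0], k=2))
```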


## Step 4: Multi-Query Retrieval with Ollama

To make the question-answering process more robust, we'll use an LLM to generate multiple variations of the user's question. This helps capture different ways of expressing the same intent, improving our chances of finding relevant passages in the PDF.
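The core of multi-query retrieval can be sketched in a few lines: run a search for each phrasing of the question and keep the unique results. In this sketch the LLM and the vector database are replaced by hypothetical stand-in functions (`fake_variants` and `fake_search`). The sketch also searches with the original question, which is a choice made here for illustration rather than a statement about LangChain's default behavior.

```python
def multi_query_retrieve(question, generate_variants, search):
    """Run a search for the question and each variant, returning unique results."""
    queries = [question] + list(generate_variants(question))
    seen = set()
    results = []
    for query in queries:
        for passage in search(query):
            if passage not in seen:  # keep the first occurrence only
                seen.add(passage)
                results.append(passage)
    return results

# Hypothetical stand-ins for the LLM and the vector database.
def fake_variants(question):
    return [f"What is {question}?", f"Explain {question}."]

def fake_search(query):
    index = {
        "CTG": ["passage A"],
        "What is CTG?": ["passage A", "passage B"],
        "Explain CTG.": ["passage C", "passage B"],
    }
    return index.get(query, [])

print(multi_query_retrieve("CTG", fake_variants, fake_search))
```

Here two of the three queries return overlapping passages, and the final list contains each passage once.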



In [10]:
from langchain.prompts import ChatPromptTemplate, PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_community.chat_models import ChatOllama
from langchain_core.runnables import RunnablePassthrough
from langchain.retrievers.multi_query import MultiQueryRetriever

In [11]:
QUERY_PROMPT = PromptTemplate(
    input_variables=["question"],
    template="""You are an AI language model assistant. Your task is to generate five
    different versions of the given user question to retrieve relevant documents from
    a vector database. By generating multiple perspectives on the user question, your
    goal is to help the user overcome some of the limitations of the distance-based
    similarity search. Provide these alternative questions separated by newlines.
    Original question: {question}""",
)
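The prompt asks for the alternative questions to be separated by newlines so that the model's reply can be split into individual queries. The sketch below shows one way to do that splitting on a hand-written reply. `parse_generated_queries` is a hypothetical helper, and the sample text is not real model output.

```python
def parse_generated_queries(llm_output):
    """Split raw LLM text into a list of non-empty, stripped lines."""
    return [line.strip() for line in llm_output.splitlines() if line.strip()]

# Hypothetical model reply, written by hand for this example.
raw_output = """What does CTG stand for?

How is CTG used during labor?
  What information does a CTG trace provide?  """

for query in parse_generated_queries(raw_output):
    print(query)
```

Dropping blank lines and surrounding whitespace matters because small models do not always follow the requested format exactly.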

In [24]:
# Downloading the 'phi3' LLM from Ollama 
!ollama pull phi3


In [25]:
# Load the LLM for query generation

local_model = "phi3"
llm = ChatOllama(model=local_model)

In [26]:
# Setup the retrieval process

retriever = MultiQueryRetriever.from_llm(
    vector_db.as_retriever(), 
    llm,
    prompt=QUERY_PROMPT
)

# RAG prompt
template = """Answer the question based ONLY on the following context:
{context}
Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)
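To see what the model will receive, the sketch below fills the same template text with plain `str.format`. The `format_docs` helper is hypothetical: it joins the text of the retrieved documents into one readable block. The chain in this tutorial passes the retriever output to the prompt as is, so adding such a helper would be an optional refinement. The stand-in passages are made up.

```python
from types import SimpleNamespace

TEMPLATE = """Answer the question based ONLY on the following context:
{context}
Question: {question}
"""

def format_docs(docs):
    """Join the text of each retrieved document with blank lines in between."""
    return "\n\n".join(doc.page_content for doc in docs)

# Stand-ins for retrieved documents (hypothetical text).
retrieved = [
    SimpleNamespace(page_content="First retrieved passage."),
    SimpleNamespace(page_content="Second retrieved passage."),
]

filled = TEMPLATE.format(context=format_docs(retrieved), question="What is discussed?")
print(filled)
```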

### Chain Creation:

We build the question-answering chain with LangChain:

* **Retrieval:** The chain starts with `{"context": retriever, "question": RunnablePassthrough()}`. The `retriever` (from Step 4) receives the user's question and returns the relevant passages, while `RunnablePassthrough` forwards the question unchanged.

* **Context and Question to LLM:** The retrieved context and the original question are inserted into the `prompt` template we defined.

* **Answer Generation:** The LLM reads the filled-in prompt and generates an answer.

* **Output Parsing:** `StrOutputParser` extracts the answer from the LLM's response as a plain text string.


In [27]:
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
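The `|` syntax can look unusual at first. As a rough mental model, each step wraps a function, and `a | b` produces a new step that runs `a` and feeds its result to `b`. The toy `Step` class below illustrates that idea with plain Python. It is a hypothetical sketch and not LangChain's implementation, and every stand-in step is made up.

```python
class Step:
    """Wrap a function so that steps can be chained with the | operator."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # Run this step first, then feed its result to the next step.
        return Step(lambda value: other.func(self.func(value)))

    def invoke(self, value):
        return self.func(value)

# Hypothetical stand-ins for the retriever, prompt, LLM, and output parser.
toy_gather = Step(lambda q: {"context": f"[passages for: {q}]", "question": q})
toy_prompt = Step(lambda d: f"Context: {d['context']}\nQuestion: {d['question']}")
toy_llm = Step(lambda text: {"content": text.upper()})
toy_parser = Step(lambda message: message["content"])

toy_chain = toy_gather | toy_prompt | toy_llm | toy_parser
print(toy_chain.invoke("what is ctg"))
```

The first step plays the role of the dictionary in the real chain: it turns a single question into both a context and a question for the prompt.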

In [28]:
#chain.invoke(input(""))


In [29]:
chain.invoke("Tell me about CTG?")


OllamaEmbeddings: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:04<00:00,  4.87s/it]
OllamaEmbeddings: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:02<00:00,  2.11s/it]
OllamaEmbeddings: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:02<00:00,  2.10s/it]
OllamaEmbeddings: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:02<00:00,  2.11s/it]
OllamaEmbeddings: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:02<00:00,  

"Continuous Fetal Heart Rate (FHR) monitoring, often represented by a Cardiotocography (CTG), is an essential tool in prenatal care that records the fetal heart rate and uterine contractions during labor. The purpose of CTG analysis includes identifying signs of fetal distress or well-beiting, allowing healthcare providers to make timely decisions regarding interventions such as delivery method (e.g., vaginal birth vs. cesarean section).\n\nA CTG trace consists of a graph with two lines: one for the FHR and another for uterine contractions. The interpretation of these traces involves analyzing both baseline patterns and specific features called 'modes.' Some key modes include:\n\n1. Baseline rate: This is the average heart rate range in beats per minute (bpm) during a 10-minute window. Normal ranges are between 120 to 160 bpm. Abnormal rates, such as bradycardia (<120 bpm) or tachycardia (>160 bpm), may indicate fetal distress.\n\n2. Baseline variability: This reflects the fluctuation 

In [35]:
# Delete the collection from the vector database
vector_db.delete_collection()