# RAG ICD System

Based on the LlamaIndex low-level ingestion and retrieval example: https://docs.llamaindex.ai/en/stable/examples/low_level/oss_ingestion_retrieval.html

## Libs

In [1]:
%pip install llama-index llama-hub huggingface_hub llama-cpp-python PyMuPDF --quiet

Note: you may need to restart the kernel to use updated packages.


## Embeddings

In [2]:
# sentence transformers
from llama_index.embeddings import HuggingFaceEmbedding

embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en")


In [3]:
from llama_index.llms import LlamaCPP

# model_url = "https://huggingface.co/TheBloke/Llama-2-7B-chat-GGML/resolve/main/llama-2-7b-chat.ggmlv3.q4_0.bin"
model_url = "https://huggingface.co/TheBloke/Llama-2-7B-chat-GGUF/resolve/main/llama-2-7b-chat.Q4_0.gguf"

llm = LlamaCPP(
    # You can pass in the URL to a GGUF model to download it automatically
    model_url=model_url,
    # optionally, you can set the path to a pre-downloaded model instead of model_url
    model_path=None,
    temperature=0.1,
    max_new_tokens=256,
    # llama2 has a context window of 4096 tokens, but we set it lower to allow for some wiggle room
    context_window=3900,
    # kwargs to pass to __call__()
    generate_kwargs={},
    # kwargs to pass to __init__()
    # set to at least 1 to use GPU
    model_kwargs={"n_gpu_layers": 1},
    verbose=True,
)

llama_model_loader: loaded meta data with 19 key-value pairs and 291 tensors from /Users/tilmankerl/Library/Caches/llama_index/models/llama-2-7b-chat.Q4_0.gguf (version GGUF V2)
llama_model_loader: - tensor    0:                token_embd.weight q4_0     [  4096, 32000,     1,     1 ]
llama_model_loader: - tensor    1:           blk.0.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor    2:            blk.0.ffn_down.weight q4_0     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor    3:            blk.0.ffn_gate.weight q4_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor    4:              blk.0.ffn_up.weight q4_0     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor    5:            blk.0.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor    6:              blk.0.attn_k.weight q4_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor    7:         blk.0.attn_output.weigh

In [4]:
from llama_index import ServiceContext

service_context = ServiceContext.from_defaults(
    llm=llm, embed_model=embed_model
)
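
The `LlamaCPP` settings above deliberately stay below Llama 2's 4096-token window. A rough sketch of the arithmetic, assuming the prompt (query plus retrieved context) and the generated tokens must together fit inside `context_window`; the helper name `prompt_budget` is hypothetical and not part of LlamaIndex:

```python
def prompt_budget(context_window: int, max_new_tokens: int) -> int:
    """Tokens left for the prompt after reserving room for the reply."""
    if max_new_tokens >= context_window:
        raise ValueError("max_new_tokens must be smaller than context_window")
    return context_window - max_new_tokens


MODEL_WINDOW = 4096     # Llama 2 context window
CONTEXT_WINDOW = 3900   # value passed to LlamaCPP above
MAX_NEW_TOKENS = 256    # value passed to LlamaCPP above

headroom = MODEL_WINDOW - CONTEXT_WINDOW
budget = prompt_budget(CONTEXT_WINDOW, MAX_NEW_TOKENS)
print(f"headroom={headroom} tokens, prompt budget={budget} tokens")
```

With these values, retrieved chunks and the question together should stay under roughly 3644 tokens, which is worth keeping in mind when choosing `chunk_size` and `similarity_top_k`.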

## Data

In [5]:
from pathlib import Path
from llama_hub.file.pymu_pdf.base import PyMuPDFReader

In [13]:
# loader = PyMuPDFReader()
# documents = loader.load(file_path="../data/xxx.pdf")
from llama_index import download_loader

SimpleCSVReader = download_loader("SimpleCSVReader")

loader = SimpleCSVReader(encoding="utf-8")
documents = loader.load_data(file=Path('../data/icd11.csv'))

In [8]:
from llama_index.node_parser.text import SentenceSplitter

In [14]:
text_parser = SentenceSplitter(
    chunk_size=1024,
    # separator=" ",
)

In [16]:
text_chunks = []
# maintain relationship with source doc index, to help inject doc metadata when building nodes
doc_idxs = []
for doc_idx, doc in enumerate(documents):
    cur_text_chunks = text_parser.split_text(doc.text)
    text_chunks.extend(cur_text_chunks)
    doc_idxs.extend([doc_idx] * len(cur_text_chunks))
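
The loop above keeps `text_chunks` and `doc_idxs` as parallel lists, so position `i` in one always describes position `i` in the other. A self-contained sketch of the same bookkeeping on toy data; `split_fixed` is a hypothetical stand-in that is far cruder than `SentenceSplitter`, and the `toy_` names are chosen so that nothing in the notebook is overwritten:

```python
from typing import List


def split_fixed(text: str, chunk_size: int) -> List[str]:
    """Hypothetical stand-in splitter: fixed-width character chunks."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


toy_docs = ["aaaaabbbbbcc", "dddd"]
toy_chunks: List[str] = []
toy_doc_idxs: List[int] = []
for doc_idx, doc_text in enumerate(toy_docs):
    cur_chunks = split_fixed(doc_text, chunk_size=5)
    toy_chunks.extend(cur_chunks)
    # one source-document index per chunk, in the same order
    toy_doc_idxs.extend([doc_idx] * len(cur_chunks))

# the two lists must stay the same length for the later lookup to work
assert len(toy_chunks) == len(toy_doc_idxs)
print(list(zip(toy_doc_idxs, toy_chunks)))
```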


In [17]:
from llama_index.schema import TextNode

nodes = []
for idx, text_chunk in enumerate(text_chunks):
    node = TextNode(
        text=text_chunk,
    )
    src_doc = documents[doc_idxs[idx]]
    node.metadata = src_doc.metadata
    nodes.append(node)


In [19]:
from tqdm import tqdm

for node in tqdm(nodes):
    node_embedding = embed_model.get_text_embedding(
        node.get_content(metadata_mode="all")
    )
    node.embedding = node_embedding

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
100%|██████████| 2714/2714 [01:58<00:00, 22.97it/s]
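
The tokenizers warning above names its own remedy: set `TOKENIZERS_PARALLELISM` explicitly. A minimal sketch, assuming it is run early (for example in the first cell) so that the variable is already set before any later cell forks a subprocess:

```python
import os

# Explicitly disable tokenizer parallelism so forked processes
# (such as `!pip` shell calls) no longer trigger the warning.
os.environ["TOKENIZERS_PARALLELISM"] = "false"
print(os.environ["TOKENIZERS_PARALLELISM"])
```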


In [None]:
!pip install psycopg2-binary pgvector asyncpg "sqlalchemy[asyncio]" greenlet


Collecting psycopg2-binary
  Downloading psycopg2_binary-2.9.9-cp311-cp311-macosx_11_0_arm64.whl.metadata (4.4 kB)
Collecting pgvector
  Downloading pgvector-0.2.4-py2.py3-none-any.whl.metadata (9.8 kB)
Collecting asyncpg
  Downloading asyncpg-0.29.0-cp311-cp311-macosx_11_0_arm64.whl.metadata (4.4 kB)
Downloading psycopg2_binary-2.9.9-cp311-cp311-macosx_11_0_arm64.whl (2.6 MB)
Downloading pgvector-0.2.4-py2.py3-none-any.whl (9.6 kB)
Downloading asyncpg-0.29.0-cp311-cp311-macosx_11_0_arm64.whl (638 kB)
Installing collected packages: psycopg2-binary, pgvector, asyncpg
Successfully installed asyncpg-0.29.0 pgvector-0.2.4 psycopg2-binary-2.9.9


In [None]:
# CREATE ROLE docrag WITH LOGIN PASSWORD 'rag-adl-llama';
# ALTER ROLE docrag SUPERUSER;

In [None]:
# https://github.com/pgvector/pgvector
# cd /tmp
# git clone --branch v0.5.1 https://github.com/pgvector/pgvector.git
# cd pgvector
# make
# make install # may need sudo
# CREATE EXTENSION vector;

In [20]:
import psycopg2

db_name = "vector_db"
host = "localhost"
password = "rag-adl-llama"
port = "5432"
user = "docrag"
# conn = psycopg2.connect(connection_string)
conn = psycopg2.connect(
    dbname="postgres",
    host=host,
    password=password,
    port=port,
    user=user,
)
conn.autocommit = True

with conn.cursor() as c:
    c.execute(f"DROP DATABASE IF EXISTS {db_name}")
    c.execute(f"CREATE DATABASE {db_name}")
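
The cell above interpolates `db_name` directly into the SQL text. That is fine for a hard-coded name, but it would be unsafe for anything user-supplied. A sketch of a simple guard, assuming only plain unquoted identifiers are needed (letters, digits and underscores, not starting with a digit, at most 63 characters); `safe_identifier` is a hypothetical helper, and psycopg2's `sql.Identifier` is the more general tool when composing statements with psycopg2:

```python
import re

# Conservative subset of valid unquoted PostgreSQL identifiers.
_IDENT_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,62}")


def safe_identifier(name: str) -> str:
    """Return `name` unchanged if it is a plain identifier, else raise."""
    if not _IDENT_RE.fullmatch(name):
        raise ValueError(f"unsafe SQL identifier: {name!r}")
    return name


print(safe_identifier("vector_db"))
```

In the notebook this could wrap the name before formatting, for example `f"CREATE DATABASE {safe_identifier(db_name)}"`.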


In [21]:
from sqlalchemy import make_url
from llama_index.vector_stores import PGVectorStore

vector_store = PGVectorStore.from_params(
    database=db_name,
    host=host,
    password=password,
    port=port,
    user=user,
    table_name="icd11",
    embed_dim=384,  # embedding dimension of BAAI/bge-small-en
)
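
`embed_dim` has to match the length of the vectors produced by the embedding model, otherwise the mismatch tends to surface only when rows are inserted. A small sketch of a pre-flight check on toy vectors; `check_embed_dim` is a hypothetical helper, not a LlamaIndex function:

```python
from typing import Sequence


def check_embed_dim(vectors: Sequence[Sequence[float]], expected_dim: int) -> None:
    """Raise if any vector's length differs from the table's dimension."""
    for i, vec in enumerate(vectors):
        if len(vec) != expected_dim:
            raise ValueError(
                f"vector {i} has dimension {len(vec)}, expected {expected_dim}"
            )


# toy vectors standing in for node embeddings
check_embed_dim([[0.0] * 384, [1.0] * 384], expected_dim=384)
print("all vectors match the expected dimension")
```

In this notebook the same check could be run as `check_embed_dim([n.embedding for n in nodes], 384)` before calling `vector_store.add(nodes)`.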

In [22]:
vector_store.add(nodes)

['d21af6c9-51c1-4d15-8f79-edaf7ad7a8dd',
 'e5501080-0f44-4da3-ae7d-3ec0e50bafb9',
 '55bb5198-0f5f-49cd-beac-3442e9ca6e26',
 '13105d64-374c-4409-a125-9e522a0f0fff',
 '9672ab76-4278-46f4-90f0-092578fa1cd7',
 'b0074e46-ea43-4053-adec-05b02b2771a4',
 'bad46fe8-6a87-4346-856b-27403440f31e',
 '83d0dcc8-c63c-4c34-84cc-befd385708fd',
 'e56c5225-d96a-4434-af5e-77f35d712863',
 'eea570d6-02a0-4558-a7e6-9119ef3c4847',
 '88a16c8a-b681-4887-9e34-0f73abf92225',
 'c0ae9638-e414-47b0-9fc7-7bda8e50eb54',
 '8e112d6c-1a33-4c9c-8e02-2920b76fe539',
 '2e5f6117-b22e-4a34-9fbb-15cc6149c919',
 'cba24e85-2aad-4093-bc0e-9670d966331f',
 'c4004901-e611-47d5-808a-e070c258ff06',
 '77604b7b-1e5d-46b8-b295-a587c0609e94',
 '58b9a715-a73a-4853-81a3-7daa2913b9cb',
 '74345907-3607-4bf4-8936-5323527b0456',
 'fd6f62a8-4ea0-4f66-8cda-747df6c6add9',
 'c71858bd-bd0b-45fe-899b-7e727b934736',
 '3aa495b1-d330-4b8f-bd36-3c361a058b8c',
 '25126d51-5402-4b72-9a9c-d1e70987b3b2',
 '4e4e8fc6-50cb-433d-a1d5-2707a8ade83c',
 '18e3f493-0a51-

In [23]:
query_str = "I have been having a lot of headaches lately"

In [24]:
query_embedding = embed_model.get_query_embedding(query_str)

In [25]:
# construct vector store query
from llama_index.vector_stores import VectorStoreQuery

query_mode = "default"
# query_mode = "sparse"
# query_mode = "hybrid"

vector_store_query = VectorStoreQuery(
    query_embedding=query_embedding, similarity_top_k=2, mode=query_mode
)
# returns a VectorStoreQueryResult
query_result = vector_store.query(vector_store_query)
print(query_result.nodes[0].get_content())

It resolves spontaneously within three days in the absence of further consumption., Key Not found
Medication-overuse headache, Headache occurring on 15 or more days per month developing as a consequence of regular overuse of acute or symptomatic headache medication (on 10 or more or 15 or more days per month, depending on the medication) for more than three months. It usually, but not invariably, resolves after the overuse is stopped., ['MOH - [Medication-overuse headache]']
Opioid-overuse headache, Medication-overuse headache caused by regular overuse of one or more opioids for more than three months. It usually, but not invariably, resolves after the overuse is stopped., Key Not found
Combination analgesic-overuse headache, Medication-overuse headache caused by regular overuse of one or more analgesic formulations combining drugs of two or more classes, each with analgesic effect or acting as adjuvants, for more than three months. It usually, but not invariably, resolves after the ov
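
Conceptually, a dense ("default") query ranks the stored vectors by similarity to the query embedding and keeps the best `similarity_top_k`. A pure-Python sketch of that idea on toy 3-dimensional vectors, assuming cosine similarity as the metric for illustration; in the notebook the actual ranking happens inside Postgres via pgvector, and the `toy_` names and helper functions here are hypothetical:

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float], vectors: Sequence[Sequence[float]], k: int
) -> List[Tuple[int, float]]:
    """Return (index, score) pairs for the k most similar vectors."""
    scored = [(i, cosine(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


toy_vectors = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]
toy_query = [1.0, 0.0, 0.0]
toy_hits = top_k(toy_query, toy_vectors, k=2)
print(toy_hits)
```

The returned scores play the same role as `query_result.similarities` in the cells that follow.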

In [27]:
from llama_index.schema import NodeWithScore
from typing import Optional

nodes_with_scores = []
for index, node in enumerate(query_result.nodes):
    score: Optional[float] = None
    if query_result.similarities is not None:
        score = query_result.similarities[index]
    nodes_with_scores.append(NodeWithScore(node=node, score=score))


In [28]:
from llama_index import QueryBundle
from llama_index.retrievers import BaseRetriever
from typing import Any, List


class VectorDBRetriever(BaseRetriever):
    """Retriever over a postgres vector store."""

    def __init__(
        self,
        vector_store: PGVectorStore,
        embed_model: Any,
        query_mode: str = "default",
        similarity_top_k: int = 2,
    ) -> None:
        """Init params."""
        self._vector_store = vector_store
        self._embed_model = embed_model
        self._query_mode = query_mode
        self._similarity_top_k = similarity_top_k
        super().__init__()

    def _retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
        """Retrieve."""
        # use the objects passed to __init__, not the notebook globals
        query_embedding = self._embed_model.get_query_embedding(
            query_bundle.query_str
        )
        vector_store_query = VectorStoreQuery(
            query_embedding=query_embedding,
            similarity_top_k=self._similarity_top_k,
            mode=self._query_mode,
        )
        query_result = self._vector_store.query(vector_store_query)

        nodes_with_scores = []
        for index, node in enumerate(query_result.nodes):
            score: Optional[float] = None
            if query_result.similarities is not None:
                score = query_result.similarities[index]
            nodes_with_scores.append(NodeWithScore(node=node, score=score))

        return nodes_with_scores


In [29]:
retriever = VectorDBRetriever(
    vector_store, embed_model, query_mode="default", similarity_top_k=2
)

In [30]:
from llama_index.query_engine import RetrieverQueryEngine

query_engine = RetrieverQueryEngine.from_args(
    retriever, service_context=service_context
)

In [31]:
query_str = "I have been having a lot of headaches lately"

response = query_engine.query(query_str)


llama_print_timings:        load time =   12962.43 ms
llama_print_timings:      sample time =      42.81 ms /   256 runs   (    0.17 ms per token,  5980.33 tokens per second)
llama_print_timings: prompt eval time =   26682.89 ms /  2405 tokens (   11.09 ms per token,    90.13 tokens per second)
llama_print_timings:        eval time =   19101.28 ms /   255 runs   (   74.91 ms per token,    13.35 tokens per second)
llama_print_timings:       total time =   46449.16 ms


In [32]:
print(str(response))


Based on the information provided in the code, it seems that you may be experiencing medication overuse headache. This type of headache is caused by regular overuse of acute or symptomatic headache medication for more than three months. It usually, but not invariably, resolves after the overuse is stopped.
It's important to note that headaches can have many causes, including medication overuse, and it's always best to consult with a healthcare professional for proper diagnosis and treatment. They may recommend changes to your medication regimen or other treatments to help manage your headaches.
In the meantime, there are some things you can try at home to help relieve your headaches:
1. Keep a headache diary to track when your headaches occur, how severe they are, and what you were doing before they started. This can help you identify any patterns or triggers.
2. Stay hydrated by drinking plenty of water throughout the day. Dehydration can cause or worsen headaches.
3. Avoid triggers 

In [33]:
print(response.source_nodes[0].get_content())

It resolves spontaneously within three days in the absence of further consumption., Key Not found
Medication-overuse headache, Headache occurring on 15 or more days per month developing as a consequence of regular overuse of acute or symptomatic headache medication (on 10 or more or 15 or more days per month, depending on the medication) for more than three months. It usually, but not invariably, resolves after the overuse is stopped., ['MOH - [Medication-overuse headache]']
Opioid-overuse headache, Medication-overuse headache caused by regular overuse of one or more opioids for more than three months. It usually, but not invariably, resolves after the overuse is stopped., Key Not found
Combination analgesic-overuse headache, Medication-overuse headache caused by regular overuse of one or more analgesic formulations combining drugs of two or more classes, each with analgesic effect or acting as adjuvants, for more than three months. It usually, but not invariably, resolves after the ov

In [39]:
# close postgres connection
conn.close()