# Pharmaceutical LLM RAG

##### In this notebook we are testing the following models to analyze the performance of answering different pharmaceutical question:
 - Asi mini
 - Llama 3(locally)
 - ChatGPT 3.5

For each model a langchain is built to give the models context of different drugs but we are also testing the models without RAG to analyse the performance.

For testing **Giskard** is used to generate from pdfs questions and answers which are going to be compared against the output of the models mentioned above. Giskard used GPT 3.5 to analyse Asi mini and Llama 3, and GPT 4 to analyse GPT 3.5

Afterwards query translations are used for one of the models(Asi mini) to see how it enhances the performance of the given tasks. The following querry translations methods are used:
- Multi query
- RAG-Fusion
- Decomposition
- Step Back
- HyDE

##### Imports & Variables

In [1]:
import os
import requests
import pandas as pd

from operator import itemgetter
from pathlib import Path

from langchain_openai import OpenAIEmbeddings
from langchain_openai.chat_models import ChatOpenAI
from langchain_ollama import OllamaLLM 
from langchain_community.vectorstores import Chroma
from langchain_community.document_loaders import PyPDFLoader
from langchain.prompts import ChatPromptTemplate
from langchain.load import dumps, loads
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_core.prompts import FewShotChatMessagePromptTemplate
from langchain_core.runnables import RunnableLambda
from langchain_core.output_parsers import StrOutputParser

import giskard
from giskard.rag import KnowledgeBase
from giskard.rag import generate_testset
from giskard.rag import evaluate
from giskard.rag import QATestset
from giskard.rag.question_generators import ComplexQuestionsGenerator

from dotenv import load_dotenv
load_dotenv()

ASI_ONE_KEY = os.getenv("ASI_ONE_KEY")

In [2]:
PDF_PATH = "pdfs_100"
VECTORSTORE_PATH = "./chroma_db_100"
TESTSET_PATH = "test-set-100.jsonl"

MODEL = "asi1-mini"
MODEL_LLAMA3 = "llama3"
MODEL_CHATGPT_3_5 = "gpt-3.5-turbo"

SEARCH_KWARGS = 3
NUM_QUESTIONS = 30

### Pre-processing ###

In [3]:
full_documents = []

pdf_folder = Path(PDF_PATH)
pdf_files = list(pdf_folder.glob("*.pdf"))

for filepath in pdf_files:
    try:
        loader = PyPDFLoader(filepath)
        docs = loader.load()
        full_documents.extend(docs)
        print(f"Parsed {filepath} with {len(docs)} chunks.")
    except Exception as e:
        print(f"Error parsing {filepath}: {e}")

Parsed pdfs_100/doc_87.pdf with 16 chunks.
Parsed pdfs_100/doc_52.pdf with 14 chunks.
Parsed pdfs_100/doc_46.pdf with 12 chunks.
Parsed pdfs_100/doc_5.pdf with 21 chunks.
Parsed pdfs_100/doc_90.pdf with 3 chunks.
Parsed pdfs_100/doc_14.pdf with 11 chunks.
Parsed pdfs_100/doc_27.pdf with 5 chunks.
Parsed pdfs_100/doc_11.pdf with 5 chunks.
Parsed pdfs_100/doc_57.pdf with 28 chunks.
Parsed pdfs_100/doc_85.pdf with 3 chunks.
Parsed pdfs_100/doc_66.pdf with 13 chunks.
Parsed pdfs_100/doc_63.pdf with 10 chunks.
Parsed pdfs_100/doc_55.pdf with 3 chunks.
Parsed pdfs_100/doc_8.pdf with 3 chunks.
Parsed pdfs_100/doc_54.pdf with 19 chunks.
Parsed pdfs_100/doc_64.pdf with 23 chunks.
Parsed pdfs_100/doc_68.pdf with 15 chunks.
Parsed pdfs_100/doc_34.pdf with 3 chunks.
Parsed pdfs_100/doc_82.pdf with 4 chunks.
Parsed pdfs_100/doc_58.pdf with 20 chunks.
Parsed pdfs_100/doc_98.pdf with 3 chunks.
Parsed pdfs_100/doc_74.pdf with 3 chunks.
Parsed pdfs_100/doc_15.pdf with 5 chunks.
Parsed pdfs_100/doc_38.p

In [4]:
len(full_documents)

1337

In [6]:
full_documents[0].page_content



### Split

In [7]:
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=300, 
    chunk_overlap=50)

splits = text_splitter.split_documents(full_documents)

In [8]:
len(splits)

3483

### Index with Retrieval

In [9]:
### Create vectorstore only if needed
create = False

if create:
    print("Creating vectorstore")
    vectorstore = Chroma.from_documents(documents=splits, 
                                        embedding=OpenAIEmbeddings(model="text-embedding-3-small"),
                                        persist_directory=VECTORSTORE_PATH
    )
    vectorstore.persist()
else:
    print("Using existing vector store")
    embedding_function = OpenAIEmbeddings(model="text-embedding-3-small")
    vectorstore = Chroma(
        persist_directory=VECTORSTORE_PATH,
        embedding_function=embedding_function
    )

Using existing vector store


  vectorstore = Chroma(


In [10]:
retriever = vectorstore.as_retriever(search_kwargs={"k": SEARCH_KWARGS})

### Models

In [10]:
template = """
    Answer the given question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

In [11]:
template_rag = """
Answer the question only based on the context below use only the given knowledge from the context do not invent or add details that you were treained. 
If you dont know the answer say that you dont know.

Context: {context}

Question: {question}
"""
prompt_rag = ChatPromptTemplate.from_template(template_rag)

#### Asi One Mini RAG

In [91]:
def call_asi_one(prompt):
    if hasattr(prompt, "to_string"):
        prompt = prompt.to_string()

    url = "https://api.asi1.ai/v1/chat/completions"
    headers = {
        'Content-Type': 'application/json',
        'Authorization': f'Bearer {ASI_ONE_KEY}'
    }
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,
        "max_tokens": 1000
    }
    response = requests.post(url, headers=headers, json=payload)
    return response.json().get("choices", [{}])[0].get("message", {}).get("content", "No response")

In [92]:
chain_asi_rag = ({
    "context": itemgetter("question") | vectorstore.as_retriever(),
    "question": itemgetter("question"),
}
| prompt_rag
| call_asi_one
| StrOutputParser()
)

#### Asi One Mini


In [38]:
chain_asi = ({
    "question": itemgetter("question"),
}
| prompt
| call_asi_one
| StrOutputParser()
)

#### Llama 3 RAG


In [72]:
model_llama = OllamaLLM(model=MODEL_LLAMA3)

chain_llama3_rag = (
    {
        "context": itemgetter("question") | vectorstore.as_retriever(),
        "question": itemgetter("question"),
    }
    | prompt_rag
    | model_llama
    | StrOutputParser()
)

#### LLama 3

In [73]:
chain_llama3 = ({
    "question": itemgetter("question"),
}
| prompt
| model_llama
| StrOutputParser()
)

#### Gpt 3.5 RAG

In [41]:
model_gpt_3_5 = ChatOpenAI(model=MODEL_CHATGPT_3_5)

chain_gpt_3_5_rag = (
    {
        "context": itemgetter("question") | vectorstore.as_retriever(),
        "question": itemgetter("question"),
    }
    | prompt_rag
    | model_gpt_3_5
    | StrOutputParser()
)

#### Gpt 3.5

In [48]:
chain_gpt_3_5 = ({
    "question": itemgetter("question"),
}
| prompt
| model_gpt_3_5
| StrOutputParser()
)

### Testing models (Giskard)

#### Preparint testsets

In [14]:
giskard.llm.set_llm_model("gpt-3.5-turbo")
giskard.llm.set_embedding_model("text-embedding-3-small")

In [15]:
len(full_documents)

1337

In [106]:
df1 = pd.DataFrame([d.page_content for d in full_documents[:100]], columns=["text"])
knowledge_base1 = KnowledgeBase(df1)

In [107]:
df1

Unnamed: 0,text
0,DAPSONE- dapsone gel \n \nMayne Pharma\n------...
1,5 WARNINGS AND PRECAUTIONS\n5.1 Methemoglobine...
2,"If there is no improvement after 12 weeks, tre..."
3,dapsone treatment. No events of peripheral neu...
4,vehicle controlled trials are presented in Tab...
...,...
95,vascular disease and/or renal disease should b...
96,"Hydrochlorothiazide, a sulfonamide, can cause ..."
97,Surgery/Anesthesia:\nIn patients undergoing su...
98,Laboratory Tests\nThe hydrochlorothiazide comp...


In [37]:
df2 = pd.DataFrame([d.page_content for d in full_documents[25:50]], columns=["text"])
knowledge_base2 = KnowledgeBase(df2)

In [38]:
df3 = pd.DataFrame([d.page_content for d in full_documents[:10]], columns=["text"])
knowledge_base3 = KnowledgeBase(df3)

In [108]:
generate_tests = True

if generate_tests:
    print("Create new test set")
    testset = generate_testset(
        knowledge_base=knowledge_base1,
        num_questions=15,
        question_generators = ComplexQuestionsGenerator(),
        agent_description="A chatbot answering questions about medicine drugs based on a given context",
    )

    testset.save("test-set-1.jsonl")
else:
    print("Using existing test set")
    testset = QATestset.load("test-set-1.jsonl")

Create new test set
2025-06-16 14:23:22,262 pid:77331 MainThread giskard.rag  INFO     Finding topics in the knowledge base.




2025-06-16 14:23:32,706 pid:77331 MainThread giskard.rag  INFO     Found 7 topics in the knowledge base.


Generating questions:   0%|          | 0/15 [00:00<?, ?it/s]

In [None]:
generate_tests_gpt_4 = False

if generate_tests_gpt_4:
    print("Create new test set")
    giskard.llm.set_llm_model("gpt-4o")
    testset_gpt4o = generate_testset(
        knowledge_base=knowledge_base,
        num_questions=NUM_QUESTIONS,
        question_generators = ComplexQuestionsGenerator(),
        agent_description="A chatbot answering questions about medicine drugs based on a given context",
    )

    testset_gpt4o.save("test-set-gpt-4o.jsonl")
else:
    print("Using existing test set")
    testset = QATestset.load("test-set-gpt-4o.jsonl")

Create new test set


Generating questions:   0%|          | 0/110 [00:00<?, ?it/s]

#### Asi mini RAG

In [109]:
def answer_fn_asi_rag(question, history=None):
    return chain_asi_rag.invoke({"question": question})

In [110]:
report = evaluate(answer_fn_asi_rag, testset=testset)
display(report)

Asking questions to the agent:   0%|          | 0/15 [00:00<?, ?it/s]

CorrectnessMetric evaluation:   0%|          | 0/15 [00:00<?, ?it/s]

In [37]:
report.to_html("report-asi-mini-rag-100_1.html")

In [111]:
failures = report.get_failures()
failures

Unnamed: 0_level_0,question,reference_answer,reference_context,conversation_history,metadata,agent_answer,correctness,correctness_reason
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
6b6098f9-7f17-4a5d-b744-a9b905d92242,What are the potential adverse reactions most ...,The most common adverse reactions associated w...,Document 42: LOPRESSOR- metoprolol tartrate ta...,[],"{'question_type': 'complex', 'seed_document_id...",According to the prescribing information provi...,False,The agent provided a detailed list of adverse ...
936fa8dd-ddcf-48d8-8180-1feb75f70e27,What are the inactive ingredients present in t...,The inactive ingredients in the diluting solut...,Document 28: Inactive Ingredients\nIngredient ...,[],"{'question_type': 'complex', 'seed_document_id...",The inactive ingredients in the diluting solut...,False,The agent did not mention lactose as one of th...
9d65a570-7d66-4a28-a283-ee1c7e23179f,What are some of the metabolic disturbances th...,Hydrochlorothiazide may alter glucose toleranc...,"Document 96: Hydrochlorothiazide, a sulfonamid...",[],"{'question_type': 'complex', 'seed_document_id...","Based on the provided context, hydrochlorothia...",False,The agent provided additional information abou...
b7990c46-bc99-4f6e-8539-0e6b5e9605fa,What specific health conditions or situations ...,You should consult a doctor before using this ...,"Document 67: 2 days, is accompanied or followe...",[],"{'question_type': 'complex', 'seed_document_id...","According to the context provided, you should ...",False,The agent provided additional information that...
ffe1b4b9-34df-48b9-b432-617209cf91ff,What are the inactive ingredients present in t...,The inactive ingredients in the 100mg metoprol...,Document 60: Product Type\nHUMAN PRESCRIPTION ...,[],"{'question_type': 'complex', 'seed_document_id...","Based on the provided context, the inactive in...",False,The agent missed some inactive ingredients pre...


In [84]:
failures.iloc[40].agent_answer

'No response'

In [36]:
report.correctness_by_question_type()

Unnamed: 0_level_0,correctness
question_type,Unnamed: 1_level_1
complex,0.318182


#### Asi Mini

In [39]:
def answer_fn_asi(question, history=None):
    return chain_asi.invoke({"question": question})

In [40]:
report = evaluate(answer_fn_asi, testset=testset, knowledge_base=knowledge_base)
display(report)

Asking questions to the agent:   0%|          | 0/110 [00:00<?, ?it/s]

CorrectnessMetric evaluation:   0%|          | 0/110 [00:00<?, ?it/s]

#### Llama 3 RAG

In [69]:
def answer_fn_llama3_rag(question, history=None):
    return chain_llama3_rag.invoke({"question": question})

In [70]:
report = evaluate(answer_fn_llama3_rag, testset=testset, knowledge_base=knowledge_base)
display(report)

Asking questions to the agent:   0%|          | 0/2 [00:00<?, ?it/s]

CorrectnessMetric evaluation:   0%|          | 0/2 [00:00<?, ?it/s]

#### Llama 3

In [74]:
def answer_fn_llama3(question, history=None):
    return chain_llama3.invoke({"question": question})

In [75]:
report = evaluate(answer_fn_llama3, testset=testset, knowledge_base=knowledge_base)
display(report)

Asking questions to the agent:   0%|          | 0/2 [00:00<?, ?it/s]

CorrectnessMetric evaluation:   0%|          | 0/2 [00:00<?, ?it/s]

#### Gpt 3.5 RAG

In [43]:
def answer_fn_chain_gpt_3_5_rag(question, history=None):
    return chain_gpt_3_5_rag.invoke({"question": question})

In [44]:
report = evaluate(answer_fn_chain_gpt_3_5_rag, testset=testset_gpt4o, knowledge_base=knowledge_base)
display(report)

Asking questions to the agent:   0%|          | 0/110 [00:00<?, ?it/s]

CorrectnessMetric evaluation:   0%|          | 0/110 [00:00<?, ?it/s]

In [45]:
report.to_html("report-gpt-3-5-rag-100.html")

#### Gpt 3.5

In [49]:
def answer_fn_chain_gpt_3_5(question, history=None):
    return chain_gpt_3_5.invoke({"question": question})

In [50]:
report = evaluate(answer_fn_chain_gpt_3_5, testset=testset_gpt4o, knowledge_base=knowledge_base)
display(report)

Asking questions to the agent:   0%|          | 0/110 [00:00<?, ?it/s]

CorrectnessMetric evaluation:   0%|          | 0/110 [00:00<?, ?it/s]

In [51]:
report.to_html("report-gpt-3-5-100.html")

## Asi Mini with query translations

#### Multi query

In [90]:
template = """You are an AI language model assistant. Your task is to generate five 
different versions of the given user question to retrieve relevant documents from a vector 
database. By generating multiple perspectives on the user question, your goal is to help
the user overcome some of the limitations of the distance-based similarity search. 
Provide these alternative questions separated by newlines. Original question: {question}
"""

prompt_perspectives = ChatPromptTemplate.from_template(template)

generate_queries = (
    prompt_perspectives 
    | call_asi_one 
    | StrOutputParser() 
    | (lambda x: x.split("\n"))
)

def get_unique_union(documents: list[list]):
    flattened_docs = [dumps(doc) for sublist in documents for doc in sublist]
    unique_docs = list(set(flattened_docs))
    return [loads(doc) for doc in unique_docs]

retrieval_chain = generate_queries | retriever.map() | get_unique_union

chain_multiquery = (
    {"context": retrieval_chain, 
     "question": itemgetter("question")} 
    | prompt_rag
    | call_asi_one
    | StrOutputParser()
)

In [91]:
def answer_fn_chain_multiquery(question, history=None):
    return chain_multiquery.invoke({"question": question})

In [92]:
report = evaluate(answer_fn_chain_multiquery, testset=testset, knowledge_base=knowledge_base)
display(report)

Asking questions to the agent:   0%|          | 0/2 [00:00<?, ?it/s]

  return [loads(doc) for doc in unique_docs]


CorrectnessMetric evaluation:   0%|          | 0/2 [00:00<?, ?it/s]

#### RAG-Fusion

In [95]:
template = """You are a helpful assistant that generates multiple search queries based on a single input query. \n
Generate multiple search queries related to: {question} \n
Output (4 queries):"""
prompt_rag_fusion = ChatPromptTemplate.from_template(template)

generate_queries = (
    prompt_rag_fusion 
    | call_asi_one
    | StrOutputParser() 
    | (lambda x: x.split("\n"))
)

def reciprocal_rank_fusion(results: list[list], k=60):
    fused_scores = {}

    for docs in results:
        for rank, doc in enumerate(docs):
            doc_str = dumps(doc)
            if doc_str not in fused_scores:
                fused_scores[doc_str] = 0
            previous_score = fused_scores[doc_str]
            fused_scores[doc_str] += 1 / (rank + k)

    reranked_results = [
        (loads(doc), score)
        for doc, score in sorted(fused_scores.items(), key=lambda x: x[1], reverse=True)
    ]

    return reranked_results

retrieval_chain_rag_fusion = generate_queries | retriever.map() | reciprocal_rank_fusion

chain_rag_fusion = (
    {"context": retrieval_chain_rag_fusion, 
     "question": itemgetter("question")} 
    | prompt_rag
    | call_asi_one
    | StrOutputParser()
)

In [96]:
def answer_fn_chain_rag_fusion(question, history=None):
    return chain_rag_fusion.invoke({"question": question})

In [97]:
report = evaluate(answer_fn_chain_rag_fusion, testset=testset, knowledge_base=knowledge_base)
display(report)

Asking questions to the agent:   0%|          | 0/2 [00:00<?, ?it/s]

CorrectnessMetric evaluation:   0%|          | 0/2 [00:00<?, ?it/s]

#### Decomposition

In [99]:
template = """You are a helpful assistant that generates multiple sub-questions related to an input question. \n
The goal is to break down the input into a set of sub-problems / sub-questions that can be answers in isolation. \n
Generate multiple search queries related to: {question} \n
Output (3 queries):"""
prompt_decomposition = ChatPromptTemplate.from_template(template)

template = """Here is the question you need to answer:

\n --- \n {question} \n --- \n

Here is any available background question + answer pairs:

\n --- \n {q_a_pairs} \n --- \n

Here is additional context relevant to the question: 

\n --- \n {context} \n --- \n

Use the above context and any background question + answer pairs to answer the question: \n {question}
"""

decomposition_prompt = ChatPromptTemplate.from_template(template)

def format_qa_pair(question, answer):
    formatted_string = ""
    formatted_string += f"Question: {question}\nAnswer: {answer}\n\n"
    return formatted_string.strip()

generate_queries_decomposition = (
    prompt_decomposition 
    | call_asi_one 
    | StrOutputParser() 
    | (lambda x: x.split("\n"))
)

def answer_fn_chain_decomposition(question, history=None):
    questions = generate_queries_decomposition.invoke({"question": question})
    
    q_a_pairs = ""
    
    for q in questions:
        rag_chain = (
            {
                "context": itemgetter("question") | retriever,
                "question": itemgetter("question"),
                "q_a_pairs": itemgetter("q_a_pairs")
            }
            | decomposition_prompt
            | call_asi_one
            | StrOutputParser()
        )
        
        answer = rag_chain.invoke({"question": q, "q_a_pairs": q_a_pairs})
        q_a_pair = format_qa_pair(q, answer)
        q_a_pairs = q_a_pairs + "\n---\n" + q_a_pair
    
    chain_decomposition = (
        {
            "context": itemgetter("question") | retriever,
            "question": itemgetter("question"),
            "q_a_pairs": itemgetter("q_a_pairs")
        }
        | decomposition_prompt
        | call_asi_one
        | StrOutputParser()
    )
    
    return chain_decomposition.invoke({"question": question, "q_a_pairs": q_a_pairs})

In [100]:
report = evaluate(answer_fn_chain_decomposition, testset=testset, knowledge_base=knowledge_base)
display(report)

Asking questions to the agent:   0%|          | 0/2 [00:00<?, ?it/s]

CorrectnessMetric evaluation:   0%|          | 0/2 [00:00<?, ?it/s]

#### Step Back

In [107]:
examples = [
    {
        "input": "Can I take ibuprofen if I’m pregnant?",
        "output": "What are the risks of taking ibuprofen during pregnancy?",
    },
    {
        "input": "Will Paracetamol interact with my blood pressure medication?",
        "output": "What drug interactions involve Paracetamol?",
    }]

example_prompt = ChatPromptTemplate.from_messages([
    ("human", "{input}"),
    ("ai", "{output}"),
])
few_shot_prompt = FewShotChatMessagePromptTemplate(
    example_prompt=example_prompt,
    examples=examples,
)

step_back_template = ChatPromptTemplate.from_messages([
    ("system", "You are an expert in medicine drugs. Your task is to step back and paraphrase a question to a more generic step-back question, which is easier to answer. Here are a few examples:"),
    few_shot_prompt,
    ("user", "{question}"),
])
generate_queries_step_back = step_back_template | call_asi_one | StrOutputParser()

response_prompt_template = """You are an expert in medicine drugs. I am going to ask you a question. Your response should be comprehensive and not contradicted with the following context if they are relevant. Otherwise, ignore them if they are not relevant.

# {normal_context}
# {step_back_context}

# Original Question: {question}
# Answer:"""
response_prompt = ChatPromptTemplate.from_template(response_prompt_template)

def answer_fn_chain_step_back(question):
    chain_step_back = (
        {
            "normal_context": RunnableLambda(lambda x: x["question"]) | retriever,
            "step_back_context": generate_queries_step_back | retriever,
            "question": lambda x: x["question"],
        }
        | response_prompt
        | call_asi_one
        | StrOutputParser()
    )
    return chain_step_back.invoke({"question": question})

In [108]:
report = evaluate(answer_fn_chain_step_back, testset=testset, knowledge_base=knowledge_base)
display(report)

Asking questions to the agent:   0%|          | 0/2 [00:00<?, ?it/s]

CorrectnessMetric evaluation:   0%|          | 0/2 [00:00<?, ?it/s]

#### HyDE

In [111]:
hyde_template = """Please write a scientific paper passage to answer the question
Question: {question}
Passage:"""
prompt_hyde = ChatPromptTemplate.from_template(hyde_template)

generate_docs_for_retrieval = (
    prompt_hyde | call_asi_one | StrOutputParser()
)

chain_hyde = (
    prompt_rag
    | call_asi_one
    | StrOutputParser()
)

def answer_fn_chain_hyde(question):
    pseudo_doc = generate_docs_for_retrieval.invoke({"question": question})
    
    retrieved_docs = retriever.invoke(pseudo_doc)
    
    return chain_hyde.invoke({
        "context": retrieved_docs,
        "question": question
    })

In [112]:
report = evaluate(answer_fn_chain_hyde, testset=testset, knowledge_base=knowledge_base)
display(report)

Asking questions to the agent:   0%|          | 0/2 [00:00<?, ?it/s]

CorrectnessMetric evaluation:   0%|          | 0/2 [00:00<?, ?it/s]