# Building a RAG application from scratch

Let's start by importing `os`, which we use later to list the files we want to load.

In [22]:
import os

For this example, we'll use a simple `StrOutputParser` to extract the answer as a string.

In [23]:
from langchain_core.output_parsers import StrOutputParser

parser = StrOutputParser()

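We also need an embedding model to turn text into vectors. Here we use `nomic-embed-text`, served through Ollama:

In [24]: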
from langchain_community.embeddings import OllamaEmbeddings

embeddings = OllamaEmbeddings(model="nomic-embed-text")

We want to provide the model with some context and the question. [Prompt templates](https://python.langchain.com/docs/modules/model_io/prompts/quick_start) are a simple way to define and reuse prompts.

In [25]:
from langchain.prompts import ChatPromptTemplate

template = """
You are an AI assistant, trained to provide understandable and accurate information about pharmacogenomics and drugs.
You will base your responses on the context and information provided. Output both your answer and a score of how confident you are,
and also cite the source.
If the information related to the question is not in the context or in the information provided in the prompt,
you will say 'I don't know'.
You are not a healthcare provider and you will not provide medical care or make assumptions about treatment.

Context: {context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)
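
To see what the two placeholders do, the following minimal sketch fills an abbreviated copy of the template with plain `str.format`. It only approximates the rendering step: `ChatPromptTemplate` additionally wraps the result in a chat message, and the names `short_template` and `render` as well as the sample question are made up for illustration.

```python
# Abbreviated stand-in for the template above (same two placeholders).
short_template = """
Context: {context}

Question: {question}
"""


def render(template: str, context: str, question: str) -> str:
    """Fill the {context} and {question} placeholders of the template."""
    return template.format(context=context, question=question)


# Illustrative values only; in the real chain the context comes from the retriever.
rendered = render(
    short_template,
    context="Abacavir is contra-indicated for HLA-B*5701-positive patients.",
    question="Can HLA-B*5701-positive patients take abacavir?",
)
print(rendered)
```

The rendered text is what the model ultimately receives, so anything missing from `context` is simply not visible to it.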

Next, let's load the PDF documents into memory:

In [26]:
from langchain_community.document_loaders import PyPDFLoader

# Folder containing the source PDFs.
folder_path = "/home/dhanushb/Wellytics/RAG_data/pdf"

# Load every file in the folder; PyPDFLoader returns one Document per page.
docs = []
for filename in os.listdir(folder_path):
    file_path = os.path.join(folder_path, filename)
    loader = PyPDFLoader(file_path)
    docs.extend(loader.load())
docs



[Document(page_content='HLA-B*5701:\tabacavir\n2356\n48%\tof\tthe\tHLA-B*5701-positive\tpatients\tdevelop\ta\tsevere\tand\tpotentially\tlife-threatening\thypersensitivity\treaction\tto\tabacavir.\nAbacavir\tis\tcontra-indicated\tfor\tHLA-B*5701-positive\tpatients.\navoid\tabacavir\nLiterature:\n1.\t\nSousa-Pinto\tB\tet\tal.\tPharmacogenetics\tof\tabacavir\thypersensitivity:\ta\tsystematic\treview\tand\tmeta-analysis\tof\tthe\tassociation\twith\tHLA-B*57:01.\tJ\tAllergy\tClin\tImmunol\n2015;136:1092-4.e3.\n2.\t\nTangamornsuksan\tW\tet\tal.\tAssociation\tof\tHLA-B*5701\tgenotypes\tand\tabacavir-induced\thypersensitivity\treaction:\ta\tsystematic\treview\tand\tmeta-analysis.\tJ\tPharm\tPharm\nSci\t2015;18:68-76.\n3.\t\nCargnin\tS\tet\tal.\tDiagnostic\taccuracy\tof\tHLA-B*57:01\tscreening\tfor\tthe\tprediction\tof\tabacavir\thypersensitivity\tand\tclinical\tutility\tof\tthe\ttest:\ta\tmeta-analytic\treview.\nPharmacogenomics\t2014;15:963-76.\n4.\t\nSaag\tM\tet\tal.\tHigh\tsensitivity\tof\t
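
The loading loop passes every entry in the folder to `PyPDFLoader`, so a stray non-PDF file would likely cause an error. A small helper like the one below (hypothetical, not part of the original notebook) returns only the `.pdf` paths, using just the standard library.

```python
from pathlib import Path


def list_pdf_paths(folder: str) -> list[str]:
    """Return the paths of all .pdf files directly inside `folder`, sorted by name."""
    return sorted(
        str(path)
        for path in Path(folder).iterdir()
        # Skip sub-directories and anything without a .pdf extension.
        if path.is_file() and path.suffix.lower() == ".pdf"
    )
```

The loop could then iterate over `list_pdf_paths(folder_path)` instead of `os.listdir(folder_path)`, assuming all PDFs sit directly in that folder.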

Next, we split the documents into smaller chunks. For our specific application, let's use chunks of 1000 characters with an overlap of 200 characters:

In [27]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
documents = text_splitter.split_documents(docs)
documents
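
To make `chunk_size` and `chunk_overlap` concrete, here is a simplified character-window splitter. It is a sketch under stated assumptions, not the algorithm `RecursiveCharacterTextSplitter` uses (which prefers to break on separators such as paragraphs and newlines); it only shows how consecutive chunks share `chunk_overlap` characters. The function name `split_with_overlap` is hypothetical.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut `text` into windows of `chunk_size` characters that overlap by `chunk_overlap`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


# Small numbers so the overlap is easy to see.
chunks = split_with_overlap("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=4)
print(chunks)
```

The overlap means a sentence that straddles a chunk boundary still appears intact in at least one chunk, at the cost of storing some text twice.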

In [28]:
from langchain_community.vectorstores import Chroma

# Alternative vector stores, left commented out:
# from langchain_community.vectorstores import DocArrayInMemorySearch
# vectorstore = DocArrayInMemorySearch.from_documents(documents, embedding=embeddings)
# from langchain_community.vectorstores import Qdrant
# vectorstore = Qdrant.from_documents(documents, embedding=embeddings)

# Embed each chunk and index it in a Chroma vector store.
vectorstore = Chroma.from_documents(documents, embedding=embeddings)

We can get a retriever directly from the vector store we created before:

In [29]:
retriever = vectorstore.as_retriever()
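
Conceptually, a retriever embeds the query and returns the stored chunks whose vectors are closest to it. The sketch below illustrates that idea with cosine similarity over hand-written toy vectors. The vectors, texts, and the `top_k` helper are hypothetical, and real vector stores such as Chroma use their own indexing and distance settings.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vector: list[float], store: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Return the texts of the k stored items most similar to the query vector."""
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Toy "embeddings": three chunks with made-up 3-dimensional vectors.
toy_store = [
    ("abacavir and HLA-B*5701", [1.0, 0.0, 0.1]),
    ("carbamazepine and HLA-B*1502", [0.0, 1.0, 0.2]),
    ("tacrolimus and CYP3A5", [0.1, 0.1, 1.0]),
]
query_vector = [0.1, 0.9, 0.1]  # pretend embedding of a carbamazepine question
print(top_k(query_vector, toy_store, k=1))
```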

We can create a map with the two inputs by using the [`RunnableParallel`](https://python.langchain.com/docs/expression_language/how_to/map) and [`RunnablePassthrough`](https://python.langchain.com/docs/expression_language/how_to/passthrough) classes. This will allow us to pass the context and question to the prompt as a map with the keys "context" and "question."

In [30]:
from langchain_core.runnables import RunnableParallel, RunnablePassthrough

setup = RunnableParallel(context=retriever, question=RunnablePassthrough())
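
The effect of this map can be imitated in plain Python: each key is bound to a callable, every callable receives the same input, and the results are gathered into a dict. The sketch below is a conceptual stand-in with a fake retriever, not how LangChain implements `RunnableParallel` (which can also run the branches concurrently). The names `run_map`, `fake_retriever`, and `passthrough` are hypothetical.

```python
def run_map(steps: dict, value):
    """Call every step with the same input and collect the results by key."""
    return {key: step(value) for key, step in steps.items()}


def fake_retriever(question: str) -> list[str]:
    """Stand-in for the real retriever; returns a single placeholder chunk."""
    return [f"chunk relevant to: {question}"]


def passthrough(value):
    """Return the input unchanged, mirroring the role of RunnablePassthrough."""
    return value


inputs = run_map(
    {"context": fake_retriever, "question": passthrough},
    "Does HLA-B*15:02 affect carbamazepine use?",
)
print(inputs)
```

The resulting dict has exactly the keys the prompt template expects, which is why the map can feed straight into the prompt.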

In [35]:
from langchain_community.llms import Ollama

question = "I'm Indian, and my family has a history of epilepsy.  My doctor is worried about the seizure event I experienced last week. My results from the genetic test she recommended show that I have an HLA-B*15:02 variation. Will this affect how I use anticonvulsants? What other options are there? Does my ethnicity play a role? Tell me where you found the information."
print("Question: {}".format(question))

MODELS = ["mistral", "gemma", "llama2", "llama3"]  # also mixtral
for MODEL in MODELS:
    print("\nmodel: {}\n".format(MODEL))
    model = Ollama(model=MODEL)
    # Retrieve context, fill the prompt, query the model, and parse the reply as a string.
    chain = setup | prompt | model | parser
    print(chain.invoke(question))

Question: I'm Indian, and my family has a history of epilepsy.  My doctor is worried about the seizure event I experienced last week. My results from the genetic test she recommended show that I have an HLA-B*15:02 variation. Will this affect how I use anticonvulsants? What other options are there? Does my ethnicity play a role? Tell me where you found the information.
model: mistral
 The information provided suggests that your HLA-B*15:02 variation might be associated with an increased risk of severe cutaneous adverse reactions to certain antiepileptic drugs (AEDs), such as carbamazepine and phenytoin, in some populations like Thai. A study by Locharernkul et al., published in Epilepsia in 2008, indicates that carbamazepine and phenytoin-induced Stevens-Johnson syndrome (SJS) is associated with the HLA-B*1502 allele in Thai population.

It's essential to note that this association does not necessarily mean you will experience adverse reactions, but it suggests a potential increased ri

In [33]:
print(chain.invoke(" As part of my liver transplant, I take tacrolimus. My doctor recently informed me that I had a high chance of graft rejection and performed a pharmacogenetic test to determine whether my dosage needs to be adjusted. What does it indicate that I have CYP3A5 extensive metabolizer, according to my test results that I received today?"))

 According to the studies provided in your document, having the CYP3A5 extensive metabolizer genotype could potentially influence the pharmacokinetics (how your body processes and responds to tacrolimus), possibly requiring a higher dose of tacrolimus to achieve the desired therapeutic effect while minimizing the risk of toxicity or graft rejection. However, it is essential to discuss these results with your healthcare provider for personalized advice regarding your specific case.


In [34]:
print(chain.invoke(" I'm Indian, and my family has a history of epilepsy.  My doctor is worried about the seizure event I experienced last week. My results from the genetic test she recommended show that I have an HLA-B*15:02 variation. Will this affect how I use anticonvulsants? What other options are there? Does my ethnicity play a role? Tell me where you found the information."))

 The information provided suggests that your HLA-B*15:02 variant may be associated with an increased risk of severe cutaneous adverse reactions to certain antiepileptic drugs, such as phenytoin and carbamazepine. A study in Indian patients (Kesavan et al., 2010) found an influence of CYP2C9 and CYP2C19 genetic polymorphisms on phenytoin-induced neurological toxicity, which may be relevant for your case. Additionally, it is important to note that Thai patients have been reported to have a higher risk of Stevens-Johnson syndrome and toxic epidermal necrolysis induced by carbamazepine and phenytoin, associated with the HLA-B*1502 allele (Locharernkul et al., 2008).

Since your genetic test result shows an HLA-B*15:02 variant, it might be beneficial to consider alternative antiepileptic drugs that are less likely to cause severe cutaneous adverse reactions. For instance, Lamotrigine and Levetiracetam have shown lower risk for these side effects (Kapoor et al., 2016).

Your ethnicity plays 