## This notebook contains:
* Running Llama 2 in the cloud, hosted on Replicate
* Using LangChain to ask Llama 2 general questions and follow-up questions
* Using LangChain to load PDF docs - (Kenya Law Documents) - and chat about them.
* The end result is a chatbot that can answer questions about data that was not publicly available when Llama 2 was trained, or about your own data. Retrieval-augmented generation (RAG) is one way to reduce LLM hallucinations.

Let's start by installing the necessary packages:
- sentence-transformers for text embeddings
- chromadb provides the vector database
- langchain provides the RAG tools needed for this demo

We will also set up the Replicate API token.

In [None]:
%pip install langchain replicate sentence-transformers chromadb pypdf python-dotenv

Next we call the Llama 2 model from Replicate. In this example we will use the Llama 2 13B chat model. You can find more Llama 2 models by searching for them on the [Replicate model explore page](https://replicate.com/explore?query=llama).

The model is specified in the format: owner/model_name:version
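As a small illustration, the identifier used in the next cell has three parts: an owner, a model name, and a version hash after the colon. The sketch below splits such a string with plain string operations. `parse_model_ref` and `ModelRef` are hypothetical helpers written for this notebook, not part of Replicate or LangChain, and the example identifier is made up.

```python
from typing import NamedTuple


class ModelRef(NamedTuple):
    owner: str
    name: str
    version: str


def parse_model_ref(ref: str) -> ModelRef:
    """Split an 'owner/name:version' string into its three parts."""
    path, colon, version = ref.partition(":")
    owner, slash, name = path.partition("/")
    if not (colon and slash and owner and name and version):
        raise ValueError(f"expected 'owner/name:version', got {ref!r}")
    return ModelRef(owner, name, version)


# Made-up identifier, used only to show the shape of the string.
ref = parse_model_ref("example-owner/example-model:abc123")
print(ref.owner, ref.name, ref.version)
```

A check like this can catch a missing version hash before any request is sent.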


In [2]:
# set up the Replicate API token
from dotenv import load_dotenv
import os

# Load the .env file
load_dotenv()

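# Read the token from the environment (populated from .env above).
# Never hard-code API tokens in a notebook; revoke any token that has been shared or committed.
print("Getting Replicate API token...")
REPLICATE_API_TOKEN = os.getenv("REPLICATE_API_TOKEN")
if not REPLICATE_API_TOKEN:
    raise RuntimeError("REPLICATE_API_TOKEN is not set; add it to your .env file.")
os.environ["REPLICATE_API_TOKEN"] = REPLICATE_API_TOKEN
print("Replicate API token set.")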

# initialize the Llama2 model

from langchain.llms import Replicate

print("Initializing Llama2 model...")
llama2_13b = "meta/llama-2-13b-chat:f4e2de70d66816a838a89eeeb621910adffb0dd0baba3976c96980970978018d"
llm = Replicate(
    model=llama2_13b,
    model_kwargs={"temperature": 0.01, "top_p": 1, "max_new_tokens":500}
)
print("Llama2 model initialized.")


Getting Replicate API token...
Replicate API token set.
Initializing Llama2 model...
Llama2 model initialized.


With the model set up, it is now possible to ask some questions.

[`ConversationBufferMemory`](https://python.langchain.com/docs/modules/memory/types/buffer) is used to pass the chat history to the model so that it can handle follow-up questions.

In [3]:
# using ConversationBufferMemory to pass memory (chat history) for follow up questions
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory()
conversation = ConversationChain(
    llm=llm,
    memory=memory,
    verbose=False,
)

Once this is set up, we can ask the model a simple question.

The chain then passes that question and its answer back to the model as context, along with the follow-up question.
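The idea of feeding earlier turns back to the model can be shown without calling Replicate. The sketch below is a simplified stand-in, not LangChain's implementation: `BufferMemorySketch`, `ask`, and `echo_llm` are hypothetical names, the example questions are made up, and the "model" is a stub that only reports how much prompt text it received.

```python
from typing import Callable, List, Tuple


class BufferMemorySketch:
    """Keeps every (human, ai) turn and renders them as a transcript."""

    def __init__(self) -> None:
        self.turns: List[Tuple[str, str]] = []

    def render(self) -> str:
        return "\n".join(f"Human: {q}\nAI: {a}" for q, a in self.turns)


def ask(llm: Callable[[str], str], memory: BufferMemorySketch, question: str) -> str:
    """Prepend the transcript to the new question, call the model, store the turn."""
    history = memory.render()
    new_turn = f"Human: {question}\nAI:"
    prompt = f"{history}\n{new_turn}" if history else new_turn
    answer = llm(prompt)
    memory.turns.append((question, answer))
    return answer


def echo_llm(prompt: str) -> str:
    # Stand-in for a real model: reports how many prompt lines it was given.
    return f"(saw {len(prompt.splitlines())} prompt lines)"


sketch_memory = BufferMemorySketch()
print(ask(echo_llm, sketch_memory, "What is the capital of Kenya?"))
print(ask(echo_llm, sketch_memory, "What is its population?"))
```

The second call sees a longer prompt because the first exchange is included. With the real chain defined above, the corresponding call would be along the lines of `conversation.predict(input=question)`, which needs a valid Replicate token.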

Load the documents from the data path

Next, we use Llama 2 to answer questions with documents as context.
This lets us supplement Llama 2's knowledge with new information, giving it better context without fine-tuning.

In [4]:
from langchain.document_loaders import PyPDFDirectoryLoader

DATA_PATH = 'data'

In [5]:
loader = PyPDFDirectoryLoader(DATA_PATH)
docs = loader.load()

In [6]:
import os

# Check the number of PDF files in the data directory
pdf_files = [f for f in os.listdir(DATA_PATH) if f.endswith('.pdf')]
print(f"Number of PDF files: {len(pdf_files)}")

Number of PDF files: 5


Next we store the documents. LangChain supports more than 30 vector stores (DBs).
Here we use [Chroma](https://python.langchain.com/docs/integrations/vectorstores/chroma), which is lightweight and runs in memory, so it is easy to get started with.

We will also import `HuggingFaceEmbeddings` and `RecursiveCharacterTextSplitter` to help with storing the documents.

In [7]:
from langchain.vectorstores import Chroma

# embeddings are numerical representations of the question and answer text
from langchain.embeddings import HuggingFaceEmbeddings

# use a common text splitter to split text into chunks
from langchain.text_splitter import RecursiveCharacterTextSplitter

To store the documents, we need to split them into chunks using [`RecursiveCharacterTextSplitter`](https://python.langchain.com/docs/modules/data_connection/document_transformers/text_splitters/recursive_text_splitter) and create vector representations of these chunks using [`HuggingFaceEmbeddings`](https://www.google.com/search?q=langchain+hugging+face+embeddings) before storing them in our vector database.

In general, you should use larger chunk sizes for highly structured text such as code and smaller sizes for less structured text. You may need to experiment with different chunk sizes and overlap values to find the best settings.
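To make the two parameters concrete, here is a simplified sketch of chunking with overlap. It cuts fixed-size character windows, whereas `RecursiveCharacterTextSplitter` prefers to split on separators such as paragraphs and sentences, so treat it as an illustration of `chunk_size` and `chunk_overlap` only. `chunk_text` is a hypothetical helper, not a LangChain function.

```python
from typing import List


def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Cut text into windows of chunk_size characters, each sharing
    chunk_overlap characters with the previous window."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
for size, overlap in [(10, 0), (10, 4)]:
    pieces = chunk_text(sample, size, overlap)
    print(size, overlap, len(pieces), pieces)
```

With an overlap of 4, the same 26 characters produce four chunks instead of three, and text near a boundary appears in both neighbouring chunks, which helps retrieval when an answer straddles a split.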

In [8]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=20)
all_splits = text_splitter.split_documents(docs)

# create the vector db to store all the split chunks as embeddings
embeddings = HuggingFaceEmbeddings()
vectordb = Chroma.from_documents(
    documents=all_splits,
    embedding=embeddings,
)



We then use `RetrievalQA` to retrieve the relevant chunks from the vector database and give Llama 2 more context from our documents, thereby extending what it can answer.

For each question, LangChain performs a semantic similarity search in the vector db, then passes the search results to Llama 2 as context for answering the question.
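The retrieval step can be pictured as a nearest-neighbour search over embedding vectors. The sketch below uses tiny hand-written 3-dimensional vectors and cosine similarity. Real embeddings have hundreds of dimensions, and the distance metric used by the vector store may differ, so this is an illustration under those assumptions rather than what Chroma does internally. All names and data here are made up.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float], store: Dict[str, List[float]], k: int = 2) -> List[Tuple[float, str]]:
    """Return the k stored chunks most similar to the query vector."""
    scored = [(cosine_similarity(query, vec), text) for text, vec in store.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy "vector store": chunk text mapped to a made-up embedding.
toy_store = {
    "chunk about fever": [0.9, 0.1, 0.0],
    "chunk about imaging": [0.1, 0.9, 0.1],
    "chunk about billing": [0.0, 0.1, 0.9],
}
query_vector = [0.8, 0.2, 0.0]  # made-up embedding of the question
retrieved = [text for _, text in top_k(query_vector, toy_store, k=2)]
print(retrieved)
```

The retrieved chunks are what gets placed into the prompt as context, so the quality of the answer depends directly on this ranking.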

In [9]:
# use LangChain's RetrievalQA, to associate Llama with the loaded documents stored in the vector db
from langchain.chains import RetrievalQA

qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever()
)

System Prompt

In [10]:
system_prompt = "You are a medical doctor trying to investigate and come up with the next investigation plan both lab and imaging for the patient specified and fill gaps in history. Provide additional questions for the history and suitable medical investigation that would lead you to a diagnosis."
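For a system prompt to influence the answer, it has to end up in the text that is sent to the model. One way to picture this is a template that places the instructions ahead of the retrieved context and the question. The sketch below uses plain string formatting; `TEMPLATE`, `build_prompt`, and the placeholder strings are hypothetical and are not taken from LangChain's default prompts.

```python
from typing import List

# Hypothetical template: instructions first, then retrieved context, then the question.
TEMPLATE = (
    "{system_prompt}\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)


def build_prompt(system_prompt: str, context_chunks: List[str], question: str) -> str:
    """Fill the template with the instructions, the retrieved chunks, and the question."""
    return TEMPLATE.format(
        system_prompt=system_prompt,
        context="\n---\n".join(context_chunks),
        question=question,
    )


example_prompt = build_prompt(
    "Answer only from the context provided.",          # placeholder instructions
    ["first retrieved chunk", "second retrieved chunk"],  # placeholder chunks
    "What does the first chunk say?",
)
print(example_prompt)
```

In LangChain, a template of this shape with `{context}` and `{question}` placeholders can be wrapped in a `PromptTemplate` and supplied to `RetrievalQA.from_chain_type` through `chain_type_kwargs={"prompt": ...}`. Check this wiring against the LangChain version you have installed.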

First Question

In [11]:
question = "What investigations are relevant for the patient W.F to come up with a diagnosis?"
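# Query the model with the question.
# Note: with the chain's default prompt, only the retrieved context and the query are
# inserted into the prompt; the extra "system_prompt" key has no effect unless a custom
# prompt template that uses it is supplied when the chain is built.
result = qa_chain({"query": question, "system_prompt": system_prompt})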
print(result['result'])
# Save the question and answer to the chat history
chat_history = [(question, result["result"])]



 Based on the information provided, the following investigations may be relevant for the patient W.F to come up with a diagnosis:

1. Complete Blood Count (CBC) and Differential: To assess for signs of infection, inflammation, and blood cell counts.
2. Electrolyte Panel: To evaluate for any electrolyte imbalances that may be contributing to the patient's symptoms.
3. Renal Function Tests: To assess for any renal impairment or damage.
4. Liver Function Tests: To evaluate for any liver dysfunction or injury.
5. HIV Viral Load and CD4 Count: To monitor the patient's HIV status and determine if there is any progression of the disease.
6. Throat Swab Culture: To identify any bacterial or viral infections causing the sore throat.
7. Blood Cultures: To detect any bacteremia or sepsis.
8. Urinalysis: To evaluate for any urinary tract infections or inflammation.
9. Chest X-ray: To rule out any respiratory infections or complications.
10. Erythrocyte Sedimentation Rate (ESR) and C-Reactive Prote

As with the earlier conversation, we use the `ConversationalRetrievalChain` class to give the model the context of our previous question so that we can ask follow-up questions.

In [24]:
# use ConversationalRetrievalChain to pass chat history for follow up questions
from langchain.chains import ConversationalRetrievalChain
chat_chain = ConversationalRetrievalChain.from_llm(llm, vectordb.as_retriever(), return_source_documents=True)
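The chat history for this chain is a list of `(question, answer)` tuples, and it is easiest to keep it consistent by adding a turn only once its answer is known. The sketch below shows that pattern with a stub in place of the real chain, so it runs without a Replicate token. `ask_follow_up` and `stub_chain` are hypothetical helpers, not LangChain functions.

```python
from typing import Callable, Dict, List, Tuple

History = List[Tuple[str, str]]


def ask_follow_up(chain: Callable[[Dict], Dict], history: History, question: str) -> str:
    """Query with the history so far, then record the completed turn."""
    response = chain({"question": question, "chat_history": list(history)})
    answer = response["answer"]
    history.append((question, answer))  # append only after the answer exists
    return answer


def stub_chain(inputs: Dict) -> Dict:
    # Stand-in for the real chain: reports how many prior turns it was given.
    turns = len(inputs["chat_history"])
    return {"answer": f"answer to {inputs['question']!r} given {turns} prior turn(s)"}


demo_history: History = [("first question", "first answer")]
print(ask_follow_up(stub_chain, demo_history, "second question"))
print(len(demo_history))
```

Passing `chat_chain` in place of `stub_chain` should work the same way, since the chain is called with a dict and returns its reply under the `answer` key.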

Follow Up Questions

In [27]:
# Follow-up question
follow_up_question = "What findings do we expect in Lumbar puncture for the patient J.O?"
# Query the model with the follow-up question and the chat history so far.
# The history is updated only after the answer arrives, so it never holds empty or duplicate turns.
followup_answer = chat_chain({"question": follow_up_question, "chat_history": chat_history, "system_prompt": system_prompt})

# ConversationalRetrievalChain returns its reply under the 'answer' key
print(followup_answer['answer'])

# Append the completed question/answer pair to the chat history
chat_history.append((follow_up_question, followup_answer['answer']))


 As a helpful assistant, I will do my best to provide an answer based on the given context. However, please note that I'm just an AI and not a medical professional, so my response should not be taken as medical advice.

Based on the information provided, it seems that patient J.O is presenting with symptoms consistent with meningitis, such as stiff neck, headache, photophobia, and nausea. A lumbar puncture (LP) would be a valuable diagnostic tool in evaluating the patient's condition.

The expected findings from an LP include:

1. Cerebrospinal fluid (CSF) analysis: The CSF sampled during the LP can be analyzed for various parameters, including white blood cell count, protein levels, glucose levels, and bacterial cultures. Abnormalities in these values can help support a diagnosis of meningitis.
2. Cell count: An elevated white blood cell count in the CSF may indicate inflammation or infection in the central nervous system.
3. Protein levels: Elevated protein levels in the CSF can sugg

In [None]:
# Follow-up question
follow_up_question = "What additional questions in the history would be relevant for the patient J.O in coming up with a diagnosis?"
# Query the model with the follow-up question and the chat history so far.
# The history is updated only after the answer arrives, so it never holds empty or duplicate turns.
followup_answer = chat_chain({"question": follow_up_question, "chat_history": chat_history, "system_prompt": system_prompt})

# ConversationalRetrievalChain returns its reply under the 'answer' key
print(followup_answer['answer'])

# Append the completed question/answer pair to the chat history
chat_history.append((follow_up_question, followup_answer['answer']))
