Names of the books:
1. Medicine - Harrison's Principles of Internal Medicine
2. Surgery - Bailey and Love's Short Practice of Surgery
3. Paediatrics - Ghai Essential Pediatrics
4. Dermatology - Illustrated Synopsis of Dermatology and Sexually Transmitted Diseases by Neena Khanna

In [1]:
from langchain_community.document_loaders import PyPDFLoader, DirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import SentenceTransformerEmbeddings

In [2]:
# Function to extract data from the pdf file
def extract_data_from_pdf(data_dir):
    """
    Function to extract data from the pdf file
    
    Args:
    data_dir : str : Path to the directory containing the pdf files
    
    Returns:
    documents : list : List of documents extracted from the pdf files
    """
    
    loader = DirectoryLoader(data_dir, glob = "*.pdf", loader_cls = PyPDFLoader)
    documents = loader.load()
    return documents

In [3]:
# Function to split the text into chunks
def split_text_into_chunks(documents):
    """
    Function to split the text into chunks
    
    Args:
    documents : list : List of documents
    
    Returns:
    chunks : list : List of chunks with text split into chunks
    """
    
    splitter = RecursiveCharacterTextSplitter(chunk_size = 500, chunk_overlap = 20)
    chunks = splitter.split_documents(documents)
    return chunks
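
To make the `chunk_size = 500` and `chunk_overlap = 20` settings concrete, here is a minimal sketch of fixed-size chunking with overlap. It is a simplification for illustration only: `chunk_text` is a hypothetical helper, and it does not reproduce how `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraphs and sentences.

```python
def chunk_text(text, chunk_size=500, chunk_overlap=20):
    """Split text into fixed-size character windows that overlap (illustrative only)."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # each window starts this far after the previous one
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "".join(chr(65 + i % 26) for i in range(1200))
pieces = chunk_text(sample)
print([len(p) for p in pieces])           # [500, 500, 240]
print(pieces[0][-20:] == pieces[1][:20])  # True: consecutive chunks share 20 characters
```

The overlap means the tail of one chunk is repeated at the head of the next, so a sentence cut at a chunk boundary still appears partly in both.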

In [4]:
# Function to download the embedding model
def download_embedding_model(model_name="NeuML/pubmedbert-base-embeddings"):
    """
    Function to download the embedding model
    
    Args:
    model_name : str : Name of the model to download
    
    Returns:
    embeddings : object : Embedding model object
    """
    
    embeddings = SentenceTransformerEmbeddings(model_name=model_name)
    return embeddings

In [2]:
# Only the imports used by the cells below are kept
from dotenv import load_dotenv
import os

In [6]:
extracted_data = extract_data_from_pdf(data_dir="../data")

In [7]:
text_chunks = split_text_into_chunks(extracted_data)

In [8]:
len(text_chunks)

7020

In [5]:
embeddings = download_embedding_model()

In [6]:
from langchain_pinecone import PineconeVectorStore

In [7]:
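load_dotenv()  # loads variables from a local .env file into the environment
# Read the key from the environment instead of hard-coding a secret in the notebook
PINECONE_API_KEY = os.getenv("PINECONE_API_KEY")
PINECONE_API_ENV = "us-east-1"
PINECONE_INDEX_NAME = "medicine"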
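
Credentials such as the Pinecone API key are safer read from the environment (for example after `load_dotenv()`) than written into a cell. The sketch below shows one way to fail early when a variable is missing. `require_env` and the variable name `EXAMPLE_API_KEY` are hypothetical and used only for illustration.

```python
import os


def require_env(name):
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Demonstration with a placeholder value; a real key would come from the .env file.
os.environ["EXAMPLE_API_KEY"] = "placeholder-value"
print(require_env("EXAMPLE_API_KEY"))
```

Raising at configuration time gives a clearer error than a failed request later in the notebook.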

In [16]:
# Embed every chunk and upload it to the Pinecone index (only needs to run once)
vectorStore = PineconeVectorStore.from_texts([chunk.page_content for chunk in text_chunks], embeddings, index_name=PINECONE_INDEX_NAME)

In [13]:
# Connect to the already populated index without re-uploading the chunks
vectorStore = PineconeVectorStore(index_name=PINECONE_INDEX_NAME, embedding=embeddings)

In [8]:
from langchain.prompts import PromptTemplate
from langchain_community.llms import LlamaCpp
from langchain.chains import ConversationalRetrievalChain

In [14]:
local_llm = "../models/BioMistral-7B.Q4_K_M.gguf"
llm = LlamaCpp(model_path=local_llm, temperature=0.3, max_tokens=4096,
               top_p=1, n_ctx=4096, verbose=False)

In [15]:
prompt_template = """Use the following pieces of information to answer the user's question.
If you don't know the answer, just say that you don't know; don't try to make up an answer.

Context: {context}
Chat History: {chat_history}
Question: {question}

Only return the helpful answer. The answer must be detailed and well explained.
Helpful answer:
"""
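
A template's `{placeholders}` have to match the variable names supplied when it is filled, otherwise formatting fails or a variable is silently ignored. The sketch below checks this with the standard library only. `template_fields`, `example_template`, and `declared` are hypothetical names chosen for the example, not part of LangChain.

```python
from string import Formatter


def template_fields(template):
    """Return the set of {placeholder} names used in a format-style template."""
    return {name for _, name, _, _ in Formatter().parse(template) if name}


example_template = "Chat History: {chat_history}\nQuestion: {question}\nHelpful answer:"
declared = ["chat_history", "message"]  # deliberately mismatched for the demonstration

missing = template_fields(example_template) - set(declared)  # used in template, not declared
unused = set(declared) - template_fields(example_template)   # declared, never used
print(sorted(missing), sorted(unused))  # ['question'] ['message']
```

Running a check like this before building the chain surfaces naming mismatches such as `message` versus `question`.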

In [16]:
retriever = vectorStore.as_retriever(search_kwargs={"k":3})
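
The `k` in `search_kwargs` is the number of chunks returned for each query. As a rough picture of what a top-k similarity search does, here is a brute-force sketch over toy two-dimensional vectors. It assumes cosine similarity; the notebook does not state which metric the Pinecone index uses, and Pinecone performs the search server-side rather than like this. `cosine_similarity` and `top_k` are hypothetical helpers.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=3):
    """Return the indices of the k vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, v), i) for i, v in enumerate(doc_vecs)]
    scored.sort(reverse=True)  # highest similarity first
    return [i for _, i in scored[:k]]


docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
print(top_k([1.0, 0.0], docs, k=3))  # [0, 1, 3]
```

A larger `k` gives the model more context per question at the cost of a longer prompt.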

In [17]:
chat_history = []

In [18]:
# Create the conversational retrieval chain.
# The chain's default prompts are used; prompt_template is not passed in here.
if llm is not None and retriever is not None:
    chain = ConversationalRetrievalChain.from_llm(llm=llm, retriever=retriever)
else:
    print("LLM or Vector Database not initialized")

In [21]:
def predict(message, history):
    """
    Answer a question with the retrieval chain and record the exchange.

    Note: `history` is accepted but not used; the chain reads and updates
    the module-level `chat_history` list.
    """
    response = chain.invoke({"question": message, "chat_history": chat_history})
    print(response.keys())
    answer = response['answer']
    chat_history.append((message, answer))
    return answer
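
Because every call appends to `chat_history`, the history passed to the chain keeps growing while the model's context is fixed at `n_ctx = 4096`. One possible way to bound it is to keep only the most recent turns, sketched below. `trim_history` and `max_turns` are hypothetical and are not part of the notebook or of LangChain.

```python
def trim_history(history, max_turns=3):
    """Keep only the most recent max_turns (question, answer) pairs."""
    if max_turns <= 0:
        return []
    return history[-max_turns:]


example_history = [(f"q{i}", f"a{i}") for i in range(5)]
print(trim_history(example_history, max_turns=2))  # [('q3', 'a3'), ('q4', 'a4')]
```

The trimmed list could then be passed as the `chat_history` value in place of the full list. Which number of turns works well here is an open choice that depends on answer length.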

In [20]:
predict("What is diabetes?", chat_history)

' Diabetes is a disease that can affect the way your body uses sugar. There are two types of diabetes. Type 1 diabetes is caused by a problem with the insulin producing cells in the pancreas. Type 2 diabetes is caused by a combination of genetic and lifestyle factors, such as being overweight or obese.'

In [22]:
predict("How can we prevent it?", chat_history)

dict_keys(['question', 'chat_history', 'answer'])


'  Type I is caused by a deficiency of insulin production, while type II is characterized by insulin resistance. Treatment of type I diabetes is limited to insulin replacement, while type II diabetes is treatable by a number of therapeutic approaches. Many cases of insulin resistance are asymptomatic due to normal increases in insulin secretion, and others may be controlled by diet and exercise . Drug therapy may be directed towards hypertension, kidney disease, and diabetes than it is for overweight people with no health problems,” she said. People with diabetes taking insulin are at risk of becoming hypoglycemic if they do not eat appropriate carbohydrates. Also, persons who exercise regularly may experience low energy levels and muscle fatigue from low carbohydrate intake.'

In [33]:
predict("Can you expand, the answer is incomplete", chat_history)

' Yes, I can help with that. The reason why we have to ask for consent is because the patient has a right to know what is going on and they have a right to refuse treatment or they have a right to withdraw from treatment at any time.'

In [24]:
predict("What is cerebral stroke?", chat_history)

dict_keys(['question', 'chat_history', 'answer'])


' Cerebral stroke is a term describing a procedure used to restore blood flow and re-establish circulation in the brain after an obstruction, usually caused by a clot or blockage.'

In [35]:
predict("How can we prevent it?", chat_history)

'  There are several ways to prevent cerebral stroke, including controlling high blood pressure, quitting smoking, reducing alcohol intake, eating a healthy diet, and exercising regularly.'

In [36]:
predict("What is the purpose of abdominal ultrasound?", chat_history)

' The purpose of an abdominal ultrasound is to examine the abdomen and organs within it, such as the liver, pancreas, spleen, kidneys, and bladder.'