# **Library Installation**

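This section installs the required libraries: LangChain (`langchain`, `langchain_community`), the Hugging Face stack (`huggingface_hub`, `transformers`, `accelerate`, `bitsandbytes`, `sentence-transformers`), plus `chromadb` for the vector store and `tiktoken`. `pypdf`, used for PDF parsing, is installed in a separate cell.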

In [None]:
!pip install langchain langchain_community
!pip install huggingface_hub
!pip install transformers
!pip install accelerate
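!pip install bitsandbytes
# Left unpinned: sentence-transformers 2.2.2 imports `cached_download`,
# which recent huggingface_hub releases have removed
!pip install sentence-transformers
!pip -q install chromadb tiktoken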

In [None]:
!pip install pypdf

Collecting pypdf
  Downloading pypdf-5.1.0-py3-none-any.whl.metadata (7.2 kB)
Downloading pypdf-5.1.0-py3-none-any.whl (297 kB)
Installing collected packages: pypdf
Successfully installed pypdf-5.1.0
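
Several imports below moved between LangChain packages across releases, so it can help to print the versions that actually ended up installed before running the rest of the notebook. A minimal sketch using only the standard library's `importlib.metadata`; the helper name `installed_versions` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package: installed version string, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None  # not installed in this environment
    return found


if __name__ == "__main__":
    for pkg, ver in installed_versions(
        ["langchain", "langchain-community", "transformers",
         "sentence-transformers", "chromadb", "pypdf"]
    ).items():
        print(f"{pkg:25} {ver or 'not installed'}")
```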


# **PDF File Loading**

This function uses DirectoryLoader with PyPDFLoader to load every PDF file in the given directory. It returns a list of `Document` objects, one per PDF page, each carrying the source path and page number in its metadata.

In [None]:
from langchain_community.document_loaders import PyPDFLoader, DirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

In [None]:
def load_pdf_file(data):
    """Load every PDF at the top level of `data`; PyPDFLoader yields one Document per page."""
    loader = DirectoryLoader(data,
                             glob="*.pdf",
                             loader_cls=PyPDFLoader)

    documents = loader.load()

    return documents

In [None]:
extracted_data=load_pdf_file(data='/content/Data')
extracted_data
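
`glob="*.pdf"` only matches PDFs directly inside `/content/Data`, not in subfolders, and on Linux the match is case-sensitive, so a file named `REPORT.PDF` would be skipped. A quick pre-flight check with `pathlib`, assuming the loader applies the pattern the way `Path.glob` does; `matching_pdfs` is a hypothetical helper.

```python
from pathlib import Path


def matching_pdfs(directory, recursive=False):
    """List the files a '*.pdf' pattern picks up; '**/*.pdf' also searches subfolders."""
    pattern = "**/*.pdf" if recursive else "*.pdf"
    return sorted(p for p in Path(directory).glob(pattern) if p.is_file())
```

For example, `matching_pdfs("/content/Data")` should list `Medical_book.pdf` here. If the PDFs live in subfolders, passing `glob="**/*.pdf"` to `DirectoryLoader` is one way to include them.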

# **Text Splitting**

This function splits the extracted documents into chunks using RecursiveCharacterTextSplitter. Each chunk is at most 500 characters long, and consecutive chunks overlap by up to 20 characters so that text cut at a chunk boundary keeps some surrounding context.

In [None]:
def text_split(extracted_data):
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=20)
    text_chunks = text_splitter.split_documents(extracted_data)
    return text_chunks
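
The `chunk_size`/`chunk_overlap` arithmetic is easiest to see with a simplified fixed-width sliding window. This is not the algorithm RecursiveCharacterTextSplitter uses: that splitter first tries to break on paragraph, line, and word boundaries and then merges pieces up to `chunk_size`, so its chunks are often shorter and end at natural breaks. The helper `sliding_window_chunks` is hypothetical.

```python
def sliding_window_chunks(text: str, chunk_size: int = 500, chunk_overlap: int = 20) -> list[str]:
    """Split text into fixed-width windows; each repeats the last `chunk_overlap` chars of the previous one."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window already reaches the end
            break
    return chunks
```

With the notebook's settings each window starts 480 characters (500 - 20) after the previous one.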

In [None]:
text_chunks=text_split(extracted_data)
print("Length of Text Chunks", len(text_chunks))

Length of Text Chunks 5860


In [None]:
text_chunks[0]

Document(metadata={'source': '/content/Data/Medical_book.pdf', 'page': 1, 'page_label': '2'}, page_content='The GALE\nENCYCLOPEDIA\nof MEDICINE\nSECOND EDITION')

In [None]:
text_chunks[1]

Document(metadata={'source': '/content/Data/Medical_book.pdf', 'page': 2, 'page_label': '3'}, page_content='The GALE\nENCYCLOPEDIA\nof MEDICINE\nSECOND EDITION\nJACQUELINE L. LONGE, EDITOR\nDEIRDRE S. BLANCHFIELD, ASSOCIATE EDITOR\nVOLUME\nA-B\n1')

# **Hugging Face Embeddings**

This function downloads and initializes a Hugging Face embedding model, here sentence-transformers/all-MiniLM-L6-v2, which embeds both queries and documents as 384-dimensional vectors.


In [None]:
from langchain_community.embeddings import HuggingFaceEmbeddings

In [None]:
def download_hugging_face_embeddings(model_name):
    embeddings=HuggingFaceEmbeddings(model_name=model_name)
    return embeddings

In [None]:
embedding = download_hugging_face_embeddings('sentence-transformers/all-MiniLM-L6-v2')

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


In [None]:
query_result = embedding.embed_query("Hi there")
print("Length", len(query_result))

Length 384
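
`embed_query` returns a 384-dimensional vector, and retrieval compares such vectors with a distance or similarity function. Assuming the vectors are unit length (sentence-transformers models that end in a `Normalize` module produce such vectors) and that the Chroma collection keeps its default L2 space, ranking by squared L2 distance gives the same order as ranking by cosine similarity, because for unit vectors ||a - b||² = 2 - 2 cos(a, b). A small pure-Python check of that identity:

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


a, b = normalize([3.0, 4.0]), normalize([1.0, 0.0])
# For unit vectors both numbers agree, so both metrics rank neighbours identically
print(squared_l2(a, b), 2 - 2 * cosine_similarity(a, b))
```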


# **Chroma Vector Database**

This section embeds the text chunks and stores them in a Chroma vector database in the `db` directory on disk, so they can be reloaded for retrieval later without re-embedding.

In [None]:
from langchain_community.vectorstores import Chroma

In [None]:
persist_directory = 'db'

vectordb = Chroma.from_documents(documents=text_chunks,
                                 embedding=embedding,
                                 persist_directory=persist_directory)

In [None]:
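# With chromadb >= 0.4 documents are persisted automatically and persist() is deprecated;
# the call is kept for older versions
vectordb.persist()
vectordb = None  # drop the in-memory handle; the next cell reloads the database from disk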


In [None]:
vectordb = Chroma(persist_directory=persist_directory,
                  embedding_function=embedding)
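
Persisting means the embedded vectors are written to disk once, so a later session can reload them instead of re-embedding all 5,860 chunks. A toy illustration of that save/reload cycle using JSON; `ToyVectorStore` is hypothetical and says nothing about Chroma's actual on-disk format.

```python
import json
from pathlib import Path


class ToyVectorStore:
    """Hypothetical illustration of persist/reload; not how Chroma stores data."""

    def __init__(self):
        self.records = []  # each record: {"text": str, "vector": list of floats}

    def add(self, text, vector):
        self.records.append({"text": text, "vector": list(vector)})

    def save(self, directory):
        """Write all records to <directory>/store.json."""
        path = Path(directory)
        path.mkdir(parents=True, exist_ok=True)
        (path / "store.json").write_text(json.dumps(self.records))

    @classmethod
    def load(cls, directory):
        """Rebuild a store from disk without recomputing any embeddings."""
        store = cls()
        store.records = json.loads((Path(directory) / "store.json").read_text())
        return store
```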


# **Retrieving Relevant Documents**

Here, the vector database is wrapped as a retriever and used to fetch the chunks most relevant to a query (e.g., "What is Acne?"). By default the retriever runs a similarity search and returns the top 4 chunks.

In [None]:
retriever = vectordb.as_retriever()

In [None]:
docs = retriever.invoke("What is Acne?")  # invoke() replaces the deprecated get_relevant_documents()


In [None]:
docs

[Document(metadata={'page': 39, 'page_label': '40', 'source': '/content/Data/Medical_book.pdf'}, page_content='GALE ENCYCLOPEDIA OF MEDICINE 226\nAcne\nGEM - 0001 to 0432 - A  10/22/03 1:41 PM  Page 26'),
 Document(metadata={'page': 38, 'page_label': '39', 'source': '/content/Data/Medical_book.pdf'}, page_content='GALE ENCYCLOPEDIA OF MEDICINE 2 25\nAcne\nAcne vulgaris affecting a woman’s face. Acne is the general\nname given to a skin disorder in which the sebaceous\nglands become inflamed.(Photograph by Biophoto Associ-\nates, Photo Researchers, Inc. Reproduced by permission.)\nGEM - 0001 to 0432 - A  10/22/03 1:41 PM  Page 25'),
 Document(metadata={'page': 37, 'page_label': '38', 'source': '/content/Data/Medical_book.pdf'}, page_content='Acidosis see Respiratory acidosis; Renal\ntubular acidosis; Metabolic acidosis\nAcne\nDefinition\nAcne is a common skin disease characterized by\npimples on the face, chest, and back. It occurs when the\npores of the skin become clogged with oil, de
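
Conceptually, `as_retriever()` performs a top-k similarity search with k = 4 by default. The same idea in plain Python, using cosine similarity over precomputed vectors; `top_k` is a hypothetical helper. A different `k` can be requested from the real retriever with `vectordb.as_retriever(search_kwargs={"k": 3})`.

```python
import heapq
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def top_k(query_vec, doc_vecs, k=4):
    """Return (index, score) pairs for the k document vectors most similar to the query."""
    scored = ((i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs))
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])
```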

# **LLM Initialization**

This section loads FLAN-T5 Large (`google/flan-t5-large`) from Hugging Face, creates a `text2text-generation` pipeline capped at 64 output tokens (`max_length=64`), and wraps it in LangChain’s HuggingFacePipeline.

In [None]:
from langchain.prompts import PromptTemplate
from langchain_community.llms import HuggingFacePipeline
from transformers import T5Tokenizer, T5ForConditionalGeneration, pipeline

In [None]:
# Load the model and tokenizer locally
model_name = "google/flan-t5-large"
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

# Create a text generation pipeline
text_generator = pipeline("text2text-generation", model=model, tokenizer=tokenizer, max_length=64)

# Wrap the pipeline in a LangChain LLM
llm = HuggingFacePipeline(pipeline=text_generator)

You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565


Device set to use cuda:0

# **Conversational Retrieval Chain with Memory**

A conversational retrieval chain is set up with ConversationBufferMemory. On each turn it uses the chat history to rewrite a follow-up question into a standalone question, retrieves relevant context for it, and answers with a custom prompt that also includes the conversation history.

In [None]:
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory

In [None]:
memory = ConversationBufferMemory(
    memory_key="chat_history",  # variable name under which the chain reads the history
    return_messages=True        # keep history as message objects rather than one string
)
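
ConversationBufferMemory keeps every turn verbatim; `return_messages=True` only changes whether the history is handed over as a list of message objects or as one formatted string. A hypothetical stand-in showing both views (the exact role labels LangChain uses may differ):

```python
class BufferMemory:
    """Hypothetical stand-in for ConversationBufferMemory: stores every turn verbatim."""

    def __init__(self):
        self.turns = []  # (role, text) tuples in conversation order

    def save_turn(self, question, answer):
        self.turns.append(("Human", question))
        self.turns.append(("Assistant", answer))

    def as_messages(self):
        """Roughly what return_messages=True hands over: one entry per message."""
        return list(self.turns)

    def as_string(self):
        """Roughly what return_messages=False hands over: a single formatted transcript."""
        return "\n".join(f"{role}: {text}" for role, text in self.turns)
```

Because nothing is ever dropped, the history, and therefore the prompt, grows with every turn.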


# **Question-Answering with Context**

This section demonstrates the conversational retrieval chain by answering questions such as "What is Acromegaly?" and tracking the conversation history for follow-up questions.

In [None]:
prompt_template = """
You are an assistant for question-answering tasks.
Use the following pieces of retrieved context and the conversation history to answer the current question.
Focus on the most recent relevant information from the conversation history.
If you don't know the answer, say that you don't know.
Use three sentences maximum and keep the answer concise.

Conversation History:
{chat_history}

Retrieved Context:
{context}

Current Question:
{question}
"""

prompt = PromptTemplate(
    input_variables=["chat_history", "context", "question"],
    template=prompt_template,
)
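
`ConversationalRetrievalChain.from_llm` uses the "stuff" strategy by default: all retrieved chunks are concatenated into the `{context}` slot of the prompt above. A minimal sketch of that filling step with `str.format`; `stuff_prompt` and the shortened `TEMPLATE` are hypothetical, and the blank-line separator between chunks is an assumption.

```python
TEMPLATE = (
    "Conversation History:\n{chat_history}\n\n"
    "Retrieved Context:\n{context}\n\n"
    "Current Question:\n{question}\n"
)


def stuff_prompt(template, chat_history, chunks, question):
    """Concatenate every retrieved chunk into {context} and fill the template."""
    context = "\n\n".join(chunks)  # assumed separator between chunks
    return template.format(chat_history=chat_history, context=context, question=question)
```

Since every chunk lands in a single prompt, the default 4 chunks of up to 500 characters plus the growing history can make the input long for a small seq2seq model.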

In [None]:
rag_chain_with_memory = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=retriever,
    memory=memory,
    combine_docs_chain_kwargs={"prompt": prompt},
)
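
The chain handles a turn roughly as follows: if there is earlier history, the LLM first condenses the new question and the history into a standalone question; that standalone question drives retrieval; the answer prompt is filled; and the turn is saved to memory. The condensing step is what can turn a follow-up like "What disease do I have from the previous conversation" into a query about acne. A sketch of that control flow with caller-supplied stand-in functions (all names hypothetical):

```python
def run_turn(question, history, condense, retrieve, answer):
    """One conversational-RAG turn. `history` is a list of (question, answer) pairs."""
    # Step 1: with prior turns, rewrite the follow-up into a standalone question
    standalone = condense(question, history) if history else question
    # Step 2: retrieve context for the standalone question
    docs = retrieve(standalone)
    # Step 3: answer from the retrieved context and the history
    reply = answer(standalone, docs, history)
    # Step 4: record the turn so the next question can refer back to it
    history.append((question, reply))
    return reply
```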

In [None]:
response = rag_chain_with_memory.invoke({"question": "what is Acromegaly?"})
print(response["answer"])

Acromegaly is a disorder in which the abnormal release of a particular chemical from the pituitary gland in the brain causes increased growth in bone and soft tis- sue, as well as a variety of other disturbances throughout the body.


## **Model's Answer:**
Acromegaly is a disorder in which the abnormal release of a particular chemical from the pituitary gland in the brain causes increased growth in bone and soft tissue, as well as a variety of other disturbances throughout the body.

## **Comment:**
The model provides a concise and accurate definition of acromegaly, explaining its cause and effects.

In [None]:
response = rag_chain_with_memory.invoke({"question": "I have Acne, so can you explain it?"})
print(response["answer"])

Acne is the general name given to a skin disorder in which the sebaceous glands become inflamed.


## **Model's Answer:**
Acne is the general name given to a skin disorder in which the sebaceous glands become inflamed.

## **Comment:**
The answer is correct but very basic. It would benefit from including additional details about the causes, triggers, or types of acne to provide a more informative explanation tailored to someone asking about their condition.

In [None]:
response = rag_chain_with_memory.invoke({"question": "What disease do I have from the previous conversation"})
print(response["answer"])

Acne is the general name given to a skin disorder in which the sebaceous glands become inflamed.


## **Model's Answer:**
Acne is the general name given to a skin disorder in which the sebaceous glands become inflamed.

## **Comment:**
The model correctly identifies acne as the discussed condition based on the conversation history. However, the response repeats the same definition without adding new insights.

In [None]:
response = rag_chain_with_memory.invoke({"question": "What is the symptoms of AIDS?"})
print(response["answer"])

The symptoms may include fever, fatigue, muscle aches, loss of appetite, digestive disturbances, weight loss, skin rashes, headache , and chronically swollen lymph nodes (lymphadenopathy).


## **Model's Answer:**
The symptoms may include fever, fatigue, muscle aches, loss of appetite, digestive disturbances, weight loss, skin rashes, headache, and chronically swollen lymph nodes (lymphadenopathy).

## **Comment:**
The response is detailed and covers many of the common symptoms of AIDS. It is accurate and informative, making it a good answer. However, it could be improved by mentioning the progression of symptoms or emphasizing the importance of consulting a medical professional for diagnosis.