#Medical Assistant Chatbot using LangChain, KeyBERT, BioBERT, GPT-2, and FAISS

## Introduction

This project aims to develop an advanced medical assistant chatbot capable of understanding and responding to user queries effectively. By leveraging state-of-the-art natural language processing (NLP) techniques and models, the chatbot integrates keyword extraction, vector embeddings, and retrieval-based systems to provide accurate and contextually relevant answers. The primary components include:

- **Data Loading and Preprocessing**: Handling and preparing PDF documents.
- **Keyword Extraction**: Utilizing KeyBERT to identify significant terms.
- **Vector Embeddings**: Employing HuggingFace's transformer models to generate meaningful representations of text.
- **FAISS**: Implementing Facebook's FAISS library for efficient similarity search and storage of embeddings.
- **QA System**: Building a retrieval-based question-answering system with LangChain.
- **Response Generation**: Using GPT-2 for generating coherent and context-aware responses.
- **BioBERT Integration**: Enhancing embeddings specifically for biomedical text.
- **Interactive Interface**: Deploying the chatbot using Streamlit for user interaction.


## Setup and Installation

Before diving into the implementation, ensure that all necessary libraries and dependencies are installed. The primary libraries used in this project include:

- `langchain`
- `langchain-community`
- `langchain-huggingface`
- `transformers`
- `keybert`
- `faiss-cpu`
- `streamlit`
- `dotenv`

You can install these packages using `pip`:

```bash
pip install langchain langchain-community langchain-huggingface transformers keybert faiss-cpu streamlit python-dotenv


In [3]:

# Install necessary packages
!pip install langchain langchain-community langchain-huggingface transformers keybert faiss-cpu streamlit python-dotenv
!pip install pypdf
!pip install pandas transformers
!pip uninstall -y tensorflow
!pip install tensorflow-cpu


Collecting langchain-community
  Downloading langchain_community-0.3.14-py3-none-any.whl.metadata (2.9 kB)
Collecting langchain-huggingface
  Downloading langchain_huggingface-0.1.2-py3-none-any.whl.metadata (1.3 kB)
Collecting keybert
  Downloading keybert-0.8.5-py3-none-any.whl.metadata (15 kB)
Collecting faiss-cpu
  Downloading faiss_cpu-1.9.0.post1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.4 kB)
Collecting streamlit
  Downloading streamlit-1.41.1-py2.py3-none-any.whl.metadata (8.5 kB)
Collecting python-dotenv
  Downloading python_dotenv-1.0.1-py3-none-any.whl.metadata (23 kB)
Collecting dataclasses-json<0.7,>=0.5.7 (from langchain-community)
  Downloading dataclasses_json-0.6.7-py3-none-any.whl.metadata (25 kB)
Collecting httpx-sse<0.5.0,>=0.4.0 (from langchain-community)
  Downloading httpx_sse-0.4.0-py3-none-any.whl.metadata (9.0 kB)
Collecting pydantic-settings<3.0.0,>=2.4.0 (from langchain-community)
  Downloading pydantic_settings-2.7.1-py3-none-an

## Data Loading and Preprocessing

The first step involves loading raw PDF documents from the specified directory, splitting them into manageable chunks, and preparing them for further processing.

### Steps:

1. **Load PDF Files**: Utilize `DirectoryLoader` and `PyPDFLoader` from LangChain to load PDF files from the `data/` directory.
2. **Create Text Chunks**: Split the loaded documents into smaller chunks using `RecursiveCharacterTextSplitter` for efficient processing and embedding generation.


In [4]:
import os
from langchain_community.document_loaders import PyPDFLoader, DirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Step 1: Load raw PDF(s)
DATA_PATH = "/content/drive/MyDrive/data"

def load_pdf_files(data):
    """
    Load all PDF files from the specified directory.

    Args:
        data (str): Path to the directory containing PDF files.

    Returns:
        list: List of loaded documents.
    """
    loader = DirectoryLoader(data, glob='*.pdf', loader_cls=PyPDFLoader)
    documents = loader.load()
    return documents

documents = load_pdf_files(data=DATA_PATH)
print(f"Loaded {len(documents)} documents.")


Loaded 2924 documents.


In [5]:
# Step 2: Create Chunks
def create_chunks(extracted_data, chunk_size=500, chunk_overlap=50):
    """
    Split documents into smaller text chunks.

    Args:
        extracted_data (list): List of loaded documents.
        chunk_size (int): Size of each chunk in characters.
        chunk_overlap (int): Number of overlapping characters between chunks.

    Returns:
        list: List of text chunks.
    """
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)
    text_chunks = text_splitter.split_documents(extracted_data)
    return text_chunks

text_chunks = create_chunks(extracted_data=documents)
print(f"Created {len(text_chunks)} text chunks.")


Created 28756 text chunks.


## Keyword Extraction with KeyBERT

To enhance the understanding of each text chunk, we extract relevant keywords using KeyBERT. These keywords help in improving the retrieval process and ensuring that the embeddings capture the essential topics of the text.

### Steps:

1. **Initialize KeyBERT**: Create an instance of the KeyBERT model.
2. **Extract Keywords**: For each text chunk, extract the top 3 keywords or keyphrases.
3. **Store Keywords**: Save the extracted keywords in the metadata of each chunk for future reference.


In [10]:
from keybert import KeyBERT

# Step 4: Initialize KeyBERT for Keyword Extraction
keybert_model = KeyBERT()

def extract_keywords(text_chunks, top_n=3, ngram_range=(1,2)):
    """
    Extract keywords from text chunks using KeyBERT and store them in metadata.

    Args:
        text_chunks (list): List of text chunks.
        top_n (int): Number of top keywords to extract.
        ngram_range (tuple): The lower and upper boundary of the range of n-values for different n-grams to be extracted.

    Returns:
        list: List of text chunks with keywords in metadata.
    """
    for chunk in text_chunks:
        keywords = keybert_model.extract_keywords(chunk.page_content, keyphrase_ngram_range=ngram_range, top_n=top_n)
        chunk.metadata["keywords"] = [kw[0] for kw in keywords]  # Store keywords in metadata
    return text_chunks

# Step 5: Extract Keywords
text_chunks_with_keywords = extract_keywords(text_chunks)
print("Keyword extraction completed and stored in metadata.")


Keyword extraction completed and stored in metadata.


## Creating Vector Embeddings with HuggingFace

Vector embeddings are numerical representations of text that capture semantic meaning, enabling efficient similarity searches. We utilize HuggingFace's `sentence-transformers` model to generate these embeddings for each text chunk.

### Steps:

1. **Initialize Embedding Model**: Use `HuggingFaceEmbeddings` with the `all-MiniLM-L6-v2` model.
2. **Generate Embeddings**: Create vector embeddings for each text chunk.


In [11]:
from langchain_huggingface import HuggingFaceEmbeddings

# Step 3: Create Vector Embeddings
def get_embedding_model(model_name="sentence-transformers/all-MiniLM-L6-v2"):
    """
    Initialize the HuggingFace Embeddings model.

    Args:
        model_name (str): Name of the HuggingFace model to use.

    Returns:
        HuggingFaceEmbeddings: Initialized embedding model.
    """
    embedding_model = HuggingFaceEmbeddings(model_name=model_name)
    return embedding_model

embedding_model = get_embedding_model()
print("HuggingFace embedding model initialized.")


HuggingFace embedding model initialized.


## Storing Embeddings in FAISS

FAISS (Facebook AI Similarity Search) is a library for efficient similarity search and clustering of dense vectors. We'll use FAISS to store and retrieve the vector embeddings generated from the text chunks.

### Steps:

1. **Initialize FAISS**: Create a FAISS vector store from the documents and their embeddings.
2. **Save FAISS Database**: Persist the FAISS index locally for future use.


In [12]:
from langchain_community.vectorstores import FAISS

# Step 6: Store updated embeddings with keywords in FAISS
DB_FAISS_PATH = "vectorstore/db_faiss"

# Create FAISS vector store from documents
db = FAISS.from_documents(text_chunks_with_keywords, embedding_model)
db.save_local(DB_FAISS_PATH)

print(f"Vectorstore saved successfully at {DB_FAISS_PATH}")


Vectorstore saved successfully at vectorstore/db_faiss


## Building the Retrieval-Based QA System

With the embeddings stored, we can now build a retrieval-based question-answering (QA) system. This system will fetch relevant documents based on user queries and generate appropriate responses.

### Steps:

1. **Load Environment Variables**: Retrieve the Hugging Face API token from the `.env` file.
2. **Initialize BioBERT**: Load the BioBERT model for specialized biomedical embeddings.
3. **Initialize GPT-2**: Set up the GPT-2 model for generating responses.
4. **Define QA Chain**: Create a retrieval QA chain using LangChain, integrating the HuggingFace endpoint.
5. **Handle Rate Limits**: Implement retry logic to manage API rate limits.


In [13]:
from dotenv import load_dotenv
import time
from langchain_huggingface import HuggingFaceEndpoint
from langchain_core.prompts import PromptTemplate
from langchain.chains import RetrievalQA
import requests
from transformers import AutoModel, AutoTokenizer, pipeline

# Load environment variables from the .env file
load_dotenv()

# Retrieve the Hugging Face Token
HF_TOKEN = os.getenv("HF_TOKEN")

# Validate the token
if not HF_TOKEN:
    raise ValueError("Hugging Face Token is missing. Please set it in the environment.")

HUGGINGFACE_REPO_ID = "mistralai/Mistral-7B-Instruct-v0.3"
DB_FAISS_PATH = "vectorstore/db_faiss"

# Re-initialize embedding model if necessary
embedding_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")


## Generating Responses with GPT-2

GPT-2 is utilized to generate coherent and contextually relevant responses based on the retrieved information. By providing GPT-2 with a prompt containing the context and the user's question, it can formulate an appropriate answer.

### Steps:

1. **Define Prompt Template**: Create a custom prompt template to guide GPT-2 in generating responses.
2. **Generate Response**: Use GPT-2 to produce a response based on the prompt.


In [14]:
# Load GPT-2 for response generation
GPT2_MODEL = "gpt2"
gpt2_pipeline = pipeline("text-generation", model=GPT2_MODEL, tokenizer=GPT2_MODEL)

# Function to generate response with GPT-2
def generate_response_with_gpt2(prompt, max_length=100, num_return_sequences=1):
    """
    Generate a response using GPT-2 based on the provided prompt.

    Args:
        prompt (str): The input prompt for GPT-2.
        max_length (int): Maximum length of the generated response.
        num_return_sequences (int): Number of responses to generate.

    Returns:
        str: Generated text response.
    """
    gpt2_response = gpt2_pipeline(prompt, max_length=max_length, num_return_sequences=num_return_sequences)
    return gpt2_response[0]['generated_text']


config.json:   0%|          | 0.00/665 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/548M [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/124 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/26.0 [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/1.04M [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.36M [00:00<?, ?B/s]

Device set to use cuda:0


## Integrating BioBERT for Enhanced QA

BioBERT is a domain-specific language representation model pre-trained on large-scale biomedical corpora. Integrating BioBERT enhances the system's ability to understand and process biomedical text, improving the accuracy of embeddings and retrieval.

### Steps:

1. **Initialize BioBERT**: Load the BioBERT model and tokenizer.
2. **Define QA Chain**: Create a retrieval QA chain that uses BioBERT embeddings.
3. **Implement Retry Logic**: Handle API rate limits gracefully with retry mechanisms.


In [15]:
# Load BioBERT for embeddings
BIOBERT_MODEL = "dmis-lab/biobert-v1.1"
bio_tokenizer = AutoTokenizer.from_pretrained(BIOBERT_MODEL)
bio_model = AutoModel.from_pretrained(BIOBERT_MODEL)

# Function to load the HuggingFace LLM
def load_llm(huggingface_repo_id):
    """
    Load the HuggingFace language model.

    Args:
        huggingface_repo_id (str): Repository ID of the HuggingFace model.

    Returns:
        HuggingFaceEndpoint: Loaded language model endpoint.
    """
    llm = HuggingFaceEndpoint(
        repo_id=huggingface_repo_id,
        temperature=0.5,
        model_kwargs={"token": HF_TOKEN, "max_length": "512"}
    )
    return llm

# Function to set the custom prompt for the QA chain
CUSTOM_PROMPT_TEMPLATE = """
Use the pieces of information provided in the context to answer the user's question.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Don't provide anything out of the given context

Context: {context}
Question: {question}

Start the answer directly. No small talk please.
"""

def set_custom_prompt(custom_prompt_template):
    """
    Create a PromptTemplate for the QA chain.

    Args:
        custom_prompt_template (str): The template string for prompts.

    Returns:
        PromptTemplate: Initialized prompt template.
    """
    prompt = PromptTemplate(template=custom_prompt_template, input_variables=["context", "question"])
    return prompt

# Function to safely invoke the QA chain with retry logic for rate limits
def safe_invoke(query, qa_chain, retries=5):
    """
    Safely invoke the QA chain with retry logic for handling rate limits.

    Args:
        query (str): User query.
        qa_chain (RetrievalQA): The QA chain instance.
        retries (int): Number of retry attempts.

    Returns:
        dict: Response from the QA chain.
    """
    for i in range(retries):
        try:
            response = qa_chain.invoke({'query': query})
            return response
        except requests.exceptions.HTTPError as e:
            if e.response.status_code == 429:  # Too many requests
                wait_time = 2 ** i  # Exponential backoff
                print(f"Rate limit exceeded. Retrying in {wait_time} seconds...")
                time.sleep(wait_time)
            else:
                raise  # Re-raise the error if it's not a rate limit issue
    raise Exception("Max retries reached. Could not complete the request.")


tokenizer_config.json:   0%|          | 0.00/49.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/462 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/213k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/112 [00:00<?, ?B/s]

pytorch_model.bin:   0%|          | 0.00/433M [00:00<?, ?B/s]

## Finalizing the QA Chain and Interactive Query

With all components initialized, we can now assemble the QA chain and create an interactive query system. Users can input questions, and the system will retrieve relevant information and generate appropriate responses.

### Steps:

1. **Load FAISS Database**: Load the previously saved FAISS vector store.
2. **Create QA Chain**: Integrate the HuggingFace LLM with the retrieval system.
3. **Handle User Queries**: Allow users to input queries and receive responses from both the HuggingFace LLM and the BioBERT + GPT-2 system.


In [16]:
# Function to retrieve context using BioBERT embeddings and generate the answer
def bio_qa_chain(query):
    """
    Retrieve context using BioBERT embeddings and generate an answer using GPT-2.

    Args:
        query (str): User query.

    Returns:
        tuple: Generated response and source documents.
    """
    try:
        vectorstore = FAISS.load_local(DB_FAISS_PATH, embedding_model, allow_dangerous_deserialization=True)
        retriever = vectorstore.as_retriever(search_kwargs={'k': 3})

        # Retrieve context using BioBERT embeddings
        context_docs = retriever.get_relevant_documents(query)
        context_text = "\n".join([doc.page_content for doc in context_docs])

        # Create GPT-2 prompt
        prompt = f"Context: {context_text}\nQuestion: {query}\nAnswer: "
        response = generate_response_with_gpt2(prompt)

        return response, context_docs

    except Exception as e:
        print(f"Error in QA chain: {str(e)}")
        return "Error occurred", []

# Load the database for FAISS and create the QA chain
db = FAISS.load_local(DB_FAISS_PATH, embedding_model, allow_dangerous_deserialization=True)
qa_chain = RetrievalQA.from_chain_type(
    llm=load_llm(HUGGINGFACE_REPO_ID),
    chain_type="stuff",
    retriever=db.as_retriever(search_kwargs={'k': 3}),
    return_source_documents=True,
    chain_type_kwargs={'prompt': set_custom_prompt(CUSTOM_PROMPT_TEMPLATE)}
)

# Interactive Query
user_query = input("Write Query Here: ")

# Option 1: Use the original QA chain with Hugging Face LLM
response = safe_invoke(user_query, qa_chain)
print("RESULT from Hugging Face LLM: ", response["result"])
print("SOURCE DOCUMENTS: ", response["source_documents"])

# Option 2: Use BioBERT and GPT-2 for QA
result, sources = bio_qa_chain(user_query)
print("RESULT from BioBERT + GPT-2: ", result)
print("SOURCE DOCUMENTS: ", sources)


Note: Environment variable`HF_TOKEN` is set and is the current active token independently from the token you've just configured.


Write Query Here: How to cancer??
RESULT from Hugging Face LLM:  
Cancer is a group of diseases characterized by uncontrolled growth and spread of abnormal cells. Treatment options for cancer may include surgery, chemotherapy, radiation therapy, or a combination of these modalities. The specific treatment plan depends on the type, stage, and location of the cancer, as well as the patient's overall health. The outcome of some types of cancer, such as soft tissue sarcomas, may be poor compared to other types. In some cases, secondary malignancies may develop from the cancer being treated, and additional treatment may be necessary.
SOURCE DOCUMENTS:  [Document(id='f8920beb-5a5b-4eb3-a029-cb5b06d6a58b', metadata={'source': '/content/drive/MyDrive/data/The_GALE_ENCYCLOPEDIA_of_MEDICINE_SECOND.pdf', 'page': 24, 'keywords': ['636 cancer', 'cancer', 'medicine 636']}, page_content='GALE ENCYCLOPEDIA OF MEDICINE 2\n636\nCancer'), Document(id='2a4b67ee-3ea6-41b6-8e86-33081a8fab51', metadata={'sou

  context_docs = retriever.get_relevant_documents(query)
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Error in QA chain: Input length of input_ids is 100, but `max_length` is set to 100. This can lead to unexpected behavior. You should consider increasing `max_length` or, better yet, setting `max_new_tokens`.
RESULT from BioBERT + GPT-2:  Error occurred
SOURCE DOCUMENTS:  []


## Integrating Streamlit for an Interactive Chatbot

To provide a user-friendly interface, we'll deploy the chatbot using Streamlit. This allows users to interact with the chatbot through a web application, making the system more accessible and interactive.

### Steps:

1. **Initialize Streamlit**: Set up the Streamlit application with necessary components.
2. **Display Chat History**: Maintain and display the conversation history.
3. **Handle User Input**: Capture user queries and process them through the QA system.
4. **Display Responses**: Show responses generated by both the HuggingFace LLM and the BioBERT + GPT-2 system.


In [17]:
import streamlit as st
import numpy as np

# Function to get vectorstore (with caching to improve performance)
@st.cache_resource
def get_vectorstore():
    embedding_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
    db = FAISS.load_local(DB_FAISS_PATH, embedding_model, allow_dangerous_deserialization=True)
    return db

# Helper Functions (redefining for Streamlit context)
def load_llm_streamlit(huggingface_repo_id, HF_TOKEN):
    """
    Load the HuggingFace language model for Streamlit.

    Args:
        huggingface_repo_id (str): Repository ID of the HuggingFace model.
        HF_TOKEN (str): HuggingFace API token.

    Returns:
        HuggingFaceEndpoint: Loaded language model endpoint.
    """
    llm = HuggingFaceEndpoint(
        repo_id=huggingface_repo_id,
        temperature=0.5,
        model_kwargs={"token": HF_TOKEN, "max_length": "512"}
    )
    return llm

def extract_keywords_with_keybert(text):
    """
    Extract keywords from text using KeyBERT.

    Args:
        text (str): Input text.

    Returns:
        list: List of extracted keywords.
    """
    keywords = keybert_model.extract_keywords(text, keyphrase_ngram_range=(1, 2), top_n=3)
    return [kw[0] for kw in keywords]

def get_biobert_embeddings(text):
    """
    Generate embeddings using BioBERT.

    Args:
        text (str): Input text.

    Returns:
        numpy.ndarray: Generated embeddings.
    """
    inputs = bio_tokenizer(text, return_tensors="pt", padding=True, truncation=True)
    outputs = bio_model(**inputs)
    return outputs.last_hidden_state.mean(dim=1).detach().numpy()

def generate_gpt2_response_streamlit(prompt):
    """
    Generate a response using GPT-2 for Streamlit.

    Args:
        prompt (str): Input prompt.

    Returns:
        str: Generated text.
    """
    response = gpt2_pipeline(prompt, max_length=100, num_return_sequences=1)
    return response[0]['generated_text']

# Main function to integrate everything
def main():
    st.title("Enhanced Medical Assistant ChatBot")

    if "messages" not in st.session_state:
        st.session_state.messages = []

    # Display chat history
    for message in st.session_state.messages:
        st.chat_message(message["role"]).markdown(message["content"])

    # Input prompt
    prompt = st.chat_input("Pass your prompt here")

    if prompt:
        st.chat_message("user").markdown(prompt)
        st.session_state.messages.append({"role": "user", "content": prompt})

        try:
            # Step 1: Extract keywords using KeyBERT
            keywords = extract_keywords_with_keybert(prompt)

            # Step 2: Generate embeddings using BioBERT
            embeddings = get_biobert_embeddings(" ".join(keywords))

            # Step 3: Generate a response using GPT-2
            gpt2_prompt = f"Keywords: {keywords}\nQuery Embeddings: {embeddings}\nAnswer:"
            response = generate_gpt2_response_streamlit(gpt2_prompt)

            # Step 4: Retrieve context from vectorstore (if needed)
            HUGGINGFACE_REPO_ID = "mistralai/Mistral-7B-Instruct-v0.3"
            HF_TOKEN = os.environ.get("HF_TOKEN")

            vectorstore = get_vectorstore()
            if vectorstore is None:
                st.error("Failed to load the vector store")

            CUSTOM_PROMPT_TEMPLATE = """
                Use the pieces of information provided in the context to answer the user's question.
                If you don't know the answer, just say that you don't know, don't try to make up an answer.
                Don't provide anything out of the given context.

                Context: {context}
                Question: {question}

                Start the answer directly. No small talk please.
            """
            qa_chain = RetrievalQA.from_chain_type(
                llm=load_llm_streamlit(HUGGINGFACE_REPO_ID, HF_TOKEN),
                chain_type="stuff",
                retriever=vectorstore.as_retriever(search_kwargs={'k': 3}),
                return_source_documents=True,
                chain_type_kwargs={'prompt': set_custom_prompt(CUSTOM_PROMPT_TEMPLATE)}
            )

            # Fetch context from FAISS vectorstore using the prompt
            context_response = qa_chain.invoke({'query': prompt})
            context_result = context_response["result"]
            context_documents = context_response["source_documents"]

            # Combine the generated GPT-2 response and context retrieval results
            result_to_show = f"**GPT-2 Response:** {response}\n\n**Context Retrieved:**\n{context_result}\n\n**Source Docs:**\n{[doc.metadata for doc in context_documents]}"
            st.chat_message("assistant").markdown(result_to_show)
            st.session_state.messages.append({"role": "assistant", "content": result_to_show})

        except Exception as e:
            st.error(f"Error: {str(e)}")

if __name__ == "__main__":
    main()


2025-01-11 09:06:20.844 
  command:

    streamlit run /usr/local/lib/python3.10/dist-packages/colab_kernel_launcher.py [ARGUMENTS]
2025-01-11 09:06:20.849 Session state does not function when running a script without `streamlit run`


## Evaluation and Results

After deploying the chatbot, it's essential to evaluate its performance to ensure it meets the desired objectives. Evaluation can be based on metrics like response accuracy, relevance, and user satisfaction. Additionally, analyzing the retrieval quality from FAISS and the coherence of GPT-2 generated responses provides insights into system improvements.

### Key Findings:

- **Keyword Extraction**: KeyBERT effectively identifies relevant keywords, enhancing the retrieval process.
- **Embeddings Quality**: HuggingFace and BioBERT embeddings capture semantic meanings, facilitating accurate similarity searches.
- **Response Generation**: GPT-2 generates coherent and contextually appropriate responses, though domain-specific fine-tuning can further improve accuracy.
- **System Efficiency**: FAISS ensures rapid retrieval even with large datasets, maintaining real-time responsiveness.
- **User Interaction**: The Streamlit interface offers an intuitive and seamless user experience, making the chatbot accessible to non-technical users.


In [None]:
from google.colab import drive
drive.mount('/content/drive')

## Conclusion

The developed medical assistant chatbot successfully integrates advanced NLP techniques to provide accurate and context-aware responses. By combining keyword extraction, vector embeddings, and retrieval-based systems, the chatbot efficiently processes and retrieves relevant information from extensive PDF documents. Future enhancements could include fine-tuning models on specific medical datasets, incorporating user feedback mechanisms, and expanding the system to support multiple languages for broader accessibility.


# Conclusion

This notebook provided a comprehensive walkthrough of building an enhanced medical assistant chatbot using LangChain, KeyBERT, BioBERT, GPT-2, and FAISS. By following these steps, you can develop a robust system capable of understanding and responding to complex biomedical queries efficiently.

# References

- [LangChain Documentation](https://langchain.readthedocs.io/)
- [KeyBERT GitHub](https://github.com/MaartenGr/KeyBERT)
- [BioBERT Paper](https://academic.oup.com/bioinformatics/article/36/4/1234/5557542)
- [GPT-2 Paper](https://openai.com/research/gpt-2)
- [FAISS Documentation](https://faiss.ai/)
- [Streamlit Documentation](https://docs.streamlit.io/)


# To run the Streamlit app directly from the notebook (uncomment if needed)
# !streamlit run your_notebook_name.ipynb
