<a href="https://colab.research.google.com/github/dntwaritag/Medimind_ChatBot/blob/main/Medimind.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Medimind - Medical Information Assistant**

## **Problem Statement**
Patients often struggle to find reliable, understandable information about medications.
This chatbot addresses:
- Medication safety concerns
- Side effect information gaps
- Drug interaction questions

## **Domain Justification**
Healthcare was chosen because:
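- There is high demand for accurate medical information
- Reliable self-service answers can reduce the burden on healthcare professionals
- Answers grounded in a curated dataset help reduce the risk of medical misinformation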

In [1]:
# Install required packages
!pip install -q python-dotenv langchain langchain-groq langchain-huggingface langchain-chroma pandas gradio sentence-transformers kaggle unzip datasets



In [3]:
# Core libraries
import os
import warnings

import pandas as pd

# LangChain components: Groq chat model, Hugging Face embeddings, Chroma vector store
from langchain_groq import ChatGroq
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_chroma import Chroma

# LCEL building blocks for the RAG chain
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

# Suppress library warnings to keep the notebook output readable
warnings.filterwarnings("ignore")

In [4]:
# Download the dataset with the Kaggle CLI (requires Kaggle API credentials)
!kaggle datasets download adilmohammed/medical-data

Traceback (most recent call last):
  File "/usr/local/bin/kaggle", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/kaggle/cli.py", line 68, in main
    out = args.func(**command_args)
          ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/kaggle/api/kaggle_api_extended.py", line 1741, in dataset_download_cli
    with self.build_kaggle_client() as kaggle:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/kaggle/api/kaggle_api_extended.py", line 688, in build_kaggle_client
    username=self.config_values['username'],
             ~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^
KeyError: 'username'
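The `KeyError: 'username'` above means the Kaggle CLI found no API credentials. The CLI reads them from `~/.kaggle/kaggle.json` or from the `KAGGLE_USERNAME` and `KAGGLE_KEY` environment variables. The following is a minimal sketch, assuming the credentials are stored as Colab secrets under those same names (an assumption, not part of the original notebook); the helper names `kaggle_env`, `kaggle_download_command` and `download_dataset` are hypothetical.

```python
import os
import subprocess


def kaggle_env(username: str, key: str) -> dict[str, str]:
    """Return a copy of the current environment with Kaggle API credentials added."""
    env = dict(os.environ)
    env["KAGGLE_USERNAME"] = username  # read by the Kaggle CLI at start-up
    env["KAGGLE_KEY"] = key
    return env


def kaggle_download_command(dataset: str, dest: str = "data") -> list[str]:
    """Build the CLI call that downloads a dataset into `dest` and unzips it."""
    return ["kaggle", "datasets", "download", dataset, "-p", dest, "--unzip"]


def download_dataset(dataset: str, username: str, key: str, dest: str = "data") -> None:
    """Run the Kaggle CLI with explicit credentials; raises CalledProcessError on failure."""
    subprocess.run(
        kaggle_download_command(dataset, dest),
        env=kaggle_env(username, key),
        check=True,
    )
```

In Colab this could be called as `download_dataset("adilmohammed/medical-data", userdata.get("KAGGLE_USERNAME"), userdata.get("KAGGLE_KEY"))`, after which the downloaded files would sit under `data/` instead of Google Drive. Passing the credentials through `env` avoids writing a `kaggle.json` file to disk.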


In [6]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [7]:
# Load in the dataset
context = pd.read_csv("/content/drive/MyDrive/Medimind/drugs_side_effects_drugs_com.csv")
context.head()

| | drug_name | medical_condition | side_effects | generic_name | drug_classes | brand_names | activity | rx_otc | pregnancy_category | csa | alcohol | related_drugs | medical_condition_description | rating | no_of_reviews | drug_link | medical_condition_url |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | doxycycline | Acne | (hives, difficult breathing, swelling in your ... | doxycycline | Miscellaneous antimalarials, Tetracyclines | Acticlate, Adoxa CK, Adoxa Pak, Adoxa TT, Alod... | 87% | Rx | D | N | X | amoxicillin: https://www.drugs.com/amoxicillin... | Acne Other names: Acne Vulgaris; Blackheads; B... | 6.8 | 760.0 | https://www.drugs.com/doxycycline.html | https://www.drugs.com/condition/acne.html |
| 1 | spironolactone | Acne | hives ; difficulty breathing; swelling of your... | spironolactone | Aldosterone receptor antagonists, Potassium-sp... | Aldactone, CaroSpir | 82% | Rx | C | N | X | amlodipine: https://www.drugs.com/amlodipine.h... | Acne Other names: Acne Vulgaris; Blackheads; B... | 7.2 | 449.0 | https://www.drugs.com/spironolactone.html | https://www.drugs.com/condition/acne.html |
| 2 | minocycline | Acne | skin rash, fever, swollen glands, flu-like sym... | minocycline | Tetracyclines | Dynacin, Minocin, Minolira, Solodyn, Ximino, V... | 48% | Rx | D | N | | amoxicillin: https://www.drugs.com/amoxicillin... | Acne Other names: Acne Vulgaris; Blackheads; B... | 5.7 | 482.0 | https://www.drugs.com/minocycline.html | https://www.drugs.com/condition/acne.html |
| 3 | Accutane | Acne | problems with your vision or hearing; muscle o... | isotretinoin (oral) | Miscellaneous antineoplastics, Miscellaneous u... | | 41% | Rx | X | N | X | doxycycline: https://www.drugs.com/doxycycline... | Acne Other names: Acne Vulgaris; Blackheads; B... | 7.9 | 623.0 | https://www.drugs.com/accutane.html | https://www.drugs.com/condition/acne.html |
| 4 | clindamycin | Acne | hives ; difficult breathing; swelling of your ... | clindamycin topical | Topical acne agents, Vaginal anti-infectives | Cleocin T, Clindacin ETZ, Clindacin P, Clindag... | 39% | Rx | B | N | | doxycycline: https://www.drugs.com/doxycycline... | Acne Other names: Acne Vulgaris; Blackheads; B... | 7.4 | 146.0 | https://www.drugs.com/mtm/clindamycin-topical.... | https://www.drugs.com/condition/acne.html |
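Before dropping rows, it can help to see how many values are missing in each column, since only records with side-effect text are useful to this chatbot. A minimal sketch using standard pandas calls; `missing_summary` is a hypothetical helper, and the toy DataFrame stands in for `context`.

```python
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.Series:
    """Count missing values per column, largest first, omitting complete columns."""
    counts = df.isna().sum()
    return counts[counts > 0].sort_values(ascending=False)


# Toy stand-in for the drugs.com table loaded above (hypothetical values)
toy = pd.DataFrame({
    "drug_name": ["doxycycline", "minocycline", "Accutane"],
    "side_effects": ["hives; difficult breathing", None, "vision problems"],
    "alcohol": ["X", None, None],
})
print(missing_summary(toy))
```

Running `missing_summary(context)` on the real data shows whether `side_effects` is the only column worth filtering on before building the vector store.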


In [9]:
# Remove rows with missing values in the 'side_effects' column
df_cleaned = context.dropna(subset=['side_effects'])

print("\nDataFrame after removing NaN values from column 'side_effects':")
df_cleaned.head(2)


DataFrame after removing NaN values from column 'side_effects':


| | drug_name | medical_condition | side_effects | generic_name | drug_classes | brand_names | activity | rx_otc | pregnancy_category | csa | alcohol | related_drugs | medical_condition_description | rating | no_of_reviews | drug_link | medical_condition_url |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | doxycycline | Acne | (hives, difficult breathing, swelling in your ... | doxycycline | Miscellaneous antimalarials, Tetracyclines | Acticlate, Adoxa CK, Adoxa Pak, Adoxa TT, Alod... | 87% | Rx | D | N | X | amoxicillin: https://www.drugs.com/amoxicillin... | Acne Other names: Acne Vulgaris; Blackheads; B... | 6.8 | 760.0 | https://www.drugs.com/doxycycline.html | https://www.drugs.com/condition/acne.html |
| 1 | spironolactone | Acne | hives ; difficulty breathing; swelling of your... | spironolactone | Aldosterone receptor antagonists, Potassium-sp... | Aldactone, CaroSpir | 82% | Rx | C | N | X | amlodipine: https://www.drugs.com/amlodipine.h... | Acne Other names: Acne Vulgaris; Blackheads; B... | 7.2 | 449.0 | https://www.drugs.com/spironolactone.html | https://www.drugs.com/condition/acne.html |


In [10]:
from google.colab import userdata

# Read the Groq API key from Colab secrets and expose it to langchain-groq
groq_api_key = userdata.get('GROQ_API_KEY')
os.environ["GROQ_API_KEY"] = groq_api_key

In [11]:
# Embedding model used to vectorize both the documents and the user queries
embed_model = HuggingFaceEmbeddings(model_name="mixedbread-ai/mxbai-embed-large-v1")

# Initialize the chat model (Llama 3.3 70B served by Groq)
llm_model = ChatGroq(model="llama-3.3-70b-versatile", api_key=os.environ.get("GROQ_API_KEY"))


In [12]:
# Chroma vector store that holds the document embeddings, persisted to the current directory
vectorstore = Chroma(
    collection_name="medical_dataset_store",
    embedding_function=embed_model,
    persist_directory="./",
)

# Only populate the store when it is empty, so re-running this cell does not
# add duplicate documents to the persisted collection.
if not vectorstore.get()["ids"]:
    # Turn each drug record into one text document ("column: value" lines).
    # Passing the DataFrame itself would iterate over its column names, not its rows.
    texts = [
        "\n".join(f"{col}: {val}" for col, val in row.items() if pd.notna(val))
        for _, row in df_cleaned.iterrows()
    ]
    vectorstore.add_texts(texts)
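Iterating over a pandas DataFrame yields its column labels, not its rows, so a DataFrame handed directly to `add_texts` produces one document per column rather than one per drug record. A small check with a toy DataFrame (hypothetical values) makes the difference visible; `to_dict(orient="records")` is one standard way to iterate row by row.

```python
import pandas as pd

# Toy stand-in for the drugs.com table (hypothetical values)
toy = pd.DataFrame({
    "drug_name": ["doxycycline", "minocycline", "Accutane"],
    "medical_condition": ["Acne", "Acne", "Acne"],
})

# Iterating a DataFrame yields its column labels, which is what
# add_texts receives when it is given the DataFrame directly.
print(list(toy))  # ['drug_name', 'medical_condition']

# Row-wise iteration needs an explicit call such as to_dict(orient="records").
records = toy.to_dict(orient="records")
print(len(records))  # 3
print(records[0])  # {'drug_name': 'doxycycline', 'medical_condition': 'Acne'}
```

A quick sanity check after indexing is that the number of IDs returned by `add_texts` equals `len(df)`, not `len(df.columns)`.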

In [13]:
# Create a retriever that returns the stored documents most similar to a query
retriever = vectorstore.as_retriever()
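Behind the scenes, the retriever embeds the query and asks Chroma for the stored documents whose embeddings are closest to it. LangChain's `as_retriever()` returns 4 documents by default, and a Chroma collection uses squared L2 distance unless configured otherwise. The pure-Python sketch below shows that nearest-neighbour step with toy 2-D vectors standing in for real embeddings. It is a simplified exhaustive scan; Chroma itself uses an approximate HNSW index.

```python
import heapq


def squared_l2(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance, the default metric of a Chroma collection."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def top_k(query: list[float], docs: dict[str, list[float]], k: int = 4) -> list[str]:
    """Return the ids of the k stored vectors closest to the query (exact search)."""
    return heapq.nsmallest(k, docs, key=lambda doc_id: squared_l2(query, docs[doc_id]))


# Toy 2-D "embeddings"; real mxbai-embed-large-v1 vectors have 1024 dimensions.
store = {
    "doxycycline": [0.9, 0.1],
    "minocycline": [0.8, 0.2],
    "lisinopril": [0.1, 0.9],
    "amlodipine": [0.2, 0.8],
}
print(top_k([0.88, 0.12], store, k=2))  # ['doxycycline', 'minocycline']
```

In the notebook, the number of retrieved documents can be changed with `vectorstore.as_retriever(search_kwargs={"k": ...})`, trading more context for a longer prompt.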

In [14]:
template = """
You are a specialized medical assistant with expertise in pharmacology, evidence-based medicine, and medical diagnostics. Your role is to analyze queries, apply logical reasoning, and deliver accurate, reliable, and well-structured responses using the provided context. Follow these guidelines:

1. **Understand the Scope**:
   - Make sure you correctly understand the question before answering.
   - General questions: Provide a broad response.
   - Specific questions: Provide detailed information.
   - Unclear questions: Ask for clarification.

2. **Critical Thinking**:
   - Analyze the question and cross-check it against the context.
   - Avoid over-specificity unless explicitly requested.

3. **Accuracy and Alignment**:
   - Align answers with the question's intent.
   - Avoid vague responses; confirm facts or state limitations.

4. **Professional Tone and Formatting**:
   - Use non-technical language for general audiences.
   - Structure responses with bullet points or clear paragraphs.
   - Summarize long responses for clarity.

5. **Error Handling**:
   - If unsure, respond honestly: "Please consult a healthcare professional."
   - Request clarification for unclear questions.

6. **Advanced Reasoning**:
   - Side Effects: Categorize them as common or severe, and mention when to seek medical attention.
   - Drug Interactions: Explain risks and suggest precautions.
   - Medical Conditions: Provide causes, symptoms, risks, treatments, and prevention tips.

7. **Handling Irrelevant Questions**:
   - If the question is unrelated to the context or outside your scope: "Please consult a healthcare professional."
   - Correct any missed or irrelevant responses in the next answer.

### Context:
{context}

### Question:
{question}

### Answer:
"""
rag_prompt = PromptTemplate.from_template(template)
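`PromptTemplate.from_template` uses Python f-string-style `{name}` placeholders by default, so the variables a template expects can be checked with the standard library before the template is wired into the chain. Counting occurrences is also useful, because every occurrence of a placeholder receives the full value (the whole retrieved context, for example). A sketch; `placeholder_counts` is a hypothetical helper.

```python
from collections import Counter
from string import Formatter


def placeholder_counts(template: str) -> Counter:
    """Count how often each {placeholder} appears; each occurrence gets the full value."""
    return Counter(
        field for _literal, field, _spec, _conv in Formatter().parse(template) if field
    )


sample = "Use the context.\n### Context:\n{context}\n### Question:\n{question}\n### Answer:\n"
print(placeholder_counts(sample))
```

Calling `placeholder_counts(template)` on the notebook's template should report exactly `context` and `question`, each once.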

In [15]:
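# Join the retrieved documents' text so the prompt receives plain text
# rather than the string form of a list of Document objects.
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | rag_prompt
    | llm_model
    | StrOutputParser()
)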
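In LCEL, `|` composes steps left to right, and the leading dict is run as a parallel step: each of its values receives the same input (the user's question) and the results become the prompt variables. The following plain-Python sketch traces that data flow. The stub functions stand in for the retriever, the Groq model, and the output parser; they are hypothetical and are not LangChain APIs. Joining the retrieved texts with blank lines is an assumption of the sketch.

```python
from typing import Callable


def make_rag_pipeline(
    retrieve: Callable[[str], list[str]],
    render_prompt: Callable[[dict[str, str]], str],
    generate: Callable[[str], str],
) -> Callable[[str], str]:
    """Plain-Python analogue of {"context": ..., "question": ...} | prompt | llm | parser."""

    def run(question: str) -> str:
        # Dict step: the same input feeds both keys (retriever and passthrough).
        variables = {
            "context": "\n\n".join(retrieve(question)),  # assumed joining of retrieved texts
            "question": question,
        }
        # Prompt step, then model step; the stub model already returns plain text,
        # which plays the role of StrOutputParser.
        return generate(render_prompt(variables))

    return run


# Hypothetical stand-ins so the flow can run without a vector store or API key
def fake_retrieve(question: str) -> list[str]:
    return [f"record A for: {question}", f"record B for: {question}"]


def fake_prompt(variables: dict[str, str]) -> str:
    return f"### Context:\n{variables['context']}\n### Question:\n{variables['question']}"


def fake_llm(prompt: str) -> str:
    return f"(model saw {len(prompt.splitlines())} prompt lines)"


pipeline = make_rag_pipeline(fake_retrieve, fake_prompt, fake_llm)
print(pipeline("side effects of lisinopril?"))
```

Swapping the stubs for real components is what the LCEL expression does declaratively, and it additionally provides `.stream()` and `.invoke()` on the composed chain.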

In [16]:
import gradio as gr

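def rag_memory_stream(message, history):
    # Gradio passes the chat history, but this chain answers each message on its own.
    partial_text = ""
    # Stream chunks from the RAG chain and yield the growing answer to the UI
    for new_text in rag_chain.stream(message):
        partial_text += new_text
        yield partial_text

examples = [
    "What is a drug?",
    "What are the side effects of lisinopril?"
]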

description = "Real-Time AI-Powered Medical Assistant: Drug Side Effect Queries Chatbot"


title = "AI-Powered Medical Chatbot :) Try me!"
demo = gr.ChatInterface(fn=rag_memory_stream,
                        type="messages",
                        title=title,
                        description=description,
                        fill_height=True,
                        examples=examples,
                        theme="glass",
)

# Launch the application and make it shareable through a public link
demo.launch(share=True)

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://1203e1fa899d0a54e4.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
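`rag_memory_stream` receives the conversation `history` but does not use it, so a follow-up such as "and with alcohol?" reaches the retriever without its subject. With `type="messages"`, Gradio passes history as a list of dicts with `role` and `content` keys. One possible approach, sketched here as an assumption rather than the notebook's method, is to prepend the last few user turns to the new message before calling the chain. `build_query` and `max_turns` are hypothetical names.

```python
def build_query(message: str, history: list[dict], max_turns: int = 2) -> str:
    """Prepend the last `max_turns` user messages to the new message."""
    user_turns = [
        turn["content"]
        for turn in history
        if turn.get("role") == "user" and isinstance(turn.get("content"), str)
    ]
    # Guard max_turns <= 0, because a [-0:] slice would return every turn.
    previous = user_turns[-max_turns:] if max_turns > 0 else []
    if not previous:
        return message
    return "Earlier questions: " + " | ".join(previous) + "\nCurrent question: " + message
```

Inside `rag_memory_stream`, the loop would then iterate over `rag_chain.stream(build_query(message, history))`. Because the chain feeds the same string to both the retriever and the `{question}` slot, the model also sees the earlier questions. Keeping `max_turns` small limits how far older topics can pull retrieval off course.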


