# Medicine Recommendation System

## Introduction

This project, developed by Pravin Gupta (Reg No: 229310184, Section 7A) for the course Recommender Systems | AI 4103, builds a Medicine Recommendation System. The increasing availability of health data and advances in machine learning make it possible to develop systems that help identify potential illnesses from reported symptoms and provide relevant recommendations.

## Problem Statement

People often experience symptoms without immediate access to a medical professional for a preliminary assessment, which can cause anxiety and uncertainty about possible health issues and appropriate first steps. This project addresses that gap with a system that takes a set of user-provided symptoms as input, predicts a potential disease, and then presents relevant information: a description of the disease, associated symptoms, recommended medications, necessary precautions, dietary suggestions, and workout recommendations. The aim is to give users a preliminary understanding and guidance based on their symptoms.

## Project Overview

This project leverages machine learning techniques to build a symptom-based disease prediction and recommendation system. The system is trained on a dataset containing various symptoms and their corresponding diseases. Upon receiving a set of symptoms, the trained model predicts the most likely disease. Subsequently, the system retrieves and presents comprehensive information related to the predicted disease, including:

*   **Disease Description:** A brief explanation of the predicted illness.
*   **Associated Symptoms:** A list of common symptoms related to the predicted disease.
*   **Recommended Medications:** Potential medications or types of treatment.
*   **Precautions:** Important precautions to take.
*   **Dietary Recommendations:** Suggested dietary adjustments.
*   **Workout Recommendations:** Relevant physical activities or exercises.

The project utilizes popular Python libraries for data manipulation (pandas), machine learning (scikit-learn), and interactive user interface development (ipywidgets).
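
As a rough illustration of the prediction input described above, the sketch below turns a list of symptom names into a binary feature vector. The three-symptom vocabulary and the helper name `encode_symptoms` are assumptions made for this example; the actual system works with the 132 symptom columns of the training data.

```python
import numpy as np

# Hypothetical mini-vocabulary; the real dataset has 132 symptom columns.
SYMPTOM_INDEX = {"itching": 0, "skin_rash": 1, "chills": 2}


def encode_symptoms(symptoms, symptom_index):
    """Return a (1, n_features) 0/1 array and the list of unrecognized names."""
    vector = np.zeros(len(symptom_index), dtype=int)
    unknown = []
    for raw in symptoms:
        # Normalize user input: trim, lowercase, spaces to underscores.
        name = raw.strip().lower().replace(" ", "_")
        if name in symptom_index:
            vector[symptom_index[name]] = 1
        else:
            unknown.append(raw)
    return vector.reshape(1, -1), unknown


features, unknown = encode_symptoms(["Itching", " skin rash", "sore elbow"], SYMPTOM_INDEX)
print(features)  # [[1 1 0]]
print(unknown)   # ['sore elbow']
```

Returning the unrecognized names lets a caller warn the user instead of silently dropping input. A few column names in the dataset contain spaces (for example `'spotting_ urination'`), so this normalization would need adjusting for them.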

In [3]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

In [4]:
import warnings
warnings.filterwarnings('ignore')

In [5]:
df=pd.read_csv('/kaggle/input/medicine-recommendation-system-dataset/Training.csv')

In [6]:
df.head()

Unnamed: 0,itching,skin_rash,nodal_skin_eruptions,continuous_sneezing,shivering,chills,joint_pain,stomach_pain,acidity,ulcers_on_tongue,...,blackheads,scurring,skin_peeling,silver_like_dusting,small_dents_in_nails,inflammatory_nails,blister,red_sore_around_nose,yellow_crust_ooze,prognosis
0,1,1,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,Fungal infection
1,0,1,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,Fungal infection
2,1,0,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,Fungal infection
3,1,1,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,Fungal infection
4,1,1,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,Fungal infection


In [7]:
df.tail()

Unnamed: 0,itching,skin_rash,nodal_skin_eruptions,continuous_sneezing,shivering,chills,joint_pain,stomach_pain,acidity,ulcers_on_tongue,...,blackheads,scurring,skin_peeling,silver_like_dusting,small_dents_in_nails,inflammatory_nails,blister,red_sore_around_nose,yellow_crust_ooze,prognosis
4915,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,(vertigo) Paroymsal Positional Vertigo
4916,0,1,0,0,0,0,0,0,0,0,...,1,1,0,0,0,0,0,0,0,Acne
4917,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,Urinary tract infection
4918,0,1,0,0,0,0,1,0,0,0,...,0,0,1,1,1,1,0,0,0,Psoriasis
4919,0,1,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,1,1,1,Impetigo


In [8]:
df.describe()

Unnamed: 0,itching,skin_rash,nodal_skin_eruptions,continuous_sneezing,shivering,chills,joint_pain,stomach_pain,acidity,ulcers_on_tongue,...,pus_filled_pimples,blackheads,scurring,skin_peeling,silver_like_dusting,small_dents_in_nails,inflammatory_nails,blister,red_sore_around_nose,yellow_crust_ooze
count,4920.0,4920.0,4920.0,4920.0,4920.0,4920.0,4920.0,4920.0,4920.0,4920.0,...,4920.0,4920.0,4920.0,4920.0,4920.0,4920.0,4920.0,4920.0,4920.0,4920.0
mean,0.137805,0.159756,0.021951,0.045122,0.021951,0.162195,0.139024,0.045122,0.045122,0.021951,...,0.021951,0.021951,0.021951,0.023171,0.023171,0.023171,0.023171,0.023171,0.023171,0.023171
std,0.34473,0.366417,0.146539,0.207593,0.146539,0.368667,0.346007,0.207593,0.207593,0.146539,...,0.146539,0.146539,0.146539,0.150461,0.150461,0.150461,0.150461,0.150461,0.150461,0.150461
min,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
50%,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
75%,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
max,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0


In [9]:
df.size

654360

In [10]:
df.shape

(4920, 133)

In [11]:
df.columns

Index(['itching', 'skin_rash', 'nodal_skin_eruptions', 'continuous_sneezing',
       'shivering', 'chills', 'joint_pain', 'stomach_pain', 'acidity',
       'ulcers_on_tongue',
       ...
       'blackheads', 'scurring', 'skin_peeling', 'silver_like_dusting',
       'small_dents_in_nails', 'inflammatory_nails', 'blister',
       'red_sore_around_nose', 'yellow_crust_ooze', 'prognosis'],
      dtype='object', length=133)

In [12]:
df['prognosis'].unique()

array(['Fungal infection', 'Allergy', 'GERD', 'Chronic cholestasis',
       'Drug Reaction', 'Peptic ulcer diseae', 'AIDS', 'Diabetes ',
       'Gastroenteritis', 'Bronchial Asthma', 'Hypertension ', 'Migraine',
       'Cervical spondylosis', 'Paralysis (brain hemorrhage)', 'Jaundice',
       'Malaria', 'Chicken pox', 'Dengue', 'Typhoid', 'hepatitis A',
       'Hepatitis B', 'Hepatitis C', 'Hepatitis D', 'Hepatitis E',
       'Alcoholic hepatitis', 'Tuberculosis', 'Common Cold', 'Pneumonia',
       'Dimorphic hemmorhoids(piles)', 'Heart attack', 'Varicose veins',
       'Hypothyroidism', 'Hyperthyroidism', 'Hypoglycemia',
       'Osteoarthristis', 'Arthritis',
       '(vertigo) Paroymsal  Positional Vertigo', 'Acne',
       'Urinary tract infection', 'Psoriasis', 'Impetigo'], dtype=object)

In [13]:
len(df['prognosis'].unique())

41

In [14]:
X=df.drop("prognosis", axis=1)
y=df['prognosis']

In [15]:
X

Unnamed: 0,itching,skin_rash,nodal_skin_eruptions,continuous_sneezing,shivering,chills,joint_pain,stomach_pain,acidity,ulcers_on_tongue,...,pus_filled_pimples,blackheads,scurring,skin_peeling,silver_like_dusting,small_dents_in_nails,inflammatory_nails,blister,red_sore_around_nose,yellow_crust_ooze
0,1,1,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,1,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,1,0,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,1,1,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,1,1,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4915,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4916,0,1,0,0,0,0,0,0,0,0,...,1,1,1,0,0,0,0,0,0,0
4917,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4918,0,1,0,0,0,0,1,0,0,0,...,0,0,0,1,1,1,1,0,0,0


In [16]:
y

Unnamed: 0,prognosis
0,Fungal infection
1,Fungal infection
2,Fungal infection
3,Fungal infection
4,Fungal infection
...,...
4915,(vertigo) Paroymsal Positional Vertigo
4916,Acne
4917,Urinary tract infection
4918,Psoriasis


In [17]:
# Split the data (string labels; this split is redone below with encoded labels)
x_train,x_test,y_train,y_test=train_test_split(X,y,test_size=0.2,random_state=0 )

In [18]:
x_train.shape

(3936, 132)

In [19]:
y_train.shape

(3936,)

In [20]:
x_test.shape

(984, 132)

In [21]:
y_test.shape

(984,)

In [22]:
# Initialize LabelEncoder
label_encoder = LabelEncoder()

# Fit and transform the target variable
y_encoded = label_encoder.fit_transform(y)

In [23]:
y_encoded

array([15, 15, 15, ..., 38, 35, 27])

In [24]:
# Split the data again, this time using the encoded labels
x_train,x_test,y_train,y_test=train_test_split(X,y_encoded,test_size=0.2,random_state=0 )
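
The split above is purely random, so the number of test samples per class can vary. If equal class proportions in both splits are wanted, `train_test_split` accepts a `stratify` argument. The sketch below shows this on toy data; the arrays are hypothetical and not taken from the project dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 3 classes with 10 samples each (hypothetical values).
toy_X = np.arange(60).reshape(30, 2)
toy_y = np.repeat([0, 1, 2], 10)

X_tr, X_te, y_tr, y_te = train_test_split(
    toy_X, toy_y, test_size=0.2, random_state=0, stratify=toy_y
)

# With stratification each class contributes equally to the test split.
print(np.bincount(y_te))  # [2 2 2]
```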

# Logistic Regression

In [25]:
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
# Initialize the Logistic Regression model
lr_model = LogisticRegression()

# Train the model on the training set
lr_model.fit(x_train, y_train)

# Make predictions on the training set
y_train_pred_lr = lr_model.predict(x_train)
# Make predictions on the testing set
y_test_pred_lr = lr_model.predict(x_test)

# Evaluate the model
print("Training Accuracy:", accuracy_score(y_train, y_train_pred_lr))
print("Testing Accuracy:", accuracy_score(y_test, y_test_pred_lr))

# Display confusion matrix and classification report for testing set
print("\nConfusion Matrix:")
print(confusion_matrix(y_test, y_test_pred_lr))

print("\nClassification Report:")
print(classification_report(y_test, y_test_pred_lr))

Training Accuracy: 1.0
Testing Accuracy: 1.0

Confusion Matrix:
[[18  0  0 ...  0  0  0]
 [ 0 22  0 ...  0  0  0]
 [ 0  0 31 ...  0  0  0]
 ...
 [ 0  0  0 ... 20  0  0]
 [ 0  0  0 ...  0 28  0]
 [ 0  0  0 ...  0  0 24]]

Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        18
           1       1.00      1.00      1.00        22
           2       1.00      1.00      1.00        31
           3       1.00      1.00      1.00        24
           4       1.00      1.00      1.00        23
           5       1.00      1.00      1.00        23
           6       1.00      1.00      1.00        28
           7       1.00      1.00      1.00        27
           8       1.00      1.00      1.00        21
           9       1.00      1.00      1.00        21
          10       1.00      1.00      1.00        34
          11       1.00      1.00      1.00        18
          12       1.00      1.00      1.00        22

# Random Forest Classifier

In [26]:
from sklearn.ensemble import RandomForestClassifier

# Initialize the Random Forest model
rf_model = RandomForestClassifier()

# Train the model on the training set
rf_model.fit(x_train, y_train)

# Make predictions on the training set
y_train_pred_rf = rf_model.predict(x_train)
# Make predictions on the testing set
y_test_pred_rf = rf_model.predict(x_test)

# Evaluate the model
print("Training Accuracy:", accuracy_score(y_train, y_train_pred_rf))
print("Testing Accuracy:", accuracy_score(y_test, y_test_pred_rf))

# Display confusion matrix and classification report for testing set
print("\nConfusion Matrix:")
print(confusion_matrix(y_test, y_test_pred_rf))

print("\nClassification Report:")
print(classification_report(y_test, y_test_pred_rf))


Training Accuracy: 1.0
Testing Accuracy: 1.0

Confusion Matrix:
[[18  0  0 ...  0  0  0]
 [ 0 22  0 ...  0  0  0]
 [ 0  0 31 ...  0  0  0]
 ...
 [ 0  0  0 ... 20  0  0]
 [ 0  0  0 ...  0 28  0]
 [ 0  0  0 ...  0  0 24]]

Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        18
           1       1.00      1.00      1.00        22
           2       1.00      1.00      1.00        31
           3       1.00      1.00      1.00        24
           4       1.00      1.00      1.00        23
           5       1.00      1.00      1.00        23
           6       1.00      1.00      1.00        28
           7       1.00      1.00      1.00        27
           8       1.00      1.00      1.00        21
           9       1.00      1.00      1.00        21
          10       1.00      1.00      1.00        34
          11       1.00      1.00      1.00        18
          12       1.00      1.00      1.00        22

# SVC

In [27]:
from sklearn.svm import SVC

# Initialize the Support Vector Classifier model
svc_model = SVC()

# Train the model on the training set
svc_model.fit(x_train, y_train)

# Make predictions on the training set
y_train_pred_svc = svc_model.predict(x_train)
# Make predictions on the testing set
y_test_pred_svc = svc_model.predict(x_test)

# Evaluate the model
print("Training Accuracy:", accuracy_score(y_train, y_train_pred_svc))
print("Testing Accuracy:", accuracy_score(y_test, y_test_pred_svc))

# Display confusion matrix and classification report for testing set
print("\nConfusion Matrix:")
print(confusion_matrix(y_test, y_test_pred_svc))

print("\nClassification Report:")
print(classification_report(y_test, y_test_pred_svc))

Training Accuracy: 1.0
Testing Accuracy: 1.0

Confusion Matrix:
[[18  0  0 ...  0  0  0]
 [ 0 22  0 ...  0  0  0]
 [ 0  0 31 ...  0  0  0]
 ...
 [ 0  0  0 ... 20  0  0]
 [ 0  0  0 ...  0 28  0]
 [ 0  0  0 ...  0  0 24]]

Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        18
           1       1.00      1.00      1.00        22
           2       1.00      1.00      1.00        31
           3       1.00      1.00      1.00        24
           4       1.00      1.00      1.00        23
           5       1.00      1.00      1.00        23
           6       1.00      1.00      1.00        28
           7       1.00      1.00      1.00        27
           8       1.00      1.00      1.00        21
           9       1.00      1.00      1.00        21
          10       1.00      1.00      1.00        34
          11       1.00      1.00      1.00        18
          12       1.00      1.00      1.00        22

# Naive Bayes

In [28]:
from sklearn.naive_bayes import GaussianNB

# Initialize the Gaussian Naive Bayes model
nb_model = GaussianNB()

# Train the model on the training set
nb_model.fit(x_train, y_train)

# Make predictions on the training set
y_train_pred_nb = nb_model.predict(x_train)
# Make predictions on the testing set
y_test_pred_nb = nb_model.predict(x_test)

# Evaluate the model
print("Training Accuracy:", accuracy_score(y_train, y_train_pred_nb))
print("Testing Accuracy:", accuracy_score(y_test, y_test_pred_nb))

# Display confusion matrix and classification report for testing set
print("\nConfusion Matrix:")
print(confusion_matrix(y_test, y_test_pred_nb))

print("\nClassification Report:")
print(classification_report(y_test, y_test_pred_nb))


Training Accuracy: 1.0
Testing Accuracy: 1.0

Confusion Matrix:
[[18  0  0 ...  0  0  0]
 [ 0 22  0 ...  0  0  0]
 [ 0  0 31 ...  0  0  0]
 ...
 [ 0  0  0 ... 20  0  0]
 [ 0  0  0 ...  0 28  0]
 [ 0  0  0 ...  0  0 24]]

Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        18
           1       1.00      1.00      1.00        22
           2       1.00      1.00      1.00        31
           3       1.00      1.00      1.00        24
           4       1.00      1.00      1.00        23
           5       1.00      1.00      1.00        23
           6       1.00      1.00      1.00        28
           7       1.00      1.00      1.00        27
           8       1.00      1.00      1.00        21
           9       1.00      1.00      1.00        21
          10       1.00      1.00      1.00        34
          11       1.00      1.00      1.00        18
          12       1.00      1.00      1.00        22

# Gradient Boosting Classifier

In [29]:
from sklearn.ensemble import GradientBoostingClassifier

# Initialize the Gradient Boosting Classifier model
gb_model = GradientBoostingClassifier()

# Train the model on the training set
gb_model.fit(x_train, y_train)

# Make predictions on the training set
y_train_pred_gb = gb_model.predict(x_train)
# Make predictions on the testing set
y_test_pred_gb = gb_model.predict(x_test)

# Evaluate the model
print("Training Accuracy:", accuracy_score(y_train, y_train_pred_gb))
print("Testing Accuracy:", accuracy_score(y_test, y_test_pred_gb))

# Display confusion matrix and classification report for testing set
print("\nConfusion Matrix:")
print(confusion_matrix(y_test, y_test_pred_gb))

print("\nClassification Report:")
print(classification_report(y_test, y_test_pred_gb))


Training Accuracy: 1.0
Testing Accuracy: 1.0

Confusion Matrix:
[[18  0  0 ...  0  0  0]
 [ 0 22  0 ...  0  0  0]
 [ 0  0 31 ...  0  0  0]
 ...
 [ 0  0  0 ... 20  0  0]
 [ 0  0  0 ...  0 28  0]
 [ 0  0  0 ...  0  0 24]]

Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        18
           1       1.00      1.00      1.00        22
           2       1.00      1.00      1.00        31
           3       1.00      1.00      1.00        24
           4       1.00      1.00      1.00        23
           5       1.00      1.00      1.00        23
           6       1.00      1.00      1.00        28
           7       1.00      1.00      1.00        27
           8       1.00      1.00      1.00        21
           9       1.00      1.00      1.00        21
          10       1.00      1.00      1.00        34
          11       1.00      1.00      1.00        18
          12       1.00      1.00      1.00        22

# Model comparison

In [30]:
# Create a dictionary to store models
models = {
    'Logistic Regression': lr_model,
    'Random Forest': rf_model,
    'Support Vector Classifier': svc_model,
    'Naive Bayes': nb_model,
    'Gradient Boosting Classifier': gb_model
}

# Initialize a dictionary to store accuracies
accuracies = {}

# Iterate over models
for name, model in models.items():
    # Make predictions on the testing set
    y_pred = model.predict(x_test)
    # Calculate accuracy and store it
    acc = accuracy_score(y_test, y_pred)
    accuracies[name] = acc

# Print accuracies
for name, acc in accuracies.items():
    print(f'{name} Accuracy:', acc)

Logistic Regression Accuracy: 1.0
Random Forest Accuracy: 1.0
Support Vector Classifier Accuracy: 1.0
Naive Bayes Accuracy: 1.0
Gradient Boosting Classifier Accuracy: 1.0
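
Identical perfect scores across five different models are worth double-checking. One possible cause, not confirmed for this dataset, is that identical symptom rows appear in both the training and test splits. The sketch below shows one way to measure that overlap; the toy frames and the helper name `count_test_rows_seen_in_train` are assumptions for illustration.

```python
import pandas as pd


def count_test_rows_seen_in_train(train_df, test_df):
    """Count test rows whose full feature pattern also occurs in train_df."""
    # Unique feature patterns present in the training split.
    train_patterns = train_df.drop_duplicates()
    # Left-merge on all shared columns; the indicator marks rows found in both.
    merged = test_df.merge(train_patterns, how="left", indicator=True)
    return int((merged["_merge"] == "both").sum())


# Toy data standing in for x_train / x_test (hypothetical values).
toy_train = pd.DataFrame({"itching": [1, 1, 0], "chills": [0, 0, 1]})
toy_test = pd.DataFrame({"itching": [1, 0], "chills": [0, 0]})

overlap = count_test_rows_seen_in_train(toy_train, toy_test)
print(f"{overlap} of {len(toy_test)} test rows also appear in the training split")
```

On the notebook's data this could be called as `count_test_rows_seen_in_train(x_train, x_test)`. A high count would suggest evaluating on de-duplicated data or on a separate test file before trusting the scores.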


# Saving the Model

In [31]:
import pickle

# Save the trained SVC model
with open('svc_model.pkl', 'wb') as file:
    pickle.dump(svc_model, file)

print("SVC model saved successfully.")

SVC model saved successfully.


# Loading the Model

In [32]:
import pickle

# Load the saved SVC model
with open('svc_model.pkl', 'rb') as file:
    loaded_svc_model = pickle.load(file)

print("SVC model loaded successfully.")


SVC model loaded successfully.
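
A quick way to confirm that a reloaded model behaves like the original is to compare their predictions. The sketch below does this on a toy classifier, round-tripping through pickle in memory; the data and variable names are hypothetical and only stand in for `svc_model` and `loaded_svc_model`.

```python
import pickle

import numpy as np
from sklearn.svm import SVC

# Toy training data (hypothetical): two binary symptom columns, two classes.
toy_X = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
toy_y = np.array([0, 0, 1, 1])

original = SVC().fit(toy_X, toy_y)

# Round-trip through pickle in memory instead of a file.
restored = pickle.loads(pickle.dumps(original))

same = np.array_equal(original.predict(toy_X), restored.predict(toy_X))
print("Predictions match after reload:", same)
```

Pickle files should only be loaded from trusted sources, and a pickled scikit-learn model is best loaded with the same library version that saved it.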


# Testing the model

In [33]:
#test 1
prediction1 = svc_model.predict(x_test.iloc[[0]])
print("Predicted Label :",prediction1)
print("Actual Label :",y_test[0])

Predicted Label : [18]
Actual Label : 18


In [34]:
#test 2
prediction2 = svc_model.predict(x_test.iloc[[10]])
print("Predicted Label :",prediction2)
print("Actual Label :",y_test[10])

Predicted Label : [38]
Actual Label : 38


### Recommendation System and Prediction

In [35]:
symtoms = pd.read_csv('/kaggle/input/medicine-recommendation-system-dataset/symtoms_df.csv')

In [36]:
symtoms.head()

Unnamed: 0.1,Unnamed: 0,Disease,Symptom_1,Symptom_2,Symptom_3,Symptom_4
0,0,Fungal infection,itching,skin_rash,nodal_skin_eruptions,dischromic _patches
1,1,Fungal infection,skin_rash,nodal_skin_eruptions,dischromic _patches,
2,2,Fungal infection,itching,nodal_skin_eruptions,dischromic _patches,
3,3,Fungal infection,itching,skin_rash,dischromic _patches,
4,4,Fungal infection,itching,skin_rash,nodal_skin_eruptions,


In [37]:
precautions = pd.read_csv('/kaggle/input/medicine-recommendation-system-dataset/precautions_df.csv')

In [38]:
precautions.head()

Unnamed: 0.1,Unnamed: 0,Disease,Precaution_1,Precaution_2,Precaution_3,Precaution_4
0,0,Drug Reaction,stop irritation,consult nearest hospital,stop taking drug,follow up
1,1,Malaria,Consult nearest hospital,avoid oily food,avoid non veg food,keep mosquitos out
2,2,Allergy,apply calamine,cover area with bandage,,use ice to compress itching
3,3,Hypothyroidism,reduce stress,exercise,eat healthy,get proper sleep
4,4,Psoriasis,wash hands with warm soapy water,stop bleeding using pressure,consult doctor,salt baths


In [39]:
workout = pd.read_csv('/kaggle/input/medicine-recommendation-system-dataset/workout_df.csv')

In [40]:
workout.head()

Unnamed: 0.2,Unnamed: 0.1,Unnamed: 0,disease,workout
0,0,0,Fungal infection,Avoid sugary foods
1,1,1,Fungal infection,Consume probiotics
2,2,2,Fungal infection,Increase intake of garlic
3,3,3,Fungal infection,Include yogurt in diet
4,4,4,Fungal infection,Limit processed foods


In [41]:
description = pd.read_csv('/kaggle/input/medicine-recommendation-system-dataset/description.csv')

In [42]:
description.head()

Unnamed: 0,Disease,Description
0,Fungal infection,Fungal infection is a common skin condition ca...
1,Allergy,Allergy is an immune system reaction to a subs...
2,GERD,GERD (Gastroesophageal Reflux Disease) is a di...
3,Chronic cholestasis,Chronic cholestasis is a condition where bile ...
4,Drug Reaction,Drug Reaction occurs when the body reacts adve...


In [43]:
medications = pd.read_csv('/kaggle/input/medicine-recommendation-system-dataset/medications.csv')

In [44]:
medications.head()

Unnamed: 0,Disease,Medication
0,Fungal infection,"['Antifungal Cream', 'Fluconazole', 'Terbinafi..."
1,Allergy,"['Antihistamines', 'Decongestants', 'Epinephri..."
2,GERD,"['Proton Pump Inhibitors (PPIs)', 'H2 Blockers..."
3,Chronic cholestasis,"['Ursodeoxycholic acid', 'Cholestyramine', 'Me..."
4,Drug Reaction,"['Antihistamines', 'Epinephrine', 'Corticoster..."


In [45]:
diets = pd.read_csv('/kaggle/input/medicine-recommendation-system-dataset/diets.csv')

In [46]:
diets.head()

Unnamed: 0,Disease,Diet
0,Fungal infection,"['Antifungal Diet', 'Probiotics', 'Garlic', 'C..."
1,Allergy,"['Elimination Diet', 'Omega-3-rich foods', 'Vi..."
2,GERD,"['Low-Acid Diet', 'Fiber-rich foods', 'Ginger'..."
3,Chronic cholestasis,"['Low-Fat Diet', 'High-Fiber Diet', 'Lean prot..."
4,Drug Reaction,"['Antihistamine Diet', 'Omega-3-rich foods', '..."


#### Build dictionaries for symptoms and diseases to map the predicted value to the corresponding disease or symptom

In [47]:
symptoms_dict = {'itching': 0, 'skin_rash': 1, 'nodal_skin_eruptions': 2, 'continuous_sneezing': 3, 'shivering': 4, 'chills': 5, 'joint_pain': 6, 'stomach_pain': 7, 'acidity': 8, 'ulcers_on_tongue': 9, 'muscle_wasting': 10, 'vomiting': 11, 'burning_micturition': 12, 'spotting_ urination': 13, 'fatigue': 14, 'weight_gain': 15, 'anxiety': 16, 'cold_hands_and_feets': 17, 'mood_swings': 18, 'weight_loss': 19, 'restlessness': 20, 'lethargy': 21, 'patches_in_throat': 22, 'irregular_sugar_level': 23, 'cough': 24, 'high_fever': 25, 'sunken_eyes': 26, 'breathlessness': 27, 'sweating': 28, 'dehydration': 29, 'indigestion': 30, 'headache': 31, 'yellowish_skin': 32, 'dark_urine': 33, 'nausea': 34, 'loss_of_appetite': 35, 'pain_behind_the_eyes': 36, 'back_pain': 37, 'constipation': 38, 'abdominal_pain': 39, 'diarrhoea': 40, 'mild_fever': 41, 'yellow_urine': 42, 'yellowing_of_eyes': 43, 'acute_liver_failure': 44, 'fluid_overload': 45, 'swelling_of_stomach': 46, 'swelled_lymph_nodes': 47, 'malaise': 48, 'blurred_and_distorted_vision': 49, 'phlegm': 50, 'throat_irritation': 51, 'redness_of_eyes': 52, 'sinus_pressure': 53, 'runny_nose': 54, 'congestion': 55, 'chest_pain': 56, 'weakness_in_limbs': 57, 'fast_heart_rate': 58, 'pain_during_bowel_movements': 59, 'pain_in_anal_region': 60, 'bloody_stool': 61, 'irritation_in_anus': 62, 'neck_pain': 63, 'dizziness': 64, 'cramps': 65, 'bruising': 66, 'obesity': 67, 'swollen_legs': 68, 'swollen_blood_vessels': 69, 'puffy_face_and_eyes': 70, 'enlarged_thyroid': 71, 'brittle_nails': 72, 'swollen_extremeties': 73, 'excessive_hunger': 74, 'extra_marital_contacts': 75, 'drying_and_tingling_lips': 76, 'slurred_speech': 77, 'knee_pain': 78, 'hip_joint_pain': 79, 'muscle_weakness': 80, 'stiff_neck': 81, 'swelling_joints': 82, 'movement_stiffness': 83, 'spinning_movements': 84, 'loss_of_balance': 85, 'unsteadiness': 86, 'weakness_of_one_body_side': 87, 'loss_of_smell': 88, 'bladder_discomfort': 89, 'foul_smell_of urine': 90, 
'continuous_feel_of_urine': 91, 'passage_of_gases': 92, 'internal_itching': 93, 'toxic_look_(typhos)': 94, 'depression': 95, 'irritability': 96, 'muscle_pain': 97, 'altered_sensorium': 98, 'red_spots_over_body': 99, 'belly_pain': 100, 'abnormal_menstruation': 101, 'dischromic _patches': 102, 'watering_from_eyes': 103, 'increased_appetite': 104, 'polyuria': 105, 'family_history': 106, 'mucoid_sputum': 107, 'rusty_sputum': 108, 'lack_of_concentration': 109, 'visual_disturbances': 110, 'receiving_blood_transfusion': 111, 'receiving_unsterile_injections': 112, 'coma': 113, 'stomach_bleeding': 114, 'distention_of_abdomen': 115, 'history_of_alcohol_consumption': 116, 'fluid_overload.1': 117, 'blood_in_sputum': 118, 'prominent_veins_on_calf': 119, 'palpitations': 120, 'painful_walking': 121, 'pus_filled_pimples': 122, 'blackheads': 123, 'scurring': 124, 'skin_peeling': 125, 'silver_like_dusting': 126, 'small_dents_in_nails': 127, 'inflammatory_nails': 128, 'blister': 129, 'red_sore_around_nose': 130, 'yellow_crust_ooze': 131}
diseases_list = {15: 'Fungal infection', 4: 'Allergy', 16: 'GERD', 9: 'Chronic cholestasis', 14: 'Drug Reaction', 33: 'Peptic ulcer diseae', 1: 'AIDS', 12: 'Diabetes ', 17: 'Gastroenteritis', 6: 'Bronchial Asthma', 23: 'Hypertension ', 30: 'Migraine', 7: 'Cervical spondylosis', 32: 'Paralysis (brain hemorrhage)', 28: 'Jaundice', 29: 'Malaria', 8: 'Chicken pox', 11: 'Dengue', 37: 'Typhoid', 40: 'hepatitis A', 19: 'Hepatitis B', 20: 'Hepatitis C', 21: 'Hepatitis D', 22: 'Hepatitis E', 3: 'Alcoholic hepatitis', 36: 'Tuberculosis', 10: 'Common Cold', 34: 'Pneumonia', 13: 'Dimorphic hemmorhoids(piles)', 18: 'Heart attack', 39: 'Varicose veins', 26: 'Hypothyroidism', 24: 'Hyperthyroidism', 25: 'Hypoglycemia', 31: 'Osteoarthristis', 5: 'Arthritis', 0: '(vertigo) Paroymsal  Positional Vertigo', 2: 'Acne', 38: 'Urinary tract infection', 35: 'Psoriasis', 27: 'Impetigo'}
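
The two dictionaries above are written out by hand. As an alternative sketch, the same kind of mappings can be derived from the feature columns and a fitted `LabelEncoder`, which keeps them in sync with the training data. The toy columns and labels below are assumptions for illustration; in the notebook the inputs would be `X.columns` and `label_encoder`.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy stand-ins for the notebook's X and y (hypothetical values).
toy_X = pd.DataFrame({"itching": [1, 0, 0], "chills": [0, 1, 0], "cough": [0, 0, 1]})
toy_y = pd.Series(["Fungal infection", "Allergy", "GERD"])

encoder = LabelEncoder()
encoder.fit(toy_y)

# Symptom name -> column position, in the order the model was trained on.
symptom_to_index = {name: i for i, name in enumerate(toy_X.columns)}

# Encoded class id -> disease name; classes_ is sorted, so ids follow that order.
index_to_disease = {i: str(name) for i, name in enumerate(encoder.classes_)}

print(symptom_to_index)   # {'itching': 0, 'chills': 1, 'cough': 2}
print(index_to_disease)   # {0: 'Allergy', 1: 'Fungal infection', 2: 'GERD'}
```

Because `LabelEncoder` assigns ids in sorted order of the class names, the ids depend only on the set of labels, which matches the hand-written mapping above (for example, `0` for the vertigo entry and `40` for `'hepatitis A'`).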

In [48]:
# Predict using the svc_model on the first row of x_test
prediction = svc_model.predict(x_test.iloc[[0]])

# Map the predicted value to the corresponding disease
predicted_disease = diseases_list[prediction[0]]

print("Predicted Disease:", predicted_disease)

Predicted Disease: Heart attack


In [49]:
# Extract the description information for the predicted disease
predicted_description_info = description[description['Disease'] == predicted_disease]['Description'].values

# Extract the symptoms information for the predicted disease and remove duplicates
predicted_symptoms_info = symtoms[symtoms['Disease'] == predicted_disease][['Symptom_1', 'Symptom_2', 'Symptom_3', 'Symptom_4']].values
predicted_symptoms_info = pd.unique(predicted_symptoms_info.ravel())

# Extract the medications information for the predicted disease
predicted_medications_info = medications[medications['Disease'] == predicted_disease]['Medication'].values

# Extract the precautions information for the predicted disease and remove duplicates
predicted_precautions_info = precautions[precautions['Disease'] == predicted_disease][['Precaution_1', 'Precaution_2', 'Precaution_3','Precaution_4']].values
predicted_precautions_info = pd.unique(predicted_precautions_info.ravel())

# Extract the diet information for the predicted disease
predicted_diet_info = diets[diets['Disease'] == predicted_disease]['Diet'].values

# Extract the workout information for the predicted disease
predicted_workout_info = workout[workout['disease'] == predicted_disease]['workout'].values

print("Predicted Disease:", predicted_disease)
print("")
print("Predicted Disease Description Information:", predicted_description_info)
print("")
print("Predicted Disease Symptoms Information:", predicted_symptoms_info)
print("")
print("Predicted Disease precautions Information:", predicted_precautions_info)
print("")
print("Predicted Disease Medications Information:", predicted_medications_info)
print("")
print("Predicted Disease Diet Information:", predicted_diet_info)
print("")
print("Predicted Disease workout Information:", predicted_workout_info)

Predicted Disease: Heart attack

Predicted Disease Description Information: ['Heart attack is a sudden and severe reduction in blood flow to the heart muscle.']

Predicted Disease Symptoms Information: [' vomiting' ' breathlessness' ' sweating' ' chest_pain' nan]

Predicted Disease precautions Information: ['call ambulance' 'chew or swallow asprin' 'keep calm' nan]

Predicted Disease Medications Information: ["['Compression stockings', 'Exercise', 'Elevating the legs', 'Sclerotherapy', 'Laser treatments']"]

Predicted Disease Diet Information: ["['Heart-Healthy Diet', 'Low-sodium foods', 'Fruits and vegetables', 'Whole grains', 'Lean proteins']"]

Predicted Disease workout Information: ['Follow a heart-healthy diet' 'Limit sodium intake'
 'Include fiber-rich foods' 'Consume healthy fats' 'Include lean proteins'
 'Limit sugary foods and beverages' 'Stay hydrated'
 'Consult a healthcare professional' 'Follow medical recommendations'
 'Engage in regular exercise']
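
The raw lookups above return NaN placeholders, entries with leading spaces, and list columns stored as strings. A minimal sketch of tidying such values before display follows; the helper names `clean_values` and `parse_list_cell` and the sample inputs are assumptions for illustration.

```python
import ast

import numpy as np
import pandas as pd


def clean_values(values):
    """Flatten an array of cells, drop NaN, strip whitespace, keep first-seen order."""
    flat = pd.Series(np.asarray(values, dtype=object).ravel()).dropna()
    cleaned = flat.astype(str).str.strip()
    return cleaned.drop_duplicates().tolist()


def parse_list_cell(cell):
    """Parse a cell such as "['A', 'B']" into a Python list without using eval."""
    return list(ast.literal_eval(cell))


symptom_cells = np.array([[" vomiting", " sweating"], [" vomiting", np.nan]], dtype=object)
print(clean_values(symptom_cells))  # ['vomiting', 'sweating']
print(parse_list_cell("['Low-sodium foods', 'Whole grains']"))  # ['Low-sodium foods', 'Whole grains']
```

`ast.literal_eval` only accepts Python literals, so it is a safer choice than `eval` for parsing text read from a CSV file.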


# Building the Program

In [89]:
import ipywidgets as widgets
from IPython.display import display, HTML, clear_output

def predict_disease(symptoms):
    # Create a zero-filled feature vector with one slot per symptom column (132 features; 'prognosis' is excluded)
    input_data = np.zeros(len(symptoms_dict))

    # Set the corresponding symptom values to 1 based on the selected symptoms
    for symptom in symptoms:
        if symptom in symptoms_dict:
            input_data[symptoms_dict[symptom]] = 1

    # Reshape the input data to match the model's expected input shape (1 sample, 132 features)
    input_data = input_data.reshape(1, -1)

    # Predict using the loaded SVC model
    prediction = loaded_svc_model.predict(input_data)

    # Map the predicted value to the corresponding disease
    predicted_disease = diseases_list[prediction[0]]

    return predicted_disease

def display_information(predicted_disease):
    # Extract information for the predicted disease
    predicted_description_info = description[description['Disease'] == predicted_disease]['Description'].values

    # Extract the symptoms information for the predicted disease and remove duplicates
    predicted_symptoms_info = symtoms[symtoms['Disease'] == predicted_disease][['Symptom_1', 'Symptom_2', 'Symptom_3', 'Symptom_4']].values
    predicted_symptoms_info = pd.unique(predicted_symptoms_info.ravel())
    predicted_symptoms_info = pd.Series(predicted_symptoms_info).dropna().tolist()  # Drop NaN values and convert to list

    predicted_medications_info = medications[medications['Disease'] == predicted_disease]['Medication'].values

    predicted_precautions_info = precautions[precautions['Disease'] == predicted_disease][['Precaution_1', 'Precaution_2', 'Precaution_3','Precaution_4']].values
    predicted_precautions_info = pd.unique(predicted_precautions_info.ravel())
    predicted_precautions_info = pd.Series(predicted_precautions_info).dropna().tolist()  # Drop NaN values and convert to list

    # Extract the diet information for the predicted disease
    predicted_diet_info = diets[diets['Disease'] == predicted_disease]['Diet'].values

    # Extract the workout information for the predicted disease
    predicted_workout_info = workout[workout['disease'] == predicted_disease]['workout'].values

    # Initialize HTML output string with basic styling
    html_output = f"""
    <style>
        body {{
            font-family: 'Arial', sans-serif;
            line-height: 1.6;
            margin: 20px;
            background-color: #f8f9fa;
            color: #343a40;
        }}
        h1 {{
            color: #0056b3;
            border-bottom: 2px solid #0056b3;
            padding-bottom: 10px;
            margin-bottom: 20px;
            text-align: center;
        }}
        h2 {{
            color: #007bff;
            margin-top: 25px;
            margin-bottom: 15px;
            border-bottom: 1px solid #007bff;
            padding-bottom: 5px;
        }}
        p {{
            margin-bottom: 15px;
        }}
        ul {{
            list-style-type: disc;
            margin-left: 20px;
            margin-bottom: 15px;
        }}
        li {{
            margin-bottom: 5px;
        }}
        table {{
            border-collapse: collapse;
            width: 100%;
            margin-bottom: 20px;
            box-shadow: 0 2px 5px rgba(0,0,0,0.1);
        }}
        th, td {{
            border: 1px solid #dee2e6;
            padding: 10px;
            text-align: left;
        }}
        th {{
            background-color: #007bff;
            color: white;
            font-weight: bold;
        }}
        tr:nth-child(even) {{
            background-color: #e9ecef;
        }}
    </style>
    <h1>Predicted Disease: {predicted_disease}</h1>
    """

    # Add Description
    if len(predicted_description_info) > 0:
        html_output += f"<h2>Description:</h2><p>{predicted_description_info[0]}</p>"

    # Add Symptoms
    if predicted_symptoms_info:
        html_output += "<h2>Symptoms:</h2><table><tr><th>Symptom</th></tr>"
        for symptom in predicted_symptoms_info:
            html_output += f"<tr><td>{symptom}</td></tr>"
        html_output += "</table>"

    # Parse stringified lists with ast.literal_eval instead of eval,
    # so that cell contents are never executed as code
    import ast

    # Add Medications
    if len(predicted_medications_info) > 0:
        html_output += "<h2>Medications:</h2><table><tr><th>Medication</th></tr>"
        medications_list = ast.literal_eval(predicted_medications_info[0])
        for medication in medications_list:
            html_output += f"<tr><td>{medication}</td></tr>"
        html_output += "</table>"
    else:
        html_output += "<h2>Medications:</h2><p>No medication information available.</p>"

    # Add Precautions
    if predicted_precautions_info:
        html_output += "<h2>Precautions:</h2><table><tr><th>Precaution</th></tr>"
        for precaution in predicted_precautions_info:
            html_output += f"<tr><td>{precaution}</td></tr>"
        html_output += "</table>"

    # Add Diet
    if len(predicted_diet_info) > 0:
        html_output += "<h2>Diet:</h2><table><tr><th>Diet</th></tr>"
        diet_list = ast.literal_eval(predicted_diet_info[0])
        for diet_item in diet_list:
            html_output += f"<tr><td>{diet_item}</td></tr>"
        html_output += "</table>"
    else:
        html_output += "<h2>Diet:</h2><p>No diet information available.</p>"

    # Add Workout
    if len(predicted_workout_info) > 0:
        html_output += "<h2>Workout:</h2><table><tr><th>Workout</th></tr>"
        for workout_item in predicted_workout_info:
            html_output += f"<tr><td>{workout_item}</td></tr>"
        html_output += "</table>"
    else:
        html_output += "<h2>Workout:</h2><p>No workout information available.</p>"

    return html_output
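
Each section of `display_information` repeats the same pattern: open a one-column table, add a row per item, close the table. Below is a minimal sketch of how that pattern could be factored out, assuming list-valued cells are stored as stringified Python lists (as the parsing step in the function suggests). `parse_list_cell` and `render_table` are hypothetical helper names, not part of the original code. `html.escape` is an addition so that values containing `<` or `&` cannot break the markup.

```python
import ast
import html


def parse_list_cell(cell):
    """Parse a stringified list such as "['a', 'b']" without executing it."""
    value = ast.literal_eval(cell)
    # Wrap a lone literal in a list so callers can always iterate
    return list(value) if isinstance(value, (list, tuple)) else [value]


def render_table(header, items):
    """Return a one-column HTML table, or an empty string when there are no items."""
    if not items:
        return ""
    rows = "".join(f"<tr><td>{html.escape(str(item))}</td></tr>" for item in items)
    return f"<table><tr><th>{html.escape(header)}</th></tr>{rows}</table>"


# Hypothetical cell content, in the format the medications column appears to use
cell = "['Antihistamines', 'Epinephrine']"
print(render_table("Medication", parse_list_cell(cell)))
```

With helpers like these, each section could reduce to a single `html_output += render_table(...)` call, with the "No ... information available" fallback applied when the result is empty.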


# Get the list of all possible symptoms
all_symptoms = list(symptoms_dict.keys())

# Create a SelectMultiple widget for symptom selection
symptom_select = widgets.SelectMultiple(
    options=all_symptoms,
    description='Select Symptoms:',
    disabled=False,
    layout=widgets.Layout(width='50%', height='200px', align_self='center'),
    style={'description_color': 'black'}  # Set description color to black
)

# Create a VBox to display selected symptoms as buttons
selected_symptoms_box = widgets.VBox([], layout=widgets.Layout(width='50%', align_self='center'))

# Function to update the selected symptoms display
def update_selected_symptoms_display(selected_symptoms):
    # Build one removable button per selected symptom, then assign the children once
    buttons = []
    for symptom in selected_symptoms:
        symptom_button = widgets.Button(description=symptom, button_style='info', layout=widgets.Layout(width='auto'))
        # Clicking a button removes its symptom from the selection
        symptom_button.on_click(lambda b: remove_symptom(b.description))
        buttons.append(symptom_button)
    selected_symptoms_box.children = buttons

# Function to handle symptom selection
def on_symptom_select_change(change):
    selected_symptoms = list(change['new'])
    update_selected_symptoms_display(selected_symptoms)

# Function to remove a symptom
def remove_symptom(symptom):
    current_selected_symptoms = list(symptom_select.value)
    if symptom in current_selected_symptoms:
        current_selected_symptoms.remove(symptom)
        symptom_select.value = current_selected_symptoms
        update_selected_symptoms_display(current_selected_symptoms)

# Observe changes in the symptom selection widget
symptom_select.observe(on_symptom_select_change, names='value')

predict_button = widgets.Button(
    description="Predict Disease",
    button_style='success',
    layout=widgets.Layout(width='auto', align_self='center')
)
output_widget = widgets.Output()

# Function to handle prediction button click
def on_predict_button_clicked(_):
    selected_symptoms = list(symptom_select.value)
    if selected_symptoms:
        predicted_disease = predict_disease(selected_symptoms)
        with output_widget:
            clear_output()
            html_output = display_information(predicted_disease)
            display(HTML(html_output))
    else:
        with output_widget:
            clear_output()
            print("Please select at least one symptom.")

# Link button clicks to functions
predict_button.on_click(on_predict_button_clicked)

# Arrange widgets in a VBox for better layout
input_widgets = widgets.VBox([symptom_select, selected_symptoms_box, predict_button], layout=widgets.Layout(align_items='center'))

# Display the widgets
display(input_widgets, output_widget)


Example output:

*   **Symptom:** itching, skin_rash, stomach_pain, burning_micturition, spotting_ urination
*   **Medication:** Antihistamines, Epinephrine, Corticosteroids, Antibiotics, Antifungal Cream
*   **Precaution:** stop irritation, consult nearest hospital, stop taking drug, follow up
*   **Diet:** Antihistamine Diet, Omega-3-rich foods, Vitamin C-rich foods, Quercetin-rich foods, Probiotics
*   **Workout:** Discontinue offending medication, Stay hydrated, Include anti-inflammatory foods, Consume antioxidants, Avoid trigger foods, Include omega-3 fatty acids, Limit caffeine and alcohol, Stay hydrated, Eat a balanced diet, Consult a healthcare professional
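
The `predict_disease` call in the click handler receives symptom names, but a classifier needs a fixed-length numeric vector. Below is a minimal sketch of that encoding step, assuming `symptoms_dict` maps each symptom name to a column index in the range `0..n-1`. The three-entry mapping is a hypothetical stand-in for the real dictionary, and `encode_symptoms` is an illustrative name rather than the project's own function.

```python
def encode_symptoms(selected_symptoms, symptoms_dict):
    """Return a 0/1 list with a 1 at the index of every selected symptom."""
    # Assumption: the dictionary values are column indices 0..len(symptoms_dict)-1
    vector = [0] * len(symptoms_dict)
    unknown = []
    for symptom in selected_symptoms:
        index = symptoms_dict.get(symptom)
        if index is None:
            # Collect unrecognised names instead of failing on the first one
            unknown.append(symptom)
        else:
            vector[index] = 1
    if unknown:
        raise ValueError(f"Unknown symptoms: {unknown}")
    return vector


# Hypothetical stand-in for the real symptoms_dict (assumed: name -> column index)
example_symptoms_dict = {"itching": 0, "skin_rash": 1, "stomach_pain": 2}
print(encode_symptoms(["itching", "stomach_pain"], example_symptoms_dict))  # [1, 0, 1]
```

The resulting list could then be wrapped in a two-dimensional array and passed to the trained model's `predict` method.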


## Conclusion

This Medicine Recommendation System demonstrates the application of machine learning to symptom-based disease prediction and information retrieval. Several classification models were trained (Logistic Regression, Random Forest, Support Vector Classifier, Naive Bayes, and Gradient Boosting Classifier), and they achieved high accuracy in predicting diseases on the provided symptom dataset. The system gives users preliminary information about potential illnesses, along with related recommendations for medications, precautions, diet, and workouts.

While the models perform very well on the training data, this system is intended for informational purposes only and is not a substitute for professional medical advice. Future enhancements could include incorporating a larger and more diverse dataset, exploring deep learning models, adding symptom search, and integrating with external medical databases for real-time information. This project is a foundational step towards more comprehensive and user-friendly health recommendation systems.
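
As one possible take on the symptom search mentioned above, the standard library's `difflib.get_close_matches` can suggest symptom names for a mistyped or partial query. This is a sketch under assumptions: `search_symptoms` is a hypothetical helper, the `cutoff` value is an arbitrary starting point, and the short list stands in for `all_symptoms`.

```python
import difflib


def search_symptoms(query, all_symptoms, limit=5, cutoff=0.6):
    """Return up to `limit` symptom names that resemble `query`."""
    # Normalise the query to the snake_case style used by the symptom names
    normalised = query.strip().lower().replace(" ", "_")
    # Substring hits come first, followed by fuzzy matches that tolerate typos
    substring_hits = [s for s in all_symptoms if normalised in s]
    fuzzy_hits = difflib.get_close_matches(normalised, all_symptoms, n=limit, cutoff=cutoff)
    # Merge while preserving order and dropping duplicates
    merged = list(dict.fromkeys(substring_hits + fuzzy_hits))
    return merged[:limit]


# Hypothetical subset of the symptom names
sample_symptoms = ["itching", "skin_rash", "stomach_pain", "burning_micturition"]
print(search_symptoms("skin rash", sample_symptoms))
print(search_symptoms("itchng", sample_symptoms))
```

The matches could populate the `options` of the existing `SelectMultiple` widget, so that users filter the list rather than scroll through every symptom.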