# Title: Personalized Medical Recommendation System with Machine Learning

# Description:

Welcome to our cutting-edge Personalized Medical Recommendation System, a powerful platform designed to assist users in understanding and managing their health. Leveraging the capabilities of machine learning, our system analyzes user-input symptoms to predict potential diseases accurately.

# load dataset & tools

In [18]:
import  pandas as pd
import numpy as np
from imblearn.over_sampling import RandomOverSampler

In [19]:
dataset = pd.read_csv('Training.csv')

In [20]:
dataset

Unnamed: 0,itching,skin_rash,nodal_skin_eruptions,continuous_sneezing,shivering,chills,joint_pain,stomach_pain,acidity,ulcers_on_tongue,...,blackheads,scurring,skin_peeling,silver_like_dusting,small_dents_in_nails,inflammatory_nails,blister,red_sore_around_nose,yellow_crust_ooze,prognosis
0,1,1,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,Fungal infection
1,0,1,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,Fungal infection
2,1,0,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,Fungal infection
3,1,1,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,Fungal infection
4,1,1,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,Fungal infection
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
405,0,1,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,1,1,Impetigo
406,0,1,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,1,0,1,Impetigo
407,0,1,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,1,1,0,Impetigo
408,0,1,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,1,1,0,Impetigo


In [21]:
vals = dataset.values.flatten()

In [22]:
dataset.shape

(410, 133)

# train test split

In [23]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

In [24]:
X = dataset.drop('prognosis', axis=1)
y = dataset['prognosis']

# ecoding prognonsis
le = LabelEncoder()
le.fit(y)
Y = le.transform(y)
    
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.3, random_state=20)

In [25]:
unique_labels, counts = np.unique(y, return_counts=True)
print("Before oversampling:")
for label, count in zip(unique_labels, counts):
    print(f"Label {label}: {count} samples")

Before oversampling:
Label (vertigo) Paroymsal Positional Vertigo: 10 samples
Label AIDS: 10 samples
Label Acne: 10 samples
Label Alcoholic hepatitis: 10 samples
Label Allergy: 10 samples
Label Arthritis: 10 samples
Label Bronchial Asthma: 10 samples
Label Cervical spondylosis        : 10 samples
Label Chicken pox: 10 samples
Label Chronic cholestasis: 10 samples
Label Common Cold: 10 samples
Label Dengue: 10 samples
Label Diabetes: 10 samples
Label Dimorphic hemmorhoids(piles): 10 samples
Label Drug Reaction: 10 samples
Label Fungal infection: 10 samples
Label GERD: 10 samples
Label Gastroenteritis: 10 samples
Label Heart attack: 10 samples
Label Hepatitis A: 10 samples
Label Hepatitis B: 10 samples
Label Hepatitis C: 10 samples
Label Hepatitis D: 10 samples
Label Hepatitis E: 10 samples
Label Hypertension: 10 samples
Label Hyperthyroidism: 10 samples
Label Hypoglycemia: 10 samples
Label Hypothyroidism: 10 samples
Label Impetigo: 10 samples
Label Jaundice: 10 samples
Label Malaria: 10

In [26]:
desired_samples_per_label = 100
oversampler = RandomOverSampler(sampling_strategy={label: desired_samples_per_label for label in unique_labels}, random_state=42)
X_resampled, y_resampled = oversampler.fit_resample(X, y)

# Check the label distribution after oversampling
unique_labels_resampled, counts_resampled = np.unique(y_resampled, return_counts=True)
print("\nAfter oversampling:")
for label, count in zip(unique_labels_resampled, counts_resampled):
    print(f"Label {label}: {count} samples")


After oversampling:
Label (vertigo) Paroymsal Positional Vertigo: 100 samples
Label AIDS: 100 samples
Label Acne: 100 samples
Label Alcoholic hepatitis: 100 samples
Label Allergy: 100 samples
Label Arthritis: 100 samples
Label Bronchial Asthma: 100 samples
Label Cervical spondylosis        : 100 samples
Label Chicken pox: 100 samples
Label Chronic cholestasis: 100 samples
Label Common Cold: 100 samples
Label Dengue: 100 samples
Label Diabetes: 100 samples
Label Dimorphic hemmorhoids(piles): 100 samples
Label Drug Reaction: 100 samples
Label Fungal infection: 100 samples
Label GERD: 100 samples
Label Gastroenteritis: 100 samples
Label Heart attack: 100 samples
Label Hepatitis A: 100 samples
Label Hepatitis B: 100 samples
Label Hepatitis C: 100 samples
Label Hepatitis D: 100 samples
Label Hepatitis E: 100 samples
Label Hypertension: 100 samples
Label Hyperthyroidism: 100 samples
Label Hypoglycemia: 100 samples
Label Hypothyroidism: 100 samples
Label Impetigo: 100 samples
Label Jaundice:

In [27]:
X = X_resampled
y = y_resampled

In [28]:
unique_labels, counts = np.unique(y, return_counts=True)
print("Before oversampling:")
for label, count in zip(unique_labels, counts):
    print(f"Label {label}: {count} samples")

Before oversampling:
Label (vertigo) Paroymsal Positional Vertigo: 100 samples
Label AIDS: 100 samples
Label Acne: 100 samples
Label Alcoholic hepatitis: 100 samples
Label Allergy: 100 samples
Label Arthritis: 100 samples
Label Bronchial Asthma: 100 samples
Label Cervical spondylosis        : 100 samples
Label Chicken pox: 100 samples
Label Chronic cholestasis: 100 samples
Label Common Cold: 100 samples
Label Dengue: 100 samples
Label Diabetes: 100 samples
Label Dimorphic hemmorhoids(piles): 100 samples
Label Drug Reaction: 100 samples
Label Fungal infection: 100 samples
Label GERD: 100 samples
Label Gastroenteritis: 100 samples
Label Heart attack: 100 samples
Label Hepatitis A: 100 samples
Label Hepatitis B: 100 samples
Label Hepatitis C: 100 samples
Label Hepatitis D: 100 samples
Label Hepatitis E: 100 samples
Label Hypertension: 100 samples
Label Hyperthyroidism: 100 samples
Label Hypoglycemia: 100 samples
Label Hypothyroidism: 100 samples
Label Impetigo: 100 samples
Label Jaundice:

In [29]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report

In [30]:
# Split the resampled data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X_resampled, y_resampled, test_size=0.3, random_state=42)

In [31]:
X_train.shape

(2870, 132)

In [32]:
# Train a Random Forest Classifier
rf_classifier = RandomForestClassifier(n_estimators=100, random_state=42)
rf_classifier.fit(X_train, y_train)

# Test the model
y_pred = rf_classifier.predict(X_test)

# Evaluate the accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Random Forest Accuracy: {accuracy:.4f}")

# Detailed classification report
print("\nClassification Report:")
print(classification_report(y_test, y_pred))

Random Forest Accuracy: 0.9911

Classification Report:
                                        precision    recall  f1-score   support

(vertigo) Paroymsal Positional Vertigo       1.00      1.00      1.00        33
                                  AIDS       1.00      1.00      1.00        34
                                  Acne       1.00      1.00      1.00        26
                   Alcoholic hepatitis       1.00      1.00      1.00        29
                               Allergy       1.00      1.00      1.00        30
                             Arthritis       1.00      1.00      1.00        33
                      Bronchial Asthma       0.90      1.00      0.95        26
          Cervical spondylosis               1.00      1.00      1.00        35
                           Chicken pox       1.00      1.00      1.00        30
                   Chronic cholestasis       1.00      1.00      1.00        31
                           Common Cold       1.00      0.81     

# Training top models

# single prediction

In [33]:
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# selecting SVC
svc = SVC(kernel='linear')
svc.fit(X_train, y_train)
ypred = svc.predict(X_test)
svc_accuracy = accuracy_score(y_test, ypred)
print(f"SVC Model Accuracy: {svc_accuracy}")

SVC Model Accuracy: 0.9910569105691057


In [34]:
# save svc
import pickle
pickle.dump(svc,open('svc.pkl','wb'))
pickle.dump(svc, open('rf.pkl', 'wb'))

In [35]:
# load model
svc = pickle.load(open('svc.pkl','rb'))

In [36]:
# Test 1:
# If X_test is a NumPy array
print("Predicted disease:", svc.predict(X_test[0].reshape(1, -1))[0])

# If y_test is a NumPy array, you should use:
print("Actual Disease:", y_test[0])

KeyError: 0

In [None]:
# test 2:
# Test with NumPy arrays
print("Predicted disease:", svc.predict(X_test[100].reshape(1, -1))[0])
print("Actual Disease:", y_test[100])

Predicted disease: 0
Actual Disease: 0


# Recommendation System and Prediction

# load database and use logic for recommendations

In [37]:
sym_des = pd.read_csv("datasets/symtoms_df.csv")
precautions = pd.read_csv("precautions_df.csv")
workout = pd.read_csv("datasets/workout_df.csv")
description = pd.read_csv("datasets/description.csv")
medications = pd.read_csv('datasets/medications.csv')
diets = pd.read_csv("datasets/diets.csv")

In [47]:
#============================================================
# custome and helping functions
#==========================helper funtions================
def helper(dis):
    desc = description[description['Disease'] == dis]['Description']
    desc = " ".join([w for w in desc])

    pre = precautions[precautions['Disease'] == dis][['Precaution_1', 'Precaution_2', 'Precaution_3', 'Precaution_4']]
    pre = [col for col in pre.values]

    med = medications[medications['Disease'] == dis]['Medication']
    med = [med for med in med.values]

    die = diets[diets['Disease'] == dis]['Diet']
    die = [die for die in die.values]

    wrkout = workout[workout['disease'] == dis] ['workout']


    return desc,pre,med,die,wrkout

symptoms_dict = {'itching': 0, 'skin_rash': 1, 'nodal_skin_eruptions': 2, 'continuous_sneezing': 3, 'shivering': 4, 'chills': 5, 'joint_pain': 6, 'stomach_pain': 7, 'acidity': 8, 'ulcers_on_tongue': 9, 'muscle_wasting': 10, 'vomiting': 11, 'burning_micturition': 12, 'spotting_ urination': 13, 'fatigue': 14, 'weight_gain': 15, 'anxiety': 16, 'cold_hands_and_feets': 17, 'mood_swings': 18, 'weight_loss': 19, 'restlessness': 20, 'lethargy': 21, 'patches_in_throat': 22, 'irregular_sugar_level': 23, 'cough': 24, 'high_fever': 25, 'sunken_eyes': 26, 'breathlessness': 27, 'sweating': 28, 'dehydration': 29, 'indigestion': 30, 'headache': 31, 'yellowish_skin': 32, 'dark_urine': 33, 'nausea': 34, 'loss_of_appetite': 35, 'pain_behind_the_eyes': 36, 'back_pain': 37, 'constipation': 38, 'abdominal_pain': 39, 'diarrhoea': 40, 'mild_fever': 41, 'yellow_urine': 42, 'yellowing_of_eyes': 43, 'acute_liver_failure': 44, 'fluid_overload': 45, 'swelling_of_stomach': 46, 'swelled_lymph_nodes': 47, 'malaise': 48, 'blurred_and_distorted_vision': 49, 'phlegm': 50, 'throat_irritation': 51, 'redness_of_eyes': 52, 'sinus_pressure': 53, 'runny_nose': 54, 'congestion': 55, 'chest_pain': 56, 'weakness_in_limbs': 57, 'fast_heart_rate': 58, 'pain_during_bowel_movements': 59, 'pain_in_anal_region': 60, 'bloody_stool': 61, 'irritation_in_anus': 62, 'neck_pain': 63, 'dizziness': 64, 'cramps': 65, 'bruising': 66, 'obesity': 67, 'swollen_legs': 68, 'swollen_blood_vessels': 69, 'puffy_face_and_eyes': 70, 'enlarged_thyroid': 71, 'brittle_nails': 72, 'swollen_extremeties': 73, 'excessive_hunger': 74, 'extra_marital_contacts': 75, 'drying_and_tingling_lips': 76, 'slurred_speech': 77, 'knee_pain': 78, 'hip_joint_pain': 79, 'muscle_weakness': 80, 'stiff_neck': 81, 'swelling_joints': 82, 'movement_stiffness': 83, 'spinning_movements': 84, 'loss_of_balance': 85, 'unsteadiness': 86, 'weakness_of_one_body_side': 87, 'loss_of_smell': 88, 'bladder_discomfort': 89, 'foul_smell_of urine': 90, 'continuous_feel_of_urine': 91, 'passage_of_gases': 92, 'internal_itching': 93, 'toxic_look_(typhos)': 94, 'depression': 95, 'irritability': 96, 'muscle_pain': 97, 'altered_sensorium': 98, 'red_spots_over_body': 99, 'belly_pain': 100, 'abnormal_menstruation': 101, 'dischromic _patches': 102, 'watering_from_eyes': 103, 'increased_appetite': 104, 'polyuria': 105, 'family_history': 106, 'mucoid_sputum': 107, 'rusty_sputum': 108, 'lack_of_concentration': 109, 'visual_disturbances': 110, 'receiving_blood_transfusion': 111, 'receiving_unsterile_injections': 112, 'coma': 113, 'stomach_bleeding': 114, 'distention_of_abdomen': 115, 'history_of_alcohol_consumption': 116, 'fluid_overload.1': 117, 'blood_in_sputum': 118, 'prominent_veins_on_calf': 119, 'palpitations': 120, 'painful_walking': 121, 'pus_filled_pimples': 122, 'blackheads': 123, 'scurring': 124, 'skin_peeling': 125, 'silver_like_dusting': 126, 'small_dents_in_nails': 127, 'inflammatory_nails': 128, 'blister': 129, 'red_sore_around_nose': 130, 'yellow_crust_ooze': 131}
diseases_list = {15: 'Fungal infection', 4: 'Allergy', 16: 'GERD', 9: 'Chronic cholestasis', 14: 'Drug Reaction', 33: 'Peptic ulcer diseae', 1: 'AIDS', 12: 'Diabetes ', 17: 'Gastroenteritis', 6: 'Bronchial Asthma', 23: 'Hypertension ', 30: 'Migraine', 7: 'Cervical spondylosis', 32: 'Paralysis (brain hemorrhage)', 28: 'Jaundice', 29: 'Malaria', 8: 'Chicken pox', 11: 'Dengue', 37: 'Typhoid', 40: 'hepatitis A', 19: 'Hepatitis B', 20: 'Hepatitis C', 21: 'Hepatitis D', 22: 'Hepatitis E', 3: 'Alcoholic hepatitis', 36: 'Tuberculosis', 10: 'Common Cold', 34: 'Pneumonia', 13: 'Dimorphic hemmorhoids(piles)', 18: 'Heart attack', 39: 'Varicose veins', 26: 'Hypothyroidism', 24: 'Hyperthyroidism', 25: 'Hypoglycemia', 31: 'Osteoarthristis', 5: 'Arthritis', 0: '(vertigo) Paroymsal  Positional Vertigo', 2: 'Acne', 38: 'Urinary tract infection', 35: 'Psoriasis', 27: 'Impetigo'}

# Model Prediction function
def get_predicted_value(patient_symptoms):
    input_vector = np.zeros(len(symptoms_dict))
    print(input_vector)
    for item in patient_symptoms:
        input_vector[symptoms_dict[item]] = 1
    print(input_vector)
    predicted_disease_index = svc.predict([input_vector])[0]
    print(predicted_disease_index)
    # print(diseases_list[predicted_disease_index])
    # predicted_disease = diseases_list.get(predicted_disease_index, "Unknown Disease")
    return predicted_disease_index


In [48]:
# Test 1
# Split the user's input into a list of symptoms (assuming they are comma-separated) # itching,skin_rash,nodal_skin_eruptions
symptoms = input("Enter your symptoms.......")
user_symptoms = [s.strip() for s in symptoms.split(',[] ')]
# Remove any extra characters, if any
# user_symptoms = [symptom.strip("[]' ") for symptom in user_symptoms]
print(user_symptoms)
predicted_disease = get_predicted_value(user_symptoms)

desc, pre, med, die, wrkout = helper(predicted_disease)

print("=================predicted disease============")
print(predicted_disease)
print("=================description==================")
print(desc)
print("=================precautions==================")
i = 1
for p_i in pre[0]:
    print(i, ": ", p_i)
    i += 1

print("=================medications==================")
for m_i in med:
    print(i, ": ", m_i)
    i += 1

print("=================workout==================")
for w_i in wrkout:
    print(i, ": ", w_i)
    i += 1

print("=================diets==================")
for d_i in die:
    print(i, ": ", d_i)
    i += 1


['high_fever']
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
AIDS
AIDS
AIDS (Acquired Immunodeficiency Syndrome) is a disease caused by HIV that weakens the immune system.
1 :  avoid open cuts
2 :  wear ppe if possible
3 :  consult doctor
4



In [None]:
# Test 1
# Split the user's input into a list of symptoms (assuming they are comma-separated) # yellow_crust_ooze,red_sore_around_nose,small_dents_in_nails,inflammatory_nails,blister
symptoms = input("Enter your symptoms.......")
user_symptoms = [s.strip() for s in symptoms.split(',')]
# Remove any extra characters, if any
user_symptoms = [symptom.strip("[]' ") for symptom in user_symptoms]
predicted_disease = get_predicted_value(user_symptoms)

desc, pre, med, die, wrkout = helper(predicted_disease)

print("=================predicted disease============")
print(predicted_disease)
print("=================description==================")
print(desc)
print("=================precautions==================")
i = 1
for p_i in pre[0]:
    print(i, ": ", p_i)
    i += 1

print("=================medications==================")
for m_i in med:
    print(i, ": ", m_i)
    i += 1

print("=================workout==================")
for w_i in wrkout:
    print(i, ": ", w_i)
    i += 1

print("=================diets==================")
for d_i in die:
    print(i, ": ", d_i)
    i += 1


[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
1
AIDS
AIDS
AIDS (Acquired Immunodeficiency Syndrome) is a disease caused by HIV that weakens the immune system.
1 :  avoid open cuts
2 :  wear ppe if possible
3 :  consult doctor
4 :  follow up

In [None]:
# let's use pycharm flask app
# but install this version in pycharm
import sklearn
print(sklearn.__version__)

1.3.2
