# Project Description
A Medical Recommendation System is an AI-powered tool that provides personalized health suggestions based on user symptoms. It helps users identify possible conditions, recommends treatments or medications, suggests a suitable diet, and indicates whether to consult a doctor. The system aims to support early diagnosis, reduce unnecessary hospital visits, and help users make better health decisions.

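This is a multiclass classification problem: the model predicts one of 41 possible diagnoses (the `prognosis` column) from 132 binary symptom features.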

# Import Libraries

In [1]:
import pandas as pd
import numpy as np

In [2]:
df = pd.read_csv("Dataset/Training.csv")
df.head()

Unnamed: 0,itching,skin_rash,nodal_skin_eruptions,continuous_sneezing,shivering,chills,joint_pain,stomach_pain,acidity,ulcers_on_tongue,...,blackheads,scurring,skin_peeling,silver_like_dusting,small_dents_in_nails,inflammatory_nails,blister,red_sore_around_nose,yellow_crust_ooze,prognosis
0,1,1,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,Fungal infection
1,0,1,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,Fungal infection
2,1,0,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,Fungal infection
3,1,1,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,Fungal infection
4,1,1,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,Fungal infection


In [3]:
df.shape

(4920, 133)

In [4]:
df['prognosis'].unique()

array(['Fungal infection', 'Allergy', 'GERD', 'Chronic cholestasis',
       'Drug Reaction', 'Peptic ulcer diseae', 'AIDS', 'Diabetes ',
       'Gastroenteritis', 'Bronchial Asthma', 'Hypertension ', 'Migraine',
       'Cervical spondylosis', 'Paralysis (brain hemorrhage)', 'Jaundice',
       'Malaria', 'Chicken pox', 'Dengue', 'Typhoid', 'hepatitis A',
       'Hepatitis B', 'Hepatitis C', 'Hepatitis D', 'Hepatitis E',
       'Alcoholic hepatitis', 'Tuberculosis', 'Common Cold', 'Pneumonia',
       'Dimorphic hemmorhoids(piles)', 'Heart attack', 'Varicose veins',
       'Hypothyroidism', 'Hyperthyroidism', 'Hypoglycemia',
       'Osteoarthristis', 'Arthritis',
       '(vertigo) Paroymsal  Positional Vertigo', 'Acne',
       'Urinary tract infection', 'Psoriasis', 'Impetigo'], dtype=object)

In [5]:
len(df['prognosis'].unique())

41
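
With 41 classes, it can help to look at how many rows each class has before splitting the data. The sketch below shows one way to do that with `value_counts()`. It uses a small toy Series (`toy_prognosis`, a hypothetical stand-in for `df['prognosis']`) so that it runs on its own; no claim is made here about the actual distribution in `Training.csv`.

```python
import pandas as pd

# Toy stand-in for df['prognosis']; the real column comes from Training.csv
toy_prognosis = pd.Series(
    ["Allergy", "Allergy", "GERD", "GERD", "Acne", "Acne"], name="prognosis"
)

# Rows per class; a large gap between min and max would signal imbalance
class_counts = toy_prognosis.value_counts()
print(class_counts)
print("smallest class:", class_counts.min(), "| largest class:", class_counts.max())
```

On the real data, the same call would be `df['prognosis'].value_counts()`.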

# Train/Test Split

In [6]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

In [7]:
X = df.drop("prognosis", axis = 1)
print(X)

# Convert first row to dictionary
# symptons_dict = X.iloc[0].to_dict()


      itching  skin_rash  nodal_skin_eruptions  continuous_sneezing  \
0           1          1                     1                    0   
1           0          1                     1                    0   
2           1          0                     1                    0   
3           1          1                     0                    0   
4           1          1                     1                    0   
...       ...        ...                   ...                  ...   
4915        0          0                     0                    0   
4916        0          1                     0                    0   
4917        0          0                     0                    0   
4918        0          1                     0                    0   
4919        0          1                     0                    0   

      shivering  chills  joint_pain  stomach_pain  acidity  ulcers_on_tongue  \
0             0       0           0             0        0         

In [8]:
symptons_dict = {col: i for i, col in enumerate(X.columns)}
print(symptons_dict)

{'itching': 0, 'skin_rash': 1, 'nodal_skin_eruptions': 2, 'continuous_sneezing': 3, 'shivering': 4, 'chills': 5, 'joint_pain': 6, 'stomach_pain': 7, 'acidity': 8, 'ulcers_on_tongue': 9, 'muscle_wasting': 10, 'vomiting': 11, 'burning_micturition': 12, 'spotting_ urination': 13, 'fatigue': 14, 'weight_gain': 15, 'anxiety': 16, 'cold_hands_and_feets': 17, 'mood_swings': 18, 'weight_loss': 19, 'restlessness': 20, 'lethargy': 21, 'patches_in_throat': 22, 'irregular_sugar_level': 23, 'cough': 24, 'high_fever': 25, 'sunken_eyes': 26, 'breathlessness': 27, 'sweating': 28, 'dehydration': 29, 'indigestion': 30, 'headache': 31, 'yellowish_skin': 32, 'dark_urine': 33, 'nausea': 34, 'loss_of_appetite': 35, 'pain_behind_the_eyes': 36, 'back_pain': 37, 'constipation': 38, 'abdominal_pain': 39, 'diarrhoea': 40, 'mild_fever': 41, 'yellow_urine': 42, 'yellowing_of_eyes': 43, 'acute_liver_failure': 44, 'fluid_overload': 45, 'swelling_of_stomach': 46, 'swelled_lymph_nodes': 47, 'malaise': 48, 'blurred_and

In [9]:
y = df['prognosis']
y

0                              Fungal infection
1                              Fungal infection
2                              Fungal infection
3                              Fungal infection
4                              Fungal infection
                         ...                   
4915    (vertigo) Paroymsal  Positional Vertigo
4916                                       Acne
4917                    Urinary tract infection
4918                                  Psoriasis
4919                                   Impetigo
Name: prognosis, Length: 4920, dtype: object

In [10]:
le = LabelEncoder()
le.fit(y)
Y = le.transform(y)
# Y  # e.g. 15 represents 'Fungal infection'

label_dict = {
    'original_labels': y,
    'encoded_labels': Y.tolist(),  # convert NumPy array to list for readability
    # le.classes_ is sorted, so each label's position is its encoded value
    'label_mapping': {label: idx for idx, label in enumerate(le.classes_)}
}

# reverse mapping: encoded label -> disease name
diseases_list = {v: k for k, v in label_dict['label_mapping'].items()}
diseases_list

{0: '(vertigo) Paroymsal  Positional Vertigo',
 1: 'AIDS',
 2: 'Acne',
 3: 'Alcoholic hepatitis',
 4: 'Allergy',
 5: 'Arthritis',
 6: 'Bronchial Asthma',
 7: 'Cervical spondylosis',
 8: 'Chicken pox',
 9: 'Chronic cholestasis',
 10: 'Common Cold',
 11: 'Dengue',
 12: 'Diabetes ',
 13: 'Dimorphic hemmorhoids(piles)',
 14: 'Drug Reaction',
 15: 'Fungal infection',
 16: 'GERD',
 17: 'Gastroenteritis',
 18: 'Heart attack',
 19: 'Hepatitis B',
 20: 'Hepatitis C',
 21: 'Hepatitis D',
 22: 'Hepatitis E',
 23: 'Hypertension ',
 24: 'Hyperthyroidism',
 25: 'Hypoglycemia',
 26: 'Hypothyroidism',
 27: 'Impetigo',
 28: 'Jaundice',
 29: 'Malaria',
 30: 'Migraine',
 31: 'Osteoarthristis',
 32: 'Paralysis (brain hemorrhage)',
 33: 'Peptic ulcer diseae',
 34: 'Pneumonia',
 35: 'Psoriasis',
 36: 'Tuberculosis',
 37: 'Typhoid',
 38: 'Urinary tract infection',
 39: 'Varicose veins',
 40: 'hepatitis A'}
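
As an alternative to keeping a hand-built reverse dictionary, `LabelEncoder` exposes `classes_` and `inverse_transform`, which can decode predictions directly. The following is a minimal sketch on toy labels (`toy_labels` and `toy_le` are hypothetical names, not part of the notebook):

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Toy labels standing in for df['prognosis']
toy_labels = ["Fungal infection", "Allergy", "GERD", "Allergy"]

toy_le = LabelEncoder()
encoded = toy_le.fit_transform(toy_labels)  # classes are sorted alphabetically

# Build the code -> name mapping directly from classes_
code_to_name = dict(enumerate(toy_le.classes_))

# Or decode encoded predictions without any dictionary
decoded = toy_le.inverse_transform(np.array([0, 2]))

print(code_to_name)
print(decoded)
```

Using `inverse_transform` keeps the mapping tied to the fitted encoder, so the two cannot drift apart if the labels change.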

In [11]:
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size= 0.3, random_state= 20)

In [12]:
X_train.shape, X_test.shape, y_train.shape, y_test.shape

((3444, 132), (1476, 132), (3444,), (1476,))
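
The split above is random, so per-class proportions in the test set can differ slightly from the training set. If equal proportions matter, `train_test_split` accepts a `stratify` argument. The sketch below uses toy arrays (`X_toy`, `y_toy` are assumptions for illustration) to show the effect:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 3 classes with 10 rows each (stand-in for X and Y)
rng = np.random.default_rng(0)
X_toy = rng.integers(0, 2, size=(30, 5))
y_toy = np.repeat([0, 1, 2], 10)

# stratify=y_toy keeps the per-class proportions equal in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.3, random_state=20, stratify=y_toy
)

print(np.bincount(y_tr))  # rows per class in the training split
print(np.bincount(y_te))  # rows per class in the test split
```

In the notebook, the equivalent change would be adding `stratify=Y` to the existing call.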

# Modeling

In [13]:
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier,GradientBoostingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import MultinomialNB

# to check Accuracy
from sklearn.metrics import accuracy_score, confusion_matrix

In [14]:
# create a dictionary to store models
models = {
    "SVC" : SVC(kernel = 'linear'),
    "RandomForest" : RandomForestClassifier(n_estimators=100 , random_state= 42),
    "GradientBoosting" : GradientBoostingClassifier(n_estimators=100 , random_state= 42),
    "KNeighbors" : KNeighborsClassifier(n_neighbors= 5),
    "MultinomialNB" : MultinomialNB()
}

for model_name, model in models.items():
    # print(model_name, ":" , model)

    # train model
    model.fit(X_train,y_train) 

    # test model
    predictions = model.predict(X_test)

    # calculate the accuracy
    accuracy = accuracy_score(y_test,predictions)

    # calculate Confusion Matrix
    cm = confusion_matrix(y_test, predictions)
    cm = np.array2string(cm, separator = ', ')

    print(f"{model_name} accuracy : {accuracy}")
    print(f"{model_name} confusion Matrix: \n {cm}")

SVC accuracy : 1.0
SVC confusion Matrix: 
 [[40,  0,  0, ...,  0,  0,  0],
 [ 0, 43,  0, ...,  0,  0,  0],
 [ 0,  0, 28, ...,  0,  0,  0],
 ...,
 [ 0,  0,  0, ..., 34,  0,  0],
 [ 0,  0,  0, ...,  0, 41,  0],
 [ 0,  0,  0, ...,  0,  0, 31]]
RandomForest accuracy : 1.0
RandomForest confusion Matrix: 
 [[40,  0,  0, ...,  0,  0,  0],
 [ 0, 43,  0, ...,  0,  0,  0],
 [ 0,  0, 28, ...,  0,  0,  0],
 ...,
 [ 0,  0,  0, ..., 34,  0,  0],
 [ 0,  0,  0, ...,  0, 41,  0],
 [ 0,  0,  0, ...,  0,  0, 31]]
GradientBoosting accuracy : 1.0
GradientBoosting confusion Matrix: 
 [[40,  0,  0, ...,  0,  0,  0],
 [ 0, 43,  0, ...,  0,  0,  0],
 [ 0,  0, 28, ...,  0,  0,  0],
 ...,
 [ 0,  0,  0, ..., 34,  0,  0],
 [ 0,  0,  0, ...,  0, 41,  0],
 [ 0,  0,  0, ...,  0,  0, 31]]
KNeighbors accuracy : 1.0
KNeighbors confusion Matrix: 
 [[40,  0,  0, ...,  0,  0,  0],
 [ 0, 43,  0, ...,  0,  0,  0],
 [ 0,  0, 28, ...,  0,  0,  0],
 ...,
 [ 0,  0,  0, ..., 34,  0,  0],
 [ 0,  0,  0, ...,  0, 41,  0],
 [ 0,  0, 
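
An accuracy of exactly 1.0 from every model is unusual enough to deserve a sanity check. One possible explanation (an assumption, not something this notebook verifies) is that identical symptom rows appear in both the training and test splits. The sketch below shows how such overlap could be counted; `toy_train` and `toy_test` are hypothetical stand-ins for `X_train` and `X_test`.

```python
import pandas as pd

# Toy frames standing in for X_train / X_test (binary symptom columns)
toy_train = pd.DataFrame({"itching": [1, 0, 1], "chills": [0, 1, 0]})
toy_test = pd.DataFrame({"itching": [1, 0], "chills": [0, 0]})

# Count duplicated symptom patterns inside the training frame
n_dup_train = int(toy_train.duplicated().sum())

# Count test rows whose exact symptom pattern also appears in training
train_patterns = set(map(tuple, toy_train.to_numpy()))
n_shared = sum(tuple(row) in train_patterns for row in toy_test.to_numpy())

print("duplicate rows in train:", n_dup_train)
print("test rows also present in train:", n_shared, "of", len(toy_test))
```

If most test rows turn out to be shared with the training set, the reported accuracy says little about how the model handles unseen symptom combinations.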

# Single Prediction with the Model

In [15]:
svc = SVC(kernel = 'linear')
svc.fit(X_train, y_train)
ypred = svc.predict(X_test)
accuracy_score(y_test, ypred)

1.0
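
A single train/test split gives only one estimate of accuracy. A k-fold cross-validation run is one way to see how stable that estimate is. The sketch below uses synthetic data from `make_classification` (an assumption for illustration; the real call would pass `X` and `Y`) with the same linear-kernel `SVC`:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic 3-class data standing in for the symptom matrix and labels
X_toy, y_toy = make_classification(
    n_samples=200, n_features=20, n_informative=10, n_classes=3, random_state=0
)

# 5-fold cross-validation: one accuracy score per held-out fold
scores = cross_val_score(SVC(kernel="linear"), X_toy, y_toy, cv=5)

print("fold accuracies:", scores)
print("mean:", scores.mean(), "| std:", scores.std())
```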

In [16]:
# save the trained model (the context manager closes the file handle)
import pickle
with open("models/svc.pkl", "wb") as f:
    pickle.dump(svc, f)
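
Prediction later depends not only on the model but also on the symptom-to-index and code-to-disease mappings. One option, sketched below, is to pickle all three together so they cannot get out of sync. The bundle keys, the toy model, and the file name `bundle.pkl` are all hypothetical; a temporary directory is used so the example leaves no files behind.

```python
import os
import pickle
import tempfile

from sklearn.svm import SVC

# Toy model and lookups standing in for svc, symptons_dict and diseases_list
toy_model = SVC(kernel="linear").fit([[1, 0], [0, 1]], [0, 1])
bundle = {
    "model": toy_model,
    "symptom_index": {"itching": 0, "chills": 1},
    "disease_names": {0: "Allergy", 1: "GERD"},
}

# Hypothetical file name; a temp directory keeps the example side-effect free
with tempfile.TemporaryDirectory() as tmp_dir:
    bundle_path = os.path.join(tmp_dir, "bundle.pkl")
    with open(bundle_path, "wb") as f:
        pickle.dump(bundle, f)
    with open(bundle_path, "rb") as f:
        restored = pickle.load(f)

# Use the restored model and mapping together
pred = int(restored["model"].predict([[1, 0]])[0])
print(restored["disease_names"][pred])
```

Pickled scikit-learn models should generally be loaded with the same library version that saved them.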

In [17]:
# load the saved model
with open("models/svc.pkl", "rb") as f:
    svc = pickle.load(f)
X_test

Unnamed: 0,itching,skin_rash,nodal_skin_eruptions,continuous_sneezing,shivering,chills,joint_pain,stomach_pain,acidity,ulcers_on_tongue,...,pus_filled_pimples,blackheads,scurring,skin_peeling,silver_like_dusting,small_dents_in_nails,inflammatory_nails,blister,red_sore_around_nose,yellow_crust_ooze
4037,0,0,0,0,0,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4191,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
432,0,0,0,0,0,0,0,1,1,0,...,0,0,0,0,0,0,0,0,0,0
1266,1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3765,0,0,0,0,0,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2837,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2327,0,0,0,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1399,1,1,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1906,0,0,0,1,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [18]:
# Test 1
print("Predicted label: ", svc.predict(X_test.iloc[0].values.reshape(1, -1)))
print("Actual label: ", y_test[0])

Predicted label:  [40]
Actual label:  40




In [19]:
# test 2
print("Predicted label: ", svc.predict(X_test.iloc[11].values.reshape(1, -1)))
print("Actual label: ", y_test[11])

Predicted label:  [30]
Actual label:  30




In [20]:
# Load the supporting datasets and apply the lookup logic

In [49]:
# Symptoms dataset
sym_dt = pd.read_csv('Dataset/symtoms_df.csv')
# sym_dt
precautions = pd.read_csv("Dataset/precautions_df.csv")
# precautions
workout = pd.read_csv("Dataset/workout_df.csv")
print(workout)
description = pd.read_csv("Dataset/description.csv")
# description
medications = pd.read_csv("Dataset/medications.csv")
# medications
diets = pd.read_csv("Dataset/diets.csv")
# diets

     Unnamed: 0.1  Unnamed: 0           disease  \
0               0           0  Fungal infection   
1               1           1  Fungal infection   
2               2           2  Fungal infection   
3               3           3  Fungal infection   
4               4           4  Fungal infection   
..            ...         ...               ...   
405           405         405          Impetigo   
406           406         406          Impetigo   
407           407         407          Impetigo   
408           408         408          Impetigo   
409           409         409          Impetigo   

                               workout  
0                   Avoid sugary foods  
1                   Consume probiotics  
2            Increase intake of garlic  
3               Include yogurt in diet  
4                Limit processed foods  
..                                 ...  
405  Consult a healthcare professional  
406     Follow medical recommendations  
407               

In [69]:
def finder(disease):
    """Look up description, precautions, medications, diets, and workouts for a disease."""
    # Description: join all matching rows into one string
    descr = description[description['Disease'] == disease]['Description']
    descr = " ".join(descr)

    # Precautions: one array holding the four precaution columns per matching row
    pre = precautions[precautions['Disease'] == disease][['Precaution_1','Precaution_2','Precaution_3','Precaution_4']]
    pre = [row for row in pre.values]

    # Medications are stored as a stringified list, e.g. "['a', 'b']":
    # strip the outer brackets, split on ", ", then strip the quotes around each item
    med = medications[medications['Disease'] == disease]['Medication'].values
    med = [item.strip("'\"") for item in med[0].strip("[]").split(", ")]

    # Diets use the same stringified-list format
    diet = diets[diets['Disease'] == disease]['Diet'].values
    diet = [item.strip("'\"") for item in diet[0].strip("[]").split(", ")]

    # Workouts: a Series with one recommendation per row
    wrkout = workout[workout['disease'] == disease]['workout']

    return descr, pre, med, diet, wrkout
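
The medication and diet cells are parsed by stripping brackets and splitting on `", "`, which would break an item that itself contains a comma. If the cells are valid Python list literals (an assumption about the CSV contents), `ast.literal_eval` is a possible alternative. The helper name `parse_list_cell` below is hypothetical:

```python
import ast

def parse_list_cell(cell):
    """Parse a stringified Python list such as "['a', 'b']" into a real list."""
    try:
        value = ast.literal_eval(cell)
    except (ValueError, SyntaxError):
        # Fall back to manual strip/split for cells that are not valid literals
        return [item.strip("'\" ") for item in cell.strip("[]").split(",")]
    return list(value) if isinstance(value, (list, tuple)) else [value]

print(parse_list_cell("['Antihistamines', 'Decongestants']"))
print(parse_list_cell("['Vitamin C-rich foods', 'Soups, broths']"))
```

The second call shows the difference: an item with an embedded comma stays intact instead of being split in two.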

In [70]:
print(symptons_dict)
print()
print(diseases_list)

{'itching': 0, 'skin_rash': 1, 'nodal_skin_eruptions': 2, 'continuous_sneezing': 3, 'shivering': 4, 'chills': 5, 'joint_pain': 6, 'stomach_pain': 7, 'acidity': 8, 'ulcers_on_tongue': 9, 'muscle_wasting': 10, 'vomiting': 11, 'burning_micturition': 12, 'spotting_ urination': 13, 'fatigue': 14, 'weight_gain': 15, 'anxiety': 16, 'cold_hands_and_feets': 17, 'mood_swings': 18, 'weight_loss': 19, 'restlessness': 20, 'lethargy': 21, 'patches_in_throat': 22, 'irregular_sugar_level': 23, 'cough': 24, 'high_fever': 25, 'sunken_eyes': 26, 'breathlessness': 27, 'sweating': 28, 'dehydration': 29, 'indigestion': 30, 'headache': 31, 'yellowish_skin': 32, 'dark_urine': 33, 'nausea': 34, 'loss_of_appetite': 35, 'pain_behind_the_eyes': 36, 'back_pain': 37, 'constipation': 38, 'abdominal_pain': 39, 'diarrhoea': 40, 'mild_fever': 41, 'yellow_urine': 42, 'yellowing_of_eyes': 43, 'acute_liver_failure': 44, 'fluid_overload': 45, 'swelling_of_stomach': 46, 'swelled_lymph_nodes': 47, 'malaise': 48, 'blurred_and

In [71]:

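# model prediction function
def get_predicted_value(patient_symptoms):
    """Predict a disease name from a list of symptom names (keys of symptons_dict)."""
    # one-hot encode the reported symptoms in training-column order
    input_vector = np.zeros(len(symptons_dict))

    for item in patient_symptoms:
        input_vector[symptons_dict[item]] = 1
    # predict the encoded label, then map it back to the disease name
    return diseases_list[svc.predict([input_vector])[0]]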
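
The function above raises a `KeyError` if a user types a symptom that is not a key in `symptons_dict` (for example, a misspelling). A minimal sketch of one way to handle this is to collect unknown names separately while building the vector. `build_input_vector` and `toy_symptom_index` are hypothetical names used only for illustration:

```python
import numpy as np

# Toy stand-in for symptons_dict (symptom name -> column index)
toy_symptom_index = {"itching": 0, "chills": 1, "cough": 2}

def build_input_vector(patient_symptoms, symptom_index):
    """Return (vector, unknown); unknown lists symptoms missing from symptom_index."""
    vector = np.zeros(len(symptom_index))
    unknown = []
    for item in patient_symptoms:
        if item in symptom_index:
            vector[symptom_index[item]] = 1
        else:
            unknown.append(item)
    return vector, unknown

vector, unknown = build_input_vector(["itching", "coughh"], toy_symptom_index)
print(vector, unknown)
```

The caller can then decide whether to warn about the unknown entries or ask the user to re-enter them before predicting.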
    

In [74]:
# TEST 1 itching,continuous_sneezing,chills,red_sore_around_nose
symptoms = input("Enter your symptoms ....")
user_symptoms = [s.strip() for s in symptoms.split(',')]
# Remove any stray brackets, quotes, or spaces
user_symptoms = [symptom.strip("[]' ") for symptom in user_symptoms]

predicted_disease = get_predicted_value(user_symptoms)

descr, pre, med, diet, wrkout = finder(predicted_disease)

# final print
print("============================predicted_disease=========================")
print(predicted_disease)
print("===============================Description============================")
print(descr)
print("===============================Precautions===========================")
for i, p in enumerate(pre[0], start=1):
    print(i, ": ", p)
print("===============================Medications===========================")
for i, m in enumerate(med, start=1):
    print(f"{i} : {m}")
print("==================================Diets==============================")
for i, d in enumerate(diet, start=1):
    print(i, ": ", d)
print("=================================Workout=============================")
for i, w in enumerate(wrkout, start=1):
    print(i, ": ", w)

Enter your symptoms .... itching,continuous_sneezing,chills,red_sore_around_nose


Allergy
Allergy is an immune system reaction to a substance in the environment.
1 :  apply calamine
2 :  cover area with bandage
3 :  nan
4 :  use ice to compress itching
1 : Antihistamines
2 : Decongestants
3 : Epinephrine
4 : Corticosteroids
5 : Immunotherapy
1 :  Elimination Diet
2 :  Omega-3-rich foods
3 :  Vitamin C-rich foods
4 :  Quercetin-rich foods
5 :  Probiotics
1 :  Avoid allergenic foods
2 :  Consume anti-inflammatory foods
3 :  Include omega-3 fatty acids
4 :  Stay hydrated
5 :  Eat foods rich in vitamin C
6 :  Include quercetin-rich foods
7 :  Consume local honey
8 :  Limit processed foods
9 :  Include ginger in diet
10 :  Avoid artificial additives
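
The precautions list above includes a `nan` entry, which comes from an empty cell in the precautions table. One way to hide such entries, sketched below, is to filter with `pd.notna` before printing. The array `toy_precautions` is a stand-in for `pre[0]`, using the values shown in the output above.

```python
import numpy as np
import pandas as pd

# Stand-in for pre[0]: one row of precautions with a missing value
toy_precautions = np.array(
    ["apply calamine", "cover area with bandage", np.nan, "use ice to compress itching"],
    dtype=object,
)

# Keep only the entries that are not missing
clean = [p for p in toy_precautions if pd.notna(p)]

for i, p in enumerate(clean, start=1):
    print(i, ": ", p)
```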




In [75]:
print(predicted_disease)
print(descr)
print()
print(pre)
print()
print(med)
print()
print(diet)
print()
print(wrkout)

Allergy
Allergy is an immune system reaction to a substance in the environment.

[array(['apply calamine', 'cover area with bandage', nan,
       'use ice to compress itching'], dtype=object)]

['Antihistamines', 'Decongestants', 'Epinephrine', 'Corticosteroids', 'Immunotherapy']

['Elimination Diet', 'Omega-3-rich foods', 'Vitamin C-rich foods', 'Quercetin-rich foods', 'Probiotics']

10             Avoid allergenic foods
11    Consume anti-inflammatory foods
12        Include omega-3 fatty acids
13                      Stay hydrated
14        Eat foods rich in vitamin C
15       Include quercetin-rich foods
16                Consume local honey
17              Limit processed foods
18             Include ginger in diet
19         Avoid artificial additives
Name: workout, dtype: object


In [76]:
import sklearn
print(sklearn.__version__)

1.7.0
