## Symptom Predictor using Random Forest Classifier

[Dataset](https://www.kaggle.com/datasets/itachi9604/disease-symptom-description-dataset)

[Google Colab Notebook](https://colab.research.google.com/drive/1wWUXdjikbLgcZw05jXDoU14zZUvno3Nx)

[Github](https://github.com/z5208980/machine-learning-health/tree/main/symptom_predictor)

Diagnosing a health-related problem is important for treating it and preventing further complications. On some occasions, a diagnosis can be overlooked, which can lead to adverse events if the condition goes untreated.

The dataset can be used to assist medical professionals with predicting diseases using an ML model. The model should be used alongside the doctor's own judgment and can act as a tool for early disease detection, which can potentially save lives.

Two datasets need to be processed. `data.csv` contains all the features and targets, and `severity.csv` contains the weight for each symptom. The dataset for the model is created by replacing each symptom in `data.csv` with its weight from `severity.csv`.
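
The weight encoding described above can be sketched on a couple of toy rows. The two small frames below are hypothetical stand-ins for `severity.csv` and `data.csv` (the names `toy_severity`, `toy_data` and `toy_weights` are not part of the notebook), so treat this as an illustration of the idea rather than the exact processing used later.

```python
import pandas as pd

# Toy stand-ins for severity.csv and data.csv (illustrative rows only)
toy_severity = pd.DataFrame({
    "Symptom": ["itching", "skin_rash", "nodal_skin_eruptions"],
    "weight": [1, 3, 4],
})
toy_data = pd.DataFrame({
    "Disease": ["Fungal infection", "Fungal infection"],
    "Symptom_1": ["itching", " skin_rash"],
    "Symptom_2": ["skin_rash", "nodal_skin_eruptions"],
    "Symptom_3": ["nodal_skin_eruptions", None],
})

# Symptom name -> severity weight
toy_weights = dict(zip(toy_severity["Symptom"], toy_severity["weight"]))

# Replace each symptom name with its weight; missing or unknown symptoms become 0
features = toy_data.drop(columns="Disease")
encoded = (
    features.apply(lambda col: col.str.strip().map(toy_weights))
    .fillna(0)
    .astype(int)
)
encoded.insert(0, "Disease", toy_data["Disease"])
print(encoded)
```

Each symptom column ends up numeric, which is the form the classifier is trained on.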

The main focus is `data.csv`, where the column `Disease` is the target to predict. Each row holds up to 17 symptoms, so an input can contain anywhere from 1 to 17 symptoms. The model is used by giving it a list of valid symptoms, which can be found with this line of code:

```py
df_severity['Symptom'].unique()
```

<details>
  <summary>Exhaustive list of Symptoms</summary>
  
  - itching
  - skin_rash
  - nodal_skin_eruptions
  - continuous_sneezing
  - shivering
  - chills
  - joint_pain
  - stomach_pain
  - acidity
  - ulcers_on_tongue
  - muscle_wasting
  - vomiting
  - burning_micturition
  - spotting_urination
  - fatigue
  - weight_gain
  - anxiety
  - cold_hands_and_feets
  - mood_swings
  - weight_loss
  - restlessness
  - lethargy
  - patches_in_throat
  - irregular_sugar_level
  - cough
  - high_fever
  - sunken_eyes
  - breathlessness
  - sweating
  - dehydration
  - indigestion
  - headache
  - yellowish_skin
  - dark_urine
  - nausea
  - loss_of_appetite
  - pain_behind_the_eyes
  - back_pain
  - constipation
  - abdominal_pain
  - diarrhoea
  - mild_fever
  - yellow_urine
  - yellowing_of_eyes
  - acute_liver_failure
  - fluid_overload
  - swelling_of_stomach
  - swelled_lymph_nodes
  - malaise
  - blurred_and_distorted_vision
  - phlegm
  - throat_irritation
  - redness_of_eyes
  - sinus_pressure
  - runny_nose
  - congestion
  - chest_pain
  - weakness_in_limbs
  - fast_heart_rate
  - pain_during_bowel_movements
  - pain_in_anal_region
  - bloody_stool
  - irritation_in_anus
  - neck_pain
  - dizziness
  - cramps
  - bruising
  - obesity
  - swollen_legs
  - swollen_blood_vessels
  - puffy_face_and_eyes
  - enlarged_thyroid
  - brittle_nails
  - swollen_extremeties
  - excessive_hunger
  - extra_marital_contacts
  - drying_and_tingling_lips
  - slurred_speech
  - knee_pain
  - hip_joint_pain
  - muscle_weakness
  - stiff_neck
  - swelling_joints
  - movement_stiffness
  - spinning_movements
  - loss_of_balance
  - unsteadiness
  - weakness_of_one_body_side
  - loss_of_smell
  - bladder_discomfort
  - foul_smell_ofurine
  - continuous_feel_of_urine
  - passage_of_gases
  - internal_itching
  - toxic_look_(typhos)
  - depression
  - irritability
  - muscle_pain
  - altered_sensorium
  - red_spots_over_body
  - belly_pain
  - abnormal_menstruation
  - dischromic_patches
  - watering_from_eyes
  - increased_appetite
  - polyuria
  - family_history
  - mucoid_sputum
  - rusty_sputum
  - lack_of_concentration
  - visual_disturbances
  - receiving_blood_transfusion
  - receiving_unsterile_injections
  - coma
  - stomach_bleeding
  - distention_of_abdomen
  - history_of_alcohol_consumption
  - blood_in_sputum
  - prominent_veins_on_calf
  - palpitations
  - painful_walking
  - pus_filled_pimples
  - blackheads
  - scurring
  - skin_peeling
  - silver_like_dusting
  - small_dents_in_nails
  - inflammatory_nails
  - blister
  - red_sore_around_nose
  - yellow_crust_ooze
  - prognosis
</details>


In [None]:
import numpy as np
import pandas as pd
import pickle

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import binarize, LabelEncoder, MinMaxScaler
from sklearn import metrics
from sklearn.metrics import accuracy_score, mean_squared_error, precision_recall_curve
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier

import warnings
warnings.filterwarnings('ignore')

In [None]:
# Loading and inspecting the data

# Load dataset
df = pd.read_csv('https://raw.githubusercontent.com/z5208980/machine-learning-health/main/symptom_predictor/data/data.csv')

print(f"There are {df.shape[0]} rows with {df.shape[1]} columns including targets")

# Peek at the dataset
df.head(5)

There are 4920 rows with 18 columns including targets


Unnamed: 0,Disease,Symptom_1,Symptom_2,Symptom_3,Symptom_4,Symptom_5,Symptom_6,Symptom_7,Symptom_8,Symptom_9,Symptom_10,Symptom_11,Symptom_12,Symptom_13,Symptom_14,Symptom_15,Symptom_16,Symptom_17
0,Fungal infection,itching,skin_rash,nodal_skin_eruptions,dischromic _patches,,,,,,,,,,,,,
1,Fungal infection,skin_rash,nodal_skin_eruptions,dischromic _patches,,,,,,,,,,,,,,
2,Fungal infection,itching,nodal_skin_eruptions,dischromic _patches,,,,,,,,,,,,,,
3,Fungal infection,itching,skin_rash,dischromic _patches,,,,,,,,,,,,,,
4,Fungal infection,itching,skin_rash,nodal_skin_eruptions,,,,,,,,,,,,,,


In [None]:
# Load severity weights for symptoms

# Load dataset
df_severity = pd.read_csv('https://raw.githubusercontent.com/z5208980/machine-learning-health/main/symptom_predictor/data/severity.csv')

# Peek at the dataset
df_severity.head(5)

Unnamed: 0,Symptom,weight
0,itching,1
1,skin_rash,3
2,nodal_skin_eruptions,4
3,continuous_sneezing,4
4,shivering,5


In [None]:
# Build a symptom -> weight lookup used to encode the data
symptoms = df_severity.set_index('Symptom').T.to_dict()
weights = {}
for i in symptoms:
  weights[i] = symptoms[i]['weight']

In [None]:
df = df.fillna(0)

y = df.Disease
X = df.drop('Disease', axis=1)

for feature in X:
  # Normalise symptom names so they match the keys in `weights`
  X[feature] = X[feature].str.strip()
  X[feature] = X[feature].str.replace(' ', '_')
  X[feature] = X[feature].str.replace('__', '_')

  # Encode each symptom as its severity weight; missing or unknown symptoms become 0
  X[feature] = X[feature].map(weights)
  X[feature] = X[feature].fillna(0)

# Save
# filename = '/content/sample_data/processed.csv'
# df = pd.concat([X, y], axis=1)
# df.to_csv(filename, index=False)

The chosen model is **RandomForestClassifier**, which reaches roughly 98% accuracy on the test split. No hyperparameters need to be set; the defaults are used.

## Using the model

As mentioned, using the model requires a list of symptoms taken from `severity.csv`. An input is valid when every symptom exists in that list of known symptoms and the list contains between 1 and 17 symptoms.
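
A minimal sketch of that validation step is shown below. The helper `validate_symptoms` and the constant `MAX_SYMPTOMS` are hypothetical names that do not appear in the notebook; the name clean-up mirrors the one applied to `data.csv` during preprocessing, and the small `known` set stands in for the full symptom list.

```python
MAX_SYMPTOMS = 17  # number of symptom slots per row in data.csv

def validate_symptoms(symptoms, known_symptoms, max_symptoms=MAX_SYMPTOMS):
    """Return the cleaned symptom list, or raise ValueError if the input is invalid."""
    if not 1 <= len(symptoms) <= max_symptoms:
        raise ValueError(f"Expected between 1 and {max_symptoms} symptoms, got {len(symptoms)}")

    # Apply the same clean-up as the preprocessing step (strip, spaces to underscores)
    cleaned = [s.strip().replace(' ', '_').replace('__', '_') for s in symptoms]

    unknown = [s for s in cleaned if s not in known_symptoms]
    if unknown:
        raise ValueError(f"Unknown symptoms: {unknown}")
    return cleaned

# Small stand-in for the full symptom list
known = {"itching", "skin_rash", "nodal_skin_eruptions", "dischromic_patches"}
print(validate_symptoms(["itching", "dischromic _patches"], known))
```

In the notebook, `set(weights)` could presumably be passed as `known_symptoms`, with the cleaned list then handed to the prediction helper.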


In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=200)

def train_random_forest():
    # Train a random forest with default hyperparameters
    model = RandomForestClassifier()
    model.fit(X_train, y_train)

    # Evaluate on the held-out test split
    y_pred_class = model.predict(X_test)

    print('RESULT')
    print('Accuracy:', metrics.accuracy_score(y_test, y_pred_class))

    return model

model = train_random_forest()
filename = '/content/sample_data/model.sav'
with open(filename, 'wb') as f:
    pickle.dump(model, f)

RESULT
Accuracy: 0.9853658536585366
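
The accuracy above comes from a single train/test split. The imports include `cross_val_score`, which is not used elsewhere in the notebook; the sketch below shows how it could give a more robust estimate. It runs on synthetic data from `make_classification` (an assumption made so the example is self-contained), so the printed numbers say nothing about the symptom dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the encoded symptom matrix (assumption: not the real dataset)
X_demo, y_demo = make_classification(
    n_samples=300, n_features=17, n_informative=8, n_classes=3, random_state=0
)

# 5-fold cross-validation: each fold is held out once while the rest is used for training
scores = cross_val_score(RandomForestClassifier(random_state=0), X_demo, y_demo, cv=5)

print("Fold accuracies:", scores)
print("Mean accuracy:", scores.mean())
```

With the notebook's own data, `X` and `y` would take the place of `X_demo` and `y_demo`.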


In [None]:
def input_training(row):
  # Feature values of a single training row, wrapped in a list for model.predict
  return [list(X_train.iloc[row])]

def input_custom(symptoms):
  # Encode each symptom by its weight; unknown symptoms are encoded as 0
  val = [weights.get(symptom, 0) for symptom in symptoms]

  # Pad the remaining symptom slots with zeros
  val.extend([0] * (17 - len(symptoms)))

  return [val]

In [None]:
with open('/content/sample_data/model.sav', 'rb') as f:
  model = pickle.load(f)   # load model

def custom_predict(symptoms):
  features = input_custom(symptoms)
  print(features)
  output = model.predict(features)

  print("X=%s, Predicted=%s" % (features[0], output[0]))

def training_predict(row):
  features = input_training(row)
  output = model.predict(features)

  print("X=%s, Predicted=%s, Actual=%s" % (features[0], output[0], y_train.iloc[row]))

In [None]:
# Note: 'dischromic _patches' (with a space) is not found in `weights`, so it is encoded as 0
symptoms = ['itching', 'skin_rash', 'nodal_skin_eruptions', 'dischromic _patches']
custom_predict(symptoms)

# row = 89
# training_predict(row)

[[1, 3, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]
X=[1, 3, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], Predicted=Fungal infection
