## COVID-19 Symptom Checker using Unsupervised Learning

### The dataset comes from Kaggle.com; below is the author's note
About Dataset
These data help identify whether a person has coronavirus disease based on a set of pre-defined standard symptoms. The symptoms follow guidelines given by the World Health Organization (WHO, who.int) and the Ministry of Health and Family Welfare, India.

*Disclaimer: The results or analysis of these data should not be taken as medical advice.*

The dataset contains seven major variables that may affect whether someone has coronavirus disease. Each variable is described as follows:

- Country: list of countries the person visited.
- Age: age group of each person, based on the WHO age-group standard.
- Gender: Male, Female, or Transgender.
- Symptoms: according to WHO, the five major symptoms of COVID-19 are fever, tiredness, difficulty in breathing, dry cough, and sore throat.
- Experience any other symptoms: pains, nasal congestion, runny nose, diarrhea, and other.
- Severity: the level of severity (mild, moderate, or severe).
- Contact: whether the person has been in contact with another COVID-19 patient.

Because all of these variables are categorical, every combination of their labels is generated, giving 316,800 combinations in total.
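To illustrate where a count like this comes from, here is a minimal sketch using made-up, shortened label lists (not the real dataset): the number of rows in a full Cartesian product equals the product of the number of labels in each variable.

```python
import itertools
import math

# Hypothetical, shortened label lists for three categorical variables
variables = {
    "Age": ["0-9", "10-19", "20-24"],
    "Severity": ["Mild", "Moderate", "Severe"],
    "Contact": ["Yes", "No"],
}

# Enumerate every combination of one label per variable
combinations = list(itertools.product(*variables.values()))

# The count equals the product of the label counts: 3 * 3 * 2 = 18
expected = math.prod(len(labels) for labels in variables.values())
print(len(combinations), expected)  # 18 18
```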

Data: two CSV files are provided:

- Raw-Data: contains all possible labels of each variable; this file is used to generate the cleaned data.
- Cleaned-Data: contains all possible combinations of the labels in Raw-Data.csv, encoded as dummy variables, and can be used for analysis. See the [notebook](https://www.kaggle.com/iamhungundji/data-generation) for how this data is generated.

Applications of the combined data:

- Chatbot
- Supervised Learning (Classification)
- Unsupervised Learning (Clustering)

```python
import pandas as pd

# This path is specific to the author's machine; point it at your local copy of Raw-Data.csv
raw_data = pd.read_csv("C:/Users/Ahmad/Desktop/Python/DatasetsML/Raw-Data.csv")
raw_data.head()
```

| | Country | Age | Gender | Symptoms | Experiencing_Symptoms | Severity | Contact |
|---|---|---|---|---|---|---|---|
| 0 | China | 0-9 | Male | Fever,Tiredness,Dry-Cough,Difficulty-in-Breath... | Pains,Nasal-Congestion,Runny-Nose,Diarrhea | Mild | Yes |
| 1 | Italy | 10-19 | Female | Fever,Tiredness,Dry-Cough,Difficulty-in-Breathing | Pains,Nasal-Congestion,Runny-Nose | Moderate | No |
| 2 | Iran | 20-24 | Transgender | Fever,Tiredness,Dry-Cough | Pains,Nasal-Congestion | Severe | Dont-Know |
| 3 | Republic of Korean | 25-59 | | Fever,Tiredness | Pains | | |
| 4 | France | 60+ | | Fever | Nasal-Congestion,Runny-Nose,Diarrhea | | |
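The `dropna()` calls used in the next cells appear to be needed because Raw-Data.csv lists each variable's labels in its own column, so shorter columns are padded with missing values (rows 3 and 4 above are blank for Gender, Severity and Contact). A minimal sketch with a made-up three-row CSV that mimics that layout:

```python
import io

import pandas as pd

# Hypothetical miniature of the raw layout: columns hold different numbers of labels
csv_text = """Country,Gender,Contact
China,Male,Yes
Italy,Female,No
Iran,,
"""
raw = pd.read_csv(io.StringIO(csv_text))

# Empty cells are read as NaN, so unique() on its own would count NaN as a label
print(raw["Gender"].unique())           # contains nan
print(raw["Gender"].dropna().unique())  # only the real labels
```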


### Data Preparation

```python
raw_data.columns
```

```
Index(['Country', 'Age', 'Gender', 'Symptoms', 'Experiencing_Symptoms',
       'Severity', 'Contact'],
      dtype='object')
```

```python
# Number of distinct labels in each column (the NaN padding is dropped first)
country = len(raw_data.Country.dropna().unique())
age = len(raw_data.Age.dropna().unique())
gender = len(raw_data.Gender.dropna().unique())
symptoms = len(raw_data.Symptoms.dropna().unique())
exp_symptoms = len(raw_data.Experiencing_Symptoms.dropna().unique())
severity = len(raw_data.Severity.dropna().unique())
contact = len(raw_data.Contact.dropna().unique())

# Every label is combined with every label of every other column, so the total is the product
print("Total Combination Possible: ", country * age * gender * symptoms * exp_symptoms * severity * contact)
```

```
Total Combination Possible:  316800
```
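The seven `len(... .dropna().unique())` lines could likely be collapsed into one call: `DataFrame.nunique()` counts distinct values per column and ignores NaN by default (`dropna=True`). A sketch on a small made-up frame standing in for `raw_data`:

```python
import math

import numpy as np
import pandas as pd

# Hypothetical stand-in for raw_data: ragged columns padded with NaN
raw = pd.DataFrame({
    "Age": ["0-9", "10-19", "20-24"],
    "Gender": ["Male", "Female", np.nan],
    "Contact": ["Yes", "No", "Dont-Know"],
})

# Distinct, non-missing labels per column
label_counts = raw.nunique()

# Total number of combinations = product of the per-column counts (3 * 2 * 3)
total = int(math.prod(label_counts))
print(total)  # 18
```

On the real data, `math.prod(raw_data.nunique())` should reproduce the count printed above.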


```python
import itertools

# One list of distinct labels per column, in the same order as raw_data.columns
columns = [raw_data.Country.dropna().unique().tolist(),
           raw_data.Age.dropna().unique().tolist(),
           raw_data.Gender.dropna().unique().tolist(),
           raw_data.Symptoms.dropna().unique().tolist(),
           raw_data.Experiencing_Symptoms.dropna().unique().tolist(),
           raw_data.Severity.dropna().unique().tolist(),
           raw_data.Contact.dropna().unique().tolist()]

# Cartesian product: one row per combination of labels
prep_data = pd.DataFrame(list(itertools.product(*columns)), columns=raw_data.columns)
prep_data.shape
```

```
(316800, 7)
```
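`itertools.product` varies the last iterable fastest, which is why the first rows of `prep_data` (shown with `head()` in the next cell) share the same Country, Age, Gender, Symptoms and Experiencing_Symptoms and only cycle through Contact, then Severity. A quick check with hypothetical, shortened label lists:

```python
import itertools

# Hypothetical subsets of the Severity and Contact labels
severity = ["Mild", "Moderate"]
contact = ["Yes", "No", "Dont-Know"]

# The last iterable (contact) changes on every row; severity changes every 3 rows
rows = list(itertools.product(severity, contact))
for row in rows[:4]:
    print(row)
# ('Mild', 'Yes')
# ('Mild', 'No')
# ('Mild', 'Dont-Know')
# ('Moderate', 'Yes')
```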

```python
prep_data.head()
```

| | Country | Age | Gender | Symptoms | Experiencing_Symptoms | Severity | Contact |
|---|---|---|---|---|---|---|---|
| 0 | China | 0-9 | Male | Fever,Tiredness,Dry-Cough,Difficulty-in-Breath... | Pains,Nasal-Congestion,Runny-Nose,Diarrhea | Mild | Yes |
| 1 | China | 0-9 | Male | Fever,Tiredness,Dry-Cough,Difficulty-in-Breath... | Pains,Nasal-Congestion,Runny-Nose,Diarrhea | Mild | No |
| 2 | China | 0-9 | Male | Fever,Tiredness,Dry-Cough,Difficulty-in-Breath... | Pains,Nasal-Congestion,Runny-Nose,Diarrhea | Mild | Dont-Know |
| 3 | China | 0-9 | Male | Fever,Tiredness,Dry-Cough,Difficulty-in-Breath... | Pains,Nasal-Congestion,Runny-Nose,Diarrhea | Moderate | Yes |
| 4 | China | 0-9 | Male | Fever,Tiredness,Dry-Cough,Difficulty-in-Breath... | Pains,Nasal-Congestion,Runny-Nose,Diarrhea | Moderate | No |


```python
from collections import Counter

# Split the comma-separated Symptoms string into a list per row
symptoms_list = prep_data['Symptoms'].str.split(',')

# Distinct individual symptoms, in order of first appearance
symptoms_counter = Counter(a for b in symptoms_list.tolist() for a in b)

# One 0/1 indicator column per symptom; exact list membership avoids substring false positives
for symptom in symptoms_counter.keys():
    prep_data[symptom] = symptoms_list.apply(lambda items: int(symptom in items))

prep_data.head()
```

| | Country | Age | Gender | Symptoms | Experiencing_Symptoms | Severity | Contact | Fever | Tiredness | Dry-Cough | Difficulty-in-Breathing | Sore-Throat | None_Sympton |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | China | 0-9 | Male | Fever,Tiredness,Dry-Cough,Difficulty-in-Breath... | Pains,Nasal-Congestion,Runny-Nose,Diarrhea | Mild | Yes | 1 | 1 | 1 | 1 | 1 | 0 |
| 1 | China | 0-9 | Male | Fever,Tiredness,Dry-Cough,Difficulty-in-Breath... | Pains,Nasal-Congestion,Runny-Nose,Diarrhea | Mild | No | 1 | 1 | 1 | 1 | 1 | 0 |
| 2 | China | 0-9 | Male | Fever,Tiredness,Dry-Cough,Difficulty-in-Breath... | Pains,Nasal-Congestion,Runny-Nose,Diarrhea | Mild | Dont-Know | 1 | 1 | 1 | 1 | 1 | 0 |
| 3 | China | 0-9 | Male | Fever,Tiredness,Dry-Cough,Difficulty-in-Breath... | Pains,Nasal-Congestion,Runny-Nose,Diarrhea | Moderate | Yes | 1 | 1 | 1 | 1 | 1 | 0 |
| 4 | China | 0-9 | Male | Fever,Tiredness,Dry-Cough,Difficulty-in-Breath... | Pains,Nasal-Congestion,Runny-Nose,Diarrhea | Moderate | No | 1 | 1 | 1 | 1 | 1 | 0 |
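As an alternative, pandas can build indicator columns like these in one step with `Series.str.get_dummies(sep=',')`, which splits each string on the separator and creates one 0/1 column per distinct token. Unlike the loop, it orders the new columns alphabetically rather than by first appearance. A sketch on made-up rows in the same format:

```python
import pandas as pd

# Hypothetical rows in the same comma-separated format as the Symptoms column
symptoms = pd.Series([
    "Fever,Tiredness,Dry-Cough",
    "Fever",
    "None_Sympton",
])

# One 0/1 column per distinct symptom token, columns sorted alphabetically
indicators = symptoms.str.get_dummies(sep=",")
print(indicators)
```

On the real data, something like `prep_data.join(prep_data['Symptoms'].str.get_dummies(sep=','))` should add equivalent columns, just in a different order.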


```python
# Same treatment for the Experiencing_Symptoms column
esymptoms_list = prep_data['Experiencing_Symptoms'].str.split(',')
exp_symptoms_counter = Counter(a for b in esymptoms_list.tolist() for a in b)

for esymptom in exp_symptoms_counter.keys():
    prep_data[esymptom] = esymptoms_list.apply(lambda items: int(esymptom in items))

prep_data.head()
```

| | Country | Age | Gender | Symptoms | Experiencing_Symptoms | Severity | Contact | Fever | Tiredness | Dry-Cough | Difficulty-in-Breathing | Sore-Throat | None_Sympton | Pains | Nasal-Congestion | Runny-Nose | Diarrhea | None_Experiencing |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | China | 0-9 | Male | Fever,Tiredness,Dry-Cough,Difficulty-in-Breath... | Pains,Nasal-Congestion,Runny-Nose,Diarrhea | Mild | Yes | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 |
| 1 | China | 0-9 | Male | Fever,Tiredness,Dry-Cough,Difficulty-in-Breath... | Pains,Nasal-Congestion,Runny-Nose,Diarrhea | Mild | No | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 |
| 2 | China | 0-9 | Male | Fever,Tiredness,Dry-Cough,Difficulty-in-Breath... | Pains,Nasal-Congestion,Runny-Nose,Diarrhea | Mild | Dont-Know | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 |
| 3 | China | 0-9 | Male | Fever,Tiredness,Dry-Cough,Difficulty-in-Breath... | Pains,Nasal-Congestion,Runny-Nose,Diarrhea | Moderate | Yes | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 |
| 4 | China | 0-9 | Male | Fever,Tiredness,Dry-Cough,Difficulty-in-Breath... | Pains,Nasal-Congestion,Runny-Nose,Diarrhea | Moderate | No | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 |
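A note on label matching: `Series.str.contains` does a substring search (treated as a regular expression by default), so a label contained inside another label would be flagged by mistake. The labels in this dataset do not appear to overlap, but a hypothetical pair such as `Pains` and `Chest-Pains` (not in the data) shows the difference between a substring check and exact token matching:

```python
import pandas as pd

# Hypothetical labels: "Pains" is a substring of "Chest-Pains"
col = pd.Series(["Chest-Pains,Diarrhea", "Pains", "Diarrhea"])

# Substring match: also flags the first row, which has no plain "Pains" token
substring_flag = col.str.contains("Pains", regex=False).astype(int)

# Exact token match: split first, then test list membership
token_flag = col.str.split(",").apply(lambda items: int("Pains" in items))

print(substring_flag.tolist())  # [1, 1, 0]
print(token_flag.tolist())      # [0, 1, 0]
```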


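```python
# The raw string columns are now encoded as indicators, so drop them
prep_data = prep_data.drop(['Symptoms', 'Experiencing_Symptoms'], axis=1)
# One-hot encode Age, Gender, Severity and Contact; the 0/1 symptom columns pass through unchanged.
# dtype=int keeps 0/1 integers (pandas >= 2.0 otherwise returns booleans)
dummies = pd.get_dummies(prep_data.drop('Country', axis=1), dtype=int)
# Keep Country as a plain label column at the end
dummies['Country'] = prep_data['Country']
prep_data = dummies
prep_data.head()
```

| | Fever | Tiredness | Dry-Cough | Difficulty-in-Breathing | Sore-Throat | None_Sympton | Pains | Nasal-Congestion | Runny-Nose | Diarrhea | ... | Gender_Male | Gender_Transgender | Severity_Mild | Severity_Moderate | Severity_None | Severity_Severe | Contact_Dont-Know | Contact_No | Contact_Yes | Country |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | ... | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | China |
| 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | ... | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | China |
| 2 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | ... | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | China |
| 3 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | ... | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | China |
| 4 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | ... | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | China |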
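`pd.get_dummies` only expands object or categorical columns. Numeric columns such as the 0/1 symptom indicators are passed through unchanged, and each new column is named `<column>_<label>` (hence `Severity_Mild`, `Contact_Yes`, and so on). A sketch on a made-up frame:

```python
import pandas as pd

# Hypothetical frame: one numeric indicator plus two categorical columns
df = pd.DataFrame({
    "Fever": [1, 0, 1],
    "Severity": ["Mild", "Severe", "Mild"],
    "Contact": ["Yes", "No", "Yes"],
})

# dtype=int gives 0/1 integers instead of the boolean default of pandas >= 2.0
encoded = pd.get_dummies(df, dtype=int)
print(encoded.columns.tolist())
# ['Fever', 'Severity_Mild', 'Severity_Severe', 'Contact_No', 'Contact_Yes']
```

The untouched numeric column comes first and the dummy columns follow with their labels sorted, which matches the column order in the table above.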


```python
prep_data.shape
```

```
(316800, 27)
```
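The 27 columns can be accounted for from the outputs above. The per-group counts below are read off those outputs: the age groups come from the raw data, and `Gender_Female` is assumed to be among the columns elided as `...` in the last table.

```python
# Column groups in prep_data, counted from the notebook outputs above
column_groups = {
    "symptom indicators": 6,              # Fever ... None_Sympton
    "experienced-symptom indicators": 5,  # Pains ... None_Experiencing
    "Age dummies": 5,                     # 0-9, 10-19, 20-24, 25-59, 60+
    "Gender dummies": 3,                  # Male, Female, Transgender
    "Severity dummies": 4,                # Mild, Moderate, None, Severe
    "Contact dummies": 3,                 # Dont-Know, No, Yes
    "Country": 1,                         # kept as a label column
}

total_columns = sum(column_groups.values())
print(total_columns)  # 27
```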

```python
# Save the prepared data for the ML step
prep_data.to_csv('Prep-Covid-Data', index=False, header=True)
```

##### Data has been prepared and is now ready for ML