# Project 

## Autistic Spectrum Disorder Screening using Machine Learning

The early detection and diagnosis of neurodevelopmental disorders can improve treatment and significantly decrease the associated healthcare costs. This project performs childhood Autistic Spectrum Disorder (ASD) screening based on individual characteristics and behavioral patterns using supervised learning. We will build and train a neural network using the Keras API.

This project will use a dataset provided by the UCI Machine Learning Repository that contains screening data for 292 patients. The dataset can be found at the following URL: 
https://archive.ics.uci.edu/ml/datasets/Autistic+Spectrum+Disorder+Screening+Data+for+Children++

### Importing Libraries and Loading the Data

I will import the dataset as a pandas DataFrame, using the data obtained from the UCI Machine Learning Repository.

In [66]:
import warnings

In [68]:
import pandas as pd
import sklearn
import keras

In [69]:
# Loading the Dataset
autism_data = pd.read_csv('C:/DS_Project/autism-diagnosis/autism-data.csv')

In [70]:
# print the shape of the DataFrame to see how many examples I have
print('Shape of the DataFrame :', autism_data.shape)
print (autism_data.loc[0])

Shape of the DataFrame : (292, 21)
A1_Score                               1
A2_Score                               1
A3_Score                               0
A4_Score                               0
A5_Score                               1
A6_Score                               1
A7_Score                               0
A8_Score                               1
A9_Score                               0
A10_Score                              0
age                                    6
gender                                 m
ethnicity                         Others
jundice                               no
family_history_of_austim              no
country_of_res                    Jordan
used_app_before                       no
result                                 5
age_desc                    '4-11 years'
relation                          Parent
class                                 NO
Name: 0, dtype: object


### Exploratory analysis of the Dataset

In [71]:
# Print out multiple patients at the same time
autism_data.head(10)

Unnamed: 0,A1_Score,A2_Score,A3_Score,A4_Score,A5_Score,A6_Score,A7_Score,A8_Score,A9_Score,A10_Score,...,gender,ethnicity,jundice,family_history_of_austim,country_of_res,used_app_before,result,age_desc,relation,class
0,1,1,0,0,1,1,0,1,0,0,...,m,Others,no,no,Jordan,no,5,'4-11 years',Parent,NO
1,1,1,0,0,1,1,0,1,0,0,...,m,'Middle Eastern ',no,no,Jordan,no,5,'4-11 years',Parent,NO
2,1,1,0,0,0,1,1,1,0,0,...,m,?,no,no,Jordan,yes,5,'4-11 years',?,NO
3,0,1,0,0,1,1,0,0,0,1,...,f,?,yes,no,Jordan,no,4,'4-11 years',?,NO
4,1,1,1,1,1,1,1,1,1,1,...,m,Others,yes,no,'United States',no,10,'4-11 years',Parent,YES
5,0,0,1,0,1,1,0,1,0,1,...,m,?,no,yes,Egypt,no,5,'4-11 years',?,NO
6,1,0,1,1,1,1,0,1,0,1,...,m,White-European,no,no,'United Kingdom',no,7,'4-11 years',Parent,YES
7,1,1,1,1,1,1,1,1,0,0,...,f,'Middle Eastern ',no,no,Bahrain,no,8,'4-11 years',Parent,YES
8,1,1,1,1,1,1,1,0,0,0,...,f,'Middle Eastern ',no,no,Bahrain,no,7,'4-11 years',Parent,YES
9,0,0,1,1,1,0,1,1,0,0,...,f,?,no,yes,Austria,no,5,'4-11 years',?,NO


In [72]:
# print out a description analysis of the dataframe
autism_data.describe()

Unnamed: 0,A1_Score,A2_Score,A3_Score,A4_Score,A5_Score,A6_Score,A7_Score,A8_Score,A9_Score,A10_Score,result
count,292.0,292.0,292.0,292.0,292.0,292.0,292.0,292.0,292.0,292.0,292.0
mean,0.633562,0.534247,0.743151,0.55137,0.743151,0.712329,0.606164,0.496575,0.493151,0.726027,6.239726
std,0.482658,0.499682,0.437646,0.498208,0.437646,0.453454,0.489438,0.500847,0.500811,0.446761,2.284882
min,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,5.0
50%,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,0.0,1.0,6.0
75%,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,8.0
max,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,10.0


In [73]:
# Check the datatypes of the DataFrame to see which features (attributes) are numerical and which are not.
autism_data.dtypes

A1_Score                     int64
A2_Score                     int64
A3_Score                     int64
A4_Score                     int64
A5_Score                     int64
A6_Score                     int64
A7_Score                     int64
A8_Score                     int64
A9_Score                     int64
A10_Score                    int64
age                         object
gender                      object
ethnicity                   object
jundice                     object
family_history_of_austim    object
country_of_res              object
used_app_before             object
result                       int64
age_desc                    object
relation                    object
class                       object
dtype: object
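
The `age` column is reported as `object` rather than as a number. The `age_?` column that shows up after one-hot encoding later in this notebook suggests that missing ages are stored as the string `?`. The following is a minimal sketch, using a small hypothetical frame rather than the real file, of how one might count such placeholders and coerce the column to numbers. The name `sample` and its values are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample mimicking a few of the columns above; not the real dataset.
sample = pd.DataFrame({
    "age": ["6", "5", "?", "11"],
    "ethnicity": ["Others", "?", "?", "White-European"],
    "relation": ["Parent", "?", "Parent", "?"],
})

# Count the '?' placeholders in each column.
placeholder_counts = (sample == "?").sum()

# Coerce age to numbers; anything unparseable (such as '?') becomes NaN.
age_numeric = pd.to_numeric(sample["age"], errors="coerce")

print(placeholder_counts)
print(age_numeric)
```

Whether to impute, drop, or keep the placeholder as its own category is a modelling choice; this notebook keeps it as a category.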

### Data Preprocessing

There are some columns in the DataFrame that we don't want to use when training the neural network. I will drop these columns first. Secondly, much of our data is reported using strings; as a result, we will convert our data to categorical labels. During our preprocessing, we will also split the dataset into X and Y datasets, where X has all of the attributes we want to use for prediction and Y has the class labels. 


In [74]:
# drop unwanted columns
autism_data = autism_data.drop(['result', 'age_desc'], axis=1)
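
The text does not say why `result` is dropped. One plausible reason (an assumption on my part) is that, in the rows printed earlier, `result` equals the sum of the ten screening answers, so it would track the class label very closely. A small sketch of that check, using the first five rows copied from the `head()` output:

```python
import pandas as pd

score_cols = [f"A{i}_Score" for i in range(1, 11)]

# First five rows of answers, copied from the head() output above.
rows = [
    [1, 1, 0, 0, 1, 1, 0, 1, 0, 0],
    [1, 1, 0, 0, 1, 1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0, 1, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
]
sample = pd.DataFrame(rows, columns=score_cols)
sample["result"] = [5, 5, 5, 4, 10]  # result values from the same rows

# Compare the reported result with the row-wise sum of the ten answers.
score_sum = sample[score_cols].sum(axis=1)
matches = (score_sum == sample["result"]).all()
print(score_sum.tolist(), matches)
```

The same comparison could be run on the full DataFrame before the drop; whether it holds for all 292 rows is not verified here.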

In [75]:
autism_data.head(10)

Unnamed: 0,A1_Score,A2_Score,A3_Score,A4_Score,A5_Score,A6_Score,A7_Score,A8_Score,A9_Score,A10_Score,age,gender,ethnicity,jundice,family_history_of_austim,country_of_res,used_app_before,relation,class
0,1,1,0,0,1,1,0,1,0,0,6,m,Others,no,no,Jordan,no,Parent,NO
1,1,1,0,0,1,1,0,1,0,0,6,m,'Middle Eastern ',no,no,Jordan,no,Parent,NO
2,1,1,0,0,0,1,1,1,0,0,6,m,?,no,no,Jordan,yes,?,NO
3,0,1,0,0,1,1,0,0,0,1,5,f,?,yes,no,Jordan,no,?,NO
4,1,1,1,1,1,1,1,1,1,1,5,m,Others,yes,no,'United States',no,Parent,YES
5,0,0,1,0,1,1,0,1,0,1,4,m,?,no,yes,Egypt,no,?,NO
6,1,0,1,1,1,1,0,1,0,1,5,m,White-European,no,no,'United Kingdom',no,Parent,YES
7,1,1,1,1,1,1,1,1,0,0,5,f,'Middle Eastern ',no,no,Bahrain,no,Parent,YES
8,1,1,1,1,1,1,1,0,0,0,11,f,'Middle Eastern ',no,no,Bahrain,no,Parent,YES
9,0,0,1,1,1,0,1,1,0,0,11,f,?,no,yes,Austria,no,?,NO


Now we will split the dataset into x and y, where x holds all of the features (attributes) used for prediction and y holds only the class label indicating whether or not the child has autism.

In [76]:
# create X and Y datasets for training
x = autism_data.drop(['class'], axis=1)
y = autism_data['class']

In [77]:
# Check that the class column has been removed from x
x.head(5)

Unnamed: 0,A1_Score,A2_Score,A3_Score,A4_Score,A5_Score,A6_Score,A7_Score,A8_Score,A9_Score,A10_Score,age,gender,ethnicity,jundice,family_history_of_austim,country_of_res,used_app_before,relation
0,1,1,0,0,1,1,0,1,0,0,6,m,Others,no,no,Jordan,no,Parent
1,1,1,0,0,1,1,0,1,0,0,6,m,'Middle Eastern ',no,no,Jordan,no,Parent
2,1,1,0,0,0,1,1,1,0,0,6,m,?,no,no,Jordan,yes,?
3,0,1,0,0,1,1,0,0,0,1,5,f,?,yes,no,Jordan,no,?
4,1,1,1,1,1,1,1,1,1,1,5,m,Others,yes,no,'United States',no,Parent


Some variables (columns) still have a string datatype, so we need to convert them into categorical variables using one-hot encoding. Each distinct string value gets its own column holding 0 or 1, so every row is represented by a binary vector.
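
To make the encoding concrete, here is a minimal sketch on a tiny hypothetical frame (the name `toy` and its values are made up for illustration). Passing `dtype=int` is an assumption about the desired output: newer pandas versions return boolean dummy columns by default.

```python
import pandas as pd

# Tiny hypothetical frame with one numeric and two string columns.
toy = pd.DataFrame({
    "A1_Score": [1, 0, 1],
    "gender": ["m", "f", "m"],
    "jundice": ["no", "yes", "no"],
})

# String columns are expanded into one 0/1 column per category;
# numeric columns pass through unchanged.
encoded = pd.get_dummies(toy, dtype=int)
print(encoded.columns.tolist())
print(encoded.iloc[0].tolist())
```

Each new column is named `<original column>_<value>`, which is the same naming pattern visible in the column labels printed for the real data.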

In [78]:
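# Convert the data to categorical values - one-hot-encoded vectors
# (dtype=int keeps the dummy columns as 0/1 integers; newer pandas defaults to booleans)
X = pd.get_dummies(x, dtype=int)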

In [79]:
X.head(10)

Unnamed: 0,A1_Score,A2_Score,A3_Score,A4_Score,A5_Score,A6_Score,A7_Score,A8_Score,A9_Score,A10_Score,...,country_of_res_Syria,country_of_res_Turkey,used_app_before_no,used_app_before_yes,relation_'Health care professional',relation_?,relation_Parent,relation_Relative,relation_Self,relation_self
0,1,1,0,0,1,1,0,1,0,0,...,0,0,1,0,0,0,1,0,0,0
1,1,1,0,0,1,1,0,1,0,0,...,0,0,1,0,0,0,1,0,0,0
2,1,1,0,0,0,1,1,1,0,0,...,0,0,0,1,0,1,0,0,0,0
3,0,1,0,0,1,1,0,0,0,1,...,0,0,1,0,0,1,0,0,0,0
4,1,1,1,1,1,1,1,1,1,1,...,0,0,1,0,0,0,1,0,0,0
5,0,0,1,0,1,1,0,1,0,1,...,0,0,1,0,0,1,0,0,0,0
6,1,0,1,1,1,1,0,1,0,1,...,0,0,1,0,0,0,1,0,0,0
7,1,1,1,1,1,1,1,1,0,0,...,0,0,1,0,0,0,1,0,0,0
8,1,1,1,1,1,1,1,0,0,0,...,0,0,1,0,0,0,1,0,0,0
9,0,0,1,1,1,0,1,1,0,0,...,0,0,1,0,0,1,0,0,0,0


In [80]:
# print the new categorical column labels
X.columns.values

array(['A1_Score', 'A2_Score', 'A3_Score', 'A4_Score', 'A5_Score',
       'A6_Score', 'A7_Score', 'A8_Score', 'A9_Score', 'A10_Score',
       'age_10', 'age_11', 'age_4', 'age_5', 'age_6', 'age_7', 'age_8',
       'age_9', 'age_?', 'gender_f', 'gender_m',
       "ethnicity_'Middle Eastern '", "ethnicity_'South Asian'",
       'ethnicity_?', 'ethnicity_Asian', 'ethnicity_Black',
       'ethnicity_Hispanic', 'ethnicity_Latino', 'ethnicity_Others',
       'ethnicity_Pasifika', 'ethnicity_Turkish',
       'ethnicity_White-European', 'jundice_no', 'jundice_yes',
       'family_history_of_austim_no', 'family_history_of_austim_yes',
       "country_of_res_'Costa Rica'", "country_of_res_'Isle of Man'",
       "country_of_res_'New Zealand'", "country_of_res_'Saudi Arabia'",
       "country_of_res_'South Africa'", "country_of_res_'South Korea'",
       "country_of_res_'U.S. Outlying Islands'",
       "country_of_res_'United Arab Emirates'",
       "country_of_res_'United Kingdom'",
       "count

In [81]:
# print an example patient from the categorical data
X.loc[1]

A1_Score             1
A2_Score             1
A3_Score             0
A4_Score             0
A5_Score             1
                    ..
relation_?           0
relation_Parent      1
relation_Relative    0
relation_Self        0
relation_self        0
Name: 1, Length: 96, dtype: int64

In [82]:
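# convert the class data to categorical values - one-hot-encoded vectors
# (dtype=int keeps the NO/YES columns as 0/1 integers; newer pandas defaults to booleans)
Y = pd.get_dummies(y, dtype=int)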

In [83]:
Y.iloc[:10]

Unnamed: 0,NO,YES
0,1,0
1,1,0
2,1,0
3,1,0
4,0,1
5,1,0
6,0,1
7,0,1
8,0,1
9,1,0


### Split the Dataset into Training and Testing Datasets

Now we need to split the dataset into training and testing datasets. This will allow us to test our network after we are done training to determine how well it will generalize to new data.

In [84]:
from sklearn import model_selection
# split the X and Y data into training and testing datasets
X_train, X_test, Y_train, Y_test = model_selection.train_test_split(X, Y, test_size = 0.2)

In [85]:
# Print out the shape of the X and Y coordinates
print (X_train.shape)
print (X_test.shape)
print (Y_train.shape)
print (Y_test.shape)

(233, 96)
(59, 96)
(233, 2)
(59, 2)
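
The split above is random, so the exact rows in each set change from run to run. As a hedged sketch of one way to make it reproducible and keep the class balance similar in both sets, the example below uses synthetic stand-in data with the same number of rows (292). The class counts, the seed, and the names `X_demo` and `y_demo` are assumptions for illustration only.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in with the same number of rows as the dataset (292).
rng = np.random.default_rng(0)
X_demo = rng.integers(0, 2, size=(292, 10))
y_demo = np.array([0] * 151 + [1] * 141)  # hypothetical class balance

# random_state fixes the shuffle; stratify keeps class proportions similar.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)
print(X_tr.shape, X_te.shape)
print(y_tr.mean(), y_te.mean())
```

With 292 rows and `test_size=0.2`, scikit-learn rounds the test set up to 59 rows, which matches the shapes printed above.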


### Building the Neural Network - Keras

For this project, we are going to use Keras to build and train our network. The neural network will have two hidden layers (8 and 4 units), use an Adam optimizer, and a categorical crossentropy loss.
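
For readers unfamiliar with the loss, the sketch below computes the textbook categorical cross-entropy in NumPy. It is not Keras's implementation, which differs in details such as clipping and normalisation. The function name and the probability values are hypothetical.

```python
import numpy as np

def categorical_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean over samples of -sum(y_true * log(y_prob))."""
    y_prob = np.clip(y_prob, eps, 1.0)  # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_prob), axis=1)))

# One-hot targets in [NO, YES] order and hypothetical predicted probabilities.
y_true = np.array([[1, 0], [0, 1]])
confident = np.array([[0.9, 0.1], [0.2, 0.8]])
unsure = np.array([[0.5, 0.5], [0.5, 0.5]])

loss_confident = categorical_crossentropy(y_true, confident)
loss_unsure = categorical_crossentropy(y_true, unsure)
print(loss_confident, loss_unsure)
```

The loss shrinks as more probability is placed on the correct class, which is what the optimizer pushes the network towards during training.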

In [86]:
# build a neural network using Keras
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam

In [87]:
# define a function to build the Keras model
def create_model():
    # create model: 96 inputs -> 8 -> 4 -> 2 outputs
    model = Sequential()
    model.add(Dense(8, input_dim=96, kernel_initializer='normal', activation='relu'))
    model.add(Dense(4, kernel_initializer='normal', activation='relu'))
    # output layer: one unit per class (NO, YES); softmax is the more common
    # pairing with categorical cross-entropy, sigmoid is used here
    model.add(Dense(2, activation='sigmoid'))

    # compile model
    adam = Adam(learning_rate=0.001)
    model.compile(loss='categorical_crossentropy', optimizer=adam, metrics=['accuracy'])

    return model



In [88]:
# Create the model
model = create_model()
print(model.summary())

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_4 (Dense)              (None, 8)                 776       
_________________________________________________________________
dense_5 (Dense)              (None, 4)                 36        
_________________________________________________________________
dense_6 (Dense)              (None, 2)                 10        
Total params: 822
Trainable params: 822
Non-trainable params: 0
_________________________________________________________________
None
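
The parameter counts in the summary can be checked by hand: a dense layer has one weight per input-unit pair plus one bias per unit. A short sketch of that arithmetic (the helper `dense_params` is a name introduced here, not part of Keras):

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

# Layer widths taken from create_model(): 96 inputs -> 8 -> 4 -> 2.
layer_sizes = [96, 8, 4, 2]
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)
print(per_layer, total)
```

This gives 776, 36 and 10 parameters for the three layers, 822 in total, in agreement with the summary.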


### Training the Network

Now we will train the neural network by calling the model.fit() method.

In [89]:
# fit the model to the training data
nn_model = model.fit(X_train, Y_train, epochs=50, batch_size=10, verbose = 1)

Epoch 1/50
...
Epoch 50/50


<keras.callbacks.History at 0x65fdf9448>
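
As a rough guide to how much work the call above does, the sketch below works out the number of weight updates implied by the chosen settings. It assumes the default behaviour of keeping the final, smaller batch in each epoch.

```python
import math

n_train = 233      # rows in X_train, from the shapes printed earlier
batch_size = 10
epochs = 50

# The last batch of each epoch is smaller (233 = 23 * 10 + 3).
steps_per_epoch = math.ceil(n_train / batch_size)
total_updates = steps_per_epoch * epochs
print(steps_per_epoch, total_updates)
```

So each epoch performs 24 updates, and the full run performs 1200.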

### Testing and Performance Metrics

Now we need to test the model's performance on the testing dataset. The testing dataset allows us to determine whether or not the model will be able to generalize to information that wasn't used during its training phase. We will use some of the metrics provided by scikit-learn for this purpose.

In [101]:
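# Generate classification report using predictions for categorical model
import numpy as np
from sklearn.metrics import classification_report, accuracy_score

# predict_classes() is not available in recent Keras releases;
# take the index of the larger of the two outputs instead
predictions = np.argmax(model.predict(X_test), axis=1)
predictions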

array([0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0,
       0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1,
       1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1], dtype=int64)

In [107]:
print('Results for Categorical Model')
print(accuracy_score(Y_test[['YES']], predictions))
print(classification_report(Y_test[['YES']], predictions))

Results for Categorical Model
0.9661016949152542
              precision    recall  f1-score   support

           0       0.97      0.97      0.97        29
           1       0.97      0.97      0.97        30

    accuracy                           0.97        59
   macro avg       0.97      0.97      0.97        59
weighted avg       0.97      0.97      0.97        59
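
To show how the figures in the report relate to each other, the sketch below recomputes them from confusion-matrix counts. The counts are inferred from the rounded report (59 test rows, supports of 29 and 30, accuracy 57/59) and are an assumption; the notebook does not print the confusion matrix itself.

```python
# Counts inferred from the report above (an assumption, not printed output):
# 59 test rows, 29 negatives and 30 positives, two errors in total.
tn, fp = 28, 1   # rows whose true class is 0
fn, tp = 1, 29   # rows whose true class is 1

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)          # of predicted positives, how many are right
recall = tp / (tp + fn)             # of actual positives, how many are found
f1 = 2 * precision * recall / (precision + recall)

print(round(accuracy, 4), round(precision, 2), round(recall, 2), round(f1, 2))
```

With a test set of only 59 rows, a single extra error shifts accuracy by about 1.7 percentage points, so these scores should be read with that granularity in mind.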

