### Childhood Autistic Spectrum Disorder Screening using Machine Learning



![title](unnamed.jpg)

Autism spectrum disorder (ASD) refers to a range of conditions characterised by some degree of impaired social behaviour, communication and language, and a narrow range of interests and activities that are both unique to the individual and carried out repetitively.

ASDs begin in childhood and tend to persist into adolescence and adulthood. In most cases the conditions are apparent during the first 5 years of life.

Individuals with ASD often present other co-occurring conditions, including epilepsy, depression, anxiety and attention deficit hyperactivity disorder (ADHD). The level of intellectual functioning in individuals with ASDs is extremely variable, extending from profound impairment to superior levels.

![title](20192020graph.jpg)

The graph above shows that the symptoms of this disorder have also become more common among students over the years, so it is important to look into it at a young age.

![title](AutismAwareness2stats.jpg)

The early diagnosis of neurodevelopmental disorders can improve treatment and significantly decrease the associated 
healthcare costs. In this project, I use a supervised learning classification algorithm to diagnose Autistic Spectrum Disorder 
(ASD) based on behavioural features and individual characteristics. More specifically, we will build and train a neural network using the Keras API. 

This project will use a dataset provided by the UCI Machine Learning Repository that contains screening data for 292 patients. The dataset can be found at the following URL: 
https://archive.ics.uci.edu/ml/datasets/Autistic+Spectrum+Disorder+Screening+Data+for+Children++



In [1]:
import sys
import pandas as pd
import numpy as np
import os


### Importing the Dataset

The original dataset is not distributed in CSV format, so I converted it manually and placed the resulting file in the project's directory.

In [2]:
# import the dataset from the manually converted CSV file
data = pd.read_csv('Autism-Child-Data.csv')

In [3]:
# Print the shape of the DataFrame to see how many examples we have,
# then print the first patient as an example
print('Shape of DataFrame: {}'.format(data.shape))
print(data.loc[0])

Shape of DataFrame: (292, 22)
id                          1
A1_Score                    1
A2_Score                    1
A3_Score                    0
A4_Score                    0
A5_Score                    1
A6_Score                    1
A7_Score                    0
A8_Score                    1
A9_Score                    0
A10_Score                   0
age                         6
gender                      m
ethnicity              Others
jundice                    no
austim                     no
contry_of_res          Jordan
used_app_before            no
result                      5
age_desc           4-11 years
relation               Parent
Class/ASD                  NO
Name: 0, dtype: object


In [4]:
# Print out multiple patients at the same time
data.loc[:10]

Unnamed: 0,id,A1_Score,A2_Score,A3_Score,A4_Score,A5_Score,A6_Score,A7_Score,A8_Score,A9_Score,...,gender,ethnicity,jundice,austim,contry_of_res,used_app_before,result,age_desc,relation,Class/ASD
0,1,1,1,0,0,1,1,0,1,0,...,m,Others,no,no,Jordan,no,5,4-11 years,Parent,NO
1,2,1,1,0,0,1,1,0,1,0,...,m,Middle Eastern,no,no,Jordan,no,5,4-11 years,Parent,NO
2,3,1,1,0,0,0,1,1,1,0,...,m,?,no,no,Jordan,yes,5,4-11 years,?,NO
3,4,0,1,0,0,1,1,0,0,0,...,f,?,yes,no,Jordan,no,4,4-11 years,?,NO
4,5,1,1,1,1,1,1,1,1,1,...,m,Others,yes,no,United States,no,10,4-11 years,Parent,YES
5,6,0,0,1,0,1,1,0,1,0,...,m,?,no,yes,Egypt,no,5,4-11 years,?,NO
6,7,1,0,1,1,1,1,0,1,0,...,m,White-European,no,no,United Kingdom,no,7,4-11 years,Parent,YES
7,8,1,1,1,1,1,1,1,1,0,...,f,Middle Eastern,no,no,Bahrain,no,8,4-11 years,Parent,YES
8,9,1,1,1,1,1,1,1,0,0,...,f,Middle Eastern,no,no,Bahrain,no,7,4-11 years,Parent,YES
9,10,0,0,1,1,1,0,1,1,0,...,f,?,no,yes,Austria,no,5,4-11 years,?,NO


In [5]:
# This will print out a description of the dataframe
data.describe()

Unnamed: 0,id,A1_Score,A2_Score,A3_Score,A4_Score,A5_Score,A6_Score,A7_Score,A8_Score,A9_Score,A10_Score,result
count,292.0,292.0,292.0,292.0,292.0,292.0,292.0,292.0,292.0,292.0,292.0,292.0
mean,146.5,0.633562,0.534247,0.743151,0.55137,0.743151,0.712329,0.606164,0.496575,0.493151,0.726027,6.239726
std,84.437354,0.482658,0.499682,0.437646,0.498208,0.437646,0.453454,0.489438,0.500847,0.500811,0.446761,2.284882
min,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,73.75,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,5.0
50%,146.5,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,0.0,1.0,6.0
75%,219.25,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,8.0
max,292.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,10.0
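
The rows printed earlier contain `?` in columns such as `ethnicity` and `relation`, and `age` is absent from the numeric summary above, which suggests that `?` marks missing answers. A minimal sketch of how such placeholders could be counted, assuming a small made-up frame (`toy` is hypothetical and is not the project's data):

```python
import pandas as pd

# Hypothetical miniature of the screening table; values are illustrative only.
toy = pd.DataFrame({
    "ethnicity": ["Others", "?", "Middle Eastern", "?"],
    "relation": ["Parent", "?", "Parent", "Parent"],
    "age": ["6", "5", "?", "11"],
})

# Compare every cell with the placeholder and count the matches per column.
missing_counts = (toy == "?").sum()
print(missing_counts)
```

Running the same comparison on `data` would show which attributes are affected before deciding how to treat them.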


### Data Preprocessing

This dataset requires several preprocessing steps. First, I remove the columns (attributes) of the DataFrame that are not needed when training the neural network. Second, much of the data is reported as strings, so I convert it to categorical labels. Finally, I split the dataset into X and Y, where X holds all of the attributes we want to use for prediction and Y holds the class labels. 

In [6]:
# drop unwanted columns
data = data.drop(['result', 'age_desc','id'], axis=1)

In [7]:
data.loc[:10]

Unnamed: 0,A1_Score,A2_Score,A3_Score,A4_Score,A5_Score,A6_Score,A7_Score,A8_Score,A9_Score,A10_Score,age,gender,ethnicity,jundice,austim,contry_of_res,used_app_before,relation,Class/ASD
0,1,1,0,0,1,1,0,1,0,0,6,m,Others,no,no,Jordan,no,Parent,NO
1,1,1,0,0,1,1,0,1,0,0,6,m,Middle Eastern,no,no,Jordan,no,Parent,NO
2,1,1,0,0,0,1,1,1,0,0,6,m,?,no,no,Jordan,yes,?,NO
3,0,1,0,0,1,1,0,0,0,1,5,f,?,yes,no,Jordan,no,?,NO
4,1,1,1,1,1,1,1,1,1,1,5,m,Others,yes,no,United States,no,Parent,YES
5,0,0,1,0,1,1,0,1,0,1,4,m,?,no,yes,Egypt,no,?,NO
6,1,0,1,1,1,1,0,1,0,1,5,m,White-European,no,no,United Kingdom,no,Parent,YES
7,1,1,1,1,1,1,1,1,0,0,5,f,Middle Eastern,no,no,Bahrain,no,Parent,YES
8,1,1,1,1,1,1,1,0,0,0,11,f,Middle Eastern,no,no,Bahrain,no,Parent,YES
9,0,0,1,1,1,0,1,1,0,0,11,f,?,no,yes,Austria,no,?,NO
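
In the rows printed above and earlier, `result` matches the sum of the ten `A*_Score` answers (5 for the first patient, 10 for the fifth). That is presumably one reason to drop it: keeping it would hand the screening total directly to the network. A small check on those two rows, copied by hand into a hypothetical frame named `rows`:

```python
import pandas as pd

score_cols = [f"A{i}_Score" for i in range(1, 11)]

# Two patients copied from the printed output (index 0 and index 4).
rows = pd.DataFrame(
    [
        [1, 1, 0, 0, 1, 1, 0, 1, 0, 0],
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    ],
    columns=score_cols,
)
rows["result"] = [5, 10]

# Compare the row-wise sum of the ten answers with the reported result.
matches = rows[score_cols].sum(axis=1) == rows["result"]
print(matches.tolist())
```

This only confirms the pattern for the rows shown; the same comparison would need to be run on the full dataset before relying on it.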


In [8]:
# Create the X (attributes) and Y (class labels) datasets for training

x = data.drop(['Class/ASD'], axis=1)
y = data['Class/ASD']

In [9]:
x.loc[:10]

Unnamed: 0,A1_Score,A2_Score,A3_Score,A4_Score,A5_Score,A6_Score,A7_Score,A8_Score,A9_Score,A10_Score,age,gender,ethnicity,jundice,austim,contry_of_res,used_app_before,relation
0,1,1,0,0,1,1,0,1,0,0,6,m,Others,no,no,Jordan,no,Parent
1,1,1,0,0,1,1,0,1,0,0,6,m,Middle Eastern,no,no,Jordan,no,Parent
2,1,1,0,0,0,1,1,1,0,0,6,m,?,no,no,Jordan,yes,?
3,0,1,0,0,1,1,0,0,0,1,5,f,?,yes,no,Jordan,no,?
4,1,1,1,1,1,1,1,1,1,1,5,m,Others,yes,no,United States,no,Parent
5,0,0,1,0,1,1,0,1,0,1,4,m,?,no,yes,Egypt,no,?
6,1,0,1,1,1,1,0,1,0,1,5,m,White-European,no,no,United Kingdom,no,Parent
7,1,1,1,1,1,1,1,1,0,0,5,f,Middle Eastern,no,no,Bahrain,no,Parent
8,1,1,1,1,1,1,1,0,0,0,11,f,Middle Eastern,no,no,Bahrain,no,Parent
9,0,0,1,1,1,0,1,1,0,0,11,f,?,no,yes,Austria,no,?


In [10]:
# convert the data to categorical values using one-hot-encoded vectors
X = pd.get_dummies(x)

In [11]:
# printing the new categorical column labels
X.columns.values

array(['A1_Score', 'A2_Score', 'A3_Score', 'A4_Score', 'A5_Score',
       'A6_Score', 'A7_Score', 'A8_Score', 'A9_Score', 'A10_Score',
       'age_10', 'age_11', 'age_4', 'age_5', 'age_6', 'age_7', 'age_8',
       'age_9', 'age_?', 'gender_f', 'gender_m', 'ethnicity_?',
       'ethnicity_Asian', 'ethnicity_Black', 'ethnicity_Hispanic',
       'ethnicity_Latino', 'ethnicity_Middle Eastern ',
       'ethnicity_Others', 'ethnicity_Pasifika', 'ethnicity_South Asian',
       'ethnicity_Turkish', 'ethnicity_White-European', 'jundice_no',
       'jundice_yes', 'austim_no', 'austim_yes',
       'contry_of_res_Afghanistan', 'contry_of_res_Argentina',
       'contry_of_res_Armenia', 'contry_of_res_Australia',
       'contry_of_res_Austria', 'contry_of_res_Bahrain',
       'contry_of_res_Bangladesh', 'contry_of_res_Bhutan',
       'contry_of_res_Brazil', 'contry_of_res_Bulgaria',
       'contry_of_res_Canada', 'contry_of_res_China',
       'contry_of_res_Costa Rica', 'contry_of_res_Egypt',
      

In [12]:
# printing the example patient from the categorical data
X.loc[1]

A1_Score                             1
A2_Score                             1
A3_Score                             0
A4_Score                             0
A5_Score                             1
                                    ..
relation_Health care professional    0
relation_Parent                      1
relation_Relative                    0
relation_Self                        0
relation_self                        0
Name: 1, Length: 96, dtype: int64
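
The column list above includes `age_4` through `age_?`, which suggests that `age` was read as text and one-hot encoded like the other string columns. A hedged sketch on made-up data of how `pd.get_dummies` expands string columns, together with one possible alternative (an assumption, not what this notebook does) that keeps age numeric by using `pd.to_numeric`:

```python
import pandas as pd

# Hypothetical three-patient frame; values are illustrative only.
toy = pd.DataFrame({
    "A1_Score": [1, 0, 1],
    "age": ["6", "?", "11"],
    "gender": ["m", "f", "m"],
})

# Default behaviour: each string column becomes one indicator column per value.
encoded_default = pd.get_dummies(toy)
print(list(encoded_default.columns))

# Alternative: convert age to a number first; the '?' placeholder becomes NaN.
toy_numeric = toy.copy()
toy_numeric["age"] = pd.to_numeric(toy_numeric["age"], errors="coerce")
encoded_numeric = pd.get_dummies(toy_numeric)
print(list(encoded_numeric.columns))
```

With the alternative, the missing age would still have to be filled or the row dropped before training, since a network cannot consume NaN values.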

In [13]:
# converting the class data to categorical values - one-hot-encoded vectors
print(y)
Y = pd.get_dummies(y)

0       NO
1       NO
2       NO
3       NO
4      YES
      ... 
287    YES
288     NO
289    YES
290    YES
291     NO
Name: Class/ASD, Length: 292, dtype: object


In [14]:
Y.iloc[:10,]


Unnamed: 0,NO,YES
0,1,0
1,1,0
2,1,0
3,1,0
4,0,1
5,1,0
6,0,1
7,0,1
8,0,1
9,1,0


### Split the Dataset into Training and Testing Datasets

Before training the neural network, the dataset needs to be split into training and testing datasets. This allows us to test the network after training and determine how well it generalizes to new data. This step is straightforward with the train_test_split() function in scikit-learn.

In [15]:
from sklearn import model_selection
# split the X and Y data into training and testing datasets
X_train, X_test, Y_train, Y_test = model_selection.train_test_split(X,Y, test_size = 0.2)
print(Y_train)

     NO  YES
135   1    0
41    1    0
54    1    0
256   1    0
220   0    1
..   ..  ...
66    1    0
194   1    0
99    1    0
272   1    0
116   0    1

[233 rows x 2 columns]


In [16]:
print(X_train.shape)
print(X_test.shape)
print(Y_train.shape)
print(Y_test.shape)

(233, 96)
(59, 96)
(233, 2)
(59, 2)
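
The split above is random, so the exact rows in each set change from run to run. A minimal sketch, on made-up arrays, of how `train_test_split` could be made reproducible with `random_state` and kept class-balanced with `stratify`. Both arguments are additions for illustration and are not used in this notebook:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 20 samples with 2 features, 10 per class.
features = np.arange(40).reshape(20, 2)
labels = np.array([0] * 10 + [1] * 10)

# random_state fixes the shuffle; stratify keeps the class ratio in both parts.
X_tr, X_te, y_tr, y_te = train_test_split(
    features, labels, test_size=0.2, random_state=42, stratify=labels
)

print(X_tr.shape, X_te.shape)
print(np.bincount(y_te))
```

With only 292 patients, a fixed seed makes the reported metrics repeatable, and stratification avoids a test set that happens to over-represent one class.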


### Building the Network - Keras

In this project, I use Keras to build and train the network. The model is relatively simple and only uses dense (also known as fully connected) layers, the most common type of neural network layer. The network has two hidden layers with 8 and 4 units, uses the Adam optimizer, and is trained with a categorical cross-entropy loss.

In [17]:
# Build the network with Keras

from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.optimizers import Adam

# create the model: two hidden dense layers followed by a two-unit output layer
model = Sequential()
model.add(Dense(8, input_dim=96, kernel_initializer='normal', activation='relu'))
model.add(Dense(4, kernel_initializer='normal', activation='relu'))
model.add(Dense(2, activation='sigmoid'))
# dropout and softmax are applied on top of the two sigmoid outputs
model.add(Dropout(0.25))
model.add(Activation('softmax'))

# compile the model with the Adam optimizer and categorical cross-entropy loss
adam = Adam(learning_rate=0.001)
model.compile(loss='categorical_crossentropy', optimizer=adam, metrics=['accuracy'])


print(model.summary())

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense (Dense)                (None, 8)                 776       
_________________________________________________________________
dense_1 (Dense)              (None, 4)                 36        
_________________________________________________________________
dense_2 (Dense)              (None, 2)                 10        
_________________________________________________________________
dropout (Dropout)            (None, 2)                 0         
_________________________________________________________________
activation (Activation)      (None, 2)                 0         
Total params: 822
Trainable params: 822
Non-trainable params: 0
_________________________________________________________________
None
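
The parameter counts in the summary can be checked by hand: a dense layer has one weight per input-unit pair plus one bias per unit, while the dropout and activation layers add no parameters. A short sketch of that arithmetic (`dense_params` is a helper name introduced here for illustration):

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units


# Input width followed by the width of each dense layer in the model.
layer_sizes = [96, 8, 4, 2]

# Pair each layer with the one feeding into it and count its parameters.
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)

print(per_layer, total)  # [776, 36, 10] 822
```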


### Training the Network

Now it's time to train. Training a Keras model is as simple as calling model.fit().

In [18]:
# fit the model to the training data
model.fit(X_train, Y_train, epochs=100, batch_size=15, verbose = 1)

Epoch 1/100
Epoch 2/100
...

<tensorflow.python.keras.callbacks.History at 0x14ad2bc7108>

### Testing and Performance Metrics

Now that the model has been trained, its performance needs to be measured on the testing dataset. The model has never seen this data before, so the testing dataset lets us determine whether the model generalizes to information that was not used during training. We will use some of the metrics provided by scikit-learn to do this.

In [19]:
# generate class predictions for the categorical model
from sklearn.metrics import classification_report, accuracy_score

# predict_classes() is deprecated; take the index of the largest output instead
predictions = np.argmax(model.predict(X_test), axis=-1)
predictions


array([0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1,
       1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0,
       1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1], dtype=int64)
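
Each entry in this array is the index of the larger of the two network outputs for one test patient, so 0 corresponds to the `NO` column and 1 to the `YES` column. A minimal sketch of that conversion with `np.argmax`, using made-up probabilities rather than real model output:

```python
import numpy as np

# Hypothetical two-column outputs: column 0 is NO, column 1 is YES.
probs = np.array([
    [0.90, 0.10],
    [0.20, 0.80],
    [0.55, 0.45],
])

# The index of the largest value in each row is the predicted class.
class_ids = np.argmax(probs, axis=-1)
print(class_ids)
```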

In [20]:
print('Results for Categorical Model')
print(accuracy_score(Y_test[['YES']], predictions))
print(classification_report(Y_test[['YES']], predictions))

Results for Categorical Model
0.9491525423728814
              precision    recall  f1-score   support

           0       0.93      0.96      0.95        28
           1       0.97      0.94      0.95        31

    accuracy                           0.95        59
   macro avg       0.95      0.95      0.95        59
weighted avg       0.95      0.95      0.95        59
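
The figures in the report can be reproduced from four counts. The counts below are inferred from the rounded report and the supports of 28 and 31 (an assumption, since the notebook does not print a confusion matrix): 27 of the 28 `NO` cases and 29 of the 31 `YES` cases classified correctly.

```python
# Inferred counts for the YES class (assumed, not printed by the notebook).
tn, fp, fn, tp = 27, 1, 2, 29

# Accuracy: share of all test patients classified correctly.
accuracy = (tp + tn) / (tp + tn + fp + fn)
# Precision: share of predicted YES cases that are truly YES.
precision_yes = tp / (tp + fp)
# Recall: share of true YES cases that were found.
recall_yes = tp / (tp + fn)
# F1: harmonic mean of precision and recall.
f1_yes = 2 * precision_yes * recall_yes / (precision_yes + recall_yes)

print(round(accuracy, 4), round(precision_yes, 2), round(recall_yes, 2), round(f1_yes, 2))
```

For a screening task, recall on the `YES` class is the figure to watch most closely, because a false negative means a child who should be referred is missed.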



## Although this disorder is also seen in adults, it is much more likely to be recognised in adults than in infants and children, so I believe this project will be of some use and will help raise awareness among people in our nation.
# Thank You