## Implementing an Artificial Neural Network (ANN)

* An Artificial Neural Network (ANN) is an information processing paradigm inspired by the way biological nervous systems, such as the brain, process information.
* It is composed of a large number of highly interconnected processing elements (neurons) working in unison to solve a specific problem.
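
To make the idea of a single processing element concrete, here is a minimal sketch of one artificial neuron in NumPy. The input values, weights and bias are hypothetical and chosen only for illustration; a real network learns its weights during training.

```python
import numpy as np

def sigmoid(z):
    """Squash a real number into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical values, chosen only for illustration
inputs = np.array([0.5, -1.0, 2.0])    # signals arriving at the neuron
weights = np.array([0.4, 0.3, -0.2])   # strength of each connection
bias = 0.1

# A neuron computes a weighted sum of its inputs and applies an activation
weighted_sum = np.dot(inputs, weights) + bias
output = sigmoid(weighted_sum)
print(weighted_sum, output)
```

A network is many such units wired together, with the outputs of one layer feeding the inputs of the next.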

#### 1. Import the necessary libraries

In [2]:
import numpy as np
import pandas as pd

#### 2. Import the dataset from the data.csv file into a DataFrame using the pandas library

The dataset that I have selected contains 24 columns (22 voice features plus the 'name' and 'status' columns) and 195 rows (or instances). It is composed of biomedical voice measurements from 31 people, of whom 23 have Parkinson's Disease.
* Each feature column is a particular voice measure
* Each row corresponds to one voice recording from one of these individuals
* The aim here is to separate healthy people from people with Parkinson's Disease using the 'status' column, which is set to '0' for healthy people and '1' for people with Parkinson's Disease

In [3]:
dataset = pd.read_csv('data.csv')
x = dataset.drop(['status', 'name'], axis=1)
#x=dataset.loc[:,dataset.columns!=['status','name']].values[:,1:]
y=dataset.loc[:,'status'].values
dataset.head()

Unnamed: 0,name,MDVP:Fo(Hz),MDVP:Fhi(Hz),MDVP:Flo(Hz),MDVP:Jitter(%),MDVP:Jitter(Abs),MDVP:RAP,MDVP:PPQ,Jitter:DDP,MDVP:Shimmer,...,Shimmer:DDA,NHR,HNR,RPDE,DFA,spread1,spread2,D2,PPE,status
0,phon_R01_S01_1,119.992,157.302,74.997,0.00784,7e-05,0.0037,0.00554,0.01109,0.04374,...,0.06545,0.02211,21.033,0.414783,0.815285,-4.813031,0.266482,2.301442,0.284654,1
1,phon_R01_S01_2,122.4,148.65,113.819,0.00968,8e-05,0.00465,0.00696,0.01394,0.06134,...,0.09403,0.01929,19.085,0.458359,0.819521,-4.075192,0.33559,2.486855,0.368674,1
2,phon_R01_S01_3,116.682,131.111,111.555,0.0105,9e-05,0.00544,0.00781,0.01633,0.05233,...,0.0827,0.01309,20.651,0.429895,0.825288,-4.443179,0.311173,2.342259,0.332634,1
3,phon_R01_S01_4,116.676,137.871,111.366,0.00997,9e-05,0.00502,0.00698,0.01505,0.05492,...,0.08771,0.01353,20.644,0.434969,0.819235,-4.117501,0.334147,2.405554,0.368975,1
4,phon_R01_S01_5,116.014,141.781,110.655,0.01284,0.00011,0.00655,0.00908,0.01966,0.06425,...,0.1047,0.01767,19.649,0.417356,0.823484,-3.747787,0.234513,2.33218,0.410335,1


In [4]:
x.shape

(195, 22)

#### 3. Preprocessing the data using LabelEncoder and OneHotEncoder before training it

LabelEncoder and OneHotEncoder are part of the scikit-learn library in Python. They convert categorical (text) data into numbers, which predictive models can work with. All the features used here are already numeric, so no encoding is applied in this notebook.
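
For reference, this is roughly how the two encoders behave. It is a sketch only: the `gender` column is hypothetical and is not part of the Parkinson's dataset.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Hypothetical categorical column (not part of the Parkinson's dataset)
gender = np.array(['male', 'female', 'female', 'male'])

# LabelEncoder maps each category to an integer, in sorted order
label_encoder = LabelEncoder()
gender_codes = label_encoder.fit_transform(gender)
print(gender_codes)             # female -> 0, male -> 1

# OneHotEncoder turns each category into its own 0/1 column
onehot_encoder = OneHotEncoder()
gender_onehot = onehot_encoder.fit_transform(gender.reshape(-1, 1)).toarray()
print(gender_onehot)
```

One-hot encoding is usually preferred for model inputs when the categories have no natural order, since integer codes would otherwise suggest one.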

#### 4. Splitting our dataset into Training set and Test set

In [4]:
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.25, random_state = 9)
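
As a quick check of what `test_size = 0.25` means for 195 rows, the sketch below runs the same split on placeholder arrays of zeros with the same shape as `x` and `y`. The `_demo` names are made up for this example.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays with the same shape as the notebook's x and y
x_demo = np.zeros((195, 22))
y_demo = np.zeros(195, dtype=int)

x_tr, x_te, y_tr, y_te = train_test_split(
    x_demo, y_demo, test_size=0.25, random_state=9)

# The test size is rounded up: ceil(0.25 * 195) = 49, leaving 146 for training
print(x_tr.shape, x_te.shape)
```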

#### 5. Using StandardScaler : Standardize features by removing the mean and scaling to unit variance

x_train and x_test are redefined as their scaled versions. The scaler is fitted on the training set only and then applied to the test set.

In [5]:
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
x_train = sc.fit_transform(x_train)
x_test = sc.transform(x_test)
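
The difference between `fit_transform` and `transform` can be seen on a toy column. This is a sketch with made-up values: the mean and standard deviation are learned from the training values only and then reused for the test value.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy feature column, values chosen only for illustration
train = np.array([[1.0], [2.0], [3.0]])
test = np.array([[4.0]])

scaler = StandardScaler()
train_scaled = scaler.fit_transform(train)   # learns mean and std from train only
test_scaled = scaler.transform(test)         # reuses the training mean and std

print(train_scaled.ravel())
print(test_scaled.ravel())
```

Fitting the scaler on the test set as well would leak information about the test data into training.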

* We can check how the full data and the training and testing sets look after splitting and scaling

In [6]:
x

Unnamed: 0,MDVP:Fo(Hz),MDVP:Fhi(Hz),MDVP:Flo(Hz),MDVP:Jitter(%),MDVP:Jitter(Abs),MDVP:RAP,MDVP:PPQ,Jitter:DDP,MDVP:Shimmer,MDVP:Shimmer(dB),...,MDVP:APQ,Shimmer:DDA,NHR,HNR,RPDE,DFA,spread1,spread2,D2,PPE
0,119.992,157.302,74.997,0.00784,0.00007,0.00370,0.00554,0.01109,0.04374,0.426,...,0.02971,0.06545,0.02211,21.033,0.414783,0.815285,-4.813031,0.266482,2.301442,0.284654
1,122.400,148.650,113.819,0.00968,0.00008,0.00465,0.00696,0.01394,0.06134,0.626,...,0.04368,0.09403,0.01929,19.085,0.458359,0.819521,-4.075192,0.335590,2.486855,0.368674
2,116.682,131.111,111.555,0.01050,0.00009,0.00544,0.00781,0.01633,0.05233,0.482,...,0.03590,0.08270,0.01309,20.651,0.429895,0.825288,-4.443179,0.311173,2.342259,0.332634
3,116.676,137.871,111.366,0.00997,0.00009,0.00502,0.00698,0.01505,0.05492,0.517,...,0.03772,0.08771,0.01353,20.644,0.434969,0.819235,-4.117501,0.334147,2.405554,0.368975
4,116.014,141.781,110.655,0.01284,0.00011,0.00655,0.00908,0.01966,0.06425,0.584,...,0.04465,0.10470,0.01767,19.649,0.417356,0.823484,-3.747787,0.234513,2.332180,0.410335
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
190,174.188,230.978,94.261,0.00459,0.00003,0.00263,0.00259,0.00790,0.04087,0.405,...,0.02745,0.07008,0.02764,19.517,0.448439,0.657899,-6.538586,0.121952,2.657476,0.133050
191,209.516,253.017,89.488,0.00564,0.00003,0.00331,0.00292,0.00994,0.02751,0.263,...,0.01879,0.04812,0.01810,19.147,0.431674,0.683244,-6.195325,0.129303,2.784312,0.168895
192,174.688,240.005,74.287,0.01360,0.00008,0.00624,0.00564,0.01873,0.02308,0.256,...,0.01667,0.03804,0.10715,17.883,0.407567,0.655683,-6.787197,0.158453,2.679772,0.131728
193,198.764,396.961,74.904,0.00740,0.00004,0.00370,0.00390,0.01109,0.02296,0.241,...,0.01588,0.03794,0.07223,19.020,0.451221,0.643956,-6.744577,0.207454,2.138608,0.123306


In [7]:
y

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
      dtype=int64)

In [8]:
x_train

array([[-1.06242338, -0.66754502, -0.3532205 , ..., -0.20535403,
        -1.02964228,  1.28573572],
       [-1.02385982, -0.74709692, -0.29169292, ...,  1.19181865,
         0.72127969,  0.26512469],
       [ 0.07714639, -0.34447056,  0.71963229, ..., -0.50557235,
        -0.05914897, -0.467044  ],
       ...,
       [ 1.40887799,  0.75096311, -0.92489899, ..., -0.38396174,
         0.51407764, -0.6558168 ],
       [-0.13781152, -0.3250258 ,  0.55190075, ..., -1.61730373,
        -0.05918155, -0.21792211],
       [-0.37507351,  0.03912883, -0.85278774, ..., -1.62539991,
        -1.47926063,  0.10063703]])

In [9]:
x_test

array([[-0.17124421,  0.16856864, -0.83848487, ..., -0.3187725 ,
        -0.14097798, -0.48918124],
       [ 1.0362218 ,  0.24198901,  1.71828465, ..., -0.56970609,
        -2.31827202, -1.25918071],
       [ 0.78670663,  0.09655977,  1.25708032, ...,  0.53030682,
         0.04455522, -0.51583369],
       ...,
       [-0.793931  , -0.63832141, -0.0789636 , ...,  0.41983895,
        -1.38833891,  0.07385755],
       [-0.37401922,  0.11008755, -0.80168311, ..., -0.74858967,
        -0.74852831,  0.28328946],
       [ 1.13577016,  4.20024107,  1.35449134, ...,  0.47500092,
         1.477205  ,  0.74844864]])

In [10]:
y_train

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1,
       1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1,
       1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0,
       1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1,
       0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1,
       1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1], dtype=int64)

In [11]:
y_test

array([1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1,
       1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1,
       0, 1, 1, 1, 1], dtype=int64)

#### 6. Import Tensorflow and Keras

* Import Sequential to initialize the neural network
* Import Dense to add layers in the Neural Network

In [12]:
import tensorflow as tf
import keras

Using TensorFlow backend.


In [13]:
from keras.models import Sequential   # this is to initialize the neural network
from keras.layers import Dense   # this is to add layers in the neural network

#### 7. Create a Sequential object which will be the classifier to be used on our dataset. This initializes our ANN.

In [14]:
classifier = Sequential()  

#### 8. Add the input layer and the hidden layers using the .add(Dense()) method, then compile the model

In [15]:
classifier.add(Dense(units = 108, kernel_initializer = 'uniform', input_dim = 22, activation = 'relu'))

* 'input_dim' gives the number of nodes in the input layer, which must equal the number of columns in x_train (22 here)
* 'units' (called 'output_dim' in older Keras versions) gives the number of nodes in the hidden layer. A common rule of thumb is approximately ((nodes in i/p layer + nodes in the o/p layer) / 2); the 108 units used here are well above what that rule gives for 22 inputs
* 'kernel_initializer' (called 'init' in older Keras versions) randomly initializes the weights for each node close to zero according to a uniform distribution
* 'activation' is the activation function used by the hidden layer
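
What such a layer computes can be sketched in plain NumPy. This is an illustration, not the Keras implementation: the weight range of [-0.05, 0.05] is an assumption about the 'uniform' initializer, and the input batch is random made-up data with 22 features, matching the columns of x_train.

```python
import numpy as np

rng = np.random.default_rng(0)

n_inputs, n_units = 22, 108          # input nodes and hidden units of the first layer

# Small uniform weights; the range [-0.05, 0.05] is an assumption about
# what the 'uniform' initializer does
W = rng.uniform(-0.05, 0.05, size=(n_inputs, n_units))
b = np.zeros(n_units)

def relu(z):
    """ReLU activation: negative values become zero."""
    return np.maximum(z, 0.0)

batch = rng.normal(size=(14, n_inputs))   # 14 standardized samples (made up)
hidden = relu(batch @ W + b)              # shape: (14, 108)

n_params = W.size + b.size                # 22 * 108 + 108 = 2484
print(hidden.shape, n_params)
```

The parameter count shows why the input size must match the data: the weight matrix has one row per input feature.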

In [16]:
# adding the second hidden layer
classifier.add(Dense(units = 108, kernel_initializer = 'uniform', activation = 'relu'))

#adding the output layer
classifier.add(Dense(units = 1, kernel_initializer= 'uniform', activation = 'sigmoid'))

Compile the ANN. The 'adam' optimizer used here is a variant of stochastic gradient descent
* optimizer is the algorithm used to find the set of weights that minimizes the loss
* loss is the loss function that the optimizer minimizes; 'binary_crossentropy' suits a two-class output
* metrics is a list of criteria used to evaluate the model, reported epoch by epoch

In [17]:
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
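
The 'binary_crossentropy' loss named in the compile call can be written out in a few lines of NumPy. This is a sketch of the standard formula, not the Keras source; the labels and probabilities are made up, and the clipping constant `eps` is an assumption added to avoid taking the log of zero.

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)   # avoid log(0)
    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

y_true = np.array([1, 0, 1, 1])                # made-up labels
confident = np.array([0.9, 0.1, 0.8, 0.95])    # mostly right
unsure = np.array([0.5, 0.5, 0.5, 0.5])        # no information

loss_confident = binary_crossentropy(y_true, confident)
loss_unsure = binary_crossentropy(y_true, unsure)   # equals ln(2)
print(loss_confident, loss_unsure)
```

Confident correct predictions give a loss near zero, while predicting 0.5 everywhere gives ln(2), which is a useful baseline when reading the training log.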

#### 9. Fit the Artificial Neural Network (ANN) to the training set

* 'batch_size' is the number of observations processed before the weights are updated
* 'epochs' (called 'nb_epoch' in older Keras versions) is the number of times the ANN is trained on the whole training dataset

In [18]:
classifier.fit(x_train, y_train, batch_size = 14, epochs = 100)

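ValueError: Error when checking input: expected dense_1_input to have shape (216,) but got array with shape (22,)

This error is raised when the first Dense layer is built with input_dim = 216 while x_train has only 22 columns. Building the first layer with input_dim = 22 lets the fit call run.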
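
To see what the batch_size and epochs settings imply, the arithmetic below counts the weight updates for this training set. It is a sketch that assumes one update per batch and takes the training set size of 146 from the y_train array shown earlier.

```python
import math

n_train = 146        # length of y_train shown earlier
batch_size = 14
epochs = 100

# One weight update per batch; the last batch of an epoch may be smaller
updates_per_epoch = math.ceil(n_train / batch_size)
last_batch_size = n_train - (updates_per_epoch - 1) * batch_size
total_updates = updates_per_epoch * epochs
print(updates_per_epoch, last_batch_size, total_updates)
```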

#### 10. Predict the output using the .predict() method of the Keras model

In [None]:
preds=classifier.predict(x_test)
preds
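
Because the output layer uses a sigmoid, the predictions are probabilities between 0 and 1 rather than class labels. The sketch below shows two ways to turn them into 0/1 labels; the probabilities are made up and only mimic the (n_samples, 1) shape of the model output.

```python
import numpy as np

# Made-up sigmoid outputs with the same (n_samples, 1) shape as the predictions
probs = np.array([[0.91], [0.08], [0.55], [0.49]])

labels_round = probs.round().astype(int).ravel()    # rounding, as used in the cells that follow
labels_thresh = (probs > 0.5).astype(int).ravel()   # explicit 0.5 threshold

print(labels_round, labels_thresh)
```

The explicit threshold makes the cut-off visible and easy to change if false negatives and false positives are not equally costly.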

#### 11. Import accuracy score and confusion matrix to view the results

In [None]:
#making the confusion matrix
from sklearn.metrics import confusion_matrix,accuracy_score

* Print the <b>confusion matrix</b>

In [None]:
cm = confusion_matrix(y_test,preds.round())
print(cm)

This means that, using the ANN, 38 of the 49 test predictions were correct whereas 11 were incorrect

* Print the <b>accuracy</b>

In [None]:
print('Accuracy using an ANN :',accuracy_score(y_test.tolist(), preds.round().tolist())*100)

Therefore the accuracy is about 77.55% (38 correct out of 49)
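
The link between the confusion matrix and the accuracy can be checked on a small example. The labels and predictions below are made up for illustration; the last line only redoes the arithmetic for the 38 correct and 11 incorrect predictions quoted above.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, accuracy_score

# Made-up labels and predictions for illustration
y_true = np.array([1, 1, 0, 1, 0, 1])
y_pred = np.array([1, 0, 0, 1, 1, 1])

cm_demo = confusion_matrix(y_true, y_pred)
# Rows are true classes, columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]
correct = np.trace(cm_demo)            # TN + TP, the diagonal
acc = correct / cm_demo.sum()

print(cm_demo)
print(acc, accuracy_score(y_true, y_pred))

# The figures quoted above: 38 correct out of 38 + 11 = 49 test samples
quoted_accuracy = round(38 / 49 * 100, 2)
print(quoted_accuracy)
```

Since most recordings in this dataset come from people with Parkinson's Disease, it is worth reading the off-diagonal entries of the matrix as well as the overall accuracy.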