# Keras implementation

In [8]:
import keras

Using TensorFlow backend.


In [25]:
from keras.models import Sequential
from keras.layers import Dense

In [50]:
import numpy
# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)

In [51]:
# load pima indians dataset
dataset = numpy.loadtxt("pima-indians-diabetes.csv", delimiter=",")

In [52]:
dataset.shape

(768, 9)

## Split the dataset into training and test sets

In [58]:
# split into input (X) and output (Y) variables
train_size = int(dataset.shape[0]*0.7)
print (train_size)
X_train = dataset[:train_size, :8]
Y_train = dataset[:train_size, 8]
X_train.shape, Y_train.shape

537


((537, 8), (537,))

In [60]:
X_test = dataset[train_size:, :8]
Y_test = dataset[train_size:, 8]
X_test.shape, Y_test.shape

((231, 8), (231,))
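
The slicing above can be wrapped in a small helper. This is only a sketch: `split_train_test` is a hypothetical name, and a zero-filled array with the same shape as the dataset stands in for the real CSV so the snippet runs on its own. Like the cells above, it takes rows in file order and does not shuffle.

```python
import numpy as np

def split_train_test(data, train_fraction=0.7, n_features=8):
    """Split rows in file order: the first part for training, the rest for testing."""
    train_size = int(data.shape[0] * train_fraction)
    X_train = data[:train_size, :n_features]
    Y_train = data[:train_size, n_features]
    X_test = data[train_size:, :n_features]
    Y_test = data[train_size:, n_features]
    return X_train, Y_train, X_test, Y_test

# Stand-in array with the same shape as the Pima dataset (768 rows, 9 columns).
fake_dataset = np.zeros((768, 9))
X_tr, Y_tr, X_te, Y_te = split_train_test(fake_dataset)
print(X_tr.shape, Y_tr.shape, X_te.shape, Y_te.shape)
```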

## Define the model

### (a) Model I: 3 layers, with 12 and 8 nodes in the two hidden layers

In [204]:
# create model
model = Sequential()
model.add(Dense(12, input_dim=8, init='uniform', activation='relu'))
model.add(Dense(8, init='uniform', activation='relu'))
model.add(Dense(1, init='uniform', activation='sigmoid'))

We will use logarithmic loss, which for a binary classification problem is available in Keras as "binary_crossentropy". For the optimizer we use "adam", a stochastic gradient method that is an efficient default choice. Learn more about the Adam optimization algorithm in the paper ["Adam: A Method for Stochastic Optimization"](https://arxiv.org/abs/1412.6980). The metric used here is "accuracy".

Here we use **[relu](https://en.wikipedia.org/wiki/Rectifier_(neural_networks))** as the activation function for the first two layers, and **sigmoid** for the output layer so that its output can be read as a probability. The **relu** activation function is popular in deep learning. Later we will revisit the same problem using **sigmoid** activations throughout and compare the performance.
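
To make the roles of the two activations concrete, here is a minimal NumPy sketch of a forward pass with the same layer sizes as model I (8 inputs, then 12, 8 and 1 units). It is not how Keras computes things internally: the helper `dense_params`, the initialization range of ±0.05 and the random input `X_demo` are assumptions for illustration only.

```python
import numpy as np

def relu(z):
    """Rectified linear unit: negative values become 0, positive values pass through."""
    return np.maximum(0.0, z)

def sigmoid(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.RandomState(7)

def dense_params(n_in, n_out, scale=0.05):
    """Hypothetical helper: small uniform weights and zero biases for one dense layer."""
    W = rng.uniform(-scale, scale, size=(n_in, n_out))
    b = np.zeros(n_out)
    return W, b

W1, b1 = dense_params(8, 12)
W2, b2 = dense_params(12, 8)
W3, b3 = dense_params(8, 1)

X_demo = rng.rand(5, 8)                    # 5 made-up samples with 8 features each
h1 = relu(np.dot(X_demo, W1) + b1)         # shape (5, 12)
h2 = relu(np.dot(h1, W2) + b2)             # shape (5, 8)
p = sigmoid(np.dot(h2, W3) + b3)           # shape (5, 1), each value in (0, 1)
print(h1.shape, h2.shape, p.shape)
```

Because the last layer is a sigmoid, every entry of `p` lies strictly between 0 and 1, which is what allows it to be read as a probability.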

In [205]:
# Compile model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
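
The logarithmic loss named in `compile` can be written out in a few lines of NumPy. This is a sketch of the textbook formula, not the Keras source; the clipping constant `eps` is an assumption added to avoid taking `log(0)`, and the labels and probabilities are made up.

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all samples."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip probabilities away from 0 and 1 so the logarithms stay finite.
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1.0 - eps)
    losses = -(y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob))
    return float(np.mean(losses))

y_demo = [0, 0, 1, 1]
confident = binary_crossentropy(y_demo, [0.1, 0.1, 0.9, 0.9])
unsure = binary_crossentropy(y_demo, [0.5, 0.5, 0.5, 0.5])
print(confident, unsure)
```

Confident, correct predictions give a loss of `-log(0.9)`, while predicting 0.5 everywhere gives `log(2)`, so a lower loss means the predicted probabilities sit closer to the true labels.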

#### Model fitting

We can set the number of epochs with **nb_epoch** and the size of each mini-batch with **batch_size**:

In [206]:
model.fit(X_train, Y_train, nb_epoch=150, batch_size=10, verbose=0)

<keras.callbacks.History at 0x11e89ae10>
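
To see what these two parameters imply, the sketch below counts mini-batches for the 537 training rows. It assumes one weight update per mini-batch and that the final, smaller batch is kept rather than dropped; `minibatch_bounds` is a hypothetical helper, and the batch size of 20 is included only for comparison.

```python
def minibatch_bounds(n_samples, batch_size):
    """Yield (start, stop) row indices for consecutive mini-batches."""
    for start in range(0, n_samples, batch_size):
        yield start, min(start + batch_size, n_samples)

n_train = 537      # training rows in this notebook
n_epochs = 150     # value passed as nb_epoch above

updates = {}
for batch_size in (10, 20):
    bounds = list(minibatch_bounds(n_train, batch_size))
    last_size = bounds[-1][1] - bounds[-1][0]
    updates[batch_size] = len(bounds) * n_epochs
    print(batch_size, len(bounds), last_size, updates[batch_size])
```

Under these assumptions, a batch size of 10 gives 54 batches per epoch (the last one holding 7 rows) and 8100 updates over 150 epochs; a batch size of 20 halves that to 27 batches and 4050 updates.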

#### Evaluate the model

In [207]:
scores = model.evaluate(X_test, Y_test)
print('  Loss: ', scores[0], ' , acc:', scores[1])

  Loss:  0.528315976069  , acc: 0.757575758092


The **acc** value is the accuracy, defined as the fraction of test samples that are predicted correctly. This is the same as doing:

In [208]:
predictions = model.predict(X_test)
rounded = [round(x[0]) for x in predictions]
a = numpy.dot(rounded-Y_test, rounded-Y_test)
print(1.0-a/len(rounded))

0.757575757576
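
The dot product above works because both vectors contain only 0s and 1s, so each wrong prediction contributes exactly 1 to the sum of squared differences. The sketch below checks this against a direct comparison on made-up numbers; `accuracy_from_probabilities` is a hypothetical helper and the 0.5 threshold is an assumption matching the rounding used above.

```python
import numpy as np

def accuracy_from_probabilities(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded probability equals the label."""
    y_true = np.asarray(y_true)
    y_pred = (np.asarray(y_prob).ravel() > threshold).astype(int)
    return float(np.mean(y_pred == y_true))

# Made-up probabilities (one column, like model.predict) and labels.
probs_demo = np.array([[0.03], [0.28], [0.65], [0.86], [0.36]])
labels_demo = np.array([0, 1, 1, 1, 0])

direct = accuracy_from_probabilities(labels_demo, probs_demo)

# Same quantity via the dot-product trick used in the cell above.
rounded_demo = np.round(probs_demo.ravel())
diff = rounded_demo - labels_demo
via_dot = 1.0 - np.dot(diff, diff) / len(rounded_demo)
print(direct, via_dot)
```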


We can further inspect the predicted probability that each Pima Indian patient in the test set has diabetes:

In [209]:
print (model.predict_proba(X_test)[:10])

[[ 0.02870761]
 [ 0.2803773 ]
 [ 0.33776325]
 [ 0.64599019]
 [ 0.36038035]
 [ 0.14201277]
 [ 0.08908633]
 [ 0.12896512]
 [ 0.85859615]
 [ 0.89892173]]


#### Double mini-batch size

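Now we can double the mini-batch size and train the model again. Note that calling `fit` on an existing model continues from its current weights, so for a clean comparison the model should be re-created and re-compiled first: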

In [198]:
model.fit(X_train, Y_train, nb_epoch=150, batch_size=20, verbose=0)
scores = model.evaluate(X_test, Y_test)
print('  Loss: ', scores[0], ' , acc:', scores[1])

  Loss:  0.572329931896  , acc: 0.78354978355


#### Implement SGD

In [149]:
from keras.optimizers import SGD
#model.compile(loss='binary_crossentropy', optimizer=SGD(lr=0.01, momentum=0.9, nesterov=True))
model.compile(loss='binary_crossentropy', optimizer='sgd', metrics=['accuracy'])

In [156]:
model.fit(X_train, Y_train, nb_epoch=150, batch_size=10, verbose=0)
scores = model.evaluate(X_test, Y_test);
print('  Loss: ', scores[0], ' , acc:', scores[1])

  Loss:  0.64257547427  , acc: 0.658008658525
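
For intuition about what the optimizer does, here is a minimal sketch of the gradient-descent update on a one-dimensional quadratic, with and without the classical momentum term mentioned in the commented-out line above. It is an illustration under simple assumptions, not the Keras implementation: `sgd_minimize` and the toy objective `f(w) = (w - 3)^2` are hypothetical, and the gradient here is exact rather than estimated from a mini-batch.

```python
def sgd_minimize(grad, w0, lr=0.01, momentum=0.0, steps=100):
    """Repeat v = momentum*v - lr*grad(w); w = w + v for a fixed number of steps."""
    w, v = w0, 0.0
    for _ in range(steps):
        v = momentum * v - lr * grad(w)
        w = w + v
    return w

def grad(w):
    """Gradient of the toy objective f(w) = (w - 3)^2, whose minimum is at w = 3."""
    return 2.0 * (w - 3.0)

plain = sgd_minimize(grad, 0.0)                        # momentum = 0
with_momentum = sgd_minimize(grad, 0.0, momentum=0.9)  # same lr, momentum = 0.9
print(plain, with_momentum)
```

On this toy problem, 100 steps with a learning rate of 0.01 leave the plain update noticeably short of the minimum, while the momentum variant gets much closer in the same number of steps.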


### (b) Model II: 3 layers, with 20 and 12 nodes in the two hidden layers

In [107]:
model2 = Sequential()
model2.add(Dense(20, input_dim=8, init='uniform', activation='relu'))
model2.add(Dense(12, init='uniform', activation='relu'))
model2.add(Dense(1, init='uniform', activation='sigmoid'))
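
One way to compare the sizes of model I and model II is to count trainable parameters. The sketch below assumes each dense layer has one weight per input-output pair plus one bias per output unit; `dense_param_count` is a hypothetical helper, not a Keras function.

```python
def dense_param_count(layer_sizes):
    """Total weights and biases for a stack of fully connected layers.

    layer_sizes lists the input width followed by the width of every layer.
    """
    pairs = zip(layer_sizes[:-1], layer_sizes[1:])
    return sum(n_in * n_out + n_out for n_in, n_out in pairs)

params_model_1 = dense_param_count([8, 12, 8, 1])    # 108 + 104 + 9
params_model_2 = dense_param_count([8, 20, 12, 1])   # 180 + 252 + 13
print(params_model_1, params_model_2)
```

Under this assumption model I has 221 parameters and model II has 445, roughly twice as many.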

In [108]:
model2.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

In [110]:
model2.fit(X_train, Y_train, nb_epoch=150, batch_size=10, verbose=0)
scores = model2.evaluate(X_test, Y_test);
print('  Loss: ', scores[0], ' , acc:', scores[1])

  Loss:  0.484604746748  , acc: 0.796536796537


### (c) Model III: 3 layers, with 12 and 8 nodes in the two hidden layers, all using the sigmoid activation function

Now let's use the **sigmoid** function as the activation for all layers. The test accuracy (about 0.68) is worse than with **relu** in model I (about 0.76).

In [212]:
model3 = Sequential()
model3.add(Dense(12, input_dim=8, init='uniform', activation='sigmoid'))
model3.add(Dense(8, init='uniform', activation='sigmoid'))
model3.add(Dense(1, init='uniform', activation='sigmoid'))
model3.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model3.fit(X_train, Y_train, nb_epoch=150, batch_size=10, verbose=0)
scores = model3.evaluate(X_test, Y_test);
print('  Loss: ', scores[0], ' , acc:', scores[1])

  Loss:  0.563818572404  , acc: 0.679653679912


## Prediction

In [210]:
predictions = model.predict(X_test)
predictions.shape
print (predictions[:10])

[[ 0.02870761]
 [ 0.2803773 ]
 [ 0.33776325]
 [ 0.64599019]
 [ 0.36038035]
 [ 0.14201277]
 [ 0.08908633]
 [ 0.12896512]
 [ 0.85859615]
 [ 0.89892173]]


In [211]:
rounded = [round(x[0]) for x in predictions]
print (rounded[:10])

[0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0]


In [202]:
predictions2 = model.predict_proba(X_test)



## Reference

* [Develop Your First Neural Network in Python With Keras Step-By-Step](http://machinelearningmastery.com/tutorial-first-neural-network-python-keras/)