# STUDY GROUP - M04S41
## Multi-layer Perceptrons

### Objectives

You will be able to:
* summarize the steps used to build and run an MLP in Keras
* explain the difference between a shallow neural network and an MLP/deep learning network, and the kinds of problems/data each is used for
* differentiate between various optimization and activation functions
* incorporate grid-search (GridSearchCV) hyperparameter tuning into the MLP workflow

### Deep Networks

- What is a deep network?

    * a neural network with multiple hidden layers
    
- What kinds of problems/data do we use shallow neural networks for?

    * simple problems (image edges, audio pitch/frequency), data with a small number of features, and a small number of output classes (e.g., binary classification)
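
To make "multiple hidden layers" concrete, here is a minimal NumPy sketch (not Keras) of a forward pass through a shallow network and a deeper one. The helper names init_mlp, forward, and count_params are made up for illustration, and the small random-normal initialization is an arbitrary choice. The parameter counts should match what model.summary() reports for the same layer sizes, since a Dense layer holds (inputs x units) weights plus one bias per unit.

```python
import numpy as np


def init_mlp(layer_sizes, seed=42):
    """Return a list of (weights, bias) pairs for a fully connected network.

    layer_sizes lists the units per layer, input first, e.g. [8, 12, 1].
    """
    rng = np.random.default_rng(seed)
    params = []
    for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        weights = rng.normal(0.0, 0.1, size=(fan_in, fan_out))
        bias = np.zeros(fan_out)
        params.append((weights, bias))
    return params


def forward(x, params):
    """ReLU on every hidden layer, sigmoid on the output layer."""
    h = x
    for weights, bias in params[:-1]:
        h = np.maximum(0.0, h @ weights + bias)          # hidden layer: ReLU(hW + b)
    weights, bias = params[-1]
    return 1.0 / (1.0 + np.exp(-(h @ weights + bias)))  # output layer: sigmoid


def count_params(params):
    """Total trainable values: every weight plus every bias."""
    return sum(w.size + b.size for w, b in params)


shallow = init_mlp([8, 12, 1])     # one hidden layer (same sizes as the 8-12-1 cells below)
deep = init_mlp([8, 12, 8, 1])     # two hidden layers (same sizes as the 8-12-8-1 cell below)
x = np.ones((5, 8))                # 5 dummy rows with 8 features each

print(forward(x, shallow).shape, count_params(shallow))   # (5, 1) 121
print(forward(x, deep).shape, count_params(deep))         # (5, 1) 221
```

Adding a hidden layer changes nothing about the input or output shapes; it only inserts another ReLU(hW + b) step and another set of weights to learn.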
    

### Keras

**Steps**
1. Load Data.
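    - Why is it important to set a random seed when loading/generating our data? What are the sources of randomness/stochasticity in a deep learning model?
    * reproducibility: a fixed seed lets reruns reproduce the same results. Randomness comes from random weight initialization, shuffling of the training data each epoch, dropout, and random train/test or cross-validation splits (note that numpy.random.seed seeds NumPy only; the Keras backend, e.g. TensorFlow, has its own seed)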
2. Define Model.
    - How do we know the number of layers and their types?
    - Which activation function do we use in the output layer?
        * Regression: linear activation, 'linear', with the number of neurons matching the number of outputs.
        * Binary classification (2 classes): logistic activation, 'sigmoid', with one neuron in the output layer.
        * Multiclass classification (>2 classes): softmax activation, 'softmax', with one output neuron per class, assuming one-hot encoded targets.
3. Compile Model.
    - How do we know which loss function to use?
        * Regression: mean squared error, 'mse'.
        * Binary classification (2 classes): logarithmic loss, also called binary cross-entropy, 'binary_crossentropy'.
        * Multiclass classification (>2 classes): multiclass logarithmic loss, 'categorical_crossentropy' (expects one-hot encoded targets).
    - How do we know which optimizer to use?
        * Stochastic gradient descent, 'sgd', which requires tuning the learning rate and momentum.
        * Adam, 'adam', which requires tuning the learning rate.
        * RMSprop, 'rmsprop', which requires tuning the learning rate.
4. Fit Model.
    - What is an epoch?
    - What is a batch size?
5. Evaluate Model.
6. Tie It All Together.
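
The output-activation and loss pairings from steps 2 and 3 can be written out in a few lines of NumPy. This is a sketch of the math, not Keras's implementation: the function names mirror the Keras string identifiers only for readability, and the 1e-7 clipping is an assumption added to avoid log(0).

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z):
    # subtract the row max for numerical stability; each row sums to 1
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)


def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)


def binary_crossentropy(y_true, p, eps=1e-7):
    p = np.clip(p, eps, 1 - eps)  # keep log() finite
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))


def categorical_crossentropy(y_onehot, p, eps=1e-7):
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(np.sum(y_onehot * np.log(p), axis=1))


# Regression: linear (identity) output + MSE
print(mse(np.array([3.0, 5.0]), np.array([2.0, 5.0])))   # 0.5

# Binary classification: one sigmoid unit + binary cross-entropy
p = sigmoid(np.array([0.0]))                             # probability 0.5
print(binary_crossentropy(np.array([1.0]), p))           # ln 2, about 0.693

# Multiclass: one softmax unit per class + categorical cross-entropy
logits = np.array([[2.0, 1.0, 0.1]])
probs = softmax(logits)
onehot = np.array([[1, 0, 0]])
print(categorical_crossentropy(onehot, probs))           # equals -log(probs[0, 0])
```

The pairing matters: sigmoid squashes one output into (0, 1) so it can be read as P(class 1), while softmax makes the per-class outputs sum to 1 so cross-entropy can compare them with a one-hot target.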

In [27]:
# Sample Multilayer Perceptron Neural Network in Keras
from keras.models import Sequential
from keras.layers import Dense
import numpy as np
# set random seed (this seeds NumPy only; the Keras backend, e.g. TensorFlow, has its own seed)
seed = 42
np.random.seed(seed)
# load and prepare the dataset: 8 feature columns, label in the last column
data = np.loadtxt('pima-indians-diabetes.data.csv', delimiter=',')
X = data[:, 0:8]
Y = data[:, 8]
# 1. define the network
model = Sequential()
model.add(Dense(12, input_dim=8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
# model.summary()
# 2. compile the network
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# 3. fit the network
model.fit(X, Y, epochs=100, batch_size=10, verbose=0)
# 4. evaluate the network
loss, accuracy = model.evaluate(X, Y)
print("\nLoss: %.2f, Accuracy: %.2f%%" % (loss, accuracy*100))
# 5. make predictions: predict returns probabilities with shape (n_samples, 1);
# threshold at 0.5 and flatten to get 0/1 labels that line up with Y
probabilities = model.predict(X)
preds = (probabilities > 0.5).astype(int).ravel()
# accuracy = np.mean(preds == Y)


Loss: 0.60, Accuracy: 68.62%
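
A quick check on what epochs=100 and batch_size=10 mean for this data: an epoch is one full pass over the training rows, and the batch size is how many rows feed each weight update. The standard Pima file has 768 rows, so one epoch is ceil(768 / 10) = 77 updates (the last batch holds only 8 rows) and 100 epochs perform 7,700 updates. Keras's fit shuffles the training data before each epoch by default (shuffle=True). The plain-NumPy sketch below only iterates over mini-batches and trains nothing; iterate_minibatches is a hypothetical helper name.

```python
import math

import numpy as np


def iterate_minibatches(n_samples, batch_size, rng):
    """Yield row-index arrays for one epoch: shuffle once, then slice into batches."""
    order = rng.permutation(n_samples)
    for start in range(0, n_samples, batch_size):
        yield order[start:start + batch_size]


n_samples, batch_size, epochs = 768, 10, 100
rng = np.random.default_rng(42)

updates_per_epoch = math.ceil(n_samples / batch_size)
print(updates_per_epoch, updates_per_epoch * epochs)   # 77 7700

batches = list(iterate_minibatches(n_samples, batch_size, rng))
print(len(batches), len(batches[-1]))                  # 77 8
```

Smaller batches mean more (noisier) updates per epoch; larger batches mean fewer, smoother updates, which is one reason batch size and epochs are tuned together in the grid search further down.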

In [8]:
# Sample Multilayer Perceptron Neural Network in Keras
from keras.models import Sequential
from keras.layers import Dense
import numpy
# load and prepare the dataset
dataset = numpy.loadtxt("pima-indians-diabetes.data.csv", delimiter=",")
X = dataset[:,0:8]
Y = dataset[:,8]
# 1. define the network
model = Sequential()
model.add(Dense(12, input_dim=8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
# 2. compile the network
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# 3. fit the network (verbose=0 suppresses the per-epoch progress log)
history = model.fit(X, Y, epochs=100, batch_size=10, verbose=0)
# 4. evaluate the network
loss, accuracy = model.evaluate(X, Y)
print("\nLoss: %.2f, Accuracy: %.2f%%" % (loss, accuracy*100))
# 5. make predictions: threshold the (n_samples, 1) probability array at 0.5
# and flatten it so it lines up element-wise with Y
probabilities = model.predict(X)
predictions = (probabilities > 0.5).astype(float).ravel()
accuracy = numpy.mean(predictions == Y)
print("Prediction Accuracy: %.2f%%" % (accuracy*100))
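
model.predict returns probabilities, not class labels. With a single sigmoid unit the result has shape (n_samples, 1), so each row is a 1-element array, and passing such a row to Python's round() can fail with "TypeError: type numpy.ndarray doesn't define __round__ method". A vectorized comparison avoids the loop entirely. The NumPy sketch below uses made-up probabilities in place of real model output, and also shows the multiclass case, where the label is the argmax over the softmax columns.

```python
import numpy as np

# Hypothetical stand-ins for model.predict output and the true labels
binary_probs = np.array([[0.2], [0.7], [0.5], [0.9]])   # shape (n, 1): one sigmoid unit
y_binary = np.array([0.0, 1.0, 1.0, 1.0])

# Binary: threshold at 0.5, then flatten (n, 1) -> (n,) so it lines up with y
binary_labels = (binary_probs > 0.5).astype(float).ravel()
print(binary_labels)                           # [0. 1. 0. 1.]
print(np.mean(binary_labels == y_binary))      # 0.75

# Multiclass: one softmax unit per class, so pick the most probable column
multi_probs = np.array([[0.1, 0.7, 0.2],
                        [0.6, 0.3, 0.1]])
y_onehot = np.array([[0, 1, 0],
                     [0, 0, 1]])
multi_labels = np.argmax(multi_probs, axis=1)  # [1 0]
true_labels = np.argmax(y_onehot, axis=1)      # [1 2]
print(np.mean(multi_labels == true_labels))    # 0.5
```

Flattening matters for the binary case: comparing a (n, 1) array with a (n,) array would broadcast to an (n, n) grid instead of comparing row by row.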

In [19]:
# Create first network with Keras
from keras.models import Sequential
from keras.layers import Dense
import numpy
# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)
# load pima indians dataset
dataset = numpy.loadtxt("pima-indians-diabetes.data.csv", delimiter=",")
# split into input (X) and output (Y) variables
X = dataset[:,0:8]
Y = dataset[:,8]
# create model
model = Sequential()
model.add(Dense(12, input_dim=8, kernel_initializer='uniform', activation='relu'))
model.add(Dense(8, kernel_initializer='uniform', activation='relu'))
model.add(Dense(1, kernel_initializer='uniform', activation='sigmoid'))
# Compile model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# Fit the model
model.fit(X, Y, epochs=150, batch_size=10, verbose=0)
# Evaluate the model
loss, accuracy = model.evaluate(X, Y)
print("\nLoss: %.2f, Accuracy: %.2f%%" % (loss, accuracy*100))
# calculate predictions
predictions = model.predict(X)
# round predictions
rounded = [round(x[0]) for x in predictions]
accuracy = numpy.mean(rounded == Y)
print("Prediction Accuracy: %.2f%%" % (accuracy*100))




Loss: 0.46, Accuracy: 76.30%
Prediction Accuracy: 76.30%
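
The cell above sets the weight initializer to 'uniform' (spelled init='uniform' in Keras 1 and kernel_initializer='uniform' in Keras 2). With Keras's RandomUniform defaults that draws every weight from [-0.05, 0.05], regardless of layer size. When no initializer is given, Keras 2 Dense layers use 'glorot_uniform', whose range scales with the layer: limit = sqrt(6 / (fan_in + fan_out)). The NumPy sketch below imitates both for the 8-12-8-1 network; it is an illustration, not Keras's code, and uniform_init and glorot_uniform_init are made-up names.

```python
import numpy as np


def uniform_init(fan_in, fan_out, rng, limit=0.05):
    # same range as Keras's RandomUniform defaults (minval=-0.05, maxval=0.05)
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))


def glorot_uniform_init(fan_in, fan_out, rng):
    # Glorot/Xavier uniform: the range shrinks as the layer gets wider
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))


rng = np.random.default_rng(7)
layer_sizes = [8, 12, 8, 1]   # the 8-12-8-1 network from the cell above

for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]):
    w_uniform = uniform_init(fan_in, fan_out, rng)
    w_glorot = glorot_uniform_init(fan_in, fan_out, rng)
    print(f"{fan_in:>2} -> {fan_out:<2}  uniform max |w| = {np.abs(w_uniform).max():.3f}, "
          f"glorot max |w| = {np.abs(w_glorot).max():.3f}, "
          f"glorot limit = {np.sqrt(6.0 / (fan_in + fan_out)):.3f}")
```

Both are random starting points, so both depend on the seed; the difference is only the scale of the initial weights.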


### Keras wrappers with sklearn

Keras models can be used in scikit-learn by wrapping them with the KerasClassifier or KerasRegressor class.

To use these wrappers, define a function that creates, compiles, and returns your Keras Sequential model, then pass that function to the build_fn argument when constructing the KerasClassifier. The constructor also accepts arguments that are passed on to model.fit(), such as the number of epochs and the batch size, as well as new arguments that are passed to your custom create_model() function. Any such new argument must also appear in the signature of create_model(), with a default value.
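
The parameter routing described above can be imitated in a few lines of plain Python. This is not the KerasClassifier source: split_params is a hypothetical helper, and create_model here returns a dict instead of a Keras model. It only illustrates the idea that keyword arguments named in create_model's signature go to the build function, while the rest (epochs, batch_size, ...) are treated as fit() arguments.

```python
import inspect


def create_model(optimizer="adam", hidden_units=12):
    # hypothetical stand-in for a real Keras build function
    return {"optimizer": optimizer, "hidden_units": hidden_units}


def split_params(build_fn, **params):
    """Send keyword arguments to build_fn if its signature accepts them;
    treat everything else as fit() arguments (hypothetical helper)."""
    build_names = set(inspect.signature(build_fn).parameters)
    build_kwargs = {k: v for k, v in params.items() if k in build_names}
    fit_kwargs = {k: v for k, v in params.items() if k not in build_names}
    return build_kwargs, fit_kwargs


build_kwargs, fit_kwargs = split_params(
    create_model, optimizer="rmsprop", epochs=50, batch_size=20)
print(build_kwargs)                   # {'optimizer': 'rmsprop'}
print(fit_kwargs)                     # {'epochs': 50, 'batch_size': 20}
print(create_model(**build_kwargs))   # {'optimizer': 'rmsprop', 'hidden_units': 12}
```

Giving every create_model argument a default means the build function still works when a grid does not set that parameter, as with hidden_units here.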

In [17]:
# Use scikit-learn to grid search the batch size and epochs
import numpy
from sklearn.model_selection import GridSearchCV
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
# Function to create model, required for KerasClassifier
def create_model():
    # create model
    model = Sequential()
    model.add(Dense(12, input_dim=8, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    # Compile model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model
# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)
# load dataset
dataset = numpy.loadtxt("pima-indians-diabetes.data.csv", delimiter=",")
# split into input (X) and output (Y) variables
X = dataset[:,0:8]
Y = dataset[:,8]
# create model
model = KerasClassifier(build_fn=create_model, verbose=0)
# define the grid search parameters
batch_size = [10, 20, 40, 60, 80, 100]
epochs = [10, 50, 100]
param_grid = dict(batch_size=batch_size, epochs=epochs)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1)
grid_result = grid.fit(X, Y)
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))



Best: 0.697917 using {'batch_size': 20, 'epochs': 100}
0.611979 (0.047019) with: {'batch_size': 10, 'epochs': 10}
0.555990 (0.153370) with: {'batch_size': 10, 'epochs': 50}
0.553385 (0.168085) with: {'batch_size': 10, 'epochs': 100}
0.614583 (0.042473) with: {'batch_size': 20, 'epochs': 10}
0.678385 (0.046146) with: {'batch_size': 20, 'epochs': 50}
0.697917 (0.019225) with: {'batch_size': 20, 'epochs': 100}
0.669271 (0.027498) with: {'batch_size': 40, 'epochs': 10}
0.674479 (0.001841) with: {'batch_size': 40, 'epochs': 50}
0.687500 (0.005524) with: {'batch_size': 40, 'epochs': 100}
0.623698 (0.004872) with: {'batch_size': 60, 'epochs': 10}
0.665365 (0.029463) with: {'batch_size': 60, 'epochs': 50}
0.682292 (0.012890) with: {'batch_size': 60, 'epochs': 100}
0.539063 (0.150005) with: {'batch_size': 80, 'epochs': 10}
0.596354 (0.051658) with: {'batch_size': 80, 'epochs': 50}
0.566406 (0.129830) with: {'batch_size': 80, 'epochs': 100}
0.541667 (0.113945) with: {'batch_size': 100, 'epochs':
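
GridSearchCV tries every combination in param_grid. A minimal sketch of that enumeration with itertools.product (not scikit-learn's implementation) shows why the results contain 18 rows, in the same batch-size-major order as the listing above, which is cut off partway through. Each combination is then cross-validated, so the number of training runs multiplies by the number of folds. The default fold count depends on the scikit-learn version (3 before 0.22, 5 from 0.22 on), so n_folds below is an assumption; set cv explicitly to be sure.

```python
from itertools import product

param_grid = dict(batch_size=[10, 20, 40, 60, 80, 100], epochs=[10, 50, 100])

# Every combination of the listed values: 6 batch sizes x 3 epoch settings
keys = list(param_grid)
candidates = [dict(zip(keys, values)) for values in product(*param_grid.values())]

print(len(candidates))    # 18
print(candidates[0])      # {'batch_size': 10, 'epochs': 10}
print(candidates[-1])     # {'batch_size': 100, 'epochs': 100}

# Each candidate is trained once per fold
n_folds = 3  # assumption: pass cv=... to GridSearchCV to control this
print(len(candidates) * n_folds)   # 54
```

Grid size grows multiplicatively with every parameter added, which is why grids over epochs and batch size get expensive quickly with neural networks.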

In [None]:
import pandas as pd
import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.optimizers import SGD

# Generate dummy data
import numpy as np
x_train = np.random.random((1000, 20))
y_train = keras.utils.to_categorical(np.random.randint(10, size=(1000, 1)), num_classes=10)
x_test = np.random.random((100, 20))
y_test = keras.utils.to_categorical(np.random.randint(10, size=(100, 1)), num_classes=10)

model = Sequential()
# Dense(64) is a fully-connected layer with 64 hidden units.
# in the first layer, you must specify the expected input data shape:
# here, 20-dimensional vectors.
model.add(Dense(64, activation='relu', input_dim=20))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))

sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy',
              optimizer=sgd,
              metrics=['accuracy'])

model.fit(x_train, y_train,
          epochs=20,
          batch_size=128)
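score = model.evaluate(x_test, y_test, batch_size=128)
print("Loss: %.2f, Accuracy: %.2f%%" % (score[0], score[1]*100))

probabilities = model.predict(x_test)
# multiclass: the predicted class is the column with the highest softmax probability;
# compare it with the index of the 1 in each one-hot target row
predictions = np.argmax(probabilities, axis=1)
accuracy = np.mean(predictions == np.argmax(y_test, axis=1))
print("Prediction Accuracy: %.2f%%" % (accuracy*100))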
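
The cell above uses SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True), which ties back to the note that SGD needs both a learning rate and momentum tuned. The sketch below applies the momentum update rule described in the Keras SGD documentation (velocity = momentum * velocity - lr * g, with a Nesterov look-ahead variant) plus the time-based decay used by older Keras optimizers, lr / (1 + decay * t). It runs on a toy 1-D objective, f(w) = (w - 3)^2, rather than a network, and sgd_minimize is a hypothetical name.

```python
def sgd_minimize(grad, w0, lr=0.01, momentum=0.9, nesterov=True, decay=1e-6, steps=300):
    """Minimize a 1-D function with SGD + momentum (sketch of the Keras-style update)."""
    w, velocity = w0, 0.0
    for t in range(steps):
        lr_t = lr / (1.0 + decay * t)          # time-based learning-rate decay
        g = grad(w)
        velocity = momentum * velocity - lr_t * g
        if nesterov:
            w = w + momentum * velocity - lr_t * g   # Nesterov look-ahead step
        else:
            w = w + velocity                         # classical momentum step
    return w


def grad(w):
    # gradient of the toy objective f(w) = (w - 3)^2, minimized at w = 3
    return 2.0 * (w - 3.0)


print(sgd_minimize(grad, w0=0.0))                   # close to 3
print(sgd_minimize(grad, w0=0.0, nesterov=False))   # close to 3
```

With momentum 0.9 the velocity accumulates past gradients, so steps grow along a consistent downhill direction; decay=1e-6 barely changes the learning rate over a few hundred steps.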