# Homework 2
----------------------------------------------------------------------------------------------

# Problem 1: Backpropagation

In the following neural network, the weights have been initialized randomly. Use the training sample
(X, y) = ((1, 1), 0) to update the weights, i.e. perform one round of backpropagation with this single
training sample. Use a learning rate of alpha = 0.1. Note that the bias inputs in this network have a
value of -1. No coding is needed for this question.

---

See pdf attached in CSNS: hw2_problem1.pdf
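
Although no code is required here, a single backpropagation step can be checked numerically. The sketch below is only an illustration: the actual architecture and initial weights are in hw2_problem1.pdf, so the 2-2-1 layout, the sigmoid activations, the squared-error loss, and every weight value are assumptions made up for the example. Only the sample ((1, 1), 0), alpha = 0.1, and the bias input of -1 come from the problem statement.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# From the problem statement:
x = [1.0, 1.0]      # training input
y = 0.0             # training target
alpha = 0.1         # learning rate
bias_in = -1.0      # value of every bias input

# HYPOTHETICAL weights and layout (the real ones are in hw2_problem1.pdf).
# w_hidden[j] = [weight from x1, weight from x2, weight from bias input]
w_hidden = [[0.5, -0.3, 0.2], [0.1, 0.4, -0.1]]
# w_out = [weight from h1, weight from h2, weight from bias input]
w_out = [0.3, -0.2, 0.1]

def forward(w_hidden, w_out):
    inputs = x + [bias_in]
    h = [sigmoid(sum(w * v for w, v in zip(row, inputs))) for row in w_hidden]
    o = sigmoid(sum(w * v for w, v in zip(w_out, h + [bias_in])))
    return inputs, h, o

inputs, h, o = forward(w_hidden, w_out)

# Backward pass, assuming squared error E = 0.5 * (y - o)^2 and sigmoid units.
delta_out = (y - o) * o * (1.0 - o)
delta_hidden = [h_j * (1.0 - h_j) * w_out[j] * delta_out for j, h_j in enumerate(h)]

# Update rule: w <- w + alpha * delta * (input feeding that weight)
new_w_out = [w + alpha * delta_out * v for w, v in zip(w_out, h + [bias_in])]
new_w_hidden = [
    [w + alpha * d * v for w, v in zip(row, inputs)]
    for row, d in zip(w_hidden, delta_hidden)
]

# Re-run the forward pass with the updated weights.
_, _, o_after = forward(new_w_hidden, new_w_out)
print("output before:", round(o, 4), "after:", round(o_after, 4))
```

Because the target is 0, the output after the update should sit slightly closer to 0 than before, which is a quick sanity check for a hand calculation.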

# Problem 2: Tuning the hyperparameters of an ANN model for MNIST digit recognition using Keras+TensorFlow and grid search in scikit-learn
---

In [7]:
# "Sequential" models let us define a stack of neural network layers
from keras.models import Sequential
# import the core layers:
from keras.layers import Dense, Dropout, Activation, Flatten

import numpy as np

import matplotlib.image as mpimg
import matplotlib.pyplot as plt
%matplotlib inline

In [8]:
# import some utilities to transform/preprocess our data:
from keras.utils import np_utils

## a - Download the Keras+TensorFlow tutorial from CSNS. Import all required modules including the following:
`from keras.wrappers.scikit_learn import KerasClassifier`
`from sklearn.model_selection import GridSearchCV`

In [9]:
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV

## b - Import the MNIST dataset, and split it into testing and training sets as we saw in the tutorial. Then, reshape each sample into a row vector, and scale it by dividing by 255.

In [4]:
# Keras will download MNIST digit dataset for us:
from keras.datasets import mnist
 
# By default, the first 60k of MNIST has been defined as training and the rest as testing set: 
(X_train, y_train), (X_test, y_test) = mnist.load_data()


Downloading data from https://s3.amazonaws.com/img-datasets/mnist.npz


In [5]:
print(X_train.shape)
print(X_test.shape)

(60000, 28, 28)
(10000, 28, 28)


In [10]:
# Reshape the pixels of each 28x28 image into a single row of 784 values:
X_train = X_train.reshape(X_train.shape[0], 784)
X_test = X_test.reshape(X_test.shape[0], 784)

In [11]:
print(X_train.shape)
print(X_test.shape)

(60000, 784)
(10000, 784)


In [12]:
# simply scale the features to the range of [0,1]:
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
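
The reshape and scaling steps can be checked on a tiny stand-in array without downloading MNIST. This is a sketch with made-up pixel values; `images` and `flat` are illustrative names, not variables from the notebook.

```python
import numpy as np

# Two fake 28x28 "images" with uint8 pixels, standing in for MNIST samples.
images = np.zeros((2, 28, 28), dtype=np.uint8)
images[0, 0, 0] = 255     # brightest possible pixel
images[1, 27, 27] = 128   # a mid-gray pixel in the last position

# Flatten each image into a row vector, convert to float, then scale to [0, 1].
flat = images.reshape(images.shape[0], -1).astype("float32") / 255

print(flat.shape, flat.min(), flat.max())
```

The conversion to float32 has to come before the in-place division: the raw pixels are unsigned 8-bit integers, and NumPy refuses to write the result of a true division back into an integer array.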

In [13]:
# output label:
print(y_train.shape)
print(y_train[:10])

(60000,)
[5 0 4 1 9 2 1 3 1 4]


## c - Perform OneHotEncoding for the label y. So, your label will be a vector of 10 elements for each data sample.

In [14]:
y_train = np_utils.to_categorical(y_train, 10)
y_test = np_utils.to_categorical(y_test, 10)


In [15]:
# Label after OneHotEncoding:
print(y_train.shape)
print(y_train[:10, :])

(60000, 10)
[[0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
 [0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]]
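
For intuition, the same encoding can be reproduced with plain NumPy by indexing an identity matrix. This is a sketch of the idea, not how `to_categorical` is implemented internally; the labels are the first ten training labels printed earlier.

```python
import numpy as np

# First ten training labels, as printed before the encoding step.
labels = np.array([5, 0, 4, 1, 9, 2, 1, 3, 1, 4])

# Row k of the 10x10 identity matrix is the one-hot vector for class k.
one_hot = np.eye(10, dtype="float32")[labels]

# argmax along each row undoes the encoding.
recovered = one_hot.argmax(axis=1)

print(one_hot.shape)
print(recovered)
```

The `argmax` trick is the same one used later to turn the 10 predicted probabilities per sample back into a digit.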


## d - Now, define a function called model_creator. This function will define, create, and compile your neural network model according to your structure, and then return the built model as the output. For the ANN neurons/layers, use the same structure as we had in the tutorial:

In [16]:
def model_creator(input_size=784, hidden_neurons=100, out_size=10):
    # define:
    model = Sequential()
    
    # hidden layer (the input layer is implied by input_dim):
    model.add(Dense(hidden_neurons, input_dim = input_size))  # Neurons
    model.add(Activation('sigmoid')) # Activation
    
    # output layer:
    model.add(Dense(out_size, input_dim = hidden_neurons))  # Neurons
    model.add(Activation('softmax')) # Activation
    
    # compile:
    model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
    
    # return:
    return model

In [17]:
input_size = 784
hidden_neurons = 100
out_size = 10
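
As a quick sanity check on the size of this network, the number of trainable parameters follows directly from the layer sizes: each fully connected layer has one weight per input-output pair plus one bias per output. The helper `dense_params` below is a hypothetical name introduced only for this sketch.

```python
def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus biases (n_out) of one fully connected layer."""
    return n_in * n_out + n_out

hidden = dense_params(784, 100)   # input -> hidden layer
output = dense_params(100, 10)    # hidden -> output layer
total = hidden + output

print(hidden, output, total)
```

These figures should match what `model.summary()` reports for a model built with the default sizes.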

## e - Fix the random state for reproducibility:

`seed = 2`
`np.random.seed(seed)`

In [36]:
seed = 2
np.random.seed(seed)
print(seed)

2
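
A small check of what the seed buys: re-seeding NumPy's global generator replays the same sequence of random numbers. Note the limitation, stated here as a caveat rather than something verified in this notebook: the call only affects NumPy, and the TensorFlow backend keeps its own random state, so weight initialization may still differ between runs unless that is seeded as well.

```python
import numpy as np

np.random.seed(2)
first = np.random.rand(3)

# Re-seeding with the same value restarts the sequence from the beginning.
np.random.seed(2)
second = np.random.rand(3)

print(np.array_equal(first, second))
```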


## f - Use KerasClassifier class to wrap your model as an object:

`model = KerasClassifier(build_fn = model_creator, verbose=2)`

In [24]:
model = KerasClassifier(build_fn = model_creator, verbose = 2)

## g - Now, run sklearn GridSearch to find the best `batch_size` and `epochs`. Search in this range: `batch_size = [30, 50, 100]`, `epochs = [10, 15, 20]`.

In your GridSearch, the estimator is the above model, the scoring should be 'neg_log_loss', and you have to use 10-fold CV.
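
To make the scoring choice concrete, here is a simplified NumPy version of the negated log loss for one-hot labels. It is a sketch of the formula only; scikit-learn's own scorer does additional input handling, and the function name and toy probabilities below are made up for illustration.

```python
import numpy as np

def neg_log_loss(y_true_onehot, y_prob, eps=1e-15):
    """Negated mean cross-entropy; higher (closer to 0) is better."""
    y_prob = np.clip(y_prob, eps, 1 - eps)          # avoid log(0)
    per_sample = -np.sum(y_true_onehot * np.log(y_prob), axis=1)
    return -per_sample.mean()

y_true = np.array([[1, 0, 0], [0, 1, 0]], dtype=float)
confident = np.array([[0.9, 0.05, 0.05], [0.1, 0.8, 0.1]])
unsure = np.array([[0.4, 0.3, 0.3], [0.3, 0.4, 0.3]])

print(neg_log_loss(y_true, confident))
print(neg_log_loss(y_true, unsure))
```

Both sets of predictions pick the correct class, so they would have the same accuracy, but the confident one gets the better (less negative) score. The sign is flipped because the grid search always maximizes its score.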

In [22]:
from sklearn.model_selection import GridSearchCV

In [25]:
batch_size = [30, 50, 100]
epochs = [10, 15, 20]

param_grid = dict(batch_size=batch_size, epochs=epochs)

In [26]:
grid = GridSearchCV(
    model,
    param_grid,
    cv=10,
    scoring="neg_log_loss",
    n_jobs=-1
)
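
It helps to estimate the cost of this search before launching it. The sketch below just enumerates the grid; the sample count of 60000 is an assumption (the size of the MNIST training split), and the variable names are illustrative.

```python
from itertools import product
import math

batch_size = [30, 50, 100]
epochs = [10, 15, 20]
cv = 10
n_samples = 60000  # ASSUMPTION: search is run on the full MNIST training split

combos = list(product(batch_size, epochs))
n_fits = len(combos) * cv                 # one model per combination per fold
fold_train = n_samples * (cv - 1) // cv   # samples used for training in each fold

# Number of gradient updates for a single fit of each combination.
steps = {(b, e): math.ceil(fold_train / b) * e for b, e in combos}

print(len(combos), "combinations,", n_fits, "fits")
for (b, e), s in steps.items():
    print(f"batch_size={b:3d} epochs={e:2d} -> {s} updates per fit")
```

With `refit=True` (the default, visible in the printed `GridSearchCV` object), one more fit on the whole dataset follows the search, using the winning combination.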

In [27]:
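# NOTE: this recorded run fits the search on the test split (X_test, y_test).
# Part g intends the training split, i.e. grid.fit(X_train, y_train), so that the test set stays unseen.
grid.fit(X_test, y_test)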

Instructions for updating:
Colocations handled automatically by placer.
Instructions for updating:
Use tf.cast instead.
Epoch 1/20
 - 1s - loss: 0.8872 - acc: 0.8085
Epoch 2/20
 - 1s - loss: 0.3841 - acc: 0.9005
Epoch 3/20
 - 1s - loss: 0.2939 - acc: 0.9214
Epoch 4/20
 - 1s - loss: 0.2467 - acc: 0.9307
Epoch 5/20
 - 1s - loss: 0.2154 - acc: 0.9394
Epoch 6/20
 - 1s - loss: 0.1872 - acc: 0.9473
Epoch 7/20
 - 1s - loss: 0.1679 - acc: 0.9529
Epoch 8/20
 - 1s - loss: 0.1487 - acc: 0.9578
Epoch 9/20
 - 1s - loss: 0.1324 - acc: 0.9647
Epoch 10/20
 - 1s - loss: 0.1185 - acc: 0.9677
Epoch 11/20
 - 1s - loss: 0.1059 - acc: 0.9718
Epoch 12/20
 - 1s - loss: 0.0937 - acc: 0.9760
Epoch 13/20
 - 1s - loss: 0.0847 - acc: 0.9791
Epoch 14/20
 - 1s - loss: 0.0753 - acc: 0.9822
Epoch 15/20
 - 1s - loss: 0.0665 - acc: 0.9857
Epoch 16/20
 - 1s - loss: 0.0589 - acc: 0.9878
Epoch 17/20
 - 1s - loss: 0.0521 - acc: 0.9891
Epoch 18/20
 - 1s - loss: 0.0460 - acc: 0.9910
Epoch 19/20
 - 1s - loss: 0.0406 - acc: 0.9

GridSearchCV(cv=10, error_score='raise-deprecating',
       estimator=<keras.wrappers.scikit_learn.KerasClassifier object at 0x13c188630>,
       fit_params=None, iid='warn', n_jobs=-1,
       param_grid={'batch_size': [30, 50, 100], 'epochs': [10, 15, 20]},
       pre_dispatch='2*n_jobs', refit=True, return_train_score='warn',
       scoring='neg_log_loss', verbose=0)

## h - Based on your results, what is the best batch_size and epochs? Now, test your model with the best batch_size and epochs on the testing set. `grid.best_estimator_.model` gives you the best model found and trained in the gridsearch. What is the prediction accuracy on the testing set?

In [34]:
print("\nBest batch_size and epochs:", grid.best_params_)


Best batch_size and epochs: {'batch_size': 30, 'epochs': 20}


In [38]:
best_model = grid.best_estimator_.model

# Note: Keras `fit` has no random-seed argument; the only seeding in this notebook is the np.random.seed(seed) call in part e.
best_model.fit(X_train, y_train, validation_split=0.33, batch_size=30, epochs=20, verbose=1)

Train on 40199 samples, validate on 19801 samples
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<keras.callbacks.History at 0x1476bd748>

## Testing, Prediction, Evaluation:

In [40]:
# Prediction:
y_predict = best_model.predict(X_test, verbose=1)
print (y_predict.shape)

(10000, 10)


In [42]:
# Evaluation:
score = best_model.evaluate(X_test, y_test, verbose=1)
print('Test accuracy is: ', score[1])

Test accuracy is:  0.9828
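
The accuracy reported by `evaluate` can be cross-checked by hand from the predicted probabilities: take the most probable class per sample and compare it with the true class. The sketch below uses a made-up 3-class toy example rather than the real `y_predict` and `y_test`.

```python
import numpy as np

# Toy stand-ins for y_predict (probabilities) and y_test (one-hot labels).
y_prob = np.array([
    [0.1, 0.7, 0.2],
    [0.8, 0.1, 0.1],
    [0.3, 0.3, 0.4],
    [0.2, 0.5, 0.3],
])
y_true = np.array([
    [0, 1, 0],
    [1, 0, 0],
    [0, 1, 0],
    [0, 1, 0],
])

pred_class = y_prob.argmax(axis=1)   # most probable class per sample
true_class = y_true.argmax(axis=1)   # undo the one-hot encoding
accuracy = (pred_class == true_class).mean()

print(pred_class, true_class, accuracy)
```

Applied to the notebook's own arrays, the same two `argmax` calls on `y_predict` and `y_test` should reproduce the accuracy printed above.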
