# Problem 1 -- Generate the model

Develop a function 

    build_network(nslayers, n_neurons_per_layer, activation_fn)

The function should return a compiled model with the following structure:
* An Input node accepting an image of dimensions $28\times28$
* A Flatten node
* $n$ hidden layer nodes, each containing `n_neurons_per_layer` neurons and using the activation function `activation_fn`.
* An output layer (Dense layer) of 10 neurons that uses the softmax activation function.


The model should be compiled as such:
* Optimizer: sgd
* metrics: `["accuracy"]`
* loss: `sparse_categorical_crossentropy` (since the target variable is represented as a single value, as opposed to being one-hot encoded)



In [1]:
import numpy as np
import tensorflow
from tensorflow import keras
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras import layers
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV
from tensorflow.keras import Model
from tensorflow.keras import Model, Sequential
from keras.layers import Dense, Activation, Flatten,Input
from scikeras.wrappers import BaseWrapper, KerasClassifier, KerasRegressor

In [77]:
def build_network(nslayers, n_neurons_per_layer, activation_fn):
  model = keras.Sequential([
            Input(shape=(28,28))],
            Flatten())
  for i in range(nslayers):
    model.add(Dense(n_neurons_per_layer, activation=activation_fn))
  model.add(Dense(10,"softmax"))
  model.compile(optimizer='sgd',loss='sparse_categorical_crossentropy',metrics=['accuracy'])
  return model

In [9]:
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 flatten (Flatten)           (None, 784)               0         
                                                                 
 dense (Dense)               (None, 125)               98125     
                                                                 
 dense_1 (Dense)             (None, 125)               15750     
                                                                 
 dense_2 (Dense)             (None, 125)               15750     
                                                                 
 dense_3 (Dense)             (None, 125)               15750     
                                                                 
 dense_4 (Dense)             (None, 125)               15750     
                                                                 
 dense_5 (Dense)             (None, 125)               1

## Problem 2 -- Load the keras MNIST dataset. 

Call `keras.datasets.mnist.load_data("mnist.npz")`, which returns 
`(X_train, y_train), (X_test, y_test)`.  Split the training dataset into a training and validation set.

In [3]:
from tensorflow.keras.datasets import mnist
from sklearn.model_selection import train_test_split

In [4]:
(X_train, y_train), (X_test, y_test) = mnist.load_data("mnist.npz")

In [5]:
X_training, X_validation, y_training, y_validation = train_test_split(X_train, y_train, train_size=0.8)

In [6]:
def preprocessing(value):
  return value.astype('float32')/255
X_training = preprocessing(X_training)
X_test = preprocessing(X_test)
X_validation =preprocessing(X_validation)

#Problem 3 -- Train the model.

Call `build_network` with parameters of your choice (4-8 layers, 50-150 neurons per layer, and ReLU activation (`relu`) is a reasonable starting point.)  Train the model against the training dataset.  To reduce training time, an early stopping callback is advised.  Evaluate the model using the validation dataset.  What is the prediction accuracy of the neural net?

In [7]:
from tensorflow.keras.callbacks import EarlyStopping
model = build_network(7, 125, "relu")
early_stop = EarlyStopping(baseline=0.1,mode='max')

In [14]:
pred = model.fit(X_training,y_training, epochs=30, callbacks=[early_stop],validation_split=0.2)

Epoch 1/30
Epoch 2/30


In [15]:
model.fit(X_validation,y_validation, epochs=30, callbacks=[early_stop],validation_split=0.2)

Epoch 1/30
Epoch 2/30


<keras.callbacks.History at 0x7fec4f379610>

In [16]:
test_accuracy=model.evaluate(X_test,y_test)
print("Test accuracy:",test_accuracy[1])

Test accuracy: 0.9578999876976013


#Problem 4 -- Optimize the model.

Use one of the hyperparameter optimization frameworks discussed in class, such as scikit-optimize, to find an optimal values of the number of layers, activation function, and neurons per layer for this neural network.  Use a budget of about 20 runs.  Use the below tables as rough guidelines for the parameter space.

|Parameter|Space ($\Lambda_n$)|
|---------|----|
|Activation function|`relu`, `sigmoid`|
|Number of layers|~2-20 (integer, uniform)|
|Number of neurons per layers|10-300 (integer, log distributed)|

What combination of parameters ($\lambda$) produces the highest accuracy, and what is that accuracy?





In [64]:
from hyperopt import Trials
from hyperopt import (fmin, hp, tpe, STATUS_OK, STATUS_FAIL)

In [73]:
def generate_model(values):
    activation, nslayer, n_neurons = [values[k] for k in ['activation_fn', 'n_layers', 'neurons_per_layer']]
    nslayer, n_neurons = [int(k) for k in [nslayer, n_neurons]]
    model = build_network(nslayer, n_neurons, activation)
    early_stop = EarlyStopping(baseline=0.1,mode='max')
    return { 'model' : model,'status': STATUS_OK, 'accuracy' : accuracy, 'loss': loss}

In [74]:
trials = Trials()
best = fmin(fn = generate_model, space = {
    'activation_fn' : hp.choice('activation_fn',["relu", "sigmoid"]),
    'n_layers': hp.quniform("n_layers", 2, 19, 1),
    'neurons_per_layer': hp.quniform("neurons_per_layer", 10, 300, 1)
}, algo = tpe.suggest,max_evals = 10, trials = trials, show_progressbar=True)

100%|██████████| 10/10 [00:01<00:00,  9.77it/s, best loss: 0.2950393557548523]


In [75]:
pred = model.fit(X_training,y_training, epochs=2, callbacks=[early_stop],validation_split=0.2)
loss , accuracy = model.evaluate(X_test, y_test)

Epoch 1/2
Epoch 2/2


In [76]:
best_model = trials.best_trial['result']['model']
loss, accuracy = best_model.evaluate(X_test, y_test)
print("validation accuracy:",accuracy)

validation accuracy: 0.09799999743700027
