## Samuel Zeleke, Neural Networks — Homework 4

**QUESTION 1.** Speculate on whether you believe that so-called “deep” neural networks are destined to be another bust just as perceptrons and expert systems were in the past, or whether they really are a breakthrough that will be used for years into the future. Please give a two-to-three-paragraph answer, including examples to back up your argument.

**ANSWER**

Previous generations of AI failed because their solutions had significant functional problems. Perceptrons lacked an efficient
algorithm (back-propagation) for automatic training, and expert systems were hard to "train", bad at inferring outputs for new data, and
hard to integrate. Deep neural networks don't have these issues. They are relatively easy to integrate (companies like Google provide APIs for using pre-trained models),
scalable (once trained, they can be applied to large amounts of new data), and easy to maintain. Recent developments like CNNs have also
significantly improved their accuracy and the complexity of the features they can recognize. So, I believe deep neural networks are a breakthrough that will be around for a while.

The only limiting factor I see is GPU/CPU performance. The performance of newer generations of
GPUs/CPUs (RIP Moore's law) is not improving as fast as newer, more general models are growing. So, unless there is a breakthrough
in chip manufacturing and/or computer architecture, neural networks will be limited to specialized roles like driving cars or
recognizing objects.

**QUESTION 2.** Hand-compute a single, complete back-propagation cycle. Use the example network from class and compute the updated weight values for the first gradient descent iteration for the XOR example, i.e., [1, 1] → 0. Use the same initial weights we used in the class example but assume the identity function as the activation function (f(x) = x).

**ANSWER**

Parameter | Value
-------|-----------
Inputs | $i_{1} = 1$; $i_{2} = 1$
Hidden Layer | $W_{i_{1}h_{1}} = 0.11$; $W_{i_{1}h_{2}} = 0.12$; $W_{i_{2}h_{1}} = 0.21$; $W_{i_{2}h_{2}} = 0.08$
Output Layer | $W_{h_{1}Output} = 0.14$; $W_{h_{2}Output} = 0.15$
Learning Rate (lr) | $0.05$

$
Output =
\left[
\begin{array}{c c}
1 & 1 \\
\end{array}
\right]
\cdot
\left[
\begin{array}{c c}
0.11 & 0.12 \\
0.21 & 0.08 \\
\end{array}
\right]
\cdot
\left[
\begin{array}{c}
0.14 \\
0.15 \\
\end{array}
\right] =
\left[
\begin{array}{c c}
0.32 & 0.2 \\
\end{array}
\right]
\cdot
\left[
\begin{array}{c}
0.14 \\
0.15 \\
\end{array}
\right] = 0.0748
$

$\Delta_{error} = target - Output = 0 - 0.0748 = -0.0748$

The hidden activations from the forward pass are $a_{1} = 0.32$ and $a_{2} = 0.2$. Because the activation is the identity, its derivative is 1, so each update is the learning rate times the signed error times the signal that the weight carries. The error is negative because the output is above the target, so every weight decreases.

New values


Output layer

$W_{h_{1}Output}^{*} = W_{h_{1}Output} + lr * a_{1} * \Delta_{error} = 0.14 + 0.05 * 0.32 * (-0.0748) = 0.1388032$

$W_{h_{2}Output}^{*} = W_{h_{2}Output} + lr * a_{2} * \Delta_{error} = 0.15 + 0.05 * 0.2 * (-0.0748) = 0.149252$

Hidden layer (using the original output-layer weights)

$W_{i_{1}h_{1}}^{*} = W_{i_{1}h_{1}} + lr * i_{1} * W_{h_{1}Output} * \Delta_{error} = 0.11 + 0.05 * 1 * 0.14 * (-0.0748) = 0.1094764$

$W_{i_{1}h_{2}}^{*} = W_{i_{1}h_{2}} + lr * i_{1} * W_{h_{2}Output} * \Delta_{error} = 0.12 + 0.05 * 1 * 0.15 * (-0.0748) = 0.119439$

$W_{i_{2}h_{1}}^{*} = W_{i_{2}h_{1}} + lr * i_{2} * W_{h_{1}Output} * \Delta_{error} = 0.21 + 0.05 * 1 * 0.14 * (-0.0748) = 0.2094764$

$W_{i_{2}h_{2}}^{*} = W_{i_{2}h_{2}} + lr * i_{2} * W_{h_{2}Output} * \Delta_{error} = 0.08 + 0.05 * 1 * 0.15 * (-0.0748) = 0.079439$
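
A hand computation like this can be cross-checked numerically. The following is a minimal NumPy sketch, not the class's reference code. It assumes a squared-error loss $\frac{1}{2}(target - Output)^2$, a signed error $target - Output$, and the weight layout from the table above. The variable names (`W_hidden`, `w_out`, `delta`) are illustrative.

```python
import numpy as np

x = np.array([1.0, 1.0])   # inputs i1, i2
target = 0.0               # XOR(1, 1)
lr = 0.05                  # learning rate

# Rows are inputs, columns are hidden units:
# W_hidden[0] = (W_i1h1, W_i1h2), W_hidden[1] = (W_i2h1, W_i2h2)
W_hidden = np.array([[0.11, 0.12],
                     [0.21, 0.08]])
w_out = np.array([0.14, 0.15])   # (W_h1Output, W_h2Output)

# Forward pass with the identity activation f(x) = x
h = x @ W_hidden   # hidden activations a1, a2
y = h @ w_out      # network output

# Signed error (assumption: loss = 0.5 * (target - y) ** 2)
delta = target - y

# Gradient-descent step; the hidden update uses the *old* output weights
w_out_new = w_out + lr * delta * h
W_hidden_new = W_hidden + lr * delta * np.outer(x, w_out)

print("hidden activations:", h)
print("output:", y)
print("new output weights:", w_out_new)
print("new hidden weights:")
print(W_hidden_new)
```

Under these assumptions every weight shrinks slightly, which moves the output for [1, 1] toward the target of 0.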

**QUESTION 3.** Build a Keras-based ConvNet for Keras’s Fashion MNIST dataset (fashion_mnist). Experiment with different network architectures, submit your most performant network, and report the results.

In [2]:
# import libraries
import keras
import numpy as np
import pandas as pd
import keras.datasets
from keras.optimizers import RMSprop, Adagrad


import matplotlib.pyplot as plt
import tensorflow as tf

Using TensorFlow backend.


In [4]:
# get data
(training_images, training_labels), (test_images, test_labels) = keras.datasets.fashion_mnist.load_data()

In [5]:
# normalize pixel values to [0, 1] and add a channel dimension
training_images = training_images.astype("float32") / 255.0
test_images = test_images.astype("float32") / 255.0
training_images = training_images.reshape((training_images.shape[0], 28, 28, 1))
test_images = test_images.reshape((test_images.shape[0], 28, 28, 1))
#---Make labels categorical
training_labels = keras.utils.to_categorical(training_labels)
test_labels = keras.utils.to_categorical(test_labels)

In [6]:
# create network
model = keras.Sequential()

# input and first convolution: 30 filters, 2x2 kernel
model.add(keras.layers.Conv2D(30, 2, activation="relu", input_shape = (28, 28, 1)))
model.add(keras.layers.MaxPooling2D(2))

# second convolution: 60 filters, 3x3 kernel
model.add(keras.layers.Conv2D(60, 3, activation="relu"))
model.add(keras.layers.MaxPooling2D(2))

# third convolution: 60 filters, 3x3 kernel
model.add(keras.layers.Conv2D(60, 3, activation="relu"))
model.add(keras.layers.MaxPooling2D(2))

#flatten
model.add(keras.layers.Flatten())
# three dense layers
model.add(keras.layers.Dense(120, activation="relu"))
model.add(keras.layers.Dense(30, activation="relu"))
model.add(keras.layers.Dense(10, activation="softmax"))

model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_1 (Conv2D)            (None, 27, 27, 30)        150       
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 13, 13, 30)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 11, 11, 60)        16260     
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 5, 5, 60)          0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 3, 3, 60)          32460     
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 1, 1, 60)          0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 60)                0         
__________
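
The shapes and parameter counts in the summary can be reproduced by hand. Here is a small sketch in plain Python, assuming the Keras defaults used by the model above ("valid" padding, stride 1 for convolutions, pooling stride equal to the pool size). The helper names are hypothetical and are not Keras functions.

```python
def conv_out(size, kernel):
    """Spatial size after a 'valid' convolution with stride 1."""
    return size - kernel + 1


def pool_out(size, pool):
    """Spatial size after 'valid' max pooling with stride equal to the pool size."""
    return size // pool


def conv_params(kernel, in_channels, filters):
    """Kernel weights plus one bias per filter."""
    return (kernel * kernel * in_channels + 1) * filters


size, channels = 28, 1   # Fashion MNIST images are 28x28 with one channel
for kernel, filters in [(2, 30), (3, 60), (3, 60)]:
    params = conv_params(kernel, channels, filters)
    size = conv_out(size, kernel)
    print(f"conv -> ({size}, {size}, {filters}), params = {params}")
    size = pool_out(size, 2)
    print(f"pool -> ({size}, {size}, {filters})")
    channels = filters
```

The last pooling layer leaves a 1x1x60 feature map, which is why the Flatten layer produces only 60 values.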

In [7]:
# compile (note: "categorical_crossentropy" is the usual loss for one-hot labels with a softmax output)
model.compile(loss = "binary_crossentropy", optimizer=keras.optimizers.Adam(lr = 0.001), metrics = ["acc"])
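
The choice of accuracy metric matters a lot for a 10-class problem. Here is a minimal NumPy sketch on made-up predictions (the arrays are illustrative, not model output). It compares per-output binary accuracy, which thresholds each of the 10 outputs at 0.5, with top-1 categorical accuracy, which compares the argmax of each row.

```python
import numpy as np

# Made-up one-hot labels for 4 samples and 10 classes (true classes 0, 1, 2, 3)
y_true = np.eye(10)[[0, 1, 2, 3]]

# Made-up softmax-like predictions; only the first sample is classified correctly
y_pred = np.full((4, 10), 0.02)
y_pred[0, 0] = 0.82   # correct: argmax is 0, true class is 0
y_pred[1, 5] = 0.82   # wrong:   argmax is 5, true class is 1
y_pred[2, 7] = 0.82   # wrong:   argmax is 7, true class is 2
y_pred[3, 9] = 0.82   # wrong:   argmax is 9, true class is 3

# Per-output binary accuracy: threshold every output at 0.5, compare element-wise
binary_acc = np.mean((y_pred > 0.5) == (y_true > 0.5))

# Top-1 categorical accuracy: compare the predicted class with the true class
categorical_acc = np.mean(y_pred.argmax(axis=1) == y_true.argmax(axis=1))

print("binary accuracy:     ", binary_acc)
print("categorical accuracy:", categorical_acc)
```

In this toy case the binary accuracy is 0.85 even though only one of the four samples (0.25) is classified correctly, because most of the ten outputs per sample are trivially "correct" zeros.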

In [8]:
# train
history = model.fit(
    training_images[:30000],
    training_labels[:30000],
    epochs = 20,
    validation_split=0.2
)

Train on 24000 samples, validate on 6000 samples
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


In [11]:
# evaluate on test
model.evaluate(
    x = test_images,
    y = test_labels,
    steps = 10
)
# final test loss ~ 0.076



[0.0764477401971817, 0.9747902154922485]

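The final validation accuracy was 0.9714 with a loss of 0.068. On the test data, the loss is ~0.08 and the accuracy is ~0.975. The validation and test scores are close, so the
model generalizes well. One caveat: these figures come from a model compiled with the binary_crossentropy loss, so the reported accuracy is most likely a per-output binary accuracy rather than the top-1 classification accuracy, which would be lower.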