
# Neural network optimization

Each time a neural network finishes passing a batch through its layers and generates predictions,
it must decide how to use the difference between those predictions and the known true values to
adjust its weights so that the network moves towards a solution. The algorithm that determines
this adjustment is known as the optimization algorithm.

### 2.1 Practical example

In [None]:
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.optimizers import RMSprop
import time
import matplotlib.pyplot as plt

In [None]:
batch_size = 128
num_classes = 10
epochs = 20

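# Load MNIST, which is already split into training and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Flatten each 28x28 image into a 784-element vector
x_train = x_train.reshape(60000, 784)
x_test = x_test.reshape(10000, 784)
# Convert to float32 and scale pixel values from [0, 255] to [0, 1]
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255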
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

# Convert class vectors to binary class matrices (one-hot encoding)
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

model = Sequential()
model.add(Dense(512, activation='relu', input_shape=(784,)))
model.add(Dropout(0.2))
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(num_classes, activation='softmax'))

model.summary()

### SGD
SGD, or stochastic gradient descent, is the “classic” optimization algorithm.

In SGD, the gradient of the network loss function with respect to each individual weight in the
network is computed on one batch of training data at a time.

---

Each forward pass through the network yields a value of the loss function for the current
parameters. The gradient computed for each weight is multiplied by a learning rate and subtracted
from that weight, so each weight moves in the direction opposite to its gradient, which is the
direction in which the loss decreases.
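
The update described above can be sketched in plain Python. This is a minimal illustration rather than what Keras does internally: the toy data (points on the line y = 3x + 1), the two-weight linear model, the helper names `sgd_step` and `mse_and_grads`, and the learning rate of 0.1 are all assumptions chosen for the example. Each weight is moved by the learning rate times its gradient, in the direction that lowers the batch loss.

```python
import random

def sgd_step(weights, grads, learning_rate):
    """One SGD update: w <- w - learning_rate * grad, applied to every weight."""
    return [w - learning_rate * g for w, g in zip(weights, grads)]

def mse_and_grads(weights, batch):
    """Mean squared error of y_hat = w0 * x + w1 on a batch, plus its gradients."""
    w0, w1 = weights
    n = len(batch)
    errors = [(w0 * x + w1 - y, x) for x, y in batch]
    loss = sum(e ** 2 for e, _ in errors) / n
    g0 = sum(2 * e * x for e, x in errors) / n   # d(loss)/d(w0)
    g1 = sum(2 * e for e, _ in errors) / n       # d(loss)/d(w1)
    return loss, [g0, g1]

# Toy data on the line y = 3x + 1 (hypothetical, not MNIST).
random.seed(0)
data = [(k / 10, 3 * (k / 10) + 1) for k in range(-10, 11)]

weights = [0.0, 0.0]
for step in range(500):
    batch = random.sample(data, 4)               # draw a mini-batch of 4 points
    loss, grads = mse_and_grads(weights, batch)
    weights = sgd_step(weights, grads, learning_rate=0.1)

print(weights)  # should end up close to [3.0, 1.0]
```

Note the minus sign in `sgd_step`: the gradient points uphill on the loss surface, so the step is taken against it.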

---

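SGD is the simplest algorithm both conceptually and in terms of its behavior. Given a small
enough learning rate, each SGD step follows the negative gradient of the current batch's loss on
the cost surface.

The new weights generated in each iteration therefore lower the loss on that batch, but because
the gradient is estimated from a single batch, they are not guaranteed to be better on the whole
training set than the weights from the previous iteration.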
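
Whether a step improves the weights depends on which loss is measured. The sketch below is a hand-built toy case, not taken from the MNIST example: a single weight `w` predicts a constant, the two targets 0 and 4 pull in opposite directions, and the batch holds only one of them. All names and values are assumptions made for illustration. One SGD step lowers the loss on the batch while raising the loss over the full data set.

```python
def mean_squared_loss(w, targets):
    """Mean of (w - y)^2 over the given targets."""
    return sum((w - y) ** 2 for y in targets) / len(targets)

targets = [0.0, 4.0]      # full (toy) data set
batch = [targets[1]]      # mini-batch containing only the second target
w = 2.0                   # already the best weight for the full data set
learning_rate = 0.1

# Gradient of the batch loss with respect to w, then one SGD step.
grad = sum(2 * (w - y) for y in batch) / len(batch)
w_new = w - learning_rate * grad

batch_before = mean_squared_loss(w, batch)
batch_after = mean_squared_loss(w_new, batch)
full_before = mean_squared_loss(w, targets)
full_after = mean_squared_loss(w_new, targets)

print('batch loss:', batch_before, '->', batch_after)   # goes down
print('full loss: ', full_before, '->', full_after)     # goes up
```

Over many batches these individual errors tend to average out, which is why the training curves still trend downward even though single steps can move the wrong way.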

---

SGD's simplicity makes it a reasonable choice for shallow networks. However, plain SGD often
converges more slowly than the more advanced algorithms that are also available in Keras, such
as the RMSprop and Adam optimizers used in the cells below. It can also be less able to escape
local minima and flat regions on the cost surface.
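
One reason plain SGD can be slow is that its step size is proportional to the gradient, so it barely moves where the cost surface is nearly flat. The sketch below contrasts it with a textbook form of the RMSprop update, which divides each gradient by a running root-mean-square of recent gradients. The constant slope of 0.001, the learning rate, `rho`, `eps`, and the function names are assumptions for illustration, and library implementations differ in details such as where the epsilon term is added.

```python
import math

def sgd_update(w, grad, lr):
    """Plain SGD: the step is proportional to the gradient."""
    return w - lr * grad

def rmsprop_update(w, grad, v, lr, rho=0.9, eps=1e-7):
    """Textbook RMSprop: scale the step by a running RMS of past gradients."""
    v = rho * v + (1 - rho) * grad ** 2
    w = w - lr * grad / (math.sqrt(v) + eps)
    return w, v

grad = 0.001   # a nearly flat region: the gradient is small and constant
lr = 0.01
w_sgd = 0.0
w_rms = 0.0
v = 0.0        # running average of squared gradients, starts at zero

for _ in range(10):
    w_sgd = sgd_update(w_sgd, grad, lr)
    w_rms, v = rmsprop_update(w_rms, grad, v, lr)

sgd_moved = abs(w_sgd)
rms_moved = abs(w_rms)
print('distance moved by SGD:    ', sgd_moved)
print('distance moved by RMSprop:', rms_moved)
```

In this contrived setting the normalized update travels far further in the same number of steps. On a real network the difference depends on the loss surface and the hyperparameters, so the timing and accuracy measured in the cells below are the more relevant comparison.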

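In [None]:
# Only one optimizer block is active at a time: the SGD and RMSprop blocks are
# wrapped in triple-quoted strings to disable them, so only the Adam block runs.
# Recompiling does not reset the weights, so rebuild the model before
# switching optimizers if the runs are to be compared fairly.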
'''

#=========================== SGD =============================================
model.compile(loss = 'categorical_crossentropy',
              optimizer = 'SGD',
              metrics=['accuracy'])

timeStart = time.time()

history = model.fit(x_train, y_train,
                    batch_size = batch_size,
                    epochs = epochs,
                    verbose=1,
                    validation_data=(x_test, y_test))

timeFinal = time.time() - timeStart

score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])
print('Time:', timeFinal)


plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('Model Accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train','Test'], loc='upper left')
plt.show()

'''
'''
#=========================== RMSprop =============================================
model.compile(loss = 'categorical_crossentropy',
              optimizer = 'RMSprop',
              metrics=['accuracy'])

timeStart = time.time()

history = model.fit(x_train, y_train,
                    batch_size = batch_size,
                    epochs = epochs,
                    verbose=1,
                    validation_data=(x_test, y_test))

timeFinal = time.time() - timeStart

score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])
print('Time:', timeFinal)


plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('Model Accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train','Test'], loc='upper left')
plt.show()

'''


#=========================== ADAM =============================================
model.compile(loss = 'categorical_crossentropy',
              optimizer = 'adam',
              metrics=['accuracy'])

timeStart = time.time()

history = model.fit(x_train, y_train,
                    batch_size = batch_size,
                    epochs = epochs,
                    verbose=1,
                    validation_data=(x_test, y_test))

timeFinal = time.time() - timeStart

score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])
print('Time:', timeFinal)


plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('Model Accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train','Test'], loc='upper left')
plt.show()