###Artificial Neural Network (ANN) Challenge
####In this challenge, I'll work with the Fashion MNIST dataset. Using this dataset, I'll do the following:

####1.Preprocess your data so that you can feed it into ANN models.

####2.Split your data into training and test sets.

####3.Try different ANN models and train them on your training set. 

#####1.Number of layers
#####2.Activation functions of the layers
#####3.Number of neurons in the layers
#####4.Different batch sizes during training

####4.Compare your models' training scores and interpret your results.

####5.Evaluate how your models perform on your test set. Compare the results of your models.

#####Use GPU

In [None]:
pip install tensorflow-gpu==2.0.0-rc1


#####Load the libraries

In [None]:
import warnings
warnings.filterwarnings("ignore")

from tensorflow.keras.datasets import fashion_mnist
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential 
from tensorflow.keras.layers import Dense
from tensorflow.keras import optimizers

#####1.Preprocess your data so that you can feed it into ANN models.
#####2. Split your data into training and test sets.

In [None]:
(X_train, y_train), (X_test, y_test) = fashion_mnist.load_data()

#Fashion MNIST has 28x28 greyscale images, so 28x28 = 784
input_dim = 784  
output_dim = nb_classes = 10
nb_epoch = 20

#Flatten each 28x28 image into a 784-element vector (load_data() already returned the train/test split)
X_train = X_train.reshape(60000, input_dim)
X_test = X_test.reshape(10000, input_dim)

#Convert to float type
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')

#Scale pixel values from [0, 255] to [0, 1]
X_train /= 255
X_test /= 255

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-labels-idx1-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-images-idx3-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-labels-idx1-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-images-idx3-ubyte.gz
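
The flatten-and-scale step can be checked in isolation on a small array. This is a minimal sketch that assumes NumPy is available and uses random stand-in images (the `images` array is hypothetical, not the real dataset):

```python
import numpy as np

# Hypothetical stand-in for a few Fashion MNIST images: uint8 pixels in [0, 255]
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)

# Flatten each 28x28 image into one row of 784 values; -1 infers the sample count
flat = images.reshape(-1, 28 * 28)

# Convert to float32 and scale to [0, 1]
scaled = flat.astype("float32") / 255

print(scaled.shape)  # (4, 784)
print(scaled.dtype)  # float32
```

Using `reshape(-1, 784)` instead of hard-coding 60000 and 10000 keeps the step valid if only a subset of the data is used.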


#####Convert the integer class labels (0-9) to one-hot vectors

In [None]:
Y_train = to_categorical(y_train, nb_classes)
Y_test = to_categorical(y_test, nb_classes)
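
To make the encoding concrete, here is a small NumPy sketch of what one-hot encoding produces. The `one_hot` helper is a hypothetical illustration, not the Keras implementation of `to_categorical`:

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) float32 array with a 1 at each label's index."""
    labels = np.asarray(labels, dtype=int)
    # Row i of the identity matrix is the one-hot vector for class i
    return np.eye(num_classes, dtype="float32")[labels]

y = [9, 0, 3]
Y = one_hot(y, 10)
print(Y.shape)           # (3, 10)
print(Y.argmax(axis=1))  # [9 0 3]
```

Taking `argmax` along each row recovers the original integer label, which is handy when comparing predictions against `y_test`.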

#####3. Try different ANN models and train them on your training set. You can play with the following:

######1.Number of layers
######2.Activation functions of the layers
######3.Number of neurons in the layers
######4.Different batch sizes during training

#####Let's start with 3 dense layers with 128, 64, and 10 neurons and a batch size of 128

In [None]:
#build the model
model = Sequential()
# add 3 dense layers with 128, 64, 10 neurons
model.add(Dense(128, input_shape=(784,), activation="relu"))
model.add(Dense(64, activation="relu"))
model.add(Dense(10, activation="softmax"))
#compile the model
model.compile(optimizer='sgd', loss='categorical_crossentropy',
              metrics=['accuracy'])
#set 128 as mini batch size
model.fit(X_train, Y_train, batch_size=128, epochs=20, verbose=False)
#evaluate the model
score = model.evaluate(X_test, Y_test, verbose=0)
print('Test score:', score[0])
print('Test accuracy:', score[1])

Test score: 0.42172230417728424
Test accuracy: 0.8507
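
The cell above reports only test metrics. Judging overfitting also needs the accuracy on the training set. The sketch below is one possible way to get both numbers; `train_and_compare` is a hypothetical helper, and the random arrays are stand-ins so the block runs on its own. In the notebook, the real `X_train`, `Y_train`, `X_test`, `Y_test` would be passed instead.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

def train_and_compare(X_train, Y_train, X_test, Y_test,
                      hidden=(128, 64), batch_size=128, epochs=20):
    """Train a dense softmax classifier and return train and test accuracy."""
    model = Sequential()
    model.add(Dense(hidden[0], input_shape=(X_train.shape[1],), activation="relu"))
    for units in hidden[1:]:
        model.add(Dense(units, activation="relu"))
    model.add(Dense(Y_train.shape[1], activation="softmax"))
    model.compile(optimizer="sgd", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(X_train, Y_train, batch_size=batch_size, epochs=epochs, verbose=0)
    # evaluate() returns [loss, accuracy] for the metrics compiled above
    _, train_acc = model.evaluate(X_train, Y_train, verbose=0)
    _, test_acc = model.evaluate(X_test, Y_test, verbose=0)
    return {"train_acc": float(train_acc), "test_acc": float(test_acc),
            "gap": float(train_acc - test_acc)}

# Tiny random stand-in data (assumption: not the real dataset)
rng = np.random.default_rng(0)
X_a = rng.random((64, 784)).astype("float32")
X_b = rng.random((32, 784)).astype("float32")
Y_a = np.eye(10, dtype="float32")[rng.integers(0, 10, size=64)]
Y_b = np.eye(10, dtype="float32")[rng.integers(0, 10, size=32)]
print(train_and_compare(X_a, Y_a, X_b, Y_b, epochs=2))
```

A large positive `gap` (training accuracy well above test accuracy) would point to overfitting. A gap near zero would not.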


#####The model had a decent test accuracy of 0.85. The run above reports only test metrics, so whether it overfits can't be judged from this output. Let's add another layer and see how they compare.

In [None]:
#build the model
model = Sequential()
# add 4 dense layers with 128, 64, 58, 10 neurons
model.add(Dense(128, input_shape=(784,), activation="relu"))
model.add(Dense(64, activation="relu"))
model.add(Dense(58, activation="relu"))
model.add(Dense(10, activation="softmax"))
#compile the model
model.compile(optimizer='sgd', loss='categorical_crossentropy',
              metrics=['accuracy'])
#set 128 as mini batch size
model.fit(X_train, Y_train, batch_size=128, epochs=20, verbose=False)
#evaluate the model
score = model.evaluate(X_test, Y_test, verbose=0)
print('Test score:', score[0])
print('Test accuracy:', score[1])

Test score: 0.39879621031284335
Test accuracy: 0.8576


#####The results are very similar. Let's add another layer.

In [None]:
#build the model
model = Sequential()
# add 5 dense layers with 128, 100, 64, 58, 10 neurons
model.add(Dense(128, input_shape=(784,), activation="relu"))
model.add(Dense(100, activation="relu"))
model.add(Dense(64, activation="relu"))
model.add(Dense(58, activation="relu"))
model.add(Dense(10, activation="softmax"))
#compile the model
model.compile(optimizer='sgd', loss='categorical_crossentropy',
              metrics=['accuracy'])
#set 128 as mini batch size
model.fit(X_train, Y_train, batch_size=128, epochs=20, verbose=False)
#evaluate the model
score = model.evaluate(X_test, Y_test, verbose=0)
print('Test score:', score[0])
print('Test accuracy:', score[1])

Test score: 0.4018573496341705
Test accuracy: 0.8546


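#####Adding a fifth layer gave a test accuracy of 0.8546, slightly above the 3-layer model (0.8507) and slightly below the 4-layer model (0.8576). The differences are small, so let's stick to 3 layers and try sigmoid, tanh, and ReLU activation functions in the first hidden layer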

#####Using Sigmoid function

In [None]:
#build the model
model = Sequential()
# add 3 dense layers with 128, 64, 10 neurons; the first hidden layer uses sigmoid, the second stays ReLU
model.add(Dense(128, input_shape=(784,), activation="sigmoid"))
model.add(Dense(64, activation="relu"))
model.add(Dense(10, activation="softmax"))
#compile the model
model.compile(optimizer='sgd', loss='categorical_crossentropy',
              metrics=['accuracy'])
#set 128 as mini batch size
model.fit(X_train, Y_train, batch_size=128, epochs=20, verbose=False)
#evaluate the model
score = model.evaluate(X_test, Y_test, verbose=0)
print('Test score:', score[0])
print('Test accuracy:', score[1])

Test score: 0.4967690366268158
Test accuracy: 0.8219


#####Using tanh function

In [None]:
#build the model
model = Sequential()
# add 3 dense layers with 128, 64, 10 neurons; the first hidden layer uses tanh, the second stays ReLU
model.add(Dense(128, input_shape=(784,), activation="tanh"))
model.add(Dense(64, activation="relu"))
model.add(Dense(10, activation="softmax"))
#compile the model
model.compile(optimizer='sgd', loss='categorical_crossentropy',
              metrics=['accuracy'])
#set 128 as mini batch size
model.fit(X_train, Y_train, batch_size=128, epochs=20, verbose=False)
#evaluate the model
score = model.evaluate(X_test, Y_test, verbose=0)
print('Test score:', score[0])
print('Test accuracy:', score[1])

Test score: 0.41139678983688355
Test accuracy: 0.8533


#####Using ReLU function

In [None]:
#build the model
model = Sequential()
# add 3 dense layers with 128, 64, 10 neurons with activation set to 'relu'
model.add(Dense(128, input_shape=(784,), activation="relu"))
model.add(Dense(64, activation="relu"))
model.add(Dense(10, activation="softmax"))
#compile the model
model.compile(optimizer='sgd', loss='categorical_crossentropy',
              metrics=['accuracy'])
#set 128 as mini batch size
model.fit(X_train, Y_train, batch_size=128, epochs=20, verbose=False)
#evaluate the model
score = model.evaluate(X_test, Y_test, verbose=0)
print('Test score:', score[0])
print('Test accuracy:', score[1])

Test score: 0.4226639069080353
Test accuracy: 0.8509


#####The model that used the tanh activation function had a slightly higher test accuracy (0.8533) than the one that used ReLU (0.8509); a gap this small may be run-to-run variation. The model with the sigmoid activation function had the lowest test accuracy (0.8219) and the highest test loss (0.50). Training accuracy was not recorded in these runs, so overfitting can't be assessed from these outputs. ReLU and tanh performed about equally well, and I'll keep ReLU for the remaining experiments.
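
For reference, the derivatives of the three activations compared above can be written in a few lines of plain Python. The sigmoid derivative peaks at 0.25, while tanh and ReLU reach 1. This is one commonly cited reason sigmoid layers can learn more slowly under plain SGD, though these runs alone don't establish that as the cause here.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    return max(0.0, x)

def d_sigmoid(x):
    # derivative of sigmoid: s * (1 - s), at most 0.25 (reached at x = 0)
    s = sigmoid(x)
    return s * (1.0 - s)

def d_tanh(x):
    # derivative of tanh: 1 - tanh(x)^2, at most 1 (reached at x = 0)
    return 1.0 - math.tanh(x) ** 2

def d_relu(x):
    # derivative of ReLU: 1 for positive inputs, 0 otherwise
    return 1.0 if x > 0 else 0.0

for x in (-4.0, 0.0, 4.0):
    print(f"x={x:+.1f}  sigmoid'={d_sigmoid(x):.4f}  "
          f"tanh'={d_tanh(x):.4f}  relu'={d_relu(x):.1f}")
```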

#####Now let's vary the number of neurons in each layer.

In [None]:
#build the model
model = Sequential()
# add 3 dense layers with 128, 64, 10 neurons with ReLU activation
model.add(Dense(128, input_shape=(784,), activation="relu"))
model.add(Dense(64, activation="relu"))
model.add(Dense(10, activation="softmax"))
#compile the model
model.compile(optimizer='sgd', loss='categorical_crossentropy',
              metrics=['accuracy'])
#set 128 as mini batch size
model.fit(X_train, Y_train, batch_size=128, epochs=20, verbose=False)
#evaluate the model
score = model.evaluate(X_test, Y_test, verbose=0)
print('Test score:', score[0])
print('Test accuracy:', score[1])

Test score: 0.4196259379386902
Test accuracy: 0.8514000177383423


In [None]:
#build the model
model = Sequential()
# add 3 dense layers with 100, 50, 10 neurons with ReLU activation
model.add(Dense(100, input_shape=(784,), activation="relu"))
model.add(Dense(50, activation="relu"))
model.add(Dense(10, activation="softmax"))
#compile the model
model.compile(optimizer='sgd', loss='categorical_crossentropy',
              metrics=['accuracy'])
#set 128 as mini batch size
model.fit(X_train, Y_train, batch_size=128, epochs=20, verbose=False)
#evaluate the model
score = model.evaluate(X_test, Y_test, verbose=0)
print('Test score:', score[0])
print('Test accuracy:', score[1])

Test score: 0.41667118668556213
Test accuracy: 0.8537999987602234


In [None]:
#build the model
model = Sequential()
# add 3 dense layers with 1000, 500, 10 neurons with ReLU activation
model.add(Dense(1000, input_shape=(784,), activation="relu"))
model.add(Dense(500, activation="relu"))
model.add(Dense(10, activation="softmax"))
#compile the model
model.compile(optimizer='sgd', loss='categorical_crossentropy',
              metrics=['accuracy'])
#set 128 as mini batch size
model.fit(X_train, Y_train, batch_size=128, epochs=20, verbose=False)
#evaluate the model
score = model.evaluate(X_test, Y_test, verbose=0)
print('Test score:', score[0])
print('Test accuracy:', score[1])

Test score: 0.4107329547405243
Test accuracy: 0.852400004863739


In [None]:
#build the model
model = Sequential()
# add 3 dense layers with 1500, 800, 10 neurons with activation set to 'relu'
model.add(Dense(1500, input_shape=(784,), activation="relu"))
model.add(Dense(800, activation="relu"))
model.add(Dense(10, activation="softmax"))
#compile the model
model.compile(optimizer='sgd', loss='categorical_crossentropy',
              metrics=['accuracy'])
#set 128 as mini batch size
model.fit(X_train, Y_train, batch_size=128, epochs=20, verbose=False)
#evaluate the model
score = model.evaluate(X_test, Y_test, verbose=0)
print('Test score:', score[0])
print('Test accuracy:', score[1])

Test score: 0.40162894129753113
Test accuracy: 0.8557999730110168


##### I achieved a slightly higher accuracy when changing the number of neurons in the layers (0.8514 with 128 and 64 hidden neurons versus 0.8558 with 1500 and 800). The models with 1000, 500, 10 neurons and 1500, 800, 10 neurons took longer to run, so the extra computational cost is not worth the slightly higher accuracy.
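
The cost difference can be made concrete by counting trainable parameters. Each dense layer has one weight per input-output pair plus one bias per neuron. `dense_param_count` below is a hypothetical helper for that arithmetic:

```python
def dense_param_count(layer_sizes):
    """Total weights and biases for a stack of fully connected layers.

    layer_sizes lists the input width followed by each layer's neuron count.
    """
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

for hidden in [(128, 64), (100, 50), (1000, 500), (1500, 800)]:
    sizes = [784, *hidden, 10]
    print(hidden, dense_param_count(sizes))
# (128, 64) 109386
# (100, 50) 84060
# (1000, 500) 1290510
# (1500, 800) 2386310
```

The largest network has roughly 22 times as many parameters as the 128/64 network, for a gain of less than half a percentage point in test accuracy.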

##### Finally, let's try different batch sizes during training: 128, a smaller batch of 8, and the full training set as a single batch.

In [None]:
#build the model
model = Sequential()
# add 3 dense layers with 128, 64, 10 neurons with activation set to 'relu'
model.add(Dense(128, input_shape=(784,), activation="relu"))
model.add(Dense(64, activation="relu"))
model.add(Dense(10, activation="softmax"))
#compile the model
model.compile(optimizer='sgd', loss='categorical_crossentropy',
              metrics=['accuracy'])
#set 128 as mini batch size
model.fit(X_train, Y_train, batch_size=128, epochs=20, verbose=False)
#evaluate the model
score = model.evaluate(X_test, Y_test, verbose=0)
print('Test score:', score[0])
print('Test accuracy:', score[1])

Test score: 0.4197782278060913
Test accuracy: 0.8528000116348267


In [None]:
#build the model
model = Sequential()
# add 3 dense layers with 128, 64, 10 neurons with activation set to 'relu'
model.add(Dense(128, input_shape=(784,), activation="relu"))
model.add(Dense(64, activation="relu"))
model.add(Dense(10, activation="softmax"))
#compile the model
model.compile(optimizer='sgd', loss='categorical_crossentropy',
              metrics=['accuracy'])
#set 8 as mini batch size
model.fit(X_train, Y_train, batch_size=8, epochs=20, verbose=False)
#evaluate the model
score = model.evaluate(X_test, Y_test, verbose=0)
print('Test score:', score[0])
print('Test accuracy:', score[1])

Test score: 0.3273168206214905
Test accuracy: 0.8871999979019165


In [None]:
#build the model
model = Sequential()
# add 3 dense layers with 128, 64, 10 neurons with activation set to 'relu'
model.add(Dense(128, input_shape=(784,), activation="relu"))
model.add(Dense(64, activation="relu"))
model.add(Dense(10, activation="softmax"))
#compile the model
model.compile(optimizer='sgd', loss='categorical_crossentropy',
              metrics=['accuracy'])
#use full sample
model.fit(X_train, Y_train, batch_size=X_train.shape[0], epochs=20, verbose=False)
#evaluate the model
score = model.evaluate(X_test, Y_test, verbose=0)
print('Test score:', score[0])
print('Test accuracy:', score[1])

Test score: 2.0586607456207275
Test accuracy: 0.296099990606308


#####Varying the batch size yielded interesting results. The "Test score" values are cross-entropy losses on the test set, so lower is better. Using the full training set as a single batch gave the lowest test accuracy, about 30% (test loss 2.06); with one weight update per epoch, 20 epochs amount to only 20 SGD steps. A batch size of 128 yielded a test accuracy of 85% (test loss 0.42). A batch size of 8 yielded the highest test accuracy, 89% (test loss 0.33).
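
One way to read the batch-size results is through the number of weight updates each run performs. With a fixed 20 epochs, a smaller batch means many more SGD steps. `sgd_updates` is a hypothetical helper for this arithmetic, assuming the last partial batch of each epoch is kept:

```python
import math

def sgd_updates(n_samples, batch_size, epochs):
    """Number of weight updates when every epoch sees each sample once."""
    return math.ceil(n_samples / batch_size) * epochs

for batch_size in (8, 128, 60000):
    print(batch_size, sgd_updates(60000, batch_size, epochs=20))
# 8 150000
# 128 9380
# 60000 20
```

With only 20 updates, the full-batch run has barely moved from its initial weights. Training it for more epochs or with a larger learning rate might narrow the gap, though that was not tested here.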

#####Based on a comparison of all the model results, the model with 3 dense layers (128, 64, and 10 neurons), ReLU activations, and a batch size of 8 performed the best, with a test accuracy of about 89%. The same architecture with a batch size of 128 reached about 85%, which is still reasonable and needs far fewer weight updates per epoch. Since only test metrics were recorded, overfitting can't be compared across these models.

#####For future work, I can look at using different hyperparameters, such as the optimizer, learning rate, and number of epochs, to try to achieve even higher accuracy scores. I can also try a different deep learning approach to see if it improves the accuracy scores.