<a href="https://colab.research.google.com/github/AfiyatiReno/Belajar-Python-Afiyati/blob/master/CNN_Supervised_Deep_Learning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<img src = "https://i.imgur.com/XoRDxQJ.png" align = "center">

# Fashion-MNIST Classification with CNN using Keras


## Why Jupyter Notebook?


*   Interactive programming in the web browser
*   Great for visualization
*   Great for collaboration
*   Popular tool for studying machine learning / deep learning


## Why Keras?

Many deep learning frameworks are available, such as [TensorFlow](https://www.tensorflow.org/) and [Theano](http://deeplearning.net/software/theano/). So why Keras?



*   Simple
*   Keras is a high-level API library that can run on top of several backend frameworks; TensorFlow is the recommended default
*   Keras is easy to learn and easy to use



## Why Fashion-MNIST?


*   MNIST is too easy
*   MNIST is overused
*   MNIST cannot represent modern computer vision tasks

Read more about the Fashion-MNIST dataset in [this paper](https://arxiv.org/abs/1708.07747) (**Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms**)


## Dataset Description
The Fashion-MNIST dataset contains 70,000 grayscale images in 10 categories (60,000 for training and 10,000 for testing).
The images show individual articles of clothing at low resolution (28 by 28 pixels), as seen here:

![alt text](https://miro.medium.com/max/840/0*dOOHSSWACxZJ_eIR)

In [0]:
import keras
from keras.datasets import fashion_mnist 
from keras.layers import Dense, Conv2D, MaxPooling2D, Activation, Flatten, Dropout
from keras.layers import BatchNormalization
from keras.models import Sequential
from keras.utils import to_categorical
import numpy as np
import matplotlib.pyplot as plt

In [0]:
#download fashion MNIST dataset
(X_train,Y_train), (X_test,Y_test) = fashion_mnist.load_data()

In [0]:
#check image shape
X_train.shape

In [0]:
# Define the text labels
fashion_mnist_labels = ["T-shirt/top",  # index 0
                        "Trouser",      # index 1
                        "Pullover",     # index 2 
                        "Dress",        # index 3 
                        "Coat",         # index 4
                        "Sandal",       # index 5
                        "Shirt",        # index 6 
                        "Sneaker",      # index 7 
                        "Bag",          # index 8 
                        "Ankle boot"]   # index 9

# Image index: pick any number between 0 and 59,999
img_index = 10
# Y_train contains the labels, ranging from 0 to 9
label_index = Y_train[img_index]
# Show one of the images from the training dataset
plt.imshow(X_train[img_index], cmap='Greys')
# Print the label, for example: 2 Pullover
print("y = " + str(label_index))
print("Label: " + fashion_mnist_labels[label_index])

In [0]:
#See the data in a matrix 28x28
import pandas as pd

val = X_train[img_index]
display(pd.DataFrame(val))

## Data Preprocessing

Pixel values are stored as integers from 0 to 255. Rescaling them to the range 0 to 1 keeps the inputs small and on a common scale, which generally helps the model train faster and more stably.

In [0]:
# scale data to the range of [0, 1]
X_train = X_train.astype("float32") / 255.0
X_test = X_test.astype("float32") / 255.0

## Split the data into train/validation/test data sets


*   Training data - used to fit the model
*   Validation data - used to tune hyperparameters and compare models during development
*   Test data - used for a final check of the model after it has been vetted on the validation set

In [0]:
# Split the training data further: the first 5,000 samples become the validation set, the remaining 55,000 stay in the training set
(X_train, X_valid) = X_train[5000:], X_train[:5000] 
(Y_train, Y_valid) = Y_train[5000:], Y_train[:5000]

In [0]:
X_train.shape[0]

When using a convolutional layer as the first layer to our model, we need to reshape our data to (***n_images, x_shape, y_shape, channels***). All we really need to know is that we should set ***channels*** to **1 for grayscale images** and set ***channels*** to **3** when we have a set of **RGB-images** as input.

In [0]:
# Reshape input data from (28, 28) to (28, 28, 1)
w, h = 28, 28
X_train = X_train.reshape(X_train.shape[0], w, h, 1)
X_valid = X_valid.reshape(X_valid.shape[0], w, h, 1)
X_test = X_test.reshape(X_test.shape[0], w, h, 1)

In [0]:
# Print training set shape
print("X_train shape:", X_train.shape, "Y_train shape:", Y_train.shape)

# Print the number of training, validation, and test datasets
print(X_train.shape[0], 'train set')
print(X_valid.shape[0], 'validation set')
print(X_test.shape[0], 'test set')

Our model cannot work with categorical labels directly, so we one-hot encode them. In one-hot encoding, each of the class labels 0 through 9 is represented as a vector of nine zeros and a single one, and the position of the 1 identifies the class. For example, we’d represent class 3 (Dress) as [0, 0, 0, 1, 0, 0, 0, 0, 0, 0].
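
To make the encoding concrete, here is a minimal NumPy sketch of what one-hot encoding does. The helper name `one_hot` is hypothetical and exists only for illustration; the notebook itself relies on Keras's `to_categorical`.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Illustrative helper (not part of Keras): map integer labels to one-hot rows."""
    labels = np.asarray(labels)
    encoded = np.zeros((labels.shape[0], num_classes), dtype="float32")
    # Put a 1 in the column given by each label
    encoded[np.arange(labels.shape[0]), labels] = 1.0
    return encoded

example = one_hot([3, 0, 9], num_classes=10)
print(example[0])  # the row for label 3
```

Each row sums to 1, and taking `np.argmax` of a row recovers the original integer label.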

In [0]:

# One-hot encode the labels
Y_train_one_hot = to_categorical(Y_train, 10)
Y_valid_one_hot = to_categorical(Y_valid, 10)
Y_test_one_hot = to_categorical(Y_test, 10)
Y_train_one_hot[0]

## Create the model architecture

There are two APIs for defining a model in Keras:
1. [Sequential model API](https://keras.io/models/sequential/)
2. [Functional API](https://keras.io/models/model/)

In this notebook we are using the Sequential model API. 
If you are interested in a tutorial using the Functional API, check out Sara Robinson's blog post [Predicting the price of wine with the Keras Functional API and TensorFlow](https://medium.com/tensorflow/predicting-the-price-of-wine-with-the-keras-functional-api-and-tensorflow-a95d1c2c1b03).

In defining the model we will be using some of these Keras APIs:
*   [Conv2D()](https://www.pyimagesearch.com/2018/12/31/keras-conv2d-and-convolutional-layers/) - create a convolutional layer
*   [MaxPooling2D()](https://keras.io/layers/pooling/) - create a max pooling layer
*   [Dropout()](https://towardsdatascience.com/machine-learning-part-20-dropout-keras-layers-explained-8c9f6dc4c9ab) - apply dropout

### Activation Function in Neural Network

#### Why Activation function?
![alt text](https://miro.medium.com/max/1280/1*YI211tVqoRB414cjUcF7WQ.gif)

If we do not apply an activation function, the output signal is just a **linear function** of the input. A linear function is a polynomial of **degree one**. A neural network without activation functions would behave like a **linear regression model**, no matter how many layers it has, which limits its power and makes it perform poorly on most real problems.
While building a neural network, one of the choices we must make is which activation function to use. This choice is unavoidable, because activation functions are what allow a network to learn and approximate complex, continuous relationships between variables. Put simply, they add non-linearity to the network.
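
The claim that a network without activation functions stays linear can be checked numerically. The sketch below is only an illustration with random weights (biases are left out for brevity, and all variable names are made up for this example): two stacked linear layers give the same output as one layer whose weight matrix is the product of the two.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 4))      # 5 samples, 4 features
W1 = rng.normal(size=(4, 8))     # weights of a first "layer"
W2 = rng.normal(size=(8, 3))     # weights of a second "layer"

# Two layers with no activation in between...
two_linear_layers = (x @ W1) @ W2
# ...equal one layer whose weight matrix is W1 @ W2
single_layer = x @ (W1 @ W2)
print(np.allclose(two_linear_layers, single_layer))

# With a ReLU in between, the collapse no longer holds in general
with_relu = np.maximum(x @ W1, 0) @ W2
print(np.allclose(with_relu, single_layer))
```

This is why adding more layers only increases what a network can represent when a non-linearity sits between them.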

#### Which one to prefer?


1.   Logistic/Sigmoid
     * Range (0, 1)
2.   Tanh (hyperbolic tangent)
     * Range (-1, 1)
     * For RNNs, tanh is commonly used as the standard activation function.
3.   ReLU (rectified linear unit)
     * Range [0, ∞)
     * It is the most commonly used function because of its simplicity.
     * ReLU is normally used only in hidden layers.
4.   Softmax
     * Softmax generalizes the sigmoid function to more than two classes.
     * Softmax is used for multiclass classification problems.
     * Softmax is generally preferred in the output layer, where we want probabilities for the different classes.
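
The four functions listed above can be written in a few lines of NumPy. This is a minimal sketch for intuition rather than how Keras implements them; the function names and the sample vector `z` are chosen for this example.

```python
import numpy as np

def sigmoid(z):
    # Squashes any real number into the open interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    # Keeps positive values, replaces negative values with 0
    return np.maximum(z, 0.0)

def softmax(z):
    # Subtract the max for numerical stability, then normalize to sum to 1
    shifted = z - np.max(z)
    exp = np.exp(shifted)
    return exp / exp.sum()

z = np.array([-2.0, 0.0, 3.0])
print("sigmoid:", sigmoid(z))
print("tanh:   ", np.tanh(z))
print("relu:   ", relu(z))
print("softmax:", softmax(z), "sum =", softmax(z).sum())
```

Note that the softmax outputs are all positive and sum to 1, which is why they can be read as class probabilities in the output layer.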

In [0]:
model = Sequential()
model.add(Conv2D(filters=64, 
                 kernel_size=2,
                 padding='same',
                 activation='relu',
                 input_shape=(28, 28, 1)))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.5))

model.add(Conv2D(filters=32, 
                 kernel_size=2,
                 padding='same',
                 activation='relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.5))

model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.5))

model.add(Dense(10, activation='softmax'))  # 10 = number of classes

In [0]:
model.summary()

Reference for counting the parameters of a Dense layer: [number of parameters in dense and convolutional neural networks](https://medium.com/@zhang_yang/number-of-parameters-in-dense-and-convolutional-neural-networks-34b54c2ec349)
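
As a rough cross-check of the summary, the parameter counts for the architecture defined above can be worked out by hand. The two helper functions below are hypothetical names introduced for this sketch; they use the standard formulas (weights plus one bias per filter or output unit).

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    # Each filter has kernel_h * kernel_w * in_channels weights plus one bias
    return (kernel_h * kernel_w * in_channels + 1) * filters

def dense_params(in_units, out_units):
    # One weight per input/output pair plus one bias per output unit
    return (in_units + 1) * out_units

conv1 = conv2d_params(2, 2, 1, 64)    # first Conv2D: 1 input channel, 64 filters
conv2 = conv2d_params(2, 2, 64, 32)   # second Conv2D: 64 input channels, 32 filters
# 28x28 input, 'same' padding, two 2x2 poolings: 28 -> 14 -> 7
flat_units = 7 * 7 * 32
dense1 = dense_params(flat_units, 256)
dense2 = dense_params(256, 10)
total = conv1 + conv2 + dense1 + dense2
print(conv1, conv2, dense1, dense2, total)
```

Pooling, dropout, and flatten layers have no trainable parameters, so the total should agree with the number reported by `model.summary()`. Most of the parameters sit in the first Dense layer.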

## Compile The Model
We use model.compile() to configure the learning process before training the model. This is where we define the type of loss function, optimizer and the metrics evaluated by the model during training and testing.
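
For intuition about the loss used in the next cell, here is a minimal NumPy sketch of categorical cross-entropy. The function name `categorical_crossentropy_np`, the clipping constant `eps`, and the sample arrays are assumptions made for this illustration, not the Keras implementation.

```python
import numpy as np

def categorical_crossentropy_np(y_true, y_pred, eps=1e-7):
    # Clip to avoid log(0); eps is an illustrative choice
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    # Per-sample loss: minus the log-probability assigned to the true class
    return -np.sum(y_true * np.log(y_pred), axis=-1)

y_true = np.array([[0, 0, 1], [1, 0, 0]], dtype="float32")
y_pred = np.array([[0.1, 0.1, 0.8], [0.3, 0.4, 0.3]], dtype="float32")
losses = categorical_crossentropy_np(y_true, y_pred)
print(losses, losses.mean())
```

The first sample, where the model puts 0.8 on the correct class, gets a much smaller loss than the second, where the correct class only receives 0.3.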

In [0]:
model.compile(loss=keras.losses.categorical_crossentropy, optimizer=keras.optimizers.Adam(),metrics=['accuracy'])

In [0]:
from keras.utils import plot_model

plot_model(model, show_shapes=True, to_file='CNN.png')

from IPython.display import Image
Image('CNN.png')

In [0]:
# Sanity check: inputs and one-hot labels should have the same number of samples
print(X_train.shape)
print(Y_train_one_hot.shape)
print(X_valid.shape)
print(Y_valid_one_hot.shape)

## Train The Model
We will train the model with a batch size of 256 for 10 epochs. Check out [this article](https://towardsdatascience.com/epoch-vs-iterations-vs-batch-size-4dfb9c7ce9c9) to learn more about epochs, iterations, and batches.



1.   **Epoch**: One Epoch is when an ENTIRE dataset is passed forward and backward through the neural network only ONCE.
2.   **Batch size**: Total number of training examples present in a single batch.
3.   **Iterations**: Iterations is the number of batches needed to complete one epoch.

Let’s say we have 2000 training examples.
If we divide them into batches of 500, it takes 4 iterations to complete 1 epoch.
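
The same arithmetic can be written as a small helper. The function name `iterations_per_epoch` is hypothetical; the second call uses the 55,000 training images and the batch size of 256 from the training cell in this notebook.

```python
import math

def iterations_per_epoch(num_examples, batch_size):
    # The last batch may be smaller than batch_size, hence the ceiling
    return math.ceil(num_examples / batch_size)

print(iterations_per_epoch(2000, 500))    # the example from the text
print(iterations_per_epoch(55000, 256))   # this notebook's training set
```

When the dataset size is not a multiple of the batch size, the final iteration of each epoch simply processes the leftover examples.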



In [0]:
from keras.callbacks import ModelCheckpoint

checkpointer = ModelCheckpoint(filepath='model.weights.best.hdf5', verbose = 1, save_best_only=True)
hst = model.fit(X_train,
         Y_train_one_hot,
         batch_size=256,
         epochs=10,
         validation_data=(X_valid, Y_valid_one_hot),
         callbacks=[checkpointer])

In [0]:
test_loss, test_acc = model.evaluate(X_test, Y_test_one_hot)
print('Test loss', test_loss)
print('Test accuracy', test_acc)

In [0]:
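# Compare the predicted class probabilities with the true labels for the first two test images
print(model.predict(X_test[:2]))
print(Y_test[:2])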

In [0]:
predictions = model.predict(X_test)
print(np.argmax(predictions[0]))  # index of the most probable class
plt.imshow(X_test[0].reshape(28, 28), cmap = plt.cm.binary)
plt.show()

In [0]:
# Visualize the loss for each epoch
import matplotlib.pyplot as plt

plt.title('Loss')
plt.plot(hst.history['loss'], label='train')
plt.plot(hst.history['val_loss'], label='validation')
plt.legend()
plt.show()

In [0]:
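# Visualize the accuracy for each epoch
# Older Keras versions store the metric as 'acc', newer ones as 'accuracy'
acc_key = 'accuracy' if 'accuracy' in hst.history else 'acc'
plt.title('Accuracy')
plt.plot(hst.history[acc_key], label='train')
plt.plot(hst.history['val_' + acc_key], label='validation')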
plt.legend()
plt.show()

## References



*   https://towardsdatascience.com/mnist-cnn-python-c61a5bce7a19
*   https://medium.com/tensorflow/hello-deep-learning-fashion-mnist-with-keras-50fcff8cd74a
*   https://machinelearningmastery.com/how-to-accelerate-learning-of-deep-neural-networks-with-batch-normalization/
*   https://machinelearningmastery.com/learning-curves-for-diagnosing-machine-learning-model-performance/
*   https://www.opencodez.com/python/text-classification-using-keras.htm
*   https://towardsdatascience.com/analyzing-different-types-of-activation-functions-in-neural-networks-which-one-to-prefer-e11649256209
*   https://towardsdatascience.com/activation-functions-and-its-types-which-is-better-a9a5310cc8f
