<a href="https://colab.research.google.com/github/aguscura/Python-Deep-Learning/blob/main/fashion_mnist_cnn.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
from tensorflow.keras import datasets
from tensorflow.keras.models import Sequential 
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

**We reshape each image to (28,28,1) below, because the CNN expects every sample to be a 3D tensor (height, width, channels).**

In [2]:
fashion_mnist = datasets.fashion_mnist

(train_images, train_labels) , (test_images, test_labels) = fashion_mnist.load_data()

class_names = ["T-shirt/Top", "Trouser", "Pullover", "Dress", "Coat", "Sandal", "Shirt", "Sneaker", "Bag", "Ankle Boot"]

train_images = train_images.reshape((60000, 28, 28, 1))
train_images = train_images.astype('float32') / 255

test_images = test_images.reshape((10000, 28, 28, 1))
test_images = test_images.astype('float32') / 255

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-labels-idx1-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-images-idx3-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-labels-idx1-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-images-idx3-ubyte.gz


In [3]:
train_labels

array([9, 0, 0, ..., 3, 0, 5], dtype=uint8)

In [4]:
model = Sequential()
model.add(Conv2D(64, (7,7), activation='relu', input_shape=(28,28,1), padding='same'))
model.add(MaxPooling2D(2,2))
model.add(Conv2D(128, (3,3), activation='relu', padding='same'))
model.add(MaxPooling2D(2,2))
model.add(Flatten())
model.add(Dense(64, activation='relu'))
model.add(Dense(10, activation='softmax'))


model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d (Conv2D)             (None, 28, 28, 64)        3200      
                                                                 
 max_pooling2d (MaxPooling2D  (None, 14, 14, 64)       0         
 )                                                               
                                                                 
 conv2d_1 (Conv2D)           (None, 14, 14, 128)       73856     
                                                                 
 max_pooling2d_1 (MaxPooling  (None, 7, 7, 128)        0         
 2D)                                                             
                                                                 
 flatten (Flatten)           (None, 6272)              0         
                                                                 
 dense (Dense)               (None, 64)                4

# IMPORTANT

SPARSE_CATEGORICAL_CROSSENTROPY  **VS.**  CATEGORICAL_CROSSENTROPY

If your labels (the Yi's) are one-hot encoded, use categorical_crossentropy. Examples (for a 3-class classification): [1,0,0], [0,1,0], [0,0,1]

But if your labels are integers, use sparse_categorical_crossentropy. Examples for the same 3-class problem: [0], [1], [2] (Keras expects class indices starting at 0, like the train_labels array above).

The choice depends entirely on how your labels are encoded when you load the dataset. One advantage of sparse categorical cross-entropy is that it saves memory as well as computation time, because it uses a single integer per sample rather than a whole vector.
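To make the equivalence concrete, here is a minimal pure-Python sketch (not the Keras implementation) that computes both losses for a single sample. The helper names and the probabilities are made up for illustration; the point is that both formulas give the same number when the integer label and the one-hot vector describe the same class.

```python
import math

def one_hot(label, num_classes):
    """Turn an integer class index into a one-hot list."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]

def categorical_crossentropy(y_true_one_hot, y_pred):
    """Cross-entropy for a one-hot target: -sum(y_i * log(p_i))."""
    return -sum(t * math.log(p) for t, p in zip(y_true_one_hot, y_pred))

def sparse_categorical_crossentropy(y_true_index, y_pred):
    """Cross-entropy for an integer target: -log(p[label])."""
    return -math.log(y_pred[y_true_index])

probs = [0.7, 0.2, 0.1]  # hypothetical softmax output for 3 classes
label = 1                # integer label, 0-indexed

loss_sparse = sparse_categorical_crossentropy(label, probs)
loss_one_hot = categorical_crossentropy(one_hot(label, 3), probs)
print(loss_sparse, loss_one_hot)  # both equal -log(0.2)
```

The sparse version only looks up one probability, which is where the memory and computation savings come from.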

**The number shown below each epoch line equals the number of images divided by the batch size**

Epoch 1/5

1/600 --> This indicates, for example, that we had 60,000 images and the batch_size was 100.
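The arithmetic can be checked with a small sketch. The function name `steps_per_epoch` is an illustrative helper, not a Keras call; the ceiling accounts for a smaller final batch when the division is not exact.

```python
import math

def steps_per_epoch(num_samples, batch_size):
    """Number of batches per epoch; the last batch may be smaller."""
    return math.ceil(num_samples / batch_size)

print(steps_per_epoch(60000, 100))  # 600, as in the example above
print(steps_per_epoch(60000, 32))   # 1875
```

The second call uses 32, the default batch size of `model.fit` in Keras, which explains the 1875 steps shown in the last training run of this notebook, where no batch_size is passed.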

In [5]:
model.compile(optimizer='adam', 
              loss='sparse_categorical_crossentropy',
              metrics= ['accuracy'])

model.fit(train_images, train_labels, epochs=5, batch_size = 100)

test_loss, test_acc = model.evaluate(test_images, test_labels)

print('Test Accuracy: ', test_acc )

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Test Accuracy:  0.9093000292778015


**BatchNormalization** --> Normalizes the inputs to the layers of the neural network. That is, the layer's input is rescaled to have mean 0 and standard deviation 1 (computed over each mini-batch during training).
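As a rough illustration of the training-time computation, here is a pure-Python sketch for a single feature across one mini-batch. It is a simplification, not the Keras layer: the function name is hypothetical, `gamma` and `beta` stand for the learnable scale and shift, and `eps=1e-3` mirrors the layer's default epsilon. The moving averages used at inference are left out.

```python
import math

def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-3):
    """Normalize one feature across a mini-batch, then scale and shift."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]

batch = [2.0, 4.0, 6.0, 8.0]  # made-up activations of one feature
normalized = batch_norm(batch)

out_mean = sum(normalized) / len(normalized)
out_var = sum((v - out_mean) ** 2 for v in normalized) / len(normalized)
print(out_mean, out_var)  # mean close to 0, variance close to 1
```

Each channel carries four values (gamma, beta, moving mean, moving variance), which is why the summary further down reports 128 parameters for a BatchNormalization layer over 32 channels.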

**Dropout** --> Valuable against overfitting. It "deactivates" or "puts to sleep" a certain fraction of the neurons in the layer in question, ignoring them at random. That is, those neurons are not taken into account for a given training iteration.
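The idea can be sketched in a few lines of plain Python. This is an illustrative version of "inverted" dropout, the variant Keras applies during training: each unit is zeroed with probability `rate` and the survivors are scaled by 1 / (1 - rate) so the expected sum stays the same. The function name, the seed, and the input values are assumptions made for the example.

```python
import random

def dropout(activations, rate, rng):
    """Zero each unit with probability `rate`; rescale the units that survive."""
    keep_prob = 1.0 - rate
    return [a / keep_prob if rng.random() >= rate else 0.0 for a in activations]

rng = random.Random(0)   # fixed seed so the run is repeatable
activations = [1.0] * 8  # made-up layer outputs
dropped = dropout(activations, rate=0.25, rng=rng)
print(dropped)  # each entry is either 0.0 or 1 / 0.75
```

At inference time the layer does nothing: all units are kept and no rescaling is applied.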

In [8]:
from tensorflow.keras.layers import BatchNormalization, Dropout

model_2 = Sequential()
model_2.add(Conv2D(filters=32, kernel_size= (3,3), activation='relu', strides=1, padding='same', input_shape=(28,28,1)))
model_2.add(BatchNormalization())

model_2.add(Conv2D(filters=32, kernel_size= (3,3), activation='relu', strides=1, padding= 'same'))
model_2.add(BatchNormalization())
model_2.add(Dropout(0.25))

model_2.add(Conv2D(filters=64, kernel_size= (3,3), activation='relu', strides=1, padding='same'))
model_2.add(MaxPooling2D(pool_size=(2,2)))
model_2.add(Dropout(0.25))

model_2.add(Conv2D(filters=128, kernel_size=(3,3), activation='relu', strides=1, padding='same'))
model_2.add(BatchNormalization())
model_2.add(Dropout(0.25))

model_2.add(Flatten())

model_2.add(Dense(512, activation='relu'))
model_2.add(BatchNormalization())
model_2.add(Dropout(0.5))

model_2.add(Dense(128, activation='relu'))
model_2.add(BatchNormalization())
model_2.add(Dropout(0.5))

model_2.add(Dense(10, activation='softmax'))

model_2.summary()

Model: "sequential_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d_8 (Conv2D)           (None, 28, 28, 32)        320       
                                                                 
 batch_normalization_4 (Batc  (None, 28, 28, 32)       128       
 hNormalization)                                                 
                                                                 
 conv2d_9 (Conv2D)           (None, 28, 28, 32)        9248      
                                                                 
 batch_normalization_5 (Batc  (None, 28, 28, 32)       128       
 hNormalization)                                                 
                                                                 
 dropout_2 (Dropout)         (None, 28, 28, 32)        0         
                                                                 
 conv2d_10 (Conv2D)          (None, 28, 28, 64)       

In [9]:
# With BatchNormalization and Dropout

model_2.compile(optimizer='adam', 
              loss='sparse_categorical_crossentropy',
              metrics= ['accuracy'])

model_2.fit(train_images, train_labels, epochs=5, batch_size = 100)

test_loss, test_acc = model_2.evaluate(test_images, test_labels)

print('Test Accuracy: ', test_acc )

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Test Accuracy:  0.920199990272522


We can still improve the model by adding more **epochs**.

**LEARNING RATE DECAY**

It is essential to adjust the learning rate during training. As we saw, early on we can learn in "large" steps, but later, as we approach the minimum of the loss function, it is better to take "small" steps while searching for the global optimum.

For this there is a callback called LearningRateScheduler, which lets us decrease the learning_rate as the network learns.

We have to feed this callback a function that tells it how to reduce the lr: it receives the epoch index and returns the updated learning rate for that epoch.

In [None]:
from tensorflow.keras import optimizers
from tensorflow.keras import callbacks

# Start with lr=0.001
optimizer = optimizers.Adam(learning_rate=0.001)

model_2.compile(optimizer=optimizer,
                loss='sparse_categorical_crossentropy',
                metrics=['accuracy'])

reduce_lr = callbacks.LearningRateScheduler(lambda x: 1e-3 * 0.9 ** x)

model_2.fit(train_images, train_labels, epochs=30, callbacks=[reduce_lr])

test_loss, test_acc = model_2.evaluate(test_images, test_labels)
print("Test Accuracy: ", test_acc)

Epoch 1/30
 139/1875 [=>............................] - ETA: 11:27 - loss: 0.2551 - accuracy: 0.9080
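To see what the schedule passed to LearningRateScheduler does over the 30 epochs, the same formula can be tabulated in plain Python. The function name `exponential_decay` is just a named stand-in for the lambda used in the cell; the callback calls it with the epoch index, starting at 0.

```python
def exponential_decay(epoch, initial_lr=1e-3, decay=0.9):
    """Same rule as the lambda above: lr shrinks by 10% every epoch."""
    return initial_lr * decay ** epoch

schedule = [exponential_decay(epoch) for epoch in range(30)]

print(schedule[0])   # learning rate for the first epoch: 0.001
print(schedule[1])   # about 0.0009
print(schedule[29])  # about 4.7e-05 for the last epoch
```

By the final epoch the step size is roughly 20 times smaller than at the start, which is the "large steps first, small steps later" behaviour described earlier.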

**Callbacks**

A callback lets you customize the model's behavior during training. It is a very useful tool. Examples of callbacks: LearningRateScheduler, ModelCheckpoint.
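The mechanism behind callbacks can be illustrated with a toy sketch in plain Python. Nothing here is the Keras implementation: `Callback`, `BestLossTracker`, and `toy_fit` are hypothetical names, and the losses are made-up numbers. The sketch only shows the pattern, in which the training loop calls a hook at the end of every epoch and each callback decides what to do with the logged metrics.

```python
class Callback:
    """Hypothetical base class: defines the hook the training loop calls."""
    def on_epoch_end(self, epoch, logs):
        pass

class BestLossTracker(Callback):
    """Remembers the epoch with the lowest loss, as a checkpoint callback might."""
    def __init__(self):
        self.best_loss = float("inf")
        self.best_epoch = None

    def on_epoch_end(self, epoch, logs):
        if logs["loss"] < self.best_loss:
            self.best_loss = logs["loss"]
            self.best_epoch = epoch

def toy_fit(losses, callbacks):
    """Stand-in for a training loop: replays precomputed losses epoch by epoch."""
    for epoch, loss in enumerate(losses):
        for callback in callbacks:
            callback.on_epoch_end(epoch, {"loss": loss})

tracker = BestLossTracker()
toy_fit([0.50, 0.35, 0.30, 0.33], callbacks=[tracker])  # made-up losses
print(tracker.best_epoch, tracker.best_loss)  # 2 0.3
```

In Keras the same pattern applies: objects passed in the `callbacks` list of `model.fit` are notified at points such as the end of each epoch.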