<a href="https://colab.research.google.com/github/Black3rror/AI/blob/master/MNIST_sequential_learning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Goal

Show the labels to the model sequentially. We want to find out how the model learns and what it learns.

# Progress

- Show the first label and train the model, then add the next label and train again, and so on. Compare the learning time of each label.
- Each time, show the model only a fixed number of samples of the new label.
- Each time, show the model a fixed number of already-learnt samples plus a fixed number of samples of the new label.
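
The three setups above differ only in which samples are shown at each stage. A minimal sketch of that selection follows, assuming integer (not one-hot) labels; the function name `stage_indices` and the strategy names `"cumulative"`, `"new_only"`, and `"replay"` are hypothetical and are not used elsewhere in this notebook.

```python
import numpy as np

def stage_indices(labels, new_label, strategy, n_new=5400, n_old=5400, seed=0):
    """Return shuffled sample indices for one training stage (hypothetical helper).

    labels:   1-D array of integer class labels.
    strategy: "cumulative" -> every sample of labels 0..new_label
              "new_only"   -> n_new samples of new_label only
              "replay"     -> n_new samples of new_label plus n_old samples
                              drawn at random from the earlier labels
    """
    rng = np.random.default_rng(seed)
    new_idx = np.flatnonzero(labels == new_label)
    old_idx = np.flatnonzero(labels < new_label)

    if strategy == "cumulative":
        idx = np.concatenate((old_idx, new_idx))
    elif strategy == "new_only":
        idx = new_idx[:n_new]
    elif strategy == "replay":
        old_pick = rng.permutation(old_idx)[:n_old]
        idx = np.concatenate((new_idx[:n_new], old_pick))
    else:
        raise ValueError(f"unknown strategy: {strategy}")

    # Shuffle so new and old samples are mixed within each batch
    return rng.permutation(idx)

# Toy example: 3 classes with 10 samples each
toy_labels = np.repeat(np.arange(3), 10)
replay_idx = stage_indices(toy_labels, 2, "replay", n_new=4, n_old=6)
```

The returned indices can be used to index both the images and the labels, which keeps the two arrays aligned after shuffling.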

# Import stuff

In [None]:
import numpy as np    # tf uses np, so we will probably use np in our code too
import tensorflow as tf
from tensorflow import keras

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPooling2D, Activation

import matplotlib.pyplot as plt   # in case you want to show images with pyplot

from tensorflow.keras.callbacks import TensorBoard
import datetime   # to organize TensorBoard files

from tensorflow.keras.utils import to_categorical    # converts integer labels to one-hot vectors

In [None]:
%load_ext tensorboard
!rm -rf ./logs/   # Clear any logs from previous runs

# Initialization

In [None]:
# Stop early if no GPU is available
assert len(tf.config.list_physical_devices('GPU')) > 0, "No GPU found"

In [None]:
(trainX, trainy), (testX, testy) = keras.datasets.mnist.load_data()

# one-hot
trainy = to_categorical(trainy)
testy = to_categorical(testy)

# image should be in shape of (28, 28, 1) not (28, 28)
trainX = np.expand_dims(trainX, -1)
testX = np.expand_dims(testX, -1)

# normalize
trainX = trainX.astype("float32")/255
testX = testX.astype("float32")/255

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz


# Build the model

In [None]:
def build_model(conv1_units = 32, conv2_units = 64, dropout_rate = 0.5):
  model = Sequential()
  model.add(Conv2D(conv1_units, (3,3), activation='relu', input_shape = trainX.shape[1:]))
  model.add(MaxPooling2D((2, 2)))
  model.add(Conv2D(conv2_units, (3,3), activation='relu'))
  model.add(MaxPooling2D((2, 2)))
  model.add(Flatten())
  model.add(Dropout(dropout_rate))
  model.add(Dense(10, activation='softmax'))
  return model

In [None]:
model = build_model()
#model.summary()

# Compile and fit

In [None]:
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['categorical_accuracy'])

In [None]:
# Pool of everything learnt so far; it starts with all samples of label 0
mask = trainy[:, 0] == 1
(trainX_seq, trainy_seq) = (trainX[mask], trainy[mask])
# Stage 0 trains on label 0 only (the slice takes all samples if there are fewer than 6000)
(trainX_combined, trainy_combined) = (trainX_seq[:6000], trainy_seq[:6000])

for i in range(10):
  print("\n\nnew label: ", i)

  if i != 0:
    # 5400 of new label and 5400 of learnt ones
    mask = trainy[:, i] == 1
    (trainX_new, trainy_new) = (trainX[mask], trainy[mask])

    trainX_combined = np.concatenate((trainX_new[:5400], trainX_seq[:5400]), axis=0)
    trainy_combined = np.concatenate((trainy_new[:5400], trainy_seq[:5400]), axis=0)
    # shuffle images and labels with the same permutation
    p = np.random.permutation(len(trainX_combined))
    (trainX_combined, trainy_combined) = (trainX_combined[p], trainy_combined[p])

    # add the new label to the learnt pool and reshuffle it,
    # so the next stage draws a random mix of learnt labels
    trainX_seq = np.concatenate((trainX_seq, trainX_new), axis=0)
    trainy_seq = np.concatenate((trainy_seq, trainy_new), axis=0)
    p = np.random.permutation(len(trainX_seq))
    (trainX_seq, trainy_seq) = (trainX_seq[p], trainy_seq[p])

  # one TensorBoard run per stage, tagged with the label index
  log_dir = "logs/fit/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S") + "-labels-" + str(i)
  tensorboard_callback = TensorBoard(log_dir=log_dir, histogram_freq=1)

  model.fit(trainX_combined, trainy_combined, epochs=5, validation_data=(testX, testy),
            callbacks=[tensorboard_callback])



new label:  0
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


new label:  1
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


new label:  2
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


new label:  3
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


new label:  4
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


new label:  5
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


new label:  6
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


new label:  7
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


new label:  8
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


new label:  9
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


# TensorBoard

In [None]:
%tensorboard --logdir logs/fit    # to run the TensorBoard in the notebook

# Conclusion


Show the first label and train the model, then add the next label and train again, and so on. Compare the learning time of each label.\
Result: Done. The results show that training gets a bit harder for later labels (the opposite of what we expected). The reason is something we did not take into account: when the model is trained on a later label, that label makes up a smaller share of each training batch, so the model needs more iterations to learn it.
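
To put rough numbers on that share, here is a small sketch. It assumes the classes are about equally sized (only approximately true for MNIST) and that every label seen so far is included in full; `new_label_share` is a hypothetical helper name.

```python
def new_label_share(stage):
    """Approximate fraction of a batch that comes from the newest label.

    stage is the 0-based index of the newest label; with balanced classes,
    stage + 1 labels share each batch equally.
    """
    return 1 / (stage + 1)

for stage in range(10):
    print(f"label {stage}: ~{new_label_share(stage):.0%} of each batch")
```

Under these assumptions the newest label fills the whole batch at label 0 but only about a tenth of it at label 9, so the per-batch gradient signal for the new label shrinks as more labels are added.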

Each time, show the model only a fixed number of samples of the new label.\
Result: The results were very interesting. On the training data, results got worse for later labels. A possible reason is that the weights and biases start out near zero but move further from zero after each label is learnt, so each new label has more trouble tuning them. So I think expanding the model may reduce this effect a bit.\
The validation data shows whether the model just learns to output one label or learns the concept (which would be weird in this case), and how much it forgets past labels. The picture there is interesting: the model doesn't learn the concept, and it easily forgets past labels, except for the second label (label 1). I guess that if we narrow the model (fewer neurons in each layer) and/or make it deeper (more layers), it will not show this behavior.
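
One way to quantify how much is forgotten is to compute accuracy separately for each label after every stage. The sketch below assumes one-hot targets and softmax outputs shaped like `testy` and `model.predict(testX)` in this notebook; `per_class_accuracy` is a hypothetical helper, not something the notebook defines.

```python
import numpy as np

def per_class_accuracy(y_true_onehot, y_prob, num_classes=10):
    """Accuracy per label; NaN for labels that do not occur in y_true_onehot."""
    y_true = np.argmax(y_true_onehot, axis=1)
    y_pred = np.argmax(y_prob, axis=1)
    acc = np.full(num_classes, np.nan)
    for c in range(num_classes):
        mask = y_true == c
        if mask.any():
            acc[c] = np.mean(y_pred[mask] == c)
    return acc

# Toy check with 3 classes: true labels 0, 0, 1, 2 and predictions 0, 1, 1, 1
toy_true = np.eye(3)[[0, 0, 1, 2]]
toy_prob = np.eye(3)[[0, 1, 1, 1]]
toy_acc = per_class_accuracy(toy_true, toy_prob, num_classes=3)

# In the notebook this could be called after each model.fit(...), e.g.
# per_class_accuracy(testy, model.predict(testX))
```

A label whose accuracy drops sharply right after a later label is trained would be a direct sign of forgetting.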

Each time, show the model a fixed number of already-learnt samples plus a fixed number of samples of the new label.\
Result: Learning later labels takes more time (the opposite of what we expected). It's weird that, when learning a new label, the model starts from around zero accuracy! Even weirder, for label 9 the model ends up outputting just 9 (with 40% probability) and reaches an accuracy of around 50%. I currently have no idea why; this can be investigated further to see how the model works.
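
One check that may help here, offered as a possible reading rather than a verified explanation: in this setup half of each stage's training set is the new label, so a model that always outputs the new label would score 50% on that training set. If the figure of around 50% is training accuracy, it would be consistent with such a collapse. The sketch below uses synthetic integer labels, and both helper names are hypothetical.

```python
import numpy as np

def constant_prediction_accuracy(labels, constant):
    """Accuracy of a predictor that always outputs `constant`."""
    return float(np.mean(labels == constant))

def predicted_label_counts(y_prob, num_classes=10):
    """How often each label is the argmax of the model output."""
    return np.bincount(np.argmax(y_prob, axis=1), minlength=num_classes)

# Synthetic stage for label 9: 5400 new samples plus 5400 samples of labels 0..8
stage_labels = np.concatenate((np.full(5400, 9), np.arange(5400) % 9))
baseline = constant_prediction_accuracy(stage_labels, 9)
print(baseline)  # 0.5

# Toy model output that always favours label 9
toy_prob = np.tile(np.eye(10)[9], (4, 1))
toy_counts = predicted_label_counts(toy_prob)
```

Running `predicted_label_counts` on the real model output for the validation set would show directly whether the predictions have collapsed onto a single label.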