High-dimensional data is more common than one might first think. For example, even a low-resolution grayscale image from the famous [MNIST dataset](https://en.wikipedia.org/wiki/MNIST_database):

<a href="https://miro.medium.com/max/245/1*nlfLUgHUEj5vW7WVJpxY-g.png"><img src="https://drive.google.com/uc?export=view&id=1-XrK7beC0beocLB_NufEmKVsBbCRko88" width=250px></a>

(Image source: [Image Classification in 10 Minutes with MNIST Dataset](https://towardsdatascience.com/image-classification-in-10-minutes-with-mnist-dataset-54c35b77a38d))

has 784 dimensions, as it contains intensity information for each of its $28\times 28 = 784$ pixels. Image processing is far from the only area with high-dimensional data. For instance, the frequently used "bag of words" representation of text documents in NLP uses a separate dimension for each word in the data set's vocabulary.
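
To make the "one dimension per vocabulary word" idea concrete, here is a minimal sketch of a bag-of-words encoding. The two-sentence corpus and the helper name `bag_of_words` are made up for this illustration; real NLP pipelines usually add tokenization, lowercasing, and vocabulary pruning on top.

```python
from collections import Counter

# Toy corpus, invented for this illustration.
documents = [
    "the cat sat on the mat",
    "the dog chased the cat",
]

# The vocabulary is the sorted set of all distinct words in the corpus.
vocabulary = sorted({word for doc in documents for word in doc.split()})

def bag_of_words(doc, vocabulary):
    """Return one count per vocabulary word, in vocabulary order."""
    counts = Counter(doc.split())
    return [counts[word] for word in vocabulary]

vectors = [bag_of_words(doc, vocabulary) for doc in documents]
print(vocabulary)
print(vectors)
```

Each document becomes a vector whose length equals the vocabulary size, so a realistic vocabulary of tens of thousands of words gives vectors with tens of thousands of dimensions.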


In this **practical session**, we will:

* create a multi-layer perceptron (MLP) model,
* create a learning rate (**LR**) scheduler manually,
* find the maximum usable **LR** with the `LRFinder` callback,
* and use the One Cycle **LR** policy to get the best results.


# Task1
## Loading the MNIST handwritten digit dataset

We will classify the MNIST handwritten digits with a multi-layer perceptron.

In [None]:
import tensorflow as tf
import numpy as np

import matplotlib.pyplot as plt

In [None]:
# Load the dataset from the Keras API
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

In [None]:
# Flatten the images
# Reshape each image from a 28x28 2D array into a 1D array of 784 values,
# so image_vector_size must equal the number of pixels per image.
# This is needed because the first Dense layer of the perceptron expects a 1D input vector.
image_vector_size = ...
x_train = x_train.reshape() # with image_vector_size
x_test = x_test.reshape() # with image_vector_size
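
As an illustration of what flattening does to the array shape, here is a small sketch on synthetic data rather than on the real dataset. The names `fake_images`, `flat_size`, and `fake_flat` are placeholders chosen for this example so that they do not clash with the notebook's own variables.

```python
import numpy as np

# Synthetic stand-in for a batch of MNIST images: 5 images of 28x28 pixels.
fake_images = np.zeros((5, 28, 28), dtype=np.uint8)

# Each image becomes one row of 28 * 28 = 784 values.
flat_size = 28 * 28
fake_flat = fake_images.reshape(fake_images.shape[0], flat_size)

print(fake_images.shape, "->", fake_flat.shape)
```

The first axis (the number of images) is kept, while the two pixel axes are merged into one.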

# Task2
## Modeling

In this task, create the multi-layer perceptron model, set the initial hyperparameters, and try to predict the handwritten digits from 0 to 9.

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras import backend as K
from tensorflow.keras.optimizers import SGD

In [None]:
# Hyperparameters
learning_rate = 0.1 # The learning rate
momentum = 0.0 # Momentum

# Create the model
def create_model():
    model = Sequential()
    # Hidden layer (this first layer also declares the input shape)
    model.add(Dense())  # add the hidden layer size, activation function, and input_shape
    # Output layer
    model.add(Dense())  # add the output size and activation function

    # Compile the model
    model.compile(loss=...,  # use a cross-entropy loss function
                  optimizer=SGD(learning_rate=learning_rate, momentum=momentum),
                  metrics=['accuracy'])
    return model

model = create_model()
# Print a summary of the layers and parameter counts
model.summary()
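
To show what such a network computes, here is a minimal NumPy sketch of one forward pass through a two-layer perceptron followed by a cross-entropy loss. The hidden size of 32, the ReLU and softmax activations, the random weights, and the synthetic batch are all assumptions made for this illustration; they are not the settings the exercise asks for.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    # Subtract the row-wise maximum for numerical stability.
    shifted = z - z.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

# Hypothetical sizes: 784 inputs, 32 hidden units, 10 classes.
n_inputs, n_hidden, n_classes = 784, 32, 10
W1 = rng.normal(scale=0.01, size=(n_inputs, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.01, size=(n_hidden, n_classes))
b2 = np.zeros(n_classes)

# Synthetic batch: 4 flattened "images" with integer class labels.
x = rng.random((4, n_inputs))
y = np.array([3, 0, 7, 1])

hidden = relu(x @ W1 + b1)          # hidden layer activations
probs = softmax(hidden @ W2 + b2)   # one probability per class

# Cross-entropy for integer labels: mean negative log-probability of the true class.
loss = -np.log(probs[np.arange(len(y)), y]).mean()
print(probs.shape, loss)
```

Note that the MNIST labels loaded above are integers from 0 to 9. In Keras, integer labels pair with `sparse_categorical_crossentropy`, while `categorical_crossentropy` expects one-hot encoded labels.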

In [None]:
# Fitting the model
# In this step we train the model on the training set and measure the loss and accuracy on both the training and validation sets.

batch_size = 128
epoch = 15

results = model.fit(
    x_train, y_train,
    epochs= epoch,
    batch_size = batch_size,
    validation_data = (...), #add the validation set
    verbose = 1 # Show a progress bar per epoch; set verbose=0 to silence Keras output if Colab prints too many lines
)

In [None]:
results.history.keys()

In [None]:
# summarize history for accuracy
plt.figure(figsize = (12,5))
plt.subplot(121)
plt.plot(results.history['accuracy'])
plt.plot(results.history['val_accuracy'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'val'], loc='lower right')

# summarize history for loss
plt.subplot(122)
plt.plot(results.history['loss'])
plt.plot(results.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'val'], loc='upper right')

max_loss = np.max(results.history['loss'])
min_loss = np.min(results.history['loss'])
print("Maximum Loss : {:.4f}".format(max_loss))
print("")
print("Minimum Loss : {:.4f}".format(min_loss))
print("")
print("Loss difference : {:.4f}".format((max_loss - min_loss)))

# Task3
## LR Scheduler

In this task, create an LR scheduler manually, using a `scheduler` function and the `LearningRateScheduler` callback from the Keras API.

In [None]:
from tensorflow.keras.callbacks import LearningRateScheduler

In [None]:
def scheduler(epoch, lr):
  # Fill in the conditions:
  # if the epoch index is less than 10, return the learning rate unchanged;
  # otherwise return the decayed value: lr * tf.math.exp(-0.1)
  if ... :
    return ...
  else:
    return ...
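
To see the shape of the schedule described in the comments, the following sketch simulates it in plain Python without training anything. It uses `math.exp` instead of `tf.math.exp`, and `example_scheduler` and `lr_per_epoch` are names invented for this illustration. It assumes the callback passes a zero-based epoch index together with the current learning rate, which is how Keras's `LearningRateScheduler` calls its schedule function.

```python
import math

def example_scheduler(epoch, lr):
    # Keep the learning rate unchanged for the first 10 epochs (indices 0..9),
    # then shrink it by a factor of exp(-0.1) at every later epoch.
    if epoch < 10:
        return lr
    return lr * math.exp(-0.1)

lr = 0.1  # same starting value as learning_rate above
lr_per_epoch = []
for epoch in range(15):
    lr = example_scheduler(epoch, lr)
    lr_per_epoch.append(lr)

for epoch, value in enumerate(lr_per_epoch):
    print(epoch, round(value, 5))
```

Because each call receives the already decayed value, the decay compounds: after 15 epochs the rate has been multiplied by `exp(-0.1)` five times, i.e. by `exp(-0.5)`.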

In [None]:
callback = tf.keras.callbacks.LearningRateScheduler(...) # pass the scheduler function to the LR scheduler callback

history = model.fit(
          x_train, y_train,
          epochs= 15,
          batch_size = 100,
          validation_data = (...), # use the validation set
          verbose = 1, # Show a progress bar per epoch; set verbose=0 to silence Keras output if Colab prints too many lines
          callbacks = ... # add the LR scheduler callback (Keras expects a list of callbacks)
          )

In [None]:
scheduled_lr = round(..., 5) # read model.optimizer.lr as a NumPy value, round it to 5 decimal places, and print it
print(scheduled_lr)

In [None]:
# summarize history for accuracy
plt.figure(figsize = (12,5))
plt.subplot(121)
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'val'], loc='lower right')

# summarize history for loss
plt.subplot(122)
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'val'], loc='upper right')

max_loss = np.max(history.history['loss'])
min_loss = np.min(history.history['loss'])
print("Maximum Loss : {:.4f}".format(max_loss))
print("")
print("Minimum Loss : {:.4f}".format(min_loss))
print("")
print("Loss difference : {:.4f}".format((max_loss - min_loss)))

# Task4
## LRFinder

There is a ready-made learning rate finder on [GitHub](https://github.com/titu1994/keras-one-cycle/blob/master/clr.py), which we will try out. Please download the file and use it with the help of our hints.

In [None]:
!wget "https://raw.githubusercontent.com/solalatus/IBS_GF_kepzes/main/Big_Data_and_ML/04Hyperparameters/clr.py?token=AHL2UDKITYK3K27TYW4F7XLBDIE2M" -O clr.py

In [None]:
import os
import numpy as np
import warnings

from tensorflow.keras.callbacks import Callback
from tensorflow.keras import backend as K

In this step we import the `LRFinder` callback and try it out. It needs a starting and an ending learning rate. During training it steps through the learning rates from the start value to the end value and measures the loss and accuracy on the training and validation sets. From these measurements we can determine the maximum learning rate.
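
The idea of sweeping the learning rate from a start value to an end value can be sketched as follows. This is only an illustration of an exponentially spaced sweep with one learning rate per batch; it is not the code in `clr.py`, and the function name `exponential_lr_sweep` and the example arguments are assumptions made for this sketch.

```python
import numpy as np

def exponential_lr_sweep(num_samples, batch_size, minimum_lr, maximum_lr):
    """Learning rates for one epoch, spaced evenly on a log scale."""
    num_batches = int(np.ceil(num_samples / batch_size))
    return np.geomspace(minimum_lr, maximum_lr, num=num_batches)

sweep = exponential_lr_sweep(num_samples=60000, batch_size=128,
                             minimum_lr=1e-5, maximum_lr=10.0)
print(len(sweep), sweep[0], sweep[-1])
```

Spacing the values on a log scale means every step multiplies the learning rate by the same factor, so small and large rates get equal coverage within a single epoch.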

In [None]:
from clr import LRFinder

num_samples = ... # number of samples
batch_size = ... # training batch size
minimum_lr = ... # starting lr, e.g. 1e-5
maximum_lr = ... # maximum lr, e.g. 10

lr_callback = LRFinder(num_samples, batch_size,
                       minimum_lr, maximum_lr,
                       validation_data=(x_test, y_test),
                       lr_scale='exp')

In [None]:
# Ensure that number of epochs = 1 when calling fit()
model.fit(x_train, y_train, epochs=1, batch_size=batch_size, callbacks=[lr_callback])

In [None]:
# Plot the learning rate against the loss and pick the best LR
lr_callback.plot_schedule()
max_lr = ... # read it from the recorded history of the LR finder run
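
Reading a value off the curve can be sketched on synthetic data. The loss curve below is invented (it falls, flattens, then diverges, as such curves often do), and dividing the learning rate at the minimum loss by 10 is only one common rule of thumb, assumed here for illustration; it is not something `clr.py` prescribes.

```python
import numpy as np

# Synthetic learning-rate sweep and a made-up loss curve.
lrs = np.geomspace(1e-5, 10.0, num=100)
log_lr = np.log10(lrs)
losses = (2.3
          - 1.8 / (1.0 + np.exp(-3.0 * (log_lr + 2.0)))  # loss drops as the LR grows
          + np.exp(2.0 * (log_lr - 0.5)))                # then diverges for large LRs

# Rule of thumb (an assumption): stay an order of magnitude below the LR at minimum loss.
best_idx = int(np.argmin(losses))
lr_at_min_loss = lrs[best_idx]
suggested_max_lr = lr_at_min_loss / 10.0

print(lr_at_min_loss, suggested_max_lr)
```

The loss at the minimum is already on the edge of divergence, which is why the chosen maximum is usually placed somewhat before it.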

# Task5
## One Cycle LR

After finding the learning rate, we use One Cycle LR, which in this case is a predefined callback from [GitHub](https://github.com/titu1994/keras-one-cycle/blob/master/clr.py). In this step we use the previous result (the maximum LR) as the maximum learning rate of `OneCycleLR`.
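
The shape of a one-cycle schedule can be sketched as below: the learning rate rises linearly to its maximum, falls back, and is then annealed further during the final part of training, while the momentum moves in the opposite direction. This is a simplified illustration, not the implementation in `clr.py`; the function name `one_cycle_sketch`, the starting rate of `max_lr / 10`, the final rate, and the momentum bounds of 0.95 and 0.85 are all assumptions made for this sketch.

```python
import numpy as np

def one_cycle_sketch(total_iters, max_lr, end_percentage=0.1,
                     max_momentum=0.95, min_momentum=0.85):
    """Simplified one-cycle schedule: returns (lrs, momentums) per iteration."""
    initial_lr = max_lr / 10.0     # assumed starting point of the cycle
    final_lr = initial_lr / 100.0  # assumed target of the final annealing
    half_cycle = int(total_iters * (1.0 - end_percentage) / 2)

    lrs, momentums = [], []
    for it in range(total_iters):
        if it <= half_cycle:
            # Phase 1: learning rate climbs, momentum falls.
            frac = it / half_cycle
            lr = initial_lr + frac * (max_lr - initial_lr)
            mom = max_momentum - frac * (max_momentum - min_momentum)
        elif it <= 2 * half_cycle:
            # Phase 2: learning rate falls back, momentum recovers.
            frac = (it - half_cycle) / half_cycle
            lr = max_lr - frac * (max_lr - initial_lr)
            mom = min_momentum + frac * (max_momentum - min_momentum)
        else:
            # Phase 3: final annealing over the last end_percentage of training.
            tail = total_iters - 2 * half_cycle
            frac = (it - 2 * half_cycle) / tail
            lr = initial_lr - frac * (initial_lr - final_lr)
            mom = max_momentum
        lrs.append(lr)
        momentums.append(mom)
    return np.array(lrs), np.array(momentums)

demo_lrs, demo_moms = one_cycle_sketch(total_iters=469, max_lr=0.1)
print(demo_lrs.min(), demo_lrs.max(), demo_moms.min(), demo_moms.max())
```

Here `total_iters=469` corresponds to one epoch over 60,000 samples with a batch size of 128; with other settings the number of iterations changes but the shape stays the same.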

In [None]:
from clr import OneCycleLR

In [None]:
num_samples = ... # number of samples (10k)
batch_size = ... # same as in the base model

lr_manager = OneCycleLR(num_samples=...,
                        batch_size=...,
                        max_lr=...,
                        end_percentage=...) # set end_percentage to 0.1

In [None]:
# Pass lr_manager as a callback,
# then fit the model on the training set again
model.fit(..., ..., epochs=1, batch_size=..., callbacks=...)

In [None]:
#Plot the learning rate and momentum of one cycle LR

print("LR Range : ", min(lr_manager.history['lr']), max(lr_manager.history['lr']))
print("Momentum Range : ", min(lr_manager.history['momentum']), max(lr_manager.history['momentum']))


plt.xlabel('Training Iterations')
plt.ylabel('Learning Rate')
plt.title("CLR")
plt.plot(...) # learning rate
plt.show()

plt.xlabel('Training Iterations')
plt.ylabel('Momentum')
plt.title("CLR")
plt.plot(...) # momentum
plt.show()