# Loss Reduction - Evaluation

In the past few experiments with the bigger dataset, the model was not converging: the loss would not go below 0.5. The objective here is to find a model that can drive the loss towards zero.



In [None]:
import pandas as pd
import matplotlib.pyplot as plt

import seaborn as sns


## Shallow CNN of 2 Layers

```
### Model Architecture
IMG_SIZE = 128
ACTIVATION = 'relu'
KERNEL_INITIALISER = 'glorot_normal'
KERNEL_SIZE = (3,3)
POOL_SIZE = (6,6)

model = tf.keras.Sequential(name='Base')
model.add(tf.keras.layers.Input(shape=(IMG_SIZE,IMG_SIZE,3)))
model.add(tf.keras.layers.experimental.preprocessing.Rescaling(1./255))
model.add(tf.keras.layers.BatchNormalization())            

model.add(tf.keras.layers.Conv2D(filters=10,kernel_size=KERNEL_SIZE,activation=ACTIVATION,kernel_initializer=KERNEL_INITIALISER,padding='same'))
model.add(tf.keras.layers.MaxPooling2D(pool_size = POOL_SIZE,padding = 'same'))
model.add(tf.keras.layers.Conv2D(filters=20,kernel_size=KERNEL_SIZE,activation=ACTIVATION,kernel_initializer=KERNEL_INITIALISER,padding='same'))
model.add(tf.keras.layers.MaxPooling2D(pool_size = POOL_SIZE,padding = 'same'))


model.add(tf.keras.layers.Flatten())    

model.add(tf.keras.layers.Dense(64,activation=ACTIVATION,kernel_initializer=KERNEL_INITIALISER))
model.add(tf.keras.layers.Dense(128,activation=ACTIVATION,kernel_initializer=KERNEL_INITIALISER))
      
model.add(tf.keras.layers.Dense(1,activation='sigmoid'))

model.summary()
```

## Experiment : Shuffle Buffer Size and LR

The data was loaded using `tf.data`, and different combinations of `shuffle()` buffer sizes and learning rates were tried to see their impact on loss reduction.

General Notes :
* **Learning Rate** has no impact on the time taken per training step: it is only a factor by which the model weight updates are scaled, even though the word 'rate' might seem to have a time dimension
* But **Learning Rate** does have an impact on loss reduction. If the LR is smaller, the GD algorithm needs more steps (and so more time) to reach the loss minimum.
* In one **epoch**, the GD algorithm runs a forward pass through all the samples in the dataset and then a backward pass to update the weights. In **mini-batch GD**, the dataset is divided into smaller mini-batches (typically 64, 128 or 256 samples) and the same forward/backward process is applied to each mini-batch
* If `tf.data` is used with infinite repetition, `repeat(-1)`, then `steps_per_epoch` must be specified. It should be large enough for one epoch to cover the samples held in the shuffle buffer, `shuffle(buffer_size)`
* Larger datasets take longer to converge
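
The two learning-rate notes can be illustrated with a toy example. The sketch below is not the CNN from this notebook: it assumes a one-dimensional loss `f(w) = w**2` and plain gradient descent, and `steps_to_converge` is a hypothetical helper introduced only for illustration. Every update costs the same amount of work whatever the LR is, but a smaller LR needs more updates to get close to the minimum.

```python
def steps_to_converge(lr, w0=1.0, tol=1e-3, max_steps=100_000):
    """Count plain gradient-descent updates on f(w) = w**2 until |w| < tol."""
    w = w0
    for step in range(1, max_steps + 1):
        grad = 2.0 * w      # derivative of w**2
        w -= lr * grad      # same amount of work per step, whatever the LR
        if abs(w) < tol:
            return step
    return max_steps        # did not converge within the step budget


for lr in (1e-1, 1e-2, 1e-3):
    print(f"LR={lr:g}: {steps_to_converge(lr)} steps")
```

On this toy loss the step count grows roughly in proportion to `1 / LR`, which is the behaviour the second note describes.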

Observations :

* LR 1e-4
    + Loss is tending towards zero, unlike in previous experiments
    + With a buffer_size of 40K, the model converges, but at a slower rate
    + A smaller buffer size gives higher loss variation between epochs
* Large dataset (140K files ~ 500K samples) with a smaller buffer size (10K) gave a very wiggly loss. Lowering the LR from 1e-4 to 1e-10 did not help.
    + This happened most likely because my steps per epoch were very low [CHECK THIS]
* Increased steps per epoch to 800 (x256) so that one epoch covers the 200K shuffle buffer size
    + With an LR of 1e-4 the model doesn't seem to converge. The loss went lower around epoch 65 but then started climbing again
    + Tried an LR of 1e-10, but the loss starts around 1.05 and would take a long time to come down, so the run was stopped

* LR 1e-5
    + The model with an LR of 1e-5 was not converging fast enough

* The initial loss starts higher for lower learning rates (e.g. 1e-10 started at 0.7, whereas a higher LR started around 0.55) (WHY)
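
One way to reason about the buffer-size observations is to simulate a fixed-size shuffle buffer in plain Python. The sketch below follows the documented behaviour of `shuffle(buffer_size)` (fill a buffer, emit a random element, refill it from the stream), but `buffered_shuffle` is a hypothetical helper for illustration and not TensorFlow code. With a buffer much smaller than the dataset, a sample can only appear a limited number of positions earlier than its original place, so consecutive batches still resemble the on-disk order.

```python
import random


def buffered_shuffle(items, buffer_size, seed=0):
    """Emit items in an order a fixed-size shuffle buffer could produce."""
    rng = random.Random(seed)
    buffer, out = [], []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)          # initial fill of the buffer
            continue
        idx = rng.randrange(buffer_size)
        out.append(buffer[idx])          # emit a random buffered element
        buffer[idx] = item               # replace it with the next input
    rng.shuffle(buffer)                  # drain what is left at end of stream
    out.extend(buffer)
    return out


data = list(range(1000))
for size in (10, 1000):
    shuffled = buffered_shuffle(data, size)
    lead = max(value - pos for pos, value in enumerate(shuffled))
    print(f"buffer_size={size}: a sample appears at most {lead} positions early")
```

If the files are ordered in some meaningful way, a small buffer means each batch is drawn from a narrow slice of the data, which may be one source of the wiggly loss noted above.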


To try :
+ What happens when the data is not shuffled?
+ Look at validation parameters




![reduce-loss-png](./evaluation-data/ReduceLoss-wanb.PNG)

## Experiment : LeakyRelu and Learning Rate Decay

* Initial loss started very high but then stabilised
* After reaching around 0.6 at around 450 epochs, the loss reduced very slowly
* The learning rate at 450 epochs was around 0.01, which was probably still too high?
* Validation loss is averaging 0.6
* The LR at around 800 epochs is 0.001


```
#LR Decay settings
initial_learning_rate = 0.1
lr_scheduler = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate,
    decay_steps=800*10,  # decay every 10 epochs (800 steps per epoch; each step covers 256 samples, so 800 x 256 = 200K shuffle buffer_size)
    decay_rate=0.95,
    staircase=True)
```
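
The learning rates quoted in the notes can be checked against these settings. With `staircase=True`, `ExponentialDecay` is documented as `initial_learning_rate * decay_rate ** floor(step / decay_steps)`. The sketch below re-implements that formula in plain Python, so `staircase_lr` is a hypothetical helper rather than a Keras call, and it assumes 800 optimiser steps per epoch as in the settings above.

```python
import math

INITIAL_LR = 0.1
DECAY_RATE = 0.95
STEPS_PER_EPOCH = 800
DECAY_STEPS = STEPS_PER_EPOCH * 10   # one decay every 10 epochs


def staircase_lr(epoch):
    """Learning rate at the start of `epoch` under staircase exponential decay."""
    step = epoch * STEPS_PER_EPOCH
    return INITIAL_LR * DECAY_RATE ** math.floor(step / DECAY_STEPS)


for epoch in (0, 450, 800):
    print(f"epoch {epoch}: LR = {staircase_lr(epoch):.5f}")
```

Under these assumptions the rate is about 0.0099 at epoch 450 and about 0.0017 at epoch 800, so the schedule takes 450 epochs to drop by one order of magnitude.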

![leaky relu-Initial](./evaluation-data/leaky-relu-initial-loss.png)
![leaky relu](./evaluation-data/leaky-relu2.png)
![leaky relu](./evaluation-data/leaky-relu3.png)

## Experiment : Optimisers




In [6]:
# Model name was not set correctly
# df[480:]['params'].replace('Model_Base','modelBase_with_BN_Dropout',regex=True,inplace=True)
import pandas as pd
df = pd.read_pickle('./evaluation-data/test-varyOptimisers.pkl', compression='infer')
#Split the params column
# df[['model','x','IMGs','x','LR','x','BS']] = df['params'].str.split('_',expand=True)
# df.drop(columns = 'x',inplace = True)


df['params'].str.split('_',expand=True)
df



| epoch | loss | precision | recall | auc | elapsed | params |
|---|---|---|---|---|---|---|
| 0 | 0.819574 | 0.226315 | 0.786526 | 0.485282 | 0.21 | ReduceLoss_Shallow_AllData20Files_DataShuffle2... |
| 1 | 0.813502 | 0.229212 | 0.760717 | 0.478854 | 0.21 | ReduceLoss_Shallow_AllData20Files_DataShuffle2... |
| 0 | 0.566792 | 0.229156 | 0.399007 | 0.495355 | 0.47 | ReduceLoss_Shallow_AllData20Files_DataShuffle2... |
| 1 | 0.54385 | 0.0 | 0.0 | 0.510221 | 0.47 | ReduceLoss_Shallow_AllData20Files_DataShuffle2... |
| 0 | 0.539155 | 0.0 | 0.0 | 0.523841 | 0.72 | ReduceLoss_Shallow_AllData20Files_DataShuffle2... |
| 1 | 0.541593 | 0.0 | 0.0 | 0.532054 | 0.72 | ReduceLoss_Shallow_AllData20Files_DataShuffle2... |
| 0 | 0.538194 | 0.0 | 0.0 | 0.541809 | 0.97 | ReduceLoss_Shallow_AllData20Files_DataShuffle2... |
| 1 | 0.544207 | 0.0 | 0.0 | 0.529684 | 0.97 | ReduceLoss_Shallow_AllData20Files_DataShuffle2... |
| 0 | 0.541666 | 0.0 | 0.0 | 0.538182 | 1.24 | ReduceLoss_Shallow_AllData20Files_DataShuffle2... |
| 1 | 0.538303 | 0.0 | 0.0 | 0.529026 | 1.24 | ReduceLoss_Shallow_AllData20Files_DataShuffle2... |


In [1]:
import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
  try:
    # Currently, memory growth needs to be the same across GPUs
    for gpu in gpus:
      tf.config.experimental.set_memory_growth(gpu, True)
    logical_gpus = tf.config.experimental.list_logical_devices('GPU')
    print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
  except RuntimeError as e:
    # Memory growth must be set before GPUs have been initialized
    print(e)

from tensorflow import keras

import time

from functions_dataCreation import *
from functions_modelArchitectures import *

from tensorflow.keras.callbacks import Callback

import pandas as pd

import wandb
from wandb.keras import WandbCallback

IMG_SIZE = 128

METRICS = [
    #   keras.metrics.TruePositives(name='tp'),
    #   keras.metrics.FalsePositives(name='fp'),
    #   keras.metrics.TrueNegatives(name='tn'),
    #   keras.metrics.FalseNegatives(name='fn'), 
    #   keras.metrics.BinaryAccuracy(name='accuracy'),
      keras.metrics.Precision(name='precision'),
      keras.metrics.Recall(name='recall'),
      keras.metrics.AUC(name='auc'),
]


1 Physical GPUs, 1 Logical GPUs


### The datasets will not be shuffled

In [2]:
train = createIODataset(140,'../data/Train')
test = createIODataset(4,'../data/Test')

train = train.repeat(-1)
# train = train.shuffle(buffer_size=10240*20,reshuffle_each_iteration=True)
train = train.batch(256,drop_remainder=True)
train = train.prefetch(10)

test = test.repeat(-1)
# test = test.shuffle(buffer_size=10240,reshuffle_each_iteration=True)
test = test.batch(1024,drop_remainder=True)
test = test.prefetch(10)



### Model Architecture

In [14]:
ACTIVATION = 'relu'
KERNEL_INITIALISER = 'glorot_normal'
KERNEL_SIZE = (3,3)
POOL_SIZE = (6,6)
# model.add(tf.keras.layers.Dropout(DROPOUT_RATE))

model = tf.keras.Sequential(name='Base')
model.add(tf.keras.layers.Input(shape=(IMG_SIZE,IMG_SIZE,3)))
model.add(tf.keras.layers.experimental.preprocessing.Rescaling(1./255))
model.add(tf.keras.layers.BatchNormalization())            

model.add(tf.keras.layers.Conv2D(filters=10,kernel_size=KERNEL_SIZE,activation=ACTIVATION,kernel_initializer=KERNEL_INITIALISER,padding='same'))
model.add(tf.keras.layers.MaxPooling2D(pool_size = POOL_SIZE,padding = 'same'))
model.add(tf.keras.layers.Conv2D(filters=20,kernel_size=KERNEL_SIZE,activation=ACTIVATION,kernel_initializer=KERNEL_INITIALISER,padding='same'))
model.add(tf.keras.layers.MaxPooling2D(pool_size = POOL_SIZE,padding = 'same'))


model.add(tf.keras.layers.Flatten())    

model.add(tf.keras.layers.Dense(64,activation=ACTIVATION,kernel_initializer=KERNEL_INITIALISER))
model.add(tf.keras.layers.Dense(128,activation=ACTIVATION,kernel_initializer=KERNEL_INITIALISER))
      
model.add(tf.keras.layers.Dense(1,activation='sigmoid'))

model.summary()

Model: "Base"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
rescaling_8 (Rescaling)      (None, 128, 128, 3)       0         
_________________________________________________________________
batch_normalization_6 (Batch (None, 128, 128, 3)       12        
_________________________________________________________________
conv2d_8 (Conv2D)            (None, 128, 128, 10)      280       
_________________________________________________________________
max_pooling2d_7 (MaxPooling2 (None, 22, 22, 10)        0         
_________________________________________________________________
conv2d_9 (Conv2D)            (None, 22, 22, 20)        1820      
_________________________________________________________________
max_pooling2d_8 (MaxPooling2 (None, 4, 4, 20)          0         
_________________________________________________________________
flatten_4 (Flatten)          (None, 320)               0      
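
The shapes and parameter counts in the summary above can be reproduced by hand. The sketch below assumes the usual Keras rules: with `padding='same'`, a pooling layer whose stride equals its pool size (the default) outputs `ceil(size / pool)`, and a `Conv2D` layer has `kh * kw * in_channels * filters + filters` parameters. The helper names are hypothetical.

```python
import math


def pooled_size(size, pool):
    """Spatial size after 'same'-padded pooling with stride equal to the pool size."""
    return math.ceil(size / pool)


def conv2d_params(kernel, in_channels, filters):
    """Weights plus biases of a Conv2D layer with a square kernel."""
    return kernel * kernel * in_channels * filters + filters


size = 128
print("conv1 params:", conv2d_params(3, 3, 10))
size = pooled_size(size, 6)
print("after pool1 :", size)
print("conv2 params:", conv2d_params(3, 10, 20))
size = pooled_size(size, 6)
print("after pool2 :", size)
print("flattened   :", size * size * 20)
```

With the (6,6) pooling the feature map shrinks from 128x128 to 4x4 after only two layers, so just 320 features reach the dense layers.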

In [None]:
LR = 1e-7

wandb.init(project="candlestick-CNN", name = 'ReduceLoss_Shallow_tfDataNOShuffle_' + str(LR) )

df = pd.DataFrame()
start_time = time.time()

checkpoint_path = '../data/callbacks/'
cp_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_path, verbose=1, save_weights_only=True,period=500)   

model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=LR)
                    ,loss=tf.keras.losses.binary_crossentropy
                    ,metrics=METRICS)

history = model.fit(train
                #,batch_size = 128
                ,epochs=5000
                ,steps_per_epoch=100
                ,verbose=0
                ,validation_data=test                
                ,validation_freq = 100
                ,validation_steps = 10
                ,callbacks=[WandbCallback(),cp_callback]
                )

temp = pd.DataFrame(history.history).rename_axis("epoch")
temp['elapsed'] = round((time.time() - start_time)/60,2)
var_params = "Shallow_" + "_LR_"  + str(LR)
temp['params'] = var_params

print("Elapsed time (mins): " + str(round((time.time() - start_time)/60,2)) + " " + var_params)

# model.save('../data/savedmodels') 



## Faster LR
 
* Train a CNN for 10000 epochs with LR of 1e-4
* Changed glorot_normal for relu activation
* Added BN for first image layer

For a batch size of 256, the model takes 2114 steps to complete 1 epoch, so the total dataset size is approx 500K images (2114 x 256 = 541,184). Time taken per epoch is 198s, i.e. about 3.3 mins.

Let's do 200 steps per epoch, i.e. 51,200 samples, or roughly 10% of the dataset per epoch.
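
The arithmetic behind these numbers, as a small sketch. It uses only the figures quoted above (batch size 256, 2114 steps and 198s for a full pass over the data, 200 steps for the shortened epoch); the variable names are illustrative.

```python
BATCH_SIZE = 256
STEPS_FULL_PASS = 2114      # steps needed to see every sample once
SECONDS_FULL_PASS = 198     # measured time for one full pass
STEPS_PER_EPOCH = 200       # shortened "epoch" proposed above

total_samples = BATCH_SIZE * STEPS_FULL_PASS
samples_per_epoch = BATCH_SIZE * STEPS_PER_EPOCH
fraction = samples_per_epoch / total_samples

print(f"total samples      : {total_samples}")
print(f"samples per epoch  : {samples_per_epoch}")
print(f"fraction of dataset: {fraction:.1%}")
print(f"full pass          : {SECONDS_FULL_PASS / 60:.1f} min")
```

Because the dataset is batched with `drop_remainder=True`, the true sample count may be slightly above `total_samples`, by at most one batch.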




In [5]:

wandb.init(project="candlestick-CNN", name = 'Deeper tfData 1e-7 100steps per epoch shuffle size 200K Deeper CNN NO Shuffle' )

modelBase = {}
modelBase['name'] = 'Base'
modelBase['inputShape'] = (IMG_SIZE,IMG_SIZE,3)
modelBase['activation'] = 'relu'

modelBase['convLayerMultiplier'] = 1

modelBase['poolingLayer'] = 'MaxPooling2D'
modelBase['padding'] = 'same'

modelBase['denseLayers'] = 2
modelBase['units'] = 128
modelBase['activation'] = 'relu'

#with Dropout and BN
modelBase_with_Dropout = modelBase.copy()
modelBase_with_Dropout['name'] = 'modelBase_with_Dropout'
modelBase_with_Dropout['batchnormalization'] = False
# modelBase_with_Dropout['dropout'] = 0.00001

modelBase_with_Dropout['kernelSize'] = (3,3)
modelBase_with_Dropout['filters'] = [10,15,20,25,30,35,40,45,50,55,60]
modelBase_with_Dropout['poolSize'] = (6,6)

lr = 1e-7

df = pd.DataFrame()

start_time = time.time()

checkpoint_path = '../data/callbacks/'

# Create a callback that saves the model's weights every 500 epochs
cp_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_path, 
    verbose=1, 
    save_weights_only=True,
    period=500)   

model = createCNN(modelBase_with_Dropout)

model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=lr)
        ,loss=tf.keras.losses.binary_crossentropy
        ,metrics=METRICS)

history = model.fit(train
                #,batch_size = 128
                ,epochs=5000
                ,steps_per_epoch=100
                ,verbose=0
                ,validation_data=test                
                ,validation_freq = 100
                ,validation_steps = 10
                ,callbacks=[WandbCallback(),cp_callback]
                )

# temp = pd.DataFrame(history.history).rename_axis("epoch")
# temp['elapsed'] = round((time.time() - start_time)/60,2)
# var_params = "Deeper_" + "_LR_"  + str(lr)
# temp['params'] = var_params

# print("Elapsed time " + str(round((time.time() - start_time)/60,2)) + var_params)

# model.save('../data/savedmodels') 



Failed to query for notebook name, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable
wandb: Currently logged in as: amitagni (use `wandb login --relogin` to force relogin)
wandb: wandb version 0.10.9 is available!  To upgrade, please run:
wandb:  $ pip install wandb --upgrade



Epoch 00500: saving model to ../data/callbacks/

Epoch 01000: saving model to ../data/callbacks/

Epoch 01500: saving model to ../data/callbacks/

Epoch 02000: saving model to ../data/callbacks/

Epoch 02500: saving model to ../data/callbacks/

Epoch 03000: saving model to ../data/callbacks/

Epoch 03500: saving model to ../data/callbacks/

Epoch 04000: saving model to ../data/callbacks/

Epoch 04500: saving model to ../data/callbacks/

Epoch 05000: saving model to ../data/callbacks/


In [6]:
1024*100*6

614400