<a href="https://colab.research.google.com/github/CodeMonkey01/DataMiningI/blob/main/ANN/Option_A/ANN_with_BERT_add_BERT_seperator.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# ANN with BERT
In this notebook I solve the sentiment classification task with an ANN built on top of pretrained BERT layers.

This notebook shows the preprocessing and hyperparameter selection process. 

In [1]:
import tensorflow as tf

if tf.test.gpu_device_name():
    print('Default GPU Device:{}'.format(tf.test.gpu_device_name()))
else:
    print("Please install GPU version of TF")

Default GPU Device:/device:GPU:0


In [2]:
import pandas as pd
import tensorflow_hub as hub
import tensorflow_text as text  # registers the custom ops required by the BERT preprocessing model
from sklearn.preprocessing import LabelEncoder

In [3]:
df_raw = pd.read_csv('IMDB_Dataset.csv')
df_raw.describe()

Unnamed: 0,review,sentiment
count,50000,50000
unique,49582,2
top,Loved today's show!!! It was a variety and not...,positive
freq,5,25000


In [4]:
df_raw.drop_duplicates(inplace=True)

In [5]:
df_raw[df_raw["sentiment"]== "positive"]

Unnamed: 0,review,sentiment
0,One of the other reviewers has mentioned that ...,positive
1,A wonderful little production. <br /><br />The...,positive
2,I thought this was a wonderful way to spend ti...,positive
4,"Petter Mattei's ""Love in the Time of Money"" is...",positive
5,"Probably my all-time favorite movie, a story o...",positive
...,...,...
49983,"I loved it, having been a fan of the original ...",positive
49985,Imaginary Heroes is clearly the best film of t...,positive
49989,I got this one a few weeks ago and love it! It...,positive
49992,John Garfield plays a Marine who is blinded by...,positive


In [6]:
df_raw[df_raw["sentiment"]== "negative"]

Unnamed: 0,review,sentiment
3,Basically there's a family where a little boy ...,negative
7,"This show was an amazing, fresh & innovative i...",negative
8,Encouraged by the positive comments about this...,negative
10,Phil the Alien is one of those quirky films wh...,negative
11,I saw this movie when I was about 12 when it c...,negative
...,...,...
49994,This is your typical junk comedy.<br /><br />T...,negative
49996,"Bad plot, bad dialogue, bad acting, idiotic di...",negative
49997,I am a Catholic taught in parochial elementary...,negative
49998,I'm going to have to disagree with the previou...,negative


# Check for imbalance
After removing duplicates the two classes are slightly imbalanced. We balance the dataset by dropping 186 positive reviews, so that the positive and negative classes contain the same number of reviews (24,698 each).

In [7]:
df_raw_pos = df_raw[df_raw["sentiment"] == "positive"]
df_raw_neg = df_raw[df_raw["sentiment"] == "negative"]
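# Drop the surplus positive reviews (the first 186 rows of the positive set).
df_raw_pos = df_raw_pos.iloc[186: , : ]
df_raw = pd.concat([df_raw_pos,df_raw_neg], ignore_index=True)
# Note: sample() returns a new frame and the result is not assigned here, so df_raw
# keeps its positive-then-negative order. train_test_split shuffles the rows later.
df_raw.sample(frac=1).reset_index(drop=True)
# Check the balance now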
print(df_raw["sentiment"].value_counts())

positive    24698
negative    24698
Name: sentiment, dtype: int64
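
The cell above balances the classes by dropping the first 186 positive rows, which depends on the row order of the file. As a minimal sketch of an alternative, assuming a random draw is acceptable, each class can be downsampled to the size of the smallest class. The helper name `downsample_to_balance` and the toy frame are hypothetical and only serve as an illustration.

```python
import pandas as pd


def downsample_to_balance(frame: pd.DataFrame, label_col: str, seed: int = 42) -> pd.DataFrame:
    """Randomly downsample every class to the size of the smallest class."""
    smallest = frame[label_col].value_counts().min()
    balanced = frame.groupby(label_col).sample(n=smallest, random_state=seed)
    # sample() returns a new frame, so the shuffled result has to be returned/assigned.
    return balanced.sample(frac=1, random_state=seed).reset_index(drop=True)


# Toy data (made up for this example): 6 positive and 4 negative reviews.
toy = pd.DataFrame({
    "review": [f"review {i}" for i in range(10)],
    "sentiment": ["positive"] * 6 + ["negative"] * 4,
})

balanced_toy = downsample_to_balance(toy, "sentiment")
print(balanced_toy["sentiment"].value_counts())
```

Fixing `random_state` keeps the draw reproducible between notebook runs.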


# Clean the reviews of HTML tags

In [8]:
df_raw.iloc[16569]
df_raw.iloc[10808]
df_raw.iloc[6610]
df_raw.iloc[20822]

review       Well, I guess I'm emotionally attached to this...
sentiment                                             positive
Name: 20822, dtype: object

In [9]:
#20822, 6610, 10808, 16569
#view the reviews with HTML tags in the sentences.
#pd.set_option('display.max_colwidth', None)
df_raw[df_raw['review'].str.contains(r'<[^<>]*>') == True]

Unnamed: 0,review,sentiment
0,Goldeneye will always go down as one of thee m...,positive
2,"""Hey Babu Riba"" is a film about a young woman,...",positive
4,What can you say about the film White Fire. Am...,positive
5,If you are interested in learning more about t...,positive
6,"Gamers: DR is not a fancy made movie, it's mor...",positive
...,...,...
49390,Robert Colomb has two full-time jobs. He's kno...,negative
49391,This is your typical junk comedy.<br /><br />T...,negative
49392,"Bad plot, bad dialogue, bad acting, idiotic di...",negative
49393,I am a Catholic taught in parochial elementary...,negative


In [10]:
#remove HTML Tags
df_raw['review'] = df_raw['review'].str.replace(r'<[^<>]*>', '', regex=True)
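
Replacing a tag with an empty string joins the text on both sides of it, so two sentences separated only by `<br /><br />` end up without a space between them. The following sketch shows a variant that substitutes a space and then collapses repeated whitespace. The function name `strip_html_tags` and the sample sentence are made up for illustration; the tag pattern is the one used in the cell above.

```python
import re

TAG_PATTERN = re.compile(r"<[^<>]*>")      # same tag pattern as in the notebook cell
WHITESPACE_PATTERN = re.compile(r"\s+")


def strip_html_tags(text: str) -> str:
    """Replace each HTML tag with a space, then collapse runs of whitespace."""
    without_tags = TAG_PATTERN.sub(" ", text)
    return WHITESPACE_PATTERN.sub(" ", without_tags).strip()


# Made-up sample sentence, not taken from the dataset.
sample = "Great movie.<br /><br />Loved it."
cleaned = strip_html_tags(sample)
print(cleaned)
```

With pandas, the same idea could be applied column-wise via `df_raw['review'].apply(strip_html_tags)`.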

In [11]:
# todo --> take (random) sample to speed up training
df_sampled = df_raw.sample(10000)
print(df_sampled["sentiment"].value_counts())

negative    5000
positive    5000
Name: sentiment, dtype: int64


In [12]:
print(df_raw["sentiment"].value_counts())

positive    24698
negative    24698
Name: sentiment, dtype: int64


# Preprocessing

As described in the paper, we tried out different preprocessing approaches for BERT.

1.   Stop word removal
2.   Stemming
3.   Separator / special tokens

We tried these preprocessing steps independently and in combination. After testing each variant, we found that option 3 ("Separator / special tokens") works best for BERT on this binary classification problem. The code for the other preprocessing methods is listed below.


In [13]:
ex = df_raw.iloc[1:3]

In [14]:
ex

Unnamed: 0,review,sentiment
1,Helena Bonham Carter is the center of this mov...,positive
2,"""Hey Babu Riba"" is a film about a young woman,...",positive


In [15]:
ex["sentiment"] = ex["sentiment"].apply(lambda x: 1 if x == "positive" else 0)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  ex["sentiment"] = ex["sentiment"].apply(lambda x: 1 if x == "positive" else 0)


In [16]:
ex

Unnamed: 0,review,sentiment
1,Helena Bonham Carter is the center of this mov...,1
2,"""Hey Babu Riba"" is a film about a young woman,...",1
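
The `SettingWithCopyWarning` shown above appears because `ex` was created by slicing `df_raw`, so pandas cannot tell whether the assignment should also affect the original frame. A minimal sketch of the usual remedy, assuming an independent copy is what is wanted, is to call `.copy()` on the slice first. The toy frame and the name `ex_copy` are hypothetical.

```python
import pandas as pd

# Toy frame (made up for this example).
toy = pd.DataFrame({
    "review": ["great film", "dull plot", "loved it"],
    "sentiment": ["positive", "negative", "positive"],
})

# An explicit copy makes the slice independent of the original frame.
ex_copy = toy.iloc[1:3].copy()
ex_copy["sentiment"] = ex_copy["sentiment"].map({"positive": 1, "negative": 0})

print(ex_copy)
print(toy["sentiment"].tolist())  # the original labels are left untouched
```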


In [17]:
df = df_raw

In [18]:
df['sentiment'] = df['sentiment'].transform(lambda x: 0 if (x == "negative") else 1)
df['sentiment']

0        1
1        1
2        1
3        1
4        1
        ..
49391    0
49392    0
49393    0
49394    0
49395    0
Name: sentiment, Length: 49396, dtype: int64

In [19]:
df_raw['sentiment']

0        1
1        1
2        1
3        1
4        1
        ..
49391    0
49392    0
49393    0
49394    0
49395    0
Name: sentiment, Length: 49396, dtype: int64

In [20]:
df_sampled['sentiment'] = df_sampled['sentiment'].transform(lambda x: 1 if (x == "positive") else 0)
df_sampled['sentiment']

34443    0
23938    1
23242    1
47497    0
6310     1
        ..
40036    0
18942    1
7922     1
252      1
42023    0
Name: sentiment, Length: 10000, dtype: int64

# Option 1 
Stop word removal

In [21]:
df_raw['sentiment']

0        1
1        1
2        1
3        1
4        1
        ..
49391    0
49392    0
49393    0
49394    0
49395    0
Name: sentiment, Length: 49396, dtype: int64

In [22]:
df

Unnamed: 0,review,sentiment
0,Goldeneye will always go down as one of thee m...,1
1,Helena Bonham Carter is the center of this mov...,1
2,"""Hey Babu Riba"" is a film about a young woman,...",1
3,This movie was a fairly entertaining comedy ab...,1
4,What can you say about the film White Fire. Am...,1
...,...,...
49391,This is your typical junk comedy.There are alm...,0
49392,"Bad plot, bad dialogue, bad acting, idiotic di...",0
49393,I am a Catholic taught in parochial elementary...,0
49394,I'm going to have to disagree with the previou...,0


In [23]:
df[df['sentiment'] == 1]

Unnamed: 0,review,sentiment
0,Goldeneye will always go down as one of thee m...,1
1,Helena Bonham Carter is the center of this mov...,1
2,"""Hey Babu Riba"" is a film about a young woman,...",1
3,This movie was a fairly entertaining comedy ab...,1
4,What can you say about the film White Fire. Am...,1
...,...,...
24693,"I loved it, having been a fan of the original ...",1
24694,Imaginary Heroes is clearly the best film of t...,1
24695,I got this one a few weeks ago and love it! It...,1
24696,John Garfield plays a Marine who is blinded by...,1


In [24]:
df[df['sentiment'] == 0]

Unnamed: 0,review,sentiment
24698,Basically there's a family where a little boy ...,0
24699,"This show was an amazing, fresh & innovative i...",0
24700,Encouraged by the positive comments about this...,0
24701,Phil the Alien is one of those quirky films wh...,0
24702,I saw this movie when I was about 12 when it c...,0
...,...,...
49391,This is your typical junk comedy.There are alm...,0
49392,"Bad plot, bad dialogue, bad acting, idiotic di...",0
49393,I am a Catholic taught in parochial elementary...,0
49394,I'm going to have to disagree with the previou...,0


In [25]:
# Remove stop words
from gensim.parsing.preprocessing import remove_stopwords

df['stop_word']=df['review'].apply(lambda x: remove_stopwords(x))

# Option 2
Stemming

In [26]:

import re
import nltk
from nltk.stem import PorterStemmer

token_pattern = re.compile(r"(?u)\b\w\w+\b")

ps = PorterStemmer()

nltk.download('punkt')
nltk.download('stopwords')

df['stemmed']=df['review'].apply(lambda x: ' '.join([ps.stem(y) for y in token_pattern.findall(x)]))

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\Ayham\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\Ayham\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


# Option 3 (WE USED THIS)
Separator and special tokens

In [27]:
from transformers import BertTokenizer
tokenizer = BertTokenizer.from_pretrained("bert-base-cased")

MAX_LEN = 128

def add_bert_special_tokens(review):
    # Encode with [CLS]/[SEP], truncate to MAX_LEN tokens, and join the WordPiece tokens back into one string.
    token_ids = tokenizer.encode(review, add_special_tokens=True, max_length=MAX_LEN, truncation=True)
    return " ".join(tokenizer.convert_ids_to_tokens(token_ids))

df_sampled['bert_preprocessed'] = df_sampled['review'].apply(add_bert_special_tokens)
df['bert_preprocessed'] = df['review'].apply(add_bert_special_tokens)



# Option 4
Stemming + Stop word removal

In [28]:
# Stemming
import nltk
from nltk.stem import PorterStemmer
from nltk.corpus import stopwords
import re

token_pattern = re.compile(r"(?u)\b\w\w+\b")

ps = PorterStemmer()

nltk.download('punkt')
nltk.download('stopwords')

my_stopwords = set(stopwords.words('english'))

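# Note: the NLTK stop word list is lowercase, so capitalized stop words such as "This" pass this filter.
df['stemmed_stop_removed']=df['review'].apply(lambda x: ' '.join([ps.stem(y) for y in token_pattern.findall(x) if y not in my_stopwords]))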

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\Ayham\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\Ayham\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
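
The stop word filter above compares each token as written, so a capitalized stop word at the start of a sentence is kept. The sketch below illustrates a case-insensitive comparison. It uses a tiny hand-written stop word set instead of the NLTK list so that it runs without downloads; `TOY_STOPWORDS` and the function name are hypothetical.

```python
import re

TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")  # same token pattern as in the notebook

# Tiny illustrative list, NOT the NLTK stop word list.
TOY_STOPWORDS = {"this", "was", "an", "the", "in"}


def remove_stopwords_case_insensitive(text: str) -> list:
    """Keep tokens whose lowercase form is not in the stop word set."""
    return [tok for tok in TOKEN_PATTERN.findall(text) if tok.lower() not in TOY_STOPWORDS]


filtered = remove_stopwords_case_insensitive("This show was an amazing, fresh idea")
print(filtered)
```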


# Info
Because preprocessing took a long time to run, the code for the other methods (stop word removal, stemming) is included only to show how we did it. Their results are NOT used, because in our iterative process we found that separator and special tokens (Option 3) work best by far.

In [29]:
df[df['sentiment'] == 0]

Unnamed: 0,review,sentiment,stop_word,stemmed,bert_preprocessed,stemmed_stop_removed
24698,Basically there's a family where a little boy ...,0,Basically there's family little boy (Jake) thi...,basic there famili where littl boy jake think ...,[CLS] Basic ##ally there ' s a family where a ...,basic famili littl boy jake think zombi closet...
24699,"This show was an amazing, fresh & innovative i...",0,"This amazing, fresh & innovative idea 70's air...",thi show wa an amaz fresh innov idea in the 70...,"[CLS] This show was an amazing , fresh & innov...",thi show amaz fresh innov idea 70 first air th...
24700,Encouraged by the positive comments about this...,0,Encouraged positive comments film I looking fo...,encourag by the posit comment about thi film o...,[CLS] En ##co ##urage ##d by the positive comm...,encourag posit comment film look forward watch...
24701,Phil the Alien is one of those quirky films wh...,0,Phil Alien quirky films humour based oddness a...,phil the alien is one of those quirki film whe...,[CLS] Phil the Alien is one of those q ##ui ##...,phil alien one quirki film humour base around ...
24702,I saw this movie when I was about 12 when it c...,0,I saw movie I 12 came out. I recall scariest s...,saw thi movi when wa about 12 when it came out...,[CLS] I saw this movie when I was about 12 whe...,saw movi 12 came recal scariest scene big bird...
...,...,...,...,...,...,...
49391,This is your typical junk comedy.There are alm...,0,This typical junk comedy.There laughs. No genu...,thi is your typic junk comedi there are almost...,[CLS] This is your typical junk comedy . There...,thi typic junk comedi there almost laugh no ge...
49392,"Bad plot, bad dialogue, bad acting, idiotic di...",0,"Bad plot, bad dialogue, bad acting, idiotic di...",bad plot bad dialogu bad act idiot direct the ...,"[CLS] Bad plot , bad dialogue , bad acting , i...",bad plot bad dialogu bad act idiot direct anno...
49393,I am a Catholic taught in parochial elementary...,0,I Catholic taught parochial elementary schools...,am cathol taught in parochi elementari school ...,[CLS] I am a Catholic taught in par ##och ##ia...,cathol taught parochi elementari school nun ta...
49394,I'm going to have to disagree with the previou...,0,I'm going disagree previous comment Maltin one...,go to have to disagre with the previou comment...,[CLS] I ' m going to have to disagree with the...,go disagre previou comment side maltin one thi...


In [30]:
df[df['sentiment'] == 1]

Unnamed: 0,review,sentiment,stop_word,stemmed,bert_preprocessed,stemmed_stop_removed
0,Goldeneye will always go down as one of thee m...,1,Goldeneye thee legendary games VG history. The...,goldeney will alway go down as one of thee mos...,[CLS] Golden ##eye will always go down as one ...,goldeney alway go one thee legendari game vg h...
1,Helena Bonham Carter is the center of this mov...,1,Helena Bonham Carter center movie. She plays r...,helena bonham carter is the center of thi movi...,[CLS] Helena Bon ##ham Carter is the center of...,helena bonham carter center movi she play role...
2,"""Hey Babu Riba"" is a film about a young woman,...",1,"""Hey Babu Riba"" film young woman, Mariana (nic...",hey babu riba is film about young woman marian...,"[CLS] "" Hey Babu R ##iba "" is a film about a y...",hey babu riba film young woman mariana nicknam...
3,This movie was a fairly entertaining comedy ab...,1,This movie fairly entertaining comedy Murphy's...,thi movi wa fairli entertain comedi about murp...,[CLS] This movie was a fairly entertaining com...,thi movi fairli entertain comedi murphi law ap...
4,What can you say about the film White Fire. Am...,1,What film White Fire. Amazing? Fantastic? Dist...,what can you say about the film white fire ama...,[CLS] What can you say about the film White Fi...,what say film white fire amaz fantast disturb ...
...,...,...,...,...,...,...
24693,"I loved it, having been a fan of the original ...",1,"I loved it, having fan original series, I wond...",love it have been fan of the origin seri have ...,"[CLS] I loved it , having been a fan of the or...",love fan origin seri alway wonder back stori w...
24694,Imaginary Heroes is clearly the best film of t...,1,Imaginary Heroes clearly best film year. It co...,imaginari hero is clearli the best film of the...,[CLS] I ##ma ##gin ##ary Heroes is clearly the...,imaginari hero clearli best film year it compl...
24695,I got this one a few weeks ago and love it! It...,1,"I got weeks ago love it! It's modern, light fi...",got thi one few week ago and love it it modern...,[CLS] I got this one a few weeks ago and love ...,got one week ago love it modern light fill tru...
24696,John Garfield plays a Marine who is blinded by...,1,John Garfield plays Marine blinded grenade fig...,john garfield play marin who is blind by grena...,[CLS] John Garfield plays a Marine who is blin...,john garfield play marin blind grenad fight gu...
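
The `bert_preprocessed` column wraps each review in `[CLS]` and `[SEP]` and truncates it to `MAX_LEN` tokens, where the two special tokens count towards that limit. The following sketch mimics only this wrapping and truncation step on a list of tokens that has already been split. It does not perform WordPiece tokenization, and the function name `wrap_with_special_tokens` is hypothetical.

```python
MAX_LEN = 8  # tiny limit for illustration; the notebook uses 128


def wrap_with_special_tokens(tokens, max_len=MAX_LEN):
    """Truncate the token list and add [CLS]/[SEP] so the result has at most max_len tokens."""
    body = tokens[: max_len - 2]  # reserve two positions for the special tokens
    return ["[CLS]"] + body + ["[SEP]"]


# Whitespace split as a stand-in for a real tokenizer.
toy_tokens = "The movie itself was good not exceptional .".split()
wrapped = wrap_with_special_tokens(toy_tokens)
print(" ".join(wrapped))
```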


In [31]:
df["sentiment"]

0        1
1        1
2        1
3        1
4        1
        ..
49391    0
49392    0
49393    0
49394    0
49395    0
Name: sentiment, Length: 49396, dtype: int64

In [32]:
from sklearn.model_selection import train_test_split

# Create train test split for training

X_train, X_test, y_train, y_test = train_test_split(df['bert_preprocessed'], df['sentiment'], test_size=0.4)
X_train_sampled, X_test_sampled, y_train_sampled, y_test_sampled = train_test_split(df_sampled['bert_preprocessed'], df_sampled['sentiment'], test_size=0.4)
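
The split above draws a new random partition on every run. As a hedged sketch of a reproducible variant, `train_test_split` accepts `random_state` to fix the shuffle and `stratify` to keep the class ratio identical in both parts. The toy lists and the seed value are assumptions made for this example.

```python
from sklearn.model_selection import train_test_split

# Toy data (made up): 10 positive and 10 negative labels.
texts = [f"review {i}" for i in range(20)]
labels = [1] * 10 + [0] * 10

X_tr, X_te, y_tr, y_te = train_test_split(
    texts,
    labels,
    test_size=0.4,       # same 60/40 split as in the notebook
    stratify=labels,     # keep the class ratio in both parts
    random_state=42,     # arbitrary seed for reproducibility
)

print(len(X_tr), len(X_te), sum(y_tr), sum(y_te))
```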

# BERT

In [33]:
bert_preprocess = hub.KerasLayer("https://tfhub.dev/tensorflow/bert_en_cased_preprocess/3")
bert_encoder = hub.KerasLayer("https://tfhub.dev/tensorflow/bert_en_cased_L-12_H-768_A-12/4")



In [34]:
def get_sentence_embeding(sentences):
    preprocessed_text = bert_preprocess(sentences)
    return bert_encoder(preprocessed_text)['pooled_output']

## Test embedding
Test the sentence embedding (the `pooled_output` of the pretrained BERT model) with a real review from the dataset.

In [35]:
test_sentence = df["bert_preprocessed"][1]
print("Test sentence:")
print(test_sentence)
print("Test sentence (word embedding):")
print(get_sentence_embeding([test_sentence]))

Test sentence:
[CLS] Helena Bon ##ham Carter is the center of this movie . She plays her role almost im ##mobile in a wheelchair but still brings across her traditional intensity . Kenneth B ##rana ##gh was to ##ler ##able . The movie itself was good not exceptional . If you are a Helena Bon ##ham Carter fan it is worth seeing . [SEP]
Test sentence (word embedding):
tf.Tensor(
[[-6.56708479e-01  3.95552158e-01  9.99649048e-01 -9.87729192e-01
   9.13122892e-01  8.97750378e-01  9.69404042e-01 -9.86160159e-01
  -9.43355262e-01 -4.80239183e-01  9.62758183e-01  9.96482968e-01
  -9.97055233e-01 -9.99594569e-01  6.12984419e-01 -9.47933674e-01
   9.76224363e-01 -4.58532512e-01 -9.99901175e-01 -4.75078076e-01
  -7.04436123e-01 -9.99722660e-01  1.15946516e-01  9.64269042e-01
   9.44839001e-01  7.78565835e-03  9.79944229e-01  9.99921799e-01
   6.98229074e-01  2.65174329e-01  2.78167695e-01 -9.79680657e-01
   5.89254439e-01 -9.98143852e-01  2.72924542e-01  3.86826158e-01
   4.77616787e-01 -1.47951

# Build model

```
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
==================================================================================================
 text (InputLayer)              [(None,)]            0           []                               
                                                                                                  
 keras_layer (KerasLayer)       {'input_mask': (Non  0           ['text[0][0]']                   
                                e, 128),                                                          
                                 'input_type_ids':                                                
                                (None, 128),                                                      
                                 'input_word_ids':                                                
                                (None, 128)}                                                      
                                                                                                  
 keras_layer_1 (KerasLayer)     {'encoder_outputs':  108310273   ['keras_layer[1][0]',            
                                 [(None, 128, 768),               'keras_layer[1][1]',            
                                 (None, 128, 768),                'keras_layer[1][2]']            
                                 (None, 128, 768),                                                
                                 (None, 128, 768),                                                
                                 (None, 128, 768),                                                
                                 (None, 128, 768),                                                
                                 (None, 128, 768),                                                
                                 (None, 128, 768),                                                
                                 (None, 128, 768),                                                
                                 (None, 128, 768),                                                
                                 (None, 128, 768),                                                
                                 (None, 128, 768)],                                               
                                 'sequence_output':                                               
                                 (None, 128, 768),                                                
                                 'pooled_output': (                                               
                                None, 768),                                                       
                                 'default': (None,                                                
                                768)}                                                             
                                                                                                  
 dropout (Dropout)              (None, 768)          0           ['keras_layer_1[1][13]']         
                                                                                                  
 output (Dense)                 (None, 1)            769         ['dropout[0][0]']                
                                                                                                  
==================================================================================================
Total params: 108,311,042
Trainable params: 769
Non-trainable params: 108,310,273
__________________________________________________________________________________________________
```
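
The parameter counts in the summary can be checked by hand: the dense output layer has one weight per pooled feature plus one bias, and the frozen BERT encoder contributes the rest. A short sketch of that arithmetic, using only the figures from the summary above:

```python
HIDDEN_SIZE = 768          # width of BERT's pooled output
OUTPUT_UNITS = 1           # single sigmoid unit
BERT_PARAMS = 108_310_273  # non-trainable encoder parameters from the summary

# Dense layer: one weight per input feature and output unit, plus one bias per unit.
dense_params = HIDDEN_SIZE * OUTPUT_UNITS + OUTPUT_UNITS
total_params = BERT_PARAMS + dense_params

print(dense_params, total_params)
```

Only the 769 parameters of the output layer are trained; the encoder weights stay fixed.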

In [36]:
def build_model() -> tf.keras.Model:
    # Bert layers
    text_input = tf.keras.layers.Input(shape=(), dtype=tf.string, name='text')
    preprocessed_text = bert_preprocess(text_input)
    outputs = bert_encoder(preprocessed_text)

    # Neural network layers
    l = tf.keras.layers.Dropout(0.1, name="dropout")(outputs['pooled_output'])  # a dropout rate of 0.1 worked best
    l = tf.keras.layers.Dense(1, activation='sigmoid', name="output")(l)  # sigmoid suits a single output unit; softmax over one unit always yields 1.0, which reduced accuracy a lot

    # Use inputs and outputs to construct a final model
    model = tf.keras.Model(inputs=[text_input], outputs = [l])

    #model.summary()

    return model
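
The comment on the output layer notes that softmax performs much worse than sigmoid here. One reason is that softmax normalizes over the units of a layer, so with a single unit it returns 1.0 for any input, while sigmoid maps the logit to a probability between 0 and 1. A small pure-Python sketch (the helper functions are written out for illustration and are not the Keras implementations):

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def softmax(logits):
    # Subtract the maximum for numerical stability before exponentiating.
    shifted = [z - max(logits) for z in logits]
    exps = [math.exp(z) for z in shifted]
    total = sum(exps)
    return [e / total for e in exps]


for logit in (-3.0, 0.0, 2.5):
    # softmax over a single logit is always 1.0, whatever the logit is
    print(logit, round(sigmoid(logit), 4), softmax([logit])[0])
```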

In [37]:
y_train_sampled

6021     1
12149    1
36108    0
47081    0
3800     1
        ..
23573    1
30554    0
33776    0
13238    1
42788    0
Name: sentiment, Length: 6000, dtype: int64

In [38]:
X_train_sampled

6021     [CLS] * WA ##R ##NI ##NG * S ##po ##iler ##s a...
12149    [CLS] i will like to order this movie for the ...
36108    [CLS] Just picked up this film for a b ##uck a...
47081    [CLS] " While traveling in the mountains , a m...
3800     [CLS] With all the " Adult " inn ##uen ##dos i...
                               ...                        
23573    [CLS] This Hong Kong filmed pot ##bo ##iler pa...
30554    [CLS] As I don ' t have a TV , and had never h...
33776    [CLS] It seems a shame that G ##reta G ##ar ##...
13238    [CLS] It ' s worth b ##oning up on the Hindu p...
42788    [CLS] Don ' t let the name of this film de ##c...
Name: bert_preprocessed, Length: 6000, dtype: object

In [39]:
import numpy as np
import itertools
EPOCHS = 5

METRICS = [
      tf.keras.metrics.BinaryAccuracy(name='accuracy'),
      tf.keras.metrics.Precision(name='precision'),
      tf.keras.metrics.Recall(name='recall'),
]

lr_values = np.arange(1e-3, 1e-2, 0.001)
epsilon_values = [1e-6, 1e-5, 1e-4, 1e-3, 1e-2, 1e-1]
batch_values = [32]

values = list(itertools.product(lr_values, epsilon_values, batch_values))

print(f"Combinations: {len(values)}")
for lr, ep, batch in values:
  model: tf.keras.Model = build_model()

  print(f"Try adam learning rate of: {lr} and e: {ep} and batch size: {batch}")

  model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr, epsilon=ep),
    loss=tf.keras.losses.BinaryCrossentropy(),
    metrics=METRICS)
    
  model.fit(X_train_sampled, y_train_sampled, epochs=EPOCHS, batch_size=batch)
  model.evaluate(X_test_sampled, y_test_sampled)

Combinations: 54
Try adam learning rate of: 0.001 and e: 1e-06 and batch size: 32
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Try adam learning rate of: 0.001 and e: 1e-05 and batch size: 32
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Try adam learning rate of: 0.001 and e: 0.0001 and batch size: 32
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Try adam learning rate of: 0.001 and e: 0.001 and batch size: 32
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Try adam learning rate of: 0.001 and e: 0.01 and batch size: 32
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Try adam learning rate of: 0.001 and e: 0.1 and batch size: 32
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Try adam learning rate of: 0.002 and e: 1e-06 and batch size: 32
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Try adam learning rate of: 0.002 and e: 1e-05 and batch size: 32
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Try adam learning rate of: 0.002 and e: 0.0001 and batch size: 32
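
The loop above prints the metrics but does not store them, so the best combination has to be read from the log. The sketch below shows one way to collect the results and select the best row. `train_and_evaluate` is a hypothetical placeholder that returns an arbitrary dummy score, not a real result; in the notebook it would wrap the build, compile, fit and `model.evaluate(..., return_dict=True)` calls.

```python
import itertools

lr_values = [k / 1000 for k in range(1, 10)]  # 0.001 ... 0.009, nine values
epsilon_values = [1e-6, 1e-5, 1e-4, 1e-3, 1e-2, 1e-1]
batch_values = [32]


def train_and_evaluate(lr, epsilon, batch_size):
    """Hypothetical stand-in for training; returns an arbitrary dummy score."""
    return 1.0 - abs(lr - 0.004)


results = []
for lr, epsilon, batch_size in itertools.product(lr_values, epsilon_values, batch_values):
    accuracy = train_and_evaluate(lr, epsilon, batch_size)
    results.append({"lr": lr, "epsilon": epsilon, "batch_size": batch_size, "accuracy": accuracy})

# Pick the combination with the highest (dummy) accuracy.
best = max(results, key=lambda row: row["accuracy"])
print(len(results), best)
```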

# Evaluation of learning rate and epsilon
After running the grid search over learning rate and epsilon values with 20k samples for 5 epochs per combination, we obtained the following results. The log shows one of the runs as an example.




```
Try adam learning rate of: 0.002 and e: 1e-05 and batch size: 32
Epoch 1/5
375/375 [==============================] - 128s 335ms/step - loss: 0.6900 - accuracy: 0.5928 - precision: 0.6110 - recall: 0.5039
Epoch 2/5
375/375 [==============================] - 125s 333ms/step - loss: 0.6534 - accuracy: 0.6147 - precision: 0.6167 - recall: 0.6017
Epoch 3/5
375/375 [==============================] - 125s 333ms/step - loss: 0.6552 - accuracy: 0.6198 - precision: 0.6213 - recall: 0.6094
Epoch 4/5
375/375 [==============================] - 126s 336ms/step - loss: 0.6353 - accuracy: 0.6413 - precision: 0.6431 - recall: 0.6314
Epoch 5/5
375/375 [==============================] - 125s 333ms/step - loss: 0.6263 - accuracy: 0.6453 - precision: 0.6467 - recall: 0.6374
250/250 [==============================] - 84s 331ms/step - loss: 0.6130 - accuracy: 0.6631 - precision: 0.6763 - recall: 0.6201

```
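
The step counts in the log (375 training steps, 250 evaluation steps) match 20k samples with the 60/40 split and a batch size of 32. A quick check of that arithmetic, where the helper name `steps_per_epoch` is made up for this sketch:

```python
import math


def steps_per_epoch(num_samples: int, batch_size: int) -> int:
    """Number of batches needed to see every sample once."""
    return math.ceil(num_samples / batch_size)


TOTAL_SAMPLES = 20_000
TEST_PERCENT = 40   # test_size=0.4 in the notebook
BATCH_SIZE = 32     # also the Keras default batch size for evaluate()

test_samples = TOTAL_SAMPLES * TEST_PERCENT // 100
train_samples = TOTAL_SAMPLES - test_samples

train_steps = steps_per_epoch(train_samples, BATCH_SIZE)
test_steps = steps_per_epoch(test_samples, BATCH_SIZE)
print(train_steps, test_steps)
```

With the 10,000-row sample drawn earlier in the notebook, the same calculation gives 188 training steps per epoch.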

> The best results are reached for a learning rate of 0.007 and an epsilon of 1e-06.

# Evaluate Batch size

Because the best batch size depends on the size of the data set, we had to run this grid search on the 200k data set.

We used the optimized learning rate and epsilon value from above.

In [40]:
import numpy as np
import itertools
EPOCHS = 5

ADAM_LEARNING_RATE = 0.0007  # note: the grid search above reports 0.007 as best; 0.0007 lies outside the searched range
ADAM_EPSILON = 1e-06


METRICS = [
      tf.keras.metrics.BinaryAccuracy(name='accuracy'),
      tf.keras.metrics.Precision(name='precision'),
      tf.keras.metrics.Recall(name='recall'),
]

batch_values = [32, 38, 64, 128, 256, 512]

for batch in batch_values:
  model: tf.keras.Model = build_model()

  print(f"Try batch size: {batch}")

  model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=ADAM_LEARNING_RATE, epsilon=ADAM_EPSILON),
    loss=tf.keras.losses.BinaryCrossentropy(),
    metrics=METRICS)
    
  model.fit(X_train, y_train, epochs=EPOCHS, batch_size=batch)
  model.evaluate(X_test, y_test)

Try batch size: 32
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Try batch size: 38
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Try batch size: 64
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Try batch size: 128
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Try batch size: 256
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Try batch size: 512
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


# Evaluate

After training and evaluating the model for different batch sizes, we obtained the following results.


```
Try batch size: 32
accuracy: 0.9117 - precision: 0.9247 - recall: 0.8959
Try batch size: 48
accuracy: 0.9221 - precision: 0.9399 - recall: 0.9011
Try batch size: 64
accuracy: 0.9227 - precision: 0.9168 - recall: 0.9295
Try batch size: 128
accuracy: 0.8965 - precision: 0.8673 - recall: 0.9370
Try batch size: 256
accuracy: 0.9189 - precision: 0.9068 - recall: 0.9333
Try batch size: 512
accuracy: 0.8771 - precision: 0.8654 - recall: 0.8940
```

Batch size 64 reached the highest accuracy (0.9227), so we use it for the final training.
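
Since precision and recall move in opposite directions across the batch sizes, it can help to combine them into one number. The sketch below computes the F1 score (the harmonic mean of precision and recall) from the values reported above. F1 is an additional check that is not part of the original evaluation.

```python
# (precision, recall) per batch size, copied from the results above.
reported = {
    32: (0.9247, 0.8959),
    48: (0.9399, 0.9011),
    64: (0.9168, 0.9295),
    128: (0.8673, 0.9370),
    256: (0.9068, 0.9333),
    512: (0.8654, 0.8940),
}


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


f1_by_batch = {batch: f1_score(p, r) for batch, (p, r) in reported.items()}
best_batch = max(f1_by_batch, key=f1_by_batch.get)

for batch, f1 in f1_by_batch.items():
    print(f"batch size {batch}: F1 = {f1:.4f}")
print("best batch size by F1:", best_batch)
```

By this measure, batch size 64 also ranks first.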
