<a href="https://colab.research.google.com/github/CodeMonkey01/DataMiningI/blob/main/ANN/Option_A/ANN_with_BERT_add_BERT_seperator.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# ANN with BERT
In this notebook I tried to solve the classification model with an ANN based on pretrained BERT layers.

This notebook shows the preprocessing and hyperparameter selection process. 

In [None]:
!pip install gensim

In [1]:
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_text as text
import pandas as pd
from sklearn.preprocessing import LabelEncoder
import tensorflow as tf



In [2]:
df_raw = pd.read_csv('../IMDB Dataset.csv')
df_raw.describe()

Unnamed: 0,review,sentiment
count,50000,50000
unique,49582,2
top,Loved today's show!!! It was a variety and not...,positive
freq,5,25000


In [3]:
df_raw.drop_duplicates(inplace=True)

In [4]:
df_raw[df_raw["sentiment"]== "positive"]

Unnamed: 0,review,sentiment
0,One of the other reviewers has mentioned that ...,positive
1,A wonderful little production. <br /><br />The...,positive
2,I thought this was a wonderful way to spend ti...,positive
4,"Petter Mattei's ""Love in the Time of Money"" is...",positive
5,"Probably my all-time favorite movie, a story o...",positive
...,...,...
49983,"I loved it, having been a fan of the original ...",positive
49985,Imaginary Heroes is clearly the best film of t...,positive
49989,I got this one a few weeks ago and love it! It...,positive
49992,John Garfield plays a Marine who is blinded by...,positive


In [5]:
df_raw[df_raw["sentiment"]== "negative"]

Unnamed: 0,review,sentiment
3,Basically there's a family where a little boy ...,negative
7,"This show was an amazing, fresh & innovative i...",negative
8,Encouraged by the positive comments about this...,negative
10,Phil the Alien is one of those quirky films wh...,negative
11,I saw this movie when I was about 12 when it c...,negative
...,...,...
49994,This is your typical junk comedy.<br /><br />T...,negative
49996,"Bad plot, bad dialogue, bad acting, idiotic di...",negative
49997,I am a Catholic taught in parochial elementary...,negative
49998,I'm going to have to disagree with the previou...,negative


# Check for imbalance
We balance the dataset by deleting 186 positive sentiment reviews so both the negative and positive sentiments are equal.

In [6]:
df_raw_pos = df_raw[df_raw["sentiment"]== "positive"]
df_raw_neg = df_raw[df_raw["sentiment"]== "negative"]
#drop the difference between the positive and negative set.
df_raw_pos = df_raw_pos.iloc[186: , : ]
df_raw = pd.concat([df_raw_pos,df_raw_neg], ignore_index=True)
df_raw.sample(frac=1).reset_index(drop=True)
#Check for balance now
print(df_raw["sentiment"].value_counts())

positive    24698
negative    24698
Name: sentiment, dtype: int64


# clean the reviews from HTML Tags

In [39]:
df_raw.iloc[16569]
df_raw.iloc[10808]
df_raw.iloc[6610]
df_raw.iloc[20822]

review       > Contrary to most reviews I've read, I didn't feel this followed any of the other rock movies ("Spinal Tap", etc.) The story was more unique, although I feel most people wanted to see the "sex, drugs & rock and roll" vices that the band kept alluding to.<br /><br />> As an American, I knew a few of the actors - Spall, Connelly & Rea. Surprised to find out "Brian"/Bruce Robinson was in Zifferedi's (<sp?) classic "Romeo & Juliet". Guess I'll have to rent that next.<br /><br />> "THE FLAME STILL BURNS" - My wife, who hails from Mexico, didn't follow the English/British language too well, missed some of the jokes (which I dutifully explained) but she cried her eyes out at the concert scene. She loves the song so much now.<br /><br />> Funny that Amazon.com has the soundtrack for $30+usd when I bought the DVD in the bargain bin at Wal-Mart for $5.50usd. Price non-withstanding, I first saw this on late night cable and have been dying to find it ever since.
sentiment            

In [7]:
#20822, 6610, 10808, 16569
#view the reviews with HTML tags in the sentences.
#pd.set_option('display.max_colwidth', None)
df_raw[df_raw['review'].str.contains(r'<[^<>]*>') == True]

Unnamed: 0,review,sentiment
0,Goldeneye will always go down as one of thee m...,positive
2,"""Hey Babu Riba"" is a film about a young woman,...",positive
4,What can you say about the film White Fire. Am...,positive
5,If you are interested in learning more about t...,positive
6,"Gamers: DR is not a fancy made movie, it's mor...",positive
...,...,...
49390,Robert Colomb has two full-time jobs. He's kno...,negative
49391,This is your typical junk comedy.<br /><br />T...,negative
49392,"Bad plot, bad dialogue, bad acting, idiotic di...",negative
49393,I am a Catholic taught in parochial elementary...,negative


In [8]:
#remove HTML Tags
df_raw['review'] = df_raw['review'].str.replace(r'<[^<>]*>', '', regex=True)

In [9]:
# todo --> take (random) sample to speed up training
df_sampled = df_raw.sample(20_000)
df = df_raw
print(df_sampled["sentiment"].value_counts())

negative    10028
positive     9972
Name: sentiment, dtype: int64


In [10]:
print(df_raw["sentiment"].value_counts())

positive    24698
negative    24698
Name: sentiment, dtype: int64


# Preprocessing

As wrote in the paper, we tried out different preprocessing approaches for BERT. 

1.   Stop word removal
2.   Stemming
3.   Seperator / Special tokens

We tried these preprocessing independently and together. After testing each method (or combined with others) we found out that option 3 ("Seperator / Special tokens) is working the best for BERT and this binary classification problem. The code for the other preprocessing methods is listed below.


In [14]:
ex = df_raw.iloc[1:2]

In [17]:
ex

Unnamed: 0,review,sentiment
1,Helena Bonham Carter is the center of this mov...,1


In [16]:
ex["sentiment"] = ex["sentiment"].apply(lambda x: 1 if x == "positive" else 0)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  ex["sentiment"] = ex["sentiment"].apply(lambda x: 1 if x == "positive" else 0)


In [18]:
# Transform class from Boolean to integer value
df_raw['sentiment'].apply(lambda x: 1 if x == "positive" else 0)
df_sampled['sentiment'].apply(lambda x: 1 if x == "positive" else 0)
df['sentiment'].apply(lambda x: 1 if x=="positive" else 0)

0        1
1        1
2        1
3        1
4        1
        ..
49391    0
49392    0
49393    0
49394    0
49395    0
Name: sentiment, Length: 49396, dtype: int64

# Option 1 
Stop word removal

In [19]:
# Remove stop words
from gensim.parsing.preprocessing import remove_stopwords

df['stop_word']=df['review'].apply(lambda x: remove_stopwords(x))

# Option 2
Stemming

In [20]:

import re
import nltk
from nltk.stem import PorterStemmer

token_pattern = re.compile(r"(?u)\b\w\w+\b")

ps = PorterStemmer()

nltk.download('punkt')
nltk.download('stopwords')

df['stemmed']=df['review'].apply(lambda x: ' '.join([ps.stem(y) for y in token_pattern.findall(x)]))

[nltk_data] Downloading package punkt to
[nltk_data]     D:\Users\BKU\ayhamshalaby\AppData(Roaming)\nltk_data..
[nltk_data]     .
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to
[nltk_data]     D:\Users\BKU\ayhamshalaby\AppData(Roaming)\nltk_data..
[nltk_data]     .
[nltk_data]   Package stopwords is already up-to-date!


# Option 3 (WE USED THIS)
Seperator and Special tokens

In [21]:
from transformers import BertTokenizer
tokenizer = BertTokenizer.from_pretrained("bert-base-cased")

MAX_LEN = 128
#pad_to_max_length=True,
df_sampled['bert_preprocessed']=df_sampled['review'].apply(lambda x: " ".join(list(tokenizer.convert_ids_to_tokens(tokenizer.encode(x, add_special_tokens=True, max_length=MAX_LEN, truncation=True)))))
df['bert_preprocessed']=df['review'].apply(lambda x: " ".join(list(tokenizer.convert_ids_to_tokens(tokenizer.encode(x, add_special_tokens=True, max_length=MAX_LEN, truncation=True)))))

  from .autonotebook import tqdm as notebook_tqdm


# Option 4
Stemming + Stop word removal

In [22]:
# Stemming
import nltk
from nltk.stem import PorterStemmer
from nltk.corpus import stopwords
import re

token_pattern = re.compile(r"(?u)\b\w\w+\b")

ps = PorterStemmer()

nltk.download('punkt')
nltk.download('stopwords')

my_stopwords = set(stopwords.words('english'))

df['stemmed_stop_removed']=df['review'].apply(lambda x: ' '.join([ps.stem(y) for y in token_pattern.findall(x) if y not in my_stopwords]))

[nltk_data] Downloading package punkt to
[nltk_data]     D:\Users\BKU\ayhamshalaby\AppData(Roaming)\nltk_data..
[nltk_data]     .
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to
[nltk_data]     D:\Users\BKU\ayhamshalaby\AppData(Roaming)\nltk_data..
[nltk_data]     .
[nltk_data]   Package stopwords is already up-to-date!


# Info
Because the preprocessing part was a long lasting process the code for the different methods (stop word removal, stemming) were just added to show how we actually did it. The results are NOT used, because in our interative process we found out that seperator and special tokens (Option 3) works by far the best. 

In [23]:
df[df['sentiment'] == 0]

Unnamed: 0,review,sentiment,stop_word,stemmed,bert_preprocessed,stemmed_stop_removed


In [24]:
df[df['sentiment'] == 1]

Unnamed: 0,review,sentiment,stop_word,stemmed,bert_preprocessed,stemmed_stop_removed


In [34]:
df["sentiment"]

0        positive
1        positive
2        positive
3        positive
4        positive
           ...   
49391    negative
49392    negative
49393    negative
49394    negative
49395    negative
Name: sentiment, Length: 49396, dtype: object

In [25]:
from sklearn.model_selection import train_test_split

# Create train test split for training

X_train, X_test, y_train, y_test = train_test_split(df['bert_preprocessed'], df['sentiment'], test_size=0.4)
X_train_sampled, X_test_sampled, y_train_sampled, y_test_sampled = train_test_split(df_sampled['bert_preprocessed'], df_sampled['sentiment'], test_size=0.4)

# BERT

In [26]:
bert_preprocess = hub.KerasLayer("https://tfhub.dev/tensorflow/bert_en_cased_preprocess/3")
bert_encoder = hub.KerasLayer("https://tfhub.dev/tensorflow/bert_en_cased_L-12_H-768_A-12/4")



In [27]:
def get_sentence_embeding(sentences):
    preprocessed_text = bert_preprocess(sentences)
    return bert_encoder(preprocessed_text)['pooled_output']

## Test embedding
Test word embedding from pretrained BERT model with a real sentence from dataset.

In [28]:
test_sentence = df["bert_preprocessed"][1]
print("Test sentence:")
print(test_sentence)
print("Test sentence (word embedding):")
print(get_sentence_embeding([test_sentence]))

Test sentence:
[CLS] Helena Bon ##ham Carter is the center of this movie . She plays her role almost im ##mobile in a wheelchair but still brings across her traditional intensity . Kenneth B ##rana ##gh was to ##ler ##able . The movie itself was good not exceptional . If you are a Helena Bon ##ham Carter fan it is worth seeing . [SEP]
Test sentence (word embedding):
tf.Tensor(
[[-6.56708479e-01  3.95552158e-01  9.99649048e-01 -9.87729251e-01
   9.13122892e-01  8.97750497e-01  9.69403923e-01 -9.86159980e-01
  -9.43355262e-01 -4.80239183e-01  9.62758183e-01  9.96482909e-01
  -9.97055173e-01 -9.99594569e-01  6.12984776e-01 -9.47933733e-01
   9.76224363e-01 -4.58532244e-01 -9.99901116e-01 -4.75078374e-01
  -7.04436302e-01 -9.99722838e-01  1.15946457e-01  9.64269102e-01
   9.44839001e-01  7.78565696e-03  9.79944289e-01  9.99921918e-01
   6.98228955e-01  2.65173972e-01  2.78167516e-01 -9.79680657e-01
   5.89254618e-01 -9.98143733e-01  2.72924572e-01  3.86826217e-01
   4.77616817e-01 -1.47951

# Build model

```
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
==================================================================================================
 text (InputLayer)              [(None,)]            0           []                               
                                                                                                  
 keras_layer (KerasLayer)       {'input_mask': (Non  0           ['text[0][0]']                   
                                e, 128),                                                          
                                 'input_type_ids':                                                
                                (None, 128),                                                      
                                 'input_word_ids':                                                
                                (None, 128)}                                                      
                                                                                                  
 keras_layer_1 (KerasLayer)     {'encoder_outputs':  108310273   ['keras_layer[1][0]',            
                                 [(None, 128, 768),               'keras_layer[1][1]',            
                                 (None, 128, 768),                'keras_layer[1][2]']            
                                 (None, 128, 768),                                                
                                 (None, 128, 768),                                                
                                 (None, 128, 768),                                                
                                 (None, 128, 768),                                                
                                 (None, 128, 768),                                                
                                 (None, 128, 768),                                                
                                 (None, 128, 768),                                                
                                 (None, 128, 768),                                                
                                 (None, 128, 768),                                                
                                 (None, 128, 768)],                                               
                                 'sequence_output':                                               
                                 (None, 128, 768),                                                
                                 'pooled_output': (                                               
                                None, 768),                                                       
                                 'default': (None,                                                
                                768)}                                                             
                                                                                                  
 dropout (Dropout)              (None, 768)          0           ['keras_layer_1[1][13]']         
                                                                                                  
 output (Dense)                 (None, 1)            769         ['dropout[0][0]']                
                                                                                                  
==================================================================================================
Total params: 108,311,042
Trainable params: 769
Non-trainable params: 108,310,273
__________________________________________________________________________________________________
```

In [29]:
def build_model() -> tf.keras.Model:
    # Bert layers
    text_input = tf.keras.layers.Input(shape=(), dtype=tf.string, name='text')
    preprocessed_text = bert_preprocess(text_input)
    outputs = bert_encoder(preprocessed_text)

    # Neural network layers
    l = tf.keras.layers.Dropout(0.1, name="dropout")(outputs['pooled_output']) # dropout rate of 0.1 works the best
    l = tf.keras.layers.Dense(1, activation='sigmoid', name="output")(l) # other activation functions like softmax reduce the accuracy by A LOT

    # Use inputs and outputs to construct a final model
    model = tf.keras.Model(inputs=[text_input], outputs = [l])

    #model.summary()

    return model

In [33]:
y_train_sampled

11491    positive
41438    negative
41537    negative
44178    negative
25539    negative
           ...   
44121    negative
21494    positive
47256    negative
25019    negative
48466    negative
Name: sentiment, Length: 12000, dtype: object

In [31]:
X_train_sampled

11491    [CLS] This complicated western was a milestone...
41438    [CLS] When George S ##lu ##izer was told he co...
41537    [CLS] I love low budget independent films and ...
44178    [CLS] The St . Francis ##ville Experiment clai...
25539    [CLS] The Blood ##su ##cker Lead ##s the Dance...
                               ...                        
44121    [CLS] Everyone in this movie tells Ra ##ffy Ca...
21494    [CLS] As a fan of the old Doctor Who , and aft...
47256    [CLS] This mini series , also based on a book ...
25019    [CLS] This review also contains a s ##po ##ile...
48466    [CLS] I received this movie as a gift , I knew...
Name: bert_preprocessed, Length: 12000, dtype: object

In [32]:
import numpy as np
import itertools
EPOCHS = 5

METRICS = [
      tf.keras.metrics.BinaryAccuracy(name='accuracy'),
      tf.keras.metrics.Precision(name='precision'),
      tf.keras.metrics.Recall(name='recall'),
]


lr_values = np.arange(1e-3, 1e-2, 0.001)
epsilon_values = [1e-6, 1e-5, 1e-4, 1e-3, 1e-2, 1e-1]
batch_values = [32]

values = list(itertools.product(lr_values, epsilon_values, batch_values))

print(f"Combinations: {len(values)}")
for lr, ep, batch in values:
  model: tf.keras.Model = build_model()

  print(f"Try adam learning rate of: {lr} and e: {ep} and batch size: {batch}")

  model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr, epsilon=ep),
    loss=tf.keras.losses.BinaryCrossentropy(),
    metrics=METRICS)
    
  model.fit(X_train_sampled, y_train_sampled, epochs=EPOCHS, batch_size=batch)
  model.evaluate(X_test_sampled, y_test_sampled)

Combinations: 54
Try adam learning rate of: 0.001 and e: 1e-06 and batch size: 32
Epoch 1/5


UnimplementedError: Graph execution error:

Detected at node 'binary_crossentropy/Cast' defined at (most recent call last):
    File "c:\Users\ayhamshalaby\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
      return _run_code(code, main_globals, None,
    File "c:\Users\ayhamshalaby\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
      exec(code, run_globals)
    File "D:\Users\BKU\ayhamshalaby\AppData(Roaming)\Python\Python310\site-packages\ipykernel_launcher.py", line 17, in <module>
      app.launch_new_instance()
    File "D:\Users\BKU\ayhamshalaby\AppData(Roaming)\Python\Python310\site-packages\traitlets\config\application.py", line 976, in launch_instance
      app.start()
    File "D:\Users\BKU\ayhamshalaby\AppData(Roaming)\Python\Python310\site-packages\ipykernel\kernelapp.py", line 712, in start
      self.io_loop.start()
    File "D:\Users\BKU\ayhamshalaby\AppData(Roaming)\Python\Python310\site-packages\tornado\platform\asyncio.py", line 215, in start
      self.asyncio_loop.run_forever()
    File "c:\Users\ayhamshalaby\AppData\Local\Programs\Python\Python310\lib\asyncio\base_events.py", line 600, in run_forever
      self._run_once()
    File "c:\Users\ayhamshalaby\AppData\Local\Programs\Python\Python310\lib\asyncio\base_events.py", line 1896, in _run_once
      handle._run()
    File "c:\Users\ayhamshalaby\AppData\Local\Programs\Python\Python310\lib\asyncio\events.py", line 80, in _run
      self._context.run(self._callback, *self._args)
    File "D:\Users\BKU\ayhamshalaby\AppData(Roaming)\Python\Python310\site-packages\ipykernel\kernelbase.py", line 510, in dispatch_queue
      await self.process_one()
    File "D:\Users\BKU\ayhamshalaby\AppData(Roaming)\Python\Python310\site-packages\ipykernel\kernelbase.py", line 499, in process_one
      await dispatch(*args)
    File "D:\Users\BKU\ayhamshalaby\AppData(Roaming)\Python\Python310\site-packages\ipykernel\kernelbase.py", line 406, in dispatch_shell
      await result
    File "D:\Users\BKU\ayhamshalaby\AppData(Roaming)\Python\Python310\site-packages\ipykernel\kernelbase.py", line 730, in execute_request
      reply_content = await reply_content
    File "D:\Users\BKU\ayhamshalaby\AppData(Roaming)\Python\Python310\site-packages\ipykernel\ipkernel.py", line 383, in do_execute
      res = shell.run_cell(
    File "D:\Users\BKU\ayhamshalaby\AppData(Roaming)\Python\Python310\site-packages\ipykernel\zmqshell.py", line 528, in run_cell
      return super().run_cell(*args, **kwargs)
    File "D:\Users\BKU\ayhamshalaby\AppData(Roaming)\Python\Python310\site-packages\IPython\core\interactiveshell.py", line 2885, in run_cell
      result = self._run_cell(
    File "D:\Users\BKU\ayhamshalaby\AppData(Roaming)\Python\Python310\site-packages\IPython\core\interactiveshell.py", line 2940, in _run_cell
      return runner(coro)
    File "D:\Users\BKU\ayhamshalaby\AppData(Roaming)\Python\Python310\site-packages\IPython\core\async_helpers.py", line 129, in _pseudo_sync_runner
      coro.send(None)
    File "D:\Users\BKU\ayhamshalaby\AppData(Roaming)\Python\Python310\site-packages\IPython\core\interactiveshell.py", line 3139, in run_cell_async
      has_raised = await self.run_ast_nodes(code_ast.body, cell_name,
    File "D:\Users\BKU\ayhamshalaby\AppData(Roaming)\Python\Python310\site-packages\IPython\core\interactiveshell.py", line 3318, in run_ast_nodes
      if await self.run_code(code, result, async_=asy):
    File "D:\Users\BKU\ayhamshalaby\AppData(Roaming)\Python\Python310\site-packages\IPython\core\interactiveshell.py", line 3378, in run_code
      exec(code_obj, self.user_global_ns, self.user_ns)
    File "C:\Users\ayhamshalaby\AppData\Local\Temp\ipykernel_6100\149891945.py", line 28, in <module>
      model.fit(X_train_sampled, y_train_sampled, epochs=EPOCHS, batch_size=batch)
    File "c:\Users\ayhamshalaby\AppData\Local\Programs\Python\Python310\lib\site-packages\keras\utils\traceback_utils.py", line 65, in error_handler
      return fn(*args, **kwargs)
    File "c:\Users\ayhamshalaby\AppData\Local\Programs\Python\Python310\lib\site-packages\keras\engine\training.py", line 1564, in fit
      tmp_logs = self.train_function(iterator)
    File "c:\Users\ayhamshalaby\AppData\Local\Programs\Python\Python310\lib\site-packages\keras\engine\training.py", line 1160, in train_function
      return step_function(self, iterator)
    File "c:\Users\ayhamshalaby\AppData\Local\Programs\Python\Python310\lib\site-packages\keras\engine\training.py", line 1146, in step_function
      outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "c:\Users\ayhamshalaby\AppData\Local\Programs\Python\Python310\lib\site-packages\keras\engine\training.py", line 1135, in run_step
      outputs = model.train_step(data)
    File "c:\Users\ayhamshalaby\AppData\Local\Programs\Python\Python310\lib\site-packages\keras\engine\training.py", line 994, in train_step
      loss = self.compute_loss(x, y, y_pred, sample_weight)
    File "c:\Users\ayhamshalaby\AppData\Local\Programs\Python\Python310\lib\site-packages\keras\engine\training.py", line 1052, in compute_loss
      return self.compiled_loss(
    File "c:\Users\ayhamshalaby\AppData\Local\Programs\Python\Python310\lib\site-packages\keras\engine\compile_utils.py", line 265, in __call__
      loss_value = loss_obj(y_t, y_p, sample_weight=sw)
    File "c:\Users\ayhamshalaby\AppData\Local\Programs\Python\Python310\lib\site-packages\keras\losses.py", line 152, in __call__
      losses = call_fn(y_true, y_pred)
    File "c:\Users\ayhamshalaby\AppData\Local\Programs\Python\Python310\lib\site-packages\keras\losses.py", line 272, in call
      return ag_fn(y_true, y_pred, **self._fn_kwargs)
    File "c:\Users\ayhamshalaby\AppData\Local\Programs\Python\Python310\lib\site-packages\keras\losses.py", line 2151, in binary_crossentropy
      y_true = tf.cast(y_true, y_pred.dtype)
Node: 'binary_crossentropy/Cast'
Cast string to float is not supported
	 [[{{node binary_crossentropy/Cast}}]] [Op:__inference_train_function_70092]

# Evaluation of learning rate and epsilon
After running the grid search for different learning rate values with 20k samples for 5 epochs, we get the following results. 




```
    Try adam learning rate of: 0.001 and e: 1e-06 and batch size: 32
    accuracy: 0.8656 - precision: 0.8964 - recall: 0.8237
    Try adam learning rate of: 0.001 and e: 1e-05 and batch size: 32
    accuracy: 0.8634 - precision: 0.9016 - recall: 0.8126
    Try adam learning rate of: 0.001 and e: 0.0001 and batch size: 32
    accuracy: 0.8676 - precision: 0.9010 - recall: 0.8229
    Try adam learning rate of: 0.001 and e: 0.001 and batch size: 32
    accuracy: 0.8744 - precision: 0.8774 - recall: 0.8674
    Try adam learning rate of: 0.001 and e: 0.01 and batch size: 32
    accuracy: 0.8439 - precision: 0.9033 - recall: 0.7666
    Try adam learning rate of: 0.001 and e: 0.1 and batch size: 32
    accuracy: 0.8291 - precision: 0.8127 - recall: 0.8507
    Try adam learning rate of: 0.002 and e: 1e-06 and batch size: 32
    accuracy: 0.8875 - precision: 0.8883 - recall: 0.8838
    Try adam learning rate of: 0.002 and e: 1e-05 and batch size: 32
    accuracy: 0.8826 - precision: 0.8503 - recall: 0.9257
    Try adam learning rate of: 0.002 and e: 0.0001 and batch size: 32
    accuracy: 0.8857 - precision: 0.8647 - recall: 0.9118
    Try adam learning rate of: 0.002 and e: 0.001 and batch size: 32
    accuracy: 0.8838 - precision: 0.8753 - recall: 0.8921
    Try adam learning rate of: 0.002 and e: 0.01 and batch size: 32
    accuracy: 0.8677 - precision: 0.8276 - recall: 0.9255
    Try adam learning rate of: 0.002 and e: 0.1 and batch size: 32
    accuracy: 0.8446 - precision: 0.8258 - recall: 0.8694
    Try adam learning rate of: 0.003 and e: 1e-06 and batch size: 32
    accuracy: 0.8919 - precision: 0.8762 - recall: 0.9101
    Try adam learning rate of: 0.003 and e: 1e-05 and batch size: 32
    accuracy: 0.8917 - precision: 0.9028 - recall: 0.8755
    Try adam learning rate of: 0.003 and e: 0.0001 and batch size: 32
    accuracy: 0.8842 - precision: 0.8444 - recall: 0.9391
    Try adam learning rate of: 0.003 and e: 0.001 and batch size: 32
    accuracy: 0.8903 - precision: 0.9013 - recall: 0.8740
    Try adam learning rate of: 0.003 and e: 0.01 and batch size: 32
    accuracy: 0.8841 - precision: 0.8852 - recall: 0.8800
    Try adam learning rate of: 0.003 and e: 0.1 and batch size: 32
    accuracy: 0.8531 - precision: 0.8754 - recall: 0.8199
    Try adam learning rate of: 0.004 and e: 1e-06 and batch size: 32
    accuracy: 0.8866 - precision: 0.9311 - recall: 0.8325
    Try adam learning rate of: 0.004 and e: 1e-05 and batch size: 32
    accuracy: 0.8830 - precision: 0.8373 - recall: 0.9477
    Try adam learning rate of: 0.004 and e: 0.0001 and batch size: 32
    accuracy: 0.8971 - precision: 0.9052 - recall: 0.8848
    Try adam learning rate of: 0.004 and e: 0.001 and batch size: 32
    accuracy: 0.8953 - precision: 0.8807 - recall: 0.9118
    Try adam learning rate of: 0.004 and e: 0.01 and batch size: 32
    accuracy: 0.8864 - precision: 0.8882 - recall: 0.8813
    Try adam learning rate of: 0.004 and e: 0.1 and batch size: 32
    accuracy: 0.8602 - precision: 0.8298 - recall: 0.9028
    Try adam learning rate of: 0.005 and e: 1e-06 and batch size: 32
    accuracy: 0.8754 - precision: 0.8184 - recall: 0.9616
    Try adam learning rate of: 0.005 and e: 1e-05 and batch size: 32
    accuracy: 0.8775 - precision: 0.9474 - recall: 0.7967
    Try adam learning rate of: 0.005 and e: 0.0001 and batch size: 32
    accuracy: 0.8891 - precision: 0.9298 - recall: 0.8394
    Try adam learning rate of: 0.005 and e: 0.001 and batch size: 32
    accuracy: 0.8972 - precision: 0.8934 - recall: 0.8997
    Try adam learning rate of: 0.005 and e: 0.01 and batch size: 32
    accuracy: 0.8873 - precision: 0.8688 - recall: 0.9096
    Try adam learning rate of: 0.005 and e: 0.1 and batch size: 32
    accuracy: 0.8644 - precision: 0.8453 - recall: 0.8886
    Try adam learning rate of: 0.006 and e: 1e-06 and batch size: 32
    accuracy: 0.8219 - precision: 0.7411 - recall: 0.9838
    Try adam learning rate of: 0.006 and e: 1e-05 and batch size: 32
    accuracy: 0.8846 - precision: 0.9486 - recall: 0.8108
    Try adam learning rate of: 0.006 and e: 0.0001 and batch size: 32
    accuracy: 0.9013 - precision: 0.9047 - recall: 0.8947
    Try adam learning rate of: 0.006 and e: 0.001 and batch size: 32
    accuracy: 0.7896 - precision: 0.9814 - recall: 0.5860
    Try adam learning rate of: 0.006 and e: 0.01 and batch size: 32
    accuracy: 0.8873 - precision: 0.8529 - recall: 0.9331
    Try adam learning rate of: 0.006 and e: 0.1 and batch size: 32
    accuracy: 0.8700 - precision: 0.8587 - recall: 0.8825
Try adam learning rate of: 0.007 and e: 1e-06 and batch size: 32
> accuracy: 0.9038 - precision: 0.8971 - recall: 0.9098
    Try adam learning rate of: 0.007 and e: 1e-05 and batch size: 32
    accuracy: 0.8845 - precision: 0.8837 - recall: 0.8870
    accuracy: 0.9019 - precision: 0.8795 - recall: 0.9290
    Try adam learning rate of: 0.007 and e: 0.0001 and batch size: 32
    accuracy: 0.8942 - precision: 0.9342 - recall: 0.8459
    Try adam learning rate of: 0.007 and e: 0.001 and batch size: 32
    accuracy: 0.8972 - precision: 0.8709 - recall: 0.9303
    Try adam learning rate of: 0.007 and e: 0.01 and batch size: 32
    accuracy: 0.8785 - precision: 0.9361 - recall: 0.8098
    Try adam learning rate of: 0.007 and e: 0.1 and batch size: 32
    accuracy: 0.8725 - precision: 0.8619 - recall: 0.8841
    Try adam learning rate of: 0.008 and e: 1e-06 and batch size: 32
    accuracy: 0.8990 - precision: 0.9207 - recall: 0.8709
    Try adam learning rate of: 0.008 and e: 1e-05 and batch size: 32
    accuracy: 0.9013 - precision: 0.9177 - recall: 0.8793
    Try adam learning rate of: 0.008 and e: 0.0001 and batch size: 32
    accuracy: 0.8982 - precision: 0.9217 - recall: 0.8681
    Try adam learning rate of: 0.008 and e: 0.001 and batch size: 32
    accuracy: 0.8899 - precision: 0.9370 - recall: 0.8335
    Try adam learning rate of: 0.008 and e: 0.01 and batch size: 32
    accuracy: 0.8873 - precision: 0.9214 - recall: 0.8442
    Try adam learning rate of: 0.008 and e: 0.1 and batch size: 32
    accuracy: 0.8616 - precision: 0.8165 - recall: 0.9293
    Try adam learning rate of: 0.009000000000000001 and e: 1e-06 and batch size: 32
    accuracy: 0.8947 - precision: 0.9306 - recall: 0.8507
    Try adam learning rate of: 0.009000000000000001 and e: 1e-05 and batch size: 32
    accuracy: 0.8964 - precision: 0.9372 - recall: 0.8474
    Try adam learning rate of: 0.009000000000000001 and e: 0.0001 and batch size: 32
    accuracy: 0.8854 - precision: 0.8317 - recall: 0.9634
    Try adam learning rate of: 0.009000000000000001 and e: 0.001 and batch size: 32
    accuracy: 0.8859 - precision: 0.9482 - recall: 0.8138
    Try adam learning rate of: 0.009000000000000001 and e: 0.01 and batch size: 32
    accuracy: 0.8776 - precision: 0.9427 - recall: 0.8015
    Try adam learning rate of: 0.009000000000000001 and e: 0.1 and batch size: 32
    accuracy: 0.8689 - precision: 0.9006 - recall: 0.8262
```

> The best results are reached for a learning rate of 0.007 and an epsilon of 1e-06.

# Evaluate Batch size

Because the batch size is dependend on the data set size we had to run the grid search on the 200k data set. 

We used the optimized learning rate and episolon value from above.

In [None]:
import numpy as np
import itertools
EPOCHS = 5

ADAM_LEARNING_RATE = 0.0007
ADAM_EPSILON = 1e-06


METRICS = [
      tf.keras.metrics.BinaryAccuracy(name='accuracy'),
      tf.keras.metrics.Precision(name='precision'),
      tf.keras.metrics.Recall(name='recall'),
]

batch_values = [32, 38, 64, 128, 256, 512]

for batch in batch_values:
  model: tf.keras.Model = build_model()

  print(f"Try batch size: {batch}")

  model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=ADAM_LEARNING_RATE, epsilon=ADAM_EPSILON),
    loss=tf.keras.losses.BinaryCrossentropy(),
    metrics=METRICS)
    
  model.fit(X_train, y_train, epochs=EPOCHS, batch_size=batch)
  model.evaluate(X_test, y_test)

Try batch size: 512
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


# Evaluate

After training the model and evaluation the model for different batch sizes we come up with the following results. 


```
    Try batch size: 32
    accuracy: 0.9117 - precision: 0.9247 - recall: 0.8959
    Try batch size: 48
    accuracy: 0.9221 - precision: 0.9399 - recall: 0.9011
Try batch size: 64
accuracy: 0.9227 - precision: 0.9168 - recall: 0.9295
    Try batch size: 128
    accuracy: 0.8965 - precision: 0.8673 - recall: 0.9370
    Try batch size: 256
    accuracy: 0.9189 - precision: 0.9068 - recall: 0.9333
    Try batch size: 512
    accuracy: 0.8771 - precision: 0.8654 - recall: 0.8940
```

We managed to reach the highest score with batch size 64. Therefore we use this for the final training.