# Book Review Sentiment Model

This notebook consists about building sentiment analysis model from Book Review comments. All datasets was scraped from Goodreads on July 2024. Goodreads was choosen because it may have good word and labeling quality. Scraped data format is line json (`.ljson`) which is a single of datum is in one line with format json. The reason I use this format is scalability issue. Scraping method is explained in `scrape.py`.

## Requirements

To run this notebook, ensure that you have installed below dependencies:
1. Tensorflow
2. Keras
3. Python
4. Numpy
5. Nltk
6. Matplotlib
7. Sklearn

In [1]:
import json
import re

import keras
import tensorflow as tf
import numpy as np
import matplotlib as mt
import matplotlib.pyplot as plt
import sklearn
import nltk

from datetime import datetime
from keras import losses
from keras import optimizers
from sklearn.metrics import accuracy_score, ConfusionMatrixDisplay, confusion_matrix


2024-07-31 15:22:58.040620: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-07-31 15:22:58.323862: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


Below results show the version of the library that I used

In [2]:
print(tf.__version__)
print(np.version.full_version)
print(mt.__version__)
print(sklearn.__version__)
print(nltk.__version__)

2.16.1
1.26.4
3.9.0
1.5.0
3.8.1


Below is the constant of this notebook that I used for generating model

In [3]:
LANGUAGE = "english"
WORKER_NUMBER = 16

LEARNING_EPOCH = 25
PATIENCE = 5

## Text Preprocessing

The first step of creating sentiment model is text preprocessing. In this step, I remove any unnecessary words, such as 'show more' and punctuations in text. I also do case folding to lowercase in this step. After text has been formatted, I stem every word using `nltk` tools. 

In [4]:
BANNED_KEYWORDS = [
    "This entire review has been hidden because of spoilers",
    "hele kz kardei yok mu",
    "ee bu hemen bitti",
]
ENGLISH_THRESHOLD = 0.5
corpus = []
regex = r'[^a-zA-Z0-9\- \n"\']+'

### Text Cleaning

Below code shows text cleaning process and word tokenize. We save the result at `comments_cleaned.ljson` as cache.

In [5]:
nltk.download('words')
nltk.download('stopwords')

[nltk_data] Downloading package words to /home/miawheker/nltk_data...
[nltk_data]   Package words is already up-to-date!
[nltk_data] Downloading package stopwords to
[nltk_data]     /home/miawheker/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


True

In [6]:
stopwords = list(nltk.corpus.stopwords.words(LANGUAGE))
english_words_data = set(w.lower() for w in nltk.corpus.words.words())

In [7]:
with open("datasets/comments_cleaned.ljson", "w") as fw:
    with open("datasets/comments.ljson") as fr:
        for line in fr:
            data = json.loads(line)

            # Skip if the comment contains banned keywords
            if any(keyword in data["text"] for keyword in BANNED_KEYWORDS):
                continue

            result = []
            for sentence in nltk.sent_tokenize(data["text"]):
                # Case folding
                sentence = sentence.lower()
                words = sentence.split()

                # Stopword removal
                sentence = " ".join([word for word in words if word not in stopwords])

                # Remove special chars
                sentence = re.sub(regex, '', sentence)
                sentence = sentence.replace("-", " ")
                sentence = sentence.replace("\n", " ")
                sentence = sentence.replace("\"", "")
                sentence = re.sub(r'\bhttp[a-z0-9]+\b', '', sentence)
                sentence = re.sub(r'\b(img|src)[a-z0-9]*\b', '', sentence)
                sentence = re.sub(r'\s{2,}', ' ', sentence)

                # Remove unnecessary words
                sentence = sentence.replace("- - - - show more", "")
                sentence = sentence.replace("show more", "")
                sentence = sentence.replace("show less", "")

                # Remove whitespaces
                sentence = sentence.strip()

                if len(sentence) < 10:
                    continue
                
                words = sentence.split()
                
                # Skip if the sentence is too short
                wordlen = len(words)
                if wordlen < 3:
                    continue

                english_cnt = 0
                for word in words:
                    if word in english_words_data:
                        english_cnt += 1
                
                # Skip if the sentence contains too many non-english words
                if english_cnt < ENGLISH_THRESHOLD * wordlen:
                    continue

                result.append(sentence)

            # Change rating to integer
            data["rating"] = int(data["rating"].split(" ")[1])
            data["text"] = result

            if len(data["text"]) == 0:
                continue

            fw.write(json.dumps(data))
            fw.write("\n")

            corpus.extend(data["text"])

Below code shows corpus enrichment that will used for word2vec datasets.

In [8]:
with open("datasets/corpus.txt", "r") as fr:
    for line in fr:
        if any(keyword in line for keyword in BANNED_KEYWORDS):
            continue

        if len(sentence) < 10:
            continue

        for sentence in nltk.sent_tokenize(line):
            # Stopword removal
            sentence = " ".join([word for word in sentence.split() if word not in stopwords])

            sentence = sentence.replace("\n", " ")
            sentence = re.sub(regex, '', line)
            sentence = sentence.strip()
            sentence = sentence.lower()

            sentence = sentence.replace("-", " ")
            sentence = sentence.replace("\"", "")
            sentence = re.sub(r'\bhttp[a-z0-9]+\b', '', sentence)
            sentence = re.sub(r'\b(img|src)[a-z0-9]*\b', '', sentence)
            sentence = re.sub(r'\s{2,}', ' ', sentence)

            sentence = sentence.replace("- - - - show more", "")
            sentence = sentence.replace("show more", "")
            sentence = sentence.replace("show less", "")

            if len(sentence) < 10:
                continue

            words = sentence.split()

            wordlen = len(words)
            if wordlen < 3:
                continue

            english_cnt = 0

            for word in words:
                if word in english_words_data:
                    english_cnt += 1

            if english_cnt < ENGLISH_THRESHOLD * wordlen:
                continue
            
            if sentence not in corpus:
                corpus.append(sentence)

with open("datasets/corpus_cleaned.txt", "w") as fw:
    fw.write("\n".join(corpus))

# Cleanup
corpus = []
stopwords = []
english_words = []

### Text Stemming

Below code shows stemming process on corpus and datasets. 

In [9]:
from nltk.stem import PorterStemmer
stemmer = PorterStemmer()

Below is process of stemming process on corpus.

In [10]:
with open("datasets/corpus_stemmed.txt", "w") as fw:
  with open("datasets/corpus_cleaned.txt", "r") as fr:
      for line in fr:
          words = nltk.word_tokenize(line, language=LANGUAGE)
          result = " ".join([stemmer.stem(word) for word in words])

          fw.write(result)
          fw.write("\n")

In [11]:
with open("datasets/comments_stemmed.ljson", "w") as fw:
  with open("datasets/comments_cleaned.ljson", "r") as fr:
      for line in fr:
          data = json.loads(line)
          result = []
          
          for sentence in data["text"]:
              words = nltk.word_tokenize(sentence, language=LANGUAGE)
              sentence = " ".join([stemmer.stem(word) for word in words])
              result.append(sentence)

          fw.write(json.dumps({
              "text": result,
              "rating": data["rating"]
          }))
          fw.write("\n")

In [12]:
# Cleanup
stemmer = None

## Feature Extraction

This section will explain about feature extraction process.

In [13]:
class SentenceIterator:
    """This class is used to stream all line over the file."""
    def __init__(self, filename):
        self.filename = filename

    def __iter__(self):
        with open(self.filename, "r") as f:
            for line in f:
                yield line.split()

class LineIterator:
    """This class is used to stream all line over the file."""
    def __init__(self, filename):
        self.filename = filename

    def generate(self):
        with open(self.filename, "r") as f:
            for line in f:
                yield line

### Number of Vocab

In this section, we try to calculate number of vocab that exist in our datasets.

In [14]:
vocab = set()

for sentence in SentenceIterator("datasets/corpus_stemmed.txt"):
    for word in sentence:
        vocab.add(word)

In [15]:
number_of_vocab = len(vocab)
number_of_vocab

83295

### Text Vectorization Layer

In this section, there is a function that we use to generate feature extraction layer.

In [16]:
def generate_text_extraction(output_mode, datasets="datasets/corpus_stemmed.txt",*, sparse=True, pad_to_max_tokens=True):
        it = LineIterator(datasets)
        ds = tf.data.Dataset.from_generator(it.generate, output_signature=tf.TensorSpec(shape=(), dtype=tf.string))

        layer = keras.layers.TextVectorization(
            max_tokens=number_of_vocab+1,
            output_mode=output_mode,
            split="whitespace",
            sparse=sparse,
            pad_to_max_tokens=pad_to_max_tokens,
            ngrams=1
        )
        
        with tf.device("CPU"):
            layer.adapt(ds)
        
        return layer

In [17]:
tfidf = generate_text_extraction("tf_idf", sparse=False)
tfidf

2024-07-31 15:47:14.310635: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-07-31 15:47:14.350672: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-07-31 15:47:14.361081: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-

<TextVectorization name=text_vectorization, built=False>

In [18]:
# Test the tfidf layer
tfidf("i love this book, love it")

<tf.Tensor: shape=(83296,), dtype=float32, numpy=
array([11.75045 ,  1.883836,  0.      , ...,  0.      ,  0.      ,
        0.      ], dtype=float32)>

In [19]:
bow = keras.layers.TextVectorization(
            max_tokens=number_of_vocab+1,
            output_mode="count",
            split="whitespace",
            sparse=False,
            vocabulary=[word for word in vocab],
            ngrams=1
        )
vocab = None

In [20]:
# Test the bow layer
bow("i love this book, love it")

<tf.Tensor: shape=(83296,), dtype=int64, numpy=array([1, 0, 0, ..., 0, 0, 0])>

## Dataset
In this section, we will learn about our dataset.

In [21]:
data_cnt = 0
rating = [0,0,0,0,0]

with open("datasets/comments_stemmed.ljson", "r") as fr:
    for line in fr:
        data_cnt += 1

        data = json.loads(line)
        rating[data["rating"] - 1] += 1

print("Number of data:", data_cnt)
print("Rating distribution:", rating)

Number of data: 46853
Rating distribution: [2272, 4629, 8451, 15807, 15694]


## Labeling and Data Split

In this section, we will label `comments_stemmed` dataset. We will use two scheme of based on rating. Sentiment classification will devide into 3 class, positive (index 0), neutral (index 1), and negative (index 2). The label will be a vector with three element with range value between 0 and 1. This label is a softmax of three class that we defined before.

The dataset will be splitted into 3 parts, training with proportion 70% of data, test with proportion 15% of data, and validation with proportion of 15% data.

This cells will classify the type of data, whether it is a training data, test data, or validation data randomly.

In [22]:
data_type = np.zeros((data_cnt,), dtype=np.int8)

for i in range(0, int(data_cnt * 0.70)):
    data_type[i] = 0

for i in range(int(data_cnt * 0.70), int(data_cnt * 0.85 + 1)):
    data_type[i] = 1

for i in range(int(data_cnt * 0.85 + 1), data_cnt):
    data_type[i] = 2

In [23]:
# Shuffle data
np.random.shuffle(data_type)
data_type

array([0, 0, 2, ..., 1, 1, 0], dtype=int8)

In [24]:
data_type_dist = np.bincount(data_type)

print("Data type distribution:", data_type_dist)

Data type distribution: [32797  7029  7027]


In [25]:
training_rating_dist = np.zeros((5,), dtype=np.int32)

idx = 0
with open("datasets/comments_stemmed.ljson", "r") as fr:
    for line in fr:
        data = json.loads(line)

        if data_type[idx] == 0:
            training_rating_dist[data["rating"] - 1] += 1
        
        idx += 1

In [26]:
print("Training rating distribution:", training_rating_dist)

Training rating distribution: [ 1558  3263  5886 10945 11145]


### Scheme 1

In this scheme, we will label data training with this rules:
1. Star 5 will be labelled as `[1, 0, 0]`
2. Star 4 will be labelled as `[1, 0, 0]`
3. Star 3 will be labelled as `[0, 1, 0]`
4. Star 2 will be labelled as `[0, 0, 1]`
5. Star 1 will be labelled as `[0, 0, 1]`

Those labels are based on the highest element value in resulted vector.

In [27]:
# Calclate data weight
high_dist = training_rating_dist[0] + training_rating_dist[1]
low_dist = training_rating_dist[3] + training_rating_dist[4]
center_dist = training_rating_dist[2]

max_dist = max(high_dist, low_dist, center_dist)

data_weight = np.ones((5,), dtype=np.float64) * max_dist
data_weight = data_weight / np.array([high_dist, high_dist, center_dist, low_dist, low_dist])
data_weight

array([7.1534018 , 3.4155685 , 1.89347604, 1.01827318, 1.        ])

In [28]:
idx = 0
with open("datasets/comments_stemmed.ljson", "r") as fr:
    with open("datasets/comments_labelled_s1_train.ljson", "w") as ftrain, \
         open("datasets/comments_labelled_s1_val.ljson", "w") as fval, \
         open("datasets/comments_labelled_s1_test.ljson", "w") as ftest:
        for line in fr:
            data = json.loads(line)
            y = []

            if data["rating"] == 1:
                y = [1, 0, 0]
            elif data["rating"] == 2:
                y = [1, 0, 0]
            elif data["rating"] == 3:
                y = [0, 1, 0]
            elif data["rating"] == 4:
                y = [0, 0, 1]
            elif data["rating"] == 5:
                y = [0, 0, 1]
                        
            write_data = {
                "X": " ".join(data["text"]),
                "y": y,
                "w": data_weight[data["rating"] - 1]
            }

            write_data = json.dumps(write_data)

            if data_type[idx] == 0:
                ftrain.write(write_data)
                ftrain.write("\n")
            elif data_type[idx] == 1:
                fval.write(write_data)
                fval.write("\n")
            elif data_type[idx] == 2:
                ftest.write(write_data)
                ftest.write("\n")
            
            idx += 1

### Scheme 2

In this scheme, we will label data training with this rules:
1. Star 5 will be labelled as `[1]`
2. Star 4 will be labelled as `[1]`
3. Star 3 will be labelled as `[1]`
4. Star 2 will be labelled as `[0]`
5. Star 1 will be labelled as `[0]`

Those labels are based on the highest element value in resulted vector.

In [29]:
# Calculate data weight
high_dist = training_rating_dist[0] + training_rating_dist[1] + training_rating_dist[2]
low_dist = training_rating_dist[3] + training_rating_dist[4]

max_dist = max(high_dist, low_dist)

data_weight = np.ones((5,), dtype=np.float64) * max_dist
data_weight = data_weight / np.array([high_dist, high_dist, high_dist, low_dist, low_dist])
data_weight

array([4.58203692, 4.58203692, 3.75297316, 1.        , 1.        ])

In [30]:
idx = 0
with open("datasets/comments_stemmed.ljson", "r") as fr:
    with open("datasets/comments_labelled_s2_train.ljson", "w") as ftrain, \
         open("datasets/comments_labelled_s2_val.ljson", "w") as fval, \
         open("datasets/comments_labelled_s2_test.ljson", "w") as ftest:
        for line in fr:
            data = json.loads(line)
            y = []

            if data["rating"] == 1:
                y = [1]
            elif data["rating"] == 2:
                y = [1]
            elif data["rating"] == 3:
                y = [1]
            elif data["rating"] == 4:
                y = [0]
            elif data["rating"] == 5:
                y = [0]
                        
            write_data = {
                "X": " ".join(data["text"]),
                "y": y,
                "w": data_weight[data["rating"] - 1]
            }

            write_data = json.dumps(write_data)

            if data_type[idx] == 0:
                ftrain.write(write_data)
                ftrain.write("\n")
            elif data_type[idx] == 1:
                fval.write(write_data)
                fval.write("\n")
            elif data_type[idx] == 2:
                ftest.write(write_data)
                ftest.write("\n")
            
            idx += 1

## Classification Model Training

In this section, we will explore about training model for sentiment analysis. The hyperparameters are below:
1. Word Model:
   We will use `word2vec_model` and `fasttext_model` based on model that we have trained before.
2. Classification Model:
   We will use one type of layer. 
3. Dataset Scheme:
   We will use two scheme that we have build before, that is `Sheme 1` and `Scheme 2`

Below cell is helper function to do our task:

In [36]:
class EmbedGenerator:
    def __init__(self, filename, word_model, repeat=1):
        self.filename = filename
        self.word_model = word_model
        self.repeat = repeat

    def generate(self):
        for _ in range(self.repeat):
          with open(self.filename, "r") as f:
              for line in f:
                  data = json.loads(line)
                  X = tf.convert_to_tensor([data["X"]], dtype=tf.string)
                  
                  y = np.array([data["y"]], dtype=np.float32)
                  w = np.array([data["w"]], dtype=np.float32)
                  
                  yield X, y, w

def train_model(*, word, classification, train_dataset_filename, validation_dataset_filename, name):
    train_gen = EmbedGenerator(train_dataset_filename, word, repeat=LEARNING_EPOCH)
    validation_gen = EmbedGenerator(validation_dataset_filename, word, repeat=LEARNING_EPOCH)

    train_ds = tf.data.Dataset.from_generator(train_gen.generate, output_signature=(
        tf.TensorSpec(shape=(None,), dtype=tf.string),
        tf.TensorSpec(shape=(None, 1), dtype=tf.float32),
        tf.TensorSpec(shape=(None,), dtype=tf.float64),
    ))
    validation_ds = tf.data.Dataset.from_generator(validation_gen.generate, output_signature=(
        tf.TensorSpec(shape=(None,), dtype=tf.string),
        tf.TensorSpec(shape=(None, 1), dtype=tf.float32),
        tf.TensorSpec(shape=(None,), dtype=tf.float64),
    ))

    callback = [
      keras.callbacks.EarlyStopping(
        monitor="val_loss",
        patience=5,
        restore_best_weights=True,
        min_delta=0.001,
      ),
      keras.callbacks.ModelCheckpoint(
        filepath=f"models/checkpoint/{name}_model_checkpoint.keras",
        save_best_only=True,
      ),
      keras.callbacks.TensorBoard(
        log_dir=f"logs/{name}_{datetime.now().strftime('%Y%m%d-%H%M%S')}",
      ),
    ]

    print("generate text extraction layer")
    extraction_layer = word

    print("generate model")
    if classification == "layer_1":
      model = keras.models.Sequential([
          keras.layers.Input(shape=(1,), dtype=tf.string),
          extraction_layer,
          keras.layers.Dense(64, activation="relu"),
          keras.layers.Dropout(0.3),
          keras.layers.Dense(64, activation="relu"),
          keras.layers.Dropout(0.3),
          keras.layers.Dense(64, activation="relu"),
          keras.layers.Dropout(0.3),
          keras.layers.Dense(1, activation="sigmoid"),
        ],
        name=name,
      )
    else:
      raise ValueError("Invalid classification")
    
    model.summary()
    model.compile(
      loss=losses.BinaryCrossentropy(),
      optimizer=optimizers.Adam(),
      metrics=[keras.metrics.BinaryAccuracy()],
    )

    history = model.fit(
      train_ds,
      epochs=LEARNING_EPOCH,
      callbacks=callback,
      validation_data=validation_ds,
      batch_size=32,
      steps_per_epoch=data_type_dist[0],
      validation_steps=data_type_dist[1],
    )

    model.save(f"models/{name}_model.keras")

    return model, history

In [34]:
word_model = [
  ("tfidf", tfidf),
  ("bow", bow),
]

classification_scheme = [
  ("layer1", "layer_1"), 
]

dataset = [
  ("schema1","datasets/comments_labelled_s1_train.ljson", "datasets/comments_labelled_s1_val.ljson", "datasets/comments_labelled_s1_test.ljson"),
  ("schema2","datasets/comments_labelled_s2_train.ljson", "datasets/comments_labelled_s2_val.ljson", "datasets/comments_labelled_s2_test.ljson"),
]

In [37]:
for dname, train, validation, test in dataset:
    for cname, classification in classification_scheme:
        for wname, wmodel in word_model:
            print(f"Training {dname}_{wname}_{cname} model")
            train_model(
                word=wmodel,
                classification=classification,
                train_dataset_filename=train,
                validation_dataset_filename=validation,
                name=f"{dname}_{wname}_{cname}",
            )
            print()

Training schema1_tfidf_layer1 model
generate text extraction layer
generate model


Epoch 1/25
[1m32797/32797[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m261s[0m 8ms/step - binary_accuracy: 0.2736 - loss: 1.1508 - val_binary_accuracy: 0.3548 - val_loss: 1.0247
Epoch 2/25
[1m32797/32797[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m264s[0m 8ms/step - binary_accuracy: 0.3411 - loss: 1.0185 - val_binary_accuracy: 0.3552 - val_loss: 1.0176
Epoch 3/25
[1m32797/32797[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m267s[0m 8ms/step - binary_accuracy: 0.3488 - loss: 0.9720 - val_binary_accuracy: 0.3538 - val_loss: 1.0272
Epoch 4/25
[1m32797/32797[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m268s[0m 8ms/step - binary_accuracy: 0.3550 - loss: 0.9215 - val_binary_accuracy: 0.3561 - val_loss: 1.0212
Epoch 5/25
[1m32797/32797[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m279s[0m 9ms/step - binary_accuracy: 0.3564 - loss: 0.9033 - val_binary_accuracy: 0.3565 - val_loss: 1.0100
Epoch 6/25
[1m32797/32797[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m235s[0m 7ms/ste

Epoch 1/25
[1m32797/32797[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m262s[0m 8ms/step - binary_accuracy: 0.3024 - loss: 1.0953 - val_binary_accuracy: 0.3515 - val_loss: 0.9861
Epoch 2/25
[1m32797/32797[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m261s[0m 8ms/step - binary_accuracy: 0.3452 - loss: 0.9862 - val_binary_accuracy: 0.3538 - val_loss: 1.0156
Epoch 3/25
[1m32797/32797[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m261s[0m 8ms/step - binary_accuracy: 0.3515 - loss: 0.9476 - val_binary_accuracy: 0.3560 - val_loss: 1.0410
Epoch 4/25
[1m32797/32797[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m262s[0m 8ms/step - binary_accuracy: 0.3550 - loss: 0.9150 - val_binary_accuracy: 0.3561 - val_loss: 1.0051
Epoch 5/25
[1m32797/32797[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m263s[0m 8ms/step - binary_accuracy: 0.3590 - loss: 0.8884 - val_binary_accuracy: 0.3572 - val_loss: 1.0552
Epoch 6/25
[1m32797/32797[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m264s[0m 8ms/ste

Epoch 1/25
[1m32797/32797[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m270s[0m 8ms/step - binary_accuracy: 0.5831 - loss: 1.3610 - val_binary_accuracy: 0.7435 - val_loss: 1.1905
Epoch 2/25
[1m32797/32797[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m269s[0m 8ms/step - binary_accuracy: 0.7182 - loss: 1.1591 - val_binary_accuracy: 0.7433 - val_loss: 1.1815
Epoch 3/25
[1m32797/32797[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m271s[0m 8ms/step - binary_accuracy: 0.7470 - loss: 1.0906 - val_binary_accuracy: 0.7442 - val_loss: 1.2107
Epoch 4/25
[1m32797/32797[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m271s[0m 8ms/step - binary_accuracy: 0.7613 - loss: 1.0158 - val_binary_accuracy: 0.7389 - val_loss: 1.2850
Epoch 5/25
[1m32797/32797[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m272s[0m 8ms/step - binary_accuracy: 0.7699 - loss: 1.0058 - val_binary_accuracy: 0.7470 - val_loss: 1.2372
Epoch 6/25
[1m32797/32797[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m272s[0m 8ms/ste

Epoch 1/25
[1m32797/32797[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m275s[0m 8ms/step - binary_accuracy: 0.6157 - loss: 1.2889 - val_binary_accuracy: 0.7455 - val_loss: 1.2264
Epoch 2/25
[1m32797/32797[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m274s[0m 8ms/step - binary_accuracy: 0.7176 - loss: 1.1050 - val_binary_accuracy: 0.7490 - val_loss: 1.1915
Epoch 3/25
[1m32797/32797[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m275s[0m 8ms/step - binary_accuracy: 0.7500 - loss: 1.0321 - val_binary_accuracy: 0.7433 - val_loss: 1.1820
Epoch 4/25
[1m32797/32797[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m278s[0m 8ms/step - binary_accuracy: 0.7597 - loss: 0.9733 - val_binary_accuracy: 0.7505 - val_loss: 1.2616
Epoch 5/25
[1m32797/32797[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m279s[0m 9ms/step - binary_accuracy: 0.7705 - loss: 0.9178 - val_binary_accuracy: 0.7456 - val_loss: 1.3664
Epoch 6/25
[1m32797/32797[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m281s[0m 9ms/ste