# Neural Machine Translation with Attention

Advanced Learning Fall 2024.   
Last updated: 2025-01-12


For SUBMISSION:   

Please upload the complete and executed `ipynb` to your git repository. Verify that all of your output can be viewed directly from GitHub, and provide a link to that file below.

~~~
STUDENT ID: 208088815
~~~

~~~
STUDENT GIT LINK: MISSING
~~~
In addition, don't forget to add your ID to the files, and upload the HTML version to Moodle:    
  
`PS3_Attention_2024_ID_[208088815].html`   




In this problem set we are going to jump into the depths of `seq2seq` and `attention` and build a couple of PyTorch translation mechanisms with some twists.     


*   Part 1 consists of a somewhat unorthodox `seq2seq` model for simple arithmetic.
*   Part 2 consists of a `seq2seq` language translation model with attention. We will use it for Hebrew and English.  


---

A **seq2seq** (sequence-to-sequence) model is a type of neural network designed to handle sequences of data: it converts an input sequence into an output sequence. This makes such models particularly useful for language tasks, where the input and output are naturally sequences of words.

Here's a breakdown of how `seq2seq` models work:

* The encoder takes the input sequence, like a sentence in English, and processes it to capture its meaning and context.

* The encoded information is then passed to the decoder, which uses it to generate the output sequence, like a translation in French.

* Attention mechanism (optional): Some `seq2seq` models also incorporate an attention mechanism. This allows the decoder to focus on specific parts of the input sequence that are most relevant to generating the next element in the output sequence.

`seq2seq` models are used in many natural language processing (NLP) tasks.
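
To make the attention step above concrete, here is a minimal sketch of dot-product attention in plain Python. The toy vectors and the helper names `softmax` and `attend` are assumptions for illustration and are not part of the problem set's code; real models use learned projections and a tensor library.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(decoder_state, encoder_states):
    """Dot-product attention: weight each encoder state by its similarity to the decoder state."""
    scores = [sum(d * e for d, e in zip(decoder_state, enc)) for enc in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    # The context vector is the weighted average of the encoder states.
    context = [sum(w * enc[i] for w, enc in zip(weights, encoder_states)) for i in range(dim)]
    return weights, context

# Toy example (made-up numbers): three encoder states, one decoder state.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [2.0, 0.0]
weights, context = attend(decoder_state, encoder_states)
print([round(w, 3) for w in weights])
```

The encoder states most similar to the decoder state receive the largest weights, so the context vector is dominated by the input positions that matter most for the next output element.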



Imports (feel free to add more):

In [None]:
# Standard library imports
from __future__ import unicode_literals, print_function, division
from io import open
import os
import re
import random
import time
import math
import zipfile
import unicodedata

# PyTorch imports
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch import optim
from torch.utils.data import TensorDataset, DataLoader, RandomSampler

# NumPy
import numpy as np

# Matplotlib imports
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
plt.switch_backend('agg')

# Keras imports
from keras.models import Sequential, Model
from keras.layers import Input, LSTM, Attention, Flatten, RepeatVector, TimeDistributed, Dense
from keras.callbacks import ModelCheckpoint

# TensorFlow Keras imports
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.losses import SparseCategoricalCrossentropy
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.layers import Input, Embedding, Dense, Bidirectional, Dropout, RepeatVector, TimeDistributed
from tensorflow.keras.callbacks import ModelCheckpoint

# Device setup for PyTorch
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

## Part 1: Seq2Seq Arithmetic model

**Using an RNN `seq2seq` model to "learn" simple arithmetic!**

> Given the string "54-7", the model should return a prediction: "47".  
> Given the string "10+20", the model should return a prediction: "30".


- Watch Lukas Biewald's short [video](https://youtu.be/MqugtGD605k?si=rAH34ZTJyYDj-XJ1) explaining `seq2seq` models and his toy application (somewhat outdated).
- You can find the code for his example [here](https://github.com/lukas/ml-class/blob/master/videos/seq2seq/train.py).    



1.1) Using Lukas' code, implement a `seq2seq` network that can learn to solve **addition AND subtraction** of two numbers of at most 4 digits each, using the following steps (similar to the example):      

* Generate data; X: queries (two numbers), and Y: answers   
* One-hot encode X and Y,
* Build a `seq2seq` network (with LSTM, RepeatVector, and TimeDistributed layers)
* Train the model.
* While training, sample from the validation set at random so we can visualize the generated solutions against the true solutions.    
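
When comparing generated solutions against true solutions, it helps to separate two measures: the fraction of characters predicted correctly and the fraction of answers that match exactly. The sketch below computes both in plain Python. The helper names and the sample strings are hypothetical, and it assumes predictions and targets are padded to the same width.

```python
def char_accuracy(targets, guesses):
    """Fraction of character positions where guess and target agree."""
    matches = total = 0
    for t, g in zip(targets, guesses):
        matches += sum(tc == gc for tc, gc in zip(t, g))
        total += len(t)
    return matches / total

def exact_match(targets, guesses):
    """Fraction of answers that are correct in every position."""
    return sum(t == g for t, g in zip(targets, guesses)) / len(targets)

# Hypothetical padded answers (width 5), not taken from a real run.
targets = ["47   ", "30   ", "-557 ", "1536 "]
guesses = ["47   ", "31   ", "-553 ", "1536 "]
print(char_accuracy(targets, guesses))  # 0.9
print(exact_match(targets, guesses))    # 0.5
```

The `accuracy` that Keras reports for a per-time-step softmax output is averaged over characters, padding included, so it can look much higher than the exact-match rate.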

Notes:  
* The code in the example is quite old and based on Keras. You might have to adapt parts of it to replace methods that are no longer supported. Hint: for the evaluation part, review the type and format of the "correct" output - this will help you fix the unsupported `model.predict_classes`.
* Please use the parameters in the code cell below to train the model.     
* Instead of using a `wandb.config` object, please use a simple dictionary instead.   
* You don't need to run the model for more than 50 iterations (epochs) to get the gist of what is happening and what the algorithm is doing.
* Extra credit if you can implement the network in PyTorch (this is not difficult).    
* Extra credit if you are able to significantly improve the model.
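
The `predict_classes` hint above comes down to taking an argmax over the class axis of the predicted probabilities. A minimal plain-Python sketch of that idea follows; the probability rows are made up and `classes_from_probs` is a hypothetical helper. With a real model, the same step would typically be written as `np.argmax(model.predict(x), axis=-1)`.

```python
def classes_from_probs(prob_rows):
    """Return the index of the largest probability in each row (an argmax per time step)."""
    return [max(range(len(row)), key=row.__getitem__) for row in prob_rows]

# Made-up softmax output for one sample: 3 time steps over a 4-character alphabet.
alphabet = "01+ "
probs = [
    [0.1, 0.7, 0.1, 0.1],  # most likely '1'
    [0.6, 0.2, 0.1, 0.1],  # most likely '0'
    [0.1, 0.1, 0.1, 0.7],  # most likely ' ' (padding)
]
indices = classes_from_probs(probs)
decoded = "".join(alphabet[i] for i in indices)
print(indices, repr(decoded))
```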

1.2)

a) Do you think this model performs well?  Why or why not?     
b) What are its limitations?   
c) What would you do to improve it?    
d) Can you apply an attention mechanism to this model? Why or why not?   

1.3)

Add attention to the model. Evaluate the performance against the `seq2seq` you trained above. Which one is performing better?

1.4)

Using any neural network architecture of your liking, build a model that aims to beat the best-performing model from 1.1 or 1.3. Compare your results in a meaningful way, and add a short explanation of why you think your suggested network is better.

In [None]:
config = {}
config["training_size"] = 40000
config["digits"] = 4
config["hidden_size"] = 128
config["batch_size"] = 128
config["iterations"] = 50
chars = '0123456789-+ '
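
These parameters determine the tensor shapes used later. As a quick check, the sketch below derives them by hand, assuming queries are padded to `digits + 1 + digits` characters and answers to `digits + 1` characters. The variable names are illustrative only.

```python
# Standalone copy of the parameters above, so this snippet runs on its own.
digits = 4
chars = '0123456789-+ '

max_query_len = digits + 1 + digits   # e.g. '9999+9999'
max_answer_len = digits + 1           # e.g. '19998' or '-9999'
vocab_size = len(set(chars))          # ten digits, '-', '+', and the padding space

print(max_query_len, max_answer_len, vocab_size)  # 9 5 13

# The extreme results still fit within the padded answer width.
assert len(str(9999 + 9999)) <= max_answer_len
assert len(str(0 - 9999)) <= max_answer_len
```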

SOLUTION:

In [None]:
### MISSING SOLUTION
# Generating the data using Lukas' code
# Defining a dictionary to store the model parameters
config_dict = {
    "training_size": 40000,
    "digits": 4,
    "hidden_size": 128,
    "batch_size": 128,
    "iterations" : 50,
    "chars" : '0123456789-+ '
}

class CharacterTable(object):
    """Given a set of characters:
    + Encode them to a one hot integer representation
    + Decode the one hot integer representation to their character output
    + Decode a vector of probabilities to their character output
    """
    def __init__(self, chars):
        """Initialize character table.
        # Arguments
            chars: Characters that can appear in the input.
        """
        self.chars = sorted(set(chars))
        self.char_indices = dict((c, i) for i, c in enumerate(self.chars))
        self.indices_char = dict((i, c) for i, c in enumerate(self.chars))

    def encode(self, C, num_rows):
        """One hot encode given string C.
        # Arguments
            num_rows: Number of rows in the returned one hot encoding. This is
                used to keep the # of rows for each data the same.
        """
        x = np.zeros((num_rows, len(self.chars)))
        for i, c in enumerate(C):
            x[i, self.char_indices[c]] = 1
        return x

    def decode(self, x, calc_argmax=True):
        if calc_argmax:
            x = x.argmax(axis=-1)
        return ''.join(self.indices_char[i] for i in x)


# Maximum length of input is 'int + int' (e.g., '345+678'). Maximum length of
# int is DIGITS.
maxlen = config_dict['digits'] + 1 + config_dict['digits']

# All the numbers, plus and minus signs and space for padding.
chars = config_dict["chars"]
ctable = CharacterTable(chars)

questions = []
expected = []
seen = set()
print('Generating data...')

while len(questions) < config_dict['training_size']:
    f = lambda: int(''.join(np.random.choice(list('0123456789'))
                    for i in range(np.random.randint(1, config_dict['digits'] + 1))))
    a, b = f(), f()
    # Randomly choose addition or subtraction
    operation = np.random.choice(['+', '-'])

    # Skip any question we've already seen. Operand order is part of the key,
    # so 'a+b' and 'b+a' count as different questions.
    key = (a, b, operation)
    if key in seen:
        continue
    seen.add(key)

    # Create the question string
    q = f"{a}{operation}{b}"
    query = q + ' ' * (maxlen - len(q))
    # Compute the answer
    ans = str(a + b) if operation == '+' else str(a - b)
    ans += ' ' * (config_dict['digits'] + 1 - len(ans))

    questions.append(query)
    expected.append(ans)

print('Total addition questions:', len(questions))

print('Vectorization...')
x = np.zeros((len(questions), maxlen, len(config_dict['chars'])), dtype=bool)
y = np.zeros((len(questions), config_dict['digits'] + 1, len(config_dict['chars'])), dtype=bool)

for i, sentence in enumerate(questions):
    x[i] = ctable.encode(sentence, maxlen)
for i, sentence in enumerate(expected):
    y[i] = ctable.encode(sentence, config_dict['digits'] + 1)


# Shuffle (x, y) in unison as the later parts of x will almost all be larger
# digits.
indices = np.arange(len(y))
np.random.shuffle(indices)
x = x[indices]
y = y[indices]

# Explicitly set apart 10% for validation data that we never train over.
split_at = len(x) - len(x) // 10
(x_train, x_val) = x[:split_at], x[split_at:]
(y_train, y_val) = y[:split_at], y[split_at:]

Generating data...
Total addition questions: 40000
Vectorization...
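
With 40,000 samples and a 10% hold-out, the split sizes and the number of batches per epoch can be checked by hand. The sketch below repeats that arithmetic with the values from the configuration; the variable names are illustrative only.

```python
import math

training_size = 40000
batch_size = 128

split_at = training_size - training_size // 10   # same rule as the cell above
n_train, n_val = split_at, training_size - split_at
steps_per_epoch = math.ceil(n_train / batch_size)

print(n_train, n_val, steps_per_epoch)  # 36000 4000 282
```

The value 282 is the number of batches per epoch, which is the denominator a Keras progress bar shows for this training set and batch size.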


In [None]:
# Building a seq2seq network
# Defining the model
model1 = Sequential()
model1.add(LSTM(config_dict['hidden_size'] , input_shape=(maxlen, len(chars))))
model1.add(RepeatVector(config_dict['digits'] + 1))
model1.add(LSTM(config_dict['hidden_size'], return_sequences=True))
model1.add(TimeDistributed(Dense(len(chars), activation='softmax')))
model1.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
model1.summary()

# Train the model each generation and show predictions against the validation
# dataset.
checkpoint = ModelCheckpoint('best_model.h5',
                             monitor='val_loss',
                             save_best_only=True,
                             mode='min',
                             verbose=1)
for iteration in range(config_dict['iterations']):
    print()
    print('-' * 50)
    print('Iteration', iteration)

    model1.fit(x_train, y_train,
              batch_size=config_dict['batch_size'],
              epochs=1,
              validation_data=(x_val, y_val),
              callbacks=[checkpoint])
    # Select 10 samples from the validation set at random so we can visualize
    # errors.
    for i in range(10):
        ind = np.random.randint(0, len(x_val))
        rowx, rowy = x_val[np.array([ind])], y_val[np.array([ind])]
        preds = model1.predict(rowx, verbose=0)
        preds = np.argmax(preds, axis=-1)
        q = ctable.decode(rowx[0])
        correct = ctable.decode(rowy[0])
        guess = ctable.decode(preds[0], calc_argmax=False)
        print('Q', q, end=' ')
        print('T', correct, end=' ')
        if correct == guess:
            print('☑', end=' ')
        else:
            print('☒', end=' ')
        print(guess)




--------------------------------------------------
Iteration 0
276/282 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.3061 - loss: 2.0461
Epoch 1: val_loss improved from inf to 1.66067, saving model to best_model.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 10s 9ms/step - accuracy: 0.3071 - loss: 2.0416 - val_accuracy: 0.3936 - val_loss: 1.6607
Q 42-599    T -557  ☒ -233 
Q 5530+105  T 5635  ☒ 133  
Q 89-41     T 48    ☒ -3   
Q 4-832     T -828  ☒ -233 
Q 442-8     T 434   ☒ 13   
Q 8967-4795 T 4172  ☒ 1333 
Q 852+7     T 859   ☒ 133  
Q 8614-49   T 8565  ☒ 133  
Q 1420+6    T 1426  ☒ 133  
Q 9154-9209 T -55   ☒ 1333 

--------------------------------------------------
Iteration 1
277/282 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.3984 - loss: 1.6477
Epoch 1: val_loss improved from 1.66067 to 1.57283, saving model to best_model.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.3985 - loss: 1.6474 - val_accuracy: 0.4211 - val_loss: 1.5728
Q 1+7816    T 7817  ☒ 107  
Q 881-825   T 56    ☒ -84  
Q 1564+89   T 1653  ☒ 1445 
Q 29-542    T -513  ☒ -433 
Q 8234+20   T 8254  ☒ 1222 
Q 370-38    T 332   ☒ 322  
Q 0+548     T 548   ☒ 144  
Q 2048-46   T 2002  ☒ 324  
Q 960-0     T 960   ☒ 800  
Q 1-235     T -234  ☒ -233 

--------------------------------------------------
Iteration 2
281/282 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.4200 - loss: 1.5790
Epoch 1: val_loss improved from 1.57283 to 1.51819, saving model to best_model.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.4201 - loss: 1.5788 - val_accuracy: 0.4399 - val_loss: 1.5182
Q 638+898   T 1536  ☒ 8088 
Q 7516-4    T 7512  ☒ 6665 
Q 8+432     T 440   ☒ 333  
Q 2073-9952 T -7879 ☒ -3229
Q 2184-2361 T -177  ☒ -319 
Q 8201-3    T 8198  ☒ 8383 
Q 5727+30   T 5757  ☒ 7773 
Q 985-1     T 984   ☒ 888  
Q 1089+59   T 1148  ☒ 9999 
Q 7-5289    T -5282 ☒ -7739

--------------------------------------------------
Iteration 3
279/282 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.4396 - loss: 1.5189
Epoch 1: val_loss improved from 1.51819 to 1.46746, saving model to best_model.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 2s 8ms/step - accuracy: 0.4396 - loss: 1.5187 - val_accuracy: 0.4572 - val_loss: 1.4675
Q 626-249   T 377   ☒ 226  
Q 4616-0    T 4616  ☒ 6666 
Q 45+45     T 90    ☒ 55   
Q 2499-850  T 1649  ☒ -416 
Q 30+824    T 854   ☒ 331  
Q 227+319   T 546   ☒ 221  
Q 2627-107  T 2520  ☒ 2266 
Q 77-847    T -770  ☒ -766 
Q 56-25     T 31    ☒ 21   
Q 1-890     T -889  ☒ -999 

--------------------------------------------------
Iteration 4
278/282 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.4555 - loss: 1.4657
Epoch 1: val_loss improved from 1.46746 to 1.40703, saving model to best_model.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 2s 8ms/step - accuracy: 0.4556 - loss: 1.4654 - val_accuracy: 0.4795 - val_loss: 1.4070
Q 74+94     T 168   ☒ 148  
Q 77+5183   T 5260  ☒ 7782 
Q 5546+885  T 6431  ☒ 5592 
Q 320+561   T 881   ☒ 322  
Q 78+12     T 90    ☒ 72   
Q 4756+39   T 4795  ☒ 5328 
Q 476+8     T 484   ☒ 778  
Q 13+34     T 47    ☒ 33   
Q 281-499   T -218  ☒ -15  
Q 9+1120    T 1129  ☒ 1222 

--------------------------------------------------
Iteration 5
282/282 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.4799 - loss: 1.4008
Epoch 1: val_loss improved from 1.40703 to 1.34149, saving model to best_model.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.4799 - loss: 1.4007 - val_accuracy: 0.4995 - val_loss: 1.3415
Q 51+35     T 86    ☒ 50   
Q 424+5143  T 5567  ☒ 1039 
Q 8050+6470 T 14520 ☒ 10533
Q 9091-5232 T 3859  ☒ 930  
Q 171-7346  T -7175 ☒ -7452
Q 206-9     T 197   ☒ 103  
Q 28-4      T 24    ☒ 37   
Q 1+6918    T 6919  ☒ 6000 
Q 360+1332  T 1692  ☒ 3337 
Q 917-64    T 853   ☒ 903  

--------------------------------------------------
Iteration 6
275/282 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.5030 - loss: 1.3379
Epoch 1: val_loss improved from 1.34149 to 1.27785, saving model to best_model.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.5032 - loss: 1.3374 - val_accuracy: 0.5265 - val_loss: 1.2779
Q 8+7310    T 7318  ☒ 7335 
Q 3330-661  T 2669  ☒ 3651 
Q 974+295   T 1269  ☒ 1453 
Q 69-7409   T -7340 ☒ -7005
Q 511+8     T 519   ☒ 552  
Q 59-26     T 33    ☒ 21   
Q 923+5004  T 5927  ☒ 1033 
Q 171-7346  T -7175 ☒ -7555
Q 8355+6    T 8361  ☒ 8451 
Q 6293+131  T 6424  ☒ 6281 

--------------------------------------------------
Iteration 7
282/282 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.5295 - loss: 1.2726
Epoch 1: val_loss improved from 1.27785 to 1.22154, saving model to best_model.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.5295 - loss: 1.2725 - val_accuracy: 0.5437 - val_loss: 1.2215
Q 940+380   T 1320  ☒ 1004 
Q 3201+640  T 3841  ☒ 3055 
Q 8678-77   T 8601  ☒ 8555 
Q 52+549    T 601   ☒ 500  
Q 586+457   T 1043  ☒ 1104 
Q 1353+42   T 1395  ☒ 1554 
Q 6001-21   T 5980  ☒ 6085 
Q 9327+195  T 9522  ☒ 9305 
Q 676-9135  T -8459 ☒ -6355
Q 485+760   T 1245  ☒ 1004 

--------------------------------------------------
Iteration 8
278/282 ━━━━━━━━━━━━━━━━━━━━ 0s 11ms/step - accuracy: 0.5502 - loss: 1.2108
Epoch 1: val_loss improved from 1.22154 to 1.17168, saving model to best_model.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 4s 12ms/step - accuracy: 0.5503 - loss: 1.2106 - val_accuracy: 0.5626 - val_loss: 1.1717
Q 732+6     T 738   ☒ 734  
Q 700+93    T 793   ☒ 809  
Q 8724+5    T 8729  ☒ 8720 
Q 9+3529    T 3538  ☒ 3569 
Q 3688-666  T 3022  ☒ 259  
Q 9813-307  T 9506  ☒ 8000 
Q 58+38     T 96    ☒ 11   
Q 3940+72   T 4012  ☒ 3999 
Q 95+531    T 626   ☒ 519  
Q 867+7281  T 8148  ☒ 1410 

--------------------------------------------------
Iteration 9
280/282 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.5687 - loss: 1.1583
Epoch 1: val_loss improved from 1.17168 to 1.12789, saving model to best_model.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.5687 - loss: 1.1582 - val_accuracy: 0.5759 - val_loss: 1.1279
Q 149+3     T 152   ☒ 147  
Q 4-7641    T -7637 ☒ -7614
Q 92+23     T 115   ☒ 127  
Q 79+5251   T 5330  ☒ 5222 
Q 84+4      T 88    ☒ 96   
Q 1320-4    T 1316  ☒ 1112 
Q 602+858   T 1460  ☒ 1519 
Q 6756-58   T 6698  ☒ 6685 
Q 408-63    T 345   ☒ 352  
Q 8-7330    T -7322 ☒ -7319

--------------------------------------------------
Iteration 10
279/282 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.5807 - loss: 1.1229
Epoch 1: val_loss improved from 1.12789 to 1.08987, saving model to best_model.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.5808 - loss: 1.1228 - val_accuracy: 0.5913 - val_loss: 1.0899
Q 5530+0    T 5530  ☒ 5459 
Q 233+18    T 251   ☒ 239  
Q 4647-34   T 4613  ☒ 4643 
Q 500-9293  T -8793 ☒ -8900
Q 145+179   T 324   ☒ 249  
Q 8295-3    T 8292  ☒ 8299 
Q 45+34     T 79    ☒ 92   
Q 55-2669   T -2614 ☒ -2581
Q 8+240     T 248   ☒ 249  
Q 48+448    T 496   ☒ 450  

--------------------------------------------------
Iteration 11
282/282 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.5945 - loss: 1.0879
Epoch 1: val_loss improved from 1.08987 to 1.07116, saving model to best_model.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 3s 9ms/step - accuracy: 0.5945 - loss: 1.0879 - val_accuracy: 0.5952 - val_loss: 1.0712
Q 792-23    T 769   ☒ 793  
Q 5394-5253 T 141   ☒ -11  
Q 94+72     T 166   ☒ 171  
Q 22-95     T -73   ☒ -55  
Q 383-2459  T -2076 ☒ -2955
Q 653-74    T 579   ☒ 685  
Q 702+8527  T 9229  ☒ 8533 
Q 12-303    T -291  ☒ -309 
Q 3133+5056 T 8189  ☒ 1055 
Q 3-326     T -323  ☒ -325 

--------------------------------------------------
Iteration 12
275/282 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.6045 - loss: 1.0544
Epoch 1: val_loss improved from 1.07116 to 1.03376, saving model to best_model.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.6046 - loss: 1.0542 - val_accuracy: 0.6069 - val_loss: 1.0338
Q 8355+200  T 8555  ☒ 8413 
Q 1-9809    T -9808 ☒ -9916
Q 128-17    T 111   ☒ 103  
Q 9413+970  T 10383 ☒ 10272
Q 88+929    T 1017  ☒ 901  
Q 974+84    T 1058  ☒ 104  
Q 93+3      T 96    ☒ 94   
Q 84+957    T 1041  ☒ 1040 
Q 43+49     T 92    ☒ 90   
Q 6376-6    T 6370  ☒ 6367 

--------------------------------------------------
Iteration 13
278/282 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.6156 - loss: 1.0269
Epoch 1: val_loss improved from 1.03376 to 1.02158, saving model to best_model.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.6156 - loss: 1.0268 - val_accuracy: 0.6136 - val_loss: 1.0216
Q 73-0      T 73    ☒ 61   
Q 7880-178  T 7702  ☒ 7869 
Q 7-2637    T -2630 ☒ -2628
Q 643-8     T 635   ☒ 636  
Q 4264+6505 T 10769 ☒ 11166
Q 5339-5    T 5334  ☒ 5327 
Q 9-3813    T -3804 ☒ -3827
Q 8919+560  T 9479  ☒ 9066 
Q 7-4535    T -4528 ☒ -4447
Q 3-870     T -867  ☒ -876 

--------------------------------------------------
Iteration 14
278/282 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.6242 - loss: 1.0057
Epoch 1: val_loss improved from 1.02158 to 1.00053, saving model to best_model.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 2s 8ms/step - accuracy: 0.6242 - loss: 1.0056 - val_accuracy: 0.6229 - val_loss: 1.0005
Q 71-58     T 13    ☒ 1    
Q 0-2923    T -2923 ☒ -2929
Q 648-230   T 418   ☒ 266  
Q 2184-2361 T -177  ☒ 116  
Q 48+211    T 259   ☑ 259  
Q 3998-9666 T -5668 ☒ -5995
Q 9010-293  T 8717  ☒ 8999 
Q 372-995   T -623  ☒ -456 
Q 3403-51   T 3352  ☒ 3177 
Q 9064+5915 T 14979 ☒ 13433

--------------------------------------------------
Iteration 15
276/282 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accuracy: 0.6326 - loss: 0.9830
Epoch 1: val_loss improved from 1.00053 to 0.97836, saving model to best_model.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 3s 10ms/step - accuracy: 0.6327 - loss: 0.9828 - val_accuracy: 0.6297 - val_loss: 0.9784
Q 8-245     T -237  ☒ -248 
Q 640-700   T -60   ☒ -12  
Q 6081-7    T 6074  ☒ 6085 
Q 200+424   T 624   ☒ 755  
Q 45+45     T 90    ☑ 90   
Q 3039+3282 T 6321  ☒ 5555 
Q 0-47      T -47   ☑ -47  
Q 3952-6029 T -2077 ☒ -1565
Q 7-8291    T -8284 ☒ -8276
Q 9-20      T -11   ☑ -11  

--------------------------------------------------
Iteration 16
276/282 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.6393 - loss: 0.9644
Epoch 1: val_loss improved from 0.97836 to 0.95815, saving model to best_model.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.6394 - loss: 0.9642 - val_accuracy: 0.6394 - val_loss: 0.9581
Q 92-72     T 20    ☒ 13   
Q 490+629   T 1119  ☒ 1174 
Q 2-3321    T -3319 ☒ -3317
Q 541+39    T 580   ☑ 580  
Q 61-6008   T -5947 ☒ -5944
Q 89-24     T 65    ☒ 70   
Q 9-6839    T -6830 ☒ -6829
Q 84+335    T 419   ☒ 429  
Q 60-6015   T -5955 ☒ -6944
Q 26-145    T -119  ☒ -121 

--------------------------------------------------
Iteration 17
281/282 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.6521 - loss: 0.9340
Epoch 1: val_loss improved from 0.95815 to 0.94180, saving model to best_model.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.6521 - loss: 0.9340 - val_accuracy: 0.6434 - val_loss: 0.9418
Q 4-63      T -59   ☒ -60  
Q 32+5406   T 5438  ☒ 5499 
Q 35+49     T 84    ☒ 88   
Q 12+1789   T 1801  ☒ 1804 
Q 8963+490  T 9453  ☒ 1406 
Q 5+588     T 593   ☒ 592  
Q 9122+4    T 9126  ☒ 9114 
Q 80-9993   T -9913 ☒ -9992
Q 131+4115  T 4246  ☒ 4484 
Q 65+97     T 162   ☒ 154  

--------------------------------------------------
Iteration 18
282/282 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.6571 - loss: 0.9163
Epoch 1: val_loss improved from 0.94180 to 0.92372, saving model to best_model.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.6571 - loss: 0.9163 - val_accuracy: 0.6520 - val_loss: 0.9237
Q 958-2     T 956   ☒ 957  
Q 3538+0    T 3538  ☑ 3538 
Q 8963+490  T 9453  ☒ 9307 
Q 676-75    T 601   ☒ 609  
Q 19-8      T 11    ☒ 15   
Q 572-4     T 568   ☒ 573  
Q 425+2     T 427   ☒ 425  
Q 4809-788  T 4021  ☒ 4119 
Q 8+133     T 141   ☒ 139  
Q 8-5478    T -5470 ☒ -5479

--------------------------------------------------
Iteration 19
281/282 ━━━━━━━━━━━━━━━━━━━━ 0s 10ms/step - accuracy: 0.6648 - loss: 0.8993
Epoch 1: val_loss improved from 0.92372 to 0.91702, saving model to best_model.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 3s 11ms/step - accuracy: 0.6648 - loss: 0.8994 - val_accuracy: 0.6532 - val_loss: 0.9170
Q 46+9461   T 9507  ☒ 9472 
Q 4811-6875 T -2064 ☒ -144 
Q 766-0     T 766   ☑ 766  
Q 229-4     T 225   ☒ 228  
Q 92+700    T 792   ☒ 807  
Q 8125+501  T 8626  ☒ 8622 
Q 779+816   T 1595  ☒ 151  
Q 933+10    T 943   ☒ 938  
Q 1069+286  T 1355  ☒ 1357 
Q 3+95      T 98    ☒ 90   

--------------------------------------------------
Iteration 20
275/282 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.6724 - loss: 0.8795
Epoch 1: val_loss improved from 0.91702 to 0.89854, saving model to best_model.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.6723 - loss: 0.8795 - val_accuracy: 0.6585 - val_loss: 0.8985
Q 659-49    T 610   ☒ 541  
Q 0-2966    T -2966 ☑ -2966
Q 6756-58   T 6698  ☒ 6500 
Q 125+896   T 1021  ☒ 1016 
Q 228+697   T 925   ☒ 906  
Q 6054+320  T 6374  ☒ 6447 
Q 64-30     T 34    ☒ 45   
Q 2+5693    T 5695  ☒ 5694 
Q 812-8420  T -7608 ☒ -7436
Q 8963+490  T 9453  ☒ 9307 

--------------------------------------------------
Iteration 21
276/282 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.6778 - loss: 0.8660
Epoch 1: val_loss did not improve from 0.89854
282/282 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.6778 - loss: 0.8660 - val_accuracy: 0.6588 - val_loss: 0.9046
Q 9992-0    T 9992  ☒ 9991 
Q 1+6882    T 6883  ☒ 6892 
Q 9+1120    T 1129  ☑ 1129 
Q 57-33     T 24    ☑ 24   
Q 34-6    



282/282 ━━━━━━━━━━━━━━━━━━━━ 3s 10ms/step - accuracy: 0.6834 - loss: 0.8538 - val_accuracy: 0.6635 - val_loss: 0.8762
Q 115+2     T 117   ☒ 115  
Q 258-705   T -447  ☒ -457 
Q 9-6839    T -6830 ☒ -6827
Q 9921-81   T 9840  ☒ 9853 
Q 91+53     T 144   ☒ 148  
Q 0-389     T -389  ☒ -387 
Q 2353-8270 T -5917 ☒ -6530
Q 862-7077  T -6215 ☒ -6575
Q 2-88      T -86   ☒ -87  
Q 819-9     T 810   ☒ 809  

--------------------------------------------------
Iteration 23
275/282 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.6879 - loss: 0.8382
Epoch 1: val_loss improved from 0.87622 to 0.86179, saving model to best_model.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.6879 - loss: 0.8382 - val_accuracy: 0.6709 - val_loss: 0.8618
Q 28-903    T -875  ☒ -892 
Q 1838+3146 T 4984  ☒ 4045 
Q 77+5183   T 5260  ☒ 5225 
Q 586-786   T -200  ☑ -200 
Q 749+39    T 788   ☒ 783  
Q 9+1120    T 1129  ☒ 1120 
Q 3+587     T 590   ☒ 582  
Q 7388+8    T 7396  ☒ 7389 
Q 7893-204  T 7689  ☒ 7733 
Q 984-604   T 380   ☒ 362  

--------------------------------------------------
Iteration 24
278/282 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.6919 - loss: 0.8245
Epoch 1: val_loss improved from 0.86179 to 0.85826, saving model to best_model.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.6919 - loss: 0.8245 - val_accuracy: 0.6713 - val_loss: 0.8583
Q 4726+756  T 5482  ☒ 5240 
Q 4890+1    T 4891  ☒ 4890 
Q 7906-8    T 7898  ☒ 7910 
Q 9672-76   T 9596  ☒ 9600 
Q 921+3907  T 4828  ☒ 4755 
Q 8+89      T 97    ☑ 97   
Q 4-640     T -636  ☒ -639 
Q 28-537    T -509  ☒ -523 
Q 1-1940    T -1939 ☒ -1923
Q 343+626   T 969   ☒ 909  

--------------------------------------------------
Iteration 25
282/282 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.6975 - loss: 0.8152
Epoch 1: val_loss improved from 0.85826 to 0.85294, saving model to best_model.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 3s 9ms/step - accuracy: 0.6975 - loss: 0.8152 - val_accuracy: 0.6770 - val_loss: 0.8529
Q 7622+5909 T 13531 ☒ 13833
Q 728-77    T 651   ☒ 663  
Q 91+53     T 144   ☒ 148  
Q 398-2300  T -1902 ☒ -1891
Q 308+32    T 340   ☒ 359  
Q 947+5     T 952   ☒ 951  
Q 34+2      T 36    ☒ 37   
Q 78-73     T 5     ☒ 1    
Q 475-251   T 224   ☒ 232  
Q 5727+30   T 5757  ☒ 5763 

--------------------------------------------------
Iteration 26
275/282 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.7045 - loss: 0.7973
Epoch 1: val_loss improved from 0.85294 to 0.83746, saving model to best_model.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.7044 - loss: 0.7974 - val_accuracy: 0.6813 - val_loss: 0.8375
Q 73+9048   T 9121  ☒ 9037 
Q 7412-72   T 7340  ☒ 7347 
Q 4790-67   T 4723  ☒ 4794 
Q 92+16     T 108   ☒ 102  
Q 96+173    T 269   ☒ 262  
Q 46-33     T 13    ☒ 1    
Q 6704+4    T 6708  ☒ 6712 
Q 64-30     T 34    ☒ 27   
Q 14+717    T 731   ☒ 734  
Q 28+63     T 91    ☒ 90   

--------------------------------------------------
Iteration 27
276/282 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.7117 - loss: 0.7823
Epoch 1: val_loss improved from 0.83746 to 0.83437, saving model to best_model.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.7116 - loss: 0.7824 - val_accuracy: 0.6818 - val_loss: 0.8344
Q 7412-72   T 7340  ☑ 7340 
Q 0-9224    T -9224 ☒ -9212
Q 56-811    T -755  ☒ -743 
Q 1+136     T 137   ☑ 137  
Q 57-4934   T -4877 ☒ -4888
Q 49-3      T 46    ☑ 46   
Q 3694-5200 T -1506 ☒ -17  
Q 846-20    T 826   ☒ 834  
Q 5+3713    T 3718  ☒ 3723 
Q 6598+9    T 6607  ☒ 6691 

--------------------------------------------------
Iteration 28
[1m281/282[0m [32m━━━━━━━━━━━━━━━━━━━[0m[37m━[0m [1m0s[0m 6ms/step - accuracy: 0.7132 - loss: 0.7739
Epoch 1: val_loss improved from 0.83437 to 0.82425, saving model to best_model.h5




[1m282/282[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 8ms/step - accuracy: 0.7131 - loss: 0.7739 - val_accuracy: 0.6860 - val_loss: 0.8243
Q 27+7012   T 7039  ☒ 7057 
Q 526+4022  T 4548  ☒ 4869 
Q 5366+383  T 5749  ☒ 5950 
Q 752+3180  T 3932  ☒ 4980 
Q 4+329     T 333   ☒ 334  
Q 391+35    T 426   ☒ 429  
Q 161+363   T 524   ☒ 505  
Q 9898+9483 T 19381 ☒ 18876
Q 0+623     T 623   ☑ 623  
Q 6925+340  T 7265  ☒ 7006 

--------------------------------------------------
Iteration 29
[1m279/282[0m [32m━━━━━━━━━━━━━━━━━━━[0m[37m━[0m [1m0s[0m 9ms/step - accuracy: 0.7165 - loss: 0.7639
Epoch 1: val_loss improved from 0.82425 to 0.82263, saving model to best_model.h5




[1m282/282[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 9ms/step - accuracy: 0.7165 - loss: 0.7639 - val_accuracy: 0.6878 - val_loss: 0.8226
Q 71+481    T 552   ☒ 529  
Q 4254+968  T 5222  ☒ 5201 
Q 712+883   T 1595  ☒ 1692 
Q 254+9     T 263   ☒ 261  
Q 48-49     T -1    ☑ -1   
Q 705+8227  T 8932  ☒ 8934 
Q 6+6       T 12    ☑ 12   
Q 685-2     T 683   ☒ 684  
Q 464-811   T -347  ☒ -358 
Q 44-89     T -45   ☒ -53  

--------------------------------------------------
Iteration 30
[1m278/282[0m [32m━━━━━━━━━━━━━━━━━━━[0m[37m━[0m [1m0s[0m 7ms/step - accuracy: 0.7190 - loss: 0.7560
Epoch 1: val_loss improved from 0.82263 to 0.80618, saving model to best_model.h5




[1m282/282[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 7ms/step - accuracy: 0.7190 - loss: 0.7560 - val_accuracy: 0.6937 - val_loss: 0.8062
Q 51-3777   T -3726 ☒ -3729
Q 8225+4    T 8229  ☒ 8228 
Q 38+20     T 58    ☒ 61   
Q 194+3413  T 3607  ☒ 3527 
Q 9+8601    T 8610  ☒ 8601 
Q 855-893   T -38   ☒ -17  
Q 906-159   T 747   ☒ 626  
Q 545+775   T 1320  ☒ 1323 
Q 3-569     T -566  ☒ -564 
Q 66+9      T 75    ☑ 75   

--------------------------------------------------
Iteration 31
[1m277/282[0m [32m━━━━━━━━━━━━━━━━━━━[0m[37m━[0m [1m0s[0m 7ms/step - accuracy: 0.7254 - loss: 0.7389
Epoch 1: val_loss improved from 0.80618 to 0.80164, saving model to best_model.h5




[1m282/282[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 8ms/step - accuracy: 0.7254 - loss: 0.7389 - val_accuracy: 0.6936 - val_loss: 0.8016
Q 5-72      T -67   ☒ -69  
Q 251-5236  T -4985 ☒ -5972
Q 9+1120    T 1129  ☒ 1120 
Q 1-647     T -646  ☒ -645 
Q 9-1450    T -1441 ☒ -1446
Q 2-516     T -514  ☒ -515 
Q 413+38    T 451   ☒ 459  
Q 1+988     T 989   ☒ 987  
Q 35+9      T 44    ☒ 43   
Q 940+6475  T 7415  ☒ 7020 

--------------------------------------------------
Iteration 32
[1m276/282[0m [32m━━━━━━━━━━━━━━━━━━━[0m[37m━[0m [1m0s[0m 7ms/step - accuracy: 0.7306 - loss: 0.7270
Epoch 1: val_loss improved from 0.80164 to 0.78954, saving model to best_model.h5




[1m282/282[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 7ms/step - accuracy: 0.7306 - loss: 0.7270 - val_accuracy: 0.7004 - val_loss: 0.7895
Q 18-91     T -73   ☑ -73  
Q 9851+853  T 10704 ☒ 10663
Q 36+9182   T 9218  ☒ 9212 
Q 4593-239  T 4354  ☒ 4166 
Q 9+38      T 47    ☑ 47   
Q 201-4546  T -4345 ☒ -4332
Q 22-7424   T -7402 ☒ -7411
Q 3694-5200 T -1506 ☒ -144 
Q 7349+709  T 8058  ☒ 8022 
Q 190-6015  T -5825 ☒ -5990

--------------------------------------------------
Iteration 33
[1m276/282[0m [32m━━━━━━━━━━━━━━━━━━━[0m[37m━[0m [1m0s[0m 9ms/step - accuracy: 0.7349 - loss: 0.7140
Epoch 1: val_loss did not improve from 0.78954
[1m282/282[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 10ms/step - accuracy: 0.7348 - loss: 0.7141 - val_accuracy: 0.6975 - val_loss: 0.7965
Q 1-342     T -341  ☑ -341 
Q 544-4     T 540   ☒ 541  
Q 743-95    T 648   ☒ 656  
Q 22+323    T 345   ☒ 353  
Q 32+71  



[1m282/282[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 7ms/step - accuracy: 0.7375 - loss: 0.7056 - val_accuracy: 0.7033 - val_loss: 0.7814
Q 84+4      T 88    ☑ 88   
Q 550-9400  T -8850 ☒ -8905
Q 1673+24   T 1697  ☒ 1790 
Q 12-303    T -291  ☒ -299 
Q 2+819     T 821   ☒ 820  
Q 4+3024    T 3028  ☒ 3037 
Q 0-337     T -337  ☒ -336 
Q 8517+199  T 8716  ☒ 8630 
Q 56+2434   T 2490  ☒ 2498 
Q 2-516     T -514  ☒ -515 

--------------------------------------------------
Iteration 35
[1m278/282[0m [32m━━━━━━━━━━━━━━━━━━━[0m[37m━[0m [1m0s[0m 6ms/step - accuracy: 0.7425 - loss: 0.6941
Epoch 1: val_loss improved from 0.78141 to 0.77167, saving model to best_model.h5




[1m282/282[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 7ms/step - accuracy: 0.7424 - loss: 0.6941 - val_accuracy: 0.7041 - val_loss: 0.7717
Q 5203+774  T 5977  ☒ 5911 
Q 36+7057   T 7093  ☒ 7084 
Q 91-3358   T -3267 ☒ -3273
Q 49-3084   T -3035 ☒ -3019
Q 230-6443  T -6213 ☒ -6133
Q 2378+800  T 3178  ☒ 3190 
Q 10-535    T -525  ☒ -522 
Q 8+9702    T 9710  ☒ 9706 
Q 552-6612  T -6060 ☒ -6036
Q 759-3     T 756   ☒ 754  

--------------------------------------------------
Iteration 36
[1m276/282[0m [32m━━━━━━━━━━━━━━━━━━━[0m[37m━[0m [1m0s[0m 7ms/step - accuracy: 0.7451 - loss: 0.6870
Epoch 1: val_loss improved from 0.77167 to 0.76440, saving model to best_model.h5




[1m282/282[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 8ms/step - accuracy: 0.7450 - loss: 0.6870 - val_accuracy: 0.7082 - val_loss: 0.7644
Q 7928+900  T 8828  ☒ 8711 
Q 6+710     T 716   ☒ 717  
Q 6-949     T -943  ☑ -943 
Q 17+2472   T 2489  ☒ 2498 
Q 25+5      T 30    ☑ 30   
Q 3-5359    T -5356 ☒ -5355
Q 1-520     T -519  ☑ -519 
Q 84-6118   T -6034 ☒ -6040
Q 589-159   T 430   ☒ 400  
Q 88-14     T 74    ☑ 74   

--------------------------------------------------
Iteration 37
[1m276/282[0m [32m━━━━━━━━━━━━━━━━━━━[0m[37m━[0m [1m0s[0m 7ms/step - accuracy: 0.7507 - loss: 0.6757
Epoch 1: val_loss did not improve from 0.76440
[1m282/282[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 8ms/step - accuracy: 0.7507 - loss: 0.6757 - val_accuracy: 0.7056 - val_loss: 0.7664
Q 542-218   T 324   ☒ 359  
Q 934-66    T 868   ☒ 888  
Q 4+5140    T 5144  ☒ 5156 
Q 8+188     T 196   ☒ 195  
Q 543-5   



[1m282/282[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 7ms/step - accuracy: 0.7515 - loss: 0.6669 - val_accuracy: 0.7126 - val_loss: 0.7497
Q 1402+4    T 1406  ☒ 1418 
Q 8-20      T -12   ☑ -12  
Q 1602+38   T 1640  ☒ 1632 
Q 6009+7    T 6016  ☒ 6014 
Q 793+908   T 1701  ☒ 1799 
Q 754+535   T 1289  ☒ 1288 
Q 3348+4055 T 7403  ☒ 7385 
Q 23-450    T -427  ☒ -422 
Q 67+8671   T 8738  ☒ 8729 
Q 178+0     T 178   ☑ 178  

--------------------------------------------------
Iteration 39
[1m279/282[0m [32m━━━━━━━━━━━━━━━━━━━[0m[37m━[0m [1m0s[0m 6ms/step - accuracy: 0.7576 - loss: 0.6550
Epoch 1: val_loss improved from 0.74967 to 0.74884, saving model to best_model.h5




[1m282/282[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 7ms/step - accuracy: 0.7576 - loss: 0.6550 - val_accuracy: 0.7157 - val_loss: 0.7488
Q 480-1     T 479   ☑ 479  
Q 837+0     T 837   ☑ 837  
Q 972+4399  T 5371  ☒ 5022 
Q 0-38      T -38   ☑ -38  
Q 85-82     T 3     ☒ 1    
Q 77+5792   T 5869  ☒ 5866 
Q 723-9447  T -8724 ☒ -8792
Q 8-113     T -105  ☑ -105 
Q 170-13    T 157   ☒ 155  
Q 3398+7    T 3405  ☒ 3397 

--------------------------------------------------
Iteration 40
[1m280/282[0m [32m━━━━━━━━━━━━━━━━━━━[0m[37m━[0m [1m0s[0m 8ms/step - accuracy: 0.7614 - loss: 0.6453
Epoch 1: val_loss improved from 0.74884 to 0.74520, saving model to best_model.h5




[1m282/282[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 10ms/step - accuracy: 0.7614 - loss: 0.6453 - val_accuracy: 0.7134 - val_loss: 0.7452
Q 31-358    T -327  ☒ -324 
Q 405+18    T 423   ☒ 417  
Q 1067-62   T 1005  ☒ 100  
Q 2999+580  T 3579  ☒ 3688 
Q 589+65    T 654   ☒ 656  
Q 759+3     T 762   ☒ 761  
Q 71-28     T 43    ☒ 42   
Q 20+964    T 984   ☒ 997  
Q 97-138    T -41   ☒ -56  
Q 757-1490  T -733  ☒ -785 

--------------------------------------------------
Iteration 41
[1m278/282[0m [32m━━━━━━━━━━━━━━━━━━━[0m[37m━[0m [1m0s[0m 6ms/step - accuracy: 0.7656 - loss: 0.6349
Epoch 1: val_loss improved from 0.74520 to 0.73602, saving model to best_model.h5




[1m282/282[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 7ms/step - accuracy: 0.7656 - loss: 0.6349 - val_accuracy: 0.7191 - val_loss: 0.7360
Q 5-8818    T -8813 ☑ -8813
Q 925-2     T 923   ☒ 921  
Q 968-41    T 927   ☒ 939  
Q 47+547    T 594   ☒ 514  
Q 1402-41   T 1361  ☒ 1979 
Q 7111+135  T 7246  ☒ 7257 
Q 505-43    T 462   ☒ 469  
Q 316+1     T 317   ☒ 316  
Q 361+990   T 1351  ☒ 1362 
Q 105-44    T 61    ☒ 57   

--------------------------------------------------
Iteration 42
[1m278/282[0m [32m━━━━━━━━━━━━━━━━━━━[0m[37m━[0m [1m0s[0m 7ms/step - accuracy: 0.7716 - loss: 0.6187
Epoch 1: val_loss did not improve from 0.73602
[1m282/282[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 7ms/step - accuracy: 0.7715 - loss: 0.6188 - val_accuracy: 0.7189 - val_loss: 0.7400
Q 95-47     T 48    ☒ 49   
Q 34-27     T 7     ☒ 1    
Q 161+363   T 524   ☒ 419  
Q 811-9     T 802   ☒ 702  
Q 4-736   



[1m282/282[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 7ms/step - accuracy: 0.7745 - loss: 0.6101 - val_accuracy: 0.7201 - val_loss: 0.7300
Q 1-9013    T -9012 ☒ -9021
Q 722-93    T 629   ☒ 632  
Q 5929+527  T 6456  ☒ 6404 
Q 9+598     T 607   ☑ 607  
Q 9830-5537 T 4293  ☒ 4333 
Q 1841-458  T 1383  ☒ 1587 
Q 298+6498  T 6796  ☒ 6880 
Q 6-976     T -970  ☒ -969 
Q 3+8106    T 8109  ☒ 8112 
Q 0-7311    T -7311 ☒ -7310

--------------------------------------------------
Iteration 44
[1m277/282[0m [32m━━━━━━━━━━━━━━━━━━━[0m[37m━[0m [1m0s[0m 7ms/step - accuracy: 0.7705 - loss: 0.6169
Epoch 1: val_loss did not improve from 0.72997
[1m282/282[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 8ms/step - accuracy: 0.7706 - loss: 0.6169 - val_accuracy: 0.7167 - val_loss: 0.7390
Q 1161+362  T 1523  ☒ 1517 
Q 4132-0    T 4132  ☒ 4137 
Q 72-7      T 65    ☑ 65   
Q 58+5      T 63    ☑ 63   
Q 35-329  



[1m282/282[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 7ms/step - accuracy: 0.7789 - loss: 0.5977 - val_accuracy: 0.7259 - val_loss: 0.7093
Q 91+9267   T 9358  ☒ 9368 
Q 6579-248  T 6331  ☒ 6436 
Q 44+7      T 51    ☑ 51   
Q 114+1     T 115   ☒ 114  
Q 52-5555   T -5503 ☒ -5597
Q 6+140     T 146   ☑ 146  
Q 3468-2    T 3466  ☒ 3463 
Q 4019-518  T 3501  ☒ 3588 
Q 477-1     T 476   ☒ 477  
Q 1499+8268 T 9767  ☒ 10833

--------------------------------------------------
Iteration 46
[1m281/282[0m [32m━━━━━━━━━━━━━━━━━━━[0m[37m━[0m [1m0s[0m 6ms/step - accuracy: 0.7849 - loss: 0.5855
Epoch 1: val_loss improved from 0.70935 to 0.70912, saving model to best_model.h5




[1m282/282[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 8ms/step - accuracy: 0.7849 - loss: 0.5855 - val_accuracy: 0.7278 - val_loss: 0.7091
Q 757+87    T 844   ☒ 843  
Q 3393-702  T 2691  ☒ 2577 
Q 7543-82   T 7461  ☒ 7444 
Q 416+17    T 433   ☒ 423  
Q 70+19     T 89    ☒ 88   
Q 717-328   T 389   ☒ 355  
Q 18+733    T 751   ☒ 740  
Q 3283-22   T 3261  ☒ 3263 
Q 5320+7    T 5327  ☒ 5326 
Q 342-14    T 328   ☒ 327  

--------------------------------------------------
Iteration 47
[1m280/282[0m [32m━━━━━━━━━━━━━━━━━━━[0m[37m━[0m [1m0s[0m 7ms/step - accuracy: 0.7884 - loss: 0.5771
Epoch 1: val_loss improved from 0.70912 to 0.70533, saving model to best_model.h5




[1m282/282[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 7ms/step - accuracy: 0.7884 - loss: 0.5772 - val_accuracy: 0.7306 - val_loss: 0.7053
Q 29-4710   T -4681 ☒ -4672
Q 1337-67   T 1270  ☒ 1392 
Q 11-4998   T -4987 ☒ -4992
Q 99-2808   T -2709 ☒ -2790
Q 16+536    T 552   ☑ 552  
Q 506+34    T 540   ☒ 551  
Q 468-228   T 240   ☒ 226  
Q 721-16    T 705   ☑ 705  
Q 886-554   T 332   ☒ 323  
Q 3793-893  T 2900  ☒ 2928 

--------------------------------------------------
Iteration 48
[1m280/282[0m [32m━━━━━━━━━━━━━━━━━━━[0m[37m━[0m [1m0s[0m 10ms/step - accuracy: 0.7919 - loss: 0.5672
Epoch 1: val_loss improved from 0.70533 to 0.70393, saving model to best_model.h5




[1m282/282[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 11ms/step - accuracy: 0.7919 - loss: 0.5672 - val_accuracy: 0.7316 - val_loss: 0.7039
Q 37-98     T -61   ☒ -51  
Q 423-75    T 348   ☒ 356  
Q 101+60    T 161   ☒ 162  
Q 6367+256  T 6623  ☒ 6712 
Q 34-314    T -280  ☒ -200 
Q 728-95    T 633   ☒ 646  
Q 9+1230    T 1239  ☒ 1231 
Q 8-662     T -654  ☑ -654 
Q 3-5359    T -5356 ☒ -5355
Q 5771+596  T 6367  ☒ 6464 

--------------------------------------------------
Iteration 49
[1m279/282[0m [32m━━━━━━━━━━━━━━━━━━━[0m[37m━[0m [1m0s[0m 6ms/step - accuracy: 0.7923 - loss: 0.5620
Epoch 1: val_loss improved from 0.70393 to 0.69383, saving model to best_model.h5




[1m282/282[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 7ms/step - accuracy: 0.7923 - loss: 0.5620 - val_accuracy: 0.7317 - val_loss: 0.6938
Q 130-9952  T -9822 ☒ -9828
Q 691+98    T 789   ☒ 781  
Q 5796-52   T 5744  ☒ 5736 
Q 8864-8    T 8856  ☒ 8848 
Q 9898+9483 T 19381 ☒ 18123
Q 7259-6344 T 915   ☒ 11   
Q 976-4     T 972   ☑ 972  
Q 298+6498  T 6796  ☒ 6877 
Q 918+9775  T 10693 ☒ 10650
Q 9-5801    T -5792 ☑ -5792


a) I think the model performs reasonably well, with a training accuracy of about 79% and a validation accuracy of about 73% after the last iteration. These results suggest that the model learns the task and generalizes to some extent. However, the printed validation samples show that most predictions are not exactly correct: the model usually gets the sign, the number of digits, and the leading digits right but misses the later digits. In other words, it is surprisingly good at estimating the answer, but it does not fully capture the dependencies between the inputs and the outputs that are needed to compute it exactly.
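
The gap between a fairly high accuracy and mostly ☒ samples can be made concrete with a small sketch. It assumes that the `accuracy` Keras reports for a per-character softmax output is averaged over character positions, and that answers are right-padded with spaces to 5 characters, as the printed samples suggest. The helper names `char_accuracy` and `exact_match_accuracy` are hypothetical and are not part of the notebook.

```python
def char_accuracy(targets, guesses, width):
    """Share of character positions that match after right-padding with spaces."""
    hits = 0
    for target, guess in zip(targets, guesses):
        padded_t, padded_g = target.ljust(width), guess.ljust(width)
        hits += sum(a == b for a, b in zip(padded_t, padded_g))
    return hits / (width * len(targets))


def exact_match_accuracy(targets, guesses):
    """Share of answers that are correct in every position."""
    return sum(t == g for t, g in zip(targets, guesses)) / len(targets)


# (target, prediction) pairs copied from the samples printed after iteration 49.
samples = [
    ("-9822", "-9828"), ("789", "781"), ("5744", "5736"), ("8856", "8848"),
    ("19381", "18123"), ("915", "11"), ("972", "972"), ("6796", "6877"),
    ("10693", "10650"), ("-5792", "-5792"),
]
targets, guesses = zip(*samples)
per_char = char_accuracy(targets, guesses, width=5)
exact = exact_match_accuracy(targets, guesses)
print(f"per-character: {per_char:.2f}, exact match: {exact:.2f}")
```

On these ten samples, 33 of 50 character positions match (0.66) while only 2 of 10 answers are fully correct, so a per-character metric can look much better than the share of equations that are actually solved. Ten samples are far too few for a reliable estimate; the same two functions could be applied to the decoded predictions for the whole validation set.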

b)
One limitation of this model is its inability to capture long-range dependencies effectively or to focus on the parts of the input sequence that are most relevant to each prediction. Without an attention mechanism, the model cannot selectively prioritize different parts of the input, which matters in tasks like this one, where the input varies significantly and each prediction requires a different area of focus. This lack of flexibility can hurt performance, especially on complex or nuanced patterns. In addition, the simple encoder-decoder structure limits the model’s ability to handle more intricate dependencies, which reduces its generalization to more complex inputs.

c)  Improvements:
To improve this model, I would suggest implementing an attention mechanism. Attention would allow the model to focus on different parts of the sequence, which is especially useful in sequence-to-sequence tasks where important information may be scattered throughout the input.
Furthermore, hyperparameter tuning could help, such as adjusting the learning rate, increasing the number of layers, or adding more neurons per layer. Regularization techniques like dropout could help prevent overfitting and improve generalization.
Additionally, expanding the training data could help the model generalize to a wider range of inputs. We could also balance the dataset between the two operations to ensure that the model works well for both subtraction and addition.
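
As an illustration of the balancing idea, here is a minimal sketch that generates the same number of unique addition and subtraction questions. It is not the data-generation code used in this notebook: the function names, the `max_digits=4` default, and the choice to draw the operand length first are assumptions made for the example.

```python
import random


def random_operand(rng, max_digits):
    """Draw a non-negative integer of at most max_digits digits.

    The length is picked first so that short numbers stay common.
    """
    length = rng.randint(1, max_digits)
    return int("".join(rng.choice("0123456789") for _ in range(length)))


def make_balanced_dataset(n_per_op, max_digits=4, seed=0):
    """Return shuffled (question, answer) strings with n_per_op examples per operator."""
    rng = random.Random(seed)
    seen, data = set(), []
    for op in "+-":
        added = 0
        while added < n_per_op:
            a = random_operand(rng, max_digits)
            b = random_operand(rng, max_digits)
            question = f"{a}{op}{b}"
            if question in seen:  # skip duplicates so every question is unique
                continue
            seen.add(question)
            data.append((question, str(a + b if op == "+" else a - b)))
            added += 1
    rng.shuffle(data)
    return data


pairs = make_balanced_dataset(n_per_op=500)
n_add = sum("+" in question for question, _ in pairs)
print(n_add, len(pairs) - n_add)
```

The resulting strings would still need to be padded and one-hot encoded (for example with the notebook's `ctable`) before training.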

d) Yes, an attention mechanism can be applied to this model, and it would be particularly beneficial for the current problem of solving arithmetic equations. The simple encoder-decoder structure without attention faces challenges in handling long input sequences or varying complexity in equations.

In this problem, an attention mechanism would allow the model to focus on relevant parts of the input sequence (such as specific digits or operators) when making predictions, rather than relying on a fixed-length representation of the entire sequence. This would enable the model to handle longer sequences more effectively, improve its generalization to larger numbers, and reduce errors in operations.
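
To make the idea of focusing on relevant input positions concrete, here is a dependency-free sketch of dot-product attention, the scoring scheme that the Keras `Attention` layer uses by default. The vectors are toy values chosen for the example, not hidden states from the trained model, and the function names are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot_product_attention(query, keys, values):
    """Return the context vector and the attention weights for one query."""
    # One score per input position: the dot product of the query with that key.
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    # Context vector: the weighted average of the value vectors.
    context = [
        sum(w * value[i] for w, value in zip(weights, values))
        for i in range(len(values[0]))
    ]
    return context, weights


# Toy encoder states for three input positions (assumed values).
encoder_states = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
# A toy decoder query that points in the direction of the second state.
decoder_query = [0.0, 3.0]

context, weights = dot_product_attention(decoder_query, encoder_states, encoder_states)
print([round(w, 3) for w in weights])
```

The second position receives most of the weight because its state is the one most aligned with the query, so the context vector is dominated by that position instead of being a single fixed summary of the whole input.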

## 1.3


In [None]:
# Defining model 2 - adding attention
# Defining input layer
inputs = Input(shape=(maxlen, len(chars)))

# First LSTM layer - Output sequences to feed into attention
lstm_1 = LSTM(config_dict['hidden_size'], return_sequences=True)(inputs)

# Attention layer - Attention mechanism applied on lstm_1 output
attention = Attention()([lstm_1, lstm_1])

# Flatten the attention output to 2D, the input shape RepeatVector expects
attention_flattened = Flatten()(attention)

# Repeat the flattened vector digits + 1 times, once per output timestep
repeat_vector = RepeatVector(config_dict['digits'] + 1)(attention_flattened)

# Second LSTM layer - Output sequences
lstm_2 = LSTM(config_dict['hidden_size'], return_sequences=True)(repeat_vector)

# TimeDistributed Dense layer for output (categorical distribution)
outputs = TimeDistributed(Dense(len(chars), activation='softmax'))(lstm_2)

# Creating the model
model2 = Model(inputs=inputs, outputs=outputs)

# Compile the model
model2.compile(loss='categorical_crossentropy',
               optimizer='adam',
               metrics=['accuracy'])

model2.summary()

# Train the model each generation and show predictions against the validation
# dataset.
checkpoint = ModelCheckpoint('best_model.h5',
                             monitor='val_loss',
                             save_best_only=True,
                             mode='min',
                             verbose=1)
for iteration in range(config_dict['iterations']):
    print()
    print('-' * 50)
    print('Iteration', iteration)

    model2.fit(x_train, y_train,
               batch_size=config_dict['batch_size'],
               epochs=1,
               validation_data=(x_val, y_val),
               callbacks=[checkpoint])
    # Select 10 samples from the validation set at random so we can visualize
    # errors.
    for i in range(10):
        ind = np.random.randint(0, len(x_val))
        rowx, rowy = x_val[np.array([ind])], y_val[np.array([ind])]
        preds = model2.predict(rowx, verbose=0)
        preds = np.argmax(preds, axis=-1)
        q = ctable.decode(rowx[0])
        correct = ctable.decode(rowy[0])
        guess = ctable.decode(preds[0], calc_argmax=False)
        print('Q', q, end=' ')
        print('T', correct, end=' ')
        if correct == guess:
            print('☑', end=' ')
        else:
            print('☒', end=' ')
        print(guess)


--------------------------------------------------
Iteration 0
[1m282/282[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 117ms/step - accuracy: 0.3369 - loss: 1.9548
Epoch 1: val_loss improved from inf to 1.59994, saving model to best_model.h5




[1m282/282[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m40s[0m 124ms/step - accuracy: 0.3371 - loss: 1.9542 - val_accuracy: 0.4311 - val_loss: 1.5999
Q 377+936   T 1313  ☒ 7777 
Q 765-50    T 715   ☒ 556  
Q 9561+34   T 9595  ☒ 1556 
Q 733-695   T 38    ☒ -33  
Q 286+39    T 325   ☒ 126  
Q 8-9611    T -9603 ☒ -1619
Q 8-20      T -12   ☒ -2   
Q 8375+745  T 9120  ☒ 7757 
Q 238-8495  T -8257 ☒ -835 
Q 9134+594  T 9728  ☒ 1055 

--------------------------------------------------
Iteration 1
[1m281/282[0m [32m━━━━━━━━━━━━━━━━━━━[0m[37m━[0m [1m0s[0m 125ms/step - accuracy: 0.4321 - loss: 1.5821
Epoch 1: val_loss improved from 1.59994 to 1.52157, saving model to best_model.h5




[1m282/282[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m36s[0m 129ms/step - accuracy: 0.4322 - loss: 1.5820 - val_accuracy: 0.4462 - val_loss: 1.5216
Q 1214-507  T 707   ☒ 1111 
Q 997-775   T 222   ☒ 797  
Q 46-2055   T -2009 ☒ -4253
Q 6+6553    T 6559  ☒ 553  
Q 458-38    T 420   ☒ 437  
Q 9083-853  T 8230  ☒ 9333 
Q 9058+3967 T 13025 ☒ 1033 
Q 99-2318   T -2219 ☒ -9993
Q 9-6131    T -6122 ☒ -9663
Q 6+957     T 963   ☒ 655  

--------------------------------------------------
Iteration 2
[1m282/282[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 121ms/step - accuracy: 0.4517 - loss: 1.4921
Epoch 1: val_loss improved from 1.52157 to 1.43801, saving model to best_model.h5




[1m282/282[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m36s[0m 127ms/step - accuracy: 0.4517 - loss: 1.4920 - val_accuracy: 0.4690 - val_loss: 1.4380
Q 7+138     T 145   ☒ 732  
Q 8+36      T 44    ☒ 83   
Q 6915+1    T 6916  ☒ 6666 
Q 839-54    T 785   ☒ 353  
Q 7500-59   T 7441  ☒ 5053 
Q 8918+2    T 8920  ☒ 8890 
Q 275+8     T 283   ☒ 788  
Q 683-483   T 200   ☒ 333  
Q 91-47     T 44    ☒ -5   
Q 51-1031   T -980  ☒ -111 

--------------------------------------------------
Iteration 3
[1m282/282[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 118ms/step - accuracy: 0.4706 - loss: 1.4249
Epoch 1: val_loss improved from 1.43801 to 1.38526, saving model to best_model.h5




[1m282/282[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m36s[0m 128ms/step - accuracy: 0.4706 - loss: 1.4248 - val_accuracy: 0.4866 - val_loss: 1.3853
Q 86+517    T 603   ☒ 152  
Q 43-54     T -11   ☒ -2   
Q 61-1      T 60    ☒ 16   
Q 82+535    T 617   ☒ 259  
Q 2378+6    T 2384  ☒ 3325 
Q 6187-434  T 5753  ☒ 611  
Q 7503-70   T 7433  ☒ 7077 
Q 6856+3    T 6859  ☒ 6669 
Q 26-5      T 21    ☒ 56   
Q 390+1     T 391   ☒ 309  

--------------------------------------------------
Iteration 4
[1m282/282[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 119ms/step - accuracy: 0.4948 - loss: 1.3608
Epoch 1: val_loss improved from 1.38526 to 1.31635, saving model to best_model.h5




[1m282/282[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m36s[0m 128ms/step - accuracy: 0.4948 - loss: 1.3608 - val_accuracy: 0.5082 - val_loss: 1.3164
Q 0-768     T -768  ☒ -776 
Q 9+8649    T 8658  ☒ 9844 
Q 5473-243  T 5230  ☒ 4416 
Q 5+964     T 969   ☒ 964  
Q 4+956     T 960   ☒ 954  
Q 9561-394  T 9167  ☒ 9442 
Q 62+1813   T 1875  ☒ 1224 
Q 61-1      T 60    ☒ 66   
Q 1841-924  T 917   ☒ 1412 
Q 858+19    T 877   ☒ 844  

--------------------------------------------------
Iteration 5
[1m281/282[0m [32m━━━━━━━━━━━━━━━━━━━[0m[37m━[0m [1m0s[0m 117ms/step - accuracy: 0.5223 - loss: 1.2860
Epoch 1: val_loss improved from 1.31635 to 1.24331, saving model to best_model.h5




[1m282/282[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m34s[0m 121ms/step - accuracy: 0.5224 - loss: 1.2858 - val_accuracy: 0.5412 - val_loss: 1.2433
Q 74+4      T 78    ☒ 79   
Q 2034-606  T 1428  ☒ 2111 
Q 813+1474  T 2287  ☒ 1066 
Q 799-8     T 791   ☒ 784  
Q 8+621     T 629   ☒ 628  
Q 585-25    T 560   ☒ 586  
Q 0+791     T 791   ☒ 707  
Q 236+11    T 247   ☒ 268  
Q 140+9     T 149   ☒ 104  
Q 3847+2139 T 5986  ☒ 4166 

--------------------------------------------------
Iteration 6
[1m281/282[0m [32m━━━━━━━━━━━━━━━━━━━[0m[37m━[0m [1m0s[0m 117ms/step - accuracy: 0.5529 - loss: 1.2123
Epoch 1: val_loss improved from 1.24331 to 1.16945, saving model to best_model.h5




[1m282/282[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m34s[0m 122ms/step - accuracy: 0.5530 - loss: 1.2122 - val_accuracy: 0.5667 - val_loss: 1.1695
Q 4764-6    T 4758  ☒ 4471 
Q 68-6343   T -6275 ☒ -6377
Q 59-696    T -637  ☒ -617 
Q 542+9048  T 9590  ☒ 1035 
Q 3+318     T 321   ☒ 338  
Q 6904+67   T 6971  ☒ 6001 
Q 941+3     T 944   ☒ 941  
Q 7+1769    T 1776  ☒ 1734 
Q 3455-20   T 3435  ☒ 3257 
Q 43+98     T 141   ☒ 133  

--------------------------------------------------
Iteration 7
[1m281/282[0m [32m━━━━━━━━━━━━━━━━━━━[0m[37m━[0m [1m0s[0m 118ms/step - accuracy: 0.5790 - loss: 1.1439
Epoch 1: val_loss improved from 1.16945 to 1.11824, saving model to best_model.h5




[1m282/282[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m34s[0m 122ms/step - accuracy: 0.5790 - loss: 1.1438 - val_accuracy: 0.5836 - val_loss: 1.1182
Q 2082+4125 T 6207  ☒ 4712 
Q 4648-9033 T -4385 ☒ -4548
Q 86-571    T -485  ☒ -597 
Q 6179-8    T 6171  ☒ 6118 
Q 70+964    T 1034  ☒ 1045 
Q 779+174   T 953   ☒ 144  
Q 8783-13   T 8770  ☒ 8766 
Q 4826-64   T 4762  ☒ 4819 
Q 239+15    T 254   ☒ 244  
Q 5473-243  T 5230  ☒ 4810 

--------------------------------------------------
Iteration 8
[1m281/282[0m [32m━━━━━━━━━━━━━━━━━━━[0m[37m━[0m [1m0s[0m 123ms/step - accuracy: 0.5984 - loss: 1.0891
Epoch 1: val_loss improved from 1.11824 to 1.08284, saving model to best_model.h5




[1m282/282[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m36s[0m 128ms/step - accuracy: 0.5984 - loss: 1.0891 - val_accuracy: 0.5959 - val_loss: 1.0828
Q 5+8613    T 8618  ☒ 8616 
Q 62-82     T -20   ☒ -1   
Q 369+874   T 1243  ☒ 1227 
Q 52+697    T 749   ☒ 127  
Q 266-9897  T -9631 ☒ -9561
Q 412+2652  T 3064  ☒ 2661 
Q 94+82     T 176   ☒ 162  
Q 6-990     T -984  ☒ -996 
Q 3003-580  T 2423  ☒ 2111 
Q 5240-5803 T -563  ☒ -344 

--------------------------------------------------
Iteration 9
[1m281/282[0m [32m━━━━━━━━━━━━━━━━━━━[0m[37m━[0m [1m0s[0m 117ms/step - accuracy: 0.6154 - loss: 1.0495
Epoch 1: val_loss improved from 1.08284 to 1.06353, saving model to best_model.h5




[1m282/282[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m34s[0m 121ms/step - accuracy: 0.6154 - loss: 1.0495 - val_accuracy: 0.6015 - val_loss: 1.0635
Q 40+3254   T 3294  ☒ 2267 
Q 345-3     T 342   ☑ 342  
Q 9-3767    T -3758 ☒ -3702
Q 205+9     T 214   ☑ 214  
Q 1848+274  T 2122  ☒ 2511 
Q 41-4      T 37    ☒ 47   
Q 326+32    T 358   ☒ 361  
Q 9-6131    T -6122 ☒ -6177
Q 831+0     T 831   ☑ 831  
Q 5648+1906 T 7554  ☒ 6641 

--------------------------------------------------
Iteration 10
[1m281/282[0m [32m━━━━━━━━━━━━━━━━━━━[0m[37m━[0m [1m0s[0m 125ms/step - accuracy: 0.6285 - loss: 1.0117
Epoch 1: val_loss improved from 1.06353 to 1.01545, saving model to best_model.h5




[1m282/282[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m37s[0m 130ms/step - accuracy: 0.6285 - loss: 1.0117 - val_accuracy: 0.6244 - val_loss: 1.0154
Q 619+7905  T 8524  ☒ 8687 
Q 309+548   T 857   ☒ 863  
Q 5883-7138 T -1255 ☒ -122 
Q 84-3      T 81    ☒ 87   
Q 4241+8    T 4249  ☒ 4226 
Q 950-908   T 42    ☒ -1   
Q 9-345     T -336  ☒ -339 
Q 9+6893    T 6902  ☒ 6996 
Q 69-0      T 69    ☒ 68   
Q 1+764     T 765   ☑ 765  

--------------------------------------------------
Iteration 11
[1m282/282[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 123ms/step - accuracy: 0.6391 - loss: 0.9778
Epoch 1: val_loss improved from 1.01545 to 0.99089, saving model to best_model.h5




[1m282/282[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m36s[0m 128ms/step - accuracy: 0.6391 - loss: 0.9778 - val_accuracy: 0.6296 - val_loss: 0.9909
Q 484-437   T 47    ☒ 300  
Q 149+8     T 157   ☒ 147  
Q 1-814     T -813  ☑ -813 
Q 9+1805    T 1814  ☒ 1818 
Q 159+9042  T 9201  ☒ 9033 
Q 5-6535    T -6530 ☒ -6538
Q 20-618    T -598  ☒ -688 
Q 5001-4390 T 611   ☒ 3192 
Q 2-3367    T -3365 ☒ -3366
Q 42-2      T 40    ☒ 41   

--------------------------------------------------
Iteration 12
[1m282/282[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 130ms/step - accuracy: 0.6505 - loss: 0.9503
Epoch 1: val_loss improved from 0.99089 to 0.96779, saving model to best_model.h5




[1m282/282[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m38s[0m 135ms/step - accuracy: 0.6505 - loss: 0.9503 - val_accuracy: 0.6374 - val_loss: 0.9678
Q 7094+11   T 7105  ☒ 7077 
Q 142-8     T 134   ☒ 137  
Q 25+90     T 115   ☒ 113  
Q 341+70    T 411   ☒ 474  
Q 34+9124   T 9158  ☒ 9131 
Q 520+90    T 610   ☒ 511  
Q 937+4     T 941   ☒ 949  
Q 6128+3    T 6131  ☒ 6171 
Q 67+66     T 133   ☒ 132  
Q 842+779   T 1621  ☒ 1578 

--------------------------------------------------
Iteration 13
[1m281/282[0m [32m━━━━━━━━━━━━━━━━━━━[0m[37m━[0m [1m0s[0m 124ms/step - accuracy: 0.6611 - loss: 0.9209
Epoch 1: val_loss improved from 0.96779 to 0.95022, saving model to best_model.h5




[1m282/282[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m36s[0m 129ms/step - accuracy: 0.6611 - loss: 0.9209 - val_accuracy: 0.6432 - val_loss: 0.9502
Q 9-3884    T -3875 ☑ -3875
Q 33-819    T -786  ☒ -756 
Q 2+2       T 4     ☒ 5    
Q 441-9     T 432   ☒ 435  
Q 839-54    T 785   ☒ 849  
Q 1-2311    T -2310 ☒ -2312
Q 382+4697  T 5079  ☒ 5720 
Q 5+769     T 774   ☒ 782  
Q 8+6769    T 6777  ☒ 6784 
Q 4411+5677 T 10088 ☒ 1077 

--------------------------------------------------
Iteration 14
[1m282/282[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 118ms/step - accuracy: 0.6719 - loss: 0.8941
Epoch 1: val_loss improved from 0.95022 to 0.92329, saving model to best_model.h5




[1m282/282[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m34s[0m 122ms/step - accuracy: 0.6719 - loss: 0.8941 - val_accuracy: 0.6542 - val_loss: 0.9233
Q 4904-1526 T 3378  ☒ 3555 
Q 6766-6918 T -152  ☒ -19  
Q 5+645     T 650   ☒ 649  
Q 8-141     T -133  ☒ -137 
Q 35-7337   T -7302 ☒ -7300
Q 6340+61   T 6401  ☒ 6471 
Q 7+613     T 620   ☒ 617  
Q 46-24     T 22    ☑ 22   
Q 640-47    T 593   ☒ 587  
Q 765-50    T 715   ☑ 715  

--------------------------------------------------
Iteration 15
[1m281/282[0m [32m━━━━━━━━━━━━━━━━━━━[0m[37m━[0m [1m0s[0m 127ms/step - accuracy: 0.6822 - loss: 0.8663
Epoch 1: val_loss improved from 0.92329 to 0.90022, saving model to best_model.h5




[1m282/282[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m37s[0m 131ms/step - accuracy: 0.6821 - loss: 0.8663 - val_accuracy: 0.6610 - val_loss: 0.9002
Q 40-0      T 40    ☒ 30   
Q 3732+851  T 4583  ☒ 4525 
Q 493+14    T 507   ☒ 596  
Q 76+31     T 107   ☑ 107  
Q 3907-1    T 3906  ☒ 3909 
Q 7573+314  T 7887  ☒ 7904 
Q 4-902     T -898  ☒ -905 
Q 0-103     T -103  ☑ -103 
Q 4-78      T -74   ☑ -74  
Q 1610-2    T 1608  ☒ 1602 

--------------------------------------------------
Iteration 16
Epoch 1: val_loss improved from 0.90022 to 0.88645, saving model to best_model.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 36s 127ms/step - accuracy: 0.6909 - loss: 0.8441 - val_accuracy: 0.6691 - val_loss: 0.8864
Q 71-1621   T -1550 ☒ -1593
Q 3397-38   T 3359  ☒ 3305 
Q 22-494    T -472  ☒ -466 
Q 0-559     T -559  ☑ -559 
Q 9-814     T -805  ☒ -806 
Q 338+2991  T 3329  ☒ 4099 
Q 5+206     T 211   ☒ 208  
Q 10+26     T 36    ☒ 37   
Q 750-9     T 741   ☒ 750  
Q 42+9628   T 9670  ☒ 9686 

--------------------------------------------------
Iteration 17
Epoch 1: val_loss improved from 0.88645 to 0.87422, saving model to best_model.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 34s 121ms/step - accuracy: 0.6994 - loss: 0.8227 - val_accuracy: 0.6703 - val_loss: 0.8742
Q 1099+9086 T 10185 ☒ 10889
Q 5181-4    T 5177  ☒ 5183 
Q 2618-19   T 2599  ☒ 2607 
Q 71+8401   T 8472  ☒ 8483 
Q 960+2860  T 3820  ☒ 3884 
Q 0-768     T -768  ☑ -768 
Q 6-966     T -960  ☒ -961 
Q 90+329    T 419   ☒ 311  
Q 662-2     T 660   ☒ 654  
Q 3048+267  T 3315  ☒ 3511 

--------------------------------------------------
Iteration 18
Epoch 1: val_loss improved from 0.87422 to 0.85990, saving model to best_model.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 35s 124ms/step - accuracy: 0.7077 - loss: 0.8014 - val_accuracy: 0.6775 - val_loss: 0.8599
Q 137+8     T 145   ☑ 145  
Q 132+4     T 136   ☑ 136  
Q 309+548   T 857   ☒ 866  
Q 204+7747  T 7951  ☒ 7877 
Q 8149-977  T 7172  ☒ 7726 
Q 1-2       T -1    ☑ -1   
Q 19-898    T -879  ☒ -870 
Q 7541+88   T 7629  ☒ 7522 
Q 28-392    T -364  ☑ -364 
Q 9-6686    T -6677 ☑ -6677

--------------------------------------------------
Iteration 19
Epoch 1: val_loss improved from 0.85990 to 0.84753, saving model to best_model.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 35s 124ms/step - accuracy: 0.7168 - loss: 0.7790 - val_accuracy: 0.6824 - val_loss: 0.8475
Q 7273-51   T 7222  ☒ 7262 
Q 2242-839  T 1403  ☒ 1511 
Q 17-2      T 15    ☑ 15   
Q 21+953    T 974   ☒ 968  
Q 633-17    T 616   ☒ 624  
Q 5797-3    T 5794  ☑ 5794 
Q 4510-218  T 4292  ☒ 4188 
Q 8-6854    T -6846 ☒ -6854
Q 9+27      T 36    ☒ 94   
Q 998-29    T 969   ☒ 979  

--------------------------------------------------
Iteration 20
Epoch 1: val_loss improved from 0.84753 to 0.83087, saving model to best_model.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 35s 125ms/step - accuracy: 0.7247 - loss: 0.7556 - val_accuracy: 0.6891 - val_loss: 0.8309
Q 95-52     T 43    ☒ 47   
Q 20+69     T 89    ☒ 80   
Q 559+7     T 566   ☒ 578  
Q 94-685    T -591  ☒ -573 
Q 5290-2    T 5288  ☒ 5291 
Q 152+417   T 569   ☒ 563  
Q 88-75     T 13    ☑ 13   
Q 9181+34   T 9215  ☒ 9237 
Q 182-7171  T -6989 ☒ -6003
Q 0+784     T 784   ☑ 784  

--------------------------------------------------
Iteration 21
Epoch 1: val_loss improved from 0.83087 to 0.80168, saving model to best_model.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 35s 125ms/step - accuracy: 0.7350 - loss: 0.7333 - val_accuracy: 0.7027 - val_loss: 0.8017
Q 1528-759  T 769   ☒ 707  
Q 123-4     T 119   ☒ 118  
Q 9-184     T -175  ☒ -176 
Q 9918+996  T 10914 ☒ 10087
Q 25-4      T 21    ☑ 21   
Q 169+969   T 1138  ☒ 1056 
Q 5+9003    T 9008  ☒ 9005 
Q 43-54     T -11   ☒ -1   
Q 534+299   T 833   ☒ 845  
Q 80+37     T 117   ☒ 111  

--------------------------------------------------
Iteration 22
Epoch 1: val_loss improved from 0.80168 to 0.79193, saving model to best_model.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 37s 133ms/step - accuracy: 0.7491 - loss: 0.7042 - val_accuracy: 0.7063 - val_loss: 0.7919
Q 5565-1339 T 4226  ☒ 4057 
Q 309+7     T 316   ☑ 316  
Q 9624+22   T 9646  ☒ 9644 
Q 3-75      T -72   ☑ -72  
Q 80-77     T 3     ☒ -    
Q 11-4883   T -4872 ☒ -4870
Q 3774-569  T 3205  ☒ 3113 
Q 840+0     T 840   ☑ 840  
Q 4421-6    T 4415  ☒ 4417 
Q 1719-33   T 1686  ☒ 1606 

--------------------------------------------------
Iteration 23
Epoch 1: val_loss improved from 0.79193 to 0.76387, saving model to best_model.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 34s 119ms/step - accuracy: 0.7584 - loss: 0.6831 - val_accuracy: 0.7195 - val_loss: 0.7639
Q 651+99    T 750   ☒ 754  
Q 8574-71   T 8503  ☒ 8577 
Q 90-8599   T -8509 ☑ -8509
Q 4387+0    T 4387  ☑ 4387 
Q 283+88    T 371   ☒ 366  
Q 5+345     T 350   ☑ 350  
Q 2553+9744 T 12297 ☒ 11099
Q 182+2886  T 3068  ☒ 3060 
Q 8889-5460 T 3429  ☒ 3755 
Q 5-730     T -725  ☑ -725 

--------------------------------------------------
Iteration 24
Epoch 1: val_loss improved from 0.76387 to 0.73055, saving model to best_model.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 35s 125ms/step - accuracy: 0.7731 - loss: 0.6500 - val_accuracy: 0.7351 - val_loss: 0.7305
Q 7659+8695 T 16354 ☒ 15334
Q 11+608    T 619   ☑ 619  
Q 1+84      T 85    ☑ 85   
Q 5004-98   T 4906  ☒ 4944 
Q 7+787     T 794   ☒ 784  
Q 9+123     T 132   ☑ 132  
Q 10-1425   T -1415 ☒ -1347
Q 20+0      T 20    ☑ 20   
Q 809-122   T 687   ☒ 773  
Q 7+718     T 725   ☑ 725  

--------------------------------------------------
Iteration 25
Epoch 1: val_loss improved from 0.73055 to 0.71026, saving model to best_model.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 35s 122ms/step - accuracy: 0.7875 - loss: 0.6159 - val_accuracy: 0.7441 - val_loss: 0.7103
Q 5059-6    T 5053  ☒ 5045 
Q 931+444   T 1375  ☒ 1367 
Q 93-9260   T -9167 ☒ -9139
Q 2-547     T -545  ☑ -545 
Q 0-845     T -845  ☑ -845 
Q 948-5600  T -4652 ☒ -4506
Q 342+7     T 349   ☑ 349  
Q 3449-102  T 3347  ☒ 3312 
Q 363-1     T 362   ☑ 362  
Q 223+8697  T 8920  ☒ 8910 

--------------------------------------------------
Iteration 26
Epoch 1: val_loss improved from 0.71026 to 0.68119, saving model to best_model.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 34s 120ms/step - accuracy: 0.7977 - loss: 0.5878 - val_accuracy: 0.7520 - val_loss: 0.6812
Q 5039+20   T 5059  ☒ 5111 
Q 75+6982   T 7057  ☒ 7055 
Q 17+63     T 80    ☒ 84   
Q 1189-8516 T -7327 ☒ -7333
Q 15+5094   T 5109  ☒ 5099 
Q 2+2       T 4     ☑ 4    
Q 8+993     T 1001  ☑ 1001 
Q 9+7728    T 7737  ☒ 7739 
Q 62+261    T 323   ☒ 393  
Q 9+1318    T 1327  ☒ 1339 

--------------------------------------------------
Iteration 27
Epoch 1: val_loss improved from 0.68119 to 0.65242, saving model to best_model.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 37s 131ms/step - accuracy: 0.8091 - loss: 0.5527 - val_accuracy: 0.7631 - val_loss: 0.6524
Q 5+769     T 774   ☒ 784  
Q 976+992   T 1968  ☒ 1074 
Q 850-318   T 532   ☒ 568  
Q 9424-552  T 8872  ☒ 8988 
Q 83+96     T 179   ☑ 179  
Q 0+73      T 73    ☑ 73   
Q 58-6561   T -6503 ☒ -6593
Q 37+77     T 114   ☑ 114  
Q 384+583   T 967   ☒ 923  
Q 13-6083   T -6070 ☒ -6074

--------------------------------------------------
Iteration 28
Epoch 1: val_loss improved from 0.65242 to 0.64572, saving model to best_model.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 36s 129ms/step - accuracy: 0.8183 - loss: 0.5259 - val_accuracy: 0.7676 - val_loss: 0.6457
Q 93-72     T 21    ☒ 11   
Q 58-71     T -13   ☑ -13  
Q 81-7424   T -7343 ☒ -7353
Q 60-6236   T -6176 ☒ -6274
Q 1992-33   T 1959  ☒ 1975 
Q 55-89     T -34   ☒ -44  
Q 7072+1    T 7073  ☑ 7073 
Q 5285-1    T 5284  ☑ 5284 
Q 1-91      T -90   ☒ -80  
Q 118+0     T 118   ☒ 128  

--------------------------------------------------
Iteration 29
Epoch 1: val_loss improved from 0.64572 to 0.61793, saving model to best_model.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 34s 121ms/step - accuracy: 0.8273 - loss: 0.5011 - val_accuracy: 0.7731 - val_loss: 0.6179
Q 4037+48   T 4085  ☒ 4171 
Q 5099+8955 T 14054 ☒ 14444
Q 1-814     T -813  ☑ -813 
Q 82+757    T 839   ☒ 831  
Q 9397-31   T 9366  ☑ 9366 
Q 8906-99   T 8807  ☒ 8819 
Q 147-9     T 138   ☑ 138  
Q 34+24     T 58    ☒ 68   
Q 2030+72   T 2102  ☒ 2103 
Q 80+2199   T 2279  ☒ 2271 

--------------------------------------------------
Iteration 30
Epoch 1: val_loss improved from 0.61793 to 0.59391, saving model to best_model.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 36s 129ms/step - accuracy: 0.8317 - loss: 0.4839 - val_accuracy: 0.7825 - val_loss: 0.5939
Q 62+1813   T 1875  ☒ 1885 
Q 8266-1993 T 6273  ☒ 6755 
Q 320-9876  T -9556 ☒ -9566
Q 376+1     T 377   ☑ 377  
Q 8-61      T -53   ☑ -53  
Q 6521+74   T 6595  ☒ 6587 
Q 361+5     T 366   ☑ 366  
Q 354+109   T 463   ☒ 545  
Q 891-9071  T -8180 ☒ -8222
Q 4-231     T -227  ☒ -217 

--------------------------------------------------
Iteration 31
Epoch 1: val_loss improved from 0.59391 to 0.58823, saving model to best_model.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 34s 121ms/step - accuracy: 0.8429 - loss: 0.4546 - val_accuracy: 0.7869 - val_loss: 0.5882
Q 709-10    T 699   ☑ 699  
Q 463-461   T 2     ☒ -3   
Q 627-34    T 593   ☒ 619  
Q 919-9970  T -9051 ☒ -8077
Q 333+5579  T 5912  ☒ 5828 
Q 4689+55   T 4744  ☑ 4744 
Q 903-4247  T -3344 ☒ -3434
Q 98-69     T 29    ☒ 19   
Q 3275+4    T 3279  ☑ 3279 
Q 27+434    T 461   ☑ 461  

--------------------------------------------------
Iteration 32
Epoch 1: val_loss improved from 0.58823 to 0.57863, saving model to best_model.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 36s 128ms/step - accuracy: 0.8524 - loss: 0.4322 - val_accuracy: 0.7901 - val_loss: 0.5786
Q 933-551   T 382   ☑ 382  
Q 481-595   T -114  ☒ -16  
Q 499-5622  T -5123 ☒ -5153
Q 0+3628    T 3628  ☑ 3628 
Q 8182-172  T 8010  ☒ 7110 
Q 9958+864  T 10822 ☒ 10642
Q 3971-79   T 3892  ☒ 3762 
Q 5004-98   T 4906  ☒ 4946 
Q 5089-254  T 4835  ☒ 5745 
Q 138-971   T -833  ☒ -745 

--------------------------------------------------
Iteration 33
Epoch 1: val_loss improved from 0.57863 to 0.56241, saving model to best_model.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 34s 119ms/step - accuracy: 0.8576 - loss: 0.4192 - val_accuracy: 0.7996 - val_loss: 0.5624
Q 6-771     T -765  ☑ -765 
Q 6118+5    T 6123  ☑ 6123 
Q 263+7     T 270   ☑ 270  
Q 260-5     T 255   ☑ 255  
Q 0-73      T -73   ☑ -73  
Q 7322-32   T 7290  ☑ 7290 
Q 461+8153  T 8614  ☒ 8666 
Q 99-8      T 91    ☑ 91   
Q 914-5     T 909   ☑ 909  
Q 7222+117  T 7339  ☒ 7345 

--------------------------------------------------
Iteration 34
Epoch 1: val_loss improved from 0.56241 to 0.54489, saving model to best_model.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 34s 121ms/step - accuracy: 0.8637 - loss: 0.3991 - val_accuracy: 0.8019 - val_loss: 0.5449
Q 7503-70   T 7433  ☒ 7533 
Q 8717+3    T 8720  ☒ 8710 
Q 5526-222  T 5304  ☒ 5244 
Q 1059-1    T 1058  ☑ 1058 
Q 910-4410  T -3500 ☒ -3378
Q 45-5      T 40    ☑ 40   
Q 914-5     T 909   ☑ 909  
Q 17+537    T 554   ☑ 554  
Q 27+26     T 53    ☑ 53   
Q 1+1288    T 1289  ☑ 1289 

--------------------------------------------------
Iteration 35
Epoch 1: val_loss improved from 0.54489 to 0.53366, saving model to best_model.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 36s 129ms/step - accuracy: 0.8707 - loss: 0.3813 - val_accuracy: 0.8081 - val_loss: 0.5337
Q 6127+6    T 6133  ☒ 6131 
Q 476-691   T -215  ☒ -235 
Q 6985+621  T 7606  ☒ 7486 
Q 44+85     T 129   ☒ 139  
Q 88-8878   T -8790 ☑ -8790
Q 5797-8470 T -2673 ☒ -3999
Q 3+3759    T 3762  ☑ 3762 
Q 283+88    T 371   ☑ 371  
Q 414-45    T 369   ☒ 399  
Q 32-0      T 32    ☑ 32   

--------------------------------------------------
Iteration 36
Epoch 1: val_loss did not improve from 0.53366
282/282 ━━━━━━━━━━━━━━━━━━━━ 36s 127ms/step - accuracy: 0.8768 - loss: 0.3631 - val_accuracy: 0.8083 - val_loss: 0.5364
Q 984+4541  T 5525  ☒ 5445 
Q 0-559     T -559  ☑ -559 
Q 2-720     T -718  ☑ -718 
Q 9171-67   T 9104  ☒ 9134 

282/282 ━━━━━━━━━━━━━━━━━━━━ 38s 134ms/step - accuracy: 0.8796 - loss: 0.3573 - val_accuracy: 0.8152 - val_loss: 0.5178
Q 574-587   T -13   ☑ -13  
Q 9958+864  T 10822 ☒ 10780
Q 523+8184  T 8707  ☒ 8629 
Q 6207-92   T 6115  ☒ 6185 
Q 13+567    T 580   ☑ 580  
Q 41+1063   T 1104  ☒ 1054 
Q 2992-12   T 2980  ☒ 2900 
Q 10+26     T 36    ☑ 36   
Q 274-2     T 272   ☑ 272  
Q 194+2097  T 2291  ☒ 2171 

--------------------------------------------------
Iteration 38
Epoch 1: val_loss improved from 0.51782 to 0.51635, saving model to best_model.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 34s 122ms/step - accuracy: 0.8873 - loss: 0.3381 - val_accuracy: 0.8163 - val_loss: 0.5163
Q 29-797    T -768  ☒ -758 
Q 23-683    T -660  ☒ -640 
Q 4+6127    T 6131  ☒ 6121 
Q 6127+6    T 6133  ☑ 6133 
Q 5-3835    T -3830 ☒ -3828
Q 306+7     T 313   ☑ 313  
Q 27+97     T 124   ☒ 114  
Q 1594+7    T 1601  ☒ 1691 
Q 2450+2    T 2452  ☒ 2448 
Q 5855+8330 T 14185 ☒ 13945

--------------------------------------------------
Iteration 39
Epoch 1: val_loss improved from 0.51635 to 0.50216, saving model to best_model.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 35s 123ms/step - accuracy: 0.8896 - loss: 0.3298 - val_accuracy: 0.8209 - val_loss: 0.5022
Q 42-861    T -819  ☑ -819 
Q 50-58     T -8    ☒ -    
Q 40-0      T 40    ☑ 40   
Q 128-671   T -543  ☒ -621 
Q 4+86      T 90    ☑ 90   
Q 3-615     T -612  ☑ -612 
Q 1+167     T 168   ☑ 168  
Q 24+3602   T 3626  ☑ 3626 
Q 0-967     T -967  ☑ -967 
Q 87-3934   T -3847 ☒ -3837

--------------------------------------------------
Iteration 40
Epoch 1: val_loss did not improve from 0.50216
282/282 ━━━━━━━━━━━━━━━━━━━━ 34s 120ms/step - accuracy: 0.8960 - loss: 0.3122 - val_accuracy: 0.8215 - val_loss: 0.5082
Q 7-507     T -500  ☒ -490 
Q 9+27      T 36    ☑ 36   
Q 9507-5520 T 3987  ☒ 3167 
Q 1-25      T -24   ☑ -24  

282/282 ━━━━━━━━━━━━━━━━━━━━ 34s 121ms/step - accuracy: 0.9012 - loss: 0.2987 - val_accuracy: 0.8224 - val_loss: 0.4974
Q 920-1     T 919   ☑ 919  
Q 59-3      T 56    ☑ 56   
Q 504-147   T 357   ☒ 337  
Q 25+732    T 757   ☑ 757  
Q 625-7     T 618   ☑ 618  
Q 85+3      T 88    ☑ 88   
Q 936+4453  T 5389  ☒ 5381 
Q 9669+461  T 10130 ☒ 10258
Q 43-387    T -344  ☑ -344 
Q 903-4247  T -3344 ☒ -3332

--------------------------------------------------
Iteration 42
Epoch 1: val_loss improved from 0.49742 to 0.49363, saving model to best_model.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 34s 121ms/step - accuracy: 0.9025 - loss: 0.2941 - val_accuracy: 0.8251 - val_loss: 0.4936
Q 55+9      T 64    ☑ 64   
Q 2+536     T 538   ☑ 538  
Q 873+8     T 881   ☒ 891  
Q 978+25    T 1003  ☒ 103  
Q 0-4       T -4    ☑ -4   
Q 2675+143  T 2818  ☒ 2778 
Q 7047+9364 T 16411 ☒ 16801
Q 860+25    T 885   ☑ 885  
Q 4-7051    T -7047 ☑ -7047
Q 1171-8122 T -6951 ☒ -7039

--------------------------------------------------
Iteration 43
Epoch 1: val_loss did not improve from 0.49363
282/282 ━━━━━━━━━━━━━━━━━━━━ 34s 119ms/step - accuracy: 0.9085 - loss: 0.2801 - val_accuracy: 0.8257 - val_loss: 0.5036
Q 484+675   T 1159  ☒ 1129 
Q 42+9628   T 9670  ☒ 9680 
Q 7-1073    T -1066 ☑ -1066
Q 71-278    T -207  ☑ -207 

282/282 ━━━━━━━━━━━━━━━━━━━━ 38s 133ms/step - accuracy: 0.9187 - loss: 0.2544 - val_accuracy: 0.8298 - val_loss: 0.4892
Q 8017+36   T 8053  ☒ 8001 
Q 8-1896    T -1888 ☑ -1888
Q 3879+3    T 3882  ☒ 3890 
Q 373+42    T 415   ☒ 405  
Q 39-1412   T -1373 ☒ -1473
Q 845+4175  T 5020  ☒ 4900 
Q 1848+274  T 2122  ☒ 2012 
Q 127-8831  T -8704 ☒ -8604
Q 928-6     T 922   ☑ 922  
Q 1702+843  T 2545  ☑ 2545 

--------------------------------------------------
Iteration 46
Epoch 1: val_loss improved from 0.48923 to 0.48739, saving model to best_model.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 35s 124ms/step - accuracy: 0.9199 - loss: 0.2479 - val_accuracy: 0.8322 - val_loss: 0.4874
Q 6+656     T 662   ☑ 662  
Q 93-87     T 6     ☑ 6    
Q 8+451     T 459   ☑ 459  
Q 3430+4190 T 7620  ☒ 8960 
Q 9081-5    T 9076  ☒ 9086 
Q 780-632   T 148   ☒ 142  
Q 2-45      T -43   ☑ -43  
Q 5136+12   T 5148  ☑ 5148 
Q 82-7575   T -7493 ☒ -7463
Q 22+135    T 157   ☑ 157  

--------------------------------------------------
Iteration 47
Epoch 1: val_loss improved from 0.48739 to 0.48689, saving model to best_model.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 36s 128ms/step - accuracy: 0.9241 - loss: 0.2394 - val_accuracy: 0.8309 - val_loss: 0.4869
Q 1-814     T -813  ☑ -813 
Q 27+26     T 53    ☒ 43   
Q 75+6982   T 7057  ☒ 7049 
Q 4+701     T 705   ☑ 705  
Q 221-35    T 186   ☑ 186  
Q 50-7117   T -7067 ☒ -7077
Q 51+7      T 58    ☑ 58   
Q 4799+5    T 4804  ☒ 4794 
Q 333+5579  T 5912  ☒ 5902 
Q 6-9977    T -9971 ☒ -9981

--------------------------------------------------
Iteration 48
Epoch 1: val_loss improved from 0.48689 to 0.48322, saving model to best_model.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 35s 122ms/step - accuracy: 0.9275 - loss: 0.2291 - val_accuracy: 0.8361 - val_loss: 0.4832
Q 3791-6804 T -3013 ☒ -4063
Q 53+4      T 57    ☑ 57   
Q 42+4509   T 4551  ☒ 4531 
Q 42-2      T 40    ☑ 40   
Q 978-91    T 887   ☑ 887  
Q 7390-258  T 7132  ☑ 7132 
Q 36+60     T 96    ☑ 96   
Q 33+2      T 35    ☑ 35   
Q 807-569   T 238   ☑ 238  
Q 6+957     T 963   ☑ 963  

--------------------------------------------------
Iteration 49
Epoch 1: val_loss did not improve from 0.48322
282/282 ━━━━━━━━━━━━━━━━━━━━ 36s 128ms/step - accuracy: 0.9294 - loss: 0.2228 - val_accuracy: 0.8334 - val_loss: 0.4854
Q 159+9042  T 9201  ☒ 9631 
Q 9474+2468 T 11942 ☒ 12622
Q 897-726   T 171   ☒ 211  
Q 539+26    T 565   ☒ 573  


In the initial model (without attention), training accuracy reached almost 80% and validation accuracy nearly 73%. The model struggled with large numbers and performed inconsistently even on basic arithmetic problems.

After adding the attention mechanism, training accuracy rose to about 93% and validation accuracy to about 83%, while the validation loss fell to roughly 0.48. This is a clear improvement in both accuracy and loss. A likely explanation is that attention lets the model weight the most relevant parts of the input sequence, which helps it handle more complex arithmetic operations and generalize better.
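
To make "focusing on the most relevant parts of the input" concrete, here is a minimal NumPy sketch of dot-product attention, which is what the Keras `Attention` layer computes by default when no separate key is passed. The function name `dot_product_attention`, the toy shapes, and the random `encoder_states` are illustrative assumptions, not values taken from the trained model.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dot_product_attention(query, value):
    """Unscaled dot-product attention with key = value.

    query: (T_q, d), value: (T_v, d)
    Returns the context vectors (T_q, d) and the attention weights (T_q, T_v).
    """
    scores = query @ value.T            # similarity of every query step to every value step
    weights = softmax(scores, axis=-1)  # each row is a distribution over input positions
    context = weights @ value           # weighted average of the value vectors
    return context, weights

# Hypothetical encoder output: 9 time steps, 4 features per step
rng = np.random.default_rng(0)
encoder_states = rng.normal(size=(9, 4))

# Self-attention: the same sequence acts as query and value
context, weights = dot_product_attention(encoder_states, encoder_states)
print(context.shape, weights.shape)
```

Each row of `weights` sums to 1, so every output step is a weighted average of the input steps; large weights mark the positions the model "attends" to.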

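Overall, the attention-based model outperformed the baseline on both the training and validation sets. Two caveats apply. First, the reported accuracy is computed per output character, so an answer with a single wrong digit still scores highly; the sampled predictions show that many complete answers are still wrong. Second, the gap between training and validation accuracy (about 93% vs. 83%) suggests some overfitting.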
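
The accuracy figures compared above come from the Keras `accuracy` metric, which for a per-time-step softmax output is averaged over characters rather than over whole answers. The sketch below contrasts that per-character accuracy with an exact-match accuracy on a toy example. The helper names `per_char_accuracy` and `exact_match_accuracy` and the toy arrays are assumptions made for illustration; they are not part of the notebook.

```python
import numpy as np

def per_char_accuracy(y_true, y_pred):
    # Fraction of individual characters predicted correctly
    true_idx = y_true.argmax(axis=-1)
    pred_idx = y_pred.argmax(axis=-1)
    return float((true_idx == pred_idx).mean())

def exact_match_accuracy(y_true, y_pred):
    # Fraction of sequences in which every character is correct
    true_idx = y_true.argmax(axis=-1)
    pred_idx = y_pred.argmax(axis=-1)
    return float((true_idx == pred_idx).all(axis=-1).mean())

# Toy data: 2 answers, 3 characters each, vocabulary of 4 symbols (one-hot encoded)
true_idx = np.array([[0, 1, 2], [3, 3, 1]])
pred_idx = np.array([[0, 1, 2], [3, 2, 1]])  # second answer has one wrong character
y_true = np.eye(4)[true_idx]
y_pred = np.eye(4)[pred_idx]

char_acc = per_char_accuracy(y_true, y_pred)
exact_acc = exact_match_accuracy(y_true, y_pred)
print(round(char_acc, 3), exact_acc)  # 0.833 0.5
```

With arrays shaped like `y_val` and the output of `model.predict(x_val)`, the same two functions would show how far the per-character score is from the share of fully correct answers.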

## 1.4

In [None]:
# Defining the Model 3 architecture
def build_model_3(input_shape, output_shape, n_chars):
    inputs = Input(shape=input_shape)

    # Bidirectional LSTM - Capture both forward and backward context
    lstm_1 = Bidirectional(LSTM(config_dict['hidden_size'], return_sequences=True))(inputs)

    # Dropout layer to prevent overfitting
    lstm_1 = Dropout(0.2)(lstm_1)

    # Dot-product self-attention over the encoder states (query = value = lstm_1)
    attention = Attention()([lstm_1, lstm_1])

    # Flatten the attended sequence into a single vector, then repeat it once per output time step
    attention_flattened = Flatten()(attention)
    repeat_vector = RepeatVector(output_shape[0])(attention_flattened)

    # Decoder: second bidirectional LSTM over the repeated context vector
    lstm_2 = Bidirectional(LSTM(config_dict['hidden_size'], return_sequences=True))(repeat_vector)

    # Dropout layer for regularization
    lstm_2 = Dropout(0.2)(lstm_2)

    # TimeDistributed Dense layer to generate the final predictions
    outputs = TimeDistributed(Dense(n_chars, activation='softmax'))(lstm_2)

    model_3 = Model(inputs=inputs, outputs=outputs)
    model_3.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

    return model_3

# Input shape: (maxlen, len(chars)), Output shape: (digits + 1, len(chars))
input_shape = (x_train.shape[1], x_train.shape[2])
output_shape = (y_train.shape[1], y_train.shape[2])

# Build and summarize the model
model_3 = build_model_3(input_shape, output_shape, len(config_dict['chars']))
model_3.summary()

# Training the model and saving the best model
checkpoint = ModelCheckpoint('best_model_3.h5',
                             monitor='val_loss',
                             save_best_only=True,
                             mode='min',
                             verbose=1)

for iteration in range(config_dict['iterations']):
    print()
    print('-' * 50)
    print('Iteration', iteration)

    model_3.fit(x_train, y_train,
                batch_size=config_dict['batch_size'],
                epochs=1,
                validation_data=(x_val, y_val),
                callbacks=[checkpoint])

    # Select 10 samples from the validation set at random for error visualization
    for i in range(10):
        ind = np.random.randint(0, len(x_val))
        rowx, rowy = x_val[np.array([ind])], y_val[np.array([ind])]

        preds = model_3.predict(rowx, verbose=0)
        preds = np.argmax(preds, axis=-1)
        q = ctable.decode(rowx[0])
        correct = ctable.decode(rowy[0])
        guess = ctable.decode(preds[0], calc_argmax=False)

        print('Q', q, end=' ')
        print('T', correct, end=' ')
        if correct == guess:
            print('☑', end=' ')
        else:
            print('☒', end=' ')
        print(guess)


--------------------------------------------------
Iteration 0
Epoch 1: val_loss improved from inf to 1.51187, saving model to best_model_3.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 9s 18ms/step - accuracy: 0.3736 - loss: 1.8421 - val_accuracy: 0.4499 - val_loss: 1.5119
Q 6+390     T 396   ☒ 660  
Q 398-6     T 392   ☒ 333  
Q 60-85     T -25   ☒ -5   
Q 5469-2071 T 3398  ☒ -455 
Q 5403-4    T 5399  ☒ 445  
Q 3879-374  T 3505  ☒ 337  
Q 317-3     T 314   ☒ 333  
Q 7-7793    T -7786 ☒ -7777
Q 317+9510  T 9827  ☒ 1133 
Q 1179+70   T 1249  ☒ 1113 

--------------------------------------------------
Iteration 1
Epoch 1: val_loss improved from 1.51187 to 1.37645, saving model to best_model_3.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 5s 17ms/step - accuracy: 0.4557 - loss: 1.5066 - val_accuracy: 0.4894 - val_loss: 1.3764
Q 258-705   T -447  ☒ -22  
Q 6232+3293 T 9525  ☒ 1100 
Q 71+40     T 111   ☒ 11   
Q 372+4706  T 5078  ☒ 1100 
Q 9064-40   T 9024  ☒ 9905 
Q 429+616   T 1045  ☒ 112  
Q 201+45    T 246   ☒ 222  
Q 2+1396    T 1398  ☒ 2222 
Q 737+91    T 828   ☒ 110  
Q 8279+95   T 8374  ☒ 8804 

--------------------------------------------------
Iteration 2
Epoch 1: val_loss improved from 1.37645 to 1.25505, saving model to best_model_3.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 4s 14ms/step - accuracy: 0.4974 - loss: 1.3667 - val_accuracy: 0.5318 - val_loss: 1.2550
Q 9-843     T -834  ☒ -843 
Q 6-711     T -705  ☒ -711 
Q 384-4129  T -3745 ☒ -337 
Q 47-2019   T -1972 ☒ -2109
Q 805+6320  T 7125  ☒ 8622 
Q 779+816   T 1595  ☒ 1154 
Q 3183-755  T 2428  ☒ 3209 
Q 899-484   T 415   ☒ 33   
Q 38-8288   T -8250 ☒ -8801
Q 216+3     T 219   ☒ 223  

--------------------------------------------------
Iteration 3
Epoch 1: val_loss improved from 1.25505 to 1.13612, saving model to best_model_3.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 4s 15ms/step - accuracy: 0.5338 - loss: 1.2529 - val_accuracy: 0.5769 - val_loss: 1.1361
Q 3921+6565 T 10486 ☒ 1100 
Q 8220+33   T 8253  ☒ 8211 
Q 62+15     T 77    ☑ 77   
Q 3447-8908 T -5461 ☒ -3322
Q 0+114     T 114   ☒ 112  
Q 5512+76   T 5588  ☒ 5563 
Q 4-9846    T -9842 ☒ -9955
Q 522-24    T 498   ☒ 511  
Q 1661+8708 T 10369 ☒ 1222 
Q 4019-518  T 3501  ☒ 3393 

--------------------------------------------------
Iteration 4
Epoch 1: val_loss improved from 1.13612 to 1.04549, saving model to best_model_3.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 4s 15ms/step - accuracy: 0.5777 - loss: 1.1435 - val_accuracy: 0.6138 - val_loss: 1.0455
Q 86-19     T 67    ☒ 52   
Q 583+8     T 591   ☒ 585  
Q 2-5303    T -5301 ☒ -5308
Q 24+9229   T 9253  ☒ 9233 
Q 5+627     T 632   ☒ 623  
Q 821-6100  T -5279 ☒ -5597
Q 8+675     T 683   ☒ 678  
Q 587+5     T 592   ☒ 582  
Q 28-482    T -454  ☒ -470 
Q 115+2     T 117   ☒ 116  

--------------------------------------------------
Iteration 5
Epoch 1: val_loss improved from 1.04549 to 0.93968, saving model to best_model_3.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 4s 15ms/step - accuracy: 0.6130 - loss: 1.0466 - val_accuracy: 0.6482 - val_loss: 0.9397
Q 4257-5883 T -1626 ☒ -333 
Q 670+4     T 674   ☒ 666  
Q 289-9981  T -9692 ☒ -9000
Q 10-713    T -703  ☒ -702 
Q 9-5244    T -5235 ☒ -5244
Q 1556+9726 T 11282 ☒ 10000
Q 9095-3    T 9092  ☒ 9098 
Q 7162+176  T 7338  ☒ 7244 
Q 1218+32   T 1250  ☒ 1345 
Q 105+223   T 328   ☒ 266  

--------------------------------------------------
Iteration 6
Epoch 1: val_loss improved from 0.93968 to 0.85078, saving model to best_model_3.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 4s 16ms/step - accuracy: 0.6454 - loss: 0.9570 - val_accuracy: 0.6844 - val_loss: 0.8508
Q 853+4355  T 5208  ☒ 4998 
Q 3-30      T -27   ☒ -22  
Q 0-180     T -180  ☑ -180 
Q 8-844     T -836  ☒ -837 
Q 28-385    T -357  ☒ -363 
Q 88+4      T 92    ☑ 92   
Q 4-802     T -798  ☒ -797 
Q 3283-22   T 3261  ☒ 3255 
Q 5463-8    T 5455  ☒ 5555 
Q 490+629   T 1119  ☒ 1111 

--------------------------------------------------
Iteration 7
Epoch 1: val_loss improved from 0.85078 to 0.78266, saving model to best_model_3.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 4s 14ms/step - accuracy: 0.6747 - loss: 0.8767 - val_accuracy: 0.7101 - val_loss: 0.7827
Q 799-9396  T -8597 ☒ -8887
Q 7450-44   T 7406  ☒ 7444 
Q 51+291    T 342   ☒ 340  
Q 3971-4120 T -149  ☒ 220  
Q 812-475   T 337   ☒ 326  
Q 8+75      T 83    ☑ 83   
Q 5-605     T -600  ☑ -600 
Q 1642-9908 T -8266 ☒ -8336
Q 9498-5301 T 4197  ☒ 449  
Q 453-2     T 451   ☑ 451  

--------------------------------------------------
Iteration 8
Epoch 1: val_loss improved from 0.78266 to 0.69932, saving model to best_model_3.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 4s 16ms/step - accuracy: 0.7045 - loss: 0.7924 - val_accuracy: 0.7418 - val_loss: 0.6993
Q 4260+699  T 4959  ☒ 4700 
Q 4-8540    T -8536 ☒ -8545
Q 621-3814  T -3193 ☒ -3275
Q 1+988     T 989   ☒ 990  
Q 98-758    T -660  ☒ -677 
Q 711+9     T 720   ☒ 710  
Q 85+879    T 964   ☒ 965  
Q 152+402   T 554   ☒ 541  
Q 8278-21   T 8257  ☒ 8262 
Q 9122+4    T 9126  ☑ 9126 

--------------------------------------------------
Iteration 9
Epoch 1: val_loss improved from 0.69932 to 0.61441, saving model to best_model_3.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 4s 14ms/step - accuracy: 0.7331 - loss: 0.7162 - val_accuracy: 0.7775 - val_loss: 0.6144
Q 8-487     T -479  ☒ -489 
Q 967-1476  T -509  ☒ -77  
Q 75+22     T 97    ☒ 18   
Q 398-6     T 392   ☑ 392  
Q 641-4     T 637   ☑ 637  
Q 53-340    T -287  ☑ -287 
Q 2-319     T -317  ☑ -317 
Q 659-49    T 610   ☒ 600  
Q 1+5070    T 5071  ☒ 5070 
Q 17+2472   T 2489  ☑ 2489 

--------------------------------------------------
Iteration 10
Epoch 1: val_loss improved from 0.61441 to 0.54373, saving model to best_model_3.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 4s 15ms/step - accuracy: 0.7658 - loss: 0.6392 - val_accuracy: 0.8095 - val_loss: 0.5437
Q 5+588     T 593   ☑ 593  
Q 2766+1    T 2767  ☑ 2767 
Q 3460-21   T 3439  ☒ 3461 
Q 0-34      T -34   ☑ -34  
Q 67-1      T 66    ☑ 66   
Q 7-794     T -787  ☑ -787 
Q 2-4238    T -4236 ☒ -4246
Q 4+2       T 6     ☑ 6    
Q 8-0       T 8     ☑ 8    
Q 16-739    T -723  ☒ -728 

--------------------------------------------------
Iteration 11
Epoch 1: val_loss improved from 0.54373 to 0.47800, saving model to best_model_3.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 4s 14ms/step - accuracy: 0.7895 - loss: 0.5750 - val_accuracy: 0.8342 - val_loss: 0.4780
Q 2+832     T 834   ☑ 834  
Q 822+3     T 825   ☑ 825  
Q 18+52     T 70    ☒ 60   
Q 9+32      T 41    ☑ 41   
Q 417-8     T 409   ☑ 409  
Q 712+883   T 1595  ☒ 1655 
Q 69+4787   T 4856  ☑ 4856 
Q 9149+93   T 9242  ☒ 9222 
Q 6869+21   T 6890  ☒ 6880 
Q 29-87     T -58   ☒ -48  

--------------------------------------------------
Iteration 12
Epoch 1: val_loss improved from 0.47800 to 0.42181, saving model to best_model_3.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 4s 14ms/step - accuracy: 0.8195 - loss: 0.5037 - val_accuracy: 0.8521 - val_loss: 0.4218
Q 308+32    T 340   ☑ 340  
Q 1674-94   T 1580  ☑ 1580 
Q 799-9396  T -8597 ☒ -8595
Q 984+40    T 1024  ☒ 1034 
Q 9498-5301 T 4197  ☒ 459  
Q 4-621     T -617  ☑ -617 
Q 389+379   T 768   ☒ 758  
Q 357+1243  T 1600  ☒ 1611 
Q 8-7330    T -7322 ☑ -7322
Q 300+74    T 374   ☒ 384  

--------------------------------------------------
Iteration 13
Epoch 1: val_loss improved from 0.42181 to 0.38189, saving model to best_model_3.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 4s 15ms/step - accuracy: 0.8421 - loss: 0.4457 - val_accuracy: 0.8662 - val_loss: 0.3819
Q 3870+791  T 4661  ☑ 4661 
Q 24-8877   T -8853 ☑ -8853
Q 3+270     T 273   ☑ 273  
Q 158+533   T 691   ☒ 601  
Q 603+7885  T 8488  ☒ 8558 
Q 919+764   T 1683  ☒ 1653 
Q 717-328   T 389   ☒ 499  
Q 79+94     T 173   ☑ 173  
Q 67+8671   T 8738  ☑ 8738 
Q 351-1618  T -1267 ☒ -1088

--------------------------------------------------
Iteration 14
Epoch 1: val_loss improved from 0.38189 to 0.34566, saving model to best_model_3.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 4s 14ms/step - accuracy: 0.8567 - loss: 0.4019 - val_accuracy: 0.8770 - val_loss: 0.3457
Q 32+91     T 123   ☑ 123  
Q 86-35     T 51    ☑ 51   
Q 2219-7295 T -5076 ☒ -4986
Q 416+498   T 914   ☒ 904  
Q 606-95    T 511   ☑ 511  
Q 66-245    T -179  ☒ -189 
Q 5-3587    T -3582 ☑ -3582
Q 8741+9438 T 18179 ☒ 17409
Q 90+5659   T 5749  ☒ 5669 
Q 1222+4204 T 5426  ☒ 6226 

--------------------------------------------------
Iteration 15
Epoch 1: val_loss improved from 0.34566 to 0.32175, saving model to best_model_3.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 5s 17ms/step - accuracy: 0.8719 - loss: 0.3573 - val_accuracy: 0.8851 - val_loss: 0.3218
Q 3+4339    T 4342  ☑ 4342 
Q 13-913    T -900  ☒ -800 
Q 37+8      T 45    ☑ 45   
Q 9573+943  T 10516 ☒ 10326
Q 9064+5915 T 14979 ☒ 15049
Q 42-599    T -557  ☑ -557 
Q 552-80    T 472   ☒ 482  
Q 57-77     T -20   ☒ -10  
Q 7809+3    T 7812  ☑ 7812 
Q 5394-5253 T 141   ☒ -76  

--------------------------------------------------
Iteration 16
Epoch 1: val_loss improved from 0.32175 to 0.30337, saving model to best_model_3.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 4s 14ms/step - accuracy: 0.8804 - loss: 0.3276 - val_accuracy: 0.8914 - val_loss: 0.3034
Q 1+452     T 453   ☑ 453  
Q 721-16    T 705   ☒ 715  
Q 585-18    T 567   ☒ 577  
Q 8685+94   T 8779  ☑ 8779 
Q 74+8992   T 9066  ☒ 9056 
Q 2913-3603 T -690  ☒ -110 
Q 793+908   T 1701  ☒ 1681 
Q 22+323    T 345   ☒ 355  
Q 27+7012   T 7039  ☒ 7029 
Q 156-750   T -594  ☒ -616 

--------------------------------------------------
Iteration 17
Epoch 1: val_loss did not improve from 0.30337
282/282 ━━━━━━━━━━━━━━━━━━━━ 4s 14ms/step - accuracy: 0.8908 - loss: 0.2986 - val_accuracy: 0.8890 - val_loss: 0.3128
Q 9+2       T 11    ☑ 11   
Q 200+52    T 252   ☑ 252  
Q 77+8224   T 8301  ☒ 8291 
Q 78+12     T 90    ☑ 90   
Q 8570

282/282 ━━━━━━━━━━━━━━━━━━━━ 4s 14ms/step - accuracy: 0.8974 - loss: 0.2792 - val_accuracy: 0.8955 - val_loss: 0.2771
Q 870+2     T 872   ☑ 872  
Q 1+7452    T 7453  ☑ 7453 
Q 31+4      T 35    ☑ 35   
Q 34+7      T 41    ☑ 41   
Q 8-64      T -56   ☑ -56  
Q 37-879    T -842  ☑ -842 
Q 395+2876  T 3271  ☒ 3261 
Q 473+4500  T 4973  ☒ 4943 
Q 31+2858   T 2889  ☒ 2899 
Q 7+624     T 631   ☑ 631  

--------------------------------------------------
Iteration 19
Epoch 1: val_loss improved from 0.27707 to 0.26546, saving model to best_model_3.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 4s 14ms/step - accuracy: 0.9048 - loss: 0.2580 - val_accuracy: 0.9003 - val_loss: 0.2655
Q 7-7793    T -7786 ☑ -7786
Q 0-92      T -92   ☑ -92  
Q 82+264    T 346   ☒ 356  
Q 6712-8485 T -1773 ☒ -1733
Q 5339-5    T 5334  ☑ 5334 
Q 322+23    T 345   ☒ 355  
Q 48+79     T 127   ☑ 127  
Q 44+850    T 894   ☑ 894  
Q 415+958   T 1373  ☒ 1383 
Q 1303-638  T 665   ☒ 515  

--------------------------------------------------
Iteration 20
Epoch 1: val_loss did not improve from 0.26546
282/282 ━━━━━━━━━━━━━━━━━━━━ 4s 16ms/step - accuracy: 0.9048 - loss: 0.2522 - val_accuracy: 0.9000 - val_loss: 0.2676
Q 285-0     T 285   ☑ 285  
Q 50+950    T 1000  ☑ 1000 
Q 5135+211  T 5346  ☒ 5336 
Q 9-5063    T -5054 ☑ -5054
Q 96+8

282/282 ━━━━━━━━━━━━━━━━━━━━ 4s 14ms/step - accuracy: 0.9121 - loss: 0.2377 - val_accuracy: 0.9061 - val_loss: 0.2496
Q 95-47     T 48    ☑ 48   
Q 977-8290  T -7313 ☒ -7323
Q 933+4346  T 5279  ☒ 5289 
Q 636-7     T 629   ☑ 629  
Q 416-5     T 411   ☑ 411  
Q 2401+41   T 2442  ☑ 2442 
Q 384-4129  T -3745 ☒ -3775
Q 53-10     T 43    ☑ 43   
Q 2+58      T 60    ☑ 60   
Q 4521-258  T 4263  ☒ 4373 

--------------------------------------------------
Iteration 22
Epoch 1: val_loss improved from 0.24956 to 0.23982, saving model to best_model_3.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 5s 16ms/step - accuracy: 0.9181 - loss: 0.2215 - val_accuracy: 0.9103 - val_loss: 0.2398
Q 93+571    T 664   ☑ 664  
Q 331+479   T 810   ☑ 810  
Q 740-519   T 221   ☒ 229  
Q 36+9182   T 9218  ☑ 9218 
Q 95+2413   T 2508  ☑ 2508 
Q 86+596    T 682   ☑ 682  
Q 66+8      T 74    ☑ 74   
Q 406-82    T 324   ☑ 324  
Q 4577-8800 T -4223 ☒ -4133
Q 66-739    T -673  ☒ -683 

--------------------------------------------------
Iteration 23
Epoch 1: val_loss improved from 0.23982 to 0.22854, saving model to best_model_3.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 4s 14ms/step - accuracy: 0.9208 - loss: 0.2109 - val_accuracy: 0.9135 - val_loss: 0.2285
Q 5712+4    T 5716  ☑ 5716 
Q 1044+469  T 1513  ☑ 1513 
Q 607+9     T 616   ☑ 616  
Q 1+28      T 29    ☑ 29   
Q 94+2      T 96    ☑ 96   
Q 25+6      T 31    ☑ 31   
Q 285-0     T 285   ☑ 285  
Q 0-4573    T -4573 ☑ -4573
Q 64+9085   T 9149  ☒ 9139 
Q 5+4094    T 4099  ☑ 4099 

--------------------------------------------------
Iteration 24
Epoch 1: val_loss improved from 0.22854 to 0.22806, saving model to best_model_3.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 5s 16ms/step - accuracy: 0.9243 - loss: 0.2004 - val_accuracy: 0.9153 - val_loss: 0.2281
Q 191-4     T 187   ☑ 187  
Q 8796-76   T 8720  ☑ 8720 
Q 50+2827   T 2877  ☑ 2877 
Q 56+83     T 139   ☑ 139  
Q 86-0      T 86    ☑ 86   
Q 2893+0    T 2893  ☑ 2893 
Q 6-584     T -578  ☑ -578 
Q 1376-5792 T -4416 ☒ -4306
Q 7-2773    T -2766 ☑ -2766
Q 3753+834  T 4587  ☒ 4497 

--------------------------------------------------
Iteration 25
Epoch 1: val_loss improved from 0.22806 to 0.22188, saving model to best_model_3.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 4s 14ms/step - accuracy: 0.9290 - loss: 0.1895 - val_accuracy: 0.9189 - val_loss: 0.2219
Q 1+988     T 989   ☑ 989  
Q 46+742    T 788   ☑ 788  
Q 1402-41   T 1361  ☑ 1361 
Q 362-867   T -505  ☒ -415 
Q 5929+527  T 6456  ☒ 6446 
Q 75+89     T 164   ☑ 164  
Q 37+43     T 80    ☑ 80   
Q 4-543     T -539  ☒ -549 
Q 11-11     T 0     ☒ 2    
Q 671+9565  T 10236 ☒ 10216

--------------------------------------------------
Iteration 26
Epoch 1: val_loss improved from 0.22188 to 0.22152, saving model to best_model_3.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 4s 14ms/step - accuracy: 0.9332 - loss: 0.1775 - val_accuracy: 0.9168 - val_loss: 0.2215
Q 649-784   T -135  ☑ -135 
Q 899-7     T 892   ☑ 892  
Q 6+2292    T 2298  ☑ 2298 
Q 470+774   T 1244  ☒ 1144 
Q 6+8769    T 8775  ☑ 8775 
Q 993-381   T 612   ☒ 622  
Q 7-5868    T -5861 ☑ -5861
Q 1982-24   T 1958  ☒ 1968 
Q 3694-5200 T -1506 ☒ -1866
Q 5+1604    T 1609  ☑ 1609 

--------------------------------------------------
Iteration 27
Epoch 1: val_loss improved from 0.22152 to 0.21312, saving model to best_model_3.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 4s 14ms/step - accuracy: 0.9361 - loss: 0.1707 - val_accuracy: 0.9230 - val_loss: 0.2131
Q 154+18    T 172   ☑ 172  
Q 411+2497  T 2908  ☒ 2898 
Q 1+136     T 137   ☑ 137  
Q 9+517     T 526   ☑ 526  
Q 6663-0    T 6663  ☑ 6663 
Q 5176+3    T 5179  ☑ 5179 
Q 415+958   T 1373  ☑ 1373 
Q 4274+381  T 4655  ☒ 4645 
Q 7543-82   T 7461  ☑ 7461 
Q 581-35    T 546   ☑ 546  

--------------------------------------------------
Iteration 28
Epoch 1: val_loss did not improve from 0.21312
282/282 ━━━━━━━━━━━━━━━━━━━━ 4s 14ms/step - accuracy: 0.9395 - loss: 0.1615 - val_accuracy: 0.9216 - val_loss: 0.2174
Q 3-870     T -867  ☑ -867 
Q 6718-9    T 6709  ☑ 6709 
Q 934+56    T 990   ☒ 980  
Q 9992-0    T 9992  ☑ 9992 
Q 8422

282/282 ━━━━━━━━━━━━━━━━━━━━ 4s 15ms/step - accuracy: 0.9441 - loss: 0.1511 - val_accuracy: 0.9220 - val_loss: 0.2124
Q 2913-3603 T -690  ☒ -710 
Q 863+817   T 1680  ☑ 1680 
Q 6+98      T 104   ☑ 104  
Q 18-91     T -73   ☑ -73  
Q 550+5     T 555   ☑ 555  
Q 2+524     T 526   ☑ 526  
Q 27+4851   T 4878  ☑ 4878 
Q 11+7009   T 7020  ☒ 7010 
Q 25-6111   T -6086 ☒ -6096
Q 8-154     T -146  ☑ -146 

--------------------------------------------------
Iteration 31
Epoch 1: val_loss improved from 0.21237 to 0.20851, saving model to best_model_3.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 5s 16ms/step - accuracy: 0.9439 - loss: 0.1509 - val_accuracy: 0.9258 - val_loss: 0.2085
Q 1+5877    T 5878  ☑ 5878 
Q 30-2      T 28    ☑ 28   
Q 2+1926    T 1928  ☑ 1928 
Q 659+3     T 662   ☑ 662  
Q 47-2019   T -1972 ☑ -1972
Q 1-1777    T -1776 ☑ -1776
Q 61+23     T 84    ☑ 84   
Q 8745+1237 T 9982  ☒ 9008 
Q 3712+36   T 3748  ☑ 3748 
Q 60-85     T -25   ☑ -25  

--------------------------------------------------
Iteration 32
Epoch 1: val_loss improved from 0.20851 to 0.20094, saving model to best_model_3.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 4s 14ms/step - accuracy: 0.9491 - loss: 0.1374 - val_accuracy: 0.9262 - val_loss: 0.2009
Q 576+71    T 647   ☑ 647  
Q 15+3483   T 3498  ☒ 3508 
Q 10+3      T 13    ☑ 13   
Q 727-79    T 648   ☒ 658  
Q 2+83      T 85    ☑ 85   
Q 6366-0    T 6366  ☑ 6366 
Q 8200-3000 T 5200  ☑ 5200 
Q 0-180     T -180  ☑ -180 
Q 3389+187  T 3576  ☑ 3576 
Q 48+2453   T 2501  ☑ 2501 

--------------------------------------------------
Iteration 33
Epoch 1: val_loss improved from 0.20094 to 0.19916, saving model to best_model_3.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 4s 15ms/step - accuracy: 0.9498 - loss: 0.1348 - val_accuracy: 0.9288 - val_loss: 0.1992
Q 1692-945  T 747   ☒ 867  
Q 1551-6833 T -5282 ☒ -5122
Q 960-0     T 960   ☑ 960  
Q 2-768     T -766  ☑ -766 
Q 780-1     T 779   ☑ 779  
Q 7580+1    T 7581  ☑ 7581 
Q 352+19    T 371   ☑ 371  
Q 1726-2    T 1724  ☑ 1724 
Q 3-341     T -338  ☑ -338 
Q 67-399    T -332  ☑ -332 

--------------------------------------------------
Iteration 34
Epoch 1: val_loss improved from 0.19916 to 0.19631, saving model to best_model_3.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 4s 15ms/step - accuracy: 0.9524 - loss: 0.1280 - val_accuracy: 0.9287 - val_loss: 0.1963
Q 909-10    T 899   ☑ 899  
Q 3576+9    T 3585  ☑ 3585 
Q 747-1820  T -1073 ☒ -1173
Q 48+79     T 127   ☑ 127  
Q 2+2300    T 2302  ☑ 2302 
Q 785+1     T 786   ☑ 786  
Q 1+3568    T 3569  ☑ 3569 
Q 150+0     T 150   ☑ 150  
Q 70-7      T 63    ☑ 63   
Q 433+938   T 1371  ☑ 1371 

--------------------------------------------------
Iteration 35
Epoch 1: val_loss improved from 0.19631 to 0.19630, saving model to best_model_3.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 4s 14ms/step - accuracy: 0.9546 - loss: 0.1239 - val_accuracy: 0.9295 - val_loss: 0.1963
Q 3-2025    T -2022 ☑ -2022
Q 49-31     T 18    ☑ 18   
Q 46-33     T 13    ☑ 13   
Q 1535+4105 T 5640  ☒ 5650 
Q 2353-8270 T -5917 ☒ -6937
Q 3688-666  T 3022  ☒ 2922 
Q 6376-6    T 6370  ☑ 6370 
Q 2-768     T -766  ☑ -766 
Q 4890+1    T 4891  ☑ 4891 
Q 533-603   T -70   ☒ -40  

--------------------------------------------------
Iteration 36
Epoch 1: val_loss improved from 0.19630 to 0.19327, saving model to best_model_3.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 4s 16ms/step - accuracy: 0.9562 - loss: 0.1184 - val_accuracy: 0.9309 - val_loss: 0.1933
Q 7-324     T -317  ☑ -317 
Q 5-63      T -58   ☑ -58  
Q 366+8     T 374   ☑ 374  
Q 5+3796    T 3801  ☑ 3801 
Q 3774+8    T 3782  ☑ 3782 
Q 867+7281  T 8148  ☒ 8158 
Q 51+291    T 342   ☑ 342  
Q 671-759   T -88   ☒ -98  
Q 237+4     T 241   ☑ 241  
Q 1-1777    T -1776 ☑ -1776

--------------------------------------------------
Iteration 37
Epoch 1: val_loss improved from 0.19327 to 0.18916, saving model to best_model_3.h5
282/282 ━━━━━━━━━━━━━━━━━━━━ 4s 15ms/step - accuracy: 0.9556 - loss: 0.1186 - val_accuracy: 0.9304 - val_loss: 0.1892
Q 645+497   T 1142  ☑ 1142 
Q 9650+1495 T 11145 ☒ 10155
Q 1124-345  T 779   ☑ 779  
Q 37-801    T -764  ☑ -764 
Q 306-1     T 305   ☑ 305  
Q 1248-57   T 1191  ☑ 1191 
Q 0+7057    T 7057  ☑ 7057 
Q 3+246     T 249   ☑ 249  
Q 7434-63   T 7371  ☑ 7371 
Q 18+86     T 104   ☑ 104  

--------------------------------------------------
Iteration 38
Epoch 1: val_loss did not improve from 0.18916
282/282 ━━━━━━━━━━━━━━━━━━━━ 4s 16ms/step - accuracy: 0.9608 - loss: 0.1072 - val_accuracy: 0.9312 - val_loss: 0.1918
Q 4-983     T -979  ☒ -989 
Q 368-862   T -494  ☒ -504 
Q 12+9766   T 9778  ☑ 9778 
Q 476+77    T 553   ☑ 553  
Q 39+6

282/282 ━━━━━━━━━━━━━━━━━━━━ 4s 15ms/step - accuracy: 0.9625 - loss: 0.1030 - val_accuracy: 0.9357 - val_loss: 0.1836
Q 3233-4    T 3229  ☑ 3229 
Q 198-3018  T -2820 ☒ -2920
Q 1-3451    T -3450 ☑ -3450
Q 5+875     T 880   ☑ 880  
Q 0+37      T 37    ☑ 37   
Q 5+550     T 555   ☑ 555  
Q 829-1674  T -845  ☑ -845 
Q 387+7     T 394   ☑ 394  
Q 0-10      T -10   ☑ -10  
Q 359-2     T 357   ☑ 357  

--------------------------------------------------
Iteration 43
Epoch 1: val_loss did not improve from 0.18361
282/282 ━━━━━━━━━━━━━━━━━━━━ 4s 14ms/step - accuracy: 0.9658 - loss: 0.0924 - val_accuracy: 0.9345 - val_loss: 0.1933
Q 940+5330  T 6270  ☒ 6370 
Q 232-0     T 232   ☑ 232  
Q 666+764   T 1430  ☒ 1420 
Q 768-3     T 765   ☑ 765  
Q 2-33

The improved model (Model 3) shows significant gains over Model 2, which includes attention, in both training and validation performance. Model 3 reaches a training accuracy of about 97%, compared with 93% for Model 2. Its validation accuracy is almost 94%, an improvement of 11 percentage points over Model 2's 83%. This suggests that Model 3 generalizes much better to unseen data, likely because of the enhancements made to the architecture.

I made two main changes to improve the model. First, I used bidirectional LSTMs so that the network can capture context from both past and future tokens in the input sequence. This gives the model a richer representation of the sequence, which helps improve predictions.
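To illustrate why reading the input in both directions helps, here is a minimal pure-Python sketch. It is a toy scalar recurrence, not the notebook's LSTM layers, and the function names and the weights `w` and `u` are hypothetical values chosen only for this example.

```python
import math

def run_rnn(xs, w=0.5, u=0.8):
    """Toy scalar recurrence h_t = tanh(w * x_t + u * h_{t-1}); returns one state per step."""
    h, states = 0.0, []
    for x in xs:
        h = math.tanh(w * x + u * h)
        states.append(h)
    return states

def run_bidirectional(xs):
    """Pair each position's forward state with its backward state."""
    forward = run_rnn(xs)
    # Read the sequence right-to-left, then realign the states to input order.
    backward = run_rnn(xs[::-1])[::-1]
    return list(zip(forward, backward))

seq_a = [1.0, 2.0, 3.0]
seq_b = [1.0, 2.0, -3.0]  # differs from seq_a only in the last token

# A forward-only state at position 0 cannot see the last token.
print(run_rnn(seq_a)[0] == run_rnn(seq_b)[0])
# The bidirectional state at position 0 can, through its backward half.
print(run_bidirectional(seq_a)[0] != run_bidirectional(seq_b)[0])
```

In this sketch the first position's forward state is identical for both sequences, while its backward state changes with the last token, which is the extra context a bidirectional encoder provides.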

Second, I added dropout to the model to prevent it from overfitting to the training data. Dropout forces the model to learn more robust features that generalize better to unseen data. This is reflected in the significant improvement in validation accuracy for Model 3.
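The dropout mechanism itself can be sketched in a few lines of plain Python. This is an illustrative version of the common "inverted dropout" formulation, not the notebook's actual layer; the rate of 0.3, the seed, and the helper name `dropout` are assumptions made for the example.

```python
import random

def dropout(values, rate, training, rng):
    """Inverted dropout: during training, zero each value with probability `rate`
    and scale the survivors by 1 / (1 - rate); otherwise pass values through unchanged."""
    if not training or rate == 0.0:
        return list(values)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
activations = [1.0] * 10

train_out = dropout(activations, rate=0.3, training=True, rng=rng)
eval_out = dropout(activations, rate=0.3, training=False, rng=rng)

print(train_out)  # each entry is either 0.0 or 1 / 0.7
print(eval_out)   # unchanged at inference time
```

Because units are silenced at random only during training, the network cannot rely on any single activation, while predictions at inference time use the full, unscaled activations.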