In [6]:
from PsQ_BinClassification import *
import numpy as np 

# The Syntax of Sudoku (Grids), Part I 


AlexPfaff, June-Aug 2025  
nonintersective@gmail.com 

<br>

## Binary Classification: what vs. what? 

At the outset, the task illustrated in this notebook appears to be a textbook example of binary classification. We train models on datasets composed of valid and invalid Sudoku grids, suggesting the model is to learn Sudoku rules and predict whether or not a given grid is valid. Notice, however, that such an endeavour has some serious flaws already in its infancy: given classical 9 x 9 Sudoku, a decent model should classify $6.7 * 10^{21}$ grids out of $9^{81}$ $\sim$ $1.9 * 10^{77}$ as True and all others as False (this is a necessary, and for several intents and purposes, a sufficient condition). Moreover, since it is a discrete problem and there is no noise in the data (i.e. no uncertainties as to validity), we should aim for no less than 100% accuracy. But given the staggering number of grids we would need to test - there are only around 1000 times more atoms in the known universe than there are possible grids ($\rightarrow$ combinatorial explosion!) - let's be a bit more modest and start out more locally. 
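
The orders of magnitude quoted above can be checked with a few lines of integer arithmetic. This is only a sanity check: the exact count of valid 9 x 9 grids used here (the commonly cited figure) is an assumption that is not stated in this notebook, and the variable names are illustrative.

```python
# Sanity check of the grid counts quoted in the text (illustrative names).
VALID_GRIDS = 6_670_903_752_021_072_936_960   # assumed: commonly cited count of valid 9x9 Sudoku grids
ALL_GRIDS = 9 ** 81                           # every cell filled independently with a digit 1..9

valid_fraction = VALID_GRIDS / ALL_GRIDS      # chance that a uniformly random grid is valid

print(f"all grids:      {ALL_GRIDS:.2e}")
print(f"valid grids:    {VALID_GRIDS:.2e}")
print(f"valid fraction: {valid_fraction:.1e}")
```

Python integers have arbitrary precision, so $9^{81}$ is computed exactly before it is formatted.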

What's happening here: 
* 8 models (identical architecture, see summary below), $m_0, m_1 ... m_7$ are trained on 8 datasets $D_0, D_1 ... D_7$ 

* $D_0-D_6$ contain 50% valid Sudoku grids (label 1) and 50% invalid grids (label 0) 

* $D_0-D_4$ contain as the valid component **horizontal** permutation series belonging to the same permutation family (horizontal permutation $=$ label permutation: the digits ${i_1}, {i_2} ... {i_9}$ in some grid are mapped to ${j_1}, {j_2} ... {j_9}$ $-$ $i_n$ and $j_n$ may or may not be distinct; there are 9! = 362,880 possible mappings). 

* $D_5$ contains as the valid component a **vertical** permutation series belonging to the same permutation family as $D_0-D_4$ (vertical permutation $=$ grid permutation $=$ row/column swappings that produce a new valid grid; there are $3!^4$ x $3!^4$ $=$ $1296$ x $1296$ $=$ 1,679,616 valid grid permutations)

* $D_6$ contains as the valid component a **random** selection of different permutation series (in the present case: 7 series)

* The invalid component in each dataset constitutes a random selection of (i) arbitrary random integer grids and (ii) sequential overflow of the respective valid grids (see GridCollection: .makeFalseGrids_fromCurrent_seq()). This way we ensure the invalid components have comparable properties, and we can examine how & to what extent the valid components in $D_0-D_6$ differ more closely. 

* $D_7$ contains 50% random integer grids (label 1) and 50% 'invalid grids' (label 0). The chance that the former actually contain a valid grid is astronomically slim: $6.7 * 10^{21} : 9^{81}$ $\sim$ $6.7 * 10^{21} : 1.9 * 10^{77}$ $\sim$ $\frac{1}{10^{56}}$. This dataset is pretty much trash, and the label 0 for the 'invalid' grids is pretty much meaningless. These data serve as a control group, as it were: we expect random distributions, meaning the 'accuracy' should be in the neighbourhood of 50 : 50 (after all, what exactly should the model learn?). Both $m_7$ and $D_7$ receive the label '**weird**' in this notebook. 

* Every dataset is split into train/test data. <br><br>  
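
The two kinds of permutation described in the list above can be sketched with NumPy. This is a minimal illustration under stated assumptions, not the GridCollection implementation: the base grid comes from a simple shifting pattern, and `is_valid_sudoku`, `relabel`, `shuffle_rows` and `grid_permutation` are hypothetical helper names.

```python
import math
import numpy as np

DIGITS = set(range(1, 10))

def is_valid_sudoku(grid):
    """Return True if every row, column and 3x3 box holds the digits 1..9 exactly once."""
    rows_ok = all(set(row.tolist()) == DIGITS for row in grid)
    cols_ok = all(set(col.tolist()) == DIGITS for col in grid.T)
    boxes_ok = all(
        set(grid[r:r + 3, c:c + 3].ravel().tolist()) == DIGITS
        for r in range(0, 9, 3)
        for c in range(0, 9, 3)
    )
    return rows_ok and cols_ok and boxes_ok

def relabel(grid, mapping):
    """Horizontal (label) permutation: digit i becomes mapping[i - 1]."""
    return np.asarray(mapping)[grid - 1]

def shuffle_rows(grid, rng):
    """Permute the three bands and the three rows inside each band (3!^4 options)."""
    band_order = rng.permutation(3)
    row_index = np.concatenate([3 * band + rng.permutation(3) for band in band_order])
    return grid[row_index]

def grid_permutation(grid, rng):
    """Vertical (grid) permutation: shuffle rows, then columns via the transpose."""
    return shuffle_rows(shuffle_rows(grid, rng).T, rng).T

# Base grid from a simple shifting pattern (an assumption, chosen for brevity).
base = np.array([[(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)])

rng = np.random.default_rng(0)
horizontal = relabel(base, rng.permutation(9) + 1)   # one member of the horizontal series
vertical = grid_permutation(base, rng)               # one member of the vertical series

print(is_valid_sudoku(base), is_valid_sudoku(horizontal), is_valid_sudoku(vertical))
print(math.factorial(9), math.factorial(3) ** 8)     # sizes of the two series
```

Both operations preserve validity by construction: relabelling keeps every row, column and box a permutation of 1..9, and rows or columns are only moved within their band or together with it.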

Datasets are generated by GridCollection; this module lets you customize your datasets to an unusually high degree:
* determine specific properties of the valid data, e.g. belonging to some geometric class (horizontal/vertical permutations) or randomly belonging to different classes
* specify the ratio valid data : invalid data as precisely 50 : 50 - or something else
* specify in which way the invalid data are invalid: preserving cardinality but violating Sudoku rules one way or another, violating cardinality at a specific rate or randomly. It is even possible to use valid grids belonging to a different permutation series ('guest grids') $-$ for this reason, the respective component in GridCollection is more generically referred to as 'GarbageCollection' rather than 'false Grids'
* provides train/test-split methods for different tasks: binary classification, multiclass classification, puzzle learning
* ... plus much more!
<br><br> <br>
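
To make the 50 : 50 composition and the train/test split concrete, here is a minimal NumPy sketch. It does not use GridCollection; `make_balanced_split` is a hypothetical helper, the arrays are random placeholders, and the 70 : 30 ratio is an assumption chosen to mirror the train/test sizes reported in the evaluation output (508031 vs. 217729).

```python
import numpy as np

def make_balanced_split(valid_grids, invalid_grids, test_fraction=0.3, seed=0):
    """Stack valid (label 1) and invalid (label 0) grids, shuffle, and split."""
    rng = np.random.default_rng(seed)
    x = np.concatenate([valid_grids, invalid_grids])
    y = np.concatenate([
        np.ones(len(valid_grids), dtype=np.int8),
        np.zeros(len(invalid_grids), dtype=np.int8),
    ])
    order = rng.permutation(len(x))          # shuffle grids and labels together
    x, y = x[order], y[order]
    n_test = int(round(test_fraction * len(x)))
    return (x[n_test:], y[n_test:]), (x[:n_test], y[:n_test])

rng = np.random.default_rng(1)
# Placeholder arrays: random integer grids stand in for both components here.
valid_grids = rng.integers(1, 10, size=(100, 9, 9))
invalid_grids = rng.integers(1, 10, size=(100, 9, 9))

(train_x, train_y), (test_x, test_y) = make_balanced_split(valid_grids, invalid_grids)
print(train_x.shape, test_x.shape)
print(train_y.mean(), test_y.mean())         # label balance in each part
```

With real data, `valid_grids` would hold one permutation series and `invalid_grids` the garbage component.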

## Preview of coming attractions
At first glance, the models do an amazing job achieving an accuracy of 99-100%, but a closer look reveals that they do not necessarily learn Sudoku. 

1. In particular, what models $m_0 - m_4$ seem to learn is the respective horizontal permutation series itself, or, more to the point, **the underlying ABC-grid** (= prototype or internal value distribution or 'deep structure' in linguistic terminology), which suggests that this kind of constellation can be learned at an insane accuracy (= 100% at a loss on the order of 1e-08). This is perhaps not so surprising assuming that the value distribution is a geometric property that can easily be identified by a convolutional layer. This also means that the datasets are exhaustive: the only valid members (label 1) of such a series are distributed across the training and test sets of datasets $D_0-D_4$, respectively. Thus, anything else, including valid grids belonging to different permutation series, will be (expected to be) classified as False (label 0). A number of observations support this view:
* models $m_0-m_4$ achieve 100% accuracy on both training and test sets from their respective datasets $D_0-D_4$
* on all other datasets (including other horizontal series), those models predict around 0% of the data (actually labeled 'True'/1) as True, but pretty much 100% of the false data as 'False'/0 (which means that they pretty much classify everything outside their own dataset as 'False'). 
* In a multiclass classification task on a dataset comprising several horizontal series, the trained model likewise achieves 100% accuracy, see notebook 'psq_multi_clf.ipynb' at https://github.com/A-Lex-McLee/PseudoQ-2.1/tree/main 
  
NB: predicting on 'foreign' datasets is illustrated in Section 'Crossevaluation'. Under the assumption elaborated in point 1., the expectation is that there is no 'new data' that should be recognized as 'True' that is not already contained in the 'native' dataset; hence the label " Falsely predicted as 'true': ... "   


2. It is less clear what exactly model $m_5$ (trained on a vertical series) learns; it achieves almost 100% accuracy on its own (training/test) dataset(s), but it also classifies between 40% and 50% of the 'valid' data in other datasets as 'True' (and close to 100% of the invalid data as 'False'/0). If it had learned the vertical permutation series, the 'True' predictions should be close to zero (there would be exactly one match/overlap with each horizontal series in the same permutation family, i.e. $D_0-D_4$). On the other hand, if it had learned the rules of Sudoku, the 'True' predictions should be close to 100% because those are valid Sudoku grids after all.   


3. Model $m_6$ (trained on a random collection) seems to be the most promising approach to tackle the big question: valid vs. invalid Sudoku grids. It achieves almost 100% on its own dataset as well as on the horizontal and the vertical series. There is one caveat, though: we have not checked whether (subsets of) this random collection are accidentally contained in the permutation family of the base set; a permutation family comprises all horizontal series of all grids in the vertical permutation series times two for adding matrix transpositions ($\text{grid}$ + $\text{grid}^T$):   
362880 x 1679616 x 2 (= $9!$ x $3!^8$ x 2) $=$ 1,218,998,108,160 $\sim$ 1.2 * $10^{12}$ grids. Thus there is plenty of scope for random overlaps, which may have an impact on the (high) score in any particular test series.  

NB: GridCollection: .activate_randomSeries() can actually control for accidental overlap (checkContainment = True), but since this is somewhat time-consuming, we have not checked here. Just keep it in mind.


4. As expected, model $m_7$ behaves randomly: accuracy on its own dataset ca. 50%; predictions on the other datasets: between 37% and 53%  for 'True' data and between 50% and 55% for 'False'. The other models predict on $D_7$ pretty much everything as 'False'. 
<br><br>
<br><br>
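
The size of a permutation family quoted in point 3 follows directly from the two series sizes. A short arithmetic check (variable names are illustrative; how the figure translates into actual overlap depends on how the random series are drawn, which this sketch does not model):

```python
import math

label_permutations = math.factorial(9)        # horizontal series: 9! relabellings
grid_permutations = math.factorial(3) ** 8    # vertical series: 3!^4 row x 3!^4 column arrangements
family_size = label_permutations * grid_permutations * 2   # x 2 for the transposed grids

print(f"{label_permutations:,} x {grid_permutations:,} x 2 = {family_size:,}")
print(f"approx. {family_size:.1e} grids")
```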

For more discussion, see 'The Syntax of Sudoku.pdf'; see module GridCollection for creation of datasets;   
https://github.com/A-Lex-McLee/PseudoQ-2.1/tree/main  



In [None]:
# instantiate a Binary_CNN_Classifier object; no one-hot encoding
bin_classifier = Binary_CNN_Classifier() 

## Train horizontal series (models $m_0-m_4$)

In [None]:
# fit 5 distinct horizontal series
bin_classifier.fit()

Generate a (valid) random grid, and produce 3!^8 = 1679616 permutations (= vertical series)


Generating 1296 X 1296 Grid Permutations: 100%|██████████| 1296/1296 [00:04<00:00, 262.21it/s]





100%|██████████| 5/5 [00:13<00:00,  2.60s/it]


0it [00:00, ?it/s]


Fitting model 0
-----------

Epoch 1/10
1588/1588 - 27s - 17ms/step - accuracy: 0.9921 - loss: 0.0202 - val_accuracy: 1.0000 - val_loss: 6.0980e-06 - learning_rate: 0.0010
Epoch 2/10
1588/1588 - 26s - 16ms/step - accuracy: 1.0000 - loss: 2.8629e-06 - val_accuracy: 1.0000 - val_loss: 1.3874e-06 - learning_rate: 0.0010
Epoch 3/10

Epoch 3: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.
1588/1588 - 26s - 16ms/step - accuracy: 1.0000 - loss: 6.2978e-07 - val_accuracy: 1.0000 - val_loss: 6.5792e-07 - learning_rate: 0.0010
Epoch 4/10
1588/1588 - 26s - 16ms/step - accuracy: 1.0000 - loss: 2.6218e-07 - val_accuracy: 1.0000 - val_loss: 3.1254e-07 - learning_rate: 5.0000e-04
Epoch 5/10

Epoch 5: ReduceLROnPlateau reducing learning rate to 0.0002500000118743628.
1588/1588 - 26s - 16ms/step - accuracy: 1.0000 - loss: 1.4440e-07 - val_accuracy: 1.0000 - val_loss: 1.8362e-07 - learning_rate: 5.0000e-04
Epoch 6/10
1588/1588 - 26s - 16ms/step - accuracy: 1.0000 - loss: 7.7633e-08 

1it [04:19, 259.76s/it]



Fitting model 1
-----------

Epoch 1/10
1588/1588 - 27s - 17ms/step - accuracy: 0.9950 - loss: 0.0153 - val_accuracy: 1.0000 - val_loss: 3.0478e-05 - learning_rate: 0.0010
Epoch 2/10
1588/1588 - 26s - 16ms/step - accuracy: 1.0000 - loss: 3.7365e-06 - val_accuracy: 1.0000 - val_loss: 1.5309e-06 - learning_rate: 0.0010
Epoch 3/10

Epoch 3: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.
1588/1588 - 26s - 16ms/step - accuracy: 1.0000 - loss: 6.5029e-07 - val_accuracy: 1.0000 - val_loss: 4.6590e-07 - learning_rate: 0.0010
Epoch 4/10
1588/1588 - 26s - 16ms/step - accuracy: 1.0000 - loss: 2.4656e-07 - val_accuracy: 1.0000 - val_loss: 1.9851e-07 - learning_rate: 5.0000e-04
Epoch 5/10

Epoch 5: ReduceLROnPlateau reducing learning rate to 0.0002500000118743628.
1588/1588 - 26s - 16ms/step - accuracy: 1.0000 - loss: 9.5451e-08 - val_accuracy: 1.0000 - val_loss: 1.1929e-07 - learning_rate: 5.0000e-04
Epoch 6/10
1588/1588 - 26s - 17ms/step - accuracy: 1.0000 - loss: 4.9518e-08

2it [08:42, 261.66s/it]



Fitting model 2
-----------

Epoch 1/10
1588/1588 - 27s - 17ms/step - accuracy: 0.9921 - loss: 0.0244 - val_accuracy: 1.0000 - val_loss: 1.8946e-05 - learning_rate: 0.0010
Epoch 2/10
1588/1588 - 26s - 17ms/step - accuracy: 1.0000 - loss: 5.6287e-06 - val_accuracy: 1.0000 - val_loss: 3.2670e-06 - learning_rate: 0.0010
Epoch 3/10

Epoch 3: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.
1588/1588 - 26s - 17ms/step - accuracy: 1.0000 - loss: 1.5041e-06 - val_accuracy: 1.0000 - val_loss: 1.2349e-06 - learning_rate: 0.0010
Epoch 4/10
1588/1588 - 26s - 17ms/step - accuracy: 1.0000 - loss: 6.4157e-07 - val_accuracy: 1.0000 - val_loss: 8.2992e-07 - learning_rate: 5.0000e-04
Epoch 5/10

Epoch 5: ReduceLROnPlateau reducing learning rate to 0.0002500000118743628.
1588/1588 - 26s - 17ms/step - accuracy: 1.0000 - loss: 3.4675e-07 - val_accuracy: 1.0000 - val_loss: 5.7381e-07 - learning_rate: 5.0000e-04
Epoch 6/10
1588/1588 - 26s - 17ms/step - accuracy: 1.0000 - loss: 1.9128e-07

3it [13:03, 261.39s/it]



Fitting model 3
-----------

Epoch 1/10
1588/1588 - 26s - 16ms/step - accuracy: 0.9903 - loss: 0.0285 - val_accuracy: 1.0000 - val_loss: 1.2171e-05 - learning_rate: 0.0010
Epoch 2/10
1588/1588 - 26s - 16ms/step - accuracy: 1.0000 - loss: 3.7162e-06 - val_accuracy: 1.0000 - val_loss: 2.6240e-06 - learning_rate: 0.0010
Epoch 3/10

Epoch 3: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.
1588/1588 - 26s - 16ms/step - accuracy: 1.0000 - loss: 1.0389e-06 - val_accuracy: 1.0000 - val_loss: 1.1053e-06 - learning_rate: 0.0010
Epoch 4/10
1588/1588 - 26s - 16ms/step - accuracy: 1.0000 - loss: 3.9612e-07 - val_accuracy: 1.0000 - val_loss: 6.5050e-07 - learning_rate: 5.0000e-04
Epoch 5/10

Epoch 5: ReduceLROnPlateau reducing learning rate to 0.0002500000118743628.
1588/1588 - 26s - 16ms/step - accuracy: 1.0000 - loss: 2.0187e-07 - val_accuracy: 1.0000 - val_loss: 3.6675e-07 - learning_rate: 5.0000e-04
Epoch 6/10
1588/1588 - 26s - 16ms/step - accuracy: 1.0000 - loss: 1.1279e-07

4it [17:21, 260.00s/it]



Fitting model 4
-----------

Epoch 1/10
1588/1588 - 27s - 17ms/step - accuracy: 0.9931 - loss: 0.0224 - val_accuracy: 1.0000 - val_loss: 1.0335e-05 - learning_rate: 0.0010
Epoch 2/10
1588/1588 - 26s - 16ms/step - accuracy: 1.0000 - loss: 3.4717e-06 - val_accuracy: 1.0000 - val_loss: 2.4826e-06 - learning_rate: 0.0010
Epoch 3/10

Epoch 3: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.
1588/1588 - 26s - 16ms/step - accuracy: 1.0000 - loss: 9.2568e-07 - val_accuracy: 1.0000 - val_loss: 8.2630e-07 - learning_rate: 0.0010
Epoch 4/10
1588/1588 - 26s - 16ms/step - accuracy: 1.0000 - loss: 3.6780e-07 - val_accuracy: 1.0000 - val_loss: 6.8299e-07 - learning_rate: 5.0000e-04
Epoch 5/10

Epoch 5: ReduceLROnPlateau reducing learning rate to 0.0002500000118743628.
1588/1588 - 26s - 16ms/step - accuracy: 1.0000 - loss: 2.0728e-07 - val_accuracy: 1.0000 - val_loss: 3.9386e-07 - learning_rate: 5.0000e-04
Epoch 6/10
1588/1588 - 26s - 16ms/step - accuracy: 1.0000 - loss: 1.1125e-07

5it [21:40, 260.09s/it]







## Train vertical series (model $m_5$)

In [None]:
# fit vertical series: dataset 5 / model 5
bin_classifier.fit_verticalSeries()

Epoch 1/10
7349/7349 - 117s - 16ms/step - accuracy: 0.9891 - loss: 0.0255 - val_accuracy: 1.0000 - val_loss: 1.1855e-04
Epoch 2/10
7349/7349 - 116s - 16ms/step - accuracy: 0.9998 - loss: 5.9218e-04 - val_accuracy: 1.0000 - val_loss: 5.1497e-05
Epoch 3/10
7349/7349 - 115s - 16ms/step - accuracy: 1.0000 - loss: 1.5956e-04 - val_accuracy: 1.0000 - val_loss: 2.1581e-04
Epoch 4/10
7349/7349 - 117s - 16ms/step - accuracy: 1.0000 - loss: 1.2189e-04 - val_accuracy: 1.0000 - val_loss: 9.9985e-05
Epoch 5/10
7349/7349 - 117s - 16ms/step - accuracy: 1.0000 - loss: 1.9360e-05 - val_accuracy: 1.0000 - val_loss: 3.5611e-05
Epoch 6/10
7349/7349 - 116s - 16ms/step - accuracy: 1.0000 - loss: 5.8418e-05 - val_accuracy: 1.0000 - val_loss: 6.4346e-05
Epoch 7/10
7349/7349 - 117s - 16ms/step - accuracy: 1.0000 - loss: 1.4024e-04 - val_accuracy: 1.0000 - val_loss: 8.7627e-05
Epoch 8/10
7349/7349 - 117s - 16ms/step - accuracy: 0.9999 - loss: 3.1972e-04 - val_accuracy: 1.0000 - val_loss: 2.3255e-04
Epoch 9/10
7

## Train random series  (model $m_6$)

In [None]:
# fit random series: dataset 6 / model 6
bin_classifier.fit_randomSeries()

Generating guest grids, series 1: 


Generating 1296 X 1296 Grid Permutations: 100%|██████████| 1296/1296 [00:04<00:00, 271.45it/s]


Current size: 71428
Generating guest grids, series 2: 


Generating 1296 X 1296 Grid Permutations: 100%|██████████| 1296/1296 [00:04<00:00, 271.72it/s]


Current size: 142856
Generating guest grids, series 3: 


Generating 1296 X 1296 Grid Permutations: 100%|██████████| 1296/1296 [00:04<00:00, 272.16it/s]


Current size: 214284
Generating guest grids, series 4: 


Generating 1296 X 1296 Grid Permutations: 100%|██████████| 1296/1296 [00:04<00:00, 272.38it/s]


Current size: 285712
Generating guest grids, series 5: 


Generating 1296 X 1296 Grid Permutations: 100%|██████████| 1296/1296 [00:04<00:00, 271.73it/s]


Current size: 357140
Generating guest grids, series 6: 


Generating 1296 X 1296 Grid Permutations: 100%|██████████| 1296/1296 [00:04<00:00, 271.90it/s]


Current size: 428568
Generating guest grids, series 7: 


Generating 1296 X 1296 Grid Permutations: 100%|██████████| 1296/1296 [00:04<00:00, 272.36it/s]


Current size: 499996


Checking for duplicates: 100%|██████████| 499996/499996 [00:01<00:00, 331282.67it/s]
Generating missing grids: 100%|██████████| 1679616/1679616 [00:05<00:00, 280275.42it/s]
Adding missing grids: 100%|██████████| 10663/10663 [00:00<00:00, 596803.58it/s]


Epoch 1/10
2188/2188 - 35s - 16ms/step - accuracy: 0.9534 - loss: 0.0997 - val_accuracy: 0.9995 - val_loss: 0.0017
Epoch 2/10
2188/2188 - 34s - 16ms/step - accuracy: 0.9996 - loss: 0.0013 - val_accuracy: 0.9999 - val_loss: 3.6265e-04
Epoch 3/10
2188/2188 - 34s - 16ms/step - accuracy: 0.9999 - loss: 4.6535e-04 - val_accuracy: 0.9999 - val_loss: 2.0450e-04
Epoch 4/10
2188/2188 - 34s - 16ms/step - accuracy: 0.9999 - loss: 1.8777e-04 - val_accuracy: 0.9999 - val_loss: 1.6752e-04
Epoch 5/10
2188/2188 - 34s - 16ms/step - accuracy: 0.9999 - loss: 2.5556e-04 - val_accuracy: 0.9998 - val_loss: 0.0011
Epoch 6/10
2188/2188 - 35s - 16ms/step - accuracy: 1.0000 - loss: 2.4926e-05 - val_accuracy: 1.0000 - val_loss: 5.5833e-05
Epoch 7/10
2188/2188 - 34s - 16ms/step - accuracy: 1.0000 - loss: 1.4068e-07 - val_accuracy: 1.0000 - val_loss: 8.8638e-05
Epoch 8/10
2188/2188 - 35s - 16ms/step - accuracy: 1.0000 - loss: 3.0613e-08 - val_accuracy: 1.0000 - val_loss: 1.1436e-04
Epoch 9/10
2188/2188 - 35s - 16m

## Train 'weird' series  (model $m_7$)

In [None]:
# fit 'weird' series -- no valid grids at all: dataset 7 / model 7
bin_classifier.fit_weirdSeries()

Epoch 1/10
2188/2188 - 36s - 16ms/step - accuracy: 0.5001 - loss: 0.6985 - val_accuracy: 0.4999 - val_loss: 0.6932
Epoch 2/10
2188/2188 - 35s - 16ms/step - accuracy: 0.4999 - loss: 0.6932 - val_accuracy: 0.4998 - val_loss: 0.6932
Epoch 3/10
2188/2188 - 36s - 16ms/step - accuracy: 0.5003 - loss: 0.6932 - val_accuracy: 0.4999 - val_loss: 0.6932
Epoch 4/10
2188/2188 - 35s - 16ms/step - accuracy: 0.5002 - loss: 0.6932 - val_accuracy: 0.4999 - val_loss: 0.6932
Epoch 5/10
2188/2188 - 35s - 16ms/step - accuracy: 0.5001 - loss: 0.6932 - val_accuracy: 0.4989 - val_loss: 0.6932
Epoch 6/10
2188/2188 - 36s - 16ms/step - accuracy: 0.5007 - loss: 0.6932 - val_accuracy: 0.4980 - val_loss: 0.6932
Epoch 7/10
2188/2188 - 35s - 16ms/step - accuracy: 0.5015 - loss: 0.6932 - val_accuracy: 0.4993 - val_loss: 0.6932
Epoch 8/10
2188/2188 - 37s - 17ms/step - accuracy: 0.5018 - loss: 0.6932 - val_accuracy: 0.4999 - val_loss: 0.6932
Epoch 9/10
2188/2188 - 36s - 16ms/step - accuracy: 0.5017 - loss: 0.6932 - val_a
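
The loss plateau at 0.6932 in the log above is what one would expect from a model that has learned nothing, assuming the reported loss is binary cross-entropy (the loss function is not shown in this notebook): a constant prediction of 0.5 costs $\ln 2 \approx 0.6931$ per sample regardless of the label. A minimal check, with `binary_cross_entropy` as an illustrative helper:

```python
import math

def binary_cross_entropy(y_true, p):
    """Cross-entropy of a single predicted probability p for the label y_true (0 or 1)."""
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))

loss_label_1 = binary_cross_entropy(1, 0.5)   # undecided prediction on a label-1 grid
loss_label_0 = binary_cross_entropy(0, 0.5)   # undecided prediction on a label-0 grid

print(loss_label_1, loss_label_0, math.log(2))
```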

## Evaluate models on their own datasets

In [12]:
# evaluate respective models on their own dataset
bin_classifier.evaluateAll()

Model 0: -- horizontal 

15876/15876 ━━━━━━━━━━━━━━━━━━━━ 16s 988us/step

15876/15876 - 17s - 1ms/step - accuracy: 1.0000 - loss: 1.6536e-08

Correctly predicted (train): 508031  out of  508031  ==  100.0%
6805/6805 ━━━━━━━━━━━━━━━━━━━━ 6s 947us/step

6805/6805 - 8s - 1ms/step - accuracy: 1.0000 - loss: 7.2896e-08

Correctly predicted (test): 217729  out of  217729  ==  100.0%
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = 


Model 1: -- horizontal 

15876/15876 ━━━━━━━━━━━━━━━━━━━━ 15s 927us/step

15876/15876 - 17s - 1ms/step - accuracy: 1.0000 - loss: 9.6782e-09

Correctly predicted (train): 508031  out of  508031  ==  100.0%
6805/6805 ━━━━━━━━━━━━━━━━━━━━ 6s 894us/step

6805/6805 - 7s - 1ms/step - accuracy: 1.0000 - loss: 2.9506e-07

Correctly predicted (test): 217729  out of  217729  ==  100.0%
= = = = = = = = = = = = = = = = = =

## Crossevaluation 

Take model $m_x$ (trained on dataset $D_x$) to predict on datasets $D_y$ ($x \neq y$). 
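
The cross-evaluation output reports two numbers per dataset: how many of the label-1 grids are predicted as 'true' and how many of the label-0 grids are predicted as 'false'. A minimal sketch of that bookkeeping follows; `per_class_rates` and the 0.5 threshold are assumptions for illustration, not the code behind `cross_evaluate`.

```python
import numpy as np

def per_class_rates(y_true, y_prob, threshold=0.5):
    """Share of label-1 samples predicted as 1 and of label-0 samples predicted as 0."""
    y_true = np.asarray(y_true)
    y_pred = (np.asarray(y_prob) >= threshold).astype(int)
    predicted_true = float(np.mean(y_pred[y_true == 1] == 1))
    predicted_false = float(np.mean(y_pred[y_true == 0] == 0))
    return predicted_true, predicted_false

# Toy example with made-up probabilities (not taken from any model in this notebook).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
probs = [0.9, 0.2, 0.1, 0.7, 0.1, 0.3, 0.8, 0.05]

true_rate, false_rate = per_class_rates(labels, probs)
print(f"predicted as 'true':  {true_rate:.1%}")
print(f"predicted as 'false': {false_rate:.1%}")
```

Reporting the two classes separately matters here: a model that rejects everything outside its own series would still reach 50% overall accuracy on a balanced foreign dataset.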


### Crossevaluate $m_1$ (horizontal)

In [13]:
# evaluate comparatively: horizontal vs. the rest
bin_classifier.cross_evaluate(1)

Model 1 -- horizontal:

Testing dataset 0 -- horizontal:

11340/11340 ━━━━━━━━━━━━━━━━━━━━ 11s 926us/step
11340/11340 ━━━━━━━━━━━━━━━━━━━━ 10s 902us/step

11340/11340 - 12s - 1ms/step - accuracy: 0.0000e+00 - loss: 26.2429
11340/11340 - 11s - 1ms/step - accuracy: 1.0000 - loss: 2.7206e-07

Falsely predicted as 'true': 0  out of  362880  ==  0.0%
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 

Correctly predicted as 'false': 362880  out of  362880  ==  100.0%
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 

Testing dataset 2 -- horizontal:

11340/11340 ━━━━━━━━━━━━━━━━━━━━ 10s 922us/step
11340/11340 ━━━━━━━━━━━━━━━━━━━━ 10s 914us/step

11340/11340 - 12s - 1ms/step - accuracy: 3.8580e-05 - loss: 24.3605
11340/11340 - 12s - 1ms/step - accuracy: 1.0000 - loss: 5.5735e-07

Falsely predicted as 'true': 14  out of  36288

### Crossevaluate $m_5$ (vertical)

In [14]:
# evaluate comparatively: vertical vs. the rest
bin_classifier.cross_evaluate(5)

Model 5 -- vertical:

Testing dataset 0 -- horizontal:

11340/11340 ━━━━━━━━━━━━━━━━━━━━ 10s 853us/step
11340/11340 ━━━━━━━━━━━━━━━━━━━━ 10s 893us/step

11340/11340 - 11s - 995us/step - accuracy: 0.4512 - loss: 3.7236
11340/11340 - 11s - 1ms/step - accuracy: 1.0000 - loss: 1.1115e-04

Falsely predicted as 'true': 163747  out of  362880  ==  45.123999999999995%
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 

Correctly predicted as 'false': 362868  out of  362880  ==  99.997%
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 

Testing dataset 1 -- horizontal:

11340/11340 ━━━━━━━━━━━━━━━━━━━━ 10s 843us/step
11340/11340 ━━━━━━━━━━━━━━━━━━━━ 10s 853us/step

11340/11340 - 11s - 982us/step - accuracy: 0.4629 - loss: 3.3113
11340/11340 - 11s - 984us/step - accuracy: 0.9983 - loss: 0.0086

Falsely predicted as 'true': 167968

### Crossevaluate $m_6$ (random)

In [15]:
# evaluate comparatively: random vs. the rest
bin_classifier.cross_evaluate(6)

Model 6 -- random:

Testing dataset 0 -- horizontal:

11340/11340 ━━━━━━━━━━━━━━━━━━━━ 10s 890us/step
11340/11340 ━━━━━━━━━━━━━━━━━━━━ 10s 889us/step

11340/11340 - 11s - 979us/step - accuracy: 1.0000 - loss: 9.8257e-09
11340/11340 - 12s - 1ms/step - accuracy: 1.0000 - loss: 2.8602e-04

Falsely predicted as 'true': 362880  out of  362880  ==  100.0%
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 

Correctly predicted as 'false': 362863  out of  362880  ==  99.995%
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 

Testing dataset 1 -- horizontal:

11340/11340 ━━━━━━━━━━━━━━━━━━━━ 10s 837us/step
11340/11340 ━━━━━━━━━━━━━━━━━━━━ 10s 852us/step

11340/11340 - 11s - 963us/step - accuracy: 1.0000 - loss: 5.7521e-08
11340/11340 - 11s - 972us/step - accuracy: 0.9991 - loss: 0.0047

Falsely predicted as 'true': 362880  out o

### Crossevaluate $m_7$ (weird)

In [16]:
# evaluate comparatively: weird vs. the rest
bin_classifier.cross_evaluate(7)

Model 7 -- weird:

Testing dataset 0 -- horizontal:

11340/11340 ━━━━━━━━━━━━━━━━━━━━ 10s 918us/step
11340/11340 ━━━━━━━━━━━━━━━━━━━━ 10s 862us/step

11340/11340 - 12s - 1ms/step - accuracy: 0.3794 - loss: 0.6999
11340/11340 - 11s - 1ms/step - accuracy: 0.5237 - loss: 0.6884

Falsely predicted as 'true': 137659  out of  362880  ==  37.935%
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 

Correctly predicted as 'false': 190053  out of  362880  ==  52.373999999999995%
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 

Testing dataset 1 -- horizontal:

11340/11340 ━━━━━━━━━━━━━━━━━━━━ 9s 812us/step
11340/11340 ━━━━━━━━━━━━━━━━━━━━ 9s 818us/step

11340/11340 - 11s - 951us/step - accuracy: 0.3939 - loss: 0.6992
11340/11340 - 11s - 960us/step - accuracy: 0.5485 - loss: 0.6890

Falsely predicted as 'true': 142927  out of  3

## ToDos 

* try different models
* try different datasets  
    - larger datasets, 
    - different series, 
    - different composition of 'valid' component, 
    - different 'True' : 'False' ratio, 
    - different kinds of 'invalid' grids  
