# Third approach to a sudoku solver using Long Short-Term Memory networks
As in the previous notebooks, we skip the explanation of familiar blocks of code.

In this notebook, we explain the LSTM approach and the results achieved.

In [2]:
import pandas as pd
import numpy as np
import keras
# import matplotlib.pyplot as plt
import time

## Loading the new x dataset
Here we load the new dataset that was saved in the previous notebook, sudoku_algorithm.

In [84]:
new_data_x = np.load("new_data_x.npy")
new_data_x.shape

(5000, 10, 81)

In [85]:
from keras.utils import to_categorical

new_data_x = to_categorical(new_data_x, num_classes=10)
new_data_x.shape

(5000, 10, 81, 10)

## Loading the old y dataset

In [13]:
dt = pd.read_csv("sudoku_dataset/sudoku.csv")
print(dt.shape)

(1000000, 2)


In [9]:
from keras.utils import to_categorical

def preprocessing_data(data):
    array = [list(map(int,list(i))) for i in data]
    return to_categorical(array, num_classes=10)
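To make the encoding concrete, here is a minimal NumPy-only sketch of what `preprocessing_data` produces for one 81-character solution string. `one_hot_digits` is a hypothetical stand-in for `to_categorical` (it indexes an identity matrix), and the example string is made up to show the shapes, not taken from the dataset.

```python
import numpy as np

def one_hot_digits(strings, num_classes=10):
    """Hypothetical NumPy stand-in for to_categorical applied to digit strings."""
    digits = np.array([[int(ch) for ch in s] for s in strings])
    # Row d of the identity matrix is the one-hot vector for digit d.
    return np.eye(num_classes, dtype="float32")[digits]

# Made-up 81-character string, used only to illustrate the shapes.
example = "123456789" * 9
encoded = one_hot_digits([example])
print(encoded.shape)  # (1, 81, 10)

# argmax over the last axis recovers the original digits.
decoded = np.argmax(encoded[0], axis=1)
print("".join(map(str, decoded)) == example)
```

Each of the 81 cells becomes a vector of length 10, which is why the solutions end up with shape (samples, 81, 10).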

In [86]:
train_size = 4900
test_size = 100
epochs = 10
batch = 32

In [87]:
new_data_x = np.reshape(new_data_x, (new_data_x.shape[0], new_data_x.shape[1], -1))
new_data_x.shape

(5000, 10, 810)
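The reshape above merges the last two axes, so each step goes from 81 one-hot vectors of length 10 to a single vector of 810 features. The sketch below illustrates this on made-up random data (a stand-in for `new_data_x`, with only 2 puzzles) and checks that cell `k` occupies columns `10*k` to `10*k + 9` of the flattened vector.

```python
import numpy as np

rng = np.random.default_rng(0)
# Made-up stand-in for new_data_x: 2 puzzles, 10 steps, 81 cells with digits 0-9.
steps = rng.integers(0, 10, size=(2, 10, 81))

one_hot = np.eye(10, dtype="float32")[steps]                     # (2, 10, 81, 10)
flat = one_hot.reshape(one_hot.shape[0], one_hot.shape[1], -1)   # (2, 10, 810)
print(flat.shape)

# Cell k of a step occupies columns 10*k .. 10*k + 9 of the flattened vector.
k = 5
cell_block = flat[0, 0, 10 * k:10 * (k + 1)]
print(int(np.argmax(cell_block)) == int(steps[0, 0, k]))
```

Because no values are reordered, the flattening is lossless: reshaping back to (..., 81, 10) and taking the argmax returns the original digits.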

## Training and test data
Here, y holds the solutions taken from the original dataset, while x comes from the modified dataset loaded above.

In [88]:
x_train = new_data_x[:train_size]
y_train = preprocessing_data(dt["solutions"][:train_size])

x_test = new_data_x[train_size:train_size+test_size]
y_test = preprocessing_data(dt["solutions"][train_size:train_size+test_size])

print(x_train.shape)
# print(y_train)

(4900, 10, 810)


## Training and testing
Here, unlike the previous models, we use an LSTM layer imported from Keras. LSTM networks are a type of Recurrent Neural Network (RNN). RNNs can extract information from sequential data, and LSTMs are better at retaining information over long sequences.

The first argument of the LSTM layer is the number of output units (810). Its input shape is (time steps, features), here (10, 810). Next comes a fully connected layer with 810 units, followed by a reshape to (81, 10) and a softmax activation, which yields a probability distribution over the 10 classes for each of the 81 cells.

With this configuration, we reach about 99% accuracy on the test set.
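As a sanity check on the layer sizes, the parameter counts reported by `model.summary()` can be reproduced by hand. The sketch below uses the standard formulas for an LSTM layer (four gates, each with input weights, recurrent weights and a bias) and for a dense layer; the helper names `lstm_params` and `dense_params` are hypothetical and not part of Keras.

```python
def lstm_params(units, input_dim):
    # Four gates, each with input weights, recurrent weights and a bias vector.
    return 4 * (units * (input_dim + units) + units)

def dense_params(units, input_dim):
    # One weight per input/output pair, plus one bias per output unit.
    return input_dim * units + units

lstm = lstm_params(units=810, input_dim=810)
dense = dense_params(units=810, input_dim=810)
print(lstm, dense, lstm + dense)  # 5252040 656910 5908950
```

Most of the roughly 5.9 million parameters sit in the LSTM layer, because both its input and its hidden state have 810 dimensions.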

In [89]:
from keras.models import Sequential
from keras.layers import LSTM, Dense, Reshape, Dropout, BatchNormalization, Activation

model = Sequential()
model.add(LSTM(810, input_shape=x_train.shape[1:]))
model.add(Dense(810))
model.add(Reshape((81,10)))
model.add(Activation('softmax'))

model.summary()

model.compile(loss='mean_squared_error',
              optimizer='adam',
              metrics=['accuracy'])

tbCallBack = keras.callbacks.TensorBoard(log_dir='./logs/{}'.format(int(time.time())), histogram_freq=0, write_graph=True, write_images=True)

history = model.fit(x_train, y_train, validation_split=0.125, epochs=epochs, batch_size=batch, callbacks=[tbCallBack])

score = model.evaluate(x_test, y_test, batch_size=batch)
print(score)

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
lstm_14 (LSTM)               (None, 810)               5252040   
_________________________________________________________________
dense_13 (Dense)             (None, 810)               656910    
_________________________________________________________________
reshape_13 (Reshape)         (None, 81, 10)            0         
_________________________________________________________________
activation_11 (Activation)   (None, 81, 10)            0         
Total params: 5,908,950
Trainable params: 5,908,950
Non-trainable params: 0
_________________________________________________________________
Train on 4287 samples, validate on 613 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
[0.0034411631990224124, 0.9909876561164856]


## Results
We check the model on the first puzzle in the test data.

In [90]:
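# Add a batch dimension to the first test puzzle, then predict.
x_test_1 = np.reshape(x_test[0], (1,10,810))
y_test_1 = model.predict(x_test_1)
# Pick the most likely digit for each of the 81 cells.
y_test_1 = np.argmax(y_test_1[0], axis=1)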

We print every predicted digit that differs from the solution. No digits were printed, so the prediction matches the solution for this puzzle.

In [91]:
for i,j in zip(y_test_1,np.argmax(y_test[0], axis=1)):
    if i!=j:
        print(i,j)
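The comparison above can also be expressed as a mismatch count, and a predicted grid can be checked against the sudoku rules directly, without the reference solution. The following is a minimal sketch: `count_mismatches` and `is_valid_solution` are hypothetical helpers, and the demo grid is generated from a simple shifting pattern rather than taken from the dataset.

```python
import numpy as np

def count_mismatches(pred, truth):
    """Number of cells where the predicted digit differs from the solution."""
    return int(np.sum(np.asarray(pred) != np.asarray(truth)))

def is_valid_solution(digits):
    """True if every row, column and 3x3 box contains the digits 1-9 exactly once."""
    grid = np.asarray(digits).reshape(9, 9)
    target = set(range(1, 10))
    rows = all(set(grid[r, :].tolist()) == target for r in range(9))
    cols = all(set(grid[:, c].tolist()) == target for c in range(9))
    boxes = all(
        set(grid[r:r + 3, c:c + 3].ravel().tolist()) == target
        for r in range(0, 9, 3)
        for c in range(0, 9, 3)
    )
    return rows and cols and boxes

# Made-up valid grid built from a shifting pattern (not from the dataset).
demo = np.array(
    [[(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]
).ravel()
broken = demo.copy()
broken[0] = broken[1]  # duplicate a digit in the first row

print(count_mismatches(demo, demo), is_valid_solution(demo))
print(count_mismatches(broken, demo), is_valid_solution(broken))
```

In the notebook, these helpers could be applied as `count_mismatches(y_test_1, np.argmax(y_test[0], axis=1))` and `is_valid_solution(y_test_1)`.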

## Conclusion
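This approach worked very well for solving sudoku puzzles: the model reached about 99% accuracy on the test set and solved the first test puzzle without any errors. We had to create a new dataset to teach the network how to solve the quizzes, but it took only 5000 games and 10 epochs for the network to learn.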