<a href="https://colab.research.google.com/github/JamieBali/hopfieldSudokuSolver/blob/main/hopfieldSudokuSolver.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction
We are creating a neural network that can solve Sudoku problems. We will begin with a simpler 4x4 puzzle to test the implementation before moving on to the full 9x9 Sudoku. Possible ways to extend the work later:

\> We could vary sizes (eg. 16x16, 25x25)

\> We could vary rules (eg. Knight's Puzzle, King's Puzzle, Killer Sudoku)

\> We could implement other solvers and compare them (eg. Algorithmic Solving, Convolutional NN, Feed-Forward NN)

In [None]:
import numpy as np
import pandas as pd
import math
import torch
import torch.nn as nn



# Setting up the Puzzles

A Sudoku puzzle, the way we see it, has a number of square tiles, each of which contains a number from 1 to 9. <br>
Our Neural Network, however, needs a binary representation of the grid. We can easily do this by adding a 3rd dimension to the grid. 


> if $X(i,j) = 0$,  $V(i,j,k) = 0$ for all $k$
> 
> if $X(i,j) = k \neq 0, V(i,j,k) = 1$

(Hopfield, 2008)

This means that for our testing grid size (4x4), our 3-dimensional array has shape $(4,4,4)$, giving $64$ binary values.
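As a quick illustration of the encoding above, the following sketch builds the binary array with NumPy indexing and shows how a tile maps onto the flattened 64-element vector. The helper name `one_hot_grid` and the example grid are assumptions made for this illustration, not part of the original notebook code.

```python
import numpy as np

def one_hot_grid(grid):
    """Hypothetical helper: encode a size x size grid (0 = empty) as a binary (size, size, size) array."""
    X = np.asarray(grid, dtype=int)
    size = X.shape[0]
    V = np.zeros((size, size, size))
    rows, cols = np.nonzero(X)              # positions of the filled tiles
    V[rows, cols, X[rows, cols] - 1] = 1    # value k is stored at depth index k - 1
    return V

example = [[3, 0, 0, 0],
           [0, 1, 0, 2],
           [0, 0, 0, 0],
           [0, 0, 1, 0]]

V = one_hot_grid(example)
size = V.shape[0]
print(V.shape)        # (4, 4, 4)
print(V.size)         # 64
print(V[0, 0])        # [0. 0. 1. 0.]  -> this tile holds a 3

# Flattening in row-major order places V[i, j, k] at index i*size^2 + j*size + k.
flat = V.reshape(-1)
i, j, k = 1, 3, 1     # the tile in row 1, column 3 holds a 2, so k = 1
print(flat[i * size * size + j * size + k] == V[i, j, k])   # True
```

The same row-major index arithmetic is what allows a flat list of neurones to be mapped back to a tile and a candidate number.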

The dataset we will be using has all the Sudoku puzzles in a human-friendly format, so we have created a function below to convert the puzzles into a binary form, and a second to convert the binary solution back to a readable format.
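For comparison, the reverse conversion can be sketched in a few lines of NumPy. The helper `decode_grid` is a hypothetical name used only for this illustration. It assumes at most one active neurone per tile; if several were active it would pick the lowest number, which is a choice made for this sketch.

```python
import numpy as np

def decode_grid(V):
    """Hypothetical helper: map a binary (size, size, size) array back to digits, with 0 for empty tiles."""
    V = np.asarray(V)
    digits = V.argmax(axis=2) + 1       # depth index k corresponds to the digit k + 1
    digits[V.sum(axis=2) == 0] = 0      # tiles with no active neurone stay empty
    return digits

V_demo = np.zeros((4, 4, 4))
V_demo[0, 0, 2] = 1                     # a 3 in the top-left tile
V_demo[1, 3, 1] = 1                     # a 2 in row 1, column 3

decoded = decode_grid(V_demo)
print(decoded[0, 0])   # 3
print(decoded[1, 3])   # 2
print(decoded[2, 2])   # 0 (empty tile)
```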

In [48]:
def networkFormat(grid, size):
  # we need a binary representation of the grid in order to put it through a neural network.
  # since we get the data in with integers up to 9 in each slot on the grid, we must construct a binary, 3-dimensional matrix to represent our puzzle.
  puzzle = np.zeros((size,size,size))
  for x in range(0, size):
    for y in range(0,size):
      if grid[x][y] != 0:
        temp = int(grid[x][y])
        puzzle[x][y][temp-1] = 1
  return puzzle

###
#
# Because nested lists in Python are indexed as grid[row][column], reading the first index (x)
# as the horizontal coordinate makes the grid appear transposed when printed.
#
# [1,2]                  [1,3]
# [3,4]   would become   [2,4]
#
# we could flip the data, but it shouldn't matter as long as we are consistent.
#
###

def readableFormat(grid, size):
  # we also need a function to get the neural network format and turn it back into a readable human format.
  grid = torch.reshape(grid,(size,size,size))
  puzzle = np.zeros((size,size))
  for x in range(0, size):
    for y in range(0,size):
      temp = 0
      found = False
      for k in range(0,size):
        if grid[x][y][k] == 1:
          found = True
          temp = int(k)
      if found:
        puzzle[x][y] = temp + 1  
      else:
        puzzle[x][y] = 0
  return puzzle

# Creating the Energy Function

For us to be able to use a Hopfield network to solve sudoku, we first need to construct an energy function, which the network will then minimise.

The binary rules of a sudoku solution are:

> $V(i,j,k) = 0$ or $1$ for all $i,j,k$
>
> $\sum_{i}V(i,j,k) = 1$ for all $j,k$
>
> $\sum_{j}V(i,j,k) = 1$ for all $i,k$
>
> $\sum_{k}V(i,j,k) = 1$ for all $i,j$
>
> $\sum_{i,j}V(i,j,k) = 1$ for all $k$, with the sum on $i$ and $j$ taken over one of the 3x3 $i,j$ squares bounded by thicker lines.

(Hopfield, 2008)

This means that, in line with the rules of sudoku, each row, column, and sub-grid may contain each of the numbers 1 through 9 only once; anything else violates the constraints. Additionally, each individual tile on the board may contain only one number.
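The four families of sums can be checked directly on a binary array. The sketch below is an illustration only: the helper name `constraint_sums_ok` and the example 4x4 solution are assumptions introduced here, and the sub-grid side is taken to be the square root of the grid size.

```python
import numpy as np

def constraint_sums_ok(V):
    """Hypothetical helper: True if a binary (n, n, n) array satisfies all four constraint families."""
    n = V.shape[0]
    s = int(round(n ** 0.5))                    # sub-grid side length (2 for 4x4, 3 for 9x9)
    over_i = (V.sum(axis=0) == 1).all()         # sum over i for every j, k
    over_j = (V.sum(axis=1) == 1).all()         # sum over j for every i, k
    over_k = (V.sum(axis=2) == 1).all()         # sum over k for every i, j
    # Split each grid axis into (block, offset) and sum over the offsets inside every sub-grid.
    boxes = (V.reshape(s, s, s, s, n).sum(axis=(1, 3)) == 1).all()
    return bool(over_i and over_j and over_k and boxes)

solved = np.array([[1, 2, 3, 4],
                   [3, 4, 1, 2],
                   [2, 1, 4, 3],
                   [4, 3, 2, 1]])
V_solved = np.eye(4)[solved - 1]                # one-hot encode: shape (4, 4, 4)

broken = solved.copy()
broken[0, 0], broken[0, 1] = 2, 1               # swap two tiles: column 0 now holds two 2s
V_broken = np.eye(4)[broken - 1]

print(constraint_sums_ok(V_solved))   # True
print(constraint_sums_ok(V_broken))   # False
```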

In [49]:
def getGridValue(grid, size):
  ###
  # Subject to the constraints above, we can score a candidate solution by counting how many of the described sums equal exactly 1.
  # For a completed Sudoku puzzle, the total value should be equal to 4(gridSize^2).
  ###
  totalSum = 0

  # sum across i for all j,k (each number appears in each column once and only once)
  # k is equal to the value in the grid - 1, since it begins indexing at 0
  for k in range(0,size):
    for j in range(0,size):
      count = 0
      for i in range(0,size):
        count += int(grid[i][j][k])
      if count == 1:
        totalSum += 1

  # if every sum across i is correct, totalSum should now be size^2
  # print("Optimal: " + str(size*size) + " | Actual: " + str(totalSum))

  # sum across j for all i,k (each number appears in each row once and only once)
  for k in range(0,size):
    for i in range(0,size):
      count = 0
      for j in range(0,size):
        count += int(grid[i][j][k])
      if count == 1:
        totalSum += 1

  # if every sum across j is correct, totalSum should now be 2(size^2)
  # print("Optimal: " + str(size*size*2) + " | Actual: " + str(totalSum))

  # sum across k for all i,j (confirms that every tile on the grid contains exactly one number and isn't still 0)
  for i in range(0,size):
    for j in range(0,size):
      count = 0
      for k in range(0,size):
        count += int(grid[i][j][k])
      if count == 1:
        totalSum += 1

  # if every sum across k is correct, totalSum should now be 3(size^2)
  # print("Optimal: " + str(size*size*3) + " | Actual: " + str(totalSum))

  # sum across i,j for all k within each sub-grid of dimensions (sqrt(size) x sqrt(size))
  # (each number appears within each sub-grid once and only once)
  temp = int(math.sqrt(size))
  for iincrement in range(0,temp):               # the i and j incrementers step through each sub-grid, which allows for easy grid size change
    for jincrement in range(0,temp):
      for k in range(0, size):
        count = 0
        for i in range(0,temp):
          for j in range(0,temp):
            count += int(grid[i + (iincrement*temp)][j + (jincrement*temp)][k])
        if count == 1:
          totalSum += 1

  # if every sub-grid sum is correct, totalSum should now be 4(size^2)
  print("Optimal: " + str(size*size*4) + " | Actual: " + str(totalSum))
  return totalSum

# Creating The Network

We will be creating a Hopfield Neural Network to solve our puzzles.

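A Hopfield Neural Network is a single-layer, fully connected recurrent network: every neurone connects to every other neurone, and the connections are symmetric ($w_{ij} = w_{ji}$) with no self-connections ($w_{ii} = 0$). The implementation below uses binary (0 or 1) neurone states.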
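To make the role of symmetry concrete, here is a minimal sketch of a generic toy Hopfield network, not the Sudoku network itself. It assumes random symmetric weights, a bias vector `b`, and the standard energy $E = -\frac{1}{2}s^TWs - b^Ts$ for 0/1 states; all names and values are illustrative. It shows that updating one neurone at a time with a hard threshold never increases the energy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy network: 6 binary neurones with random symmetric weights and no self-connections.
n = 6
A = rng.normal(size=(n, n))
W = (A + A.T) / 2            # symmetric: W[i, j] == W[j, i]
np.fill_diagonal(W, 0.0)     # no neurone feeds back into itself
b = rng.normal(size=n)       # bias term, assumed here purely for illustration

def energy(state):
    # Standard Hopfield energy for 0/1 states: E = -1/2 s^T W s - b^T s
    return -0.5 * state @ W @ state - b @ state

state = rng.integers(0, 2, size=n).astype(float)
energies = [energy(state)]
for sweep in range(5):
    for i in range(n):                        # asynchronous: one neurone at a time
        field = W[i] @ state + b[i]           # weighted input from the other neurones
        state[i] = 1.0 if field >= 0 else 0.0
        energies.append(energy(state))

# With symmetric weights and a zero diagonal, each update keeps or lowers the energy.
non_increasing = all(later <= earlier + 1e-12
                     for earlier, later in zip(energies, energies[1:]))
print(non_increasing)   # True
```

This monotone decrease is why a Hopfield network can be used as a minimiser: if constraint violations are encoded as high-energy states, the updates move the state towards configurations with fewer violations, although it may settle in a local minimum.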

In [56]:
###
#
# we will construct this network so it can be size adapted.
# since we initially want to solve a 4x4 sudoku, we will focus on this first.
#
##
def createNetwork(grid, size, alpha):
  
  # first we must construct the neurones
  # we've already made a network formatter, so we just need to flatten the binary 
  # puzzle into its respective neurones.
  neurones = torch.flatten(torch.tensor(networkFormat(grid,size)))

  # next we need to construct weights
  # the weights of the hopfield network must be symmetric, meaning w_ij == w_ji and w_ii == 0
  # the initial weights of the network can be generated by converting the network into a Lyapunov function.
  # The energy function in Lyapunov form is shown below
  # 
  # E = -Σ V(i,j,k) + α Σ V(i,j,k)·V(i',j',k')
  #
  # We can use this energy function to calculate the required weights of the network.
  # Since we want to inhibit activation where the values would conflict, we need to construct the
  # weights based on the rules of sudoku
  weights = []
  for x in range(0,len(neurones)):
    temp = np.zeros(len(neurones))

    # rule 1, each tile may only contain one number
    # this is performed dynamically across each level of the weights matrix
    holder = math.floor(x/size)           # to find which tile of the grid we're on
    for y in range(0,size):               # for each neurone representing a possible number on that tile
      temp[(holder*size) + y] = -alpha        # we inhibit response from the respective neurone
    
    # rule 2, each row may only contain each number once
    # this, too, is performed dynamically so we can vary the size of the grid later
    holder = math.floor(x/(size**2))            # to find which row we're on
    holder = (holder * (size**2)) + (x % size)  # modulus of x by size tells us which number we're looking for.
    for y in range(0,size):                   # for each tile in that row
      temp[holder+(size*y)] = -alpha                # inhibit the response from the respective neurones

    # rule 3, each column may only contain each number once
    # the process for this rule is easier
    holder = x % (size**2)          # find the value on row 0 for the respective number
    for y in range(0,size):       # for each row
      temp[holder+((size**2)*y)] = -alpha    # inhibit the respective number in that column

    # rule 4, each sub-grid may only contain each number once
    # this is the most complex one
    s = math.sqrt(size)                # side length of a sub-grid
    c = math.floor(x/(s**3))%s         # this gives the neuron's sub-grid column index, from 0 to s-1
    r = size * (math.floor(x/(s**5)))  # this gives size times the neuron's sub-grid row index
    ind = (s**3)*(c+r)                 # this gives us the index of the neuron in the top-left corner of that sub-grid
    holder = int(ind + (x%size))       # lastly we offset the top-left index by the depth of the number
    for y in range(0,int(s)):
      temp[holder + (y*size)] = -alpha
      for z in range(1,int(s)):
        temp[holder + (y*size) + (z*(size**2))] = -alpha

    # for y in range(0, len(temp)):
      # if temp[y] == 0:
        # temp[y] = alpha

    temp[x] = 0
    weights.append(temp)

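  # theta is returned as a separate copy of the givens, so in-place updates to neurones cannot alter it
  return neurones, torch.tensor(np.array(weights)), neurones.clone()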

def activate(param):
  if param < 0:
    return 0
  else:
    return 1


###
#
# Next we will construct a step function which updates every neurone in turn from the
# weighted sum of the other neurones. The paper "Solving Sudoku Puzzles by using Hopfield
# Neural Networks" (Mladenov, 2011) describes a matrix multiplication step followed by a
# division by 8 and the logsig function; this version applies the hard threshold in
# activate() instead.
#
# We have created a bespoke step function as opposed to using the one provided by PyTorch
# as this allows us to add different rules. We want each neurone to connect to all neurones
# except itself, and we want additional processing on the results for each neurone.
#
##
def step(neurones, weights, theta, size):
  # weights = nn.Parameter(weights)
  # neurones = torch.unsqueeze(neurones,-1)
  # neurones = torch.mm(weights, neurones) # the weights matrix has an index of 0s diagonally, so the neurones will not get input into themselves
  # for x in range(0,len(neurones)):
  #   neurones[x] = activate(neurones[x])

  for x in range(0, len(neurones)):
    neurones[x] = activate(torch.sum(torch.mul(neurones,weights[x])))
    #if x % size == 0:
      #neurones = removeDuplicates(neurones, size)

  # lastly, before returning the neurones, we have to reapply the theta values.
  # these theta values mark the givens, which must remain active.
  for x in range(0, len(theta)):
    if theta[x] == 1:
      neurones[x] = 1
  neurones = torch.reshape(neurones,(1,len(theta)))
  return neurones[0]


###
#
# An alternative paper (Watson, Buckley, and Mills), suggests using a uniform
# random choice to select a random slot on the grid and check for that one alone.
#
###
def randomStep(neurones, weights, theta):

  i = int(np.random.randint(0, len(neurones)))

  if theta[i] == 1: # if the selected neurone is a given, we don't want to waste processing time
    return neurones
  total = 0
  for j in range(0, len(neurones)):
    total += (neurones[j] * weights[i][j])  # weighted input from neurone j into neurone i
  neurones[i] = activate(total)

  # lastly, before returning the neurones, we have to reapply the theta values.
  # these theta values mark the givens, which must remain active.
  for x in range(0, len(theta)):
    if theta[x] == 1:
      neurones[x] = 1
  return neurones

# Testing

We'll begin by testing the algorithm on a 4x4 puzzle.
These are much easier puzzles, meaning it is a good starting point to prove the network is able to solve the puzzles.

We will then move on to testing a few 9x9 grids of varying difficulty.

In [59]:
## 
#
# 4x4 test
# We run 100 steps here, despite the fact it should be able to solve all puzzles within 1 or 2.
#
##

puzzle = [[3,0,0,0],
          [0,1,0,2],
          [0,0,0,0],
          [0,0,1,0]]

getGridValue(networkFormat(puzzle,4),4)

neurones, weights, theta = createNetwork(puzzle, 4, 1)

print(neurones.detach().numpy().reshape((4,4,4)))
print(readableFormat(neurones, 4))

for x in range(0,100):
  neurones = step(neurones, weights, theta, 4)
  getGridValue(neurones.detach().numpy().reshape((4,4,4)),4)
  

Optimal: 64 | Actual: 16
[[[0. 0. 1. 0.]
  [0. 0. 0. 0.]
  [0. 0. 0. 0.]
  [0. 0. 0. 0.]]

 [[0. 0. 0. 0.]
  [1. 0. 0. 0.]
  [0. 0. 0. 0.]
  [0. 1. 0. 0.]]

 [[0. 0. 0. 0.]
  [0. 0. 0. 0.]
  [0. 0. 0. 0.]
  [0. 0. 0. 0.]]

 [[0. 0. 0. 0.]
  [0. 0. 0. 0.]
  [1. 0. 0. 0.]
  [0. 0. 0. 0.]]]
[[3. 0. 0. 0.]
 [0. 1. 0. 2.]
 [0. 0. 0. 0.]
 [0. 0. 1. 0.]]
Optimal: 64 | Actual: 64
Optimal: 64 | Actual: 64
Optimal: 64 | Actual: 64
Optimal: 64 | Actual: 64
Optimal: 64 | Actual: 64
Optimal: 64 | Actual: 64
Optimal: 64 | Actual: 64
Optimal: 64 | Actual: 64
Optimal: 64 | Actual: 64
Optimal: 64 | Actual: 64
Optimal: 64 | Actual: 64
Optimal: 64 | Actual: 64
Optimal: 64 | Actual: 64
Optimal: 64 | Actual: 64
Optimal: 64 | Actual: 64
Optimal: 64 | Actual: 64
Optimal: 64 | Actual: 64
Optimal: 64 | Actual: 64
Optimal: 64 | Actual: 64
Optimal: 64 | Actual: 64
Optimal: 64 | Actual: 64
Optimal: 64 | Actual: 64
Optimal: 64 | Actual: 64
Optimal: 64 | Actual: 64
Optimal: 64 | Actual: 64
Optimal: 64 | Actual: 64


In [None]:
## 
#
# 9x9 test
#
##

# solved -  so we can reverse the solution and check all the functions work correctly (286 from expert star sudoku)
# solved = [[2,1,6,3,7,8,4,5,9],[8,5,4,2,9,1,6,3,7],[9,7,3,4,5,6,8,1,2],[7,8,1,5,6,2,3,9,4],[5,3,2,9,4,7,1,8,6],[4,6,9,1,8,3,2,7,5],[6,2,8,7,3,5,9,4,1],[1,4,5,8,2,9,7,6,3],[3,9,7,6,1,4,5,2,8]]

# this puzzle was taken from the paper I am following. (V. Mladenov et al.)
paper = [[0,0,0,9,5,7,0,0,0],[7,6,0,0,0,0,0,1,0],[8,0,5,0,0,6,0,2,0],[3,0,9,0,4,0,0,0,0],[0,0,8,0,0,0,1,0,0],[0,0,0,0,2,0,5,0,6],[0,8,0,1,0,0,6,0,4],[0,3,0,0,0,0,0,7,1],[0,0,0,4,3,2,0,0,0]]

# these three were taken randomly from sudoku.com
easy = [[0,0,0,0,7,9,0,3,0],[5,0,2,0,6,1,4,7,8],[3,7,6,0,8,5,9,0,2],[0,1,7,5,0,0,8,0,0],[2,0,9,8,3,0,0,0,0],[0,0,0,0,2,0,0,4,0],[0,0,0,0,5,0,2,0,1],[0,2,3,0,0,0,0,5,4],[1,0,0,7,0,0,0,0,0]]
medium = [[0,3,1,0,5,0,0,2,0],[0,0,0,0,0,2,9,0,5],[2,0,0,0,1,0,0,0,0],[3,5,0,0,9,0,0,7,0],[7,0,0,5,0,0,0,4,0],[0,1,0,7,0,3,2,0,0],[1,2,6,3,0,0,0,0,0],[0,9,0,8,0,5,0,0,0],[5,0,0,0,2,0,7,0,0]]
hard = [[0,4,0,0,0,5,0,6,0],[0,0,5,4,2,0,0,0,0],[0,0,1,6,0,3,5,0,4],[0,0,0,0,0,0,7,0,0],[0,3,7,0,0,0,0,1,0],[9,0,0,0,0,4,3,5,0],[0,0,4,2,5,0,0,0,0],[0,0,0,0,0,0,0,7,6],[6,0,9,0,7,0,0,0,5]]

# this puzzle is taken from "expert star sudoku," a book of level 6 sudoku.
expert = [[0,0,0,0,1,0,0,0,0],[0,9,0,0,0,5,0,0,0],[5,0,0,0,7,6,0,0,0],[0,0,4,0,0,0,2,7,0],[0,2,0,0,0,0,5,0,9],[0,7,0,8,0,0,0,0,0],[0,0,1,0,0,3,0,0,0],[6,0,0,1,2,0,0,8,0],[7,8,0,0,0,0,9,0,0]]

getGridValue(networkFormat(easy,9),9)

neurones, weights, theta = createNetwork(easy, 9, 2)

print(readableFormat(neurones, 9))

for x in range(0,5):
  neurones = step(neurones, weights, theta, 9)
  weights = updateWeights(weights, 9, neurones)
  getGridValue(neurones.detach().numpy().reshape((9,9,9)),9)
  print(readableFormat(neurones, 9))
  

Optimal: 324 | Actual: 144
[[0. 0. 0. 0. 7. 9. 0. 3. 0.]
 [5. 0. 2. 0. 6. 1. 4. 7. 8.]
 [3. 7. 6. 0. 8. 5. 9. 0. 2.]
 [0. 1. 7. 5. 0. 0. 8. 0. 0.]
 [2. 0. 9. 8. 3. 0. 0. 0. 0.]
 [0. 0. 0. 0. 2. 0. 0. 4. 0.]
 [0. 0. 0. 0. 5. 0. 2. 0. 1.]
 [0. 2. 3. 0. 0. 0. 0. 5. 4.]
 [1. 0. 0. 7. 0. 0. 0. 0. 0.]]
Optimal: 324 | Actual: 308
[[4. 8. 1. 2. 7. 9. 5. 3. 6.]
 [5. 9. 2. 3. 6. 1. 4. 7. 8.]
 [3. 7. 6. 4. 8. 5. 9. 1. 2.]
 [6. 1. 7. 5. 4. 0. 8. 2. 3.]
 [2. 4. 9. 8. 3. 6. 1. 0. 5.]
 [8. 3. 5. 1. 2. 7. 6. 4. 9.]
 [7. 6. 4. 9. 5. 3. 2. 8. 1.]
 [9. 2. 3. 6. 1. 8. 7. 5. 4.]
 [1. 5. 8. 7. 0. 2. 3. 6. 0.]]
Optimal: 324 | Actual: 308
[[4. 8. 1. 2. 7. 9. 5. 3. 6.]
 [5. 9. 2. 3. 6. 1. 4. 7. 8.]
 [3. 7. 6. 4. 8. 5. 9. 1. 2.]
 [6. 1. 7. 5. 4. 0. 8. 2. 3.]
 [2. 4. 9. 8. 3. 6. 1. 0. 5.]
 [8. 3. 5. 1. 2. 7. 6. 4. 9.]
 [7. 6. 4. 9. 5. 3. 2. 8. 1.]
 [9. 2. 3. 6. 1. 8. 7. 5. 4.]
 [1. 5. 8. 7. 0. 2. 3. 6. 0.]]
Optimal: 324 | Actual: 308
[[4. 8. 1. 2. 7. 9. 5. 3. 6.]
 [5. 9. 2. 3. 6. 1. 4. 7. 8.]
 [3. 7. 6. 4. 8. 5.

KeyboardInterrupt: ignored