This app solves Sudoku puzzles from an image. It uses OpenCV and a neural network to extract the digits from the image, and then solves the puzzle with a backtracking solver. The goal was to build it in one day, so there is plenty of room for improvement! There is probably a much better way to do this, but I wanted to see what I could come up with myself, without spoiling it by looking at tutorials or other people's approaches. The main focus of this project was working with OpenCV.
```
python main.py --image dataset/10.png
```
Running this command gives the following output:
```
534678912
672195348
198342567
859761423
426853791
713924856
961537284
287419635
345286179
```
The algorithm uses the following steps to solve the puzzle:
- Normalize the image to 512 by 512 pixels.
- Detect lines in the image with `cv.HoughLinesP`.
- Create a histogram of the x and y points.
- Cluster the points to find the 10 points that define the grid (in both directions).
- Use the clustered points to generate the lines of the grid.
- Extract each individual cell.
- Detect if there is something in the cell.
- Classify the cell if there is something in it.
- Create a Sudoku puzzle.
- Solve the puzzle.
- Print the solution.
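As a rough illustration of the last three steps, here is a minimal plain-backtracking solver. It assumes the recognized digits are stored as a 9x9 list of ints with 0 for an empty cell; that representation and the helper names `parse_grid`, `solve` and `format_grid` are hypothetical and not taken from the app's code. The hard-coded puzzle is a widely used example grid, included only to make the sketch runnable.

```python
from typing import List

Grid = List[List[int]]  # hypothetical representation: 9 rows of 9 ints, 0 = empty


def parse_grid(text: str) -> Grid:
    """Parse 9 lines of 9 characters; '.' or '0' marks an empty cell."""
    rows = [line.strip() for line in text.strip().splitlines()]
    return [[0 if ch in ".0" else int(ch) for ch in row] for row in rows]


def is_valid(grid: Grid, row: int, col: int, value: int) -> bool:
    """Return True if value can be placed at (row, col) without a conflict."""
    if value in grid[row]:
        return False
    if any(grid[r][col] == value for r in range(9)):
        return False
    box_row, box_col = 3 * (row // 3), 3 * (col // 3)
    for r in range(box_row, box_row + 3):
        for c in range(box_col, box_col + 3):
            if grid[r][c] == value:
                return False
    return True


def solve(grid: Grid) -> bool:
    """Fill the grid in place with plain backtracking; return True on success."""
    for row in range(9):
        for col in range(9):
            if grid[row][col] == 0:
                for value in range(1, 10):
                    if is_valid(grid, row, col, value):
                        grid[row][col] = value
                        if solve(grid):
                            return True
                        grid[row][col] = 0  # undo and try the next digit
                return False  # no digit fits here: backtrack
    return True  # no empty cells left


def format_grid(grid: Grid) -> str:
    """One line of nine digits per row, matching the output format shown above."""
    return "\n".join("".join(str(v) for v in row) for row in grid)


if __name__ == "__main__":
    puzzle = parse_grid("""
53..7....
6..195...
.98....6.
8...6...3
4..8.3..1
7...2...6
.6....28.
...419..5
....8..79
""")
    if solve(puzzle):
        print(format_grid(puzzle))
```

Scanning cells in row-major order and trying digits 1 to 9 is the simplest form of backtracking; it is easy to verify but can be slow on hard puzzles.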
Detect lines using `cv.HoughLinesP`.
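`cv.HoughLinesP` returns each detected segment as (x1, y1, x2, y2). One way to turn those segments into the x and y values for the histograms is sketched below. The angle filter, its 10 degree tolerance and the name `split_segments` are assumptions for illustration, not necessarily what the app does: near-vertical segments contribute an x position and near-horizontal ones a y position.

```python
import math
from typing import Iterable, List, Sequence, Tuple

Segment = Sequence[int]  # (x1, y1, x2, y2), the per-line layout of cv.HoughLinesP


def split_segments(
    segments: Iterable[Segment], max_angle_deg: float = 10.0
) -> Tuple[List[float], List[float]]:
    """Collect x positions of near-vertical segments and y positions of near-horizontal ones.

    Segments further than max_angle_deg from both axes are ignored (hypothetical filter).
    """
    xs: List[float] = []
    ys: List[float] = []
    for x1, y1, x2, y2 in segments:
        # 0 degrees = horizontal, 90 degrees = vertical
        angle = math.degrees(math.atan2(abs(y2 - y1), abs(x2 - x1)))
        if angle <= max_angle_deg:
            ys.append((y1 + y2) / 2)  # horizontal line: only its y position matters
        elif angle >= 90 - max_angle_deg:
            xs.append((x1 + x2) / 2)  # vertical line: only its x position matters
    return xs, ys
```

With real OpenCV output, which is an array of shape (N, 1, 4), or `None` when nothing is found, the segments can be passed as `lines.reshape(-1, 4)`.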
Look at the x values and the y values as separate histograms, and cluster the points on each axis. The clustering uses the standard deviation of the values on a single axis. The standard deviation is scaled by a factor k (hand-picked for good results), and a point counts as being in close proximity to a cluster if it lies within that distance of the cluster center. The algorithm is as follows:
- Sort the values.
- Start a cluster with the first point.
- If the next point is in proximity to the cluster (|x_i - c| < k*std_dev), add it to the cluster.
- Recalculate the center of the cluster, c, as the mean of the values in the cluster.
- If the point is not in proximity, close the cluster and use its mean as the position of a grid line.
- Start a new cluster with the point that was not in proximity, and repeat until all values are processed.
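A sketch of the clustering steps above in plain Python. Two details are assumptions: `std_dev` is taken as the population standard deviation of all values on the axis, and `k` is passed in by the caller (the hand-picked value the app uses is not stated here). The name `cluster_1d` is hypothetical.

```python
import statistics
from typing import List, Sequence


def cluster_1d(values: Sequence[float], k: float) -> List[float]:
    """Greedy 1-D clustering of sorted values; returns one mean per cluster.

    A value joins the current cluster when |x - c| < k * std_dev, where std_dev is the
    (population) standard deviation of all values and c is the running cluster mean.
    """
    if not values:
        return []
    ordered = sorted(values)
    threshold = k * statistics.pstdev(ordered)
    means: List[float] = []
    cluster = [ordered[0]]
    center = ordered[0]
    for x in ordered[1:]:
        if abs(x - center) < threshold:
            cluster.append(x)
            center = statistics.fmean(cluster)  # re-center on the running mean
        else:
            means.append(center)  # close the cluster and keep its mean
            cluster = [x]
            center = x
    means.append(center)  # the last cluster is still open
    return means
```

Because the values are sorted first, each value only has to be compared with the current cluster, so a single pass over the data is enough.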
It works a bit like one-dimensional k-means clustering (although here k is the scaling factor, not the number of clusters). The algorithm could be improved to handle noisier histograms, e.g. when more lines are detected.
I made this video to see whether any obvious patterns arise when the grid is rotated.
movie.mp4
The cluster means are used to define a grid.
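Given the 10 cluster means on each axis, the cells follow from consecutive pairs of grid lines. A hypothetical helper, assuming the means are pixel coordinates in the normalized 512 by 512 image:

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels


def cell_boxes(xs: Sequence[float], ys: Sequence[float]) -> List[List[Box]]:
    """Turn 10 vertical (xs) and 10 horizontal (ys) grid-line positions into 9x9 cell boxes."""
    if len(xs) != 10 or len(ys) != 10:
        raise ValueError("expected exactly 10 grid lines per axis")
    xs, ys = sorted(xs), sorted(ys)
    # boxes[row][col] spans from one grid line to the next on each axis
    return [
        [(round(xs[c]), round(ys[r]), round(xs[c + 1]), round(ys[r + 1])) for c in range(9)]
        for r in range(9)
    ]
```

With the image loaded as a NumPy array `img`, a cell would then be the slice `img[y0:y1, x0:x1]`.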
Each cell is extracted from the puzzle. In this step the cell is thresholded to black and white and inverted (white digit on a black background), because that matches the training data of the neural network (the MNIST dataset).
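A minimal sketch of the threshold-and-invert step on a grayscale cell stored as nested lists (0 = black, 255 = white). It mirrors what `cv.threshold(cell, thresh, 255, cv.THRESH_BINARY_INV)` does: pixels above `thresh` become 0 and all others become 255. The `has_content` check and its 3% white-pixel cutoff are hypothetical, since the method used to detect an empty cell is not described here.

```python
from typing import List

Image = List[List[int]]  # grayscale pixels, 0 = black, 255 = white


def threshold_and_invert(cell: Image, thresh: int = 128) -> Image:
    """Binarize and invert: dark ink becomes white (255) on black (0), as in MNIST."""
    return [[255 if px <= thresh else 0 for px in row] for row in cell]


def has_content(binary: Image, min_fraction: float = 0.03) -> bool:
    """Hypothetical emptiness check: the cell is filled if enough pixels are white."""
    total = sum(len(row) for row in binary)
    white = sum(1 for row in binary for px in row if px == 255)
    return total > 0 and white / total >= min_fraction
```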
Use the neural network to classify the value of the cell.
I found that it does not handle computer fonts very well, probably because it is trained on the handwritten digits of the MNIST dataset.
The neural network is an MNIST classifier. I didn't want to spend a lot of time on the neural network part, so I used a tutorial to set it up. It is the hello world of neural networks anyway.
The network is trained for 8 epochs and has an accuracy of 98%. The weights are included below.
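MNIST digits are centered in their 28x28 images with a margin around them, so a digit pushed into a corner of its cell looks unlike the training data. Below is a hypothetical sketch of the re-centering and padding idea from the improvement list, assuming a binary cell (255 = ink, 0 = background) stored as nested lists; the name `center_digit` and the default padding are made up, and resizing the result to 28x28 (for example with `cv.resize`) is left out.

```python
from typing import List

Image = List[List[int]]  # binary cell, 255 = ink, 0 = background


def center_digit(binary: Image, pad: int = 4) -> Image:
    """Crop to the ink's bounding box and center it on a square black canvas with padding."""
    rows = [r for r, row in enumerate(binary) if any(row)]
    if not rows:
        return binary  # empty cell: nothing to center
    cols = [c for c in range(len(binary[0])) if any(row[c] for row in binary)]
    crop = [row[cols[0]:cols[-1] + 1] for row in binary[rows[0]:rows[-1] + 1]]
    h, w = len(crop), len(crop[0])
    side = max(h, w) + 2 * pad  # square canvas keeps the aspect ratio of the digit
    top, left = (side - h) // 2, (side - w) // 2
    canvas = [[0] * side for _ in range(side)]
    for r in range(h):
        canvas[top + r][left:left + w] = crop[r]
    return canvas
```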
During development I ran into several issues that need to be resolved before it can handle a much wider range of inputs.
- Improve extraction:
  - Remove the border of the cell properly.
  - Re-center the digit and add padding around it, to better match the classifier's training data.
  - Detect perspective distortion and undo it with a perspective transform.
  - Improve the extraction of the grid, so it can handle cases where it finds more than 10 clusters on an axis.
- Improve classifier:
  - It currently uses an MNIST classifier from a tutorial, which does not handle computer fonts well.
- Improve solver:
  - The current solver only uses backtracking; add forward propagation so it can also be used to find all solutions.
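For the solver improvement, here is a hypothetical sketch of a solver that counts solutions. Before branching it computes the candidate digits of every empty cell (a simple form of forward checking), prunes the branch as soon as a cell has no candidates, and branches on the cell with the fewest candidates. The `limit` parameter stops the search early, so `count_solutions(grid) == 1` works as a cheap uniqueness test. It assumes the given digits do not already conflict with each other, and the names are made up for illustration.

```python
from typing import List, Set

Grid = List[List[int]]  # 9x9, 0 marks an empty cell


def candidates(grid: Grid, row: int, col: int) -> Set[int]:
    """Digits not yet used in the cell's row, column or 3x3 box."""
    used = set(grid[row]) | {grid[r][col] for r in range(9)}
    br, bc = 3 * (row // 3), 3 * (col // 3)
    used |= {grid[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3)}
    return set(range(1, 10)) - used


def count_solutions(grid: Grid, limit: int = 2) -> int:
    """Count solutions (up to limit), branching on the most constrained empty cell.

    The grid is modified during the search but restored before returning.
    """
    best = None
    for r in range(9):
        for c in range(9):
            if grid[r][c] == 0:
                cands = candidates(grid, r, c)
                if not cands:
                    return 0  # forward check failed: this branch is a dead end
                if best is None or len(cands) < len(best[2]):
                    best = (r, c, cands)
    if best is None:
        return 1  # no empty cells left: one complete solution
    r, c, cands = best
    total = 0
    for value in sorted(cands):
        grid[r][c] = value
        total += count_solutions(grid, limit - total)
        grid[r][c] = 0  # undo before trying the next candidate
        if total >= limit:
            break  # enough solutions found
    return total
```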