The updated problem statement for this assignment can be found at https://github.com/LS-Computer-Vision/sudoku-solver-2
We will use a combination of OpenCV and Deep Learning to build a Sudoku generator and solver
Open up your terminal and execute the following commands:
pip install virtualenv
python -m virtualenv venv
venv/Scripts/activate # For Windows Users
source venv/bin/activate # For OSX/Linux Users
pip install -r requirements.txt
# Pip install the required ML library
pip freeze > requirements.txt
First we will detect the sudoku grid from the image
The sudoku images are available in assets/sudokus/sudoku*.jpg
sudoku.py contains the relevant classes
It is very helpful to carry out the task in stages, and view the output of every stage as you go along. This will help in debugging the process
The class Detector contains the functions needed to carry out the processing
All you need to do is add methods corresponding to stages. The methods will be executed automatically during processing.
To add a stage to the processing, just add a member function (taking no parameters) whose name is of the format stage_[idx]_[name]. The stages are executed in the order of increasing idx.
Each stage should take no parameters (except self) and return a numpy image for debugging purposes (it can be any data you want which will help you visualize the stage). The numpy image is displayed to the screen for debugging.
If you want to share data between stages (or pass data from one stage to another), use member variables (e.g. you might store the output of one stage in self.image and access it in the next stage using self.image)
For example, if we have the methods stage_1_preprocess(self) and stage_2_transform(self), the preprocess stage is carried out first, followed by the transform stage.
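The naming convention above could be implemented roughly as follows. This is only a minimal sketch, not the actual Detector code: `collect_stages` and `DemoDetector` are hypothetical names, and the real class may discover its stages differently. The index is compared as a number here, so a stage with idx 10 runs after a stage with idx 2.

```python
import re

# Matches names such as stage_1_preprocess -> idx "1", name "preprocess"
STAGE_PATTERN = re.compile(r"^stage_(\d+)_(\w+)$")


def collect_stages(obj):
    """Return (idx, name, bound_method) tuples sorted by increasing idx."""
    stages = []
    for attr in dir(obj):
        match = STAGE_PATTERN.match(attr)
        if match and callable(getattr(obj, attr)):
            stages.append((int(match.group(1)), match.group(2), getattr(obj, attr)))
    return sorted(stages, key=lambda stage: stage[0])


class DemoDetector:
    """Hypothetical stand-in for Detector, used only to show the ordering."""

    def __init__(self):
        self.log = []

    def stage_10_cells(self):
        self.log.append("cells")
        return self.image

    def stage_2_transform(self):
        self.log.append("transform")
        return self.image  # data from stage 1, shared through a member variable

    def stage_1_preprocess(self):
        self.log.append("preprocess")
        self.image = "preprocessed image"
        return self.image


detector = DemoDetector()
for idx, name, stage in collect_stages(detector):
    debug_output = stage()  # a real stage would return a numpy image to display
```

After the loop, `detector.log` lists the stages in the order preprocess, transform, cells, regardless of the order in which the methods were defined.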
For some stages, you might want to have 81 different images corresponding to the 81 cells you extracted from the sudoku grid. To display them, use the Detector.makePreview() method. This takes as input a 9x9 array of similarly sized images, and returns a single combined image which you can return from your stage to be displayed
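To give an idea of what such a combined preview involves, here is a small numpy sketch. It is not the provided Detector.makePreview() implementation: `make_preview` is a hypothetical helper, and it assumes all 81 cells are grayscale arrays of exactly the same shape.

```python
import numpy as np


def make_preview(cells, pad=1):
    """Tile a 9x9 nested list of equally sized grayscale images into one image.

    Each cell gets a white frame of `pad` pixels so neighbouring cells
    stay visually separated in the combined image.
    """
    rows = []
    for row in cells:
        framed = [np.pad(cell, pad, mode="constant", constant_values=255) for cell in row]
        rows.append(np.hstack(framed))
    return np.vstack(rows)


# Usage: 81 blank 28x28 cells give a 270x270 preview (each cell becomes 30x30).
blank_cells = [[np.zeros((28, 28), dtype=np.uint8) for _ in range(9)] for _ in range(9)]
preview = make_preview(blank_cells)
```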
A typical pipeline (set of stages) might go like this:
- Preprocess the image
- Detect largest rectangle and apply perspective transform
- Extract all cells
- Remove cell borders
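As an illustration of the "extract all cells" step, the sketch below slices a grid image into 81 cells. It assumes the perspective transform has already produced a square, top-down view of the grid, and `split_into_cells` is a hypothetical helper name rather than part of the provided code.

```python
import numpy as np


def split_into_cells(grid):
    """Split a top-down grid image into a 9x9 nested list of cell images.

    Cell boundaries are spread evenly over the image, so the side length
    does not have to be an exact multiple of 9.
    """
    height, width = grid.shape[:2]
    ys = np.linspace(0, height, 10).astype(int)
    xs = np.linspace(0, width, 10).astype(int)
    return [
        [grid[ys[r]:ys[r + 1], xs[c]:xs[c + 1]] for c in range(9)]
        for r in range(9)
    ]


# Usage: a 252x252 warped grid yields 81 cells of 28x28 pixels each.
warped = np.zeros((252, 252), dtype=np.uint8)
cells = split_into_cells(warped)
```

Even slicing like this only works well when the warp is accurate; a slightly skewed grid will leave parts of the grid lines inside the cells, which is why the border removal step follows.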
Now, there are several different ways to do all of this, and you may want to explore some of them on your own. You may even design an ML-based pipeline to extract the grid if you want. Some alternative ideas you can explore:
- Use Hough Line Detection to find all the lines, then find their intersection points to locate the grid and each cell corner. This method is robust to some warped sudoku images
- The method of finding the largest rectangle to get the grid corners fails when there are other larger rectangles in the image. A more robust method may be to use structural elements to find the most likely location of the grid, and extract the cell centers based on this information
- Removing the cell borders can be done by removing parts touching the boundaries, or by removing everything within some margin of the boundaries
- A more robust method to remove cell borders is to find the largest connected blob at the centre of the image, which will usually be the digit
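The last idea, keeping the largest connected blob near the centre of a cell, can be sketched with a plain flood fill. This is one possible approach under stated assumptions, not the required one: `largest_central_blob` and `center_frac` are hypothetical names, the cell is assumed to be a 2D binary image with non-zero foreground, and 4-connectivity is an arbitrary choice.

```python
from collections import deque

import numpy as np


def largest_central_blob(cell, center_frac=0.5):
    """Keep only the largest 4-connected blob that overlaps the central window.

    Flood fills are started only from foreground pixels inside the central
    window, so border fragments that never reach the centre are discarded.
    Returns a uint8 image of the same shape with the kept blob set to 255.
    """
    height, width = cell.shape
    foreground = cell > 0
    seen = np.zeros((height, width), dtype=bool)
    y0, y1 = int(height * (1 - center_frac) / 2), int(height * (1 + center_frac) / 2)
    x0, x1 = int(width * (1 - center_frac) / 2), int(width * (1 + center_frac) / 2)

    best = []
    for start_y in range(y0, y1):
        for start_x in range(x0, x1):
            if not foreground[start_y, start_x] or seen[start_y, start_x]:
                continue
            blob, queue = [], deque([(start_y, start_x)])
            seen[start_y, start_x] = True
            while queue:
                y, x = queue.popleft()
                blob.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < height and 0 <= nx < width and foreground[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            if len(blob) > len(best):
                best = blob

    result = np.zeros((height, width), dtype=np.uint8)
    for y, x in best:
        result[y, x] = 255
    return result
```

A known limitation of this sketch: if the digit physically touches a leftover grid line, both belong to the same blob and the border survives, so it may still be worth clearing a small margin first.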
You might think that we are done once we have the image of each digit, but sadly we still have a long way to go.
First you need to resize your cell images to 28x28 and possibly denoise them
The problems start to arise when you apply the model on each of your cells (with digits in it) to find the digits. You will often find mispredictions.
We need more accuracy. 90% might be fine for a simple MNIST project, but it will simply not cut it in this case.
For the sudoku to be solved correctly, all the digits have to be recognized correctly. If you have a model with 90% accuracy and we need to recognize 30 digits, the chance that all the digits are recognized correctly is 0.9^30 = 4.2% (assuming the errors are independent). This is way too low. Even an accuracy of 99% implies only a 74% chance of correct sudoku detection.
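The arithmetic above is easy to reproduce and to extend to other accuracies. The snippet below is a small sketch that assumes every digit is misrecognized independently with the same probability, and keeps the figure of 30 given digits used above; the function names are made up for illustration.

```python
def all_correct_probability(per_digit_accuracy, n_digits=30):
    """Chance that every one of n_digits independent predictions is right."""
    return per_digit_accuracy ** n_digits


def required_accuracy(target_probability, n_digits=30):
    """Per-digit accuracy needed to reach a target whole-puzzle success rate."""
    return target_probability ** (1.0 / n_digits)


for accuracy in (0.90, 0.99):
    print(f"{accuracy:.0%} per digit -> {all_correct_probability(accuracy):.1%} for all 30 digits")
# 90% per digit -> 4.2% for all 30 digits
# 99% per digit -> 74.0% for all 30 digits
```

Running `required_accuracy` in the other direction shows how steep the requirement is: the per-digit accuracy has to be very close to 100% before the whole puzzle is read correctly most of the time.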
- One way of getting your accuracy up is to use a better model, for example a CNN. Now that you are more confident in ML, use your knowledge to explore more complicated networks and try to get your accuracy as high as possible
- MNIST is a handwritten digit dataset. This is very different from the type of data we are trying to classify here, which is printed digits. A model trained on handwritten digits will perform worse on printed digits than its test accuracy on handwritten digits suggests. One way around this is to augment the dataset, or even to use our own custom dataset by constructing images of printed digits on the fly. We can do this using the PIL library. Training your model on this kind of dataset, you can achieve even 99.6% accuracy, which translates to an 89% chance of detecting the sudoku correctly, which is quite good
- Even if you are not able to achieve such high accuracies and an error-free sudoku detection, you are allowed up to 3 hardcoded corrections to the recognition.
Detector.run() takes in the array parameter correction, which you can hardcode for each sudoku inside test_model.py. Each element of this array is a tuple of the kind (x,y,dig) where x,y is the location of the correction and dig is the correct digit value. You need to apply these corrections yourself in the Detector.solve() function, before calling the Solver.solve() method to solve the sudoku
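Applying the corrections could look like the sketch below. The helper name `apply_corrections` is hypothetical, and the indexing convention is an assumption: here x is taken to be the 0-indexed column and y the 0-indexed row, so check this against how test_model.py specifies locations before relying on it.

```python
def apply_corrections(digits, corrections):
    """Return a copy of the 9x9 grid with each (x, y, dig) correction applied.

    Assumption: x is the column and y is the row, both 0-indexed.
    The input grid is left untouched.
    """
    fixed = [list(row) for row in digits]
    for x, y, dig in corrections:
        fixed[y][x] = dig
    return fixed


# Usage: overwrite a misread digit in the first row, third column.
recognized = [[None] * 9 for _ in range(9)]
recognized[0][2] = 1  # suppose the model confused a 7 with a 1
corrected = apply_corrections(recognized, [(2, 0, 7)])
```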
This part is pretty simple and does not contain too many roadblocks. While there are many ways to implement this, none of them have any hidden gotchas you need to take care of (unlike the detection tasks which are very fiddly)
solver.py contains the Solver class. Initialize a Solver instance with the 9x9 array of recognized digits, with None representing empty cells
You need to implement the Solver.solve() function which will solve the sudoku and store it in self.digits
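One common way to solve the grid is plain backtracking: pick the first empty cell, try each digit that does not clash with its row, column or 3x3 box, and recurse. The sketch below works on a bare 9x9 list of lists with None for empty cells; `solve_grid` is a hypothetical free function, and how you wire something like it into Solver.solve() and self.digits is up to you.

```python
def solve_grid(digits):
    """Solve a 9x9 grid in place by backtracking. Returns True if solved."""

    def allowed(row, col, digit):
        # Reject the digit if it already appears in the same row or column
        if any(digits[row][k] == digit or digits[k][col] == digit for k in range(9)):
            return False
        # Reject it if it already appears in the same 3x3 box
        box_row, box_col = 3 * (row // 3), 3 * (col // 3)
        return all(
            digits[box_row + i][box_col + j] != digit
            for i in range(3)
            for j in range(3)
        )

    for row in range(9):
        for col in range(9):
            if digits[row][col] is None:
                for digit in range(1, 10):
                    if allowed(row, col, digit):
                        digits[row][col] = digit
                        if solve_grid(digits):
                            return True
                        digits[row][col] = None  # undo and try the next digit
                return False  # no digit fits this cell, so backtrack
    return True  # no empty cell left


# Usage with a well-known example puzzle ('.' marks an empty cell)
PUZZLE = [
    "53..7....", "6..195...", ".98....6.",
    "8...6...3", "4..8.3..1", "7...2...6",
    ".6....28.", "...419..5", "....8..79",
]
grid = [[None if ch == "." else int(ch) for ch in line] for line in PUZZLE]
solved = solve_grid(grid)
```

This brute-force version is usually fast enough for a single puzzle. It does not detect a grid that was already inconsistent because of a misrecognized digit, other than by eventually returning False.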
Some ideas you may explore:
The hardest part is over. Everything you have done so far is enough to pass the test cases.
But why stop here? Let's be a little fancy and reproject the solution back onto the original image, and that way we have a full augmented reality sudoku solver!
Implement the Detector.showSolved() method to reproject the solved digits onto the original image, show it on the screen using OpenCV, and save the image to a file inside the assets/sudokus/ directory (give the file an appropriate name).
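A key ingredient of reprojection is mapping positions in the warped, top-down grid back to pixels of the original photo. The numpy sketch below does this for the 81 cell centres. It assumes you kept the 3x3 perspective matrix H that maps original-image points to the warped grid (for instance the matrix you passed to your perspective transform); `cell_centers_in_original` is a hypothetical helper name.

```python
import numpy as np


def cell_centers_in_original(H, grid_size):
    """Map the 81 cell centres of the warped grid back into the original image.

    H: 3x3 homography mapping original-image points to warped-grid points.
    grid_size: side length in pixels of the square warped grid.
    Returns an array of shape (9, 9, 2); entry [row, col] is the (x, y) pixel.
    """
    step = grid_size / 9.0
    offsets = (np.arange(9) + 0.5) * step
    xs, ys = np.meshgrid(offsets, offsets)  # xs varies with column, ys with row
    points = np.stack([xs.ravel(), ys.ravel(), np.ones(81)])  # homogeneous, 3 x 81
    back = np.linalg.inv(H) @ points
    back = back[:2] / back[2]  # divide by w to leave homogeneous coordinates
    return back.T.reshape(9, 9, 2)


# Usage with a toy homography: the warp scaled by 2 and shifted by (10, 20).
H = np.array([[2.0, 0.0, 10.0], [0.0, 2.0, 20.0], [0.0, 0.0, 1.0]])
centers = cell_centers_in_original(H, grid_size=450)
```

With these centres you could draw each solved digit at its position in the photo. An alternative design is to draw the digits on a blank warped-size canvas and warp the whole canvas back with the inverse matrix, which also keeps the digits skewed consistently with the paper.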
The reprojected image may look something like this:
We have 2 sudoku images in the assets folder, and the corresponding solutions in the respective txt files.
You can add more sudoku images (and their solutions) to this folder and to the test cases in test_model.py, and try to solve all of them successfully
- If you needed to hardcode corrections into any test image, then analyse what went wrong in detecting those cells. You might try fiddling with parameters like the threshold of the adaptive thresholding step, or the erosion or dilation kernel of the preprocessing step, or any other parameter you feel like. Sometimes some sudokus work with some parameters and not with others; it is very hard to get it right for the general case. Note down all the experiments that you tried
- If some of your sudoku images straight up do not work, then put them inside the assets/sudokus/bad/ folder. Analyse why these don't work, and suggest improvements to your pipeline to make them work
Put all the conclusions and analysis, experiments, charts and graphs (if any) into explanations.pdf
Your assignment repository (https://github.com/LS-Computer-Vision/sudoku-solver-2-{username}) should have the following contents pushed to it
repository root
├── assets
│   ├── sudokus
│   │   ├── sudoku*.jpg (test images)
│   │   ├── sudoku*.txt (test solutions)
│   │   ├── images of reprojected solutions
│   │   └── bad
│   │       └── images of sudokus that don't pass
│   ├── any other resources like Fonts (for PIL)
│   ├── all the data files
│   └── model
├── .gitignore
├── README.md
├── requirements.txt
├── dataLoader.py
├── model.py
├── solver.py
├── sudoku.py
├── test_model.py
├── explanations.pdf
└── (Not to be pushed, ignored by git) venv
The deadline for this assignment is 5 August, 11:59 PM




