A repository for my work on the Sudoku-puzzle coding challenge for the Insight Data Engineering Fellowship
Branch: master
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Type Name Latest commit message Commit time
Failed to load latest commit information.



By: Tuuli Pöllänen
Date: 31.10.2014

Task and instructions

This repository contains my contribution to the Sudoku-puzzle coding challenge for the Insight Data Engineering Fellowship. The description of the challenge is as follows:

We'd like you to implement a Sudoku puzzle solver in one of the following programming languages: Java, Clojure, Scala, C, C++, Python, or Ruby. It should input a CSV file consisting of an unsolved Sudoku with 0's representing blanks, and output a CSV file with the solved Sudoku.

For instance, the input CSV file can look like the following:

The output should be:


Any additional game play features are optional, but welcome.

The solution

This Python-based program prompts the user for a text file containing a sudoku puzzle in csv-format, solves the puzzle, and outputs the solution in another .csv file.

In the task, I refer to the entire 81-square 'play area' as the grid. The vertical elements are rows, the column elements are columns, and the nine 3x3 units in the grid are boxes. Each square belongs to three units: a row, a column and a box. Each square has twenty peers, or other squares that belong to the same units. Peers to a square cannot contain the same value as that square. Following this "naming convention", the variable names in the code should quite intuitive.

Each square has its own "identification code", with an alphabetical character from A to I indicating its row, and a digit between 1 and 9 indicating its column. (e.g. the units for the first row are A1, A2, A3... A9, and the last column A9, B9, C9, ..., I9. The units and peers of each square are contained in dictionaries, with the square's identification code as key and lists of the squares' units (or peers) as values. At the start of the program, it computes a dictionary containing all the possible values for each square, and the possible values of this dictionary are gradually eliminated to arrive at the solution.

The program uses a combination of two heuristic strategies, the 'naked twins'-strategy, and recursive search to solve the puzzle. The heuristic strategies are the following:

  • a) If there is only one possible value for the square, that value cannot be contained by its peers (squares in the same row, column or box). Therefore, that value can be eliminated from its peers.
  • b) If a unit (row, column or box) has only one possible place for a value, the value must be placed in that location.

A failure in either heuristic strategy implies a contradiction in the puzzle. For simpler puzzles, the heuristic strategies are often enough to arrive at a solution, whereas more difficult ones rely on recursive search.

The naked twins strategy is used when two squares are found within the same unit, containing only the same two digits. This means that no other square in the same unit can contain either of the values, and so the values can safely be removed from all the other squares in the unit.

Running the program

This is a simple command line program that has been tested with Python 2.7. To run it, clone the repository, go into the folder that is created, and run the program by typing 'python sudoku.py'. When prompted, type in the name of the file that contains your puzzle (it has to be a .csv or a .txt file) - the results get stored into a new .csv file with _solution appended to your original filename - e.g. from "sudoku.csv" to "sudoku_solution.csv".