# Numpy Project - Part 4: 3D Sudokus! Increasing dimensions.

Now it's time to increase the number of dimensions of our arrays. We'll use a public [Kaggle Dataset](https://www.kaggle.com/bryanpark/sudoku) that contains 1 million Sudoku games!

We've reduced the total dataset to 5000 games for simplicity, but it'll still be fun. Let's get started!

In [37]:
import numpy as np
import csv
from sudoku import Board

First let's take a look at the structure of the CSV file:

In [3]:
!head -n 10 data/sudoku-small.csv


In [2]:
!wc -l data/sudoku-small.csv

    5000 data/sudoku-small.csv


The CSV is very simple: it has only two columns, the empty board and its solution. The board is expressed differently from before, though: each one is a single long string containing all 81 digits.

### 1) Parsing long string lines into valid boards

We need to adapt to this new style of expressing Sudoku boards. This is a valuable lesson in data handling: you can't anticipate every way data will be expressed. It would be a mistake to extend the `Board` class to also accept this format; we try not to modify our core data structures to handle edge cases. Instead, we'll write an _"adapter"_ (see [Wikipedia's article on the adapter pattern](https://en.wikipedia.org/wiki/Adapter_pattern)), which is just a tiny function that turns the long puzzle line into a 9x9 grid of integers that NumPy can consume directly:

In [7]:
string = '004300209005009001070060043006002087190007400050083000600000105003508690042910300'
int_list = [int(i) for i in string]
print(int_list)

[0, 0, 4, 3, 0, 0, 2, 0, 9, 0, 0, 5, 0, 0, 9, 0, 0, 1, 0, 7, 0, 0, 6, 0, 0, 4, 3, 0, 0, 6, 0, 0, 2, 0, 8, 7, 1, 9, 0, 0, 0, 7, 4, 0, 0, 0, 5, 0, 0, 8, 3, 0, 0, 0, 6, 0, 0, 0, 0, 0, 1, 0, 5, 0, 0, 3, 5, 0, 8, 6, 9, 0, 0, 4, 2, 9, 1, 0, 3, 0, 0]


In [8]:
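def adapt_long_sudoku_line_to_array(line):
    """Turn an 81-character puzzle line into a 9x9 grid (a list of 9 rows)."""
    int_list = [int(i) for i in line]
    rows = []
    # Take 9 digits at a time: each slice is one row of the board
    for i in range(0, 81, 9):
        rows.append(int_list[i: i+9])
    return rows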

In [9]:
adapt_long_sudoku_line_to_array('004300209005009001070060043006002087190007400050083000600000105003508690042910300')

[[0, 0, 4, 3, 0, 0, 2, 0, 9],
 [0, 0, 5, 0, 0, 9, 0, 0, 1],
 [0, 7, 0, 0, 6, 0, 0, 4, 3],
 [0, 0, 6, 0, 0, 2, 0, 8, 7],
 [1, 9, 0, 0, 0, 7, 4, 0, 0],
 [0, 5, 0, 0, 8, 3, 0, 0, 0],
 [6, 0, 0, 0, 0, 0, 1, 0, 5],
 [0, 0, 3, 5, 0, 8, 6, 9, 0],
 [0, 4, 2, 9, 1, 0, 3, 0, 0]]

In [10]:
line = '004300209005009001070060043006002087190007400050083000600000105003508690042910300'

In [11]:
assert np.array_equal(adapt_long_sudoku_line_to_array(line), np.array([
    [0, 0, 4, 3, 0, 0, 2, 0, 9],
    [0, 0, 5, 0, 0, 9, 0, 0, 1],
    [0, 7, 0, 0, 6, 0, 0, 4, 3],
    [0, 0, 6, 0, 0, 2, 0, 8, 7],
    [1, 9, 0, 0, 0, 7, 4, 0, 0],
    [0, 5, 0, 0, 8, 3, 0, 0, 0],
    [6, 0, 0, 0, 0, 0, 1, 0, 5],
    [0, 0, 3, 5, 0, 8, 6, 9, 0],
    [0, 4, 2, 9, 1, 0, 3, 0, 0]
]))
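
Since the end goal is a NumPy array, the same adapter could also be sketched with `reshape`. The name `adapt_line_with_reshape` is hypothetical and not part of `sudoku.py`; this is just one possible variant, assuming the line holds exactly 81 digits:

```python
import numpy as np

def adapt_line_with_reshape(line):
    """Hypothetical variant: turn an 81-character puzzle line into a 9x9 array."""
    digits = [int(ch) for ch in line.strip()]
    if len(digits) != 81:
        raise ValueError(f"expected 81 digits, got {len(digits)}")
    # reshape reinterprets the flat 81-element array as 9 rows of 9 columns
    return np.array(digits).reshape(9, 9)

line = '004300209005009001070060043006002087190007400050083000600000105003508690042910300'
board = adapt_line_with_reshape(line)
print(board.shape)  # (9, 9)
print(board[0])     # first row of the puzzle
```

The list-slicing version above and this one produce the same values; `reshape` simply lets NumPy do the row splitting.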

### 2) Reading a CSV file into a 3-dimensional array

Now it's time to read multiple Sudoku puzzles into a single NumPy array. We'll end up with a 3-dimensional array: two of the dimensions (x, y) are the rows and columns of a puzzle, and the remaining dimension (z) indexes the puzzles themselves. In NumPy's axis order the puzzle index comes first, so three puzzles give an array of shape `(3, 9, 9)`. Here's a graphical representation of it:

<img width="600px" src="https://user-images.githubusercontent.com/872296/68670705-499dce00-052c-11ea-8e82-18a1f435e274.png">


For example, we want to create something like this:

In [12]:
np.array([
    [
        [0, 0, 4, 3, 0, 0, 2, 0, 9],
        [0, 0, 5, 0, 0, 9, 0, 0, 1],
        [0, 7, 0, 0, 6, 0, 0, 4, 3],
        [0, 0, 6, 0, 0, 2, 0, 8, 7],
        [1, 9, 0, 0, 0, 7, 4, 0, 0],
        [0, 5, 0, 0, 8, 3, 0, 0, 0],
        [6, 0, 0, 0, 0, 0, 1, 0, 5],
        [0, 0, 3, 5, 0, 8, 6, 9, 0],
        [0, 4, 2, 9, 1, 0, 3, 0, 0]
    ],
    [
        [0, 0, 4, 3, 0, 0, 2, 0, 9],
        [0, 0, 5, 0, 0, 9, 0, 0, 1],
        [0, 7, 0, 0, 6, 0, 0, 4, 3],
        [0, 0, 6, 0, 0, 2, 0, 8, 7],
        [1, 9, 0, 0, 0, 7, 4, 0, 0],
        [0, 5, 0, 0, 8, 3, 0, 0, 0],
        [6, 0, 0, 0, 0, 0, 1, 0, 5],
        [0, 0, 3, 5, 0, 8, 6, 9, 0],
        [0, 4, 2, 9, 1, 0, 3, 0, 0]
    ],
    [
        [0, 0, 4, 3, 0, 0, 2, 0, 9],
        [0, 0, 5, 0, 0, 9, 0, 0, 1],
        [0, 7, 0, 0, 6, 0, 0, 4, 3],
        [0, 0, 6, 0, 0, 2, 0, 8, 7],
        [1, 9, 0, 0, 0, 7, 4, 0, 0],
        [0, 5, 0, 0, 8, 3, 0, 0, 0],
        [6, 0, 0, 0, 0, 0, 1, 0, 5],
        [0, 0, 3, 5, 0, 8, 6, 9, 0],
        [0, 4, 2, 9, 1, 0, 3, 0, 0]
    ],
])

array([[[0, 0, 4, 3, 0, 0, 2, 0, 9],
        [0, 0, 5, 0, 0, 9, 0, 0, 1],
        [0, 7, 0, 0, 6, 0, 0, 4, 3],
        [0, 0, 6, 0, 0, 2, 0, 8, 7],
        [1, 9, 0, 0, 0, 7, 4, 0, 0],
        [0, 5, 0, 0, 8, 3, 0, 0, 0],
        [6, 0, 0, 0, 0, 0, 1, 0, 5],
        [0, 0, 3, 5, 0, 8, 6, 9, 0],
        [0, 4, 2, 9, 1, 0, 3, 0, 0]],

       [[0, 0, 4, 3, 0, 0, 2, 0, 9],
        [0, 0, 5, 0, 0, 9, 0, 0, 1],
        [0, 7, 0, 0, 6, 0, 0, 4, 3],
        [0, 0, 6, 0, 0, 2, 0, 8, 7],
        [1, 9, 0, 0, 0, 7, 4, 0, 0],
        [0, 5, 0, 0, 8, 3, 0, 0, 0],
        [6, 0, 0, 0, 0, 0, 1, 0, 5],
        [0, 0, 3, 5, 0, 8, 6, 9, 0],
        [0, 4, 2, 9, 1, 0, 3, 0, 0]],

       [[0, 0, 4, 3, 0, 0, 2, 0, 9],
        [0, 0, 5, 0, 0, 9, 0, 0, 1],
        [0, 7, 0, 0, 6, 0, 0, 4, 3],
        [0, 0, 6, 0, 0, 2, 0, 8, 7],
        [1, 9, 0, 0, 0, 7, 4, 0, 0],
        [0, 5, 0, 0, 8, 3, 0, 0, 0],
        [6, 0, 0, 0, 0, 0, 1, 0, 5],
        [0, 0, 3, 5, 0, 8, 6, 9, 0],
        [0, 4, 2, 9, 1, 0, 3, 0, 0]]])
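
To make the three dimensions more concrete, here is a small sketch of how such an array can be built and indexed. It uses `np.stack` on three copies of the example board, which is only an illustration and not necessarily how the final function should build its result:

```python
import numpy as np

line = '004300209005009001070060043006002087190007400050083000600000105003508690042910300'
board = np.array([int(ch) for ch in line]).reshape(9, 9)

# Stack three copies of the same board along a new leading axis
boards = np.stack([board, board, board])

print(boards.shape)     # (3, 9, 9): 3 puzzles, 9 rows, 9 columns
print(boards[0])        # the first puzzle, a 9x9 array
print(boards[1, 4])     # row 4 of the second puzzle
print(boards[:, 0, 2])  # the cell at row 0, column 2 of every puzzle
```

The first index selects a puzzle, and the remaining two select a row and a column inside it.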

Now it's time to code! Complete the function `read_sudokus_from_csv`. It receives two parameters: the name of the CSV file to read and an optional flag, `read_solutions`. If `read_solutions` is `True`, read from the second column (solutions) instead of the first (empty puzzles). You can assume the following CSV structure:

```
quizzes,solutions
10084..,183048..
30018..,34196..
...
empty,solved
empty,solved
```
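
Because the file has a header row, `csv.DictReader` can expose each row as a dictionary keyed by `quizzes` and `solutions`. The sketch below uses an in-memory `io.StringIO` as a stand-in for the real file (an assumption made only so the example is self-contained), with one puzzle taken from the examples in this notebook:

```python
import csv
import io

# In-memory stand-in for the CSV file: a header row plus one puzzle
sample = io.StringIO(
    "quizzes,solutions\n"
    "004300209005009001070060043006002087190007400050083000600000105003508690042910300,"
    "864371259325849761971265843436192587198657432257483916689734125713528694542916378\n"
)

reader = csv.DictReader(sample)
rows = list(reader)  # each row is a dict keyed by the header names

print(reader.fieldnames)        # ['quizzes', 'solutions']
print(len(rows))                # 1
print(rows[0]['quizzes'][:9])   # first 9 digits of the empty board
print(rows[0]['solutions'][:9]) # first 9 digits of the solution
```

Selecting the column by name keeps the code readable even if the column order in the file changes.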

In [82]:
def read_sudokus_from_csv(filename, read_solutions=False):
    with open(filename, newline='') as csvfile:
        reader = csv.DictReader(csvfile)  # Read the file as a dictionary
        sudokus = []  # List to store the 9x9 grids
        
        for row in reader:
            try:
                # Get the relevant column (quizzes or solutions)
                sudoku_str = row['solutions'] if read_solutions else row['quizzes']
                
                # Ensure the string has exactly 81 characters
                if len(sudoku_str.strip()) != 81:
                    print(f"Skipping invalid sudoku with length {len(sudoku_str.strip())}: {sudoku_str}")
                    continue
                
                # Convert the string to a list of integers
                sudoku_list = [int(char) for char in sudoku_str.strip()]
                
                # Break the flat list into a 9x9 grid
                sudoku_grid = [sudoku_list[i:i+9] for i in range(0, 81, 9)]
                
                # Append the 9x9 grid to the list of sudokus
                sudokus.append(sudoku_grid)
                
            except KeyError:
                # Handle the case where the expected column is missing
                print(f"Column missing in row: {row}")
                continue
            except ValueError:
                # Handle the case where conversion to integer fails
                print(f"Invalid character in sudoku: {sudoku_str}")
                continue
        
        return sudokus  # Return the list of 9x9 grids


For this test we'll use the file `sudoku-micro.csv`, which contains only 3 puzzles:

In [83]:
read_sudokus_from_csv('data/sudoku-micro.csv')

[[[0, 0, 4, 3, 0, 0, 2, 0, 9],
  [0, 0, 5, 0, 0, 9, 0, 0, 1],
  [0, 7, 0, 0, 6, 0, 0, 4, 3],
  [0, 0, 6, 0, 0, 2, 0, 8, 7],
  [1, 9, 0, 0, 0, 7, 4, 0, 0],
  [0, 5, 0, 0, 8, 3, 0, 0, 0],
  [6, 0, 0, 0, 0, 0, 1, 0, 5],
  [0, 0, 3, 5, 0, 8, 6, 9, 0],
  [0, 4, 2, 9, 1, 0, 3, 0, 0]],
 [[0, 4, 0, 1, 0, 0, 0, 5, 0],
  [1, 0, 7, 0, 0, 3, 9, 6, 0],
  [5, 2, 0, 0, 0, 8, 0, 0, 0],
  [0, 0, 0, 0, 0, 0, 0, 1, 7],
  [0, 0, 0, 9, 0, 6, 8, 0, 0],
  [8, 0, 3, 0, 5, 0, 6, 2, 0],
  [0, 9, 0, 0, 6, 0, 5, 4, 3],
  [6, 0, 0, 0, 8, 0, 7, 0, 0],
  [2, 5, 0, 0, 9, 7, 1, 0, 0]],
 [[6, 0, 0, 1, 2, 0, 3, 8, 4],
  [0, 0, 8, 4, 5, 9, 0, 7, 2],
  [0, 0, 0, 0, 0, 6, 0, 0, 5],
  [0, 0, 0, 2, 6, 4, 0, 3, 0],
  [0, 7, 0, 0, 8, 0, 0, 0, 6],
  [9, 4, 0, 0, 0, 3, 0, 0, 0],
  [3, 1, 0, 0, 0, 0, 0, 5, 0],
  [0, 8, 9, 7, 0, 0, 0, 0, 0],
  [5, 0, 2, 0, 0, 0, 1, 9, 0]]]

In [84]:
expected = np.array([[[0, 0, 4, 3, 0, 0, 2, 0, 9],
        [0, 0, 5, 0, 0, 9, 0, 0, 1],
        [0, 7, 0, 0, 6, 0, 0, 4, 3],
        [0, 0, 6, 0, 0, 2, 0, 8, 7],
        [1, 9, 0, 0, 0, 7, 4, 0, 0],
        [0, 5, 0, 0, 8, 3, 0, 0, 0],
        [6, 0, 0, 0, 0, 0, 1, 0, 5],
        [0, 0, 3, 5, 0, 8, 6, 9, 0],
        [0, 4, 2, 9, 1, 0, 3, 0, 0]],

       [[0, 4, 0, 1, 0, 0, 0, 5, 0],
        [1, 0, 7, 0, 0, 3, 9, 6, 0],
        [5, 2, 0, 0, 0, 8, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 1, 7],
        [0, 0, 0, 9, 0, 6, 8, 0, 0],
        [8, 0, 3, 0, 5, 0, 6, 2, 0],
        [0, 9, 0, 0, 6, 0, 5, 4, 3],
        [6, 0, 0, 0, 8, 0, 7, 0, 0],
        [2, 5, 0, 0, 9, 7, 1, 0, 0]],

       [[6, 0, 0, 1, 2, 0, 3, 8, 4],
        [0, 0, 8, 4, 5, 9, 0, 7, 2],
        [0, 0, 0, 0, 0, 6, 0, 0, 5],
        [0, 0, 0, 2, 6, 4, 0, 3, 0],
        [0, 7, 0, 0, 8, 0, 0, 0, 6],
        [9, 4, 0, 0, 0, 3, 0, 0, 0],
        [3, 1, 0, 0, 0, 0, 0, 5, 0],
        [0, 8, 9, 7, 0, 0, 0, 0, 0],
        [5, 0, 2, 0, 0, 0, 1, 9, 0]]])

In [85]:
assert np.array_equal(read_sudokus_from_csv('data/sudoku-micro.csv'), expected)

Reading solutions:

In [86]:
read_sudokus_from_csv('data/sudoku-micro.csv', read_solutions=True)

[[[8, 6, 4, 3, 7, 1, 2, 5, 9],
  [3, 2, 5, 8, 4, 9, 7, 6, 1],
  [9, 7, 1, 2, 6, 5, 8, 4, 3],
  [4, 3, 6, 1, 9, 2, 5, 8, 7],
  [1, 9, 8, 6, 5, 7, 4, 3, 2],
  [2, 5, 7, 4, 8, 3, 9, 1, 6],
  [6, 8, 9, 7, 3, 4, 1, 2, 5],
  [7, 1, 3, 5, 2, 8, 6, 9, 4],
  [5, 4, 2, 9, 1, 6, 3, 7, 8]],
 [[3, 4, 6, 1, 7, 9, 2, 5, 8],
  [1, 8, 7, 5, 2, 3, 9, 6, 4],
  [5, 2, 9, 6, 4, 8, 3, 7, 1],
  [9, 6, 5, 8, 3, 2, 4, 1, 7],
  [4, 7, 2, 9, 1, 6, 8, 3, 5],
  [8, 1, 3, 7, 5, 4, 6, 2, 9],
  [7, 9, 8, 2, 6, 1, 5, 4, 3],
  [6, 3, 1, 4, 8, 5, 7, 9, 2],
  [2, 5, 4, 3, 9, 7, 1, 8, 6]],
 [[6, 9, 5, 1, 2, 7, 3, 8, 4],
  [1, 3, 8, 4, 5, 9, 6, 7, 2],
  [7, 2, 4, 8, 3, 6, 9, 1, 5],
  [8, 5, 1, 2, 6, 4, 7, 3, 9],
  [2, 7, 3, 9, 8, 1, 5, 4, 6],
  [9, 4, 6, 5, 7, 3, 8, 2, 1],
  [3, 1, 7, 6, 9, 2, 4, 5, 8],
  [4, 8, 9, 7, 1, 5, 2, 6, 3],
  [5, 6, 2, 3, 4, 8, 1, 9, 7]]]

In [87]:
expected = np.array([[[8, 6, 4, 3, 7, 1, 2, 5, 9],
        [3, 2, 5, 8, 4, 9, 7, 6, 1],
        [9, 7, 1, 2, 6, 5, 8, 4, 3],
        [4, 3, 6, 1, 9, 2, 5, 8, 7],
        [1, 9, 8, 6, 5, 7, 4, 3, 2],
        [2, 5, 7, 4, 8, 3, 9, 1, 6],
        [6, 8, 9, 7, 3, 4, 1, 2, 5],
        [7, 1, 3, 5, 2, 8, 6, 9, 4],
        [5, 4, 2, 9, 1, 6, 3, 7, 8]],

       [[3, 4, 6, 1, 7, 9, 2, 5, 8],
        [1, 8, 7, 5, 2, 3, 9, 6, 4],
        [5, 2, 9, 6, 4, 8, 3, 7, 1],
        [9, 6, 5, 8, 3, 2, 4, 1, 7],
        [4, 7, 2, 9, 1, 6, 8, 3, 5],
        [8, 1, 3, 7, 5, 4, 6, 2, 9],
        [7, 9, 8, 2, 6, 1, 5, 4, 3],
        [6, 3, 1, 4, 8, 5, 7, 9, 2],
        [2, 5, 4, 3, 9, 7, 1, 8, 6]],

       [[6, 9, 5, 1, 2, 7, 3, 8, 4],
        [1, 3, 8, 4, 5, 9, 6, 7, 2],
        [7, 2, 4, 8, 3, 6, 9, 1, 5],
        [8, 5, 1, 2, 6, 4, 7, 3, 9],
        [2, 7, 3, 9, 8, 1, 5, 4, 6],
        [9, 4, 6, 5, 7, 3, 8, 2, 1],
        [3, 1, 7, 6, 9, 2, 4, 5, 8],
        [4, 8, 9, 7, 1, 5, 2, 6, 3],
        [5, 6, 2, 3, 4, 8, 1, 9, 7]]])

In [88]:
assert np.array_equal(read_sudokus_from_csv('data/sudoku-micro.csv', read_solutions=True), expected)
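
Once puzzles and solutions are both available as 3-dimensional arrays, whole-dataset checks become one-liners. As a rough sketch, the snippet below rebuilds the first two puzzles shown above from their string form and confirms that every given (non-zero) cell of a puzzle matches its solution. The helper `line_to_grid` is hypothetical and exists only to keep the example self-contained:

```python
import numpy as np

def line_to_grid(line):
    """Hypothetical helper: 81-character line -> list of 9 rows of 9 ints."""
    return [[int(ch) for ch in line[i:i + 9]] for i in range(0, 81, 9)]

quiz_lines = [
    '004300209005009001070060043006002087190007400050083000600000105003508690042910300',
    '040100050107003960520008000000000017000906800803050620090060543600080700250097100',
]
solution_lines = [
    '864371259325849761971265843436192587198657432257483916689734125713528694542916378',
    '346179258187523964529648371965832417472916835813754629798261543631485792254397186',
]

quizzes = np.array([line_to_grid(q) for q in quiz_lines])
solutions = np.array([line_to_grid(s) for s in solution_lines])
print(quizzes.shape)  # (2, 9, 9)

# Boolean mask of the cells that were filled in from the start
given = quizzes != 0
print(np.array_equal(quizzes[given], solutions[given]))  # True
```

The boolean mask has the same `(2, 9, 9)` shape as the data, so a single comparison covers every puzzle at once.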

### 3) Identifying invalid solutions

There's another file, `sudoku-invalids.csv`, that contains some invalid Sudoku solutions. Your job is to read the solutions and return only the ones that are invalid.

In [112]:
import numpy as np
import csv

def detect_invalid_solutions(file_path):
    """ Detect and return invalid Sudoku solutions from a CSV file. """
    sudoku_solutions = []

    # Read the CSV file using the csv module
    with open(file_path, newline='') as csvfile:
        reader = csv.DictReader(csvfile)
        for row in reader:
            sudoku_str = row['solutions']  # Extract the solution string
            # Convert the string to a 9x9 grid
            grid = [list(map(int, sudoku_str[i:i+9])) for i in range(0, 81, 9)]
            sudoku_solutions.append(grid)

    invalid_solutions = []

    for grid in sudoku_solutions:
        board_array = np.array(grid)

        # Check rows for duplicates
        valid = True
        for row in board_array:
            if len(set(row)) != 9:
                valid = False
                break

        # Check columns for duplicates
        if valid:
            for col in board_array.T:  # Transpose to get columns
                if len(set(col)) != 9:
                    valid = False
                    break

        # Check 3x3 subgrids for duplicates
        if valid:
            for i in range(0, 9, 3):
                for j in range(0, 9, 3):
                    block = board_array[i:i+3, j:j+3].flatten()  # Get the 3x3 block
                    if len(set(block)) != 9:
                        valid = False
                        break
                if not valid:
                    break  # The inner break only exits the j loop; stop the i loop too

        # If the solution is invalid, add it to the invalid_solutions list
        if not valid:
            invalid_solutions.append(grid)

    return invalid_solutions



In [113]:
detect_invalid_solutions('data/sudoku-invalids.csv')

[[[1, 7, 6, 6, 2, 8, 4, 5, 9],
  [5, 3, 8, 1, 4, 9, 6, 7, 2],
  [4, 9, 2, 7, 6, 5, 1, 3, 8],
  [6, 5, 7, 8, 3, 4, 9, 2, 1],
  [9, 2, 4, 6, 5, 1, 3, 8, 7],
  [3, 8, 1, 9, 7, 2, 5, 6, 4],
  [8, 1, 3, 2, 9, 6, 7, 4, 5],
  [7, 4, 9, 5, 8, 3, 2, 1, 6],
  [2, 6, 5, 4, 1, 7, 8, 9, 3]],
 [[9, 9, 5, 7, 8, 4, 6, 1, 3],
  [8, 4, 3, 6, 2, 1, 9, 5, 7],
  [7, 1, 6, 5, 9, 3, 8, 2, 4],
  [3, 7, 2, 1, 5, 9, 4, 8, 6],
  [5, 9, 8, 4, 6, 7, 1, 3, 2],
  [4, 6, 1, 2, 3, 8, 5, 7, 9],
  [6, 3, 4, 8, 7, 5, 2, 9, 1],
  [1, 8, 7, 9, 4, 2, 3, 6, 5],
  [2, 5, 9, 3, 1, 6, 7, 4, 8]],
 [[5, 8, 5, 9, 6, 3, 2, 1, 7],
  [3, 2, 1, 7, 4, 8, 9, 5, 6],
  [6, 9, 7, 1, 2, 5, 4, 8, 3],
  [1, 6, 9, 8, 5, 7, 3, 2, 4],
  [7, 3, 2, 4, 1, 9, 8, 6, 5],
  [8, 4, 5, 2, 3, 6, 1, 7, 9],
  [4, 5, 8, 6, 9, 1, 7, 3, 2],
  [9, 7, 3, 5, 8, 2, 6, 4, 1],
  [2, 1, 6, 3, 7, 4, 5, 9, 8]],
 [[1, 9, 1, 2, 7, 5, 8, 4, 6],
  [2, 6, 4, 9, 8, 1, 7, 5, 3],
  [7, 5, 8, 6, 4, 3, 2, 1, 9],
  [9, 4, 7, 5, 6, 8, 3, 2, 1],
  [8, 2, 3, 1, 9, 7, 5, 6, 4],
  [5,

In [114]:
assert len(detect_invalid_solutions('data/sudoku-invalids.csv')) == 13
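
The same check can also be expressed with array operations instead of explicit loops over cells. The sketch below is one possible alternative, not the reference solution: `is_valid_solution` is a hypothetical name, and it assumes a complete 9x9 board of integers. Unlike the `set`-based version, it also rejects values outside 1 to 9.

```python
import numpy as np

def is_valid_solution(board):
    """Hypothetical helper: True if every row, column and 3x3 block holds 1..9 once."""
    board = np.asarray(board)
    expected = np.arange(1, 10)
    # Regroup the board so that each row of `blocks` holds one 3x3 block
    blocks = board.reshape(3, 3, 3, 3).swapaxes(1, 2).reshape(9, 9)
    for groups in (board, board.T, blocks):
        # Sorting each group must give exactly 1, 2, ..., 9
        if not (np.sort(groups, axis=1) == expected).all():
            return False
    return True

line = '864371259325849761971265843436192587198657432257483916689734125713528694542916378'
good = np.array([int(ch) for ch in line]).reshape(9, 9)

bad = good.copy()
bad[0, 0] = bad[0, 1]  # duplicate a value in the first row

print(is_valid_solution(good))  # True
print(is_valid_solution(bad))   # False
```

Filtering a list of grids then reduces to keeping those for which `is_valid_solution` returns `False`.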

## Time to test!

Now it's time to move your code to `sudoku.py` and then run all the tests; if they're passing, you can move to the next step!

In [79]:
!py.test test_part_4.py

platform darwin -- Python 3.7.4, pytest-5.2.2, py-1.8.0, pluggy-0.13.0
rootdir: /Users/santiagobasulto/code/rmotr/curriculum/sudoku-tests
collected 4 items

test_part_4.py ....                                                      [100%]

