# Sudoku

In this chapter we prove knowledge of a Sudoku solution in zero knowledge.

# What is Sudoku?

[This famous game](https://en.wikipedia.org/wiki/Sudoku) should be known to everybody, but I will explain it anyway.

In Sudoku, there is a $9 \times 9$ board of cells. Cells contain numbers from 1 to 9. At the start, most cells are empty. The goal is to fill in the empty cells, according to the rules.

The rules are that every number must occur exactly once in every row, in every column and in every $3 \times 3$ sub-square (box).

Because the three rules constrain each other, solving a Sudoku is hard. The generalized problem described below is NP-complete.

We will consider a generalization of the game: Each sub-square consists of $n \times n$ cells. The board contains $n \times n$ sub-squares and thus consists of $n^2 \times n^2$ cells. The numbers 1 to $n^2$ are to be filled in. We get the original game by setting $n = 3$.
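
The three rules of the generalized game can be checked mechanically. The following is a minimal sketch that works on plain nested lists; the function name `is_valid_solution` is made up for this illustration and is not part of the workshop's `Board` class.

```python
from typing import List


def is_valid_solution(board: List[List[int]], n: int) -> bool:
    """Check the three Sudoku rules on a completely filled n²×n² board."""
    side = n * n
    expected = set(range(1, side + 1))

    # Collect the values of every row, column and box
    rows = [set(board[r]) for r in range(side)]
    cols = [{board[r][c] for r in range(side)} for c in range(side)]
    boxes = [
        {board[top + i][left + j] for i in range(n) for j in range(n)}
        for top in range(0, side, n)
        for left in range(0, side, n)
    ]

    # Each area has side cells, so it contains every number exactly once
    # if and only if its set of values equals {1, ..., side}
    return all(area == expected for area in rows + cols + boxes)


# A solved 4×4 board (n = 2)
solved = [
    [1, 2, 3, 4],
    [3, 4, 1, 2],
    [2, 1, 4, 3],
    [4, 3, 2, 1],
]
print(is_valid_solution(solved, 2))
```

Changing a single cell of `solved` makes the check fail, because the new value then appears twice in that cell's row, column and box.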

# What is the proof?

Peggy and Victor are engaged in an interactive proof.

There is a Sudoku puzzle.

Peggy (thinks she) knows a solution to the puzzle. She wants to convince Victor that she knows a solution without revealing it.

Victor is sceptical and wants to see evidence. He wants to expose Peggy as a liar if her solution is invalid.

Peggy wins if she convinces Victor. Victor wins by accepting only valid solutions.

# Set up Jupyter

Run the following snippet to set up your Jupyter notebook for the workshop.

In [None]:
import sys

# Add project root so we can import local modules
sys.path.append("..")

# Import here so cells don't depend on each other
from IPython.display import display
from typing import List, Tuple
from itertools import chain
import ipywidgets as widgets
import random

from local.sudoku import Board
from local.ec.static import CurvePoint, Scalar, ONE_POINT
from local.ec.util import Opening
from local.graph import Mapping
import local.stats as stats

# Select the puzzle size

Select how large the board should be.

You select the parameter $n$ which is the width of each sub-square. The board will be $n^2$ cells wide in each direction.

In [None]:
# You can adjust the slider any time

dim_slider = widgets.IntSlider(min=2, max=5, value=3, step=1, description="n²×n² Sudoku")
dim_slider

# Select the scenario

Choose the good or the evil scenario. See how it affects the other cells further down.

1. **Peggy is honest** 😇 She knows a solution. She wants to convince Victor of a true statement.
2. **Peggy is lying**  😈 She doesn't actually know anything! She tries to fool Victor into believing a false statement.

In [None]:
# You can adjust the selection any time

valid_dropdown = widgets.Dropdown(
    options=[
        ("Valid solution (honest Peggy 😇)", True),
        ("Invalid solution (lying Peggy 😈)", False)],
    value=True,
    description="Scenario:",
)
valid_dropdown

# Generate the puzzle

Generate a random Sudoku puzzle and its solution.

Actually, the way to generate puzzles is to generate a complete solution and then to remove most cells to convert it into a partial solution! See how your browser solves _(a small instance of)_ an NP-complete problem within seconds! I spent too much time on [optimizing this](https://en.wikipedia.org/wiki/Knuth%27s_Algorithm_X).
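
The "solution first, puzzle second" idea can be sketched without any solver. The example below is not the Algorithm X approach used by the notebook: it builds a complete board from a simple shifting pattern and then blanks out cells at random. The names `pattern_solution` and `remove_cells` are hypothetical, and random removal does not guarantee that the resulting puzzle has a unique solution.

```python
import random
from typing import List


def pattern_solution(n: int) -> List[List[int]]:
    """Build one valid n²×n² solution from a fixed shifting pattern."""
    side = n * n
    return [
        [(n * (r % n) + r // n + c) % side + 1 for c in range(side)]
        for r in range(side)
    ]


def remove_cells(solution: List[List[int]], keep: float, rng: random.Random) -> List[List[int]]:
    """Keep each cell with probability `keep`; removed cells become 0."""
    return [[value if rng.random() < keep else 0 for value in row] for row in solution]


solution = pattern_solution(3)
puzzle = remove_cells(solution, keep=0.3, rng=random.Random(1))

for row in puzzle:
    # Print empty cells as dots
    print(" ".join(str(value) if value else "." for value in row))
```

A real generator would additionally randomize the solution and check that the puzzle stays uniquely solvable, which is where a solver such as Algorithm X comes in.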

In [None]:
# Rerun this cell after changing the puzzle size or the scenario

secret_solution = Board.random(dim_slider.value).solve()
if not valid_dropdown.value:
    secret_solution.falsify(1)
public_puzzle = secret_solution.to_puzzle()

print(public_puzzle)

# How the proof goes

On to the protocol. The code might seem complex but it is actually doing something simple.

1. Peggy randomly shuffles (permutes) the numbers of her solution.
1. Peggy sends commitments to the value of each cell to Victor.
1. Victor selects a random area on the board (row, column, box). Victor can also randomly select the preset values.
1. Peggy opens the commitments of the cells in that area.
1. Victor verifies that each number occurs exactly once. If Victor chose the presets, he verifies that the shuffling is valid.

Step 1 is to prevent leaking knowledge of Peggy's solution to Victor. This is why the proof is zero-knowledge.

Step 2 makes sure that Peggy doesn't change the shuffling later.

Steps 3 to 5 verify one area of the board. Victor may choose to verify the presets. The values of these cells are public, so he computes how Peggy mapped the numbers to arrive at her shuffled presets. The shuffling is valid if equal cells are still equal (in value) and unequal cells are still unequal.
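
The preset check boils down to testing that the revealed values are a consistent one-to-one relabeling of the public presets. Here is a minimal sketch of that test on flat lists; `is_consistent_shuffling` is a made-up name, and the notebook's own `Board.verify_shuffling` may work differently.

```python
from typing import Dict, List


def is_consistent_shuffling(presets: List[int], shuffled: List[int]) -> bool:
    """Equal presets must map to equal values, unequal presets to unequal values."""
    if len(presets) != len(shuffled):
        return False

    forward: Dict[int, int] = {}   # original value -> shuffled value
    backward: Dict[int, int] = {}  # shuffled value -> original value

    for original, mapped in zip(presets, shuffled):
        # setdefault records the first pairing and returns it on later lookups
        if forward.setdefault(original, mapped) != mapped:
            return False  # one original value was mapped to two different values
        if backward.setdefault(mapped, original) != original:
            return False  # two different originals were mapped to the same value
    return True


print(is_consistent_shuffling([1, 2, 1, 3], [4, 1, 4, 2]))  # consistent relabeling
print(is_consistent_shuffling([1, 2, 1, 3], [4, 1, 3, 2]))  # the two 1s disagree
```

Keeping both directions of the mapping is what enforces "equal stays equal" and "unequal stays unequal" at the same time.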

Steps 1 to 5 form **one round** of the proof. Peggy and Victor can **repeat the same steps to add more rounds**.
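
To see the five steps in isolation, here is a self-contained toy version of one round. It is only a sketch under simplifying assumptions: salted SHA-256 hashes stand in for the elliptic-curve commitments used in the notebook, Victor only ever challenges a row, and the names `commit` and `run_row_round` are hypothetical.

```python
import hashlib
import random
import secrets
from typing import List


def commit(value: int, nonce: bytes) -> bytes:
    """Hash commitment: a stand-in for the notebook's curve-point commitments."""
    return hashlib.sha256(nonce + value.to_bytes(2, "big")).digest()


def run_row_round(solution: List[List[int]], rng: random.Random) -> bool:
    side = len(solution)
    labels = list(range(1, side + 1))

    # Step 1: Peggy relabels the numbers with a random permutation
    shuffled = labels[:]
    rng.shuffle(shuffled)
    mapping = dict(zip(labels, shuffled))
    values = [[mapping[v] for v in row] for row in solution]

    # Step 2: Peggy commits to every cell and sends the commitments
    nonces = [[secrets.token_bytes(16) for _ in range(side)] for _ in range(side)]
    commitments = [[commit(values[r][c], nonces[r][c]) for c in range(side)] for r in range(side)]

    # Step 3: Victor challenges a random row
    row = rng.randrange(side)

    # Step 4: Peggy opens the commitments of that row
    revealed = [(values[row][c], nonces[row][c]) for c in range(side)]

    # Step 5: Victor checks the openings and that each number occurs exactly once
    opens_correctly = all(
        commit(value, nonce) == commitments[row][c] for c, (value, nonce) in enumerate(revealed)
    )
    each_number_once = sorted(value for value, _ in revealed) == labels
    return opens_correctly and each_number_once


solved = [
    [1, 2, 3, 4],
    [3, 4, 1, 2],
    [2, 1, 4, 3],
    [4, 3, 2, 1],
]
print(run_row_round(solved, random.Random(0)))
```

The full protocol below adds the column, box and preset challenges and uses commitments over an elliptic curve.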

In [None]:
punto_uno, = CurvePoint.sample_greater_one(1)


class Peggy:
    def __init__(self, secret_solution: Board, public_puzzle: Board):
        """
        Give Peggy her solution and the public puzzle.
        """
        self.solution = secret_solution
        self.presets = public_puzzle
        self.dim = self.solution.dim
        self.dim_sq = self.solution.dim_sq
        
    def shuffle(self) -> List[List[CurvePoint]]:
        """
        Peggy shuffles how she displays each value.
        
        She commits to the shuffled value in each cell of her solution.
        
        She sends the commitments to Victor.
        """
        shuffling = Mapping.shuffle_list(list(range(1, self.dim_sq + 1)))
        self.openings = [[Opening(Scalar(shuffling[self.solution[row][col]]), ONE_POINT, punto_uno)
                          for col in range(self.dim_sq)] for row in range(self.dim_sq)]
        commitments = [[opening.close() for opening in row_openings] for row_openings in self.openings]
        return commitments

    def reveal(self, row: int, col: int, mode: str) -> List[Opening]:
        """
        Victor asks Peggy to reveal a given area (row, column, mode).
        
        Peggy returns the openings of this area.
        """
        if mode == "row":
            return [self.openings[row][col] for col in range(self.dim_sq)]
        elif mode == "column":
            return [self.openings[row][col] for row in range(self.dim_sq)]
        elif mode == "box":
            return [self.openings[row + row_offset][col + col_offset]
                    for row_offset in range(self.dim) for col_offset in range(self.dim)]
        else:  # mode == "presets"
            return [self.openings[row][col]
                    for row in range(self.dim_sq) for col in range(self.dim_sq) if self.presets[row][col] > 0]

class Victor:
    def __init__(self, public_puzzle: Board):
        """
        Give Victor the public puzzle.
        
        He does not know the solution.
        """
        self.presets = public_puzzle
        self.dim = self.presets.dim
        self.dim_sq = self.presets.dim_sq
        
    def select(self, commitments: List[List[CurvePoint]]) -> Tuple[int, int, str]:
        """
        Victor receives the commitments from Peggy.
        
        He selects a random area on the board (row, column, mode).
        
        He challenges Peggy to reveal this area.
        """
        self.commitments = commitments
        self.mode = random.choice(["row", "column", "box", "presets"])
        
        if self.mode == "row":
            self.row = random.randrange(self.dim_sq)
            self.col = None
        elif self.mode == "column":
            self.row = None
            self.col = random.randrange(self.dim_sq)
        elif self.mode == "box":
            self.row = random.randrange(0, self.dim_sq, self.dim)
            self.col = random.randrange(0, self.dim_sq, self.dim)
        else:  # self.mode == "presets"
            self.row = None
            self.col = None
        
        return self.row, self.col, self.mode
    
    def verify(self, revealed: List[Opening]) -> bool:
        """
        Victor receives the openings to the area that he selected.
        
        He checks if the area has the correct values:
        
        1. No duplicate or zero values in a row, column or box
        2. Consistent mapping of presets
           (Cells that were equal in presets are equal in the mapped presets, and vice versa for unequal cells)
           
        He checks if the openings match the commitments that Peggy sent.
        
        If everything checks out, he accepts. Otherwise he rejects.
        """
        shuffled_values = [opening.value().n for opening in revealed]
        
        if self.mode == "row":
            commitments = [self.commitments[self.row][col] for col in range(self.dim_sq)]
            if not self.presets.verify_area(shuffled_values):
                return False
        elif self.mode == "column":
            commitments = [self.commitments[row][self.col] for row in range(self.dim_sq)]
            if not self.presets.verify_area(shuffled_values):
                return False
        elif self.mode == "box":
            commitments = [self.commitments[self.row + row_offset][self.col + col_offset]
                           for row_offset in range(self.dim) for col_offset in range(self.dim)]
            if not self.presets.verify_area(shuffled_values):
                return False
        else:  # self.mode == "presets"
            commitments = [self.commitments[row][col]
                           for row in range(self.dim_sq) for col in range(self.dim_sq) if self.presets[row][col] > 0]
            if not self.presets.verify_shuffling(iter(shuffled_values)):
                return False
        
        return Opening.batch_verify(revealed, commitments)

# Run the proof

Let's see the proof in action.

Run the Python code below and see what happens. The outcome depends on the scenario you picked. The outcome is also randomly different each time. Feel free to run the code multiple times!

In [None]:
peggy = Peggy(secret_solution, public_puzzle)
victor = Victor(public_puzzle)

commitments = peggy.shuffle()
print(f"Commitments: {commitments}")
row, col, mode = victor.select(commitments)
print(f"Row: {row}, column: {col}, mode: {mode}")
revealed = peggy.reveal(row, col, mode)
print(f"Revealed: {revealed}")
print()

# Victor is convinced
if victor.verify(revealed):
    # Valid solution (good)
    if valid_dropdown.value:
        print("Convinced 👌 (expected)")
    # Invalid solution (evil)
    else:
        print("Convinced 👌 (Victor was fooled)")
# Victor is not convinced
else:
    # Valid solution (good)
    if valid_dropdown.value:
        print("Not convinced... 🤨 (Peggy was dumb)")
    # Invalid solution (evil)
    else:
        print("Not convinced... 🤨 (expected)")

# How the proof is complete

If Peggy knows a solution, then Victor will always be convinced by her proof.

Whichever area Victor wants to verify, Peggy can always provide a valid answer.

Let's run a couple of exchanges and see how they go.

In [None]:
n_exchanges_complete_slider = widgets.IntSlider(min=10, max=1000, value=10, step=10, description="#Exchanges")
n_exchanges_complete_slider

In [None]:
# Honest case
secret_solution2 = Board.random(dim_slider.value).solve()
public_puzzle2 = secret_solution2.to_puzzle()

honest_peggy = Peggy(secret_solution2, public_puzzle2)
victor = Victor(public_puzzle2)

peggy_success = 0

for _ in range(n_exchanges_complete_slider.value):
    commitments = honest_peggy.shuffle()
    row, col, mode = victor.select(commitments)
    revealed = honest_peggy.reveal(row, col, mode)

    if victor.verify(revealed):
        peggy_success += 1
        
peggy_success_rate = peggy_success / n_exchanges_complete_slider.value * 100

print(f"Running {n_exchanges_complete_slider.value} exchanges")
print(f"Honest Peggy wins {peggy_success_rate:0.2f}% of the time")
print()

assert peggy_success_rate == 100
print("Peggy always wins if she is honest")

# How the proof is sound

If Peggy knows a **fake** solution, then Victor can reject her proof.

Victor's chance of finding an error increases with the number of rounds.

Because of the commitments, Peggy cannot change the shuffled cell values later. She doesn't know which area Victor wants to verify when she makes these commitments, so she needs to prepare for all possible cases.

If there is a single error in the cell values inside the commitments, then Victor has a chance of finding it and exposing Peggy as a liar.
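
Victor's chance can be estimated with a short calculation. The sketch below assumes that exactly one cell of Peggy's solution is wrong, that this cell breaks exactly its row, its column and its box, and that Victor picks one of the four modes uniformly and then an area uniformly, as in `Victor.select` above. Under these assumptions one round catches the error with probability $\frac{3}{4 n^2}$, which is $\frac{1}{12}$ for $n = 3$. The function name `detection_probability` is made up for this example.

```python
def detection_probability(n: int, rounds: int) -> float:
    """Chance that Victor catches a single wrong cell within the given number of rounds."""
    # 3 of the 4 modes can hit the error, each with chance 1/n² of picking the right area
    per_round = 3 / (4 * n * n)
    # Victor fails only if every independent round misses the error
    return 1 - (1 - per_round) ** rounds


for rounds in (1, 5, 15, 50):
    print(f"n = 3, {rounds:2d} rounds: {detection_probability(3, rounds):.1%}")
```

The probability of missing the error shrinks exponentially with the number of rounds, which is why adding rounds makes Victor more confident.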

Let's run a couple of exchanges and see how they go.

In [None]:
n_exchanges_sound_slider = widgets.IntSlider(min=10, max=1000, value=10, step=10, description="#Exchanges")
n_rounds_slider = widgets.IntSlider(min=1, max=15, value=1, step=1, description="#Rounds")

display(n_exchanges_sound_slider)
display(n_rounds_slider)

In [None]:
# Lying case
secret_solution3 = Board.random(dim_slider.value).solve()
secret_solution3.falsify(1)
public_puzzle3 = secret_solution3.to_puzzle()

lying_peggy = Peggy(secret_solution3, public_puzzle3)
victor = Victor(public_puzzle3)

victor_success = 0

for _ in range(n_exchanges_sound_slider.value):
    for _ in range(n_rounds_slider.value):
        commitments = lying_peggy.shuffle()
        row, col, mode = victor.select(commitments)
        revealed = lying_peggy.reveal(row, col, mode)
    
        if not victor.verify(revealed):
            victor_success += 1
            break
            
victor_success_rate = victor_success / n_exchanges_sound_slider.value * 100

print(f"Running {n_exchanges_sound_slider.value} exchanges with {n_rounds_slider.value} rounds each")
print(f"Victor wins against lying Peggy {victor_success_rate:0.2f}% of the time")
print()

if victor_success_rate < 50:
    print("Victor loses quite often for a small number of rounds")
elif victor_success_rate < 90:
    print("Victor gains more confidence with each added round")
else:
    print("At some point it is basically impossible to fool Victor")

# How the proof is zero-knowledge

The proof itself looks like random noise. Nothing can be extracted from this noise.

Intuitively, everything that is sent over the wire is randomized: Peggy sends commitments which are pseudorandom elliptic curve points. Victor selects a random area. Peggy opens the selected commitments to values which are randomly shuffled.

We can also replicate the same distribution of transcripts from the public puzzle alone:

1. Randomly select the area to reveal.
1. Assign randomly shuffled values to the cells inside this area.
1. For each selected cell, commit to its assigned value.
1. For each other cell, commit to a random value.

The case where Victor verifies the presets is slightly more complicated. We have to randomly shuffle the numbers and create corresponding commitments. I skipped this part in the Python code because it is too much work.
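
The four simulation steps can be sketched in a self-contained way for the row, column and box cases. As an assumption for this illustration, salted SHA-256 hashes replace the notebook's elliptic-curve commitments, so the commitments are only computationally hiding. The names `commit` and `simulate_transcript` are hypothetical.

```python
import hashlib
import random
import secrets
from typing import Dict, List, Tuple

Cell = Tuple[int, int]


def commit(value: int, nonce: bytes) -> bytes:
    """Hash commitment: a stand-in for the notebook's curve-point commitments."""
    return hashlib.sha256(nonce + value.to_bytes(2, "big")).digest()


def simulate_transcript(n: int, rng: random.Random):
    """Produce a transcript without knowing any solution (presets mode omitted)."""
    side = n * n

    # 1. Randomly select the area to reveal
    mode = rng.choice(["row", "column", "box"])
    if mode == "row":
        row = rng.randrange(side)
        cells: List[Cell] = [(row, c) for c in range(side)]
    elif mode == "column":
        col = rng.randrange(side)
        cells = [(r, col) for r in range(side)]
    else:
        top = rng.randrange(0, side, n)
        left = rng.randrange(0, side, n)
        cells = [(top + i, left + j) for i in range(n) for j in range(n)]

    # 2. Assign randomly shuffled values to the cells inside this area
    values = list(range(1, side + 1))
    rng.shuffle(values)
    assigned: Dict[Cell, int] = dict(zip(cells, values))

    # 3. and 4. Commit to the assigned value, or to a random value elsewhere
    nonces = {(r, c): secrets.token_bytes(16) for r in range(side) for c in range(side)}
    commitments = [
        [
            commit(assigned[(r, c)] if (r, c) in assigned else rng.randrange(1, side + 1), nonces[(r, c)])
            for c in range(side)
        ]
        for r in range(side)
    ]
    revealed = [(assigned[cell], nonces[cell]) for cell in cells]
    return commitments, mode, cells, revealed


commitments, mode, cells, revealed = simulate_transcript(2, random.Random(7))
print(mode, [value for value, _ in revealed])
```

The simulator chooses the challenge before it commits, which a real prover cannot do. That is the only "unfair" advantage it needs.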

Let's run a chi-square test to see if the original transcripts are distinguishable from the fake transcripts.
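
For intuition, a two-sample chi-square statistic can be computed by hand from the bin counts of both sample sets. This is a minimal sketch and not the implementation behind `stats.chi_square_equal`; the function name `chi_square_statistic` is made up, and both sample lists are assumed to be non-empty.

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple


def chi_square_statistic(real: Sequence[Hashable], fake: Sequence[Hashable]) -> Tuple[float, int]:
    """Return the two-sample chi-square statistic and the number of bins."""
    real_counts = Counter(real)
    fake_counts = Counter(fake)
    bins = set(real_counts) | set(fake_counts)

    # Scaling factors that compensate for unequal sample sizes
    scale_real = (len(fake) / len(real)) ** 0.5
    scale_fake = (len(real) / len(fake)) ** 0.5

    statistic = 0.0
    for key in bins:
        r = real_counts[key]  # Counter returns 0 for missing bins
        f = fake_counts[key]
        statistic += (scale_real * r - scale_fake * f) ** 2 / (r + f)
    return statistic, len(bins)


same = ["a", "b", "b", "c"]
print(chi_square_statistic(same, same))
print(chi_square_statistic(["a"] * 10, ["b"] * 10))
```

A statistic close to zero means the two histograms agree. To turn it into a decision, it is compared against a critical value of the chi-square distribution, typically with one degree of freedom fewer than the number of bins.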

Because of the large number of commitments, **this experiment requires especially many samples (#transcripts)**. Make sure to choose a small problem size. The code also cheats by making the commitment openings more compact than they actually are.

In [None]:
n_transcripts_slider = widgets.IntSlider(min=1000, max=100000, value=10000, step=1000, description="#Transcripts")
n_transcripts_slider

In [None]:
from typing import Tuple
import local.stats as stats
from itertools import chain

# Make sure to run this for small grids (adjust the parameters)
# Large grids can lead to errors
# The list of critical chi-square values might be too short for the large number of degrees of freedom
# Large grids also require very many transcripts, which is slow

peggy = Peggy(secret_solution, public_puzzle)
victor = Victor(public_puzzle)

def real_transcript() -> Tuple:
    commitments = peggy.shuffle()
    row, col, mode = victor.select(commitments)
    revealed = peggy.reveal(row, col, mode)

    return CurvePoint.batch_serialize(chain(*commitments), compact=2)[0:3], row, col, mode, \
           Opening.batch_serialize(revealed, compact=2)[0:3]


dim = dim_slider.value
dim_sq = dim ** 2

def fake_transcript() -> Tuple:
    mode = random.choice(["row", "column", "box", "presets"])
    if mode == "row":
        row = random.randrange(dim_sq)
        col = None
        size_revealed = dim_sq
    elif mode == "column":
        row = None
        col = random.randrange(dim_sq)
        size_revealed = dim_sq
    elif mode == "box":
        row = random.randrange(0, dim_sq, dim)
        col = random.randrange(0, dim_sq, dim)
        size_revealed = dim_sq
    else:  # mode == "presets"
        # FIXME: Complicated
        row = None
        col = None
        size_revealed = sum(1 for r in range(dim_sq) for c in range(dim_sq) if public_puzzle[r][c] > 0)
    
    commitments = [[CurvePoint.random() for _ in range(dim_sq)] for _ in range(dim_sq)]
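    # Sketch: open size_revealed cells to values from 1 to dim_sq
    # Rows, columns and boxes get a random permutation; presets are only approximated (see FIXME above)
    if mode == "presets":
        values = random.choices(range(1, dim_sq + 1), k=size_revealed)
    else:
        values = random.sample(range(1, dim_sq + 1), size_revealed)
    revealed = [Opening(Scalar(value), ONE_POINT, punto_uno) for value in values]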
    
    return CurvePoint.batch_serialize(chain(*commitments), compact=2)[0:3], row, col, mode, \
           Opening.batch_serialize(revealed, compact=2)[0:3]

print("Real transcript: {}".format(real_transcript()))
print("Fake transcript: {}".format(fake_transcript()))
print()

real_samples = [real_transcript() for _ in range(n_transcripts_slider.value)]
fake_samples = [fake_transcript() for _ in range(n_transcripts_slider.value)]

# The chi-square test is only valid if most bins are filled
# Increase the number of transcripts if there are too many empty bins

null_hypothesis = stats.chi_square_equal(real_samples, fake_samples)
print()

if null_hypothesis:
    print("Real and fake transcripts are the same distribution.")
    print("Victor learns nothing 👌")
else:
    print("Real and fake transcripts are different distributions.")
    print("Victor might learn something 😧")

stats.plot_comparison(real_samples, fake_samples, "real", "fake")