# Graph Isomorphism

In this chapter we construct a zero-knowledge protocol around graph isomorphism.

This chapter is based on [a lecture from the Max Planck Institute for Informatics](https://resources.mpi-inf.mpg.de/departments/d1/teaching/ss13/gitcs/lecture9.pdf).

# What is a graph?

[A graph](https://en.wikipedia.org/wiki/Graph_(discrete_mathematics)) consists of nodes and edges. Nodes are points. Each edge is a bridge that connects two nodes.
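
As a minimal sketch, a small undirected graph can be written down with plain Python sets. The notebook itself works with `networkx`; the names `nodes`, `edges` and `neighbours` below are only illustrative and are not part of the workshop code.

```python
# A triangle (0, 1, 2) with one extra node (3) attached to node 2.
nodes = {0, 1, 2, 3}

# Each edge is a frozenset, so (0, 1) and (1, 0) describe the same edge.
edges = {frozenset(e) for e in [(0, 1), (1, 2), (2, 0), (2, 3)]}


def neighbours(node):
    """Return all nodes that share an edge with `node`."""
    return {other for edge in edges if node in edge for other in edge if other != node}


print(sorted(neighbours(2)))
```

Storing edges as unordered pairs reflects that the bridges in an undirected graph have no direction.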

# What is an isomorphism?

[Two graphs are isomorphic](https://en.wikipedia.org/wiki/Graph_isomorphism) if they have the same structure. By changing the names of the nodes of the first graph, we can obtain the second graph, and vice versa. There exists a translation of node names.

Given two large graphs, it can be hard to tell whether they are isomorphic. There is no known algorithm that decides this efficiently (in polynomial time) for all graphs.
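
To make the idea of a translation concrete, here is a brute-force sketch that tries every possible renaming of the nodes. The helper names `apply_mapping` and `find_translation` are hypothetical, and the approach is only meant as an illustration: it checks up to $n!$ renamings for $n$ nodes, which quickly becomes infeasible as the graphs grow.

```python
from itertools import permutations


def apply_mapping(mapping, edges):
    """Rename both endpoints of every edge according to `mapping`."""
    return {frozenset(mapping[node] for node in edge) for edge in edges}


def find_translation(nodes, edges_a, edges_b):
    """Try every renaming of the nodes; return one that turns A into B, or None."""
    ordered = sorted(nodes)
    for renamed in permutations(ordered):
        mapping = dict(zip(ordered, renamed))
        if apply_mapping(mapping, edges_a) == edges_b:
            return mapping
    return None


nodes = {0, 1, 2, 3}
path_a = {frozenset(e) for e in [(0, 1), (1, 2), (2, 3)]}  # path 0-1-2-3
path_b = {frozenset(e) for e in [(2, 0), (0, 3), (3, 1)]}  # path 2-0-3-1
star = {frozenset(e) for e in [(0, 1), (0, 2), (0, 3)]}    # node 0 in the centre

translation = find_translation(nodes, path_a, path_b)
print(translation)                            # some renaming that maps path_a to path_b
print(find_translation(nodes, path_a, star))  # a path and a star have different structure
```

The two paths are isomorphic, so a translation is found. The star has a node with three neighbours while the path does not, so no renaming can work and the search returns `None`.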

# What are we proving?

Peggy and Victor are engaged in an interactive proof.

There are two graphs.

Peggy claims to know a translation between the two graphs (that is, the graphs are isomorphic). She wants to prove this to Victor without revealing the translation.

Victor is sceptical and wants to see evidence. He wants to expose Peggy as a liar if the two graphs are structurally different (non-isomorphic).

Peggy wins if she convinces Victor. Victor wins by accepting only graphs that are isomorphic.

# Set up Jupyter

Run the following snippet to set up your Jupyter notebook for the workshop.

In [None]:
import os
import sys

# Add project root so we can import local modules
# (`sys.path.append` returns None, so resolve the path first)
root_dir = os.path.abspath("..")
if root_dir not in sys.path:
    sys.path.append(root_dir)

# Import here so cells don't depend on each other
from IPython.display import display
from typing import List, Tuple, Dict
import ipywidgets as widgets
import random
import networkx as nx
import matplotlib.pyplot as plt

from local.graph import Mapping, random_graph, non_isomorphic_graph
import local.stats as stats

# Select the scenario

Choose the good or the evil scenario. See how it affects the other cells further down.

1. **Peggy is honest** 😇 She knows a translation between both graphs. She wants to convince Victor of a true statement.
2. **Peggy is lying**  😈 She doesn't know a translation between both graphs! She tries to fool Victor into believing a false statement.

Also select the **size of the graphs**.

In [None]:
def generate_graphs(values: Dict):
    global graph1, graph2, from_1_to_2
    
    n_edges = n_nodes_slider.value
    graph1 = random_graph(n_nodes_slider.value, n_edges)

    if honest_dropdown.value:
        # Good: There is a translation between both graphs
        from_1_to_2 = Mapping.shuffle_graph(graph1)
        graph2 = from_1_to_2.apply_graph(graph1)
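    else:
        # Evil: Both graphs are non-isomorphic
        graph2 = non_isomorphic_graph(graph1)
        # Peggy has no real translation, so she makes one up
        # (this also avoids reusing a stale mapping from an earlier selection)
        from_1_to_2 = Mapping.shuffle_graph(graph1)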

honest_dropdown = widgets.Dropdown(
    options=[
        ("Peggy can translate 😇", True),
        ("Peggy cannot translate 😈", False)],
    value=True,
    description="Scenario:",
)
honest_dropdown.observe(generate_graphs, names="value")

n_nodes_slider = widgets.IntSlider(min=4, max=20, value=4, step=1, description="#Nodes")
n_nodes_slider.observe(generate_graphs, names="value")

# Generate default values
generate_graphs({})
# Display selection
display(honest_dropdown)
display(n_nodes_slider)

# Visualize your graphs

Visualize the graphs you generated.

In [None]:
print("Graph 1")
nx.draw(graph1, with_labels=True)
plt.show()

print("Graph 2")
nx.draw(graph2, with_labels=True)
plt.show()

# How the proof goes

1. Peggy randomly shuffles graph 1 to obtain graph $S$.
1. Peggy sends $S$ to Victor.
1. Victor randomly chooses graph 1 or graph 2 to obtain graph $C$, and tells Peggy his choice.
1. Peggy computes a translation $t$ from graph $C$ to $S$ and sends $t$ to Victor.
1. Victor checks that $t$ translates $C$ to $S$.

How Peggy computes $t$ depends on Victor's choice:

1. If Victor chooses graph 1, then $t$ is simply the shuffling from step 1. This translates graph 1 to $S$.
1. If Victor chooses graph 2, then $t$ translates graph 2 to graph 1 and then it applies the shuffling from step 1. This translates graph 2 to graph 1 to $S$.

If Peggy can translate between graph 1 and graph 2, then she can compute translations in both directions.
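
The two cases can be sketched with plain dictionaries as mappings. The functions `invert` and `and_then` below are standalone stand-ins named after the methods used on `Mapping` in this notebook; they are an assumption about what those methods do, not the actual implementation in `local.graph`.

```python
import random


def apply_mapping(mapping, edges):
    """Rename both endpoints of every edge according to `mapping`."""
    return {frozenset(mapping[node] for node in edge) for edge in edges}


def invert(mapping):
    """Reverse the direction of a translation."""
    return {new: old for old, new in mapping.items()}


def and_then(first, second):
    """Build the translation that applies `first` and then `second`."""
    return {node: second[first[node]] for node in first}


def random_mapping(nodes):
    """Pick a uniformly random renaming of the nodes."""
    shuffled = list(nodes)
    random.shuffle(shuffled)
    return dict(zip(nodes, shuffled))


nodes = [0, 1, 2, 3]
graph1 = {frozenset(e) for e in [(0, 1), (1, 2), (2, 3)]}
from_1_to_2 = {0: 2, 1: 0, 2: 3, 3: 1}       # Peggy's secret translation
graph2 = apply_mapping(from_1_to_2, graph1)

shuffle = random_mapping(nodes)              # step 1: Peggy's random shuffling
graph_s = apply_mapping(shuffle, graph1)     # graph S, sent to Victor

t_for_graph1 = shuffle                                  # Victor chose graph 1
t_for_graph2 = and_then(invert(from_1_to_2), shuffle)   # Victor chose graph 2

# Victor's check: t must translate the chosen graph C to S.
print(apply_mapping(t_for_graph1, graph1) == graph_s)
print(apply_mapping(t_for_graph2, graph2) == graph_s)
```

Both checks succeed for every random shuffling, because translating graph 2 back to graph 1 and then shuffling lands on the same graph $S$.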

In [None]:
class Peggy:
    def __init__(self, graph1: nx.Graph, from_1_to_2: Mapping):
        self._graph1 = graph1
        self._from_1_to_2 = from_1_to_2
    
    def shuffled_graph(self) -> nx.Graph:
        self._shuffle = Mapping.shuffle_graph(self._graph1)
        shuffled_graph1 = self._shuffle.apply_graph(self._graph1)
        return shuffled_graph1
    
    def respond(self, index: int) -> Mapping:
        if index == 0:
            return self._shuffle
        else:
            assert index == 1
            complex_shuffle = self._from_1_to_2.invert().and_then(self._shuffle)
            return complex_shuffle


class Victor:
    def __init__(self, graph1: nx.Graph, graph2: nx.Graph):
        self._graphs = [graph1, graph2]
        
    def random_index(self, shuffled_graph: nx.Graph) -> int:
        self._shuffled_graph = shuffled_graph
        self._index = random.randrange(0, 2)
        return self._index
    
    def verify(self, shuffle: Mapping) -> bool:
        another_shuffled_graph = shuffle.apply_graph(self._graphs[self._index])
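        # `self._shuffled_graph == another_shuffled_graph` compares object identity, not data.
        # Compare undirected edges as frozensets so (u, v) and (v, u) count as the same edge.
        expected_edges = {frozenset(edge) for edge in self._shuffled_graph.edges()}
        actual_edges = {frozenset(edge) for edge in another_shuffled_graph.edges()}
        return expected_edges == actual_edges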

# Run the proof

Let's see the proof in action.

Run the Python code below and see what happens.

The outcome depends on the scenario you picked. It also varies randomly from run to run.

Feel free to run the code multiple times!

In [None]:
peggy = Peggy(graph1, from_1_to_2)
victor = Victor(graph1, graph2)

shuffled_graph = peggy.shuffled_graph()
index = victor.random_index(shuffled_graph)
response_shuffle = peggy.respond(index)

if victor.verify(response_shuffle):
    if honest_dropdown.value:
        print("Victor is convinced 👌 (expected)")
    else:
        print("Victor is convinced 👌 (Victor was fooled)")
else:
    if honest_dropdown.value:
        print("Victor is not convinced... 🤨 (Peggy was dumb)")
    else:
        print("Victor is not convinced... 🤨 (expected)")

# How the proof is complete

If Peggy can translate between both graphs, then **Victor will always be convinced** by her proof.

This is because Peggy is always able to produce a translation from $C$ to $S$.

Let's run a couple of exchanges and see how they go.

In [None]:
n_exchanges_complete_slider = widgets.IntSlider(min=10, max=1000, value=10, step=10, description="#Exchanges")
n_exchanges_complete_slider

In [None]:
# Good scenario:
# There is a translation between both graphs
from_1_to_3 = Mapping.shuffle_graph(graph1)
graph3 = from_1_to_3.apply_graph(graph1)

honest_peggy = Peggy(graph1, from_1_to_3)
victor = Victor(graph1, graph3)

peggy_success = 0

for _ in range(n_exchanges_complete_slider.value):
    shuffled_graph = honest_peggy.shuffled_graph()
    index = victor.random_index(shuffled_graph)
    response_shuffle = honest_peggy.respond(index)

    if victor.verify(response_shuffle):
        peggy_success += 1
        
peggy_success_rate = peggy_success / n_exchanges_complete_slider.value * 100

print(f"Running {n_exchanges_complete_slider.value} exchanges.")
print(f"Honest Peggy wins {peggy_success_rate:0.2f}% of the time.")
print()

assert peggy_success_rate == 100
print("Peggy always wins if she is honest.")

# How the proof is sound

If Peggy cannot translate between both graphs, then **Victor has a chance to reject** her proof.

Assuming that Victor randomly chooses between both graphs, Peggy has a 50% chance to produce a correct translation. This is the case when Victor chooses graph 1. Peggy learned how to translate graph 1 to $S$ in step 1!

Peggy fails if Victor chooses graph 2, because translating from graph 2 to graph 1 requires Peggy to know the translation in the first place. This case occurs 50% of the time. The probabilities don't look great for Victor.

We can increase Victor's confidence by running the protocol for **multiple rounds**. This means Peggy randomly shuffles and Victor randomly selects a graph multiple times. Each time, Peggy has to produce a translation from $C$ to $S$. Victor accepts only if Peggy answers correctly **every** time. He rejects if Peggy answers incorrectly **even once**.

The chance that Peggy answers correctly for $n$ rounds, without being able to translate between both graphs, is $\left(\frac{1}{2}\right)^n$. This decreases exponentially in $n$ and becomes tiny! If Peggy answers correctly in all $n$ rounds, then Victor can be confident that she didn't cheat.
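
As a quick illustration of how fast this probability shrinks, the sketch below evaluates $\left(\frac{1}{2}\right)^n$ for a few round counts. The function name `cheating_probability` is made up for this example.

```python
def cheating_probability(n_rounds: int) -> float:
    """Chance that a Peggy who cannot translate survives `n_rounds` rounds."""
    return 0.5 ** n_rounds


for n_rounds in (1, 5, 10, 20):
    print(f"{n_rounds:2d} rounds: {cheating_probability(n_rounds):.7f}")
```

After 10 rounds the chance is $\frac{1}{1024}$, and after 20 rounds it is below one in a million.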

Let's run a couple of exchanges and see how they go.

In [None]:
n_exchanges_sound_slider = widgets.IntSlider(min=10, max=1000, value=10, step=10, description="#Exchanges")
n_rounds_slider = widgets.IntSlider(min=1, max=10, value=1, step=1, description="#Rounds")

display(n_exchanges_sound_slider)
display(n_rounds_slider)

In [None]:
# Evil scenario:
# Both graphs are non-isomorphic
graph4 = non_isomorphic_graph(graph1)
from_1_to_4 = Mapping.shuffle_graph(graph1)

lying_peggy = Peggy(graph1, from_1_to_4)
victor = Victor(graph1, graph4)

victor_success = 0

for _ in range(n_exchanges_sound_slider.value):
    for _ in range(n_rounds_slider.value):
        shuffled_graph = lying_peggy.shuffled_graph()
        index = victor.random_index(shuffled_graph)
        response_shuffle = lying_peggy.respond(index)
    
        if not victor.verify(response_shuffle):
            victor_success += 1
            break
            
victor_success_rate = victor_success / n_exchanges_sound_slider.value * 100

print(f"Running {n_exchanges_sound_slider.value} exchanges with {n_rounds_slider.value} rounds each.")
print(f"Victor wins against lying Peggy {victor_success_rate:0.2f}% of the time.")
print()

if victor_success_rate < 50:
    print("Victor loses quite often for a small number of rounds.")
elif victor_success_rate < 90:
    print("Victor gains more confidence with each added round.")
else:
    print("At some point it is basically impossible to fool Victor.")

# How the proof is zero-knowledge

The proof itself looks like random noise. Nothing can be extracted from this noise.

Everything that is sent over the wire is randomized:

1. Peggy sends a randomly shuffled graph.
1. Victor sends a random index.
1. Peggy sends a translation which includes the random shuffling from step 1. This looks like a random mapping.

We can replicate this pattern without knowing the translation between the graphs, which gives us fake transcripts:

1. Compute a random index (0 or 1).
1. Randomly shuffle the graph at the index.
1. Take the shuffling from step 2 as the translation.

Victor verifies that $t$ translates $C$ to $S$.

In the fake transcripts, $C$ is the graph at the index from step 1. $S$ is the result of step 2. By construction, $t$ from step 3 translates $C$ to $S$.
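
For a very small graph, the claim can be checked exactly instead of statistically. The sketch below enumerates every shuffling of a three-node graph and both values of the index, then counts the resulting real and fake transcripts. It assumes that Victor picks his index uniformly at random, and it uses plain dictionaries rather than the notebook's `Mapping` class; all names are illustrative.

```python
from collections import Counter
from itertools import permutations


def apply_mapping(mapping, edges):
    """Rename both endpoints of every edge; the result is hashable."""
    return frozenset(frozenset(mapping[node] for node in edge) for edge in edges)


def freeze(mapping):
    """Turn a dict into a hashable value so transcripts can be counted."""
    return tuple(sorted(mapping.items()))


nodes = [0, 1, 2]
graph1 = frozenset({frozenset((0, 1)), frozenset((1, 2))})
from_1_to_2 = {0: 1, 1: 2, 2: 0}                      # Peggy's secret
from_2_to_1 = {new: old for old, new in from_1_to_2.items()}
graph2 = apply_mapping(from_1_to_2, graph1)
graphs = [graph1, graph2]

all_shuffles = [dict(zip(nodes, p)) for p in permutations(nodes)]

real = Counter()
fake = Counter()
for index in (0, 1):
    for shuffle in all_shuffles:
        # Real transcript: S is the shuffled graph 1, t depends on the index.
        graph_s = apply_mapping(shuffle, graph1)
        if index == 0:
            t = shuffle
        else:
            t = {node: shuffle[from_2_to_1[node]] for node in nodes}
        real[(graph_s, index, freeze(t))] += 1

        # Fake transcript: shuffle the graph at the index and use the shuffle as t.
        fake[(apply_mapping(shuffle, graphs[index]), index, freeze(shuffle))] += 1

print(real == fake)  # True: every transcript occurs equally often in both
```

The fake transcripts are produced without ever using `from_1_to_2`, yet they are distributed exactly like the real ones.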

Let's run a chi-square test to see if the original transcripts are distinguishable from the fake transcripts.

**Try small graphs first!** They require fewer samples than large graphs.

In [None]:
n_transcripts_slider = widgets.IntSlider(min=1000, max=50000, value=10000, step=1000, description="#Transcripts")
n_transcripts_slider

In [None]:
peggy = Peggy(graph1, from_1_to_2)
victor = Victor(graph1, graph2)

def real_transcript() -> Tuple:
    shuffled_graph = peggy.shuffled_graph()
    index = victor.random_index(shuffled_graph)
    response_shuffle = peggy.respond(index)

    return tuple(shuffled_graph.edges()), index, tuple(response_shuffle)


def fake_transcript() -> Tuple:
    index = random.randrange(0, 2)

    if index == 0:
        shuffle = Mapping.shuffle_graph(graph1)
        shuffled_graph = shuffle.apply_graph(graph1)
        response_shuffle = shuffle
    else:
        shuffle = Mapping.shuffle_graph(graph2)
        shuffled_graph = shuffle.apply_graph(graph2)
        response_shuffle = shuffle

    return tuple(shuffled_graph.edges()), index, tuple(response_shuffle)


real_samples = [real_transcript() for _ in range(n_transcripts_slider.value)]
fake_samples = [fake_transcript() for _ in range(n_transcripts_slider.value)]

null_hypothesis = stats.chi_square_equal(real_samples, fake_samples)
print()

if null_hypothesis:
    print("Real and fake transcripts are statistically indistinguishable.")
    print("Victor learns nothing 👌")
else:
    print("Real and fake transcripts are different distributions.")
    print("Victor might learn something 😧")

stats.plot_comparison(real_samples, fake_samples, "real", "fake")