# Interactive proofs

In this chapter we explain what interactive proofs are.

Virtually every part of the workshop depends on this concept, so this chapter is a must read.

Read the TL;DR part if you are short on time. Read the rest for more info.

# TL;DR

An interactive proof is a game between two players. We call them Peggy and Victor.

Peggy wants to convince Victor of a statement.

Victor isn't gullible, so he wants to see evidence before accepting the statement as true.

Peggy wins if she can convince Victor. Victor wins if he accepts true statements and rejects false statements.

In the good case, Peggy is honest 😇 Her statement is true. Victor wins if he accepts, which means Peggy also wins. Everybody is happy 😊

In the evil case, Peggy is lying 😈 Her statement is false. The winning conditions are asymmetric, so only one party can win this time. If Victor exposes the lie, he wins and Peggy loses. If Peggy fools Victor into accepting, she wins and he loses. This feels like a zero-sum game 😕

Interactive proofs are designed to make the evil case expensive for Peggy. She is motivated to be honest and everybody wins.

# Jupyter setup

Run the following snippet to set up your Jupyter notebook for the workshop.

In [None]:
import sys

# Add project root so we can import local modules
root_dir = ".."
sys.path.append(root_dir)

# Import here so cells don't depend on each other
from typing import List, Tuple
from local.primes import is_prime, euler_totient, get_coprime
from functools import reduce
import random

# Statements

A statement is a fact that can be **true or false**.

"The output of function $f$ on input $x$" is not a statement, because the output $f(x)$ can be many different things. Meanwhile "the output of function $f$ on input $x$ equals $y$" is a statement, because we can check if the equation $f(x) = y$ holds true.

Here are more examples:

"This number is prime"

"This Sudoku has a solution"

"You and I buy this commodity for the same price"

"I am a member of this club"

"This Bitcoin transaction respects consensus"

# Non-Statements

Statements must be **precise**. There is no room for interpretation.

Here are some examples of non-statements:

"This number is large" _(what does "large" mean?)_

"This Sudoku is hard" _(what does "hard" mean?)_

"You and I are expert traders" _(what does "expert" mean?)_

"I am the coolest guy in this club" _(for sure, man)_

"Bitcoin is sound money" _(this might be true, but it is imprecise; we cannot prove it mathematically)_

# Statements about knowledge

Statements can talk about **knowledge of something**. We will make this mathematically precise when talking about zero-knowledge. For now, appreciate how useful these kinds of statements are. You encounter them every day.

Here are some examples:

"I know the prime factorization of this number"

"I know a solution to this Sudoku"

"I know the password to this account"

"I know the secret key that unlocks these bitcoin"

# Prime factorization

We will prove the statement "I know the prime factorization of this number".

We only prove that we **know** the factorization; we do not **reveal** the factorization. Not revealing the factorization makes the proof more compact and it will enable us to write a zero-knowledge proof later.

First, let's generate a composite number and its prime factorization.

In [None]:
# Rerun this cell to generate a different composite number

primes = [n for n in range(100) if is_prime(n)]

n_factors = 2
factors = random.sample(primes, n_factors)
composite = reduce(lambda x, y: x * y, factors)

print("{} = {}".format(composite, " * ".join(map(str, factors))))

# Meet Peggy 😸

Our interactive proof starts with Peggy.

She knows (or thinks she knows) that a given statement is true. She has some proof or reason to believe so. She wants to convince Victor that the statement is true.

She exchanges messages with Victor.

# Meet Victor 🧐

Victor listens to what Peggy has to say.

He is a critical thinker who wants to see evidence before believing in her claim. He knows that Peggy might be lying or she might not know what she is talking about.

He challenges Peggy with some questions to see if she can answer them. If she can answer enough questions then Victor has confidence that the statement is true.

# Peggy's first protocol

Peggy constructs a protocol to convince Victor that she knows the prime factorization of some number.

She simply sends Victor the factorization in the clear!

In [None]:
class Peggy1:
    def __init__(self, factors: List[int]):
        self.factors = factors
        
    def reveal_factors(self) -> List[int]:
        return self.factors
    
class Victor1:
    def __init__(self, composite: int):
        self.composite = composite
        
    def verify(self, factors: List[int]) -> bool:
        return all(is_prime(x) for x in factors) and self.composite == reduce(lambda x, y: x * y, factors)


peggy = Peggy1(factors)
victor = Victor1(composite)
factors = peggy.reveal_factors()

if victor.verify(factors):
    print("Victor is convinced 👌")
else:
    print("Victor is not convinced 🤨")

How do both parties like this protocol?

Peggy feels so-so 😕 Victor feels so-so 😕

What went wrong?

**The proof is too long!** Peggy sent the entire factorization. The longer the factorization, the longer the proof ❌

**The interactive proof should be more compact than simply sending all the data.** Ideally, the proof is logarithmic in size with respect to the problem size. We also want to keep Victor's work logarithmic. Peggy does the heavy lifting and Victor verifies that she did it correctly.

# Peggy's compact protocol

Peggy revises her protocol and makes the proof more compact.

Instead of sending the factorization, now Peggy sends nothing at all. Victor simply accepts. The proof is of constant size!

In [None]:
class Peggy2:
    def __init__(self, factors: List[int]):
        self.factors = factors
    
    def do_nothing(self) -> None:
        return None
    
class Victor2:
    def __init__(self, composite: int):
        self.composite = composite
    
    def verify(self, nothing: None) -> bool:
        return True


peggy = Peggy2(factors)
victor = Victor2(composite)
nothing = peggy.do_nothing()

if victor.verify(nothing):
    print("Victor is convinced 👌")
else:
    print("Victor is not convinced 🤨")

How do both parties like this protocol?

Peggy is happy 😊 Victor is angry 😡

What went wrong?

**Victor is completely gullible!** He unconditionally accepts anything that Peggy tells him ❌

In an interactive proof, **Victor should have a fair chance to expose Peggy if she is lying.** This is called **soundness**.

# Victor's fair protocol

Victor is fed up with Peggy. He takes charge and creates his own protocol.

He wants to expose Peggy whenever she is lying. Since it is hard to know when she is lying, he simply rejects everything. The proof is still constant size and fair to Victor!

In [None]:
class Peggy3:
    def __init__(self, factors: List[int]):
        self.factors = factors
    
    def do_nothing(self) -> None:
        return None
    
class Victor3:
    def __init__(self, composite: int):
        self.composite = composite
    
    def verify(self, nothing: None) -> bool:
        return False

    
peggy = Peggy3(factors)
victor = Victor3(composite)
nothing = peggy.do_nothing()

if victor.verify(nothing):
    print("Victor is convinced 👌")
else:
    print("Victor is not convinced 🤨")

How do both parties like this protocol?

Peggy is angry 😡 Victor is happy 😊

What went wrong?

**Peggy can never convince Victor!** He unconditionally rejects anything, even when Peggy tells the truth ❌

In an interactive proof, **Peggy should have a fair chance of convincing Victor if she is honest.** This is called **completeness**.
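Both fairness requirements seen so far, soundness and completeness, can be spot-checked in code. The sketch below is an illustration under assumptions: the three verifier functions are simplified stand-ins for `Victor1`, `Victor2` and `Victor3`, all helper names are made up for this example, and testing one true and one false claim is only a smoke test, not a proof of either property.

```python
from functools import reduce
from math import isqrt
from typing import Callable, List

def is_prime_toy(n: int) -> bool:
    # Trial division; good enough for the tiny numbers used here
    return n >= 2 and all(n % d != 0 for d in range(2, isqrt(n) + 1))

def check_factors(composite: int, factors: List[int]) -> bool:
    # Stand-in for Victor1: verify the revealed factorization
    product = reduce(lambda x, y: x * y, factors, 1)
    return all(is_prime_toy(x) for x in factors) and product == composite

def accept_all(composite: int, factors: List[int]) -> bool:
    # Stand-in for Victor2: ignores the message and accepts
    return True

def reject_all(composite: int, factors: List[int]) -> bool:
    # Stand-in for Victor3: ignores the message and rejects
    return False

Verifier = Callable[[int, List[int]], bool]

def accepts_honest_claim(verify: Verifier) -> bool:
    # Completeness spot check: 15 = 3 * 5 is a true claim
    return verify(15, [3, 5])

def rejects_false_claim(verify: Verifier) -> bool:
    # Soundness spot check: 15 = 2 * 7 is a false claim
    return not verify(15, [2, 7])

for verify in (check_factors, accept_all, reject_all):
    print(verify.__name__, accepts_honest_claim(verify), rejects_false_claim(verify))
```

Only the first verifier passes both checks, but it pays for that with a long proof. The accept-all verifier fails the soundness check and the reject-all verifier fails the completeness check.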

# Peggy's fair protocol

Peggy creates an alternative protocol that is fair to both parties.

She wants to convince Victor when she is honest. On the other hand, Victor wants to expose any lies. The proof should be compact. These requirements need to be reconciled.

Peggy sends Victor an integer $a$ that is coprime to $N$, together with $p = \varphi(N)$, the value of [Euler's totient function](https://en.wikipedia.org/wiki/Euler%27s_totient_function) at $N$.

Victor accepts if $a^{bp} \equiv 1 \pmod N$ holds for a random integer $b$ with $1 \leq b < N$.

# Why it works

Peggy sends two integers, which is constant size, so the proof is compact ✅

The rest makes use of some clever group theory ✨

The integers coprime to $N$ (taken modulo $N$) form a multiplicative group. The order of this group is equal to $\varphi(N)$. If we raise any element $a$ to the group order $\varphi(N)$, then we end up at the identity 1 (this is Euler's theorem). So for $p = \varphi(N)$ we get $a^{bp} \equiv 1 \pmod N$ for all $b \in \mathbb{N}$. _This is less advanced than it seems, although I will not go into the nitty gritty details._

If Peggy is honest, then both sides of Victor's equation will be equal and he will accept ✅

If Peggy is dishonest, then she cannot compute $\varphi(N)$ because she doesn't know the (entire) factorization. So $p \neq \varphi(N)$. In that case, $a^{bp} \bmod N$ is likely to be different from 1, and Victor rejects ✅ _No formal proof here._
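The honest and dishonest cases can be tried out exhaustively on a toy modulus. The sketch below is an illustration, not part of the workshop code: `totient` and `accepts` are hypothetical helpers, the modulus 77 and the base `a = 2` are arbitrary choices, and the totient is computed by brute force so the block does not depend on `local.primes`.

```python
from math import gcd

def totient(n: int) -> int:
    # Brute-force Euler totient: count 1 <= k <= n that are coprime to n
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

def accepts(a: int, p: int, b: int, n: int) -> bool:
    # Victor's check from the protocol: a^(b*p) == 1 (mod n)
    return pow(a, b * p, n) == 1

n = 77            # toy modulus, 7 * 11
phi = totient(n)  # the value an honest Peggy would send

# Honest case: every coprime a and every challenge b is accepted
honest_ok = all(
    accepts(a, phi, b, n)
    for a in range(1, n) if gcd(a, n) == 1
    for b in range(1, n)
)

# Dishonest case: fix a = 2, try every wrong p with every challenge b
a = 2
wrong = [(p, b) for p in range(1, n) if p != phi for b in range(1, n)]
fooled = sum(accepts(a, p, b, n) for p, b in wrong)

print("honest Peggy always accepted:", honest_ok)
print("wrong (p, b) pairs accepted:", fooled, "of", len(wrong))
```

Some wrong values do slip through. For `a = 2` and `n = 77`, the value `p = 30` is accepted for every challenge, because 30 is the order of 2 modulo 77. How rare such accidents are depends on the choice of $a$, which is one reason the "likely" above is not a formal soundness claim.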

In [None]:
class Peggy4:
    def __init__(self, factors: List[int]):
        self.factors = factors
    
    def element_euler_totient(self) -> Tuple[int, int]:
        a = get_coprime(self.factors)
        p = euler_totient(self.factors)
        return a, p
    
class Victor4:
    def __init__(self, composite: int):
        self.composite = composite
    
    def verify(self, a: int, p: int) -> bool:
        b = random.randrange(1, self.composite)  # random challenge with 1 <= b < N
        return pow(a, b * p, self.composite) == 1


peggy = Peggy4(factors)
victor = Victor4(composite)
a, p = peggy.element_euler_totient()

if victor.verify(a, p):
    print("Victor is convinced 👌")
else:
    print("Victor is not convinced 🤨")

How do both parties like this protocol?

Peggy is happy 😊 Victor is happy 😊

This looks pretty good. Is there anything left that we could improve?

Peggy leaks $\varphi(N)$ to Victor, which is hard to compute without knowing the factorization of $N$. **Peggy leaks part of her knowledge to Victor.** In the next sections we will formalize "knowledge" and introduce zero-knowledge proofs that don't leak any knowledge.
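To see how much is leaked, consider the two-prime case used in this chapter. If $N = pq$ with distinct primes, then $\varphi(N) = (p - 1)(q - 1) = N - (p + q) + 1$, so $N$ and $\varphi(N)$ together reveal $p + q$, and the factors follow from a quadratic equation. The function name below is made up for this sketch, and the two-prime assumption is essential.

```python
from math import isqrt
from typing import Tuple

def factors_from_totient(n: int, phi: int) -> Tuple[int, int]:
    # Assumes n = p * q for two distinct primes p < q
    s = n - phi + 1           # s = p + q
    d = isqrt(s * s - 4 * n)  # d = q - p, since (q - p)^2 = (p + q)^2 - 4pq
    p, q = (s - d) // 2, (s + d) // 2
    if p * q != n:
        raise ValueError("phi is not the totient of a two-prime n")
    return p, q

print(factors_from_totient(77, 60))  # (7, 11)
```

In this setting, whoever sees $\varphi(N)$ learns the whole factorization, not just a part of it.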

# What humans "know"

[Human knowledge](https://en.wikipedia.org/wiki/Epistemology) is a highly debated topic, but we will not go into deep philosophical debates here.

Humans learn from sources of knowledge such as perception, intuition, reasoning and revelation. Some of our knowledge is already "built-in" when we are born, some is learned during our lifetime.

The details don't matter right now. We start with some knowledge (it could be nothing) and we add more knowledge to this pool. We can also remove knowledge from this pool by forgetting it.


# What algorithms "know"

[We can apply this model to algorithms](https://arxiv.org/pdf/1108.1791.pdf):

An algorithm starts with some intrinsic knowledge. It adds more knowledge by reasoning or by interacting with its environment.

The "intrinsic knowledge" consists of the initial parameters of the algorithm.

"Interactions with the environment" are function calls to third parties.

"Reasoning" means **efficient** computations on the existing knowledge base. Efficient computations are carried out in polynomial time. This forces the algorithm to make "smooth" derivations based on what it already knows. If we allowed exponential computations, then the algorithm would have an almost divine sense of why certain things are true.

# Polynomial computations

[Polynomial computations](https://en.wikipedia.org/wiki/P_(complexity)) are seen as easy / feasible / efficient. Finding a path between two nodes in a graph is easy. Verifying the solution to a sudoku is easy. Verifying the integrity of a Bitcoin block is easy.

Humans and computers have no problem solving these kinds of problems.

I am sure you have solved many such problems in your lifetime.
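As a small illustration of an easy problem, here is a sketch of path finding with breadth-first search. The graph and the function name are made up for this example. The search touches every node and every edge at most once, so its running time grows polynomially (in fact linearly) with the size of the graph.

```python
from collections import deque
from typing import Dict, List, Optional

def find_path(graph: Dict[str, List[str]], start: str, goal: str) -> Optional[List[str]]:
    # Breadth-first search; parents remembers how each node was reached
    parents: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk back from the goal to the start, then reverse
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in graph.get(node, []):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None  # goal is not reachable from start

graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"], "E": []}
print(find_path(graph, "A", "E"))  # ['A', 'B', 'D', 'E']
```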

# Exponential computations

The opposite are [exponential computations](https://en.wikipedia.org/wiki/NP_(complexity)) which are seen as hard / infeasible / inefficient. Finding the shortest path that visits all nodes in a graph exactly once is hard. Solving a sudoku is hard. Mining a Bitcoin block is hard.

Both humans and computers struggle to solve these problems.

Have you ever tried to mine a Bitcoin block with pen and paper?

# Polynomial / exponential gap

An algorithm can learn polynomially, but not exponentially. Exponential learning would be cheating. Not only would the algorithm be a genius, but it would be a genius who sits there computing for astronomical time spans to learn a single piece of knowledge! When the algorithm has finished, the universe will long have collapsed!

Exponential learning is not a good model for computational knowledge.
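The gap can be felt with the factoring example from earlier in this chapter. The sketch below compares verifying a factorization (one multiplication) with finding one by trial division. Both function names are made up, and trial division is a deliberately naive stand-in: faster factoring algorithms exist, and factoring is believed, not proven, to be hard. The number of trial divisions grows roughly with $\sqrt{N}$, which is exponential in the number of digits of $N$.

```python
from typing import Tuple

def verify_factorization(n: int, p: int, q: int) -> bool:
    # Verifying a claimed factorization: a single multiplication
    return p * q == n

def find_factor(n: int) -> Tuple[int, int]:
    # Finding a factor by trial division
    # Returns (smallest factor, number of divisions tried)
    steps = 0
    d = 2
    while d * d <= n:
        steps += 1
        if n % d == 0:
            return d, steps
        d += 1
    return n, steps  # no divisor found, so n is prime

for p, q in [(7, 11), (101, 103), (1009, 1013)]:
    n = p * q
    factor, steps = find_factor(n)
    print(n, verify_factorization(n, p, q), factor, steps)
```

Each time the factors gain one digit, verification stays a single multiplication while the search takes about ten times as many steps.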

# Peggy's private protocol

Peggy went [online](https://www.zkdocs.com/docs/zkdocs/zero-knowledge-protocols/short-factoring-proofs/) to find a factoring proof that doesn't leak any knowledge. (Victor doesn't care about Peggy's privacy.)

This time there is a blinding factor $r$ that hides $\varphi(N)$ from Victor.

Peggy sends two integers $a = b^r$ and $b$ to Victor. Victor challenges Peggy with a random integer $e$. Peggy computes a value $c$ such that the equation $a \equiv b^{c - Ne} \mod N$ holds. Victor accepts if the equation holds.

# Why it works

Four integers are still constant size ✅

The integers $a$ and $b$ are elements of a multiplicative group whose order is $\varphi(N)$. The integer $c$ is defined in such a way that the exponent $c - Ne$ simplifies to $r + \varphi(N)e$. Because $\varphi(N)$ is the group order, we have $b^r \equiv b^{r + \varphi(N)e} \mod N$.
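Spelled out, and assuming Peggy's response is $c = r + (N + \varphi(N))e$ as in the `respond` method of the code for this protocol, Victor's check unfolds as follows:

$$
\begin{aligned}
c - Ne &= r + (N + \varphi(N))e - Ne = r + \varphi(N)e \\
b^{c - Ne} &= b^{r} \cdot \left(b^{\varphi(N)}\right)^{e} \equiv b^{r} \cdot 1^{e} = b^{r} \equiv a \pmod N
\end{aligned}
$$

The step $b^{\varphi(N)} \equiv 1 \pmod N$ is the same group-order fact used in the previous protocol, and it requires $b$ to be coprime to $N$.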

The equation holds and Victor accepts if Peggy is honest ✅

The equation likely doesn't hold and Victor rejects if Peggy is dishonest ✅

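Because Victor sees values that are blinded by $r$, they look pseudorandom to him, provided $r$ is drawn from a range much larger than $\varphi(N)e$. In that case he has no practical way to derive $\varphi(N)$ from what he saw ✅ The toy code below draws $r$ from a small range to keep the numbers readable, so its blinding is weaker than in a real deployment.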

In [None]:
class Peggy5:
    def __init__(self, composite: int, factors: List[int]):
        self.composite = composite
        self.factors = factors
    
    def elements(self) -> Tuple[int, int]:
        self.r = random.randrange(self.composite)
        b = get_coprime(self.factors)
        a = pow(b, self.r, self.composite)
        return a, b
    
    def respond(self, e: int) -> int:
        p = euler_totient(self.factors)
        c = self.r + (self.composite + p) * e
        return c
    
class Victor5:
    def __init__(self, composite: int):
        self.composite = composite
        
    def challenge(self, a: int, b: int) -> int:
        self.a = a
        self.b = b
        self.e = random.randrange(self.composite)
        return self.e
    
    def verify(self, c: int) -> bool:
        return self.a == pow(self.b, c - self.composite * self.e, self.composite)


peggy = Peggy5(composite, factors)
victor = Victor5(composite)
a, b = peggy.elements()
e = victor.challenge(a, b)
c = peggy.respond(e)

if victor.verify(c):
    print("Victor is convinced 👌")
else:
    print("Victor is not convinced 🤨")

# Zero knowledge

**An interactive proof is zero-knowledge if Peggy doesn't leak any knowledge to Victor.**

This seems impossible since Victor sees the messages that Peggy sends him. Doesn't that necessarily leak knowledge?

Let's rephrase the definition: **Victor doesn't learn anything that he doesn't already know.**

# Victor learns from himself

Victor knows what he can compute in polynomial time. Assume that he learns $X$ during his exchange with Peggy. We prove that Victor already knew $X$ **at the start** because he could have computed it in polynomial time! This contradicts our assumption that Victor learned $X$ **after the start**. Therefore, Victor cannot learn anything. Therefore, the interactive proof is zero-knowledge.

How does Victor compute $X$? According to information theory, $X$ must come from the exchanged messages. We call these messages the "transcript". We show that Victor can compute the transcript before he starts talking to Peggy!

Imagine Peggy and Victor are sitting in an exam. Victor copies the answers from Peggy's sheet. What he doesn't know: Peggy memorized the answers from Victor's notes! In the end, Victor copies from himself. If we generously assume that Victor doesn't forget what he wrote in his notes, then he already knows the answers that he is copying!

# Victor learns from the distribution that he computed

Interactive proofs are probabilistic, so the situation is slightly more complicated.

The real transcripts between Peggy and Victor form a probability distribution. We show how Victor can compute random **fake transcripts that follow the same distribution**. This happens without any contact with Peggy.

Any knowledge that Victor extracts from the real transcripts forms a distribution. The extraction must work all the time, not just once, so there is a polynomial-time extraction algorithm. Victor can apply the same algorithm on the fake transcripts, which will form the **same distribution of extracted knowledge**.

The real transcripts don't provide anything new to Victor. He already knows anything that can be extracted from the fake transcripts. This is the same as what can be extracted from the real transcripts. So **Victor doesn't learn anything he doesn't already know**.

Imagine Peggy and Victor in an exam. Random questions are asked. Victor copies from Peggy. Peggy memorized the answers from Victor's notes, which include answers to all possible questions (how handy). The answers that Victor copies follow some distribution. There is a way for Victor to shortcut the process: Copy the answers directly from his notes. Anything that Victor copied, he copied from himself. He learned nothing new!
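The idea of fake transcripts can be sketched for the private factoring protocol from this chapter. The code below is a rough illustration under assumptions: `fake_transcript` and `verify` are made-up names, Victor is assumed to pick his challenge $e$ honestly at random, and the fake exponent is drawn from a small range. It shows that Victor can produce accepting transcripts $(a, b, e, c)$ without the factorization. It does not show that they follow the same distribution as real ones. With these toy ranges they do not, because the real value $c - Ne = r + \varphi(N)e$ is usually much larger than the fake one. Matching the distributions requires a much larger range for $r$.

```python
import math
import random
from typing import Tuple

def fake_transcript(composite: int, rng: random.Random) -> Tuple[int, int, int, int]:
    # Pick b coprime to N; this needs only gcd, not the factorization
    while True:
        b = rng.randrange(2, composite)
        if math.gcd(b, composite) == 1:
            break
    e = rng.randrange(composite)  # Victor chooses his own challenge up front
    s = rng.randrange(composite)  # stands in for the exponent r + phi(N) * e
    a = pow(b, s, composite)      # a is chosen so that the check will pass
    c = s + composite * e         # then c - N * e equals s
    return a, b, e, c

def verify(composite: int, a: int, b: int, e: int, c: int) -> bool:
    # Same equation that Victor checks in the private protocol
    return a == pow(b, c - composite * e, composite)

rng = random.Random(0)
transcripts = [fake_transcript(77, rng) for _ in range(5)]
print(all(verify(77, *t) for t in transcripts))
```

The trick is the order of the steps: the faker fixes the challenge and the response first and only then computes the first message $a$, which the real Peggy cannot do because she must commit to $a$ before she sees $e$.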