In [1]:
 %load_ext autoreload


# <center><h1>From ZK to Bulletproofs</h1></center>

# I Motivation

**Prove that some condition holds true, without revealing anything else, but even better - do it using only a tiny amount of data!**

# II Commitments

Cryptographic commitments are usually used in a situation **where you want to promise something is true before later proving it to be true**.

Commitments **are only possible because of the existence of one-way functions.**

E.g. if Alice wants to commit to the value "2", she can send Bob its hash (53c234e5e8472b6ac51c1ae1cab3fe06fad053beb8ebfd8977b010655bfdd3c3); the hash is the commitment to the value "2".



If Alice always sends the same commitment for the value "2", the scheme lacks **semantic security**: anyone who has seen that commitment once will recognise it again. So instead of committing to "2" alone, Alice can commit to "2" plus some random data (a salt). The commitment then looks like $\mathcal{H}(secret||salt)$.
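A salted hash commitment can be sketched in a few lines. This is only a minimal illustration, assuming SHA256 as $\mathcal{H}$ and a 32-byte salt; the function names `commit` and `verify` are made up for this example.

```python
import hashlib
import secrets


def commit(secret: bytes) -> tuple:
    """Return (commitment, salt) for H(secret || salt) with a fresh 32-byte salt."""
    salt = secrets.token_bytes(32)
    return hashlib.sha256(secret + salt).hexdigest(), salt


def verify(commitment: str, secret: bytes, salt: bytes) -> bool:
    """Check an opening (secret, salt) against a previously sent commitment."""
    return hashlib.sha256(secret + salt).hexdigest() == commitment


# Committing to the same value twice gives two unrelated-looking commitments.
c1, salt1 = commit(b"2")
c2, salt2 = commit(b"2")
```

Alice sends `c1` first and reveals `(b"2", salt1)` later; Bob runs `verify` to check the opening.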

Key properties of any commitment scheme:

* **Hiding** - a commitment $C$ does not reveal the value it commits to.

* **Binding** - having made the commitment $C(m)$ to $m$, you can't change your mind and open it as a commitment to a different message $m'$.

# III Homomorphism and Pedersen commitments

### A Pedersen commitment in elliptic curve form

$$C=rH+aG$$

$H$ is a curve point for which nobody knows the discrete logarithm $q$ s.t. $H = qG$. Here $a$ is the value being committed to and $r$ is a random blinding factor.

In [2]:
from klefki.types.algebra.concrete import EllipticCurveCyclicSubgroupSecp256k1 as Curve
from klefki.types.algebra.concrete import EllipticCurveGroupSecp256k1 as ECG
#from klefki.types.algebra.concrete import FiniteFieldCyclicSecp256k1 as CF
from klefki.types.algebra.concrete import FiniteFieldSecp256k1 as CF


# Base point (generator) of the secp256k1 group
G = Curve.G

In [3]:
import random

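# Upper bound for sampled scalars. NOTE: a later cell rebinds `N` to the vector
# dimension (7); from then on random_cf() only draws values in 1..7.
N = 0xfffffffffffffffffffffffffffffffe
def random_cf() -> CF: return CF(random.randint(1, N) % CF.P)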

In [4]:
q = random_cf()
H = G @ q

In [5]:
def C(r: CF, a: CF): return (H@r + G@a)

### Homomorphism of Pedersen commitment

$$
C(r_1, a_1)+C(r_2, a_2) = r_1H+a_1G+r_2H+a_2G=(r_1+r_2)H+(a_1+a_2)G=C(r_1+r_2, a_1+a_2)
$$

In [6]:
r_1, a_1, r_2, a_2 = random_cf(), random_cf(), random_cf(), random_cf()

In [7]:
assert C(r_1, a_1) + C(r_2, a_2) == H@r_1 + G@a_1 + H@r_2 + G@a_2

In [8]:
assert C(r_1, a_1) + C(r_2, a_2) == C(r_1 + r_2, a_1 + a_2)

### The Vector Pedersen Commitment

### NUMS-ness and binding

NUMS stands for "Nothing Up My Sleeve". ref: https://en.wikipedia.org/wiki/Nothing-up-my-sleeve_number

An easy way to construct such a point is to use a hash function like SHA256.

For $C = rH + aG$, finding two other scalar values $s, b$ with $s\neq r, b\neq a$ such that $C = sH + bG$ is infeasible unless the ECDLP (Elliptic Curve Discrete Logarithm Problem) has been cracked.

To extend to a more powerful form of the Pedersen commitment already defined, we go from:

$$C = rH + aG$$

to:

$$C=rH + \sum_{i=0}^n v_iG_i = rH + \mathbf{v}\mathbf{G}$$

The individual $G_i$s can be constructed using a simple algorithm of the form already mentioned (for example, take $\mathcal{H}(encode(G)||i)$, where $\mathcal{H}$ represents some hash function).
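The idea of deriving base points from public data alone can be sketched as follows. To keep the example self-contained it does not use secp256k1: it assumes a toy multiplicative group (the squares modulo the small safe prime 1019) as a stand-in for the curve, and it hashes a seed together with the index $i$ and a retry counter. The names `P_MOD`, `Q_ORD` and `nums_generator` are hypothetical.

```python
from hashlib import sha256

P_MOD = 1019   # toy safe prime, P_MOD = 2 * Q_ORD + 1 (far too small for real use)
Q_ORD = 509    # prime order of the subgroup of squares mod P_MOD


def nums_generator(seed: bytes, i: int) -> int:
    """Derive the i-th base element from public data only (seed and index)."""
    counter = 0
    while True:
        data = seed + i.to_bytes(4, "big") + counter.to_bytes(4, "big")
        t = int.from_bytes(sha256(data).digest(), "big") % P_MOD
        candidate = pow(t, 2, P_MOD)      # squaring lands in the order-Q_ORD subgroup
        if candidate not in (0, 1):       # skip the degenerate elements and retry
            return candidate
        counter += 1


# Five base elements; nobody chose their discrete logs, they fall out of the hash.
Gs = [nums_generator(b"toy-seed", i) for i in range(5)]
```

Because every step is a deterministic function of the public seed, anyone can recompute the list and check that no trapdoor was planted.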

Note: we can extend this yet further to create commitments to two or more vectors.

We just need a set of N NUMS base curve points for each vector we’re committing to, for example (2 vectors):

$$
C=rH+\mathbf{vG}+\mathbf{wH}
$$

Note here that the single curve point $H$ is not part of the vector $\mathbf{H}$.

In [9]:
from klefki.types.algebra.utils import encode, decode
from hashlib import sha256

In [10]:
from functools import reduce

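def map2curve(x: int) -> ECG:
    # Demo shortcut: the result has a known discrete log (x) with respect to G,
    # so these points are not truly NUMS.
    return G @ x

secret = "sec"
# One base point per index: G_i = map2curve(SHA256(seed || i)).
# 64 points are enough for every vector used below.
seed = sha256(secret.encode()).hexdigest()
Gs = [map2curve(int(sha256(f"{seed}{i}".encode()).hexdigest(), 16))
      for i in range(64)]


def vgm(a: [CF]) -> ECG:
    # Inner product <a, Gs> = sum_i a_i * G_i (zip stops at len(a))
    return reduce(lambda x, y: x + y,
                  [g @ a_i for g, a_i in zip(Gs, a)])

def Cv(r: CF, a: [CF]) -> ECG:
    # Vector Pedersen commitment C = rH + <a, Gs>
    return (H@r + vgm(a))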

### Homomorphism

$$
C(r_1, \mathbf{v}_1)+C(r_2, \mathbf{v}_2) = r_1H+\mathbf{v_1}\mathbf{G}+r_2H+\mathbf{v}_2\mathbf{G}=(r_1+r_2)H+(\mathbf{v}_1+\mathbf{v}_2)\mathbf{G}=C(r_1+r_2, \mathbf{v}_1+\mathbf{v}_2)
$$

In [11]:
r_1, r_2 = random_cf(), random_cf()
v_1, v_2 = [random_cf() for i in range(0, 5)], [random_cf() for i in range(0, 5)]

In [12]:
assert Cv(r_1 + r_2, list(map(lambda x: x[0] + x[1], zip(v_1, v_2)))) == Cv(r_1, v_1) + Cv(r_2, v_2)


### Perfect Hiding and Perfect Binding are incompatible

The Pedersen commitment's hiding property is described as "perfect" or "statistical" hiding.

But it is **not** binding if **you know the discrete log of $H$ with respect to $G$**.

Consider: if I have a commitment $C=rH+aG$, then for any chosen $a'$ there is another $r'$ s.t. $C=r'H+a'G$, namely $r' = r + (a-a')q^{-1}$ where $H = qG$.

It’s just that we can’t find that $r'$ without the ability to crack the ECDLP. But the fact that it even exists means that the commitment is a valid commitment to any value $a$, for the right $r$.

The Pedersen commitment’s binding is not perfect; it is “computational”. What this means is that, much as discussed just above, in a case where the ECDLP was cracked, the binding property could fail and people could open the commitment to other values.
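To make the failure of binding concrete, here is a small sketch of what someone who knows the discrete log of $H$ could do. It assumes a toy multiplicative group modulo 1019 instead of secp256k1, so $C=rH+aG$ is written as $H^r G^a$; the `trapdoor` value and the function names are invented for this illustration.

```python
P_MOD = 1019   # toy safe prime, P_MOD = 2 * Q_ORD + 1 (far too small for real use)
Q_ORD = 509    # prime order of the subgroup generated by G
G = 4          # 2^2 mod P_MOD, a generator of the order-Q_ORD subgroup

trapdoor = 123                 # the discrete log q that nobody is supposed to know
H = pow(G, trapdoor, P_MOD)    # H = qG in the additive notation of the text


def commit(r: int, a: int) -> int:
    """Pedersen commitment C = rH + aG, written multiplicatively as H^r * G^a."""
    return pow(H, r, P_MOD) * pow(G, a, P_MOD) % P_MOD


def equivocate(r: int, a: int, a_new: int) -> int:
    """Find r' with commit(r', a_new) == commit(r, a), using the trapdoor."""
    return (r + (a - a_new) * pow(trapdoor, -1, Q_ORD)) % Q_ORD


r, a = 77, 2                       # honest opening
a_new = 42                         # the value the cheater wants to open to
r_new = equivocate(r, a, a_new)    # matching blinding factor
```

Both `(r, a)` and `(r_new, a_new)` open the same commitment, which is exactly why the discrete log of $H$ must be unknown to everyone.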

# IV Zero Knowledge argument of knowledge of a set of vectors

There are three key properties of any zero knowledge proof:

* **Completeness** - does an honest Prover succeed in convincing the Verifier?

* **Soundness** - does the Prover actually **prove** the truth of the statement?

* **Zero-knowledgeness** - can we show that the Prover reveals nothing other than that the statement is true?

An “argument of knowledge” is a technical term distinguished from “proof of knowledge”.

The proof is only computational – an adversary with enough computing power may be able to convince you that he knows the secret value(s), even if he doesn’t.

Here we just assume that **the Verifier of the proof will interact with the Prover in real time.**

Our argument of knowledge will come after we have generated a set of commitments for each of $m$ vectors $\mathbf{x}_1, \mathbf{x}_2, \cdots, \mathbf{x}_m$, each of the same dimension $N$ ($\neq m$). Explicitly:

$$
C_1=r_1H + \mathbf{x}_1\mathbf{G}\\
C_2=r_2H + \mathbf{x}_2\mathbf{G}\\
\vdots\\
C_m=r_mH+\mathbf{x}_m\mathbf{G}\\
$$

In [13]:
m = 5
N = 7
x: [[CF]] = [[random_cf() for j in range(0, N)] for i in range(0, m)]
r: [CF] = [random_cf() for i in range(0, m)]
q: CF = random_cf()
H: ECG = G@q
C: [ECG] = list(map(lambda x: H @ x[0] + vgm(x[1]), zip(r, x)))

#### The process:

1. $P\rightarrow V: C_0$ (a new commitment to a newly chosen random vector of dimension $N$)

2. $V\rightarrow P: e$ (a random scalar)

3. $P\rightarrow V: (\mathbf{z}, s)$ (a single vector of dimension $N$, and another scalar)

$$
\mathbf{z}=\sum_{i=0}^m e^i\mathbf{x}_i,\ s=\sum_{i=0}^m e^ir_i
$$

Note that the summations start at 0; this means that the sums include the extra, random commitment, indexed 0, that was created as the first step of the interaction.

For $\mathbf{z}=\sum_{i=0}^m e^i\mathbf{x}_i$, the $n$-th component is $z_n = x_{0,n} + e\,x_{1,n} + \dots + e^m x_{m,n}$; that addition, which includes the random $x_{0,n}$, hides the individual values.

In [14]:
m = 5
N = 7
x: [[CF]] = [[random_cf() for j in range(0, N)] for i in range(0, m)]
r: [CF] =  [random_cf() for i in range(0, m)]
q: CF = random_cf()
H: ECG = G@q
C: [ECG] = list(map(lambda x: H @ x[0] + vgm(x[1]), zip(r, x)))

from functools import reduce
P, V = {}, {}

# Step 1
V['C_0'] = C[0]

# Step 2
V['e'] = random_cf()
P['e'] = V['e']

# Step 3
# z: a single vector of dimension N
# z_n = x_0n + e*x_1n + . . . e^m*x_mn ; 

P['z'] = [
    reduce(lambda x, y: x + y , [x[i][j] @ (P['e'] ** (i)) for i in range(0, m)])
    for j in range(0, N)
]

P['s'] = reduce(lambda x, y: x + y, [r[i] @ (P['e'] ** (i)) for i in range(0, m)])
V['z'], V['s'] = P['z'], P['s']


Having received this $(\mathbf{z}, s)$, the verifier of course needs to verify whether the proof is valid. He does the following:

$$
\sum_{i=0}^m e^iC_i \stackrel{?}{=} sH+\mathbf{z}\mathbf{G}
$$

In [15]:
assert reduce(lambda x, y: x+y, [C[i] @ (P['e'] ** (i)) for i in range(0, m)]) == H @ V['s'] + vgm(V['z'])

### Completeness

Starting from the RHS:

\begin{align*}
sH+\mathbf{zG} &=\sum_{i=0}^m e^i r_iH+\sum_{i=0}^m e^i\mathbf{x}_i\mathbf{G} \\
&=\sum_{i=0}^m e^i(r_iH+\mathbf{x}_i\mathbf{G})\\
&=\sum_{i=0}^m e^iC_i
\end{align*}

### Zero-knowledgeness

We deal with zero knowledgeness before soundness, because the latter is the harder proof.


Consider the distribution of transcripts of the conversation between Prover and Verifier in the case where the **verifier’s execution environment is controlled** and run by a notional entity called a “**Simulator**”, which can simulate a proof **without** actually having the knowledge. If that distribution is the same as the one obtained for genuine conversations with Prover(s) who do know the opening of the vector commitments, it follows that the **Verifier learns zero** from the interaction other than the single bit that the statement is true.

Definition:

A “witness” is a piece of (usually secret) data corresponding to a “statement” which the Prover possesses but does not want to reveal.

#### E.g. : Schnorr’s identity protocol

Prover starts with a public key P and a corresponding private key $x$ s.t. $P = xG$.
Prover wishes to prove in zero knowledge, that he knows $x$.

1. $P\rightarrow V:xG$
2. $P\rightarrow V:R$ (a new random curve point, but $P$ knows $k$ s.t. $R=kG$)
3. $V\rightarrow P:e$ (a random scalar)
4. $P\rightarrow V:s$ (which $P$ calculated from the equation $s=k+ex$)

Note: the transcript referred to above, would here be: $(R, e, s)$.

In [16]:
P, V = {}, {}
x = random_cf()
# step 1
P['x'] = x
V['P'] = G @ P['x']

# step 2
P['k'] = random_cf()
V['R'] = G @ P['k']

# step 3
V['e'] = random_cf()
P['e'] = V['e']

# step 4
P['s'] = P['k'] + P['e'] * P['x']
V['s'] = P['s']

transcript = V['R'], V['e'], V['s']

Verification works fairly trivially: the verifier checks $sG \stackrel{?}{=} R+eP$.


In [17]:
assert G @ V['s'] == V['R'] + V['P'] @ V['e']


##### zero knowledgeness proof:


The “Simulator”, which controls the execution of the verifier and is given the public key $P$ just as the Verifier would be, can **fake** a valid transcript as follows:

Choose $s$ randomly. Then, choose $e$, also randomly. Finally, we only need to choose $R$ to create a complete conversation transcript; it must be $R = sG - eP$.

$$
R=sG-eP\\
kG=sG-eP\\
sG=kG+eP\\
sG=kG+xeG\\
s=k+ex\\
$$

Then we have successfully simulated a conversation which is entirely valid: $(R, e, s)$, without ever knowing the secret key $x$, and which it’s easy to see is randomly distributed in the same way as a real one would be ($R$ is a free variable).

In [18]:
s, e = random_cf(), random_cf()
R = G @ s - V['P'] @ e

In [19]:
assert G @ s == R + V['P'] @ e

Another way of looking at it is that this kind of proof is **deniable**:

**it may be entirely valid to the interacting Verifier, but entirely meaningless (as in this case) to a third party who is shown the transcript later.**

##### For Vector proof of knowledge case:

The conversation transcripts look like $(C_0, e, (\mathbf{z}, s))$, which is almost the same, except that the final portion is a vector plus a scalar instead of a single scalar.

And so the same reasoning applies: **a Simulator can fake the transcript by choosing the values out of order**.

Nothing more involved is needed here: you choose $(\mathbf{z}, s)$ both at random, as well as $e$, and you can deduce the right value of the point:

$$
C_0 = (sH+\mathbf{z}\mathbf{G}) - \sum_{i=1}^m e^iC_i
$$

The $C_1, C_2, \dots, C_m$ are all set in advance.

Now we try to build the proof from random $\mathbf{z}, s, e$:

In [20]:
z = [random_cf() for i in range(0, N)]
s = random_cf()
e = random_cf()

In [21]:
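# H is public and must stay fixed: C[1:] were committed under the H chosen above,
# so the simulator reuses it rather than sampling a new one.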

In [22]:
C_0 = H @ s + vgm(z) - reduce(lambda x, y: x + y, [C[i] @ e ** (i) for i in range(1, m)])

$$
\sum_{i=0}^m e^iC_i \stackrel{?}{=} sH+\mathbf{z}\mathbf{G}
$$

$$
C_0 + \sum_{i=1}^m e^iC_i \stackrel{?}{=} sH+\mathbf{z}\mathbf{G}
$$

We provide $(C_0, e, (\mathbf{z}, s))$

In [23]:
assert reduce(lambda x, y: x+y, [([C_0] + C[1:])[i] @ (e ** (i)) for i in range(0, m)]) == H @ s + vgm(z)

**The entire transcript will look valid to a third party**

### Knowledge soundness - does a verifying interaction actually prove knowledge of the vectors?

Proving “soundness” is somehow complementary/“opposite” to proving zero knowledge, in the following sense: the idea here is to isolate/control the operation of the Prover, as a machine, rather than isolate the verifier.

If we can control the Prover’s environment and by doing so get him to spit out the secret information (the “witness”), it follows that he must have it in the first place!


* God (The Extractor) Stealing the secret from the Prover(Machine)
    - **imagine the Prover is literally a function. You can start running it, stop at any point (imagine debugger breakpoints). Crucially you can make copies of its current state.**

#### E.g. In the Schnorr identity protocol case:

We get the Prover to run twice, but only after the same initial random point $R$. So imagine I, as “Extractor” (what we call the “God” controlling the Prover), create two runs of the protocol with two different values $e_1$, $e_2$ against the same initial $R$, then:

\begin{align*}
s_1 = k+e_1x\\
s_2=k+e_2x
\end{align*}

$$
\rightarrow x = \frac{s_1-s_2}{e_1-e_2}
$$


Secret $x$ is **LEAKED**!

Thus, the Extractor gets the secret key from two runs of the protocol that happened to share the same “nonce” point $R$ (remember, $R = kG$ and it’s the same in both runs here). This is a widely known “exploit” of both Schnorr and ECDSA signatures (when wrongly implemented).
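The extraction formula only involves scalars, so it can be checked without any curve arithmetic. The sketch below assumes the Mersenne prime $2^{127}-1$ as a stand-in for the group order and uses made-up values for $x$, $k$, $e_1$ and $e_2$; `extract` is a hypothetical helper name.

```python
Q_ORD = 2**127 - 1              # a prime standing in for the group order

x = 0x1234567890abcdef          # Prover's secret key
k = 0xfedcba0987654321          # nonce, reused in both runs (same R = kG)

e1, e2 = 1111, 2222             # two different challenges against the same R
s1 = (k + e1 * x) % Q_ORD       # response in the first run
s2 = (k + e2 * x) % Q_ORD       # response in the second run


def extract(s1: int, e1: int, s2: int, e2: int) -> int:
    """Recover x = (s1 - s2) / (e1 - e2) mod Q_ORD from two transcripts."""
    inv = pow((e1 - e2) % Q_ORD, -1, Q_ORD)   # modular inverse of e1 - e2
    return (s1 - s2) * inv % Q_ORD


recovered_x = extract(s1, e1, s2, e2)
recovered_k = (s1 - e1 * recovered_x) % Q_ORD   # the nonce falls out as well
```

The nonce $k$ cancels in $s_1 - s_2$, which is why two responses to different challenges are enough.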

#### Back to Knowledge of set of vector case

* First point:

We need to get the Prover to output not just two transcripts, but $m+1$. This will be enough to prevent the system of equations from being underdetermined, i.e. it will give us a unique solution.

We have the Extractor start the Prover, who generates here a $C_0$, then provide it with a random challenge $e$, then retrieve from it a pair $(\mathbf{z}, s)$. Assuming that this response is valid, we can rewind and repeat the process with fresh challenges against the same $C_0$, a total of $m + 1$ times, resulting in this set of transcripts:

In [27]:
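def gen_ts():
    # Placeholder: this builds a *simulated* transcript, so every call yields its own C_0.
    # A real Extractor would rewind one Prover and collect responses for a single, fixed C_0.
    z = [random_cf() for i in range(0, N)]
    s = random_cf()
    e = random_cf()
    C_0 = H @ s + vgm(z) - reduce(lambda x, y: x + y, [C[i] @ e ** (i) for i in range(1, m)])
    return (C_0, e, (z, s))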

In [30]:
ts_set = [gen_ts() for i in range(0, m+1)]

The Extractor then starts constructing the Vandermonde matrix:

In [32]:
from numpy import matrix

In [33]:
A_rev = [[ts[1] ** i for i in range(0, m + 1)] for ts in ts_set]

$$
\mathbb{A}^{-1}= \begin{bmatrix}
    1       & e_0 & e_{0}^2 & \dots & e_0^m \\
    1       & e_1 & e_1^2 & \dots & e_1^m \\
    \vdots  & \vdots & \vdots & \ddots & \vdots \\
    1       & e_m & e_m^2 & \dots & e_m^m
\end{bmatrix}
$$

The Vandermonde matrix, acting on the column vector of a set of coefficients of a polynomial, outputs a new column vector which represents the evaluation of that polynomial at each of the points.

This means the *inverse* of that matrix, if it *exists* (which is the case whenever the challenges $e_i$ are all distinct), maps a set of $m+1$ polynomial evaluations (the polynomial here has degree $m$) back to its set of coefficients, and most crucially that mapping is one-to-one and therefore the solution is unique.

**A set of $N + 1$ evaluations fixes a polynomial of degree $N$.**
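As a rough illustration of why $m+1$ evaluations pin down the coefficients, the sketch below builds a Vandermonde system over a prime field and solves it by Gauss-Jordan elimination. It works on a scalar polynomial only (in the protocol the same solve would be applied to each component of $\mathbf{z}$ and to $s$). The modulus `Q_ORD`, the helper `solve_mod` and the sample numbers are assumptions made for this example.

```python
Q_ORD = 2**127 - 1   # prime modulus standing in for the scalar field


def solve_mod(A, b, q):
    """Solve A * c = b over the integers mod a prime q (Gauss-Jordan elimination)."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]          # augmented matrix
    for col in range(n):
        # Find a row with a non-zero entry in this column and move it into place.
        pivot = next(r for r in range(col, n) if M[r][col] % q != 0)
        M[col], M[pivot] = M[pivot], M[col]
        inv = pow(M[col][col], -1, q)
        M[col] = [v * inv % q for v in M[col]]              # normalise pivot to 1
        for r in range(n):
            if r != col and M[r][col]:
                f = M[r][col]
                M[r] = [(vr - f * vc) % q for vr, vc in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]


coeffs = [7, 3, 0, 5]            # hidden polynomial of degree m = 3
challenges = [2, 5, 11, 17]      # m + 1 distinct evaluation points e_0..e_m

# Row j of the Vandermonde matrix is (1, e_j, e_j^2, ..., e_j^m).
vander = [[pow(e, i, Q_ORD) for i in range(len(coeffs))] for e in challenges]
evals = [sum(c * v for c, v in zip(coeffs, row)) % Q_ORD for row in vander]

recovered = solve_mod(vander, evals, Q_ORD)
```

With distinct challenges the matrix is invertible, so `recovered` is the only coefficient list consistent with the evaluations.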

##### To continue ....

# V An inner product proof

In Groth’s paper, he presents the core algorithm, which probably-not-coincidentally is also the core of Bulletproofs. The inner product proof here uses all the same elements as we’ve discussed above, although in a slightly more complicated structure.

It starts by assuming that the Prover has two vectors $\mathbf{x}$ and $\mathbf{y}$, and obviously knows the inner product of those, which we’ll now call $z$.

The Prover’s job will now be to convince the verifier that Pedersen commitments to these three quantities obey $z = \mathbf{x}\cdot \mathbf{y}$; so we assume upfront that the three commitments are known, and from now on we’ll call them $C_z, C_x, C_y$:

$$
C_z=tH+zG\\
C_x=rH+\mathbf{xG}\\
C_y=sH+\mathbf{yG}
$$

### Aside: the Sigma protocol

1. $P\rightarrow V:R$ (a new random curve point, but $P$ knows $k$ s.t. $R=kG$)

2. $V \rightarrow P: e$ (a random scalar)

3. $P \rightarrow V: s$ (which $P$ calculated from eq. $s=k+ex$)

===== And =====

1. $P \rightarrow V: C_0$ (a new commitment to a newly chosen random vector of dim N)

2. $V \rightarrow P: e$ (a random scalar)

3. $P \rightarrow V:(\mathbf{z}, s)$ (a single vector of dim $N$, and another scalar)

(the first was Schnorr’s identity protocol; the second was the proof of knowledge of a set of vectors).

**These are both examples of Sigma protocols**, so called because of a vague resemblance to the Greek letter $\Sigma$, in that the process goes forwards once, then backwards, then forwards finally. The common pattern, though, is more than this three step interactive process. We generalise it as something like:

1. $P\rightarrow V:$ commitment

2. $V \rightarrow P$: challenge

3. $P \rightarrow V$: response (proof)
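The commitment / challenge / response pattern can be written as three small functions. The sketch below instantiates it with Schnorr's identity protocol in a toy multiplicative group modulo 1019 (an assumption made to keep the example self-contained; $sG = R + eP$ becomes $G^s = R \cdot P^e$). All function names and the secret value are illustrative only.

```python
import secrets

P_MOD, Q_ORD, G = 1019, 509, 4   # toy group: G generates the order-Q_ORD subgroup mod P_MOD


def commitment():
    """Step 1 (P -> V): pick a nonce k and send R = kG (here G^k)."""
    k = secrets.randbelow(Q_ORD - 1) + 1
    return k, pow(G, k, P_MOD)


def challenge():
    """Step 2 (V -> P): a random scalar e."""
    return secrets.randbelow(Q_ORD)


def response(k: int, e: int, x: int) -> int:
    """Step 3 (P -> V): s = k + e*x."""
    return (k + e * x) % Q_ORD


def verify(pub: int, R: int, e: int, s: int) -> bool:
    """Verifier's check sG == R + eP, written multiplicatively."""
    return pow(G, s, P_MOD) == R * pow(pub, e, P_MOD) % P_MOD


x = 321                      # Prover's secret (the witness)
pub = pow(G, x, P_MOD)       # public key P = xG
k, R = commitment()
e = challenge()
s = response(k, e, x)
```

The Verifier only ever sees `(R, e, s)` and `pub`; the nonce `k` and the witness `x` stay on the Prover's side.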


### The commitment step of the inner product proof