# The Dutch eHealth use case

Full report [Verheul et al 2016](https://eprint.iacr.org/2016/411.pdf) and [summary](https://redasci.org/wp-content/uploads/2017/02/pep-informal.pdf).

* Crypto
    * Introduces polymorphic encryption and polymorphic pseudonymization to manage sensitive personal data.
    * Data can be encrypted and stored in the cloud without having to pre-determine who can decrypt these data
    * When required, a set of decryption keys can be determined
    * The data can be tweaked to be decryptable by a specific party in a blinded manner
* Third parties
    * Relies on a trusted third party who can tweak the ciphertext to the intended party
    * Requires a trusted third party that stores a master key
* Privacy
    * The data subject is pseudonymized in a polymorphic way, meaning that each data subject has a different pseudonym at different parties and can only be de-pseudonymized by participants who know the original identity
    * Aims to protect against colluding recipients on the basis of a known identifier
    * Does not protect against colluding actors in general.
        * Even the master private key $x$ can be recovered, e.g. when a participant $A$ (who knows $x_A$) colludes with the TransCryptor (who knows $K_A$): $K_A^{-1} \cdot x_A = x$
    * The PEP framework only concentrates on cryptographic protection of identifiers.
* User control
    * "the user is not in complete control over his/her data." (p. 10)

My initial thoughts:

1. Interesting approach that uses a lot of crypto primitives (e.g., blinding) that the "SSI" space also uses.
2. Relies heavily on a Trusted Third Party (TTP) that learns which data is sent where (e.g., it knows exactly when the data subject sends data to the Storage Facility) and can thus profile the data subject. Several questions emerge:
    * Check if the TTP can be malicious or if it needs to be trusted
    * Can the TTP run in a decentralized way?
    * What if the recipient and the TTP collude?
    * Could a relay solve the TTP problem?
    * Can the data subject control the TransCryptor?
3. Similar to a [proxy re-encryption scheme](https://en.wikipedia.org/wiki/Proxy_re-encryption#:~:text=Proxy%20re%2Dencryption%20(PRE),may%20be%20decrypted%20by%20another.) with blinding.
4. The same pseudonym is used every time the data subject interacts with a given recipient, so the recipient can profile the data subject. This is also mentioned on page 10. The aim is not privacy in general, but preventing collusion based on identifiers.
5. Any actor who wants to store data associated with a $\texttt{pid}$ does so in a way that the TransCryptor can know.
6. Unclear how the private keys of participants are generated. Figure 2.1 states that $x_A = K_A \cdot x$ and that only $A$ knows $x_A$, but
    * $K_A \in \mathbb{F}_p$ is known to TransCryptor only
    * $x \in \mathbb{F}_p$ is known to the Key Server only
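
The relation $x_A = K_A \cdot x$ from item 6, together with the recovery formula noted under Privacy, can be checked with plain scalar arithmetic. The sketch below is only an illustration: it assumes scalars are taken modulo the Ed25519 group order, and the names `ORDER` and `rand_scalar` are mine, not the paper's. It does not answer how $x_A$ is actually derived without either server learning it.

```python
import secrets

# Order of the Ed25519 base-point group (a prime); scalars live modulo this.
ORDER = 2**252 + 27742317777372353535851937790883648493

def rand_scalar():
    """Uniform nonzero scalar in [1, ORDER - 1] (hypothetical helper)."""
    return secrets.randbelow(ORDER - 1) + 1

x = rand_scalar()        # master private key, held by the Key Server
K_A = rand_scalar()      # A's key factor, held by the TransCryptor
x_A = K_A * x % ORDER    # A's private key, x_A = K_A * x

# If A (who knows x_A) and the TransCryptor (who knows K_A) pool their
# values, the master key falls out: K_A^{-1} * x_A = x.
recovered_x = pow(K_A, -1, ORDER) * x_A % ORDER
assert recovered_x == x
```

So a single participant colluding with the TransCryptor is enough to expose $x$; no discrete log needs to be solved.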

## Prelims

* Discrete log based. I will follow the original text and implement it over an elliptic curve (ECDLP).
* For a group $G$ with generator $g$:
    * Private key $x \in \mathbb{F}_p$, public key $y = xg$
    * Message $M \in G$
* Denote ElGamal encryption as $\mathcal{EG}(r,M,y) = \langle rg,ry+M,y \rangle = \langle b,C,y\rangle$ where $r$ is a random value
* Decryption requires knowledge of $x$ to compute $C-x \cdot b = (rxg + M) - x(rg) = M$


Note that this scheme is malleable and randomized (blinded): two encryptions of the same $M$ under the same $y$ look different because each uses a fresh $r$, yet both decrypt to $M$ with $x$.

In [1]:
from Crypto.Util import number
from ecpy import curves

In [2]:
# curves.Curve.get_curve_names()
curve = curves.Curve.get_curve('Ed25519')

In [3]:
# Setup is typical EC setup
g = curve.generator
p = curve.order

x = number.getRandomRange(2, p)  # private key: any nonzero scalar below the group order, need not be prime
y = x * g

In [4]:
# Encryption

M = number.getRandomRange(2, p) * g
r = number.getRandomRange(2, p)

enc = [r*g, r*y+M]

In [5]:
(enc[1] - enc[0] * x).x == M.x

True
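
To make the "malleable" and "blinded" remarks concrete without extra dependencies, here is a minimal standard-library sketch. It is an assumption-laden stand-in, not the notebook's curve: it uses a toy multiplicative group (the order-`Q` subgroup of $\mathbb{Z}_P^*$ with tiny, insecure parameters), so the additive $rg$ and $ry + M$ become `pow(G, r, P)` and `pow(y, r, P) * M`. The helper names are hypothetical.

```python
import secrets

# Toy parameters (assumption, far too small for real use):
# P = 2Q + 1 is a safe prime and G = 4 generates the subgroup of order Q.
P, Q, G = 2039, 1019, 4

def rand_scalar():
    return secrets.randbelow(Q - 1) + 1      # uniform in [1, Q - 1]

def encrypt(M, y):
    r = rand_scalar()                         # fresh randomness per encryption
    return (pow(G, r, P), pow(y, r, P) * M % P)

def decrypt(ct, x):
    b, C = ct
    return C * pow(b, -x, P) % P              # C / b^x

x = rand_scalar()
y = pow(G, x, P)
M = pow(G, rand_scalar(), P)

# Randomized: two independent encryptions of M both decrypt to M.
ct1, ct2 = encrypt(M, y), encrypt(M, y)
assert decrypt(ct1, x) == decrypt(ct2, x) == M

# Malleable: anyone can shift the plaintext by a known group element
# without knowing x (the additive analogue is C + M').
delta = pow(G, rand_scalar(), P)
mauled = (ct1[0], ct1[1] * delta % P)
assert decrypt(mauled, x) == M * delta % P
```

The manipulations in the next section exploit exactly this kind of malleability.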

## ElGamal Manipulations

$\mathcal{EG}(r,M,y)= \langle rg,ry+M,y\rangle$

Three operations, each a map $G^3 \times \mathbb{F}_p \longrightarrow G^3$

### re-randomize ($\mathcal{RR}$)

Changes the appearance but not the content of the ciphertext (in essence blinding). With new $s \in \mathbb{F}_p$:

$\mathcal{RR}(\langle b,C,y \rangle, s) := \langle sg+b, sy+C,y \rangle$

$
\begin{align*}
\mathcal{RR}(\mathcal{EG}(r,M,y),s) &= \mathcal{RR}(\langle rg,ry+M,y\rangle,s)\\
&= \langle sg + rg, sy+ry+M,y \rangle\\
&= \langle (s+r)g, (s+r)y +M, y\rangle\\
&= \mathcal{EG}(s+r,M,y)
\end{align*}
$

In [6]:
## Re-randomization

s = number.getRandomRange(2, p)

rand_enc = [s*g + enc[0], s*y + enc[1], y]

(rand_enc[1] - rand_enc[0] * x).x == M.x

True

### re-key ($\mathcal{RK}$)

Changes who can decrypt. With a new key $k \in \mathbb{F}_p$:

$\mathcal{RK}(\mathcal{EG}(r,M,y),k)= \mathcal{EG}(r \cdot k^{-1}, M, ky)=\langle r \cdot k^{-1}\cdot g, r \cdot k^{-1} \cdot ky + M, ky \rangle = \langle \frac{rg}{k}, ry + M, ky \rangle$

Decrypt with adapted private key $kx$

$ry + M - \frac{kx \cdot rg}{k} = ry + M - xgr = M$

In [7]:
# Re-keying

k = number.getRandomRange(2,p)
k_inv = pow(k,-1,p)

k_enc = [r*g*k_inv, r*y + M]

(k_enc[1] - k_enc[0] * x*k).x == M.x

True

### re-shuffle ($\mathcal{RS}$)

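Multiply the plaintext by a scalar $n \in \mathbb{F}_p$ (the additive analogue of raising it to a power), giving an encryption of $n\cdot M$ with randomness $n \cdot r$:

$\mathcal{RS}(\langle b,C,y \rangle, n) := \langle nb, nC, y \rangle$, so that $\mathcal{RS}(\mathcal{EG}(r,M,y), n) = \langle nrg, n(ry + M),y \rangle = \mathcal{EG}(nr, nM, y)$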

Decrypt with $x$ to get $n\cdot M$

Used to create a local pseudonym $\texttt{pid}_A@X$

In [8]:
n = number.getRandomRange(2,p)

shuff_enc = [n*enc[0], n*enc[1]]

(shuff_enc[1] - x * shuff_enc[0]).x == (n*M).x

True

### Algebraic properties

1. $\mathcal{RK}$ and $\mathcal{RS}$ commute: $\mathcal{RK}(\mathcal{RS}(\langle b,C,y\rangle,n),k) = \mathcal{RS}(\mathcal{RK}(\langle b,C,y \rangle, k),n)$ 
2. $\mathcal{RR}(\mathcal{RR}(\langle b,C,y\rangle, s), s') = \mathcal{RR}(\langle b,C,y\rangle, s+ s')$
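
Both properties can be spot-checked numerically. The sketch below is a hedged illustration under the assumption of a toy multiplicative group (tiny, insecure parameters, standard library only), so scalar multiplication in the notes becomes exponentiation; the function names `EG`, `RR`, `RK`, `RS` simply mirror the notation above.

```python
import secrets

# Toy order-Q subgroup of Z_P^* (assumption; not secure, for checking algebra only).
P, Q, G = 2039, 1019, 4

def rand_scalar():
    return secrets.randbelow(Q - 1) + 1

def EG(r, M, y):
    return (pow(G, r, P), pow(y, r, P) * M % P, y)

def RR(ct, s):                      # re-randomize: <sg + b, sy + C, y>
    b, C, y = ct
    return (pow(G, s, P) * b % P, pow(y, s, P) * C % P, y)

def RK(ct, k):                      # re-key: <b / k, C, ky>
    b, C, y = ct
    return (pow(b, pow(k, -1, Q), P), C, pow(y, k, P))

def RS(ct, n):                      # re-shuffle: <nb, nC, y>
    b, C, y = ct
    return (pow(b, n, P), pow(C, n, P), y)

x = rand_scalar()
y = pow(G, x, P)
M = pow(G, rand_scalar(), P)
ct = EG(rand_scalar(), M, y)
s, s2, k, n = (rand_scalar() for _ in range(4))

# Property 1: re-keying and re-shuffling commute.
assert RK(RS(ct, n), k) == RS(RK(ct, k), n)
# Property 2: two re-randomizations compose additively.
assert RR(RR(ct, s), s2) == RR(ct, (s + s2) % Q)
```

Because re-key and re-shuffle commute, the order in which the TransCryptor applies them does not matter.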

## Polymorphic encryption via re-keying

* Master key $x \in \mathbb{F}_p$
* Master public key $y = x \cdot g \in \mathbf{G}$
* Each participant $A$ has a private key $x_A=K_A \cdot x$
    1. TransCryptor (TTP) stores the table of pairs $(A, K_A)$ in HSM
* Each participant $A$ has a public key $y_A=K_A \cdot y$

Polymorphic encryption of $D$ is $\mathcal{EG}(r,D,y)$ with master public key $y$. 

* Anyone can encrypt data in this way
* The TransCryptor re-keys the ciphertext to participant A via $\mathcal{RK}(\mathcal{EG}(r,D,y), K_A) = \mathcal{EG}(\frac{r}{K_A}, D, y_A)$
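
A minimal sketch of this flow, assuming a toy multiplicative group in place of the curve (tiny, insecure parameters; standard library only). The dictionary `K` stands in for the TransCryptor's table of $(A, K_A)$ pairs and is a hypothetical layout, as are the helper names.

```python
import secrets

# Toy order-Q subgroup of Z_P^* (assumption; not secure).
P, Q, G = 2039, 1019, 4

def rand_scalar():
    return secrets.randbelow(Q - 1) + 1

def decrypt(b, C, priv):
    return C * pow(b, -priv, P) % P

# Key Server: master key pair.
x = rand_scalar()
y = pow(G, x, P)

# TransCryptor: key factors per participant (hypothetical table), K_A != K_B.
K = {"A": rand_scalar(), "B": rand_scalar()}
while K["B"] == K["A"]:
    K["B"] = rand_scalar()
x_A = K["A"] * x % Q     # A's private key x_A = K_A * x
x_B = K["B"] * x % Q     # B's private key x_B = K_B * x

# Anyone: polymorphic encryption of D under the master public key y.
D = pow(G, rand_scalar(), P)
r = rand_scalar()
b, C = pow(G, r, P), pow(y, r, P) * D % P

# TransCryptor: re-key to A, i.e. b -> b / K_A while C stays unchanged.
b_A = pow(b, pow(K["A"], -1, Q), P)

assert decrypt(b_A, C, x_A) == D      # A can open it
assert decrypt(b_A, C, x_B) != D      # B's key does not open A's copy
```

Note that the recipient is chosen only at re-keying time; the encryptor never needs to know $A$.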


## Polymorphic pseudonymization via re-shuffling

SF is short for "Storage Facility"

* Each holder $B$ has a personal identifier $\texttt{pid}_B \in \mathbf{G}$
* $B$'s local pseudonym at $A$ is $\texttt{pid}_B @ A=S_A \cdot \texttt{pid}_B$
    1. Only the TransCryptor knows the pairs $(A, S_A)$
    2. The polymorphic pseudonym (PP) of $B$ is $\mathcal{EG}(r, \texttt{pid}_B, y)$
* The TransCryptor receives $B$'s data
    1. The TransCryptor re-shuffles and re-keys the PP to an encryption of the local pseudonym $\texttt{pid}_B@SF = S_{SF} \cdot \texttt{pid}_B$
    2. $\mathcal{RK}(\mathcal{RS}(\mathcal{EG}(r, \texttt{pid}_B, y), S_{SF}), K_{SF})=\mathcal{EG}(\frac{S_{SF} \cdot r}{K_{SF}}, S_{SF} \cdot \texttt{pid}_B, K_{SF} \cdot y) = \mathcal{EG}(\frac{S_{SF} \cdot r}{K_{SF}}, \texttt{pid}_B @ SF, y_{SF})$
    3. SF decrypts and uses this local pseudonym $\texttt{pid}_B @ SF$ as a database key to store the polymorphically encrypted data of $B$
* If a recipient $A$ wants to retrieve $B$'s data:
    1. $A$ sends the PP $\mathcal{EG}(r,\texttt{pid}_B, y)$ to the TransCryptor, which re-shuffles and re-keys it to SF
    2. SF decrypts it to obtain its local pseudonym of $B$, then looks up and returns the requested data
    3. The returned data gets re-keyed to $A$
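
The pseudonymization steps above can be sketched as follows. This is an illustration under assumptions, not the paper's implementation: it uses a toy multiplicative group (tiny, insecure parameters; standard library only), so $S \cdot \texttt{pid}_B$ becomes `pow(pid_B, S, P)`, and the helper `transcrypt` is a hypothetical name for "re-shuffle, then re-key".

```python
import secrets

# Toy order-Q subgroup of Z_P^* (assumption; not secure).
P, Q, G = 2039, 1019, 4

def rand_scalar():
    return secrets.randbelow(Q - 1) + 1

def polymorphic_pseudonym(pid, y):
    r = rand_scalar()
    return (pow(G, r, P), pow(y, r, P) * pid % P)

def transcrypt(pp, S, K):
    """TransCryptor step: re-shuffle with S, then re-key with K."""
    b, C = pp
    return (pow(b, S * pow(K, -1, Q) % Q, P), pow(C, S, P))

def decrypt(ct, priv):
    b, C = ct
    return C * pow(b, -priv, P) % P

x = rand_scalar()
y = pow(G, x, P)
K_SF, K_A = rand_scalar(), rand_scalar()          # key factors
S_SF, S_A = rand_scalar(), rand_scalar()          # pseudonym factors
while S_A == S_SF:
    S_A = rand_scalar()

pid_B = pow(G, rand_scalar(), P)
pp1 = polymorphic_pseudonym(pid_B, y)             # two PPs of the same pid_B,
pp2 = polymorphic_pseudonym(pid_B, y)             # each with fresh randomness

at_SF_1 = decrypt(transcrypt(pp1, S_SF, K_SF), K_SF * x % Q)
at_SF_2 = decrypt(transcrypt(pp2, S_SF, K_SF), K_SF * x % Q)
at_A = decrypt(transcrypt(pp1, S_A, K_A), K_A * x % Q)

# SF always lands on the same local pseudonym, so it works as a database key.
assert at_SF_1 == at_SF_2 == pow(pid_B, S_SF, P)
# A's local pseudonym for B differs from SF's.
assert at_A == pow(pid_B, S_A, P) and at_A != at_SF_1
```

The first assertion is also why a single recipient can link all of $B$'s interactions: the local pseudonym is stable by design.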

## Proposed user flow

1. Data generating device controlled by data subject $A$ contains $\texttt{pid}_A$
2. Device encrypts data and $\texttt{pid}_A$ using ElGamal and sends to TransCryptor
3. The TransCryptor re-shuffles $\texttt{pid}_A$ to $\texttt{pid}_A @ SF$ where $SF$ is storage facility.
4. The TransCryptor re-keys the encrypted $\texttt{pid}_A @ SF$ so that SF can open it.

In [58]:
# Device data and pid for data subject A
data = number.getRandomRange(2, p)*g
pid_A = 198509096577*g

# Device encrypts before sending to TransCryptor
r_data = number.getRandomRange(2,p)
r_pid_A = number.getRandomRange(2,p)

enc_data = [r_data*g, r_data*y + data, y]
enc_pid_A = [r_pid_A*g, r_pid_A*y + pid_A, y]

assert (enc_data[1] - enc_data[0]*x).x == data.x
assert (enc_pid_A[1] - enc_pid_A[0]*x).x == pid_A.x

# TransCryptor re-shuffles pid to create local pseudonym pid_A @ SF
n_SF = number.getRandomRange(2,p)
pid_A_at_SF = [n_SF*enc_pid_A[0], n_SF*enc_pid_A[1]] # Encryption of pid_A @ SF, still under the master key y

assert (pid_A_at_SF[1] - x * pid_A_at_SF[0]).x == (n_SF*pid_A).x

# Re-key for SF so that SF can decrypt
k_SF = number.getRandomRange(2,p) # Who generates k? Key Server?
k_SF_inv = pow(k_SF,-1,p)
k_SF_enc = [pid_A_at_SF[0] * k_SF_inv, pid_A_at_SF[1]] # TransCryptor knows k_SF_inv and n_SF

assert (k_SF_enc[1] - k_SF_enc[0] * x*k_SF).x == (n_SF*pid_A).x

Note that the user seemingly does not control the keys required for decryption or re-keying. This means that the Key Server is in control, and a colluding Key Server / TransCryptor can decrypt any and all of the data subject's data. Nothing technical stops such collusion. In particular:

* Any other data generating device, or data source (e.g., a doctor) will have to know the true pid of the data subject before sending it to the TransCryptor.
* De-anonymization is trivial and outside of user control.
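
The two concerns can be illustrated with a short sketch. It rests on assumptions of mine: a toy multiplicative group stands in for the curve (tiny, insecure parameters; standard library only), and the roles are reduced to "whoever holds $x$" and "whoever holds $S_{SF}$".

```python
import secrets

# Toy order-Q subgroup of Z_P^* (assumption; not secure).
P, Q, G = 2039, 1019, 4

def rand_scalar():
    return secrets.randbelow(Q - 1) + 1

x = rand_scalar()                 # master private key (Key Server)
y = pow(G, x, P)
S_SF = rand_scalar()              # pseudonym factor for SF (TransCryptor)

pid_A = pow(G, rand_scalar(), P)  # data subject's true identifier
data = pow(G, rand_scalar(), P)   # data encoded as a group element

# Device: polymorphic encryption of the data under y.
r = rand_scalar()
enc_data = (pow(G, r, P), pow(y, r, P) * data % P)

# (1) Whoever holds x reads any polymorphic ciphertext directly,
#     with no re-keying and no involvement of the data subject.
assert enc_data[1] * pow(enc_data[0], -x, P) % P == data

# (2) Whoever holds S_SF can invert SF's local pseudonym back to pid_A.
pid_A_at_SF = pow(pid_A, S_SF, P)
recovered_pid = pow(pid_A_at_SF, pow(S_SF, -1, Q), P)
assert recovered_pid == pid_A
```

Neither step involves any secret held by the data subject.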