# Diffy Crypt Writeup

## The Network

To start solving this challenge, let's take a look at our encryption network:

![image](src/Network.PNG "Figure 1")

## Differential Cryptanalysis

Since we do not know the key schedule for this network, a brute-force attack would require enumerating every combination of the six 8-bit keys. Effectively, this is the same as brute-forcing a 48-bit key. However, in this case, a technique known as differential cryptanalysis allows us to drastically reduce the number of keys we need to 'guess'. In this attack, we first choose two plaintexts $x$ and $x'$ and record their difference $\Delta x$, where

$$x \oplus x' = \Delta x.$$

For our encryption network above, we first examine our XOR operator $\oplus$ for any key $k$.

### The Bitwise XOR and Keys

Let $x$ and $x'$ be plaintexts such that $x \oplus x' = \Delta x$. Simultaneously, let ciphertexts $y = x \oplus k$ and $y' = x' \oplus k$, where $k$ is some key. Additionally, we define $\Delta y = y \oplus y'$. For any key $k$, we have

$$
\begin{align}
\Delta y &= y \oplus y' \\
&= (x \oplus k) \oplus (x' \oplus k) \\
&= x \oplus k \oplus x' \oplus k \\
&= x \oplus x' \oplus k \oplus k \\
&= x \oplus x' \oplus 0 \\
&= x \oplus x' \\
&= \Delta x
\end{align}
$$

That is to say, if two plaintexts differ by $\Delta x$, then the two ciphertexts will differ by $\Delta x$ as well.† For example, let $x$ be the ASCII encoding of 'hi'. That is, $x = 6869_{16}$. Further, let $x' = \mathrm{e8e9}_{16}$ so that $x \oplus x' = 8080_{16}$. Without even knowing key $k$, we would know that $y \oplus y' = (x \oplus k) \oplus (x' \oplus k) = 8080_{16}$. Nevertheless, to further understand our encryption, let's look at our round function $R.$

† - The ciphertexts we discuss here refer to a plaintext XOR'd with some key $k$, not the resulting ciphertext of the entire network.
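
The cancellation can also be checked exhaustively for the 'hi' example. The snippet below is only a minimal sketch: it assumes the key is a 16-bit value, and the names `x_prime` and `diffs` are illustrative rather than part of the challenge.

```python
# Plaintexts from the example above: 'hi' and a second word chosen so that
# the two differ by 0x8080.
x, x_prime = 0x6869, 0xe8e9
delta_x = x ^ x_prime

# XOR both plaintexts with every possible 16-bit key (the key width is an
# assumption) and collect every ciphertext difference that appears.
diffs = {(x ^ k) ^ (x_prime ^ k) for k in range(1 << 16)}

print(hex(delta_x))             # 0x8080
print({hex(d) for d in diffs})  # {'0x8080'}
```

Only one difference ever appears, which is the derivation above restated in code: the key contributes nothing to $\Delta y$.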

### The Round Function

As hinted in the instructions, the round function $R$ defined by $R(x) = (-x \textrm{ mod } 256) \lll 4$ is incredibly weak. To see why, we start by defining this function:

In [1]:
def R(x):
    neg_x = -x % 256
    left4_cycshift = (neg_x << 4) & 0xff | (neg_x >> 4)
    return left4_cycshift

print('R(0x01) = ' + hex(R(0x01)))
print('R(0x6d) = ' + hex(R(0x6d)))
print('R(0x54) = ' + hex(R(0x54)))

R(0x01) = 0xff
R(0x6d) = 0x39
R(0x54) = 0xca


Note that, by the nature of our encryption algorithm, our function $R$ only accepts a byte (an integer between $0$ and $255$) as input. To showcase this function's weakness, however, let us take two bytes $x$ and $x'$ with $\Delta x = x \oplus x'$. It follows that

$$
\begin{align}
\Delta x &= x \oplus x' \\
\Delta x &= x' \oplus x \\
\Delta x \oplus x &= x' \oplus x \oplus x \\
\Delta x \oplus x &= x' \oplus 0 \\
\Delta x \oplus x &= x' \\
x \oplus \Delta x &= x' \\
\end{align}
$$

Moreover, if we can find some $\Delta x$ for which $R(x) \oplus R(x') = R(x) \oplus R(x \oplus \Delta x)$ equals the same $\Delta y$ for every byte $x$, we can glean compromising information about the encryption network. In fact:

In [2]:
for Delta_x in range(256):
    image = set()
    for x in range(256):
        Delta_y = R(x) ^ R(x ^ Delta_x)
        image.add(Delta_y)
    if len(image) == 1:
        print(f'R(x) XOR R(x XOR {hex(Delta_x)}) = {hex(Delta_y)} for every x.')
    

R(x) XOR R(x XOR 0x0) = 0x0 for every x.
R(x) XOR R(x XOR 0x80) = 0x8 for every x.


Of course, $R(x) \oplus R(x \oplus 0) = R(x) \oplus R(x) = 0$ for every $x$. More interestingly though, 
$$R(x) \oplus R(x \oplus 80_{16}) = 08_{16}$$ for every $x$. This tells us if two bytes $x$ and $x'$ differ by $80_{16}$, then $R(x) \oplus R(x')$ will always equal $08_{16}$. This weakness allows us to inspect our encryption network under a new light.
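
One way to see why this particular difference survives, offered here as a short derivation rather than something stated in the challenge: for a byte $x$, flipping the top bit is the same as adding $128$ modulo $256$, negation preserves that offset, and the cyclic shift distributes over XOR and moves bit 7 to bit 3.

$$
\begin{align}
x \oplus 80_{16} &\equiv x + 128 \pmod{256} \\
-(x \oplus 80_{16}) &\equiv -x - 128 \equiv -x + 128 \equiv (-x \textrm{ mod } 256) \oplus 80_{16} \pmod{256} \\
R(x \oplus 80_{16}) &= \big((-x \textrm{ mod } 256) \oplus 80_{16}\big) \lll 4 \\
&= \big((-x \textrm{ mod } 256) \lll 4\big) \oplus \big(80_{16} \lll 4\big) \\
&= R(x) \oplus 08_{16}
\end{align}
$$

Flipping any lower bit can instead trigger carries during the negation, so the resulting output difference depends on $x$. This is consistent with the search above reporting only $00_{16}$ and $80_{16}$.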

## The Differential Network

Let's take another look at our network with labeled rounds:

![image](src/Rounds.PNG)

Given our block cipher has a block size of 16 bits (2 bytes; 1 word), suppose we have two words $x$ and $x'$ such that $x \oplus x' = 8080_{16}$. Then the left half (left byte) of $x$ will differ by $80_{16}$ from that of $x'$, and the right half of $x$ will also differ by $80_{16}$ from that of $x'$. From before, we know XORing each half with some key (Round 1) will not affect their difference. However, XORing the halves together (Round 2) will affect the difference between two words as they progress through the encryption network.

To demonstrate (focusing strictly on Round 2), again take two words $x$ and $x'$ with $x \oplus x' = 8080_{16}$. Let $b_l$ and $b_l'$ be the left bytes (most significant bytes) of words $x$ and $x'$ respectively, each treated as a 16-bit value whose upper byte is zero, so that Round 2 produces $y = x \oplus b_l$ and $y' = x' \oplus b_l'$. Since $x \oplus x' = 8080_{16}$, we have $b_l \oplus b_l' = 80_{16}$. We can then calculate the resulting difference $\Delta y$, where

$$
\begin{align}
\Delta y &= y \oplus y' \\
&= x \oplus b_l \oplus x' \oplus b_l' \\
&= x \oplus x' \oplus b_l \oplus b_l' \\
&= 8080_{16} \oplus 80_{16} \\
&= 8000_{16}. \\
\end{align}
$$

That is, if two words initially differ by $8080_{16}$, their left bytes will differ by $80_{16}$ and their right bytes will differ by $00_{16}$ after they progress through Round 2.
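
The same result can be confirmed by brute force over every 16-bit word. This is a sketch under one assumption: Round 2 is taken to replace the right byte with the XOR of both bytes, which is what the derivation above describes. The helper name `round2` is illustrative.

```python
def round2(word):
    """Assumed Round 2: keep the left byte, XOR it into the right byte."""
    left, right = word >> 8, word & 0xff
    return (left << 8) | (right ^ left)

delta = 0x8080
# Collect the output difference for every possible input word.
diffs = {round2(x) ^ round2(x ^ delta) for x in range(1 << 16)}

print({hex(d) for d in diffs})  # {'0x8000'}
```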

Additionally, recall from before that if two bytes differ by $80_{16}$, they will then differ by $08_{16}$ after being passed through the round function $R$.

Taking all this information into account, say we have two 16-bit blocks (in plaintext) that have a difference of $8080_{16}$. We can then somewhat track the difference between these two blocks as they pass through the network:

![image](src/Diff1.PNG)
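
In the figure, the tracked difference eventually enters $R$ as $08_{16}$ rather than $80_{16}$. The sketch below enumerates the output differences $R$ can produce for each of these two input differences; the helper name `output_differences` is illustrative and not part of the challenge code.

```python
def R(x):
    """Round function from this writeup: negate mod 256, rotate left 4 bits."""
    neg_x = -x % 256
    return ((neg_x << 4) & 0xff) | (neg_x >> 4)

def output_differences(delta_x):
    """All values of R(x) XOR R(x XOR delta_x) over every byte x."""
    return {R(x) ^ R(x ^ delta_x) for x in range(256)}

for delta_x in (0x80, 0x08):
    outs = sorted(output_differences(delta_x))
    print(hex(delta_x), '->', [hex(d) for d in outs])
```

An input difference of `0x80` yields a single output difference, while `0x08` yields several, so the trail cannot be extended deterministically past that point.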

For the most part we have some success until we reach the round function in the fifth round. Specifically, if two bytes $x$ and $x'$ differ by $08_{16}$, we cannot guarantee a unique value for $R(x) \oplus R(x')$. This is where we make use of our plaintext-ciphertext pairs:

In [3]:
import pandas

# read_csv opens and closes the file itself; forward slashes work on every platform
df = pandas.read_csv('src/plain-cipher.csv')
df

|   | P | E(P) | E(P XOR 0x8080) | E(P XOR 0x0080) | E(P XOR 0x0008) |
|---|---|---|---|---|---|
| 0 | 0x0041 | 0x7e0a | 0x4732 | 0x0884 | 0xd6dc |
| 1 | 0x0079 | 0xd522 | 0xce3a | 0xa3ac | 0x9d14 |
| 2 | 0x9f5e | 0xcea3 | 0xd6b4 | 0x4623 | 0xba5b |
| 3 | 0x6379 | 0x7a47 | UNKNOWN | UNKNOWN | UNKNOWN |
| 4 | 0x455f | 0x8223 | UNKNOWN | UNKNOWN | UNKNOWN |
| 5 | 0x366f | 0x1a2f | UNKNOWN | UNKNOWN | UNKNOWN |


Before we assign variables according to our table above, it may help to create a class to keep track of the left and right byte for a block (and automatically update the entire word block as we adjust left and right bytes):

In [4]:
class Block:
    def __init__(self, block):
        self.left = block >> 8
        self.right = block & 0xff

    @property
    def word(self):
        return (self.left << 8) | self.right

    @word.setter
    def word(self, val):
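        # Split the 16-bit value into its two bytes. Assigning to self.word
        # here would call this setter again and recurse without end.
        self.left = (val >> 8) & 0xff
        self.right = val & 0xff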

We can now define our variables utilizing our class above:

In [5]:
P0, EP0, EP0x8080, EP0x0080, EP0x0008 = Block(0x0041), Block(0x7e0a), Block(0x4732), Block(0x0884), Block(0xd6dc)
# int(item, 16) parses strings such as '0x0079' without evaluating file contents as code
P1, EP1, EP1x8080, EP1x0080, EP1x0008 = [Block(int(item, 16)) for item in df.iloc[1]]
P2, EP2, EP2x8080, EP2x0080, EP2x0008 = [Block(int(item, 16)) for item in df.iloc[2]]
P3, EP3 = [Block(int(item, 16)) for item in df.iloc[3][:2]]
P4, EP4 = [Block(int(item, 16)) for item in df.iloc[4][:2]]
P5, EP5 = [Block(int(item, 16)) for item in df.iloc[5][:2]]

At the same time, we can create a function that inverts (undoes) Round 7 of the encryption network (we need not worry about keys for this round):

In [6]:
def round7_inv(B: Block) -> Block:
    N = Block(0)
    N.left = B.left ^ B.right
    N.right = B.right
    return N

This way, we may examine what undoing Round 7 on our encrypted block $E(P_0)$ would look like:

In [7]:
print('EP0 before Round 7 inversion:', hex(EP0.word))
print('EP0 after Round 7 inversion:', hex(round7_inv(EP0).word))

EP0 before Round 7 inversion: 0x7e0a
EP0 after Round 7 inversion: 0x740a


Before we can invert Round 6, we need to find what possible values $K_5$ may possess. To do this, first take $P_0 = 0041_{16}$. We see from our data that $E(P_0) = \mathrm{7e0a}_{16}$ and $E(P_0 \oplus 8080_{16}) = 4732_{16}$. From our differential analysis before, we know if we reverse Round 6 on $E(P_0)$ and $E(P_0 \oplus 8080_{16})$, then the right byte of these two partially decrypted blocks should differ by $08_{16}$. So, let's iterate through all possible values and capture (8-bit) keys that do not contradict our analysis:

In [8]:
def crack_k5(B: Block, DB: Block) -> set:
    S = set()
    B = round7_inv(B)
    DB = round7_inv(DB)
    for k5 in range(0x100):  # every byte value, 0x00 through 0xff
        if (R(B.left ^ k5) ^ B.right) ^ (R(DB.left ^ k5) ^ DB.right) == 0x08:
            S.add(k5)
    return S        

The function above is mainly for set building. Assuming a key is valid, it should appear in every set $S_i$ for $E(P_i)$ and $E(P_i \oplus 8080_{16})$ pairs:

In [9]:
S0 = crack_k5(EP0, EP0x8080)
S1 = crack_k5(EP1, EP1x8080)
S2 = crack_k5(EP2, EP2x8080)
valid_K5s = set.intersection(S0, S1, S2)
print(f'Possible K_5 keys are in {valid_K5s}.')

Possible K_5 keys are in {98, 226}.
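
The two survivors differ by exactly $80_{16}$, and that appears to be no coincidence. Because $R(x \oplus 80_{16}) = R(x) \oplus 08_{16}$ for every byte, the test inside `crack_k5` cannot tell a key $k$ from $k \oplus 80_{16}$. The sketch below illustrates this; the helper `r_difference` and the byte pairs in `samples` are hypothetical and chosen only for the demonstration.

```python
def R(x):
    """Round function from this writeup: negate mod 256, rotate left 4 bits."""
    neg_x = -x % 256
    return ((neg_x << 4) & 0xff) | (neg_x >> 4)

def r_difference(a, b, k):
    """The part of the Round 6 test that depends on the key guess k."""
    return R(a ^ k) ^ R(b ^ k)

# Hypothetical left-byte pairs; any pairs would do.
samples = [(0x74, 0x75), (0x12, 0xab), (0xff, 0x00)]

# For every key guess, its "twin" k ^ 0x80 gives the same test value.
twins_agree = all(
    r_difference(a, b, k) == r_difference(a, b, k ^ 0x80)
    for a, b in samples
    for k in range(256)
)

print(twins_agree)    # True
print(hex(98 ^ 226))  # 0x80
```

The same pairing shows up in the later key sets, which is one reason the attack ends with a small family of equivalent keys rather than a single key.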


Ultimately, this means $K_5$ may be $98$ or $226$ ($62_{16}$ or $\mathrm{e2}_{16}$ respectively). Effectively, this takes us from cracking a 48-bit key to a 41-bit key, but we can further reduce this number. With these potential keys in mind, let's make a function that inverts Round 6:

In [10]:
def round6_inv(B: Block, k5):
    B = round7_inv(B)
    N = Block(0)
    N.left = B.left
    N.right = R(B.left ^ k5) ^ B.right
    return N

Using the same $E(P_i)$ and $E(P_i \oplus 8080_{16})$ pairs, we can perform a similar analysis to crack $K_4$ and reverse Round 5.

In [11]:
def crack_k4(B: Block, DB: Block, k5) -> set:
    S = set()
    B = round6_inv(B, k5)
    DB = round6_inv(DB, k5)
    for k4 in range(0x100):  # every byte value, 0x00 through 0xff
        if (R(B.right ^ k4) ^ B.left) ^ (R(DB.right ^ k4) ^ DB.left) == 0x80:
            S.add(k4)
    return S

However, since $K_5$ may be one of two values, a false $K_5$ may lead to false information. As such, it is important to consider values for $K_4$ both when $K_5 = 62_{16}$ and when $K_5 = \mathrm{e2}_{16}$. To do this, we first suppose $K_5 = 62_{16}$, build our sets, and take their intersection (as before with key $K_5$). We then suppose $K_5 = \mathrm{e2}_{16}$ and compute the intersection again. Finally, we take the union of these intersections to gather valid keys for $K_4$:

In [12]:
valid_K4s = []
for k5 in valid_K5s:
    S0 = crack_k4(EP0, EP0x8080, k5)
    S1 = crack_k4(EP1, EP1x8080, k5)
    S2 = crack_k4(EP2, EP2x8080, k5)
    S_cap = set.intersection(S0, S1, S2)
    if S_cap:
        valid_K4s.append(S_cap)
valid_K4s = set.union(*valid_K4s)
print(f'Possible K_4 keys are in {valid_K4s}.')

Possible K_4 keys are in {60, 180, 52, 188}.


Now, we may create a function that reverses Round 5:

In [13]:
def round5_inv(B: Block, k5, k4):
    B = round6_inv(B, k5)
    N = Block(0)
    N.left = R(B.right ^ k4) ^ B.left
    N.right = B.right
    return N

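If we continue to work with our current $E(P_i)$ and $E(P_i \oplus 8080_{16})$ pairs, we will find every key is valid for $K_3$: the left bytes entering Round 4 differ by $80_{16}$, so $R$ contributes a difference of $08_{16}$ no matter which $K_3$ is used. To work around this, let's take a look at our network when two plaintexts differ by $0080_{16}$: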

![image](src/Diff2.PNG)
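
The differential trails in these figures can also be estimated empirically. The sketch below rests on an assumption: the forward rounds are inferred from the inverse-round functions in this writeup, not taken from the challenge source. The names `round_states` and `fixed_differences` are illustrative. It encrypts random pairs under random keys and keeps the difference after each round only if it never changed.

```python
import random

def R(x):
    """Round function from this writeup: negate mod 256, rotate left 4 bits."""
    neg_x = -x % 256
    return ((neg_x << 4) & 0xff) | (neg_x >> 4)

def round_states(word, keys):
    """16-bit state after each of Rounds 1-7 (round layout is an assumption)."""
    k0, k1, k2, k3, k4, k5 = keys
    left, right = word >> 8, word & 0xff
    states = []
    def snap():
        states.append((left << 8) | right)
    left, right = left ^ k0, right ^ k1; snap()  # Round 1: key XOR
    right ^= left; snap()                         # Round 2: mix halves
    left ^= R(right ^ k2); snap()                 # Round 3
    right ^= R(left ^ k3); snap()                 # Round 4
    left ^= R(right ^ k4); snap()                 # Round 5
    right ^= R(left ^ k5); snap()                 # Round 6
    left ^= right; snap()                         # Round 7: mix halves
    return states

def fixed_differences(delta, trials=500, seed=0):
    """Per-round difference if it was identical in every trial, else None."""
    rng = random.Random(seed)
    seen = [set() for _ in range(7)]
    for _ in range(trials):
        keys = [rng.randrange(256) for _ in range(6)]
        p = rng.randrange(1 << 16)
        for i, (a, b) in enumerate(zip(round_states(p, keys),
                                       round_states(p ^ delta, keys))):
            seen[i].add(a ^ b)
    return [next(iter(s)) if len(s) == 1 else None for s in seen]

for delta in (0x8080, 0x0080):
    trail = fixed_differences(delta)
    print(hex(delta), [hex(d) if d is not None else '?' for d in trail])
```

Under these assumptions, the `0x8080` trail stays fixed through Round 4, while the `0x0080` trail stays fixed through Round 3. That is why the second differential is the one that constrains $K_3$.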

We do not need to traverse the entire network, but we do need these new pairs—namely $E(P_i)$ and $E(P_i \oplus 0080_{16})$ pairs—to crack $K_3$. With a similar process as before:

In [14]:
def crack_k3(B: Block, DB: Block, k5, k4):
    S = set()
    B = round5_inv(B, k5, k4)
    DB = round5_inv(DB, k5, k4)
    for k3 in range(0x100):  # every byte value, 0x00 through 0xff
        if (R(B.left ^ k3) ^ B.right) ^ (R(DB.left ^ k3) ^ DB.right) == 0x80:
            S.add(k3)
    return S

From here, we need to run our function above on every valid key combination found so far. Fortunately, the 'product' function from the standard library module 'itertools' generates these combinations for us:

In [15]:
from itertools import product
list(product(valid_K5s, valid_K4s))

[(98, 60),
 (98, 180),
 (98, 52),
 (98, 188),
 (226, 60),
 (226, 180),
 (226, 52),
 (226, 188)]

In [16]:
valid_K3s = []
for keys in product(valid_K5s, valid_K4s):
    S0 = crack_k3(EP0, EP0x0080, *keys)
    S1 = crack_k3(EP1, EP1x0080, *keys)
    S2 = crack_k3(EP2, EP2x0080, *keys)
    S_cap = set.intersection(S0, S1, S2)
    if S_cap:
        valid_K3s.append(S_cap)
valid_K3s = set.union(*valid_K3s)
valid_K3s

{2, 10, 34, 42, 66, 74, 98, 106, 130, 138, 162, 170, 194, 202, 226, 234}

We can also invert Round 4:

In [17]:
def round4_inv(B, k5, k4, k3):
    B = round5_inv(B, k5, k4)
    N = Block(0)
    N.left = B.left
    N.right = R(B.left ^ k3) ^ B.right
    return N

However, we now need to inspect our network with a differential of $0008_{16}$ to crack $K_2$; otherwise we will not gain any information (that is, we will not eliminate any keys):

![image](src/Diff3.PNG)

Analogous to before:

In [18]:
def crack_k2(B, DB, k5, k4, k3):
    S = set()
    B = round4_inv(B, k5, k4, k3)
    DB = round4_inv(DB, k5, k4, k3)
    for k2 in range(0x100):  # every byte value, 0x00 through 0xff
        if (R(B.right ^ k2) ^ B.left) ^ (R(DB.right ^ k2) ^ DB.left) == 0x00:
            S.add(k2)
    return S

valid_K2s = []
for key in product(valid_K5s, valid_K4s, valid_K3s):
    S0 = crack_k2(EP0, EP0x0008, *key)
    S1 = crack_k2(EP1, EP1x0008, *key)
    S2 = crack_k2(EP2, EP2x0008, *key)
    S_cap = set.intersection(S0, S1, S2)
    if S_cap:
        valid_K2s.append(S_cap)
valid_K2s = set.union(*valid_K2s)
valid_K2s

{49,
 51,
 53,
 55,
 57,
 59,
 61,
 63,
 113,
 115,
 117,
 119,
 121,
 123,
 125,
 127,
 177,
 179,
 181,
 183,
 185,
 187,
 189,
 191,
 241,
 243,
 245,
 247,
 249,
 251,
 253,
 255}

We have quite a few potential values for $K_2$, but even these will help us narrow down $K_1$ and $K_0$. Nevertheless, we then undo Round 3 and Round 2:

In [19]:
def round3_inv(B, k5, k4, k3, k2):
    B = round4_inv(B, k5, k4, k3)
    N = Block(0)
    N.left = R(B.right ^ k2) ^ B.left
    N.right = B.right
    return N

def round2_inv(B, k5, k4, k3, k2):
    B = round3_inv(B, k5, k4, k3, k2)
    N = Block(0)
    N.left = B.left
    N.right = B.left ^ B.right
    return N

Further, we may make use of our plaintexts to find values for our remaining keys. In particular, for $K_1$:

In [20]:
def crack_k1(B, P, k5, k4, k3, k2):
    S = set()
    B = round2_inv(B, k5, k4, k3, k2)
    for k1 in range(0x100):  # every byte value, 0x00 through 0xff
        if B.right ^ k1 == P.right:
            S.add(k1)
    return S

valid_K1s = []
for key in product(valid_K5s, valid_K4s, valid_K3s, valid_K2s):
    S0 = crack_k1(EP0, P0, *key)
    S1 = crack_k1(EP1, P1, *key)
    S2 = crack_k1(EP2, P2, *key)
    S_cap = set.intersection(S0, S1, S2)
    if S_cap:
        valid_K1s.append(S_cap)
valid_K1s = set.union(*valid_K1s)
valid_K1s

{144, 146, 148, 150, 152, 154, 156, 158}

And for $K_0$:

In [21]:
def crack_k0(B, P, k5, k4, k3, k2):
    S = set()
    B = round2_inv(B, k5, k4, k3, k2)
    for k0 in range(0x100):  # every byte value, 0x00 through 0xff
        if B.left ^ k0 == P.left:
            S.add(k0)
    return S

valid_K0s = []
for key in product(valid_K5s, valid_K4s, valid_K3s, valid_K2s):
    S0 = crack_k0(EP0, P0, *key)
    S1 = crack_k0(EP1, P1, *key)
    S2 = crack_k0(EP2, P2, *key)
    S_cap = set.intersection(S0, S1, S2)
    if S_cap:
        valid_K0s.append(S_cap)
valid_K0s = set.union(*valid_K0s)
valid_K0s

{112, 116, 120, 124}

Finally, we may invert Round 1 and check the remaining plaintext-ciphertext pairs to further prune invalid keys:

In [22]:
def round1_inv(B, k5, k4, k3, k2, k1, k0):
    B = round2_inv(B, k5, k4, k3, k2)
    N = Block(0)
    N.left = B.left ^ k0
    N.right = B.right ^ k1
    return N

valid_keys = []
for key in product(valid_K5s, valid_K4s, valid_K3s, valid_K2s, valid_K1s, valid_K0s):
    check3 = round1_inv(EP3, *key).word == P3.word
    check4 = round1_inv(EP4, *key).word == P4.word
    check5 = round1_inv(EP5, *key).word == P5.word
    if check3 and check4 and check5:
        valid_keys.append(key)
valid_keys

[(98, 60, 98, 251, 154, 120),
 (98, 60, 98, 123, 146, 112),
 (98, 60, 226, 115, 154, 112),
 (98, 60, 226, 243, 146, 120),
 (98, 188, 106, 251, 146, 112),
 (98, 188, 106, 123, 154, 120),
 (98, 188, 234, 115, 146, 120),
 (98, 188, 234, 243, 154, 112),
 (226, 180, 106, 115, 146, 120),
 (226, 180, 106, 243, 154, 112),
 (226, 180, 234, 251, 146, 112),
 (226, 180, 234, 123, 154, 120),
 (226, 52, 98, 115, 154, 112),
 (226, 52, 98, 243, 146, 120),
 (226, 52, 226, 251, 154, 120),
 (226, 52, 226, 123, 146, 112)]
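
As a sanity check on the surviving keys, the inverse rounds can be run forwards. The `encrypt` function below is a sketch, with the round order inferred from the inverse functions in this writeup rather than from the challenge source. It takes keys in the same `(k5, k4, k3, k2, k1, k0)` order as the list above, and `known_pairs` and `candidates` are copied from the table and the key list.

```python
def R(x):
    """Round function from this writeup: negate mod 256, rotate left 4 bits."""
    neg_x = -x % 256
    return ((neg_x << 4) & 0xff) | (neg_x >> 4)

def encrypt(word, key):
    """Encrypt one 16-bit word; key order is (k5, k4, k3, k2, k1, k0)."""
    k5, k4, k3, k2, k1, k0 = key
    left, right = word >> 8, word & 0xff
    left, right = left ^ k0, right ^ k1  # Round 1
    right ^= left                         # Round 2
    left ^= R(right ^ k2)                 # Round 3
    right ^= R(left ^ k3)                 # Round 4
    left ^= R(right ^ k4)                 # Round 5
    right ^= R(left ^ k5)                 # Round 6
    left ^= right                         # Round 7
    return (left << 8) | right

# Plaintext -> ciphertext pairs taken from the table earlier in the writeup.
known_pairs = {0x0041: 0x7e0a, 0x6379: 0x7a47, 0x455f: 0x8223, 0x366f: 0x1a2f}
# The first two entries of the key list above.
candidates = [(98, 60, 98, 251, 154, 120), (98, 60, 98, 123, 146, 112)]

for key in candidates:
    ok = all(encrypt(p, key) == c for p, c in known_pairs.items())
    print(key, ok)
```

Both candidates reproduce every listed pair, which suggests the remaining keys are equivalent representations of one cipher rather than 16 genuinely different guesses.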

## Decrypting the Ciphertext

We still have 16 potential keys to choose from, but we effectively went from brute-forcing a 48-bit key to a 4-bit key, which requires significantly fewer resources and less time. Still, let's look at our ciphertext:

In [23]:
ciphertext = '7a47857bcbd5a8c5bed41936ad897f463543b35a31bba9a335a97ff6ae0aced65a3182231a2f813ab5b532a8933cd448eaf0'

Given we are working with a 16-bit block cipher, we can split our ciphertext into word-size blocks:

In [24]:
CBlocks = []
h = len(ciphertext)
for i in range(0, h, 4):
    CBlocks.append(int(ciphertext[i:i+4], 16))
CBlocks

[31303,
 34171,
 52181,
 43205,
 48852,
 6454,
 44425,
 32582,
 13635,
 45914,
 12731,
 43427,
 13737,
 32758,
 44554,
 52950,
 23089,
 33315,
 6703,
 33082,
 46517,
 12968,
 37692,
 54344,
 60144]

With some extra work, we can decrypt each ciphertext block for each key and view all potential plaintexts:

In [25]:
valid_plaintexts = set()
for key in valid_keys:
    # Zero-pad each word to 4 hex digits so words below 0x1000 keep their byte alignment
    HexPBlocks = [f'{round1_inv(Block(B), *key).word:04x}' for B in CBlocks]
    plaintext = bytes.fromhex(''.join(HexPBlocks))
    valid_plaintexts.add(plaintext)
valid_plaintexts

{b'cygame{f!nD_y0u12_D37t4_..._0r_50mE_6o0d_h4Rdw4r3}'}

Finally, we are left with only one possible option for our plaintext flag:
    
    'cygame{f!nD_y0u12_D37t4_..._0r_50mE_6o0d_h4Rdw4r3}'
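
For reference, the whole decryption can be condensed into one self-contained sketch. It assumes the round order used by the inverse functions in this writeup and picks the first key from the list of valid keys; the name `decrypt` is illustrative.

```python
def R(x):
    """Round function from this writeup: negate mod 256, rotate left 4 bits."""
    neg_x = -x % 256
    return ((neg_x << 4) & 0xff) | (neg_x >> 4)

def decrypt(word, key):
    """Decrypt one 16-bit word; key order is (k5, k4, k3, k2, k1, k0)."""
    k5, k4, k3, k2, k1, k0 = key
    left, right = word >> 8, word & 0xff
    left ^= right                         # undo Round 7
    right ^= R(left ^ k5)                 # undo Round 6
    left ^= R(right ^ k4)                 # undo Round 5
    right ^= R(left ^ k3)                 # undo Round 4
    left ^= R(right ^ k2)                 # undo Round 3
    right ^= left                         # undo Round 2
    left, right = left ^ k0, right ^ k1   # undo Round 1
    return (left << 8) | right

ciphertext = bytes.fromhex(
    '7a47857bcbd5a8c5bed41936ad897f463543b35a31bba9a335a97ff6'
    'ae0aced65a3182231a2f813ab5b532a8933cd448eaf0'
)
key = (98, 60, 98, 251, 154, 120)  # first entry of the valid key list

# Decrypt two bytes (one block) at a time and stitch the results together.
plaintext = b''.join(
    decrypt(int.from_bytes(ciphertext[i:i + 2], 'big'), key).to_bytes(2, 'big')
    for i in range(0, len(ciphertext), 2)
)
print(plaintext)
```

Working on `bytes` with `int.from_bytes` and `to_bytes` avoids any hex-string padding issues when reassembling the blocks.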

## Reflections

While this challenge may reveal insights into differential cryptanalysis and why it's important we develop cipher algorithms that are resistant to such attacks, the challenge itself has a glaring fault: it's too difficult. That is, the difficulty and time-consuming nature of the problem may dissuade participants from even attempting it, ultimately defeating the opportunity to learn.

For future implementations of differential cryptanalysis, it may be worth cutting rounds from the encryption network. For example, cutting Rounds 7 and 6 from the above network would reduce the difficulty without compromising the heart of the problem. For an even simpler problem, display 6 plaintext-ciphertext pairs that all have the same difference and test whether participants can spot the recurring difference between each pair to ultimately decrypt the flag.