# AlexXiong97/applied-cryptography

Fetching latest commit…
Cannot retrieve the latest commit at this time.
Type Name Latest commit message Commit time
..
Failed to load latest commit information.
index.py

# Many Time Pad (HW01 CS55)

TL;DR: Moral of the story -- never reuse secret key when using determinsitc encryption, which apparently include the one-time-pad construction

Decrypted Target Ciphertext:

`The secret message is: When using a stream cipher, never use the key more than once`

## Attack Approach

Notation: `ct`: ciphertext; `pt`: plaintext; `sk`: secret key

`ct1 = pt1 ^ sk` ,`ct2 = pt2 ^ sk` ...

=> `ct1 ^ ct2 = pt1 ^ pt2` ...

• So what? what can I do with a xor value, still Gibberish to me!

Noticing that:

```chr(ord('a') ^ ord(" "))   # 'a' XOR space yields 'A'
chr(ord('B') ^ ord(" "))   # 'B' XOR space yields 'b'```

Since we are only dealing with English characters, regardless of the encoding scheme (UTF-8 or ASCII or other modern standard) , `white space` <=> `0x20` <=> `32` <=> `00100000`. (Although unnecessary for this specific attack, but amazing encoding intro for programmer by Joel provides basics arsenal for understanding Unicode, especially if this targetted message is written in Chinese, for example.)

• How we could leverage this pattern?

To make this "accidental" discovery(proudly helped by hint provided) more general and conceivable, let's turn to the standard ASCII table.

Noticing that (test yourself):

``````every character in Column 1 and 3, when `xor` with `space`/`00100000`, will get its
right-hand neighbour in Column 2 and 4, vice versa.
``````

But this alone doesn't really help, not until we think about if we `xor` two alphabet characters, we will certainly not get an alphabet!

Wait, what we usually have in a sentence: alphabet, comma, period, colon, semicolon ... (quite predictable, isn't it.) Now we konw that once they `xor` with a `space`, we will see their "neighbours"

..... which means: if we see "familiar neighbours" (alphabet themselves, and neighbours of comma, period etc), there is a high chance one of the ciphertexts being `xor`ed contains a `space` at that position

• That's all well and good, but how do I know which ciphertext does the `space` comes from?

Nasty cat might be cute for a second, cute cat is cute forever.

if at certain position, when `ct1` xor with all the rest ciphertexts, and the results almost always fall into our "familiar baskets", then we could conclude `pt1` at this position is a `space` with high probability.

After locking down those spaces, it is trivial to reveal their "xor partner" their plaintexts.

## Test

run the `index.py`

`python index.py   # will produce a byte array representation of the decrypted text of the targetted message.`

## Problem

Since we mentioned this approach stands on the shoulder of "high probability", there are still a few undecrypted byte or incorrectly decrypted byte.

By increasing the threshold value there might be more undecrypted byte, but less misinterpreted byte.

Another adjustable criteria is what counts as a sign for whitespace , in our current version, I personally include: a-zA-Z,." !?.