Many Time Pad (HW01 CS55)
TL;DR: Moral of the story -- never reuse secret key when using determinsitc encryption, which apparently include the one-time-pad construction
Decrypted Target Ciphertext:
The secret message is: When using a stream cipher, never use the key more than once
sk: secret key
ct1 = pt1 ^ sk ,
ct2 = pt2 ^ sk ...
ct1 ^ ct2 = pt1 ^ pt2 ...
- So what? what can I do with a xor value, still Gibberish to me!
chr(ord('a') ^ ord(" ")) # 'a' XOR space yields 'A' chr(ord('B') ^ ord(" ")) # 'B' XOR space yields 'b'
Since we are only dealing with English characters, regardless of the encoding scheme (UTF-8 or ASCII or other modern standard)
white space <=>
00100000. (Although unnecessary for this specific attack, but amazing encoding
intro for programmer by Joel
provides basics arsenal for understanding Unicode, especially if this targetted message is written in Chinese, for example.)
- How we could leverage this pattern?
To make this "accidental" discovery(proudly helped by hint provided) more general and conceivable, let's turn to the standard ASCII table.
Noticing that (test yourself):
every character in Column 1 and 3, when `xor` with `space`/`00100000`, will get its right-hand neighbour in Column 2 and 4, vice versa.
But this alone doesn't really help, not until we think about if we
xor two alphabet characters, we will certainly not get
Wait, what we usually have in a sentence: alphabet, comma, period, colon, semicolon ... (quite predictable, isn't it.) Now we
konw that once they
xor with a
space, we will see their "neighbours"
..... which means: if we see "familiar neighbours" (alphabet themselves, and neighbours of comma, period etc), there is a high
chance one of the ciphertexts being
xored contains a
space at that position
- That's all well and good, but how do I know which ciphertext does the
Nasty cat might be cute for a second, cute cat is cute forever.
if at certain position, when
ct1 xor with all the rest ciphertexts, and the results almost always fall into
our "familiar baskets", then we could conclude
pt1 at this position is a
space with high probability.
After locking down those spaces, it is trivial to reveal their "xor partner" their plaintexts.
python index.py # will produce a byte array representation of the decrypted text of the targetted message.
Since we mentioned this approach stands on the shoulder of "high probability", there are still a few undecrypted byte or incorrectly decrypted byte.
By increasing the threshold value there might be more undecrypted byte, but less misinterpreted byte.
Another adjustable criteria is what counts as a sign for whitespace , in our current version, I personally include: a-zA-Z,." !?.
Please, please contact me and share your better solution! Coudln't appreciate enough.