# JesseEmond/matasano-cryptopals

My solutions and explanations of the http://cryptopals.com/ challenges.
Switch branches/tags
Nothing to show
Latest commit d2a034b Oct 23, 2017
 Failed to load latest commit information. src Oct 22, 2017 .gitignore Apr 22, 2016 .travis.yml Apr 29, 2016 README.md Oct 22, 2017 requirements.txt Apr 29, 2016 run_tests.sh Apr 29, 2016

# Matasano - Cryptopals

Note: most links below are just links to files within this repo for your convenience.

## Set 1: Basics

• Trivial introduction to hex parsing to get you going. I implemented b64decode out of curiosity.

• Introduction to XOR between strings. Implementation in xor.py.

• Ahh. Crypto! To break it, loop through all 256 possible bytes to find which of the keys produces the decrypted text that looks the most like English.

frequency.py has code to score how much a string looks like English (testing how close to a certain distribution our data is).

xor.py has code to encrypt/decrypt using a single-byte XOR cipher.

• Very similar to challenge #3. Break all ciphertexts, pick the one with the best "English score".

• Simply cycle the key when XOR-ing with the plaintext.

• Interesting! Breaking it amounts to breaking a single-byte XOR cipher for the 1st character of every repeated key, then for the 2nd, the 3rd, etc.

Guessing the keysize is done by picking the keysize that minimizes the normalized hamming distance (`distance / keysize`) between "blocks" of the repeated key. This is because we can expect 2 different blocks encrypted with the right keysize to have similar bit patterns (e.g. matching characters), so minimizing the normalized hamming distance gives us the keysize that produces the most similar blocks. We need to normalize because e.g. a keysize of 4 would have 2 times as many bits as a keysize of 2.

• Intro to using the pycrypto API. aes.py has the code that handles AES encryption/decryption.

• ECB mode encrypts the blocks independently. This means that two identical blocks of plaintext at different positions will produce the same ciphertext block. All that we need to do is find the ciphertext that has the most (or any, really) duplicate ciphertext blocks, which is very unlikely with 16 bytes of otherwise random data.

## Set 2: Block crypto

• We pad the plaintext so that it has a length that is a multiple of the blocksize (e.g. 16). If it already is, we add a full block of padding. The padding byte to use is equal to the length of the padding.

Examples:

``````pad('0123456789') = '0123456789' + '666666'
pad('') = '\x10' * 16
``````
• Wikipedia's image on the matter is enough of a description to implement this. Basically, every encrypted block is XORed with the previous block (or the IV if this is the first block). The result is that 2 identical plaintext blocks will no longer automatically encrypt to the same ciphertext block (compared to ECB).

• When using ECB, two identical plaintext blocks will encrypt to the same ciphertext block.

Therefore, a block that contains duplicate plaintext blocks will contain duplicate ciphertext blocks once encrypted.

Our oracle checks if the ciphertext contains duplicate blocks. If it does, we consider the ciphertext to be encrypted using ECB. Otherwise, we consider that it used CBC.

• Detecting the blocksize is relatively easy. We start by noting `len(ciphertext)`. We continue adding prefix bytes until we notice a change in `len(ciphertext)`. When that happens, it means that we have crossed a block boundary and we know the blocksize to be `len(after) - len(start)`.

Using the ECB property that we exploited earlier (identical plaintext blocks encrypt to identical ciphertext blocks), we can bruteforce individual bytes at the end of blocks.

For example, assume that we want to bruteforce the (unknown) plaintext: `hello this is a test`.

At first, we want to isolate the first character (`h`, but we don't know that yet) at the end of a block, through a prefix of `A`s. Like this:

``````AAAAAAAAAAAAAAAh
``````

We get the encrypted result of this (call this `C0`). We can try all possible bytes that could follow our prefix, like the following:

``````AAAAAAAAAAAAAAAa
AAAAAAAAAAAAAAAb
AAAAAAAAAAAAAAAc
...
``````

Eventually, we will get an encrypted block that will match `C0`, and thus we will know the first byte of our plaintext.

We can bruteforce the second byte (`e`) similarly. We start by getting `C1` by encrypting:

``````AAAAAAAAAAAAAAhe
``````

Then by trying all possible bytes:

``````AAAAAAAAAAAAAAha
AAAAAAAAAAAAAAhb
AAAAAAAAAAAAAAhc
...
``````

Until we find a block that encrypts to `C1`. And so on, for all bytes in the first block.

For the following blocks, the idea is the same, but a little more thought must be put into where we get our `Ci` and what should be put as a prefix.

The block `Ci` is merely the index of the block we already are at with the next byte that we want to bruteforce. The reason is quite simple: with our padding, we will only be pushing that byte at the end of its current block, not produce a next block.

The padding only has to be enough to put the next bruteforced byte at the end of a block. When bruteforcing, however, the bytes before the tried byte must be the last 15 (blocksize - 1) known bytes, to fit `Ci`.

This is all clearer through an example:

``````plaintext:    hello this is a (...unknown....)
|---block 0----||---block 1----|
bruteforcing: t (unknown yet)

We will pad with As:
padded:       AAAAAAAAAAAAAAAhello this is a t(...unknown....)
|---block 0----||---block 1----||---block 2----|
Ci = block 1

Bruteforce tries: ello this is a a
ello this is a b
ello this is a c
...
``````

We repeat the steps up until the point where we reach the last block. Then, we will eventually hit the padding.

If the padding happens to be 1 byte, we will successfully find it (this is just like bruteforcing a normal byte that just happens to be `0x01`).

If the padding is anything else, we will first successfully "bruteforce" a `0x01` in the plaintext (because we will be padding the plaintext until it is only missing a byte, so it will be padded with `0x01`). Then, we will try to bruteforce the next byte, but fail. The reason is that the padding will now be `[0x02, 0x02]`, while we are trying `[0x01, 0x??]`. Once that happens, we know that we have successfully bruteforced the whole plaintext. We can remove our undesirable bruteforced `0x01` padding and enjoy reading our plaintext. 🎉

• Using the same exploitable ECB property as before, we craft multiple messages and pick-and-choose the parts that we want.

Blocks A:

``````email=aaaaaaaaaa (A0)
adminPPPPPPPPPPP (A1) (P is the padding that will be needed as a final block)
&uid=10&role=use (A2)
rPPPPPPPPPPPPPPP (A3)
``````

Blocks B:

``````email=aaaaaaaaaa (B0)
aaa&uid=10&role= (B1)
userPPPPPPPPPPPP (B2)
``````

By crafting the ciphertext `B0 + B1 + A1`, we get:

``````email=aaaaaaaaaa (B0)
aaa&uid=10&role= (B1)
adminPPPPPPPPPPP (A1)
``````

Which grants us admin access to log in.

• This attack is the same as the challenge #12, but with some required initial work, offsets and padding to apply.

We first need to find out how long the prefix is. We do this by generating 2 blocks of fixed data (e.g. `[0xA] * 32`) and gradually increasing the size of the fixed data until we find 2 neighbour duplicate blocks in the ciphertext. This indicates that we have fully padded the last block of the prefix and that we have produced two blocks of our own input after that. To make sure that we don't just have identical blocks because the prefix happened to end with our fixed value (therefore fooling us into thinking that we have padded 1 more byte than we really have), we can try with 2 different fixed values, e.g. `[0] * 32` and `[1] * 32`.

Then, one can use the index where the duplicate blocks begin to find where the first block after the prefix starts. With that information, we can find the amount of padding that was required to pad the prefix to a multiple of blocksize through `len(fixed_data) - 2 * blocksize`. The length of the prefix is then `index of first of the duplicates - padding length`.

With the length of the prefix, we just use our algorithm from challenge #12, but prefixing our input with some padding to pad the prefix to a blocksize multiple. We also need to offset any index in the produced ciphertext by the amount of blocks in the prefix.

• We get the last byte of the plaintext as `n` and make sure that the last `n` bytes of the plaintext are equal to `n`. If it is not the case, raise an exception.

• Start out by encrypting a normal token for a block of 16 bytes. This will be where we will inject our crafted block. Call this encrypted block `current_block`.

We want to inject `target = ";admin=true;abc="`.

We know that the plaintext block following our input is: `next_plain = ";comment2=%20lik"`.

When decrypting with CBC, the following is done: `next_plain = next_block_pre_xor ^ current_block`

We can calculate `next_block_pre_xor = next_plain ^ current_block`.

We want `next_block_pre_xor ^ crafted_block` to yield `target`, so we choose: `crafted_block = target ^ next_block_pre_xor`.

Then, all we need is to swap `current_block` with our `crafted_block` to get admin access. The decryption of `current_block` will yield scrambled plaintext, but it is not a problem since it only modifies `comment1`.

## Set 3: Block & stream crypto

• If we have an oracle that tells us if the padding on a ciphertext is valid or not, we are able to recover the plaintext.

Keep in mind that decryption happens like this:

``````block_pre_xor xxxxxxxxxxxxxxxx ^
prev_block pppppppppppppppp =
plaintext tttttttttttttttt
``````

We can play with the `prev_block` to search for a byte that results in valid padding:

``````block_pre_xor xxxxxxxxxxxxxxx? ^
prev_block pppppppppppppppB = (we bruteforce 'B')
plaintext ttttttttttttttt1
``````

Once we have the `prev_block` byte that gives valid padding, we can deduce `pre_xor` byte `?`.

Once we know the last `N` bytes of the `pre_xor`, we can use that information to craft the ending of the padding in just the right way to bruteforce the `N+1`th byte (starting from the right).

With all the `pre_xor` bytes recovered from a block, we can XOR them with out real `prev_block` to find the plaintext.

However, we must keep the following tricky thing in mind:

The last block (and other plaintext blocks, if we're unlucky) can produce two valid `pre_xor` bytes, instead of just one: E.g. block `...55555` We'll find these 2 valid `pre_xor` bytes:

• the one that produces `...55551`
• the one that produces `...55555`

When this happens, we can just alter the 2nd last byte to produce junk, so that only the padding ending with a `1` will pass: E.g. changing `5` to `4`

• the `pre_xor` that produces `...55541` passes
• the `pre_xor` that produces `...55545` does not
• There isn't much to say here... We generate a keystream with:

`aes(key, nonce + counter)`

Where `nonce` and `counter` are 64-bits integers encoded in little-endian and `counter` is the amount of blocks generated so far.

• We reuse our `english_test` function and pick the key that has the best score. In some cases, we pick a key that is not as good as some others because we notice (by analysis of the plaintexts obtained) that the key is not the right one.

As mentioned by the challenge description, this is a bit manual indeed.

• Very similar to the previous challenge, except that we can heavily reuse code that we've written before.

• This task surprisingly took a little longer than just copying Wikipedia's pseudocode. I spent some time wondering why Python's random would yield different numbers (and state!) from me, so I investigated.

I compared with C++'s std::mt19937, which yielded the same results as me. I couldn't find posts complaining about this specifically so I took a look at CPython's source. It turns out that the underlying C module seeds in a different way from what we see on Wikipedia. It assumes that the seed could be greater than 32-bits and just uses all of the bits of the seed in a different procedure. The result: it ends up with a different state from us.

Just to make sure I looked at how `numpy`'s random and it does generate the same state as us when seeding. Therefore, the tests for this one use values that I extracted with C++'s implementation.

• Pretty straightforward. Bruteforce all sensible seeds (a couple of seconds in the past) and pick the one that matches.

• The state is directly coming from the next output, so it is really easy to recover the next state from a given output.

Essentially, we need to look at `624` transformations of the state to fully recover the MT state:

``````state = [untemper(output) for output in outputs[-624:]]
``````

Our `untemper` needs to reverse the right and left shift operations.

Let's look at a case with 8-bits: `y = 10101010`

If we're shifting right by 3 bits, we'll do the following:

``````10101010 y
00010101 y >> 3
10111111 y ^ y >> 3
``````

We notice that the first 3 bits of the result will match the first 3 bits of the original `y`. Then, those original 3 bits will be used (once shifted) to xor with the original `y`. By using the known first 3 bits, we can recover the next 3 bits of `y` by doing this:

``````10111111 y ^ y >> 3
00010100 known_bits >> 3
10101011 (y ^ y >> 3) ^ (known_bits >> 3)
``````

We can then recover the whole `y` this way. The left shift is very similar, but we `&` with a constant before xoring.

• This challenge was very similar to challenge 22. Basically, we bruteforce the seed space to find one that gives an expected decryption/token.

## Set 4: Stream crypto and randomness

• Here we're doing a terrible mistake: we are reusing our keystream on a different plaintext. Solution? We provide `00000000...` and it gets directly xored with the keystream (yielding the keystream). We xor that with the original ciphertext and we get the secret back!

• This one is really straightforward. We know the plaintext, so we can find the keystream and use it to encrypt our token the way that we want.

• When we split the obtained ciphertext in 3 blocks:

``````C1, C2, C3
``````

We can then produce a ciphertext in the following way:

``````C1, 0, C1
``````

and capture the produced plaintext from the error message.

This means that `P1` is the result of `KEY ^ P3` (since `P3` is unchanged by xoring with `0`). We can recover the key through: `P1 ^ P3 = (P3 ^ KEY) ^ P3 = KEY`.

• Not much to do apart from adding a python sha1 implementation here...

In progress.