**Angeline Micaela Cantal & Sebastian Louis de Leon [200982 & 205636]**

# Homework 1

*Due date:* February 4, 2025 (Tuesday) at 8 PM

This homework is designed to get you started with implementing some cryptographic algorithms in Python.
This should be easy enough for you to do solo, but it doesn't hurt if you want to work with a partner.
These exercises are important because they serve as stepping stones to future attacks, and you will most likely need this code for the final contest.
If you can finish this homework on time, then you'll very likely do well in the final contest!

Much of classical crypto operates on alphabetic strings (strings containing only letters), while modern
crypto deals exclusively with binary strings.
Despite that crucial difference, we may see some familiar parallels between historical ciphers like 
Caesar or Vigenère and relatively modern ones like the XOR cipher and the one-time pad.

This homework has 32 points in total, but will be divided by 30 to get the final percentage. Final percentages are capped at 100%.

Please be guided by the policies on late submissions, regrading, and collaboration.
Please direct any questions or clarifications about this homework to the `#hw1-help` channel on the Discord server.

## Dealing with binary data in Python

Python supports *byte literals* (I just call them "bytestrings" for convenience), string-like sequences that are prefixed by `b`.

```python
b'This is a byte literal'
```

```
b'This is a byte literal'
```

```python
'Meanwhile this is a string literal'
```

```
'Meanwhile this is a string literal'
```

A bytestring is not the same as its string counterpart...

```python
'some_string' == b'some_string'
```

```
False
```

...because they have different types.

```python
print(type('some_string'))
print(type(b'some_string'))
```

```
<class 'str'>
<class 'bytes'>
```


The usual string operations work with bytestrings.

```python
test = b'this is a long bytestring'
print(test[3:10])   # slicing
print(test[4])      # the byte at index 4, i.e. the 5th byte (returned as an int)
```

```
b's is a '
32
```


```python
another = b' and anotha one!'
print(test + another)   # concatenating bytestrings
print(another * 10)     # n-fold concatenation
```

```
b'this is a long bytestring and anotha one!'
b' and anotha one! and anotha one! and anotha one! and anotha one! and anotha one! and anotha one! and anotha one! and anotha one! and anotha one! and anotha one!'
```
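One more detail that the XOR exercises below rely on: indexing or iterating over a bytestring yields plain ints (0 to 255), and `bytes()` packs an iterable of ints back into a bytestring. A small illustration (the variable names are just for this example):

```python
data = b'hi!'

# Iterating over a bytestring yields ints in the range 0-255
codes = list(data)
print(codes)              # [104, 105, 33]

# bytes() packs an iterable of ints back into a bytestring
packed = bytes(codes)
print(packed)             # b'hi!'

# Each element is an int, so ^ works on it directly; XOR with 0x20 flips the case of ASCII letters
flipped = bytes(c ^ 0x20 for c in data)
print(flipped)            # b'HI\x01'
```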


We can use the following libraries to convert between encodings (hex and Base64).

```python
from binascii import hexlify, unhexlify
from base64 import b64encode, b64decode
```

```python
hexlify(b'this is gonna be hex soon')
```

```
b'7468697320697320676f6e6e612062652068657820736f6f6e'
```

```python
unhexlify(b'7468697320697320686578206e6f206d6f7265')
```

```
b'this is hex no more'
```

```python
b64encode(b'Base64 this thing!')
```

```
b'QmFzZTY0IHRoaXMgdGhpbmch'
```

```python
b64decode(b'YmFzZTY0IGlzIG5vdCBlbmNyeXB0aW9uIQ==')
```

```
b'base64 is not encryption!'
```
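Hex and Base64 are reversible encodings, not encryption: decoding always returns the exact original bytes. The sketch below checks this on an arbitrary sample bytestring (chosen just for illustration) and compares output lengths: hex uses 2 characters per byte, while Base64 uses 4 characters per 3 bytes, padded with `=`.

```python
from binascii import hexlify, unhexlify
from base64 import b64encode, b64decode

raw = b'any raw bytes \x00\xff'   # arbitrary sample, including non-printable bytes

# Decoding undoes encoding exactly, so both round trips give back the original
assert unhexlify(hexlify(raw)) == raw
assert b64decode(b64encode(raw)) == raw

# 16 raw bytes -> 32 hex characters -> 24 Base64 characters (4 * ceil(16 / 3))
print(len(raw), len(hexlify(raw)), len(b64encode(raw)))   # 16 32 24
```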

## Some reminders

You are not allowed to use additional libraries (even within the Python standard library) other than those explicitly used here, though you may implement additional functions of your own.

**Very important:** Always work with raw bytes, never with encoded strings.

The following pages from the Python documentation may be helpful:
- https://docs.python.org/3/library/stdtypes.html#bytes-objects
- https://docs.python.org/3/library/binascii.html
- https://docs.python.org/3/library/base64.html

## 1-1. Hex to Base64 [2 pts]

Write a function called `hex_to_b64` to convert a hex-encoded bytestring into Base64. For example, the string
```
48656c7021204920676f7420584f52206579657320616e642069742773206b696c6c696e67206d65
```
should output
```
SGVscCEgSSBnb3QgWE9SIGV5ZXMgYW5kIGl0J3Mga2lsbGluZyBtZQ==
```

```python
from binascii import hexlify, unhexlify
from base64 import b64encode, b64decode

def hex_to_b64(h):
    # Decode the hex string into raw bytes
    raw = unhexlify(h)
    # Re-encode the raw bytes as Base64
    return b64encode(raw)

test = hex_to_b64(b'48656c7021204920676f7420584f52206579657320616e642069742773206b696c6c696e67206d65')
print(test)
```

```
b'SGVscCEgSSBnb3QgWE9SIGV5ZXMgYW5kIGl0J3Mga2lsbGluZyBtZQ=='
```


```python
# if your function works properly, no error should appear when running this line
assert hex_to_b64(b'48656c7021204920676f7420584f52206579657320616e642069742773206b696c6c696e67206d65') == b'SGVscCEgSSBnb3QgWE9SIGV5ZXMgYW5kIGl0J3Mga2lsbGluZyBtZQ=='
```

## 1-2. XOR'ing two bytestrings [4 pts]

Write a function called `xor_bytes` that takes two bytestrings of equal length and outputs their XOR. For example, if you hex-decode
```
49207374696c6c206861766520584f522065796573206e6f772077686174
```
and XOR the result against the hex-decoded
```
3a4f1e11060209001b04180100302a3e5045141c5345170a040016292955
```
the output, once hex-encoded, should be
```
736f6d656f6e652073656e642068656c70206d7920657965732061414821
```

```python
from binascii import hexlify, unhexlify

def xor_bytes(a, b):
    # XOR is only defined here for bytestrings of equal length
    if len(a) != len(b):
        raise ValueError('xor_bytes expects bytestrings of equal length')

    # XOR the bytestrings byte by byte and return the result as a bytes object
    return bytes(byte_a ^ byte_b for byte_a, byte_b in zip(a, b))

a = b'49207374696c6c206861766520584f522065796573206e6f772077686174'
b = b'3a4f1e11060209001b04180100302a3e5045141c5345170a040016292955'

test = xor_bytes(unhexlify(a), unhexlify(b))

print(test)
print(hexlify(test))
```

```
b'someone send help my eyes aAH!'
b'736f6d656f6e652073656e642068656c70206d7920657965732061414821'
```


```python
# if your function works properly, no error should appear when running this line
assert hexlify(xor_bytes(unhexlify(b'49207374696c6c206861766520584f522065796573206e6f772077686174'), unhexlify(b'3a4f1e11060209001b04180100302a3e5045141c5345170a040016292955'))) == b'736f6d656f6e652073656e642068656c70206d7920657965732061414821'
```
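A property worth keeping in mind for the rest of the homework: XOR is its own inverse, since $(p \oplus k) \oplus k = p$. XOR'ing the output above with the same key recovers the original plaintext, and XOR'ing the ciphertext with a known plaintext recovers the key. The sketch below checks both using the hex strings from this item; `xor_pair` is a hypothetical stand-in for `xor_bytes` so that the block runs on its own.

```python
from binascii import unhexlify

def xor_pair(a, b):
    # Hypothetical stand-in for xor_bytes: byte-wise XOR of equal-length bytestrings
    if len(a) != len(b):
        raise ValueError('inputs must have equal length')
    return bytes(x ^ y for x, y in zip(a, b))

plain = unhexlify(b'49207374696c6c206861766520584f522065796573206e6f772077686174')
key = unhexlify(b'3a4f1e11060209001b04180100302a3e5045141c5345170a040016292955')

cipher = xor_pair(plain, key)         # encrypt: p ^ k
recovered = xor_pair(cipher, key)     # same operation decrypts: (p ^ k) ^ k == p
leaked_key = xor_pair(cipher, plain)  # known plaintext reveals the key: (p ^ k) ^ p == k

print(recovered)                      # b'I still have XOR eyes now what'
print(leaked_key == key)              # True
```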

## 1-3. Single-byte XOR cipher [6 pts]

You could say that this is just the Caesar cipher, but over binary strings instead. Anyway, I have encrypted the hex-encoded bytestring
```
264f0300190a4f0d06084f1c0a02061f1d06020a1c4f0e010b4f264f0c0e0101001b4f03060a4e
```
by XOR'ing it against a single character. Find out what key I used, and decrypt the message.

You can do this by hand (since there are only 256 possible keys to choose from), but don't!
**Write code to do this for you.**
Have some way of "scoring" a piece of English plaintext, like using character frequency, so you can try each possible key and just output the one with the best score.

Write a function called `try_decrypt_xor` that does just that, i.e., it takes a bytestring and outputs the key and the decrypted message (and optionally the score). And yes, you may implement your own functions as well.

```python
# taken from http://pi.math.cornell.edu/~mec/2003-2004/cryptography/subs/frequencies.html
FREQ_TABLE = {'e': 12.02, 't': 9.10, 'a': 8.12, 'o': 7.68, 'i': 7.31, 'n': 6.95, 
              's': 6.28,  'r': 6.02, 'h': 5.92, 'd': 4.32, 'l': 3.98, 'u': 2.88, 
              'c': 2.71,  'm': 2.61, 'f': 2.30, 'y': 2.11, 'w': 2.09, 'g': 2.03, 
              'p': 1.82,  'b': 1.49, 'v': 1.11, 'k': 0.69, 'x': 0.17, 'q': 0.11, 
              'j': 0.10,  'z': 0.07, ' ': 19.18}
```

```python
from binascii import unhexlify

def score_text(text):
    # Score a bytestring by summing the English frequency of each character
    # (case-insensitive); characters not in FREQ_TABLE contribute 0
    return sum(FREQ_TABLE.get(chr(c).lower(), 0) for c in text)

def xor_with_key(data, key):
    # XOR every byte of the data with a single-byte key
    return bytes(byte ^ key for byte in data)

def try_decrypt_xor(c):
    # Try decrypting a hex-encoded bytestring XOR'd with a single-byte key
    ciphertext = unhexlify(c)  # convert the hex-encoded bytestring to raw bytes
    # Start below any possible score so that some key is always chosen
    best_score = float('-inf')
    best_key = None
    best_message = None

    for key in range(256):  # try all possible single-byte keys
        decrypted = xor_with_key(ciphertext, key)
        score = score_text(decrypted)
        if score > best_score:  # keep track of the best-scoring result
            best_score = score
            best_key = key
            best_message = decrypted

    return best_key, best_message, best_score

# Example usage
key, message, score = try_decrypt_xor(b"264f0300190a4f0d06084f1c0a02061f1d06020a1c4f0e010b4f264f0c0e0101001b4f03060a4e")

print(f"Key: {key} ({chr(key)})")
print(f"Message: {message}")
print(f"Score: {score}")
```

```
Key: 111 (o)
Message: b'I love big semiprimes and I cannot lie!'
Score: 332.99000000000007
```
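`score_text` gives 0 to any byte outside `FREQ_TABLE`, so a candidate full of control characters is never penalized. That works for this item, but when ranking many candidates (as in 1-4) a penalty may help. Below is a minimal sketch of one such variant; the name `score_text_strict`, the printable-ASCII rule, and the penalty of 20 are arbitrary assumptions, not part of the assignment.

```python
# Same letter/space frequencies as FREQ_TABLE above
FREQ_TABLE = {'e': 12.02, 't': 9.10, 'a': 8.12, 'o': 7.68, 'i': 7.31, 'n': 6.95,
              's': 6.28,  'r': 6.02, 'h': 5.92, 'd': 4.32, 'l': 3.98, 'u': 2.88,
              'c': 2.71,  'm': 2.61, 'f': 2.30, 'y': 2.11, 'w': 2.09, 'g': 2.03,
              'p': 1.82,  'b': 1.49, 'v': 1.11, 'k': 0.69, 'x': 0.17, 'q': 0.11,
              'j': 0.10,  'z': 0.07, ' ': 19.18}

def score_text_strict(text, penalty=20.0):
    # Hypothetical variant of score_text: printable ASCII (plus tab, newline, CR)
    # scores by frequency as before; every other byte subtracts a fixed penalty
    score = 0.0
    for c in text:
        if 32 <= c <= 126 or c in (9, 10, 13):
            score += FREQ_TABLE.get(chr(c).lower(), 0)
        else:
            score -= penalty
    return score

print(score_text_strict(b'hello world') > score_text_strict(b'\x01\x02\x03\x04'))  # True
```

The penalty size is a tuning knob: large enough that a few stray control bytes outweigh a handful of common letters.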


## 1-4. Detect single-character XOR [6 pts]

I have a [file](http://lunchtimeattack.wtf/csci184.03/xor_strings.txt) containing 420 hex-encoded strings, each 64 characters long, and **only one** of them has been encrypted by single-character XOR. 

Find that string.

*Hint:* Your code from the previous item should help.

```python
# Read every line, stripping the trailing newline so each one can be hex-decoded
with open('xor_strings.txt', 'rb') as fin:
    strings = [line.strip() for line in fin]
```
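One way to approach this item, sketched under a few assumptions: each line is scored by its best single-byte decryption (the same frequency sum as `score_text` in 1-3), and the line with the highest score overall is taken to be the encrypted one. `best_single_byte_key` and `detect_single_xor` are hypothetical helper names; they work on raw bytes and hex lines respectively, so they do not depend on the exact signature of `try_decrypt_xor`. The demo input is made up: the 1-3 ciphertext placed between two non-English decoys.

```python
from binascii import hexlify, unhexlify

# Same letter/space frequencies as FREQ_TABLE in 1-3
FREQ_TABLE = {'e': 12.02, 't': 9.10, 'a': 8.12, 'o': 7.68, 'i': 7.31, 'n': 6.95,
              's': 6.28,  'r': 6.02, 'h': 5.92, 'd': 4.32, 'l': 3.98, 'u': 2.88,
              'c': 2.71,  'm': 2.61, 'f': 2.30, 'y': 2.11, 'w': 2.09, 'g': 2.03,
              'p': 1.82,  'b': 1.49, 'v': 1.11, 'k': 0.69, 'x': 0.17, 'q': 0.11,
              'j': 0.10,  'z': 0.07, ' ': 19.18}

def score_text(text):
    # Case-insensitive sum of English frequencies; unknown bytes add 0
    return sum(FREQ_TABLE.get(chr(c).lower(), 0) for c in text)

def best_single_byte_key(ciphertext):
    # Hypothetical helper: try all 256 keys on raw bytes, keep the best-scoring one
    best = (float('-inf'), None, None)
    for key in range(256):
        candidate = bytes(b ^ key for b in ciphertext)
        score = score_text(candidate)
        if score > best[0]:
            best = (score, key, candidate)
    return best

def detect_single_xor(hex_lines):
    # Hypothetical helper: return (score, line index, key, plaintext) for the
    # line whose best single-byte decryption scores highest overall
    best = None
    for index, line in enumerate(hex_lines):
        score, key, message = best_single_byte_key(unhexlify(line.strip()))
        if best is None or score > best[0]:
            best = (score, index, key, message)
    return best

lines = [
    hexlify(bytes(range(39))),
    b'264f0300190a4f0d06084f1c0a02061f1d06020a1c4f0e010b4f264f0c0e0101001b4f03060a4e',
    hexlify(bytes(range(128, 167))),
]
score, index, key, message = detect_single_xor(lines)
print(index, key, message)   # 1 111 b'I love big semiprimes and I cannot lie!'
```

On the real data, `detect_single_xor(strings)` would run the same search over all 420 lines.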

## 1-5. Implement repeating-key XOR [4 pts]

Here are the first two lines of what people consider one of [Frank's greatest hits](https://youtu.be/ntULIFnj7MY):
```
Shawty had them apple bottom jeans,
The boots with the fur
```
Encrypt it using repeating-key XOR, with the key "`LOW`".

The way repeating-key XOR works is that the first byte of plaintext will be XOR'd against `L`, the next `O`, the next `W`, then `L` again for the 4th byte, and so on. In effect, it will look something like this:
```
Plaintext:   S h a w t y   h a d   t ...
Key:         L O W L O W L O W L O W ...
```
If this reminds you of the Vigenère cipher, you're absolutely right! This *is* just Vigenère, but we're working over binary strings instead.

Write a function called `xor_repeating` that takes two bytestrings, the plaintext and the key, and encrypts the plaintext using repeating-key XOR. Using the plaintext and key above, your function should output:
```
1f27363b3b2e6c2736286f23242a3a6c2e273c23326c2d38383b38216f
3d292e393f635d1827326c2d38233b246c383e3827773827326c29223e
```
when hex-encoded. (Note that this is a single hex string; it's only split into two lines to make it fit.)

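```python
def xor_repeating(m, key):
    # XOR byte i of the message with byte (i mod key length) of the key
    return bytes(byte ^ key[i % len(key)] for i, byte in enumerate(m))
```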

```python
# if your function works properly, no error should appear when running this line
assert hexlify(xor_repeating(b'Shawty had them apple bottom jeans,\nThe boots with the fur', b'LOW')) == b'1f27363b3b2e6c2736286f23242a3a6c2e273c23326c2d38383b38216f3d292e393f635d1827326c2d38233b246c383e3827773827326c29223e'
```

## 1-6. Break repeating-key XOR [10 pts]

I have [another file here](http://lunchtimeattack.wtf/csci184.03/encrypted.txt).
It has been Base64-encoded after being encrypted with repeating-key XOR.

Decrypt it.

Here's how:

1. Let $n$ be the guessed key length. Try values from $n = 2$ up to $n = 40$.

2. Write a function called `hamming_dist` that computes the [Hamming distance](https://en.wikipedia.org/wiki/Hamming_distance) between two bytestrings of equal length.
The Hamming distance is just the number of differing bits.
For example, the Hamming distance between `twelve plus one` and `ElEveN pLUs twO` is $30$. **Please make sure your function works correctly first before proceeding.**

3. For each $n$, take the first $n$ bytes and the second $n$ bytes, and find the Hamming distance between them. 
Divide the result by $n$ to normalize it.

4. The $n$ with the smallest normalized Hamming distance is *probably* the key length. But just to make sure, you could proceed with the smallest 2 or 3 values of $n$, or you could take 4 blocks of $n$ instead of 2 and average their distances.

5. Now that you probably know $n$, split the ciphertext into blocks of length $n$.

6. Now transpose the blocks: make a block that is the first byte of every block, and a block that is the second byte of every block, and so on. For example, suppose we have the ciphertext split into blocks of length 5:
```
QWERT YUIOP ASDFG HJKLZ
```
then transposing these blocks will yield five blocks of length 4:
```
QYAH WUSJ EIDK ROFL TPGZ
```

7. Solve each block as if it were single-character XOR. You already have code for this, so go use it.

8. For each block, the single-byte XOR key that has the best score is the repeating-key XOR key byte for that block. All you need to do is to put them together and you have the key!
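Steps 1 to 4 can be sketched as follows. This assumes the "average over several blocks" option from step 4, comparing consecutive pairs among the first `num_blocks` blocks; `hamming`, `rank_key_sizes`, and the default of 4 blocks are hypothetical choices, not the required implementation.

```python
def hamming(a, b):
    # Number of differing bits between two equal-length bytestrings
    return sum(bin(x ^ y).count('1') for x, y in zip(a, b))

def rank_key_sizes(ciphertext, min_n=2, max_n=40, num_blocks=4):
    # Hypothetical helper: for each guessed key length n, average the
    # normalized Hamming distance of consecutive n-byte blocks, then sort
    results = []
    for n in range(min_n, max_n + 1):
        if len(ciphertext) < num_blocks * n:
            break  # not enough data for num_blocks full blocks of this size
        blocks = [ciphertext[i * n:(i + 1) * n] for i in range(num_blocks)]
        pairs = list(zip(blocks, blocks[1:]))
        avg = sum(hamming(a, b) / n for a, b in pairs) / len(pairs)
        results.append((avg, n))
    # Smallest normalized distance first; ties go to the smaller n
    return sorted(results)
```

`rank_key_sizes(ciphertext)[:3]` would give the three most promising key lengths. Multiples of the true key length also tend to score low, since their blocks line up with the same key bytes, which is one reason to keep a few candidates rather than only the top one.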

As before, you may implement additional functions of your own.
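Steps 5 and 6 (splitting into blocks of length $n$, then transposing) collapse into a single slice per output block: transposed block $j$ is every $n$-th byte starting at offset $j$. A minimal sketch, where `transpose` is a hypothetical helper name, checked against the example from step 6:

```python
def transpose(data, n):
    # Hypothetical helper: block j collects byte j of every n-byte block,
    # i.e. every n-th byte of the data starting at offset j
    return [data[j::n] for j in range(n)]

blocks = transpose(b'QWERTYUIOPASDFGHJKLZ', 5)
print(b' '.join(blocks))   # b'QYAH WUSJ EIDK ROFL TPGZ'
```

Slicing also copes with a ciphertext whose length is not a multiple of $n$: the trailing partial block simply makes some transposed blocks one byte shorter.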

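```python
def hamming_dist(a, b):
    # Count the bits that differ: popcount of the XOR of each byte pair
    return sum(bin(x ^ y).count('1') for x, y in zip(a, b))
```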

```python
# if your function works properly, no error should appear when running this line
assert hamming_dist(b'twelve plus one', b'ElEveN pLUs twO') == 30
```

```python
# Read the whole Base64-encoded ciphertext file
with open('encrypted.txt', 'rb') as fin:
    b64str = fin.read()
```
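Putting steps 5 to 8 together: the sketch below assumes the key length `n` has already been chosen (for example, from the Hamming-distance ranking), solves each transposed block as single-byte XOR using the same frequency score as in 1-3, and reassembles the key. `solve_single_byte` and `break_repeating_xor` are hypothetical helper names, and they take raw bytes rather than Base64.

```python
# Same letter/space frequencies as FREQ_TABLE in 1-3
FREQ_TABLE = {'e': 12.02, 't': 9.10, 'a': 8.12, 'o': 7.68, 'i': 7.31, 'n': 6.95,
              's': 6.28,  'r': 6.02, 'h': 5.92, 'd': 4.32, 'l': 3.98, 'u': 2.88,
              'c': 2.71,  'm': 2.61, 'f': 2.30, 'y': 2.11, 'w': 2.09, 'g': 2.03,
              'p': 1.82,  'b': 1.49, 'v': 1.11, 'k': 0.69, 'x': 0.17, 'q': 0.11,
              'j': 0.10,  'z': 0.07, ' ': 19.18}

def score_text(text):
    # Case-insensitive sum of English frequencies; unknown bytes add 0
    return sum(FREQ_TABLE.get(chr(c).lower(), 0) for c in text)

def solve_single_byte(column):
    # Key byte whose decryption of this column scores highest (first one wins ties)
    return max(range(256), key=lambda k: score_text(bytes(b ^ k for b in column)))

def break_repeating_xor(ciphertext, n):
    # Hypothetical helper: transpose into n blocks (every n-th byte from offset j),
    # solve each block as single-byte XOR, then decrypt with the assembled key
    columns = [ciphertext[j::n] for j in range(n)]
    key = bytes(solve_single_byte(col) for col in columns)
    plaintext = bytes(c ^ key[i % n] for i, c in enumerate(ciphertext))
    return key, plaintext
```

With the real file this would look like `key, plaintext = break_repeating_xor(b64decode(b64str), n)`. If the plaintext comes out garbled, the next-best key length from the ranking is the natural thing to try.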