**Angeline Micaela Cantal & Sebastian Louis de Leon [200982 & 205636]**

# Homework 1

*Due date:* February 4, 2025 (Tuesday) at 8 PM

This homework is designed to get you started in implementing some cryptographic algorithms in Python.
This should be easy enough for you to do solo, but it doesn't hurt if you want to work with a partner.
These exercises are important in the sense that they serve as stepping stones to future attacks, and most likely you'll need your code for these for the final contest.
If you can finish this homework on time, then you'll very likely do well in the final contest!

Much of classical crypto operates on alphabetic strings (strings only containing letters) while modern
crypto deals exclusively with binary strings.
Despite that crucial difference, we may see some familiar parallels between historical ciphers like 
Caesar or Vigenère and relatively modern ones like the XOR cipher and the one-time pad.

This homework has 32 points in total, but will be divided by 30 to get the final percentage. Final percentages are capped at 100%.

Please be guided by the policies on late submissions, regrading, and collaboration.
Please direct any questions or requests for clarification about this homework to the `#hw1-help` channel on the Discord server.

## Dealing with binary data in Python

Python supports *byte literals* (I just call them "bytestrings" for convenience), string-like sequences that are prefixed by `b`.

In [1]:
b'This is a byte literal'

b'This is a byte literal'

In [2]:
'Meanwhile this is a string literal'

'Meanwhile this is a string literal'

A bytestring is not the same as its string counterpart...

In [3]:
'some_string' == b'some_string'

False

...because they have different types.

In [4]:
print(type('some_string'))
print(type(b'some_string'))

<class 'str'>
<class 'bytes'>
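
Converting between the two types is always explicit. As a small illustration (the variable names are only for this example), `str.encode` produces a bytestring and `bytes.decode` goes back to a string:

```python
text = 'some_string'
raw = text.encode('ascii')            # str -> bytes

print(raw == b'some_string')          # the encoded string equals the byte literal
print(raw.decode('ascii') == text)    # bytes -> str round-trips
```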


The usual string operations work with bytestrings.

In [5]:
test = b'this is a long bytestring'
print(test[3:10])   # slicing
print(test[4])      # get the byte at index 4 (returns an int)

b's is a '
32
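
Since indexing returns an integer, it helps to know how to move between integers and bytestrings. A minimal sketch, using a made-up value `data`:

```python
data = b'hi!'

print(list(data))              # iterating over bytes yields ints: [104, 105, 33]
print(data[0])                 # indexing yields a single int: 104
print(data[0:1])               # a length-1 slice stays a bytes object: b'h'
print(bytes([104, 105, 33]))   # bytes() turns a list of ints back into b'hi!'
```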


In [6]:
another = b' and anotha one!'
print(test + another)   # concatenating bytestrings
print(another * 10)     # n-fold concatenation

b'this is a long bytestring and anotha one!'
b' and anotha one! and anotha one! and anotha one! and anotha one! and anotha one! and anotha one! and anotha one! and anotha one! and anotha one! and anotha one!'


We can use the following libraries to convert between encodings (hex and Base64).

In [7]:
from binascii import hexlify, unhexlify

In [8]:
from base64 import b64encode, b64decode

In [9]:
hexlify(b'this is gonna be hex soon')

b'7468697320697320676f6e6e612062652068657820736f6f6e'

In [10]:
unhexlify(b'7468697320697320686578206e6f206d6f7265')

b'this is hex no more'

In [11]:
b64encode(b'Base64 this thing!')

b'QmFzZTY0IHRoaXMgdGhpbmch'

In [12]:
b64decode(b'YmFzZTY0IGlzIG5vdCBlbmNyeXB0aW9uIQ==')

b'base64 is not encryption!'

## Some reminders

You are not allowed to use additional libraries (even within the Python standard library) other than those explicitly used here, though you may implement additional functions of your own.

**Very important:** Always work with raw bytes, never with encoded strings.
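
To see why this matters, here is a minimal sketch with made-up values: XOR'ing the ASCII characters of two hex strings is not the same as XOR'ing the bytes they encode.

```python
from binascii import hexlify, unhexlify

hex_a = b'ff'
hex_b = b'0f'

# Wrong: this operates on the ASCII codes of the hex digits themselves
wrong = bytes(x ^ y for x, y in zip(hex_a, hex_b))

# Right: decode to raw bytes first, operate, and encode only for display
right = bytes(x ^ y for x, y in zip(unhexlify(hex_a), unhexlify(hex_b)))

print(wrong)            # b'V\x00', which is not a meaningful result
print(hexlify(right))   # b'f0'
```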

The following pages from the Python documentation may be helpful:
- https://docs.python.org/3/library/stdtypes.html#bytes-objects
- https://docs.python.org/3/library/binascii.html
- https://docs.python.org/3/library/base64.html

## 1-1. Hex to Base64 [2 pts]

Write a function called `hex_to_b64` to convert a hex-encoded bytestring into Base64. For example, the string
```
48656c7021204920676f7420584f52206579657320616e642069742773206b696c6c696e67206d65
```
should output
```
SGVscCEgSSBnb3QgWE9SIGV5ZXMgYW5kIGl0J3Mga2lsbGluZyBtZQ==
```

In [13]:
from binascii import hexlify, unhexlify
from base64 import b64encode, b64decode

def hex_to_b64(h):
    #Decode hexstring to bytestring
    hex_bytes = unhexlify(h)

    b64_bytes = b64encode(hex_bytes)
    return b64_bytes

test = hex_to_b64(b'48656c7021204920676f7420584f52206579657320616e642069742773206b696c6c696e67206d65')
print(test)

b'SGVscCEgSSBnb3QgWE9SIGV5ZXMgYW5kIGl0J3Mga2lsbGluZyBtZQ=='


In [69]:
# if your function works properly, no error should appear when running this line
assert hex_to_b64(b'48656c7021204920676f7420584f52206579657320616e642069742773206b696c6c696e67206d65') == b'SGVscCEgSSBnb3QgWE9SIGV5ZXMgYW5kIGl0J3Mga2lsbGluZyBtZQ=='

## 1-2. XOR'ing two bytestrings [4 pts]

Write a function called `xor_bytes` that takes two bytestrings of equal length and outputs their XOR. For example, hex-decoding the bytestring
```
49207374696c6c206861766520584f522065796573206e6f772077686174
```
and XOR'ing it against the hex-decoded form of
```
3a4f1e11060209001b04180100302a3e5045141c5345170a040016292955
```
should output
```
736f6d656f6e652073656e642068656c70206d7920657965732061414821
```
when hex-encoded.

In [14]:
from binascii import hexlify, unhexlify

def xor_bytes(a, b):
    # Refuse inputs of different lengths rather than silently returning a wrong result
    if len(a) != len(b):
        raise ValueError('bytestrings must have equal length')

    # XOR corresponding bytes and return the result as a bytes object
    return bytes(byte_a ^ byte_b for byte_a, byte_b in zip(a, b))

a = b'49207374696c6c206861766520584f522065796573206e6f772077686174'
b = b'3a4f1e11060209001b04180100302a3e5045141c5345170a040016292955'

test = xor_bytes(unhexlify(a), unhexlify(b))

print(unhexlify(a))
print(unhexlify(b))

print(test)

print(hexlify(test))

b'I still have XOR eyes now what'
b':O\x1e\x11\x06\x02\t\x00\x1b\x04\x18\x01\x000*>PE\x14\x1cSE\x17\n\x04\x00\x16))U'
b'someone send help my eyes aAH!'
b'736f6d656f6e652073656e642068656c70206d7920657965732061414821'


In [15]:
# if your function works properly, no error should appear when running this line
assert hexlify(xor_bytes(unhexlify(b'49207374696c6c206861766520584f522065796573206e6f772077686174'), unhexlify(b'3a4f1e11060209001b04180100302a3e5045141c5345170a040016292955'))) == b'736f6d656f6e652073656e642068656c70206d7920657965732061414821'

## 1-3. Single-byte XOR cipher [6 pts]

You could say that this is just the Caesar cipher but it works over binary strings instead. Anyways, I have encrypted the hex-encoded bytestring
```
264f0300190a4f0d06084f1c0a02061f1d06020a1c4f0e010b4f264f0c0e0101001b4f03060a4e
```
by XOR'ing it against a single character. Find out what key I used, and decrypt the message.

You can do this by hand (since there are only 256 possible keys to choose from), but don't!
**Write code to do this for you.**
Have some way of "scoring" a piece of English plaintext, like using character frequency, so you can try each possible key and just output the one with the best score.

Write a function called `try_decrypt_xor` that does just that, i.e., it takes a bytestring and outputs the key and the decrypted message (and optionally the score). And yes, you may implement your own functions as well.
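
One possible way to score candidates, sketched here under simplifying assumptions, is to measure what fraction of the bytes are among the most common English characters. The helper name `simple_score` and the character set `COMMON` are hypothetical choices for this illustration, not a prescribed method; a full frequency table gives finer discrimination.

```python
# Hypothetical character set: letters (and the space) that dominate English text
COMMON = b'etaoinshrdlu ETAOINSHRDLU'

def simple_score(candidate):
    # Fraction of bytes in the candidate that are common English characters
    if not candidate:
        return 0.0
    return sum(byte in COMMON for byte in candidate) / len(candidate)

english = b'this is plain english text'
noise = bytes(range(200, 226))   # arbitrary high bytes, unlikely in English

print(simple_score(english) > simple_score(noise))   # True
```

Normalizing by length keeps scores comparable across candidates of different sizes.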

In [16]:
# taken from http://pi.math.cornell.edu/~mec/2003-2004/cryptography/subs/frequencies.html
FREQ_TABLE = {'e': 12.02, 't': 9.10, 'a': 8.12, 'o': 7.68, 'i': 7.31, 'n': 6.95, 
              's': 6.28,  'r': 6.02, 'h': 5.92, 'd': 4.32, 'l': 3.98, 'u': 2.88, 
              'c': 2.71,  'm': 2.61, 'f': 2.30, 'y': 2.11, 'w': 2.09, 'g': 2.03, 
              'p': 1.82,  'b': 1.49, 'v': 1.11, 'k': 0.69, 'x': 0.17, 'q': 0.11, 
              'j': 0.10,  'z': 0.07, ' ': 19.18}

In [17]:
from binascii import unhexlify

def score_text(text):
    # Score a candidate plaintext by summing English letter frequencies
    return sum(FREQ_TABLE.get(chr(c).lower(), 0) for c in text)

def xor_with_key(data, key):
    # XOR every byte of data against a single-byte key (an int from 0 to 255).
    # Named differently from xor_bytes in 1-2 so that function is not overwritten.
    return bytes(byte ^ key for byte in data)

def try_decrypt_xor(c):
    # Try decrypting a hex-encoded bytestring XOR'd with a single-character key.
    ciphertext = unhexlify(c.strip())  # Convert hex to raw bytes
    best_score = 0
    best_key = None
    best_message = None

    for key in range(256):  # Try all possible single-byte keys
        decrypted = xor_with_key(ciphertext, key)
        score = score_text(decrypted)
        if score > best_score:
            best_score = score
            best_key = key
            best_message = decrypted

    return best_key, best_message, best_score

# Test
key, message, score = try_decrypt_xor(b"264f0300190a4f0d06084f1c0a02061f1d06020a1c4f0e010b4f264f0c0e0101001b4f03060a4e")

print("Key: " + str(key) + " (" + chr(key) + ")")
print("Message: " + str(message))
print("Score: " + str(score))


Key: 111 (o)
Message: b'I love big semiprimes and I cannot lie!'
Score: 332.99000000000007


## 1-4. Detect single-character XOR [6 pts]

I have a [file](http://lunchtimeattack.wtf/csci184.03/xor_strings.txt) that contains 420 64-character hex-encoded strings, and **only one** of them has been encrypted by single-character XOR. 

Find that string.

*Hint:* Your code from the previous item should help.

In [18]:
fin = open('xor_strings.txt', 'rb')

In [19]:
strings = fin.readlines()

In [20]:
from binascii import unhexlify

def score_text(text):
    # Score a string based on the frequency of English letters.
    return sum(FREQ_TABLE.get(chr(c).lower(), 0) for c in text)

def xor_with_key(data, key):
    #XOR the byte string with a single byte key.
    key_bytes = bytes([key]) * len(data)  # Repeat the key for the length of data
    return bytes(byte_a ^ byte_b for byte_a, byte_b in zip(data, key_bytes))

def try_decrypt_xor(c):
    ciphertext = unhexlify(c.strip())  # Convert from hex to raw bytes
    best_score = 0
    best_key = None
    best_message = None

    for key in range(256):  # Try all 256 possible single-byte keys
        decrypted = xor_with_key(ciphertext, key)
        score = score_text(decrypted)  # scoring cannot fail, so no try/except is needed
        if score > best_score:
            best_score = score
            best_key = key
            best_message = decrypted

    return best_key, best_message, best_score

def find_encrypted_string(strings):
    #Processes multiple hex strings and finds the one encrypted with single-character XOR.
    best_overall_score = 0
    best_string = None
    best_key = None
    best_decrypted_message = None

    for c in strings:
        key, message, score = try_decrypt_xor(c)
        if score > best_overall_score:  # Track the best result across all strings
            best_overall_score = score
            best_string = c
            best_key = key
            best_decrypted_message = message

    return best_string, best_key, best_decrypted_message, best_overall_score

# Example usage (assuming `strings` is already defined in a previous cell)
best_string, best_key, best_message, best_score = find_encrypted_string(strings)

# Test
if best_string:
    print("Encrypted string: " + best_string.strip().decode())
    print("Key: " + str(best_key) + " (" + chr(best_key) + ")")
    print("Decrypted message: " + str(best_message))
    print("Score: " + str(best_score))
else:
    print("No valid encrypted string found.")


Encrypted string: 761f59504a515b1f4b575a1f575e461f56511f5e1f515a5a5b535a4c4b5e5c54
Key: 63 (?)
Decrypted message: b'I found the hay in a needlestack'
Score: 290.3


## 1-5. Implement repeating-key XOR [4 pts]

Here are the first two lines of what people consider one of [Frank's greatest hits](https://youtu.be/ntULIFnj7MY):
```
Shawty had them apple bottom jeans,
The boots with the fur
```
Encrypt it using repeating-key XOR, with the key "`LOW`".

The way repeating-key XOR works is that the first byte of plaintext will be XOR'd against `L`, the next `O`, the next `W`, then `L` again for the 4th byte, and so on. In effect, it will look something like this:
```
Plaintext:   S h a w t y   h a d   t ...
Key:         L O W L O W L O W L O W ...
```
If this reminds you of the Vigenère cipher, you're absolutely right! This *is* just Vigenère, but we're working over binary strings instead.

Write a function called `xor_repeating` that takes two bytestrings corresponding to the plaintext and the key, and encrypts it using repeating-key XOR. Using the plaintext and key above, your function should output:
```
1f27363b3b2e6c2736286f23242a3a6c2e273c23326c2d38383b38216f
3d292e393f635d1827326c2d38233b246c383e3827773827326c29223e
```
when hex-encoded. (Note that this is a single hex string; it's only split into two lines to make it fit.)

In [21]:
from binascii import hexlify

# Encrypt message m with the repeating key
def xor_repeating(m, key):
    key_length = len(key)  # e.g. 3 for the key b'LOW'
    # XOR each byte of m against the key byte at the same position, cycling through the key
    encode = bytes([b ^ key[i % key_length] for i, b in enumerate(m)])
    return encode


# Test 
test = hexlify(xor_repeating(b'Shawty had them apple bottom jeans,\nThe boots with the fur', b'LOW'))
print(test)

b'1f27363b3b2e6c2736286f23242a3a6c2e273c23326c2d38383b38216f3d292e393f635d1827326c2d38233b246c383e3827773827326c29223e'


In [22]:
# if your function works properly, no error should appear when running this line
assert hexlify(xor_repeating(b'Shawty had them apple bottom jeans,\nThe boots with the fur', b'LOW')) == b'1f27363b3b2e6c2736286f23242a3a6c2e273c23326c2d38383b38216f3d292e393f635d1827326c2d38233b246c383e3827773827326c29223e'
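
Because XOR is its own inverse, applying repeating-key XOR a second time with the same key recovers the plaintext, so no separate decryption routine is needed. A minimal self-contained sketch (the name `repeating_xor` is used only so that this block runs on its own):

```python
def repeating_xor(data, key):
    # XOR each byte against the key byte at the same position, cycling the key
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

plaintext = b'Shawty had them apple bottom jeans,'
key = b'LOW'

ciphertext = repeating_xor(plaintext, key)
recovered = repeating_xor(ciphertext, key)   # encrypting again decrypts

print(recovered == plaintext)   # True
```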

## 1-6. Break repeating-key XOR [10 pts]

I have [another file here](http://lunchtimeattack.wtf/csci184.03/encrypted.txt).
It has been Base64-encoded after being encrypted with repeating-key XOR.

Decrypt it.

Here's how:

1. Let $n$ be the guessed key length. Try values from $n = 2$ up to $n = 40$.

2. Write a function called `hamming_dist` that computes the [Hamming distance](https://en.wikipedia.org/wiki/Hamming_distance) between two bytestrings of equal length.
The Hamming distance is just the number of differing bits.
For example, the Hamming distance between `twelve plus one` and `ElEveN pLUs twO` is $30$. **Please make sure your function works correctly first before proceeding.**

3. For each $n$, take the first $n$ bytes and the second $n$ bytes, and find the Hamming distance between them. 
Divide the result by $n$ to normalize it.

4. The $n$ with the smallest normalized Hamming distance is *probably* the key length. But just to make sure, you could proceed with the smallest 2 or 3 values of $n$, or you could take 4 blocks of $n$ instead of 2 and average their distances.

5. Now that you probably know $n$, split the ciphertext into blocks of length $n$.

6. Now transpose the blocks: make a block that is the first byte of every block, and a block that is the second byte of every block, and so on. For example, suppose we have the ciphertext split into blocks of length 5:
```
QWERT YUIOP ASDFG HJKLZ
```
then transposing these blocks will yield five blocks of length 4:
```
QYAH WUSJ EIDK ROFL TPGZ
```

7. Solve each block as if it was single-character XOR. You have code to do this, so go use that.

8. For each block, the single-byte XOR key that has the best score is the repeating-key XOR key byte for that block. All you need to do is to put them together and you have the key!

As before, you may implement additional functions of your own.
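
The transposition in step 6 can be sketched with extended slicing. This is only one way to do it, and `transpose` is a hypothetical name for this illustration; it reproduces the `QWERT` example above.

```python
def transpose(data, n):
    # Block i collects bytes i, i + n, i + 2n, ... of the input
    return [data[i::n] for i in range(n)]

blocks = transpose(b'QWERTYUIOPASDFGHJKLZ', 5)
print(blocks)   # [b'QYAH', b'WUSJ', b'EIDK', b'ROFL', b'TPGZ']
```

Slicing also copes with a ciphertext whose length is not a multiple of $n$: the later blocks simply end up one byte shorter.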

In [24]:
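def hamming_dist(a, b):
    # The Hamming distance is only defined for inputs of equal length
    if len(a) != len(b):
        raise ValueError('bytestrings must have equal length')
    # XOR leaves a 1 bit wherever the inputs differ, so count the 1 bits
    return sum(bin(byte1 ^ byte2).count('1') for byte1, byte2 in zip(a, b))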


#test hamming distance
print(hamming_dist(b'twelve plus one', b'ElEveN pLUs twO'))

30


In [25]:
# if your function works properly, no error should appear when running this line
assert hamming_dist(b'twelve plus one', b'ElEveN pLUs twO') == 30
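
Step 4 suggests averaging over more than two blocks. A rough sketch of that idea follows, assuming a local bit-counting helper and a made-up input for the demo. The names `bit_distance` and `avg_normalized_distance` are hypothetical, and comparing only consecutive blocks is one choice among several.

```python
def bit_distance(a, b):
    # Number of differing bits between two equal-length bytestrings
    return sum(bin(x ^ y).count('1') for x, y in zip(a, b))

def avg_normalized_distance(data, n, num_blocks=4):
    # Average per-byte distance between consecutive n-byte blocks
    blocks = [data[i * n:(i + 1) * n] for i in range(num_blocks)]
    blocks = [blk for blk in blocks if len(blk) == n]   # drop incomplete blocks
    if len(blocks) < 2:
        raise ValueError('not enough data for this key length')
    pairs = list(zip(blocks, blocks[1:]))
    return sum(bit_distance(a, b) for a, b in pairs) / (len(pairs) * n)

sample = b'\x00\x00\xff\xff\x00\x00\xff\xff'   # made-up data that repeats every 4 bytes
print(avg_normalized_distance(sample, 2))   # 8.0: each 2-byte block differs from the next in every bit
print(avg_normalized_distance(sample, 4))   # 0.0: blocks one period apart are identical
```

The toy input shows the intuition: blocks that line up with the true period look alike, so their normalized distance is small.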

In [26]:
fin = open('encrypted.txt', 'rb')

In [27]:
b64str = fin.read()

In [28]:
from base64 import b64decode

def find_key_size(ciphertext, min_size=2, max_size=40):
    #Get the most likely key size by computing normalized Hamming distances. 
    key_sizes = []

    for key_size in range(min_size, max_size + 1):
        block1 = ciphertext[:key_size]
        block2 = ciphertext[key_size:key_size * 2]

        if len(block1) < key_size or len(block2) < key_size:
            continue  # Skip if blocks are incomplete

        distance = hamming_dist(block1, block2)
        normalized_distance = distance / key_size  # Normalize
        key_sizes.append((key_size, normalized_distance))

    # Sort by smallest normalized distance (most likely key sizes first)
    key_sizes.sort(key=lambda x: x[1])
    
    return [x[0] for x in key_sizes[:3]]  # Return the top 3 key sizes

def transpose_blocks(ciphertext, key_size):
    blocks = [b'' for _ in range(key_size)]
    
    for i in range(len(ciphertext)):
        block_index = i % key_size
        blocks[block_index] += bytes([ciphertext[i]])

    return blocks


def xor_with_key(data, key):
    # XOR the bytestring against a repeating key given as bytes (a one-byte key gives single-byte XOR).
    key_length = len(key)
    return bytes([data[i] ^ key[i % key_length] for i in range(len(data))])

def try_decrypt_xor(data):
    best_score = 0
    best_key = None

    for key in range(256):  # Try all 256 possible single-byte keys
        decrypted = xor_with_key(data, bytes([key]))
        # Scoring cannot fail, so no try/except is needed
        score = sum(FREQ_TABLE.get(chr(c).lower(), 0) for c in decrypted)
        if score > best_score:
            best_score = score
            best_key = key

    return best_key

def break_repeating_key_xor(ciphertext):
    # Break repeating-key XOR encryption.
    ciphertext = b64decode(ciphertext)  # Decode Base64 into raw bytes

    # Step 1: Find the best key sizes
    likely_key_sizes = find_key_size(ciphertext)

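    best = None  # (score, key_size, key, decrypted_text) for the best candidate so far

    for key_size in likely_key_sizes:
        # Step 2: Transpose ciphertext into blocks
        blocks = transpose_blocks(ciphertext, key_size)

        # Step 3: Solve each block as a single-character XOR
        key = bytes(try_decrypt_xor(block) for block in blocks)

        # Step 4: Decrypt the full message
        decrypted_text = xor_with_key(ciphertext, key)

        print("Possible Key Size: " + str(key_size))
        print("Possible Keys: " + str(key))
        print("Decrypt: " + str(decrypted_text))

        # Remember the candidate whose plaintext scores most like English
        score = sum(FREQ_TABLE.get(chr(c).lower(), 0) for c in decrypted_text)
        if best is None or score > best[0]:
            best = (score, key_size, key, decrypted_text)

    # Return the best-scoring candidate rather than whichever was tried last
    return best[1], best[2], best[3]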

#test
key_size, key, decrypted_message = break_repeating_key_xor(b64str)


Possible Key Size: 2
Possible Keys: b'ne'
Decrypt: b'tLUD\x0fXR\x17:*P$Hmg is<aT-ne"ms! hceLdl se} \\dkz+h=2i~reNcecanseI-fx*,t1\'y4k\x16\x06\x03CHONUi7\'T+mh1ilo)\x19\x06\x03NARNAnBU,nEsetb\x7feAihr --,\n=7:~<-i0:*k\x16")Jap}n\x1a`nq&x=+o~:-Yzl beyn\x1aebd+"=\x0cn*n-],pear<-\x0e=+&~<=,t*m$K,aere<aTi\'o!y=&o\x7fv!\x18{hlk ho\x1ads8nMs! yu(],yeoppe\x1azfz%iyete:,L"\x03\nThy Chfdneneneme\x15=;,00,\n0CFD\x1cMI\nR0:\x11Pig it<gUy\'a/~p r&:6Wal icyb_\x7f`enax)to~i\x18e} be\x7faWh\'w ,t6lkt!\x14,hnd roM-s~+~xbs*v*L\x7f)of hr_ht6,i~$uy\x7feQx.s w}rWhu8D\x06I-e*c Y~)is roM-*\'~ -u0\x00\x10\x0by^[ATON:\x1a^h:nbr2 ~r Ji.s pyoJab6!b=1ho:,K`hnd }n^-s~+u:7e*x$Kejalle Ibubnc{ehkt"Qbn ouh Sc\'t+xj ed:1Pi)mourt[dieni|1id}eVy}s ozf\x1ayus+\x7f=$nn:0Kegg tte\x1aafb+\x7fietoy-Vceogy2 vdlsn\x7fi*noieYbm boklI#\r\x1c\x1adxeseo+\\,ff a<dUbut+`qecbs(]")The<y_lu6\'\x7f=+o}:h\r<9.\n\nSUn^NR\x0b,J\nRF^e\x10GFREA5:\x1aInx),y*nm4O2BHRRAHOh7\'_:+netb\x7feWy}sidy Mbuz* =$nn:1Pip haje\x1aybu&br)omce^~fm tte\x1akrb;~