## Task 1: Convert hex to base64

The string: "49276d206b696c6c696e6720796f757220627261696e206c696b65206120706f69736f6e6f7573206d757368726f6f6d"

Should produce: "SSdtIGtpbGxpbmcgeW91ciBicmFpbiBsaWtlIGEgcG9pc29ub3VzIG11c2hyb29t"

https://www.cryptopals.com/sets/1/challenges/1

In [1]:
import base64

In [2]:
byte_array = bytearray.fromhex('49276d206b696c6c696e6720796f757220627261696e206c696b65206120706f69736f6e6f7573206d757368726f6f6d')
print(byte_array)

bytearray(b"I'm killing your brain like a poisonous mushroom")


In [3]:
b64_value = base64.b64encode(byte_array)
print(b64_value)

b'SSdtIGtpbGxpbmcgeW91ciBicmFpbiBsaWtlIGEgcG9pc29ub3VzIG11c2hyb29t'


In [4]:
check_value = 'SSdtIGtpbGxpbmcgeW91ciBicmFpbiBsaWtlIGEgcG9pc29ub3VzIG11c2hyb29t'

print(b64_value == check_value)

False


In [5]:
print(type(b64_value), type(check_value))

<class 'bytes'> <class 'str'>


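The comparison fails because b64_value is a bytes object, while check_value is a str. In Python 3, bytes and str never compare equal, even when they contain the same characters.

To compare them, we convert b64_value to a string with the .decode() method. Base64 output contains only ASCII characters, so decoding it as UTF-8 (a superset of ASCII) is safe.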
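The conversion can also go the other way, encoding check_value into bytes instead of decoding b64_value. The sketch below reuses the task's values and shows both options side by side:

```python
import base64

raw = bytes.fromhex("49276d206b696c6c696e6720796f757220627261696e206c696b65206120706f69736f6e6f7573206d757368726f6f6d")
b64_value = base64.b64encode(raw)
check_value = "SSdtIGtpbGxpbmcgeW91ciBicmFpbiBsaWtlIGEgcG9pc29ub3VzIG11c2hyb29t"

# bytes vs str: never equal in Python 3
print(b64_value == check_value)
# Option 1: convert the bytes side to str
print(b64_value.decode("ascii") == check_value)
# Option 2: convert the str side to bytes
print(b64_value == check_value.encode("ascii"))
```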

In [6]:
b64_value_str = b64_value.decode('utf-8')
print(b64_value_str)

SSdtIGtpbGxpbmcgeW91ciBicmFpbiBsaWtlIGEgcG9pc29ub3VzIG11c2hyb29t


In [7]:
print(b64_value_str == check_value)

True
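To see what base64.b64encode does internally, here is a minimal from-scratch sketch of standard base64 (the RFC 4648 alphabet with `=` padding). Each 3 input bytes (24 bits) become four 6-bit indices into a 64-character alphabet. manual_b64encode is a hypothetical helper written only for illustration; in practice the standard library call above is the one to use.

```python
import base64

B64_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def manual_b64encode(data: bytes) -> str:
    """Hypothetical helper: standard base64 with '=' padding, for illustration only."""
    out = []
    # Process the input in 3-byte (24-bit) chunks
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        # Pad a short final chunk with zero bytes so it is always 24 bits long
        padded = chunk + b"\x00" * (3 - len(chunk))
        n = int.from_bytes(padded, "big")
        # Split the 24 bits into four 6-bit indices into the alphabet
        chars = [B64_ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # 1 input byte -> 2 real chars, 2 bytes -> 3 chars, 3 bytes -> 4 chars
        real = len(chunk) + 1
        out.append("".join(chars[:real]) + "=" * (4 - real))
    return "".join(out)

raw = bytes.fromhex("49276d206b696c6c696e6720796f757220627261696e206c696b65206120706f69736f6e6f7573206d757368726f6f6d")
print(manual_b64encode(raw))
print(manual_b64encode(raw) == base64.b64encode(raw).decode("ascii"))
```

The input here is 48 bytes, a multiple of 3, so no padding appears. Shorter inputs such as `b"f"` would end in `==`.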


## Task 2: Fixed XOR

https://www.cryptopals.com/sets/1/challenges/2

First, some background: what XOR and buffers are, and how XORing two buffers produces a combination.

**XOR**: an operation that outputs true (1) if and only if its two input bits are different, and false (0) if they are the same. For example, 1 XOR 0 gives 1, while 1 XOR 1 gives 0. Applied to bytes, XOR works on each pair of corresponding bits.

**Buffer**: a sequence of bytes used as one input to the XOR operation. For fixed XOR, both buffers must have the same length.

**XOR in encryption**: the message you want to transmit (the plaintext) and the key are the two buffers. XORing them produces the ciphertext, the encrypted data that can be sent publicly. Because XOR is its own inverse, anyone who has the key can XOR it with the ciphertext to recover the original plaintext.
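The definitions above can be checked directly in Python with the `^` operator. This short sketch prints the truth table, shows XOR on a whole byte, and demonstrates the self-inverse property. The key byte `0x2F` is an arbitrary, hypothetical value chosen only for illustration.

```python
# XOR truth table for single bits: the result is 1 only when the inputs differ
for a in (0, 1):
    for b in (0, 1):
        print(f"{a} XOR {b} = {a ^ b}")

# On whole bytes, XOR is applied to each pair of corresponding bits
print(f"{0b1010 ^ 0b0101:04b}")  # same as b'\x0A' XOR b'\x05' -> b'\x0f'

# XOR is its own inverse: (p ^ k) ^ k == p, which is why decryption works
plaintext_byte = ord("A")
key_byte = 0x2F  # hypothetical key byte, chosen only for illustration
cipher_byte = plaintext_byte ^ key_byte
recovered_byte = cipher_byte ^ key_byte
print(chr(recovered_byte))
```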

For this task, I represent both buffers as bytes objects. Bytes are immutable, so a buffer can't be changed accidentally after it's created.

In [8]:
import operator

In [9]:
# Function for creating a XOR combination

def xor_buffer(buffer1: bytes | bytearray, buffer2: bytes | bytearray) -> bytes:
    # make sure both buffers have the exact same length
    if len(buffer1) != len(buffer2):
        raise ValueError("Buffers must have an equal length")
    xor_combo = bytes(map(operator.xor, buffer1, buffer2))
    
    return xor_combo

In [10]:
# Example of a XOR combination

buffera = b'\x0A'
bufferb = b'\x05'

xor_combo_example = xor_buffer(buffera, bufferb)
print(xor_combo_example)

b'\x0f'


In [11]:
# Example with real values

## Key and data strings must have the same length 

key = b'secretkey is this!!!'
data = b'this is the message!'

try:
    ciphertext = xor_buffer(key, data)
    print(f"key: {key}")
    print(f"message: {data}")
    print(f"ciphertext: {ciphertext}")
    print(f"ciphertext hex: {ciphertext.hex()}")
    ## decrypt the combination with the key
    decrypted_data = xor_buffer(ciphertext, key)
    print(decrypted_data)
except ValueError as e:
    print(f"Error: {e}")

key: b'secretkey is this!!!'
message: b'this is the message!'
ciphertext: b'\x07\r\n\x01E\x1d\x18E\rH\x0cSM\x11\x1b\x1a\x12FD\x00'
ciphertext hex: 070d0a01451d18450d480c534d111b1a12464400
b'this is the message!'


While working on this task, I also learned about key repetition, which makes it possible to combine two buffers of different lengths. A stretched key is the key extended to the plaintext's length: it is repeated N times, and then the first few bytes of the key fill the remaining positions.

In [12]:
key = b'secretkey'
data = b'this is my message!'

In [13]:
# learn how many times the key needs to be repeated

data_length = len(data)
key_length = len(key)

repetitions = data_length // key_length

remaining = data_length % key_length

print(data_length, key_length)
print(repetitions, remaining)

19 9
2 1


In [14]:
## plaintext's length is 19, key's is 9, so it has to be repeated 2 times + 1 remaining symbol

stretched_key = key * repetitions + key[:remaining]
print(stretched_key)

b'secretkeysecretkeys'


In [15]:
try:
    # use the stretched key
    ciphertext = xor_buffer(stretched_key, data)
    print(f"Stretched key: {stretched_key.decode(errors='ignore')}")
    print(f"Message: {data.decode()}")
    print(f"Ciphertext (hex): {ciphertext.hex()}")
    
    # decrypt the combination with stretched key
    decrypted_data = xor_buffer(ciphertext, stretched_key)
    print(f"Decrypted data: {decrypted_data.decode()}")
    
except ValueError as e:
    # This error should no longer happen
    print(f"Error: {e}")

Stretched key: secretkeysecretkeys
Message: this is my message!
Ciphertext (hex): 070d0a01451d1845140a450e1716070a021c52
Decrypted data: this is my message!
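Building the stretched key explicitly makes the idea visible. The same repeating-key XOR can also be written without computing repetitions and remainders, by cycling the key with `itertools.cycle` and letting `zip` stop at the end of the data. repeating_key_xor is a hypothetical helper for illustration; it should give the same ciphertext as the cell above.

```python
from itertools import cycle

def repeating_key_xor(data: bytes, key: bytes) -> bytes:
    """Hypothetical helper: XOR data with key, repeating the key as often as needed."""
    if not key:
        raise ValueError("Key must not be empty")
    # zip stops at the end of data, so the cycled key is cut to the right length
    return bytes(d ^ k for d, k in zip(data, cycle(key)))

key = b"secretkey"
data = b"this is my message!"

ciphertext = repeating_key_xor(data, key)
print(ciphertext.hex())
# Applying the same function again decrypts, because XOR is its own inverse
print(repeating_key_xor(ciphertext, key))
```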


In [16]:
# Task

hex_value1 = "1c0111001f010100061a024b53535009181c"
hex_value2 = "686974207468652062756c6c277320657965"
expected_hex = "746865206b696420646f6e277420706c6179"

buffer1 = bytes.fromhex(hex_value1)
buffer2 = bytes.fromhex(hex_value2)

expected_bytes = bytes.fromhex(expected_hex)

print(expected_bytes)

b"the kid don't play"


In [17]:
# verify buffer length

if len(buffer1) != len(buffer2) or len(buffer1) != len(expected_bytes):
    print("Error: lengths don't match")
else:
    print(f"Length: {len(buffer1)} bytes")

    try:
        # XOR combination
        xor_combo = xor_buffer(buffer1, buffer2)
        
        # Check if values match
        if xor_combo == expected_bytes:
            print("Test passed")
            # Decode to ASCII to see the message
            print(f"Decoded message: {xor_combo.decode('utf-8')}")
        else:
            print("Test failed")
            print(f"Expected hex: {expected_hex}")
            print(f"Actual hex:   {xor_combo.hex()}")

    except ValueError as e:
        print(f"Test failed due to diff lengths: {e}")

Length: 18 bytes
Test passed
Decoded message: the kid don't play
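For comparison, the fixed XOR task can also be solved without converting to bytes by treating each hex string as one big integer. Python integers have arbitrary precision, and XORing two integers XORs every bit, which gives the same result as XORing byte by byte. The one catch is that `int()` drops leading zeros, so the result must be re-padded to the original hex length. xor_hex is a hypothetical helper, not part of the solution above.

```python
hex_value1 = "1c0111001f010100061a024b53535009181c"
hex_value2 = "686974207468652062756c6c277320657965"

def xor_hex(hex_a: str, hex_b: str) -> str:
    """Hypothetical helper: XOR two equal-length hex strings via Python's big integers."""
    if len(hex_a) != len(hex_b):
        raise ValueError("Hex strings must have an equal length")
    result = int(hex_a, 16) ^ int(hex_b, 16)
    # Pad with leading zeros, since int() drops them
    return format(result, f"0{len(hex_a)}x")

result_hex = xor_hex(hex_value1, hex_value2)
print(result_hex)
print(bytes.fromhex(result_hex))
```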


## Task 3: Single-byte XOR cipher

https://www.cryptopals.com/sets/1/challenges/3

Unfortunately I couldn’t solve this task on my own and had to search for solutions online. It turned out that the common approach is frequency analysis: XOR the ciphertext with every possible single-byte key, score each result by how closely it resembles English, and pick the key with the highest score.

I used the logic of this solution, although it’s written in Go: https://dev.to/stefanalfbo/single-byte-xor-cipher-1mlo

There is at least one more solution, in Python, that uses a fitting quotient to measure how closely two letter-frequency distributions match: https://www.codementor.io/@arpitbhayani/deciphering-single-byte-xor-ciphertext-17mtwlzh30
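The fitting quotient idea can be sketched as follows. This assumes the quotient is the average absolute difference between expected and observed letter percentages, which is one plausible reading; the linked article's exact formula may differ. Unlike calculate_score below, a lower value means a better match, so the best key is the one that minimizes it. fitting_quotient is a hypothetical helper, and the notebook's eng_freq dictionary could be passed as `expected`.

```python
from collections import Counter

def fitting_quotient(text: str, expected: dict[str, float]) -> float:
    """Hypothetical helper: average absolute gap between expected and observed
    letter percentages. Assumed definition; lower values mean a closer match."""
    # Keep only characters that appear in the expected distribution
    letters = [ch for ch in text.lower() if ch in expected]
    if not letters:
        return float("inf")  # nothing to compare: treat as the worst possible fit
    counts = Counter(letters)
    total = len(letters)
    # Compare expected vs observed percentage for every letter in the table
    gaps = [abs(pct - 100 * counts[ch] / total) for ch, pct in expected.items()]
    return sum(gaps) / len(expected)

toy_expected = {"a": 50.0, "b": 50.0}  # hypothetical two-letter "language"
print(fitting_quotient("abab", toy_expected))  # perfect match
print(fitting_quotient("aaaa", toy_expected))  # poor match
```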

In my code:

1. The starting point is xor_buffer, written in the previous task.
2. I create a frequency dictionary with the relative frequencies (in percent) of the 26 English letters.
3. The calculate_score function iterates through all characters of a potential message. If a character is a letter, its frequency value from the dictionary is added to the total score. Spaces get a bonus because frequent spaces suggest that the candidate is English text (a bit of a cheat, since the task gives a hint about that); the solution doesn't work without rewarding spaces. Every other character gets a penalty.
4. The solve_single function decodes the given hex ciphertext into a byte buffer.
5. It then starts a brute-force loop over all 256 possible single-byte key values. Inside the loop, the key byte is repeated to create a key_stream as long as the ciphertext.
6. xor_buffer is applied to the ciphertext and the key_stream to produce a candidate message, which is then decoded into a string (keys whose output isn't valid UTF-8 are skipped).
7. calculate_score evaluates the candidate string and returns its current_score. If it's higher than best_score, the function updates best_score, best_key, and best_message.
8. After the loop, the candidate with the highest best_score is the most likely plaintext.
9. Finally, the function returns the key (as bytes and as an integer), the decrypted message, and its score; the next cell prints them.

In [18]:
from typing import Dict, Tuple

In [19]:
# Create a dictionary for character frequency, assuming the encoded message is plain English 

eng_freq: Dict[str, float] = {
    'a': 8.2389258,  'b': 1.5051398,  'c': 2.8065007,  'd': 4.2904556,
    'e': 12.813865,  'f': 2.2476217,  'g': 2.0327458,  'h': 6.1476691,
    'i': 6.1476691,  'j': 0.1543474,  'k': 0.7787989,  'l': 4.0604477,
    'm': 2.4271893,  'n': 6.8084376,  'o': 7.5731132,  'p': 1.9459884,
    'q': 0.0958366,  'r': 6.0397268,  's': 6.3827211,  't': 9.1357551,
    'u': 2.7822893,  'v': 0.9866131,  'w': 2.3807842,  'x': 0.1513210,
    'y': 1.9913847,  'z': 0.0746517,
}

In [20]:
def calculate_score(text_attempt: str) -> float:    
    score = 0.0
    
    for char in text_attempt:
        # character to lowercase
        lower_char = char.lower()
        
        # Check whether the character is an English letter
        if lower_char in eng_freq:
            # add its frequency weight if it's a letter
            score += eng_freq[lower_char]
        elif char == ' ':
            # add blank space to scoring
            score += 10.0
        else:
            score -= 10.0
             
    return score

In [21]:
def solve_single(hex_cipher: str) -> Tuple[bytes, int, str, float]:
    # Decode hex into bytes
    cipher_bytes = bytes.fromhex(hex_cipher)
    cipher_length = len(cipher_bytes)    
    
    # Track best score
    best_score = -float('inf')
    best_key = 0
    best_message = ""
    
    for key_int in range(256):
        # create the stretched key
        key_byte = bytes([key_int])
        key_stream = key_byte * cipher_length
        
        # perform XOR, calculate the potential plaintext
        try:
            plaintext_bytes = xor_buffer(cipher_bytes, key_stream) 
        except ValueError:
            continue 

        # Decode into a string; keys whose output isn't valid UTF-8 are skipped
        try:
            potential_plaintext = plaintext_bytes.decode('utf-8')
        except UnicodeDecodeError:
            continue
            
        # Score: evaluate the potential plaintext using the new metric
        current_score = calculate_score(potential_plaintext)
        
        # check results
        if current_score > best_score:
            best_score = current_score
            best_key = key_int
            best_message = potential_plaintext
            
    return bytes([best_key]), best_key, best_message, best_score

In [22]:
hex_ciphertext = "1b37373331363f78151b7f2b783431333d78397828372d363c78373e783a393b3736"

In [23]:
key_bytes, key_int, message, current_score = solve_single(hex_ciphertext)

print(f"Result: key hex: {key_bytes.hex()}, key string: {chr(key_int)}, decrypted message: {message.strip()}, score: {current_score}")

Result: key hex: 58, key string: X, decrypted message: Cooking MC's like a pound of bacon, score: 187.52963219999995
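As a quick sanity check, the recovered key can be verified independently of the scoring code. XORing the ciphertext with `X` (0x58) should give the message, and XORing the message with the same key should reproduce the original hex string. This is a verification sketch only; it doesn't search for the key.

```python
hex_ciphertext = "1b37373331363f78151b7f2b783431333d78397828372d363c78373e783a393b3736"
key_int = ord("X")  # 0x58, the key found above

# Decrypt: XOR every ciphertext byte with the single key byte
plaintext = bytes(b ^ key_int for b in bytes.fromhex(hex_ciphertext))
print(plaintext.decode("ascii"))

# Re-encrypt and compare: XOR with the same key must give back the ciphertext
reencrypted = bytes(b ^ key_int for b in plaintext)
print(reencrypted.hex() == hex_ciphertext)
```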


## Task 4: Detect single-character XOR

We keep using the XOR combination and the single-byte key finder (solve_single) from the previous tasks. For each line of the file, solve_single:
1. Takes the line's hex string
2. Tries all 256 keys
3. Scores each output
4. Returns the best candidate (key, message, score) for that line

I add a new function called single_xor. It:
1. Initializes the best score to negative infinity, so the first valid result becomes the baseline
2. Opens the target file
3. Reads it line by line, stripping whitespace and filtering out empty lines
4. Passes each line to solve_single, which converts it into bytes and tries every key
5. Compares the line's best score with the best score found so far
6. If the current line's score is higher, saves it as the new best result
7. After the loop finishes, returns the best result: the decrypted message, its key, the line number, and the score, which the calling cell prints

In [24]:
from typing import Dict, Optional

def single_xor(filepath: str) -> Optional[Dict]:
    # Initialize vars for the best result
    best_file_score = -float('inf')
    best_result = None
    ciphertexts: list[str] = []

    try:
        with open(filepath, 'r') as f:
            # Read all lines and strip whitespace, filtering out empty lines
            ciphertexts = [line.strip() for line in f if line.strip()]
    except FileNotFoundError:
        print(f"Error: file not found: {filepath}")
        return None
    
    print(f"Cipher count: {len(ciphertexts)}")
    
    # Process each line
    for line_number, hex_cipher in enumerate(ciphertexts): 
        key_bytes, key_int, message, score = solve_single(hex_cipher)        
        
        if score > best_file_score:
            best_file_score = score
            best_result = {
                'line_number': line_number + 1,
                'ciphertext': hex_cipher,
                'decrypted_message': message.strip(),
                'key_char': chr(key_int),
                'score': best_file_score
            }
            
    return best_result

In [25]:
file_text = r"C:\haaga-helia\cyber\4-file.txt"

In [26]:
print(file_text)

C:\haaga-helia\cyber\4-file.txt


In [27]:
# Find the best result across the entire file
final_result = single_xor(file_text)

if final_result:    
    print("Successful")    
    print(f"Original cipher: {final_result['ciphertext']}")
    print(f"Line number: {final_result['line_number']}")
    print(f"Key (char): '{final_result['key_char']}'")
    print(f"Final score: {final_result['score']:.2f}")
    print("\nDecrypted Message:")
    print(f"-> {final_result['decrypted_message']}")
else:
    print("No result: the file could not be read or contained no ciphertexts")

Cipher count: 327
Successful
Original cipher: 7b5a4215415d544115415d5015455447414c155c46155f4058455c5b523f
Line number: 171
Key (char): '5'
Final score: 179.70

Decrypted Message:
-> Now that the party is jumping
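The whole detection pipeline can also be condensed into a self-contained sketch. To keep it short, it swaps calculate_score for a simpler metric that only counts ASCII letters and spaces. That is an assumption for illustration, not the scoring used above, though it is enough here. The sample input is a hypothetical decoy line plus the winning line found above, so the winner should again be the second line with key `5`.

```python
import string

# Hypothetical simple metric swapped in for calculate_score: count letters and spaces
LETTERS_AND_SPACE = set((string.ascii_letters + " ").encode("ascii"))

def english_score(candidate: bytes) -> int:
    return sum(1 for b in candidate if b in LETTERS_AND_SPACE)

def best_single_byte(cipher: bytes) -> tuple[int, int, bytes]:
    """Try all 256 key bytes; return (score, key, plaintext) for the best one."""
    best = (-1, 0, b"")
    for key in range(256):
        plain = bytes(c ^ key for c in cipher)
        score = english_score(plain)
        if score > best[0]:
            best = (score, key, plain)
    return best

def detect(hex_lines: list[str]) -> tuple[int, int, int, bytes]:
    """Return (score, line_number, key, plaintext) for the most English-looking line."""
    best = (-1, 0, 0, b"")
    for line_number, hex_line in enumerate(hex_lines, start=1):
        score, key, plain = best_single_byte(bytes.fromhex(hex_line.strip()))
        if score > best[0]:
            best = (score, line_number, key, plain)
    return best

sample_lines = [
    "ff00ff00ff00ff00",  # hypothetical decoy line
    "7b5a4215415d544115415d5015455447414c155c46155f4058455c5b523f",  # winning line above
]
score, line_number, key, plain = detect(sample_lines)
print(line_number, chr(key), plain.decode("ascii").strip())
```

Counting letters and spaces is cruder than frequency weighting. It may misrank lines in a larger file, which is why the notebook's own solution uses letter frequencies.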
