[59 XOR Decryption](https://projecteuler.net/problem=59)

Each character on a computer is assigned a unique code and the preferred standard is ASCII (American Standard Code for Information Interchange). For example, uppercase A = 65, asterisk (*) = 42, and lowercase k = 107.

A modern encryption method is to take a text file, convert the bytes to ASCII, then XOR each byte with a given value, taken from a secret key. The advantage with the XOR function is that using the same encryption key on the cipher text, restores the plain text; for example, 65 XOR 42 = 107, then 107 XOR 42 = 65.
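
The round trip in the example above can be checked directly. A minimal sketch using Python's `^` (bitwise XOR) operator; `key_byte` is only an illustrative name:

```python
# XOR is its own inverse: applying the same key byte twice restores the input.
plain = ord('A')        # 65
key_byte = ord('*')     # 42

cipher = plain ^ key_byte
restored = cipher ^ key_byte

print(cipher, chr(cipher))      # 107 k
print(restored, chr(restored))  # 65 A
```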

For unbreakable encryption, the key is the same length as the plain text message, and the key is made up of random bytes. The user would keep the encrypted message and the encryption key in different locations, and without both "halves", it is impossible to decrypt the message.

Unfortunately, this method is impractical for most users, so the modified method is to use a password as a key. If the password is shorter than the message, which is likely, the key is repeated cyclically throughout the message. The balance for this method is using a sufficiently long password key for security, but short enough to be memorable.
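
To illustrate the cyclic repetition of a short key, here is a small sketch. The helper `xor_with_repeating_key`, the message, and the key `"abc"` are all made up for the example and are not part of the problem data:

```python
from itertools import cycle

def xor_with_repeating_key(codes, key):
    """XOR each code with the key characters, cycling through the key."""
    return [c ^ ord(k) for c, k in zip(codes, cycle(key))]

message = "hello world"
key = "abc"  # illustrative three-letter password

cipher = xor_with_repeating_key([ord(ch) for ch in message], key)
restored = "".join(chr(c) for c in xor_with_repeating_key(cipher, key))
print(restored)  # hello world
```

Because the key repeats every three characters, positions 0, 3, 6, ... are all XORed with the same key character.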

Your task has been made easy, as the encryption key consists of three lower case characters. Using 0059_cipher.txt (right click and 'Save Link/Target As...'), a file containing the encrypted ASCII codes, and the knowledge that the plain text must contain common English words, decrypt the message and find the sum of the ASCII values in the original text.

In [1]:
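import re

import numpy as np

file = './0059_cipher.txt'

# The file holds the ciphertext as comma-separated ASCII codes.
# The `with` block closes the file automatically, so no explicit close() is needed.
with open(file, 'r') as tha_file:
    data = tha_file.read()

data = np.array(data.split(','), dtype=int)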

In [2]:
# The key is three lowercase characters, so first find the ASCII range of 'a' to 'z'.
ord('a'), ord('z')

(97, 122)

In [3]:
possible_characters = list(map(chr, range(ord('a'), ord('z') + 1)))

# All 26**3 three-letter keys, in nested-loop order ('aaa', 'aab', ..., 'zzz').
possible_codes = np.array(
    [
        [a, b, c]
        for a in possible_characters
        for b in possible_characters
        for c in possible_characters
    ]
)
possible_codes

array([['a', 'a', 'a'],
       ['a', 'a', 'b'],
       ['a', 'a', 'c'],
       ...,
       ['z', 'z', 'x'],
       ['z', 'z', 'y'],
       ['z', 'z', 'z']], dtype='<U1')

In [4]:
len(possible_codes)

17576

In [5]:
data

array([36, 22, 80, ..., 23, 11, 94])

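Game plan:
- XOR each candidate key, repeated to the length of the data, with the data.
- Count how often common English words show up in each decoded text; the key with the highest count is most likely the correct one.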
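
The plan can be tried end to end on toy data before running it on the real file. The sketch below is an illustration only: the sentence, the key `'key'`, and the helper names `xor_text` and `score` are invented for the example.

```python
from itertools import cycle, product
from string import ascii_lowercase

COMMON_WORDS = ['the', 'and', 'an', 'be', 'to', 'of', 'in', 'that', 'have']

def xor_text(codes, key):
    """XOR a list of ASCII codes with a cyclically repeated key; return text."""
    return ''.join(chr(c ^ ord(k)) for c, k in zip(codes, cycle(key)))

def score(text):
    """Total number of (substring) occurrences of the common words."""
    return sum(text.count(word) for word in COMMON_WORDS)

# Toy data: a made-up sentence encrypted with a made-up key.
secret_key = 'key'
plain = ('the sum of the series is a sixth part of the square of the '
         'perimeter of the circle and that is what we have to find in the end')
cipher = [ord(p) ^ ord(k) for p, k in zip(plain, cycle(secret_key))]

# Try all 26**3 keys and keep the one whose decoding scores highest.
candidates = (''.join(k) for k in product(ascii_lowercase, repeat=3))
best_key = max(candidates, key=lambda k: score(xor_text(cipher, k)))
print(best_key, '->', xor_text(cipher, best_key)[:30])
```

A wrong key garbles at least every third character, so three-letter words such as `the` almost never survive, which is why a simple word count separates the correct key from the rest.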

In [31]:
common_english_words = ['the', 'and', 'an', 'be', 'to', 'of', 'in', 'that', 'have']

In [59]:
def repeat_code_to_length(code:list, new_length:int):
    # Repeat the key enough times to cover new_length, then trim the excess.
    num_times_multiply = int(np.ceil(new_length / len(code)))
    new_list = (list(code)*num_times_multiply)[:new_length]
    return new_list

def code_list_to_ord(code_list):
    # Convert a list of characters to their ASCII codes.
    ord_list = [
        ord(character)
        for character in code_list
    ]
    return ord_list

def ord_list_to_code(ord_list):
    # Convert a list of ASCII codes back to a single string.
    code_list = [
        chr(ordinal)
        for ordinal in ord_list
    ]
    return ''.join(code_list)

In [45]:
list(possible_codes[0])

['a', 'a', 'a']

In [56]:
possible_codes_extended = [
    code_list_to_ord(
        repeat_code_to_length(
            this_code,
            len(data)
        )
    )
    
    for this_code in possible_codes
]

In [70]:
decoded_datas = [
    ord_list_to_code(data^this_code)
    for this_code in possible_codes_extended
]
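
As an aside, the key repetition and the XOR can both be done with NumPy alone. This is a sketch on toy data rather than the approach used in this notebook; the message and the key `"abc"` are made up, and `np.resize` is used because it fills a larger array with repeated copies of its input:

```python
import numpy as np

# Toy ciphertext: "hello world" XORed with the made-up key "abc".
plain = np.frombuffer(b"hello world", dtype=np.uint8)
key = np.frombuffer(b"abc", dtype=np.uint8)

# np.resize fills the longer array with repeated copies of the key.
key_stream = np.resize(key, plain.shape)
cipher = plain ^ key_stream

# XOR with the same key stream decodes; convert bytes to str as ASCII.
decoded = (cipher ^ key_stream).tobytes().decode('ascii')
print(decoded)  # hello world
```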

In [72]:
decoded_datas[0]

'Ew1aaevxrp9eertj9wvv|$mya9xjmck}dgmxkw1k\x7f1kwt$vw$\\dh|c#j1ivbp9rautfkpp|u$ipt|cw51&]t$jditxw9bakxakdi9cazxtk~gxcqt3$B^j9el|1wl|w9~b9bakxaj1k\x7f1v|rmickzphjL>9X$qpr|1v|raweh`1bvdj}=$hdmmt$l\x7faaaazea}}}51ew1autcx\x7fp9t|icajbmv\x7f$\x7f~v9el|1awemkt$jdi9~b9elpb$jtvptw9 $2156%$2156($2156 29:$|eg7=$nymzy$}tt|\x7f`j1kw1pqt$hde}cemdv|1k\x7f1pqt$zxvz}a51wv1pqpp9xb9el|1pkda9bqt1k\x7f1pqxw9bakxaj1mj1k{eep\x7fa}=$\x7fckt1mm1em1kwra9el|1ulp`kpplca9~b9el|1gpcgut$\x7f~hu~sj?$Wpi|}}51M9yeot$\x7f~qwu$myem1pqt$jdi9~b9elpb$jtvptw9xw9p$jx|my$ipvm1k\x7f1pqt$j`qxca9~b9el|1t|cmttp|c$vw$mya9rmkrh|1sq~w|1`ppi|eak1mj15"1kk1f`1tlepp\x7fc9el|1wl|$vw$mymj1w|cm|b$|`qx}$m~$j=$pe$qpw9el|1vxemv1whcp1\'-9|quemi}m|u${h$j1pv159~b9el|1t|cmttp|c$m~$mya9umx|amtv71M9fmu}$j~kw1wq~s9elxe$mya9bqt1k\x7f1pqxw9bakxaj1pv1f|1eiavvimtpp|}}9 */%0 "0)\'2!%6+\'0*\'0"1ewu$\x7fckt1il}ppah`xj~1pqxw9\x7fqtsak1f`1wpi(9pj}1pqtj9eerxj~1pqt$j`qxca9ckve(9el|1jl|f|c$*?5- 1 #2,"1!(3 "6*)$pb$p\x7f`|t`9avvuqzt`51sqxgq1aaav|bw|b$mya9aakxi|ea

In [85]:
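def count_common_english_words(text, common_words, total = True):
    # Count occurrences of each word as a substring of the text.
    word_counts = [
        len(re.findall(word, text))
        for word in common_words
    ]
    if total:
        # Return one overall count across all words.
        return sum(word_counts)
    else:
        # Return one count per word, in the order of common_words.
        return word_counts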
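
Note that a plain pattern counts substrings, so `an` is also counted inside `and`. That is good enough to rank keys here, but if whole-word counts are wanted, one option is to wrap each word in `\b` boundaries. A minimal sketch; `count_whole_words` is a hypothetical helper, not part of the notebook:

```python
import re

def count_whole_words(text, words):
    """Count each word only where it stands alone, using word boundaries."""
    return [
        len(re.findall(r'\b' + re.escape(word) + r'\b', text))
        for word in words
    ]

sample = 'the boy and the fox'
print([len(re.findall(w, sample)) for w in ['the', 'an', 'and']])  # [2, 1, 1]
print(count_whole_words(sample, ['the', 'an', 'and']))             # [2, 0, 1]
```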

In [86]:
re.findall('hvfm',decoded_datas[0])

['hvfm']

In [92]:
english_word_counts = list(map(sum,[
    count_common_english_words(decoded_data, common_english_words, False)
    for decoded_data in decoded_datas
]))

In [94]:
np.argmax(english_word_counts)

3317

In [89]:
count_common_english_words('the boy is the fox', common_english_words, False)

[2, 0, 0, 0, 0, 0, 0, 0, 0]

In [95]:
decoded_datas[3317]

'An extract taken from the introduction of one of Euler\'s most celebrated papers, "De summis serierum reciprocarum" [On the sums of series of reciprocals]: I have recently found, quite unexpectedly, an elegant expression for the entire sum of this series 1 + 1/4 + 1/9 + 1/16 + etc., which depends on the quadrature of the circle, so that if the true sum of this series is obtained, from it at once the quadrature of the circle follows. Namely, I have found that the sum of this series is a sixth part of the square of the perimeter of the circle whose diameter is 1; or by putting the sum of this series equal to s, it has the ratio sqrt(6) multiplied by s to 1 of the perimeter to the diameter. I will soon show that the sum of this series to be approximately 1.644934066842264364; and from multiplying this number by six, and then taking the square root, the number 3.141592653589793238 is indeed produced, which expresses the perimeter of a circle whose diameter is 1. Following again the same s

In [96]:
data[3317]

IndexError: index 3317 is out of bounds for axis 0 with size 1455
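
The lookup above fails because 3317 is an index into the list of candidate keys, not into the 1455-value ciphertext; `possible_codes[3317]` is the lookup that matches. Since the candidates were generated in nested-loop order, the key can also be recovered from the index by base-26 arithmetic. A small sketch, where `key_from_index` is a hypothetical helper:

```python
from string import ascii_lowercase

def key_from_index(index, alphabet=ascii_lowercase, length=3):
    """Map a position in the nested-loop ordering back to its key string."""
    letters = []
    for _ in range(length):
        index, remainder = divmod(index, len(alphabet))
        letters.append(alphabet[remainder])
    # The last loop variable varies fastest, so the letters come out reversed.
    return ''.join(reversed(letters))

print(key_from_index(0))      # aaa
print(key_from_index(3317))   # exp
```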

In [98]:
# Keep the decoded values as ASCII codes (no chr conversion) so they can be summed.
decoded_ordinals = [
    data^this_code
    for this_code in possible_codes_extended
]
sum(decoded_ordinals[3317])

129448