# The Trouble with One-to-One Solution

First, let us take a look at what our ciphertext looks like:

In [1]:
# Read the whole ciphertext; the context manager closes the file for us.
with open('ciphertext.txt', 'r') as file:
    ciphertext = file.read()
print(ciphertext)

QKLOJKT FJTNZWH ZA JKLEMDTOZK VJEJ YZTN AQHLOKQTOKX QKW JAAJLTOSJ.
UZKX YJAZEJ FZWJEK LZFDRTJEH, XEJQT FOKWH UOIJ QU-IOKWO DQSJW TNJ VQM.
IKZVUJWXJ QYZRT TNJ DQTTJEKH OK UQKXRQXJH VQH NOH TZZU ZA LNZOLJ.
OKTEOLQLOJH OK TNJ WOHTEOYRTOZK ZA UJTTJEH VJEJ NOH DUQMXEZRKW.
KRFJEZRH JKLEMDTJW TJGTH VJEJ WJLEMDTJW WRJ TZ NOH FJTNZWH.
WJUSJ WJJD OKTZ NOHTZEM, QKW MZR'UU AOKW QU-IOKWO'H FQEI OK LEMDTQKQUMHOH.
OT'H Q TJHTQFJKT TZ NOH XJKORH TNQT NOH FJTNZWH HTOUU EJHZKQTJ TZWQM.
WJLZWOKX FJHHQXJH EJBROEJH YZTN OKTROTOZK QKW Q IJJK RKWJEHTQKWOKX ZA DQTTJEKH.
JSJEM JKLEMDTJW UJTTJE NQH Q HTZEM, VQOTOKX TZ YJ RKEQSJUJW.
LZKLJDTH TNQT QU-IOKWO OKTEZWRLJW NQSJ UQOW TNJ XEZRKWVZEI AZE ARTREJ LEMDTQKQUMHTH.
ZATJK, UZZIOKX TZ TNJ DQHT LQK OUURFOKQTJ ZRE LREEJKT LNQUUJKXJH.
WEQVOKX OKHDOEQTOZK AEZF NOH VZEI LQK LJETQOKUM YJKJAOT QKM QHDOEOKX LZWJYEJQIJE.
JSJK TNJK, TZ AOKW TNJ AUQX, UZZI KZ ARETNJE TNQK TNJ AOEHT UJTTJE ZA JQLN HJKTJKLJ.
WZK'T AZEXJT TZ JKLQDHRUQTJ OK LREUM YEQLIJTH QKW DEJDJKW TNJ AUQX D

As our instructions mention, this message is likely enciphered with some substitution method. Since the ciphertext keeps seemingly regular spacing and punctuation, the substitution likely affects only alphabetic characters. A naive approach to cracking such a cipher is to try every permutation of the English alphabet, but we quickly realize how infeasible it is to examine $26! \approx 4.03 \cdot 10^{26}$ possibilities. Instead, we may exploit a particular weakness of one-to-one ciphers: the frequency of each letter.
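To get a feel for that number, here is a quick sketch that computes the key space exactly. The rate of one billion keys per second is a purely hypothetical assumption, chosen only to give a sense of scale.

```python
import math

# Number of distinct one-to-one mappings of a 26-letter alphabet.
keyspace = math.factorial(26)

# Hypothetical brute-force rate (an assumption for illustration only).
keys_per_second = 1_000_000_000
seconds_per_year = 60 * 60 * 24 * 365

years = keyspace / (keys_per_second * seconds_per_year)
print(f"{keyspace:.2e} keys, roughly {years:.1e} years at the assumed rate")
```

Even at that generous rate, exhausting the key space would take on the order of ten billion years.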

English texts tend to follow a particular distribution of letters in which some letters are more common than others. In particular, 'E', 'T', 'A', and 'O' tend to be the most common letters (in that order). So let's first rank the letters of our ciphertext by how often they occur:

In [2]:
alphabet = [chr(i) for i in range(ord('A'), ord('Z') + 1)]
freq = sorted(alphabet, key=lambda x: ciphertext.count(x), reverse=True)
freq[:4]

['J', 'T', 'K', 'O']
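The ranking above discards the counts themselves, which can be useful for judging how close two letters are. A minimal sketch that keeps them, using `collections.Counter`, might look like the following; the helper name `letter_counts` is hypothetical, and the sample is only the first sentence of the ciphertext rather than the whole file.

```python
from collections import Counter

def letter_counts(text):
    """Return (letter, count) pairs for A-Z, most frequent first."""
    counts = Counter(ch for ch in text if 'A' <= ch <= 'Z')
    return counts.most_common()

# First sentence of the ciphertext, used here as a small stand-in sample.
sample = "QKLOJKT FJTNZWH ZA JKLEMDTOZK VJEJ YZTN AQHLOKQTOKX QKW JAAJLTOSJ."
for letter, count in letter_counts(sample)[:3]:
    print(letter, count)
```

Even on a single sentence, 'J' already leads the count, although the order of the letters behind it need not match the ranking for the full text.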

The ranking tells us that 'J', 'T', 'K', and 'O' are the most common letters in our ciphertext. From here, to reverse the cipher, it is natural to assume 'J' maps to 'E', 'T' maps to 'T', 'K' maps to 'A', and so on. However, not every message follows the typical distribution exactly (this is especially true for short messages and lipograms). Instead, let's start small and assume only that 'J' maps to 'E'. Redacting all other letters, we see:

In [3]:
inverse_map = {'J': 'E'}

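def decipher(ciphertext, inverse_map):
    """Apply the known substitutions and mask every unsolved letter with '*'."""
    plaintext_attempt = ''
    for character in ciphertext:
        if character in inverse_map:
            # Known substitution: emit the guessed plaintext letter.
            plaintext_attempt += inverse_map[character]
        elif character in alphabet:
            # Uppercase letter we have not mapped yet.
            plaintext_attempt += '*'
        else:
            # Spaces, punctuation, and newlines pass through unchanged.
            plaintext_attempt += character
    return plaintext_attempt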

print(decipher(ciphertext, inverse_map))

****E** *E***** ** E********* *E*E **** *********** *** E**E****E.
**** *E***E ***E** ******E**, **E** ***** ***E **-***** ***E* **E ***.
*****E**E ***** **E ****E*** ** *******E* *** *** **** ** *****E.
*********E* ** **E ************ ** *E**E** *E*E *** **********.
***E**** E******E* *E*** *E*E *E*****E* **E ** *** *E*****.
*E**E *EE* **** *******, *** ***'** **** **-*****'* **** ** *************.
**'* * *E****E** ** *** *E**** **** *** *E***** ***** *E*****E *****.
*E****** *E****E* *E****E* **** ********* *** * *EE* ***E********* ** ****E***.
E*E** E******E* *E**E* *** * *****, ******* ** *E *****E*E*.
****E*** **** **-***** ********E* ***E **** **E ********** *** *****E *************.
***E*, ******* ** **E **** *** *********E *** ****E** *****E**E*.
******* *********** **** *** **** *** *E******* *E*E*** *** ******** ***E**E**E*.
E*E* **E*, ** **** **E ****, **** ** *****E* **** **E ***** *E**E* ** E*** *E**E**E.
***'* ****E* ** E*********E ** ***** *****E** *** **E*E** **E **** *

With some inspection, patterns appear, assuming 'J' truly does map to 'E'. For example, a word like '\*EE\*' may be 'BEEN' or 'SEEN', which may tell us 'K' maps to 'N'. For now, though, let's continue down the frequency ranking and assume 'T' maps to 'T' (as counterintuitive as that may be).

inverse_map = {'J': 'E',
               'T': 'T',
              }

print(decipher(ciphertext, inverse_map))

Next, let's assume 'K' maps to 'A'.

In [4]:
inverse_map = {'J': 'E',
               'T': 'T',
               'K': 'A',
              }

print(decipher(ciphertext, inverse_map))

*A**EAT *ET**** ** EA****T**A *E*E **T* *****A*T*A* *A* E**E*T**E.
**A* *E***E ***E*A *****TE**, **E*T **A** ***E **-**A** ***E* T*E ***.
*A***E**E ****T T*E **TTE*A* *A **A****E* *** *** T*** ** *****E.
*AT******E* *A T*E ***T****T**A ** *ETTE** *E*E *** ********A*.
A**E**** EA****TE* TE*T* *E*E *E****TE* **E T* *** *ET****.
*E**E *EE* *AT* ***T***, *A* ***'** **A* **-**A**'* **** *A ****T*A******.
*T'* * TE*T**EAT T* *** *EA*** T**T *** *ET**** *T*** *E**A*TE T****.
*E****A* *E****E* *E****E* **T* *AT**T**A *A* * *EEA *A*E**T*A**A* ** **TTE*A*.
E*E** EA****TE* *ETTE* *** * *T***, ***T*A* T* *E *A***E*E*.
**A*E*T* T**T **-**A** *AT*****E* ***E **** T*E ****A***** *** **T**E ****T*A****T*.
**TEA, *****A* T* T*E ***T **A ******A*TE *** ****EAT *****EA*E*.
*****A* *A*****T**A **** *** **** **A *E*T**A** *EAE**T *A* ******A* ***E**E**E*.
E*EA T*EA, T* **A* T*E ****, **** A* ***T*E* T**A T*E ****T *ETTE* ** E*** *EATEA*E.
**A'T ****ET T* EA*******TE *A ***** *****ET* *A* **E*EA* T*E **** *

We see a couple of instances of '\*A'; however, two-letter words ending in 'A' are relatively uncommon in English, so let's take a step back and drop the guess that 'K' maps to 'A'. Still, a word like 'T\*E' may very well be 'THE', a very common word. Comparing the ciphertext with our previous attempted plaintext, 'T\*E' corresponds to 'TNJ', which tells us 'N' may map to 'H'. Supposing as much, we have:

In [5]:
inverse_map = {'J': 'E',
               'T': 'T',
               'N': 'H',
              }

print(decipher(ciphertext, inverse_map))

****E*T *ETH*** ** E*****T*** *E*E **TH *******T*** *** E**E*T**E.
**** *E***E ***E** *****TE**, **E*T ***** ***E **-***** ***E* THE ***.
*****E**E ****T THE **TTE*** ** *******E* *** H** T*** ** *H***E.
**T******E* ** THE ***T****T*** ** *ETTE** *E*E H** **********.
***E**** E*****TE* TE*T* *E*E *E****TE* **E T* H** *ETH***.
*E**E *EE* **T* H**T***, *** ***'** **** **-*****'* **** ** ****T********.
*T'* * TE*T**E*T T* H** *E**** TH*T H** *ETH*** *T*** *E****TE T****.
*E****** *E****E* *E****E* **TH **T**T*** *** * *EE* ***E**T****** ** **TTE***.
E*E** E*****TE* *ETTE* H** * *T***, ***T*** T* *E *****E*E*.
****E*T* TH*T **-***** **T*****E* H**E **** THE ********** *** **T**E ****T******T*.
**TE*, ******* T* THE ***T *** ********TE *** ****E*T *H***E**E*.
******* *******T*** **** H** **** *** *E*T***** *E*E**T *** ******** ***E**E**E*.
E*E* THE*, T* **** THE ****, **** ** ***THE* TH** THE ****T *ETTE* ** E**H *E*TE**E.
***'T ****ET T* E********TE ** ***** *****ET* *** **E*E** THE **** *
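Each new guess comes from matching a masked word such as 'T\*E' back to the ciphertext word behind it. Doing that by eye is error-prone, so here is a small sketch that automates the lookup. The helpers `mask` and `cipher_words_matching` are hypothetical names, and the sample is a short excerpt of the ciphertext.

```python
import re

def mask(word, inverse_map):
    """Decode known letters of a ciphertext word and mask the rest with '*'."""
    return ''.join(inverse_map.get(ch, '*') for ch in word)

def cipher_words_matching(ciphertext, inverse_map, pattern):
    """List the distinct ciphertext words whose masked form equals `pattern`."""
    # Runs of capital letters count as words, so "MZR'UU" splits at the apostrophe.
    words = re.findall(r"[A-Z]+", ciphertext)
    return sorted({w for w in words if mask(w, inverse_map) == pattern})

sample = "DQSJW TNJ VQM. NOH TZZU ZA LNZOLJ."
print(cipher_words_matching(sample, {'J': 'E', 'T': 'T'}, 'T*E'))
```

On this excerpt the only match is 'TNJ', so guessing 'THE' for 'T\*E' amounts to guessing that 'N' maps to 'H'.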

Further, 'T\*' and 'TH\*T' may be 'TO' and 'THAT', respectively. A standalone '\*' is most likely 'A' or 'I', and since it hides the same ciphertext letter as the gap in 'TH\*T', 'A' is the better guess. Expanding our map, we see:

In [6]:
inverse_map = {'J': 'E',
               'T': 'T',
               'N': 'H',
               'Z': 'O',
               'Q': 'A',
              }

print(decipher(ciphertext, inverse_map))

A***E*T *ETHO** O* E*****T*O* *E*E *OTH *A****AT*** A** E**E*T**E.
*O** *E*O*E *O*E** *O***TE**, **EAT ***** ***E A*-***** *A*E* THE *A*.
**O**E**E A*O*T THE *ATTE*** ** *A***A*E* *A* H** TOO* O* *HO**E.
**T***A**E* ** THE ***T****T*O* O* *ETTE** *E*E H** **A***O***.
***E*O** E*****TE* TE*T* *E*E *E****TE* **E TO H** *ETHO**.
*E**E *EE* **TO H**TO**, A** *O*'** **** A*-*****'* *A** ** ****TA*A*****.
*T'* A TE*TA*E*T TO H** *E**** THAT H** *ETHO** *T*** *E*O*ATE TO*A*.
*E*O**** *E**A*E* *E****E* *OTH **T**T*O* A** A *EE* ***E**TA***** O* *ATTE***.
E*E** E*****TE* *ETTE* HA* A *TO**, *A*T*** TO *E ***A*E*E*.
*O**E*T* THAT A*-***** **T*O***E* HA*E *A** THE **O****O** *O* **T**E ****TA*A***T*.
O*TE*, *OO**** TO THE *A*T *A* *******ATE O** ****E*T *HA**E**E*.
**A**** ******AT*O* **O* H** *O** *A* *E*TA**** *E*E**T A** A******* *O*E**EA*E*.
E*E* THE*, TO **** THE **A*, *OO* *O ***THE* THA* THE ****T *ETTE* O* EA*H *E*TE**E.
*O*'T *O**ET TO E**A****ATE ** ***** **A**ET* A** **E*E** THE **A* *

With a deeper look into our ciphertext, we notice "MZR'UU". The apostrophe tells us the word is either a contraction or a possessive. Given that 'UU' follows the apostrophe and 'WILL' is commonly contracted to "'LL", we surmise 'U' has been substituted for 'L'. Further, we can make some fair guesses from words like 'EA\*H' ('EACH'), 'THA\*' ('THAN'), 'HA\*E' ('HAVE'), and 'O\*' ('OF'), which suggest 'L' maps to 'C', 'K' maps to 'N', 'S' maps to 'V', and 'A' maps to 'F'.
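Because the cipher is one-to-one, no two ciphertext letters may decode to the same plaintext letter. As the map grows, a quick sanity check can catch such clashes early. The sketch below uses a hypothetical helper, `find_conflicts`, and a deliberately inconsistent map that keeps the earlier guess of 'K' for 'A' alongside 'Q' for 'A'.

```python
from collections import defaultdict

def find_conflicts(inverse_map):
    """Return plaintext letters claimed by more than one ciphertext letter."""
    by_plain = defaultdict(list)
    for cipher_letter, plain_letter in inverse_map.items():
        by_plain[plain_letter].append(cipher_letter)
    return {p: cs for p, cs in by_plain.items() if len(cs) > 1}

# Deliberately inconsistent example: both 'K' and 'Q' decode to 'A'.
print(find_conflicts({'J': 'E', 'T': 'T', 'K': 'A', 'Q': 'A'}))
```

An empty result means the guesses made so far are still compatible with a one-to-one substitution.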

In [7]:
inverse_map = {'J': 'E',
               'T': 'T',
               'N': 'H',
               'Z': 'O',
               'Q': 'A',
               'U': 'L',
               'L': 'C',
               'K': 'N',
               'S': 'V',
               'A': 'F',
              }

print(decipher(ciphertext, inverse_map))

ANC*ENT *ETHO** OF ENC***T*ON *E*E *OTH FA*C*NAT*N* AN* EFFECT*VE.
LON* *EFO*E *O*E*N CO***TE**, **EAT **N** L**E AL-**N** *AVE* THE *A*.
*NO*LE**E A*O*T THE *ATTE*N* *N LAN**A*E* *A* H** TOOL OF CHO*CE.
*NT**CAC*E* *N THE ***T****T*ON OF LETTE** *E*E H** *LA***O*N*.
N**E*O** ENC***TE* TE*T* *E*E *EC***TE* **E TO H** *ETHO**.
*ELVE *EE* *NTO H**TO**, AN* *O*'LL F*N* AL-**N**'* *A** *N C***TANAL****.
*T'* A TE*TA*ENT TO H** *EN*** THAT H** *ETHO** *T*LL *E*ONATE TO*A*.
*ECO**N* *E**A*E* *E****E* *OTH *NT**T*ON AN* A *EEN *N*E**TAN**N* OF *ATTE*N*.
EVE** ENC***TE* LETTE* HA* A *TO**, *A*T*N* TO *E *N*AVELE*.
CONCE*T* THAT AL-**N** *NT*O**CE* HAVE LA** THE **O*N**O** FO* F*T**E C***TANAL**T*.
OFTEN, LOO**N* TO THE *A*T CAN *LL***NATE O** C***ENT CHALLEN*E*.
**A**N* *N****AT*ON F*O* H** *O** CAN CE*TA*NL* *ENEF*T AN* A*****N* CO*E**EA*E*.
EVEN THEN, TO F*N* THE FLA*, LOO* NO F**THE* THAN THE F***T LETTE* OF EACH *ENTENCE.
*ON'T FO**ET TO ENCA***LATE *N C**L* **AC*ET* AN* **E*EN* THE FLA* *

From here, with some extra work, we can find our plaintext:

In [8]:
inverse_map = {'J': 'E',
               'T': 'T',
               'N': 'H',
               'Z': 'O',
               'Q': 'A',
               'U': 'L',
               'L': 'C',
               'K': 'N',
               'S': 'V',
               'A': 'F',
               'O': 'I',
               'H': 'S',
               'W': 'D',
               'X': 'G',
               'F': 'M',
               'Y': 'B',
               'V': 'W',
               'M': 'Y',
               'R': 'U',
               'E': 'R',
               'D': 'P',
               'I': 'K',
               'G': 'X',
               'B': 'Q',
              }

print(decipher(ciphertext, inverse_map))

ANCIENT METHODS OF ENCRYPTION WERE BOTH FASCINATING AND EFFECTIVE.
LONG BEFORE MODERN COMPUTERS, GREAT MINDS LIKE AL-KINDI PAVED THE WAY.
KNOWLEDGE ABOUT THE PATTERNS IN LANGUAGES WAS HIS TOOL OF CHOICE.
INTRICACIES IN THE DISTRIBUTION OF LETTERS WERE HIS PLAYGROUND.
NUMEROUS ENCRYPTED TEXTS WERE DECRYPTED DUE TO HIS METHODS.
DELVE DEEP INTO HISTORY, AND YOU'LL FIND AL-KINDI'S MARK IN CRYPTANALYSIS.
IT'S A TESTAMENT TO HIS GENIUS THAT HIS METHODS STILL RESONATE TODAY.
DECODING MESSAGES REQUIRES BOTH INTUITION AND A KEEN UNDERSTANDING OF PATTERNS.
EVERY ENCRYPTED LETTER HAS A STORY, WAITING TO BE UNRAVELED.
CONCEPTS THAT AL-KINDI INTRODUCED HAVE LAID THE GROUNDWORK FOR FUTURE CRYPTANALYSTS.
OFTEN, LOOKING TO THE PAST CAN ILLUMINATE OUR CURRENT CHALLENGES.
DRAWING INSPIRATION FROM HIS WORK CAN CERTAINLY BENEFIT ANY ASPIRING CODEBREAKER.
EVEN THEN, TO FIND THE FLAG, LOOK NO FURTHER THAN THE FIRST LETTER OF EACH SENTENCE.
DON'T FORGET TO ENCAPSULATE IN CURLY BRACKETS AND PREPEND THE FLAG P
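The last sentences of the plaintext say the flag is spelled by the first letter of each sentence. A minimal sketch of that step follows, assuming (as is the case in this message) that every sentence sits on its own line. The helper name `acrostic` is hypothetical, and the excerpt holds only the first three sentences.

```python
def acrostic(plaintext):
    """Join the first character of every non-empty line."""
    lines = (line.strip() for line in plaintext.splitlines())
    return ''.join(line[0] for line in lines if line)

excerpt = """ANCIENT METHODS OF ENCRYPTION WERE BOTH FASCINATING AND EFFECTIVE.
LONG BEFORE MODERN COMPUTERS, GREAT MINDS LIKE AL-KINDI PAVED THE WAY.
KNOWLEDGE ABOUT THE PATTERNS IN LANGUAGES WAS HIS TOOL OF CHOICE."""

print(acrostic(excerpt))
```

Passing the full deciphered text instead of the excerpt yields the letters that go inside the curly brackets.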

Note that our inverse map is not quite complete, as letters like 'J' and 'Z' never appear in our plaintext. Even so, we have enough information to recover our flag: cygame{ALKINDIDECODED}.
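To see exactly which letters remain unresolved, we can compare the final map against the full alphabet. This is only a sketch; the variable names `unmapped_cipher` and `unseen_plain` are introduced here for illustration.

```python
import string

# Final inverse map from the last cell (ciphertext letter -> plaintext letter).
inverse_map = {
    'J': 'E', 'T': 'T', 'N': 'H', 'Z': 'O', 'Q': 'A', 'U': 'L',
    'L': 'C', 'K': 'N', 'S': 'V', 'A': 'F', 'O': 'I', 'H': 'S',
    'W': 'D', 'X': 'G', 'F': 'M', 'Y': 'B', 'V': 'W', 'M': 'Y',
    'R': 'U', 'E': 'R', 'D': 'P', 'I': 'K', 'G': 'X', 'B': 'Q',
}

letters = set(string.ascii_uppercase)
# Ciphertext letters with no entry in the map.
unmapped_cipher = sorted(letters - set(inverse_map))
# Plaintext letters that no ciphertext letter decodes to.
unseen_plain = sorted(letters - set(inverse_map.values()))
print(unmapped_cipher, unseen_plain)
```

The ciphertext letters 'C' and 'P' never occur in the message, so this text alone cannot tell us which of them stands for 'J' and which for 'Z'.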