# HV19.06 bacon and eggs

We get a text about Francis Bacon that looks kind of weird: some letters are italic. Let's feed it to [Beautiful Soup](https://www.crummy.com/software/BeautifulSoup/) and take a look!

```python
import bs4

with open('francis.html') as f:
    soup = bs4.BeautifulSoup(f.read(), 'html.parser')
```

```python
print(''.join(c.text for c in soup.p.contents if c.name == 'em'))  # all italic letters
```

```
FnnwsangiospnseahrveatneLdChcelEnldsweredeithelongteeitmnedflenalroutcienrouoBcoasbelthfefiricsHiworgessiliycncknoedgadypncirsgaareflbsvfeesnaestiorttyhergueiculdbahiespicalndtilaroaerssmoiieadgveAoughpcadaoutschamhtaioitaoglaticthgneioftonsbitfci
```


```python
print(''.join(c for c in soup.p.contents if c.name is None))  # all other text
```

```
racis Baco a n Elish phloher ad tatsmn wo sed s Atorey Genral and as or anlor of gan. Hi orks ar citd w dvepi h scintfic mehod and reai inuti thgh he stific evltin.
an h en caled e athr o empim. s ks arued for th pobit of sietifi wle bse onl uon idutve eaonin nd cu oeration o vnt in tur. Mo mpanl,  ad scene co e ceved by us of a cet a mehodca ppch wheby cientist ai t avod mslin themsels. lthh is raticl ies ab u  etod, he Bconan methd, dd no have  ln-sing nfluene, e eral dea  he imprtace and posiily o a septcal methodology makes Bacon the father of the scientific method. This method was a new rhetorical and theoretical framework for science, the practical details of which are still central in debates about science and methodology.
```


Nothing interesting so far. A bit of research reveals [Bacon's cipher](https://en.wikipedia.org/wiki/Bacon's_cipher):
                                                       
> *Bacon's cipher or the Baconian cipher is a method of message encoding devised by Francis Bacon in 1605. A message is concealed in the presentation of text, rather than its content.*
                                                       
Oh, that sounds exactly like what we're looking at! Let's convert the text into A's and B's: every character outside an `<em>` tag becomes an A, and every character inside one becomes a B.

```python
def convert(tags):
    for t in tags:
        if t.name is None:  # text outside tags
            s = str(t)
            for _ in s:
                yield 'A'
        elif t.name == 'em':  # text inside <em> tags
            for _ in t.text:
                yield 'B'
        else:  # should never happen, but always good to be sure
            raise Exception(t.name)
```

```python
data = list(convert(soup.p.contents))
print(''.join(data))
```

```
BAABAAAAAAAABABABABAAABBAAAAAAABABBABAAAAABAABAAABAABAAABAAAABBBAABAAAABAABAAAAAABAAAAAAAAAAABAABABBAABBABAAAAAABBABAABAAAABABAAAAAAABAABBBAABAAABBBAABAABBAABBABABAAAABAABAAAAAABAAAAAAAAAABAABBBAAABBABBAABBAAABBBAAABAAAABBBBAAAAAABAABABAABAAABABBAAABBABABAAAABAAAABBAABAAABAAABAAAABBBBABAAABBAABBBAAAAABAAAAAAAAAABAAABBBAABBABAAAAABAABAAAABABBBAABBBAAABAABAAAABAABAAAABAABABAAABAABAAAABABAAAABBBBABAABBAABAAAAAAABABABAABAAAABBAAABAAAABBABAABBBAABABAABBAABBBBAAAABAABAAAABBBABAABABBAAAAAAAAAABAAAAAABAABABBBBAABBAAABAAABAABABAABBBAAAAABBAAAABAAAAAAAABAAABAABAAAABAAABAABBBAABAAAAAAAABBAAABAAABBBAABAAABAABAAABAAABABAAAABBBABABBABABAABAAAABAAAABAAABAAAAAAABAAAABAAAABAAAAAABAABABABBABAAAABAAAAAABAAABBAABABBAAAABAAAABBABAAAAAABAAABAAAAAAAAAABABAABBAAABAAAABAAABAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
```


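Now let's decode that into plain text. During the challenge I actually used an [online tool](http://rumkin.com/tools/cipher/baconian.php), but it's easy enough to do in Python. Each letter is encoded as a group of five symbols: reading A as 0 and B as 1, a group is the letter's position in the alphabet written in binary (A = `AAAAA` = 0, B = `AAAAB` = 1, and so on).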

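```python
import string

# Letter index as a five-digit binary string: '00000' -> 'A', '00001' -> 'B', ...
bacon_mapping = {f'{i:05b}': c for i, c in enumerate(string.ascii_uppercase)}
print(bacon_mapping)

# Swap the binary digits for Bacon's A/B symbols.
bacon_mapping = {k.replace('0', 'A').replace('1', 'B'): v for k, v in bacon_mapping.items()}
print(bacon_mapping)
```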

```
{'00000': 'A', '00001': 'B', '00010': 'C', '00011': 'D', '00100': 'E', '00101': 'F', '00110': 'G', '00111': 'H', '01000': 'I', '01001': 'J', '01010': 'K', '01011': 'L', '01100': 'M', '01101': 'N', '01110': 'O', '01111': 'P', '10000': 'Q', '10001': 'R', '10010': 'S', '10011': 'T', '10100': 'U', '10101': 'V', '10110': 'W', '10111': 'X', '11000': 'Y', '11001': 'Z'}
{'AAAAA': 'A', 'AAAAB': 'B', 'AAABA': 'C', 'AAABB': 'D', 'AABAA': 'E', 'AABAB': 'F', 'AABBA': 'G', 'AABBB': 'H', 'ABAAA': 'I', 'ABAAB': 'J', 'ABABA': 'K', 'ABABB': 'L', 'ABBAA': 'M', 'ABBAB': 'N', 'ABBBA': 'O', 'ABBBB': 'P', 'BAAAA': 'Q', 'BAAAB': 'R', 'BAABA': 'S', 'BAABB': 'T', 'BABAA': 'U', 'BABAB': 'V', 'BABBA': 'W', 'BABBB': 'X', 'BBAAA': 'Y', 'BBAAB': 'Z'}
```
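This table is the 26-letter variant, in which every letter gets its own code. Bacon's original alphabet, as described on the Wikipedia page, has only 24 letters because I/J and U/V share a code. The sketch below computes groups arithmetically for either variant instead of looking them up. It is a minimal sketch: `ALPHABET_24`, `decode_group` and `encode` are my own hypothetical names, not part of any library.

```python
import string

# 26-letter variant, as used by bacon_mapping above: every letter has its own code.
ALPHABET_26 = string.ascii_uppercase
# Bacon's original 24-letter variant: I/J and U/V share a code.
ALPHABET_24 = ALPHABET_26.replace('J', '').replace('V', '')


def decode_group(group, alphabet=ALPHABET_26):
    """Decode one five-symbol A/B group; return '?' if it has no letter."""
    if len(group) != 5 or set(group) - {'A', 'B'}:
        return '?'
    index = int(group.replace('A', '0').replace('B', '1'), 2)
    return alphabet[index] if index < len(alphabet) else '?'


def encode(text, alphabet=ALPHABET_26):
    """Encode the letters of text as A/B groups; non-letters are skipped."""
    groups = []
    for ch in text.upper():
        if ch not in ALPHABET_26:
            continue  # spaces, digits and punctuation carry no symbol
        if ch not in alphabet:  # only J and V, and only in the 24-letter table
            ch = {'J': 'I', 'V': 'U'}[ch]
        groups.append(f'{alphabet.index(ch):05b}'.replace('0', 'A').replace('1', 'B'))
    return ''.join(groups)


print(decode_group('BAABA'))               # S
print(decode_group('BAABA', ALPHABET_24))  # T
print(encode('Santa'))                     # BAABAAAAAAABBABBAABBAAAAA
```

With the 24-letter table, the first group `BAABA` would decode to T instead of S. The readable `SANTA...` further down therefore confirms that the 26-letter table is the right one here.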


```python
def chunk(elems, n):
    """Yield consecutive slices of length n (the last one may be shorter)."""
    for i in range(0, len(elems), n):
        yield elems[i:i + n]

chunks = [''.join(e) for e in chunk(data, 5)]
print(chunks)
```

```
['BAABA', 'AAAAA', 'AABAB', 'ABABA', 'AABBA', 'AAAAA', 'ABABB', 'ABAAA', 'AABAA', 'BAAAB', 'AABAA', 'ABAAA', 'ABBBA', 'ABAAA', 'ABAAB', 'AAAAA', 'ABAAA', 'AAAAA', 'AAABA', 'ABABB', 'AABBA', 'BAAAA', 'AABBA', 'BAABA', 'AAABA', 'BAAAA', 'AAABA', 'ABBBA', 'ABAAA', 'BBBAA', 'BAABB', 'AABBA', 'BABAA', 'AABAA', 'BAAAA', 'AABAA', 'AAAAA', 'AAABA', 'ABBBA', 'AABBA', 'BBAAB', 'BAAAB', 'BBAAA', 'BAAAA', 'BBBBA', 'AAAAA', 'BAABA', 'BAABA', 'AABAB', 'BAAAB', 'BABAB', 'AAAAB', 'AAAAB', 'BAABA', 'AABAA', 'ABAAA', 'ABBBB', 'ABAAA', 'BBAAB', 'BBAAA', 'AABAA', 'AAAAA', 'AAABA', 'AABBB', 'AABBA', 'BAAAA', 'ABAAB', 'AAAAB', 'ABBBA', 'ABBBA', 'AABAA', 'BAAAA', 'BAABA', 'AAABA', 'ABABA', 'AABAA', 'BAAAA', 'BABAA', 'AABBB', 'BABAA', 'BBAAB', 'AAAAA', 'AABAB', 'ABAAB', 'AAAAB', 'BAAAB', 'AAAAB', 'BABAA', 'BBBAA', 'BABAA', 'BBAAB', 'BBBAA', 'AABAA', 'BAAAA', 'BBBAB', 'AABAB', 'BAAAA', 'AAAAA', 'ABAAA', 'AAABA', 'ABABB', 'BBAAB', 'BAAAB', 'AAABA', 'ABABA', 'ABBBA', 'AAAAB', 'BAAAA', 'BAAAA', 'AAAAB', 'AAABA', 
```
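Whenever the total number of symbols isn't a multiple of five, `chunk` yields a final group with fewer than five symbols. A group like that can't match any key in `bacon_mapping`, so it shows up as the trailing `?` in the decode attempts. The variant below is a minimal sketch that keeps only complete groups and returns the leftover so it can be inspected. `chunk_exact` is a hypothetical name used only for this example.

```python
def chunk_exact(symbols, n=5):
    """Split symbols into complete groups of n and return any incomplete leftover."""
    usable = len(symbols) - len(symbols) % n  # largest multiple of n that fits
    groups = [''.join(symbols[i:i + n]) for i in range(0, usable, n)]
    leftover = ''.join(symbols[usable:])
    return groups, leftover


groups, leftover = chunk_exact(list('AABBBABAAAAB'))
print(groups, repr(leftover))  # ['AABBB', 'ABAAA'] 'AB'
```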

```python
# Groups that aren't in the table are shown as '?'.
print(''.join(bacon_mapping.get(group, '?') for group in chunks))
```

```
SAFKGALIEREIOIJAIACLGQGSCQCOI?TGUEQEACOGZRYQ?ASSFRVBBSEIPIZYEACHGQJBOOEQSCKEQUHUZAFJBRBU?UZ?EQ?FQAICLZRCKOBQQBCIITSAGEOISEKDVVEEEIBBBAJLIICGLBBUBCABJRBCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA?
```


Hmm, that looks bad. There are even some unknown combinations (`?`) in there... Our input contains spaces, and those probably shouldn't count! Let's try again:

```python
def clean(s):
    return s.replace(' ', '')


def convert(tags):
    for t in tags:
        if t.name is None:  # text outside tags
            s = clean(str(t))  # NEW: cleaning the string here.
            for _ in s:
                yield 'A'
        elif t.name == 'em':  # text inside <em> tags
            for _ in clean(t.text):  # NEW: cleaning the string here.
                yield 'B'
        else:  # should never happen, but always good to be sure
            raise Exception(t.name)


data = list(convert(soup.p.contents))
chunks = [''.join(e) for e in chunk(data, 5)]
print(''.join(bacon_mapping.get(group, '?') for group in chunks))
```

```
SANTALIKESHISBACONBURQFZHJTUJAQBHGZTSHQJJCZ?ENCI?TOCAE??EQ?OJCRFEQY?WIDJGEOOK?YSHVQBCL?EJTQYQBFCJZAGI?ERFD?ZCEICEIKWRAJVRDQIQBJSIQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA?
```
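The output above reads fine for a while and then turns into noise. Every Bacon group is exactly five symbols wide, so a single character that gets counted when it shouldn't shifts the grouping of every symbol after it. The small self-contained demo below uses a hypothetical two-letter message, `HI`. `to_groups` and `decode_stream` are made-up helper names.

```python
import string


def to_groups(symbols):
    """Split an A/B string into complete five-symbol groups (a short tail is dropped)."""
    return [symbols[i:i + 5] for i in range(0, len(symbols) - 4, 5)]


def decode_stream(symbols):
    """Decode an A/B string with the 26-letter table; unknown groups become '?'."""
    out = []
    for group in to_groups(symbols):
        index = int(group.replace('A', '0').replace('B', '1'), 2)
        out.append(string.ascii_uppercase[index] if index < 26 else '?')
    return ''.join(out)


message = 'AABBBABAAA'                # 'HI' in the 26-letter table
print(decode_stream(message))         # HI
print(decode_stream('A' + message))   # DU: one extra symbol corrupts every later group
```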


Finally something readable: *Santa likes his bacon*! We're on the right track, but something still throws us off. Oh, right, sentences have punctuation marks. Let's remove `.` and `,` too.

```python
def clean(s):
    for c in [' ', ',', '.']:
        s = s.replace(c, '')
    return s

data = list(convert(soup.p.contents))  # convert() looks up clean() at call time, so it uses this version
chunks = [''.join(e) for e in chunk(data, 5)]
print(''.join(bacon_mapping.get(group, '?') for group in chunks))
```

```
SANTALIKESHISBACONBUTALSOTHISBACONTHEPASSLHIRUJD??QQBHGREHTSIUJJEGHVSA?JRHHF?YSHVQBCL?EJTQYQBFCJZAGR?JCKHXSIRAJCCVUIC?YRYEIAUZEIAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA?
```


Arrrgh. We get *Santa likes his bacon but also this bacon the pass*, and then something throws us off again... Let's just filter out everything except letters, then.

```python
import re

def clean(s):
    return re.sub(r'[^a-zA-Z]', '', s)  # keep ASCII letters only


data = list(convert(soup.p.contents))
chunks = [''.join(e) for e in chunk(data, 5)]
print(''.join(bacon_mapping.get(group, '?') for group in chunks))
```

```
SANTALIKESHISBACONBUTALSOTHISBACONTHEPASSWORDISHVXBACONCIPHERISSIMPLEBUTCOOLXREPLACEXWITHBRACKETSANDUSEUPPERCASEFORALLCHARACTERAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
```


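There we go! *Santa likes his bacon but also this bacon, the password is "HVXBACONCIPHERISSIMPLEBUTCOOLX", replace "X" with brackets and use uppercase for all character.* The long run of A's at the end comes from the last part of the text, which contains no italic letters at all, and `AAAAA` decodes to A.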

That suggests the flag is `HV{BACONCIPHERISSIMPLEBUTCOOL}`. However, that's wrong, because the flag has to start with `HV19{`. With that fixed, `HV19{BACONCIPHERISSIMPLEBUTCOOL}` is the correct flag.
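The whole pipeline can also be written without Beautiful Soup, using the standard library's `html.parser.HTMLParser` instead. This is a swapped-in approach, not what I ran during the challenge. `BaconExtractor`, `bacon_decode` and `password_to_flag` are hypothetical names. Unlike the code above, this version looks at all text in the document rather than only at `soup.p`. The sample HTML is made up and hides `HI`.

```python
import re
import string
from html.parser import HTMLParser


class BaconExtractor(HTMLParser):
    """Map letters to 'A' (plain) or 'B' (inside <em>); everything else is ignored."""

    def __init__(self):
        super().__init__()
        self.em_depth = 0  # > 0 while inside an <em> element
        self.symbols = []

    def handle_starttag(self, tag, attrs):
        if tag == 'em':
            self.em_depth += 1

    def handle_endtag(self, tag):
        if tag == 'em' and self.em_depth > 0:
            self.em_depth -= 1

    def handle_data(self, data):
        mark = 'B' if self.em_depth else 'A'
        # Same rule as the final clean(): only ASCII letters carry a symbol.
        self.symbols.extend(mark for _ in re.sub(r'[^a-zA-Z]', '', data))


def bacon_decode(html):
    """Extract the A/B stream from html and decode it with the 26-letter table."""
    parser = BaconExtractor()
    parser.feed(html)
    parser.close()
    symbols = ''.join(parser.symbols)
    usable = len(symbols) - len(symbols) % 5  # ignore an incomplete final group
    letters = []
    for i in range(0, usable, 5):
        index = int(symbols[i:i + 5].translate(str.maketrans('AB', '01')), 2)
        letters.append(string.ascii_uppercase[index] if index < 26 else '?')
    return ''.join(letters)


def password_to_flag(password):
    """Turn 'HVX...X' into 'HV19{...}', following the decoded instructions."""
    if not (password.startswith('HVX') and password.endswith('X')):
        raise ValueError(f'unexpected password format: {password}')
    return 'HV19{' + password[3:-1] + '}'


print(bacon_decode('<p>Fr<em>anc</em>i<em>s</em> Bac.</p>'))  # HI
print(password_to_flag('HVXBACONCIPHERISSIMPLEBUTCOOLX'))       # HV19{BACONCIPHERISSIMPLEBUTCOOL}
```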