# The Shakespeare Ciphers in Python

## Introduction

<img src="https://upload.wikimedia.org/wikipedia/commons/c/c4/William_Shakespeare_-_First_Folio_1623.jpg" style="float: right; width:40%" />
Whether the original folios of Shakespeare's published works contain some form of cipher that hints at an alternative theory of the origin of his writings, or at other kinds of 'secret' information, is a subject of long-standing debate.

Before we continue, it is important to acknowledge that the current scientific consensus neither questions the authorship of William Shakespeare nor accepts that there is evidence for ciphers in Shakespeare's work.

So, before we start, here are some pointers to the main critics of the idea that ciphers exist within the works of William Shakespeare:

- William Friedman and Elizebeth Friedman, _The Shakespearean Ciphers Examined_, Cambridge University Press, 1957

- William H. Sherman, _[How to make anything signify anything](http://www.cabinetmagazine.org/issues/40/sherman.php): William F. Friedman and the birth of modern cryptanalysis_, 2010-2011

The famous cryptographers Elizebeth and William Friedman researched the subject of ciphers within Shakespeare's work, and the result of their study was negative.

The human mind is very susceptible to recognizing patterns where there is no underlying cause (e.g. castles in the clouds). Critical analysis of any assumed 'pattern', as done in Sherman's work, is therefore of utmost importance.

So, with the armour of critical analysis and a good dose of Python, let's have a look at the data!

## Step One: Acquiring the data

For our analysis, we will make use of several different data sources: the original works of Shakespeare as text and as scans, dictionaries, and images. We will collect the data from freely available internet sources. We will require:

- the text of all works of William Shakespeare (from Project Gutenberg),
- the scans of the original first edition of Shakespeare's work (the 'Folio') from archive.org,
- dictionary data of valid English words,
- images of tombstones and title pages, mostly from Wikipedia.

### Downloading and filtering the text, and building word-lists

First, we will download the text of the "Complete Works" from Project Gutenberg and filter it (removing the non-Shakespearean blurbs); the result is stored in `data/complete_shakespeare.txt`.

Then, we will download a list of UK English words and store it in `data/ukenglish.txt`.

In [58]:
import os
import urllib.request
from io import BytesIO
from zipfile import ZipFile


def download_text(url, destination_directory, filename):
    """Download text from URL, if not already present in destination_directory."""
    # Create the target directory if needed (no error if it already exists)
    os.makedirs(destination_directory, exist_ok=True)
    filepath = os.path.join(destination_directory, filename)
    if not os.path.exists(filepath):
        try:
            urllib.request.urlretrieve(url, filepath)
        except Exception as e:
            print(f"Failed to download {url} to {filepath}: {e}")
            return None
        print(f"Downloaded {url} to {filepath}")
    else:
        print(f"Reading from cached file {filepath}")
    # The Gutenberg plain-text file is UTF-8; do not rely on the platform default
    with open(filepath, encoding="utf-8") as f:
        text = f.read()
    return text

url = "https://www.gutenberg.org/cache/epub/100/pg100.txt"
text = download_text(url, "data", "gutenberg_complete_shakespeare_raw.txt")

Reading from cached file data/gutenberg_complete_shakespeare_raw.txt


In [45]:
def filter_text(text, destination_directory, filename):
    """Filter out the Project Gutenberg header and footer."""
    filepath = os.path.join(destination_directory, filename)
    if os.path.exists(filepath):
        print(f"Reading from cached file {filepath}")
        with open(filepath, encoding="utf-8") as f:
            return f.read()
    # Markers that delimit the actual works inside the Gutenberg file
    start_tok = "WILLIAM SHAKESPEARE ***"
    end_tok = "*** END OF THE PROJECT GUTENBERG EBOOK"
    start = text.find(start_tok)
    if start == -1:
        print("Unexpected text format")
        return None
    # Skip the start marker itself and any newlines that follow it
    start += len(start_tok)
    while text[start] == '\n':
        start += 1

    end = text.find(end_tok)
    if end == -1:
        print("Unexpected text format")
        return None
    # Drop trailing newlines before the end marker
    while text[end-1] == '\n':
        end -= 1
    filtered_text = text[start:end]
    # Save filtered_text, using the same encoding as for reading the cache
    with open(filepath, "w", encoding="utf-8") as f:
        f.write(filtered_text)
    return filtered_text

In [80]:
def get_words(corpus):
    """Get a list of unique words from a text corpus."""
    punctuation = ['.', ',', '!', '?', ';', ':', '(', ')', '[', ']', '{', '}', '"', "'"]
    for p in punctuation:
        corpus = corpus.replace(p, '')
    words = list(set(corpus.split()))
    filtered_words = []
    for word in words:
        if word.isalpha():
            filtered_words.append(word.lower())
    filtered_words = list(set(filtered_words))
    return filtered_words
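
As a quick sanity check of the tokenisation idea, the sketch below applies the same steps (strip punctuation, split, keep alphabetic tokens, lowercase, de-duplicate) to a short sample. It is a compact restatement, not the notebook's function: the name `unique_words` and the sample sentence are illustrative assumptions. Note one side effect of removing apostrophes: contractions such as "O'er" are fused into "oer".

```python
def unique_words(corpus):
    """Return the sorted unique lowercase alphabetic words of a text."""
    # Delete the same punctuation characters as get_words, in one pass.
    table = str.maketrans('', '', '.,!?;:()[]{}"\'')
    tokens = corpus.translate(table).split()
    # Keep purely alphabetic tokens only (this drops e.g. numbers).
    return sorted({t.lower() for t in tokens if t.isalpha()})


# Hypothetical sample text, chosen to show the apostrophe effect.
sample = "To be, or not to be: that is the question. O'er 2 hills!"
print(unique_words(sample))
```

Sorting the result is a choice made here for reproducible output; a plain `set` has no stable order, which is why the word lists printed in this notebook appear shuffled.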

In [113]:
complete_shakespeare_text = filter_text(text, "data", "complete_shakespeare.txt")
shakespearean_words = get_words(complete_shakespeare_text)
print(len(shakespearean_words), shakespearean_words[:10])

Reading from cached file data/complete_shakespeare.txt
23364 ['ambiguous', 'incarnate', 'charactered', 'wave', 'impertinency', 'amities', 'dexterity', 'marted', 'intended', 'redeeming']


In [76]:
def get_uk_english_word_list(directory):
    filename = "ukenglish.txt"
    filepath = os.path.join(directory, filename)
    if os.path.exists(filepath):
        print(f"Reading from cached file {filepath}")
        with open(filepath, 'rb') as f:
            words = f.read().decode('cp850').split('\n')
    else:
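        # Download the zipped word list (the archive contains ukenglish.txt)
        url = "http://www.gwicks.net/textlists/ukenglish.zip"
        try:
            # Open the connection inside the try block so network errors are caught too
            with urllib.request.urlopen(url) as response:
                zipcontent = response.read()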
        except Exception as e:
            print(f"Failed to download {url}: {e}")
            return None
        try:
            z = ZipFile(BytesIO(zipcontent))
            z.extractall(directory)
        except Exception as e:
            print(f"Failed to extract {url}: {e}")
            return None
        # read words:
        with open(f"{directory}/ukenglish.txt", 'rb') as f:
            data = f.read().decode('cp850')
        words = data.split('\n')

    # Filter words: keep only all-lowercase, purely alphabetic entries
    filtered_words = []
    for word in words:
        if word != word.lower() or not word.isalpha():
            continue
        filtered_words.append(word)
    return filtered_words

In [77]:
uk_word_list = get_uk_english_word_list("data")

Reading from cached file data/ukenglish.txt


In [85]:
print(len(uk_word_list), uk_word_list[:10])

82093 ['a', 'aa', 'aaa', 'aachen', 'aardvark', 'aardvarks', 'aaron', 'ab', 'aba', 'ababa']


In [86]:
all_words = list(set(shakespearean_words + uk_word_list))
print(len(all_words), all_words[:10])

87822 ['lanarkshire', 'mbytes', 'subbasements', 'incarnate', 'charactered', 'pontefract', 'vending', 'liras', 'outfielders', 'pasteurises']


### Anagrams

We'll use the word-lists to look up all anagrams of a given word. Words are grouped by their sorted letters, so all anagrams of a word share the same key.

In [87]:
def get_anagrams(words):
    """Group words by their sorted lowercase letters."""
    anagram_dict = {}
    for word in words:
        # Lowercase before sorting, so mixed-case input yields the same key
        word_sorted = ''.join(sorted(word.lower()))
        if word_sorted in anagram_dict:
            anagram_dict[word_sorted].append(word)
        else:
            anagram_dict[word_sorted] = [word]
    return anagram_dict

def get_anagrams_from_word(anagram_dict, word):
    """Return all known anagrams of word, or an empty list."""
    word_sorted = ''.join(sorted(word.lower()))
    return anagram_dict.get(word_sorted, [])

In [89]:
anagram_dict = get_anagrams(all_words)
len(anagram_dict)

80514

In [90]:
get_anagrams_from_word(anagram_dict, "listen")

['listen', 'silent', 'enlist', 'inlets', 'tinsel']
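
The lookup works because two words are anagrams of each other exactly when their sorted letters are identical. The following minimal sketch shows that idea on a toy word list; `build_anagram_index` and `toy_words` are names made up for this illustration, and `collections.defaultdict` is used only to shorten the grouping step.

```python
from collections import defaultdict


def build_anagram_index(words):
    """Map each sorted-letter signature to the words that share it."""
    index = defaultdict(list)
    for word in words:
        signature = ''.join(sorted(word.lower()))
        index[signature].append(word)
    return index


# Hypothetical toy list; the notebook uses the combined word lists instead.
toy_words = ["listen", "silent", "enlist", "stone", "notes", "onset", "bard"]
index = build_anagram_index(toy_words)

print(index["eilnst"])                     # words sharing the letters of "listen"
print(''.join(sorted("IPSMTO".lower())))   # signature of the tombstone letters
```

Building the index costs one sort per word; afterwards each query is a single dictionary lookup, regardless of how many words the list contains.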

## Shakespeare's secrets

### Chapter 1: The tombstone cipher

#### Reading list

- [Shakespeare's funerary monument (Wikipedia)](https://en.wikipedia.org/wiki/Shakespeare%27s_funerary_monument)
- Alan William Green: _Dee-Coding Shakespeare_, 2011, Act I, Puzzle 1

![shakespeare tombstone](https://upload.wikimedia.org/wikipedia/commons/6/6f/Shakespeare_monument_plaque.JPG)
_Shakespeare's tombstone in Stratford-upon-Avon, from [Wikipedia](https://en.wikipedia.org/wiki/Shakespeare%27s_funerary_monument)_

Some words on Shakespeare's tombstone, within the first two lines of the epitaph, begin with capital letters, following no obvious rule. This, and the fact that the Latin text is written in a very unusual way, has led to the assumption that there is a hidden message in the text. The first two lines of the epitaph read:

> **I**VDICIO **P**YLIVM, GENIO **S**OCRATEM, ARTE **M**ARONEM, **T**ERRA TEGIT, POPVLVS MÆRET, **O**LYMPVS HABET.

The capitalized letters (here in bold) are "I P S M T O".
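
Rather than typing the letters by hand, they can be pulled out of the transcription itself. The sketch below assumes the epitaph is available as a string in which the highlighted letters are wrapped in `**`, as in the quote above; the helper name `extract_bold_letters` is invented for this example.

```python
import re


def extract_bold_letters(markup):
    """Concatenate all single characters wrapped in **...** markers."""
    return ''.join(re.findall(r"\*\*(\w)\*\*", markup))


# Transcription of the first two lines, with highlighted letters marked by **.
epitaph_markup = (
    "**I**VDICIO **P**YLIVM, GENIO **S**OCRATEM, ARTE **M**ARONEM, "
    "**T**ERRA TEGIT, POPVLVS MÆRET, **O**LYMPVS HABET."
)

print(extract_bold_letters(epitaph_markup))  # IPSMTO
```

This only moves the judgment call into the transcription: which letters count as highlighted is still decided by whoever marks up the text.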

A clear case for the anagram searcher! Let's see what we can find:

In [131]:
encoded_word = "IPSMTO"
secret = get_anagrams_from_word(anagram_dict, encoded_word)
print(secret)

['impost']
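
In the spirit of the critical analysis called for in the introduction, it is worth asking how often six arbitrary letters would produce a dictionary word at all. The sketch below estimates that by random sampling. Everything in it is an assumption made for illustration: the function name `anagram_hit_rate`, the choice of drawing letters without replacement from a pool, and the toy keys used in the demonstration. With the notebook's data one could pass `set(anagram_dict)` as `keys` and a letter pool of one's choice, but which null model is appropriate is itself a judgment call.

```python
import random


def anagram_hit_rate(keys, pool, length, trials=10_000, seed=0):
    """Estimate how often `length` random letters from `pool` form a known anagram.

    keys  -- set of sorted-letter signatures, e.g. set(anagram_dict)
    pool  -- string of candidate letters to draw from (without replacement)
    """
    rng = random.Random(seed)      # fixed seed for reproducible estimates
    letters = list(pool.lower())
    hits = 0
    for _ in range(trials):
        draw = rng.sample(letters, length)
        if ''.join(sorted(draw)) in keys:
            hits += 1
    return hits / trials


# Toy demonstration with a single known signature (that of "impost").
toy_keys = {"imopst"}
print(anagram_hit_rate(toy_keys, "ipsmto", 6, trials=100))
print(anagram_hit_rate(toy_keys, "abcdefghijklmnopqrstuvwxyz", 6, trials=100))
```

A high hit rate under a plausible null model would suggest that finding some word among the letters is unremarkable; a low one would not by itself show that a message was intended.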


In [129]:
freemasonry_dict = "https://archive.org/stream/An_Encyclopedia_Of_Freemasonry_1916_Vol_1_-_A_G_Mackey/An_Encyclopedia_Of_Freemasonry_1916_Vol_1_-_A_G_Mackey_djvu.txt"
freemasonry_text = download_text(freemasonry_dict, "data", "freemasonry.txt")


Downloaded https://archive.org/stream/An_Encyclopedia_Of_Freemasonry_1916_Vol_1_-_A_G_Mackey/An_Encyclopedia_Of_Freemasonry_1916_Vol_1_-_A_G_Mackey_djvu.txt to data/freemasonry.txt


In [132]:
def find_quotes(text, keyword):
    indices = []
    i = text.find(keyword)
    while i!=-1:
        indices.append(i)
        i = text.find(keyword, i+1)
    return indices

def print_quotes(text, indices, pre=100, post=100):
    for i in indices:
        # Clamp the start at 0: a negative index would slice from the end of the text
        print(text[max(0, i - pre):i + post])
        print("------")
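
Note that `find_quotes` matches substrings, so a search for "impost" also reports "impostor" and "imposter". If only the standalone word is wanted, a word-boundary regular expression is one option. The following is a minimal sketch, not part of the original analysis; the helper name `find_whole_word` and the sample sentence are assumptions.

```python
import re


def find_whole_word(text, keyword):
    """Return start indices where `keyword` occurs as a whole word, ignoring case."""
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
    return [m.start() for m in pattern.finditer(text)]


# Hypothetical sample: two whole-word matches and one longer word.
sample = "An impost is a tax. Johnson was an impostor; the Impost was levied."
print(find_whole_word(sample, "impost"))
```

The returned indices have the same meaning as those of `find_quotes`, so they can be passed to `print_quotes` unchanged.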

In [135]:
idx = find_quotes(freemasonry_text, "impost")
print_quotes(freemasonry_text, idx, 150, 150)

ishing 
this Rite and its system of Templar Ma- 
sonry. But he was denounced and expelled 
by the Baron de Hund, who, having proved 
Johnson to be an impostor and charlatan, 
was himself proclaimed Grand Master or 
the German Masons by the congress. See 
Johnson and Hund; also Strict Observance, 
Bi
------
ed to Germany, but 
is not heard of after 1815, when he pub- 
lished at Pyrmont a justification of niin- 
self. Findel (Hist., p. 560,) calls him an 
imposter, but I know not why. He was 
ratner a Masonic fanatic, who was ignorant 
of or had forgotten the wide difference that 
there is between Freem
------
e 
credulity made them his victims. The his- 
tory of Masonry in that century would not 
be complete without a reference to this 

Sri nee of Masonic impostors. To write the 
istory of Masonry in the eighteenth cen- 
tury and to leave out Cagliostro, would be 
like enacting the play of Hamlet and le
------
g up outlines of copperplate engravings 
with India ink, which he sold for p