# Murder Mystery 2

In December 1948 the body of a man was found near Adelaide, Australia. To this day his identity is unknown and his death remains a mystery. One of the few clues recovered is a small piece of paper with the words "Tamam Shud" written on it. The piece turned out to have been torn from the last page of the book "Rubaiyat" by Omar Khayyam. The police managed to find the copy of the book from which it was torn. This book had some letters written inside the cover:

**WRGOABABD**

**MLIAOI**

**WTBIMPANETP**

**MLIABOAIAQC**

**ITTMTSAMSTGAB**

The second line seems to have been crossed out. Its similarity to the penultimate line suggests that it may have been a mistake.

What does this mean? Is it some kind of code?

![Tamam Shud](https://storage.googleapis.com/big-data-course-datasets/Actual-tamam-shud.jpg)

![Code](https://storage.googleapis.com/big-data-course-datasets/SomertonManCode.jpg)

In [1]:
code=["WRGOABABD",
#"MLIAOI",
"WTBIMPANETP",
"MLIABOAIAQC",
"ITTMTSAMSTGAB"]

The goal of this exercise is to search world literature for sentences that match as long a part of the code as possible. Here a sentence matches when the first letters of its words spell out part of the code.
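One way to make "as long a part as possible" concrete is to measure the longest contiguous run of letters that a sentence's initials share with any line of the code. The following is a minimal sketch of that idea in plain Python, not part of the original notebook; the names `longest_common_substring` and `best_match` are hypothetical helpers introduced for illustration.

```python
def longest_common_substring(a, b):
    """Return the longest contiguous run of characters shared by a and b."""
    best_len, best_end = 0, 0
    # prev[j] = length of the common suffix of a[:i-1] and b[:j]
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


# The four lines of the code, without the crossed-out second line.
code_lines = ["WRGOABABD", "WTBIMPANETP", "MLIABOAIAQC", "ITTMTSAMSTGAB"]


def best_match(sentence_code):
    """Longest overlap between one sentence's initials and any code line."""
    overlaps = (longest_common_substring(sentence_code, line) for line in code_lines)
    return max(overlaps, key=len)


print(best_match("MLIASDNMAST"))
```

This runs in time proportional to the product of the two string lengths, which is fine for single sentences but would need to be distributed (for example as a Spark UDF) to score a whole corpus.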

In [2]:
from scipy import stats
import numpy as np
import pandas as pd
import string

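# `spark` is assumed to be the SparkSession provided by the notebook environment.
# wholeTextFiles yields one (path, content) pair per book.
books = spark.sparkContext.wholeTextFiles("gs://big-data-course-datasets/gutenberg/")
# Flatten line breaks, then split on "." as a rough sentence boundary.
sentences = (books
    .flatMap(lambda x: x[1].replace("\n", " ").replace("\r", " ").split("."))
    .map(lambda x: x.strip())
    .cache())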

In [3]:
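def sentenceToCode(s):
    # Take the first character of every non-empty word, upper-cased,
    # and keep only the characters that are ASCII letters A-Z.
    return "".join(
        filter(lambda x: x in string.ascii_uppercase,
               map(lambda x: x.upper()[0],
                   filter(lambda x: len(x) > 0, s.split(" ")))))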
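To see what this function produces, here is a small standalone sketch that restates the same logic with list comprehensions and applies it to two sentences. The name `sentence_to_code` is a hypothetical stand-in for `sentenceToCode`, used only so the example runs on its own without Spark.

```python
import string


def sentence_to_code(sentence):
    # Hypothetical plain-Python restatement of sentenceToCode.
    words = [w for w in sentence.split(" ") if w]        # drop empty tokens
    initials = [w.upper()[0] for w in words]             # first character of each word
    # Keep only initials that are ASCII letters A-Z.
    return "".join(c for c in initials if c in string.ascii_uppercase)


# A word starting with a digit ("1:3") contributes nothing to the code.
print(sentence_to_code("1:3 And God said, Let there be light"))  # AGSLTBL

# A word starting with a quote character is dropped as well.
print(sentence_to_code('"This little rectory CAN do no more than make Mr'))  # LRCDNMTMM
```

Note the side effect: a word that begins with punctuation or a digit is skipped entirely rather than contributing its first letter, so `"This` adds nothing to the code.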

One approach is to use Spark SQL: build a table that pairs each sentence with its code, then query it with SQL.

In [4]:
from pyspark.sql import Row

df=sentences.map(lambda x: Row(sentence=x, code=sentenceToCode(x))).toDF()

In [5]:
df.sample(False, 0.0001).toPandas()

| | code | sentence |
|---|---|---|
| 0 | LRCDNMTMM | "This little rectory CAN do no more than make Mr |
| 1 | WTIAOKODHIWNDYHPTW | Well THAT is an odd kind of delicacy! However... |
| 2 | ATCWHGUTLPOMDWAOTP | All the children who had given up their littl... |


In [6]:
# createOrReplaceTempView replaces the deprecated registerTempTable.
df.createOrReplaceTempView("sentences")

In [7]:
spark.sql("SELECT * FROM sentences LIMIT 10").toPandas()

| | code | sentence |
|---|---|---|
| 0 | KJBTOTOTKJBTFBOMCGITBGCTHATE | [The King James Bible] The Old Testament of t... |
| 1 | ATEWWFAVADWUTFOTD | 1:2 And the earth was without form, and void; ... |
| 2 | ATSOGMUTFOTW | And the Spirit of God moved upon the face of t... |
| 3 | AGSLTBLATWL | 1:3 And God said, Let there be light: and ther... |
| 4 | AGSTLTIWGAGDTLFTD | 1:4 And God saw the light, that it was good: a... |
| 5 | AGCTLDATDHCN | 1:5 And God called the light Day, and the dark... |
| 6 | ATEATMWTFD | And the evening and the morning were the first... |
| 7 | AGSLTBAFITMOTWALIDTWFTW | 1:6 And God said, Let there be a firmament in ... |
| 8 | AGMTFADTWWWUTFFTWWWATFAIWS | 1:7 And God made the firmament, and divided th... |
| 9 | AGCTFH | 1:8 And God called the firmament Heaven |


In [8]:
spark.sql("SELECT * FROM sentences WHERE code LIKE 'MLIA%'").toPandas()

| | code | sentence |
|---|---|---|
| 0 | MLIASDNMAST | Miss Laura, I am sure, did not mean any such ... |
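The query above checks a single prefix of one line. A possible generalization, sketched here under assumptions, is to generate every contiguous fragment of a chosen length from all code lines and combine them into one `LIKE` query. The helpers `fragments` and `build_query` are hypothetical names introduced for this example; the block only builds the SQL string and does not touch Spark.

```python
# The four lines of the code, without the crossed-out second line.
code_lines = ["WRGOABABD", "WTBIMPANETP", "MLIABOAIAQC", "ITTMTSAMSTGAB"]


def fragments(lines, length):
    """All distinct contiguous fragments of the given length, in first-seen order."""
    seen = []
    for line in lines:
        for start in range(len(line) - length + 1):
            frag = line[start:start + length]
            if frag not in seen:
                seen.append(frag)
    return seen


def build_query(lines, length):
    """Build a query matching any sentence whose code contains a fragment."""
    frags = fragments(lines, length)
    if not frags:
        raise ValueError("no code line is long enough for this fragment length")
    # Fragments contain only the letters A-Z, so no SQL escaping is needed here.
    clauses = " OR ".join("code LIKE '%{}%'".format(f) for f in frags)
    return "SELECT * FROM sentences WHERE " + clauses


query = build_query(code_lines, 6)
print(query[:70] + " ...")
```

In the notebook, the resulting string could be passed to `spark.sql(query).toPandas()`, assuming the `sentences` table registered earlier. Starting with a large fragment length and decreasing it until results appear keeps the result set small.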
