# Cracking the Enigma Using Apache Spark

The Enigma machine was a configurable device used by the Germans during World War II to encrypt confidential
information. Although Nazi Germany placed great faith in the capabilities of the machine, it suffered from a few design
flaws that the Allies managed to exploit, enabling them to read secret information. The story of deciphering the Enigma
has been depicted in the film "The Imitation Game."

In this post, I will demonstrate how I cracked the Enigma machine using Apache Spark. This framework is widely used in
big data analysis because it distributes both data and computation across a cluster. This way, one can run a large
number of calculations on different machines at the same time, which can dramatically reduce the overall execution
time. Cracking the Enigma involves several number-crunching steps that are independent of each other and can therefore
be executed in parallel. This makes Spark an ideal tool to speed up the process.
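
To make the parallelism concrete, here is a minimal, stdlib-only sketch of how the search over rotor settings could be cut into independent tasks. It assumes the five standard Army rotors (I to V), three of which sit in the machine in some order, and it leaves out ring settings and the plugboard. The helper names (`rotor_orders`, `starting_positions`, `split_into_tasks`) are hypothetical and do not come from the author's package, which may partition the work differently.

```python
from itertools import permutations, product
from string import ascii_uppercase
from typing import Iterator, List

# Assumption: the five standard Army rotors, three of which are placed in the machine.
ROTORS = ["I", "II", "III", "IV", "V"]


def rotor_orders() -> List[str]:
    """All ordered choices of 3 out of 5 rotors, formatted like "II,III,I"."""
    return [",".join(order) for order in permutations(ROTORS, 3)]


def starting_positions() -> Iterator[str]:
    """All 26**3 starting positions of the three rotors, from "AAA" to "ZZZ"."""
    for letters in product(ascii_uppercase, repeat=3):
        yield "".join(letters)


def split_into_tasks(orders: List[str], n_tasks: int) -> List[List[str]]:
    """Distribute rotor orders round-robin over n_tasks independent work units."""
    return [orders[i::n_tasks] for i in range(n_tasks)]


orders = rotor_orders()
tasks = split_into_tasks(orders, 8)
print(len(orders))                            # 60
print(sum(1 for _ in starting_positions()))   # 17576
print([len(t) for t in tasks])                # [8, 8, 8, 8, 7, 7, 7, 7]
```

With Spark, the rotor orders could instead become a distributed collection, for example via `spark.sparkContext.parallelize(orders, numSlices=60)`, with a function mapped over it that tests all 17,576 starting positions for its rotor order. Because the tasks share no state, they can run on as many executors as are available without any coordination.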

## The Overall Approach for Cracking the Enigma

In this post, I won't go into detail about the inner workings of the Enigma machine or the algorithms that can be
used to crack it. In short, the Enigma machine is a device that scrambles input letters to produce different output
letters. With each key press, the rotors advance and the mapping changes. This, combined with the fact that the Enigma
machine could be configured in 158,962,555,217,826,360,000 different ways, led the Germans to believe that it was
uncrackable. However, the machine never encrypts a letter to itself, which makes it relatively easy to align a "crib"
(a short known piece of the original text) with the encrypted text. Numberphile has some
informative [videos](https://www.youtube.com/watch?v=G2_Q9FoD-oQ) about the Enigma in which they explain the algorithms
involved in breaking it. I implemented these algorithms and an Enigma emulator in a separate Python package,
which I used in a standalone script to break the Enigma with parallel computation on Databricks (essentially Spark as a
managed service). All the code for this project is available on GitHub.
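
The 158,962,555,217,826,360,000 figure can be reproduced with a short calculation. This sketch assumes the usual textbook breakdown: three rotors chosen in order from five, 26³ starting positions, and a plugboard with exactly ten cables, with ring settings left out of the count. It is a worked check of the number, not code from the author's package; `plugboard_settings` is a hypothetical helper.

```python
from math import factorial, perm


def plugboard_settings(letters: int = 26, cables: int = 10) -> int:
    """Number of ways to connect `cables` disjoint letter pairs among `letters` letters."""
    # Arrange 2*cables of the letters in a row (letters! / (letters - 2*cables)!),
    # read them as consecutive pairs, then divide out the order of the pairs
    # (cables!) and the order within each pair (2**cables).
    return factorial(letters) // (
        factorial(letters - 2 * cables) * factorial(cables) * 2 ** cables
    )


n_rotor_orders = perm(5, 3)          # 60 ordered choices of 3 rotors out of 5
n_start_positions = 26 ** 3          # 17,576 starting positions
n_plugboards = plugboard_settings()  # 150,738,274,937,250 plugboard wirings

total = n_rotor_orders * n_start_positions * n_plugboards
print(f"{total:,}")  # 158,962,555,217,826,360,000
```

Almost all of this number comes from the plugboard, which is why attacks that resolve the plugboard separately (instead of enumerating it) only have to search the much smaller rotor space.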



### Setting up the infrastructure
In this project I use Databricks as the Spark provider. To set up the Databricks workspace in the Azure cloud, I use
[Terraform](https://www.terraform.io/). Assuming that Terraform is installed, navigate to the terraform folder and run:
```commandline
terraform init
terraform apply
```
After the whole infrastructure has been created, which will probably take a few minutes, you can navigate to
the Databricks workspace by clicking on the URL that is printed in the terminal.
The workspace is preconfigured with a single-node cluster that has the Enigma emulator installed, together with this notebook, which you can run interactively.


### Cracking the Enigma using a crib
We start by encrypting a message with the emulator, using a configured Enigma machine that we are then going to crack.
In this project we assume that we have a crib, that is, a deciphered (original) piece of the encrypted text. In reality, such cribs were obtained either from informants or simply because they were routine phrases that appeared in daily messages. In this notebook, I assume that we know the alignment of the crib with the ciphertext: it is simply the first part of the text.
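
Here the alignment is assumed, but the property that an Enigma never encrypts a letter to itself is what makes finding it cheap when it is not known. A minimal sketch, independent of the author's package (the function name `possible_crib_positions` is hypothetical): slide the crib along the ciphertext and keep only the offsets at which no crib letter lands on the identical ciphertext letter.

```python
from typing import List


def possible_crib_positions(ciphertext: str, crib: str) -> List[int]:
    """Return every offset at which `crib` could be aligned with `ciphertext`.

    An offset is ruled out as soon as any crib letter equals the ciphertext
    letter at the same position, because an Enigma never maps a letter to itself.
    """
    ciphertext, crib = ciphertext.upper(), crib.upper()
    positions = []
    for offset in range(len(ciphertext) - len(crib) + 1):
        window = ciphertext[offset:offset + len(crib)]
        if all(c != p for c, p in zip(window, crib)):
            positions.append(offset)
    return positions


# Toy example with made-up ciphertext: offsets 1 and 2 are ruled out because
# the crib's "A" or "B" would sit on top of the same letter in the ciphertext.
print(possible_crib_positions("XAYBZ", "AB"))  # [0, 3]
```

The longer the crib, the more offsets this simple test eliminates, leaving only a handful of alignments to feed into the actual attack.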

```python
import pyspark.sql.functions as F
from typing import List
# from enigma.crack import get_possible_rotor_settings, PlugBoardResolver, InvalidSettings
from enigma.emulator import Enigma
from pyspark.sql.session import SparkSession
from pyspark.sql import Row
from functools import partial


# Build (or reuse) the Spark session; on Databricks one already exists.
spark = SparkSession.builder.getOrCreate()

# The crib: a known piece of plaintext that sits at the start of the message.
crib = "WettervorhersageXXXfurxdiexRegionXXXOstXXXMoskau".upper()
# The full plaintext begins with the crib, so the crib's alignment is known.
complete_message = f"{crib}xxxextremxKaltexxxTemperaturenxxxundxxxstarkerxxxSchneefallxxxseixxxgewarnt".upper()

# The Enigma configuration that we will later try to recover.
e = Enigma(rotor_types="II,III,I", ring_settings="I,A,A", plugboard_pairs="CU,DL,EP,KN,MO,XZ")
cypher = e.encrypt(complete_message)
print(cypher)
```

[0;31m---------------------------------------------------------------------------[0m
[0;31mModuleNotFoundError[0m                       Traceback (most recent call last)
File [0;32m<command-831146424884685>, line 4[0m
[1;32m      2[0m [38;5;28;01mfrom[39;00m [38;5;21;01mtyping[39;00m [38;5;28;01mimport[39;00m List
[1;32m      3[0m [38;5;66;03m# from enigma.crack import get_possible_rotor_settings, PlugBoardResolver, InvalidSettings[39;00m
[0;32m----> 4[0m [38;5;28;01mfrom[39;00m [38;5;21;01menigma[39;00m[38;5;21;01m.[39;00m[38;5;21;01memulator[39;00m [38;5;28;01mimport[39;00m Enigma
[1;32m      5[0m [38;5;28;01mfrom[39;00m [38;5;21;01mpyspark[39;00m[38;5;21;01m.[39;00m[38;5;21;01msql[39;00m[38;5;21;01m.[39;00m[38;5;21;01msession[39;00m [38;5;28;01mimport[39;00m SparkSession
[1;32m      6[0m [38;5;28;01mfrom[39;00m [38;5;21;01mpyspark[39;00m[38;5;21;01m.[39;00m[38;5;21;01msql[39;00m [38;5;28;01mimport[39;00m Row

[0;31mModuleNot