![Callysto.ca Banner](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-top.jpg?raw=true)

<a href="https://hub.callysto.ca/jupyter/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2Fcallysto%2Fcurriculum-notebooks&branch=master&subPath=Mathematics/CryptographyCyphers/cryptography-cyphers.ipynb&depth=1" target="_parent"><img src="https://raw.githubusercontent.com/callysto/curriculum-notebooks/master/open-in-callysto-button.svg?sanitize=true" width="123" height="24" alt="Open in Callysto"/></a>

<h1 align='center'>Introduction to Cryptography</h1>

### What Is Cryptography?

Cryptography is about how to communicate so that only our intended readers can get the information. It is the application and study of methods for secure communications in the presence of third parties (called adversaries).

The process of converting an ordinary message (called the _plaintext_) into one that cannot be read (called the _ciphertext_) is **encryption**. The reverse process of recovering the plaintext message from a ciphertext is called **decryption**.

Cryptography concerns itself with processes by which a message is made unreadable through encryption, yet can be made readable again through decryption. In the real world there are always adversaries trying to decrypt messages not intended for them, which drives the improvement of both encryption and decryption techniques.

Cryptography is also now used for identity authentication, digital signatures, secure computation, banking, and the processing of online payment transactions.

<h2 align='center'>Modular Arithmetic</h2>

### What Is Modular Arithmetic?

Modular arithmetic is a system of arithmetic for integers. The main idea is that numbers "wrap around" once a certain value, called the **modulus**, is reached.

One example of modular arithmetic is the 12-hour clock. Usual addition suggests that if it is 10:00 now, then 5 hours later it would be 15:00, since $10 + 5 = 15$. But this is not the case, since on a 12-hour clock the time "wraps around" every 12 hours, so the time is 3:00. On a 12-hour clock, the arithmetic used is **modulo 12**. Another familiar use of modular arithmetic is the modern (Gregorian) calendar system, where the arithmetic used is modulo 365 (or 366 during leap years).

Modular arithmetic can be demonstrated visually. Consider the clockface below. On this clockface, we can work out $6+3$ modulo $7$ by starting at $6$ and then moving $3$ spaces clockwise, which brings us to $2$.

<img src="./images/clockface.png" style="width: 300px;"/>

*Clockface picture: Simon Singh's _The Code Book_, Delacorte Press, Figure 47, page 199.*
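The two clock examples above can be checked with Python's remainder operator `%`. This is only a small sketch; the helper name `add_mod` is made up for illustration and is not used elsewhere in the notebook.

```python
def add_mod(a, b, modulus):
    """Return (a + b) reduced modulo `modulus`."""
    return (a + b) % modulus

# 10:00 plus 5 hours on a 12-hour clock
print(add_mod(10, 5, 12))  # 3

# Start at 6 and move 3 spaces clockwise on a clockface numbered 0 to 6
print(add_mod(6, 3, 7))    # 2
```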

### Mathematics of Modular Arithmetic

In mathematics, we are familiar with the **equality relation** on the integers. The equality relation tells us that two integers are the same. For example, $5=5$. The equality relation even tells us when two expressions have the same value, for example, $3+7+1 = 9+2$. Some other relations that we may be familiar with are **inequality relations** such as $<,\leq,>$, and $\geq$, for example $3 < 5$.

With modular arithmetic, a different type of relation on the integers is used, called a **congruence relation**, denoted by $\equiv$ (like the equals symbol, but with an extra horizontal bar). We will be looking at modular arithmetic with addition and subtraction.

As in our 12-hour clock example, 15:00 is the same as 3:00. This we would write mathematically as:

$$15 \equiv 3 \pmod{12}.$$

Here, we say that $15$ and $3$ are congruent modulo $12$.
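One way to test a congruence in Python is to check whether the difference of the two numbers is a multiple of the modulus. The function name `is_congruent` below is a hypothetical helper chosen for this sketch.

```python
def is_congruent(a, b, modulus):
    """Return True when a and b differ by a multiple of `modulus`."""
    return (a - b) % modulus == 0

print(is_congruent(15, 3, 12))  # True: 15 - 3 = 12 is a multiple of 12
print(is_congruent(15, 4, 12))  # False: 15 - 4 = 11 is not a multiple of 12
print(is_congruent(-9, 3, 12))  # True: -9 - 3 = -12 is a multiple of 12
```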

<h2 align='center'>Caesar Cipher</h2>

### A Brief History & Introduction

The Caesar cipher is named after Julius Caesar, who, according to Suetonius, used it to protect messages of military significance. It is not known how effective the Caesar cipher was during its time, but it was likely reasonably secure since most of Caesar's enemies were illiterate and would assume that encrypted letters were written in a foreign language.

In cryptography, the **Caesar Cipher** is also known as the **Shift Cipher**, and it is one of the simplest encryption techniques. It is a type of **Substitution Cipher** in which each letter is replaced by the letter some fixed number of positions further down the alphabet. More generally, a Substitution Cipher is any method of encryption by which units of plaintext are replaced with ciphertext according to some fixed system. The units may be single letters, pairs of letters, triplets, and so on.

### How Is Modular Arithmetic Used in the Caesar Cipher?

In the Caesar cipher, modular arithmetic can be used by first transforming the letters into numbers.

In [None]:
import string
alphabet_list = list(string.ascii_uppercase)
print("Letter      Number")
for i in range(0,26):
    print( " "*2 + alphabet_list[i] + " "*4 + " → " + " "*4 + "{:2d}".format(i+1))

Encryption is then defined mathematically as a function $E$ with two inputs $n$ and $x$, where $n$ is the shift and $x$ is the numeric representation of a letter. With the numbering above, a result of $0$ stands for `Z`, since $26 \equiv 0 \pmod{26}$. We write:

$$E(n,x) = (x+n) \pmod{26}$$

We can represent the transformation by aligning two alphabets: the plain alphabet and the cipher alphabet. The cipher alphabet is simply the plain alphabet shifted by some number of positions. For instance, with a shift of three, `a` becomes `d`, `b` becomes `e`, `c` becomes `f`, and so on. If we were to encode the word `CAESAR` with the Caesar cipher with a shift of three, then `CAESAR` is encoded as `FDHVDU`.
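The `CAESAR` example can be reproduced with a short function. This is a minimal sketch, not the notebook's own implementation: the name `caesar_encrypt` is hypothetical, and it numbers the letters 0 to 25 (`A` is 0) rather than 1 to 26 so that the remainder modulo 26 maps directly back to a letter.

```python
import string

def caesar_encrypt(plaintext, shift):
    """Shift each letter of `plaintext` forward by `shift` positions."""
    result = []
    for character in plaintext.upper():
        if character in string.ascii_uppercase:
            x = ord(character) - ord('A')            # letter -> number 0..25
            result.append(chr((x + shift) % 26 + ord('A')))
        else:
            result.append(character)                 # keep spaces and punctuation
    return ''.join(result)

print(caesar_encrypt('CAESAR', 3))  # FDHVDU
```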

In [None]:
from ipywidgets import interact
def Caesar_cipher_map_display(shift):
    unshifted_alphabet = list(range(1,27))
    shifted_alphabet = list(map(lambda x: (x + shift) % 26, unshifted_alphabet))
    zero_index = shifted_alphabet.index(0)
    shifted_alphabet[zero_index] = 26
    print("Plaintext     Ciphertext")
    for i in range(26):
        alphabet_maps = " "*3 + str(alphabet_list[i]) + " "*6 +  " → " + " "*6 + alphabet_list[shifted_alphabet[i-1] % 26]
        print(alphabet_maps) 
interact(Caesar_cipher_map_display, shift=(0,25));

To decrypt, we can use $(x - n)\pmod{26}$ where $x$ is a number that represents the letter to be decrypted, and $n$ is the amount of shift.
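Decryption can be sketched the same way, subtracting the shift instead of adding it. As before, `caesar_decrypt` is a hypothetical name and the letters are numbered 0 to 25. In Python the `%` operator returns a non-negative result for a positive modulus, so a negative difference wraps around correctly.

```python
import string

def caesar_decrypt(ciphertext, shift):
    """Shift each letter of `ciphertext` back by `shift` positions."""
    result = []
    for character in ciphertext.upper():
        if character in string.ascii_uppercase:
            x = ord(character) - ord('A')            # letter -> number 0..25
            result.append(chr((x - shift) % 26 + ord('A')))
        else:
            result.append(character)                 # keep spaces and punctuation
    return ''.join(result)

print(caesar_decrypt('FDHVDU', 3))  # CAESAR
```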

### Breaking the Caesar Cipher

Suppose we encountered the following encrypted message:

```
TIPGKFXIRGYP FI TIPGKFCFXP ZJ KYV GIRTKZTV REU JKLUP FW KVTYEZHLVJ WFI JVTLIV TFDDLEZTRKZFE ZE KYV GIVJVETV FW KYZIU GRIKZVJ TRCCVU RUMVIJRIZVJ.[2] DFIV XVEVIRCCP, TIPGKFXIRGYP ZJ RSFLK TFEJKILTKZEX REU RERCPQZEX GIFKFTFCJ KYRK GIVMVEK KYZIU GRIKZVJ FI KYV GLSCZT WIFD IVRUZEX GIZMRKV DVJJRXVJ;[3] MRIZFLJ RJGVTKJ ZE ZEWFIDRKZFE JVTLIZKP JLTY RJ URKR TFEWZUVEKZRCZKP, URKR ZEKVXIZKP, RLKYVEKZTRKZFE, REU EFE-IVGLUZRKZFE[4] RIV TVEKIRC KF DFUVIE TIPGKFXIRGYP. DFUVIE TIPGKFXIRGYP VOZJKJ RK KYV ZEKVIJVTKZFE FW KYV UZJTZGCZEVJ FW DRKYVDRKZTJ, TFDGLKVI JTZVETV, VCVTKIZTRC VEXZEVVIZEX, TFDDLEZTRKZFE JTZVETV, REU GYPJZTJ. RGGCZTRKZFEJ FW TIPGKFXIRGYP ZETCLUV VCVTKIFEZT TFDDVITV, TYZG-SRJVU GRPDVEK TRIUJ, UZXZKRC TLIIVETZVJ, TFDGLKVI GRJJNFIUJ, REU DZCZKRIP TFDDLEZTRKZFEJ.
```

A Caesar cipher is easily broken. We can consider two situations:

#### Situation 1: The adversary knows that a Caesar cipher is in use but does not know the shift value.

In this case, the adversary simply needs to try different shift values until the ciphertext message is decrypted. At worst, this requires trying only $26$ different shift values.

In [None]:
message = 'TIPGKFXIRGYP FI TIPGKFCFXP ZJ KYV GIRTKZTV REU JKLUP FW KVTYEZHLVJ WFI JVTLIV TFDDLEZTRKZFE ZE KYV GIVJVETV FW KYZIU GRIKZVJ TRCCVU RUMVIJRIZVJ.[2] DFIV XVEVIRCCP, TIPGKFXIRGYP ZJ RSFLK TFEJKILTKZEX REU RERCPQZEX GIFKFTFCJ KYRK GIVMVEK KYZIU GRIKZVJ FI KYV GLSCZT WIFD IVRUZEX GIZMRKV DVJJRXVJ;[3] MRIZFLJ RJGVTKJ ZE ZEWFIDRKZFE JVTLIZKP JLTY RJ URKR TFEWZUVEKZRCZKP, URKR ZEKVXIZKP, RLKYVEKZTRKZFE, REU EFE-IVGLUZRKZFE[4] RIV TVEKIRC KF DFUVIE TIPGKFXIRGYP. DFUVIE TIPGKFXIRGYP VOZJKJ RK KYV ZEKVIJVTKZFE FW KYV UZJTZGCZEVJ FW DRKYVDRKZTJ, TFDGLKVI JTZVETV, VCVTKIZTRC VEXZEVVIZEX, TFDDLEZTRKZFE JTZVETV, REU GYPJZTJ. RGGCZTRKZFEJ FW TIPGKFXIRGYP ZETCLUV VCVTKIFEZT TFDDVITV, TYZG-SRJVU GRPDVEK TRIUJ, UZXZKRC TLIIVETZVJ, TFDGLKVI GRJJNFIUJ, REU DZCZKRIP TFDDLEZTRKZFEJ.'

from spellchecker import SpellChecker
spell = SpellChecker()
for i in range(26):
    unencrypted = ''
    for n in message:
        if n in string.ascii_uppercase:
            x = ord(n) - ord('A')
            new_index = (x + i) % 26
            new_character = chr(new_index + ord('A'))
            unencrypted = unencrypted + new_character
        else:
            unencrypted = unencrypted + n
    # Spell check the first "word": if it is a known English word,
    # this shift is a likely candidate, so print the result
    first_word = unencrypted[:unencrypted.find(' ')].lower()
    if first_word in spell.known([first_word]):
        print('Unshift =', i)
        print(unencrypted)

#### Situation 2: The adversary knows a simple substitution cipher is used but not specifically that it is a Caesar scheme.

In this case, the adversary may apply a frequency analysis. There is a distinct and predictable distribution of letters in a typical sample of text in English. A Caesar shift "shifts" this distribution, and from here it simply remains for the adversary to find the shift used.

<h2 align='center'>Frequency Analysis Overview</h2>

In cryptography, frequency analysis is the study of how frequently letters or groups of letters appear in a text.

The main idea behind frequency analysis is that each letter appears with a characteristic frequency in any given language. For instance, in English we would generally find that the letter 'Z' appears less frequently than 'A' or 'E'.

If we wanted to find the frequencies of letters within a given language, we would need to sample many articles, books, and other media and count the number of times each letter occurs. For most languages, this has already been done - there are databases of letter frequencies built from millions of texts, giving an accurate estimate of how frequently a letter occurs within a given language.

From these databases, the relative frequency of letters in the English language can be observed below in a bar plot that we will generate from this [Wikipedia article](https://en.wikipedia.org/wiki/Letter_frequency).

In [None]:
import pandas as pd
import plotly.express as px
letter_frequencies = pd.read_html('https://en.wikipedia.org/wiki/Letter_frequency')[0]
letter_frequencies.columns = ['Letter','Texts','ignore','Dictionaries','ignore2']
letter_frequencies.drop(['ignore','ignore2','Dictionaries'], axis=1, inplace=True)
letter_frequencies['Texts'] = letter_frequencies['Texts'].str[:-1].astype('float')
fig = px.bar(letter_frequencies, x='Letter', y='Texts', title='Letter Frequencies in English Texts')
fig.show()

We can order the letters by their frequencies.

In [None]:
fig1 = px.bar(letter_frequencies.sort_values('Texts', ascending=False), x='Letter', y='Texts', title='Sorted Letter Frequencies in English Texts')
fig1.show()

We find that 'E' occurs most frequently, appearing over 12% of the time, with the next most common being 'T' at around 9% of the time.

Again, we will look at the following encrypted message:

```
TIPGKFXIRGYP FI TIPGKFCFXP ZJ KYV GIRTKZTV REU JKLUP FW KVTYEZHLVJ WFI JVTLIV TFDDLEZTRKZFE ZE KYV GIVJVETV FW KYZIU GRIKZVJ TRCCVU RUMVIJRIZVJ.[2] DFIV XVEVIRCCP, TIPGKFXIRGYP ZJ RSFLK TFEJKILTKZEX REU RERCPQZEX GIFKFTFCJ KYRK GIVMVEK KYZIU GRIKZVJ FI KYV GLSCZT WIFD IVRUZEX GIZMRKV DVJJRXVJ;[3] MRIZFLJ RJGVTKJ ZE ZEWFIDRKZFE JVTLIZKP JLTY RJ URKR TFEWZUVEKZRCZKP, URKR ZEKVXIZKP, RLKYVEKZTRKZFE, REU EFE-IVGLUZRKZFE[4] RIV TVEKIRC KF DFUVIE TIPGKFXIRGYP. DFUVIE TIPGKFXIRGYP VOZJKJ RK KYV ZEKVIJVTKZFE FW KYV UZJTZGCZEVJ FW DRKYVDRKZTJ, TFDGLKVI JTZVETV, VCVTKIZTRC VEXZEVVIZEX, TFDDLEZTRKZFE JTZVETV, REU GYPJZTJ. RGGCZTRKZFEJ FW TIPGKFXIRGYP ZETCLUV VCVTKIFEZT TFDDVITV, TYZG-SRJVU GRPDVEK TRIUJ, UZXZKRC TLIIVETZVJ, TFDGLKVI GRJJNFIUJ, REU DZCZKRIP TFDDLEZTRKZFEJ.
```

We can create a frequency analysis plot of this message:

In [None]:
message = 'TIPGKFXIRGYP FI TIPGKFCFXP ZJ KYV GIRTKZTV REU JKLUP FW KVTYEZHLVJ WFI JVTLIV TFDDLEZTRKZFE ZE KYV GIVJVETV FW KYZIU GRIKZVJ TRCCVU RUMVIJRIZVJ.[2] DFIV XVEVIRCCP, TIPGKFXIRGYP ZJ RSFLK TFEJKILTKZEX REU RERCPQZEX GIFKFTFCJ KYRK GIVMVEK KYZIU GRIKZVJ FI KYV GLSCZT WIFD IVRUZEX GIZMRKV DVJJRXVJ;[3] MRIZFLJ RJGVTKJ ZE ZEWFIDRKZFE JVTLIZKP JLTY RJ URKR TFEWZUVEKZRCZKP, URKR ZEKVXIZKP, RLKYVEKZTRKZFE, REU EFE-IVGLUZRKZFE[4] RIV TVEKIRC KF DFUVIE TIPGKFXIRGYP. DFUVIE TIPGKFXIRGYP VOZJKJ RK KYV ZEKVIJVTKZFE FW KYV UZJTZGCZEVJ FW DRKYVDRKZTJ, TFDGLKVI JTZVETV, VCVTKIZTRC VEXZEVVIZEX, TFDDLEZTRKZFE JTZVETV, REU GYPJZTJ. RGGCZTRKZFEJ FW TIPGKFXIRGYP ZETCLUV VCVTKIFEZT TFDDVITV, TYZG-SRJVU GRPDVEK TRIUJ, UZXZKRC TLIIVETZVJ, TFDGLKVI GRJJNFIUJ, REU DZCZKRIP TFDDLEZTRKZFEJ.'

frequencies = []
for letter in string.ascii_uppercase:
    count = 0
    for character in message:
        if character == letter:
            count += 1
    frequencies.append(count)

message_letter_frequencies = pd.DataFrame(list(string.ascii_uppercase), columns=['Letter'])
message_letter_frequencies['Frequency'] = frequencies
fig2 = px.bar(message_letter_frequencies.sort_values('Frequency', ascending=False), x='Letter', y='Frequency', title='Message Letter Frequencies')
fig2.show()

In this encrypted message, we see that the most common letter is 'V'. We can guess that 'V' was used to encrypt 'E'. We can compare the frequency of letters in the encrypted message with the usual frequency of letters in English by stacking the bar plots together. But we must be cautious - not every text has exactly the same frequency. 'V' could possibly be 'T','A', or 'O', as these characters have high frequencies as well.

In [None]:
fig1.show()
fig2.show()


If we simply replace the letters of the encrypted text, ordered from most frequent to least frequent, with the letters of English in their usual frequency order, we may get an incorrect message, because the letter frequencies in our encrypted text need not follow exactly the same order.

In using frequency analysis, we may need to consider some other patterns in the language. For instance, in English, the only single-letter words are 'A' and 'I'. So we may start by assuming that every time we encounter a single-letter word in the encrypted message, it is likely to be 'A' or 'I'. Another common word is 'THE', so whenever we encounter a frequently occurring three-letter word, it is reasonable to try substituting its letters, in order, with the letters of 'THE'.

In the encrypted message, we have the three-letter word 'KYV', so we can assume that 'K' is 'T', 'Y' is 'H', and 'V' is 'E' (and we have even more reason to believe this, since 'V' is the most common letter in the encrypted text).
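Spotting a word such as 'KYV' can be partly automated by counting the words of a given length in the ciphertext. The following is a rough sketch on a short excerpt of the message; the helper name `common_words_of_length` and the variable `sample` are hypothetical.

```python
import re
from collections import Counter

def common_words_of_length(ciphertext, length):
    """Count ciphertext words of the given length, most common first."""
    words = re.findall(r'[A-Z]+', ciphertext.upper())
    return Counter(w for w in words if len(w) == length).most_common()

# A short excerpt from the start of the encrypted message
sample = ('ZJ KYV GIRTKZTV REU JKLUP FW KVTYEZHLVJ WFI JVTLIV '
          'TFDDLEZTRKZFE ZE KYV GIVJVETV FW KYZIU GRIKZVJ')

print(common_words_of_length(sample, 3)[0])  # ('KYV', 2)
```

A frequent three-letter word is only a candidate for 'THE'. Other common words such as 'AND' have the same length, so the guess still needs to be checked against the rest of the text.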

Continuing in this manner, using statistics on how commonly certain English words are used, we are able to decrypt the entire text. In fact, it was encrypted using the Caesar cipher with a shift of 17, which we may have found by counting how many letters it takes to shift 'E' to 'V'. We also notice that the frequency order of letters in the encrypted text does not exactly match the order in the English language, although it is close.

The decrypted message is:
```
'CRYPTOGRAPHY OR CRYPTOLOGY IS THE PRACTICE AND STUDY OF TECHNIQUES FOR SECURE COMMUNICATION IN THE PRESENCE OF THIRD PARTIES CALLED ADVERSARIES.[2] MORE GENERALLY, CRYPTOGRAPHY IS ABOUT CONSTRUCTING AND ANALYZING PROTOCOLS THAT PREVENT THIRD PARTIES OR THE PUBLIC FROM READING PRIVATE MESSAGES;[3] VARIOUS ASPECTS IN INFORMATION SECURITY SUCH AS DATA CONFIDENTIALITY, DATA INTEGRITY, AUTHENTICATION, AND NON-REPUDIATION[4] ARE CENTRAL TO MODERN CRYPTOGRAPHY. MODERN CRYPTOGRAPHY EXISTS AT THE INTERSECTION OF THE DISCIPLINES OF MATHEMATICS, COMPUTER SCIENCE, ELECTRICAL ENGINEERING, COMMUNICATION SCIENCE, AND PHYSICS. APPLICATIONS OF CRYPTOGRAPHY INCLUDE ELECTRONIC COMMERCE, CHIP-BASED PAYMENT CARDS, DIGITAL CURRENCIES, COMPUTER PASSWORDS, AND MILITARY COMMUNICATIONS.'
```
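Counting the positions from 'E' to the most common ciphertext letter can be sketched in a few lines. This assumes the most frequent letter in the ciphertext stands for 'E', which is only a guess and may fail for short or unusual texts; the name `guess_shift` is hypothetical.

```python
import string
from collections import Counter

def guess_shift(ciphertext, assumed_plain_letter='E'):
    """Guess the Caesar shift, assuming the most common ciphertext
    letter encrypts `assumed_plain_letter`."""
    letters = [c for c in ciphertext.upper() if c in string.ascii_uppercase]
    most_common_letter, _ = Counter(letters).most_common(1)[0]
    return (ord(most_common_letter) - ord(assumed_plain_letter)) % 26

# A short excerpt of the encrypted message, in which 'V' is the most common letter
sample = 'KYV GIVJVETV FW KYZIU GRIKZVJ'
print(guess_shift(sample))  # 17
```

If the result does not decrypt to readable English, the same function can be called again with `assumed_plain_letter` set to another common letter such as 'T' or 'A'.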

### Interactive: Visualizing the Shift Cipher and Using Frequency Analysis

In the following interactive, take a text of your choice (preferably at least two or three paragraphs long, in English) and choose a shift to encrypt it with using the Caesar cipher. As you change the shift, observe how the bar plot changes. Knowing the usual frequency of letters in the English language, we may be able to guess the shift that was used by visually comparing the bar plot with those usual frequencies.

In [None]:
shift = 7

my_message = '''
Replace this text between the quotation marks with your message, preferably at least two or three paragraphs long in English.

'''


my_encrypted_message = ''
for n in my_message.upper():
    if n in string.ascii_uppercase:
        x = ord(n) - ord('A')
        new_index = (x + shift) % 26
        new_character = chr(new_index + ord('A'))
        my_encrypted_message = my_encrypted_message + new_character
    else:
        my_encrypted_message = my_encrypted_message + n
print(my_encrypted_message)

frequencies = []
for letter in string.ascii_uppercase:
    count = 0
    for character in my_encrypted_message:
        if character == letter:
            count += 1
    frequencies.append(count)
message_letter_frequencies = pd.DataFrame(list(string.ascii_uppercase), columns=['Letter'])
message_letter_frequencies['Frequency'] = frequencies
px.bar(message_letter_frequencies.sort_values('Frequency', ascending=False), x='Letter', y='Frequency', title='Message Letter Frequencies')

## Conclusion

In this notebook we looked at a basic type of [cryptography](https://en.wikipedia.org/wiki/Cryptography), the substitution cipher, and possible ways to use Python to encrypt and decrypt substitution ciphers.

[![Callysto.ca License](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-bottom.jpg?raw=true)](https://github.com/callysto/curriculum-notebooks/blob/master/LICENSE.md)