# 3.3 Hill's System

## Exercises 3.3

In [1]:
import sys
from pathlib import Path
sys.path.insert(0, str(Path().absolute().parent))

import numpy as np
import numpy.typing as npt
from src.hills_system import matrix_inverse_mod26, validate_hills_key, is_invertible_mod_26, TopResults
from src.helpers import CHARACTERS, strip_text, format_ciphertext, format_plaintext, pos, char_at
from src.ngrams_data.top_ngrams import TOP_ENGLISH_QUINTGRAMS
from src.ngrams_scorer import NgramScorer

### 5. Write a computer program to both encipher and decipher a message using Hill's (digraph) System.

We implement the general $n \times n$ case, which includes the $2\times2$ (digraph) system.  See [hills_system.py](https://github.com/dandoug/cryptomath-book/blob/main/src/hills_system.py) for the validation logic and the computation of the inverse key.
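
To illustrate the inverse-key idea without opening `hills_system.py`, here is a minimal, self-contained sketch for the $2 \times 2$ case: multiply the adjugate by the modular inverse of the determinant. The helper name `inverse_2x2_mod26` is hypothetical, and this is not necessarily how `matrix_inverse_mod26` is implemented.

```python
import math

def inverse_2x2_mod26(key: list[list[int]]) -> list[list[int]]:
    """Invert a 2x2 matrix mod 26 via the adjugate (hypothetical helper)."""
    (a, b), (c, d) = key
    det = (a * d - b * c) % 26
    # The key is usable only if its determinant is a unit mod 26.
    if math.gcd(det, 26) != 1:
        raise ValueError("Key is not invertible mod 26")
    det_inv = pow(det, -1, 26)  # modular inverse (Python 3.8+)
    # adj([[a, b], [c, d]]) = [[d, -b], [-c, a]]
    return [[det_inv * d % 26, det_inv * -b % 26],
            [det_inv * -c % 26, det_inv * a % 26]]

print(inverse_2x2_mod26([[6, 3], [7, 8]]))  # [[8, 23], [19, 6]]
```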

In [2]:
def hills_transform(key: npt.NDArray[np.int_], input_str: str) -> str:
    """
    This is the heart of the Hill's cipher system.  It takes a key and a string and returns the encrypted/decrypted string.
    No validation or cleanup is done here.  This method is intended to be wrapped by other code that does any
    necessary checking or preparation.
    """
    n = key.shape[0]
    output_chars = []
    for i in range(0, len(input_str), n):
        input_vector = np.array([[pos(c)] for c in input_str[i:i + n]])
        output_vector = np.dot(key, input_vector) % 26
        for c in output_vector:
            output_chars.append(char_at(int(c[0])))
    return ''.join(output_chars)


class HillsCipher:
    def __init__(self, key: np.ndarray, pad_char: str = 'q'):
        validate_hills_key(key)
        self.key = key
        self.n = self.key.shape[0]
        try:
            self.inv_key = matrix_inverse_mod26(self.key)
        except ValueError as e:
            raise ValueError("Key is not invertible mod 26") from e
        self.pad_char = pad_char.strip().lower()
        if len(self.pad_char) != 1 or self.pad_char not in CHARACTERS:
            raise ValueError("pad_char must be a single character from CHARACTERS")

    def encipher(self, plaintext: str) -> str:
        """
        Encipher a message using Hill's cipher system.
        """
        plaintext = strip_text(plaintext)
        # pad plaintext if its length is not a multiple of n
        padding_length = (self.n - (len(plaintext) % self.n)) % self.n
        plaintext += self.pad_char * padding_length
        ciphertext = hills_transform(self.key, plaintext)
        return ciphertext.upper()

    def decipher(self, ciphertext: str) -> str:
        """
        Deciphers the given ciphertext using Hill cipher decryption.
        """
        ciphertext = strip_text(ciphertext)
        if len(ciphertext) % self.n != 0:
            raise ValueError("Ciphertext length must be a multiple of n")
        return hills_transform(self.inv_key, ciphertext)



### 1. Using Hill's System with $key = \begin{pmatrix} 6 & 3 \\ 7 & 8 \end{pmatrix}$ encipher the message:
```
 It is lonely at the top; but you eat better.
```

In [3]:
cipher_1 = HillsCipher(np.array([[6, 3], [7, 8]]))
plaintext_1 = "It is lonely at the top; but you eat better."

In [4]:
ciphertext_1 = cipher_1.encipher(plaintext_1)
print(format_ciphertext(ciphertext_1))

JOGGM VUHQX NKNVL MHYWZ MBWMG QVZLM EXCB
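
As a check on the arithmetic, take the first plaintext digraph "it". Assuming the letter numbering a = 1, b = 2, …, z = 26 ≡ 0 (inferred from the outputs in this notebook, not taken from `helpers.py`), "it" is the column vector $(9, 20)^T$ and

$$
\begin{pmatrix} 6 & 3 \\ 7 & 8 \end{pmatrix}
\begin{pmatrix} 9 \\ 20 \end{pmatrix}
=
\begin{pmatrix} 6 \cdot 9 + 3 \cdot 20 \\ 7 \cdot 9 + 8 \cdot 20 \end{pmatrix}
=
\begin{pmatrix} 114 \\ 223 \end{pmatrix}
\equiv
\begin{pmatrix} 10 \\ 15 \end{pmatrix} \pmod{26},
$$

which is "JO", the first two letters of the ciphertext above.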


### 2. Decipher the following message that was enciphered using Hill's System with $key = \begin{pmatrix} 3 & 2 \\ 8 & 5 \end{pmatrix}$
```
 MUBYA QIQGN AEWOS RZQJI RZQKC LIZAG SXCJA AQFRM HO
```

In [5]:
cipher_2 = HillsCipher(np.array([[3, 2], [8, 5]]))
ciphertext_2 = "MUBYA QIQGN AEWOS RZQJI RZQKC LIZAG SXCJA AQFRM HO"

In [6]:
plaintext_2 = cipher_2.decipher(ciphertext_2)
print(format_plaintext(plaintext_2))

consciousnessisthatannoyingtimebetweennaps


which, with word breaks restored, reads
```
  consciousness is that annoying time between naps
```
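
The deciphering key can be checked by hand the same way (again assuming a = 1, …, z = 26 ≡ 0). Here $\det K = 3 \cdot 5 - 2 \cdot 8 = -1 \equiv 25$, and $25 \cdot 25 = 625 \equiv 1 \pmod{26}$, so

$$
K^{-1} \equiv 25 \begin{pmatrix} 5 & -2 \\ -8 & 3 \end{pmatrix} \equiv \begin{pmatrix} 21 & 2 \\ 8 & 23 \end{pmatrix} \pmod{26},
\qquad
\begin{pmatrix} 21 & 2 \\ 8 & 23 \end{pmatrix}
\begin{pmatrix} 13 \\ 21 \end{pmatrix}
=
\begin{pmatrix} 315 \\ 587 \end{pmatrix}
\equiv
\begin{pmatrix} 3 \\ 15 \end{pmatrix} \pmod{26},
$$

so the first ciphertext digraph "MU" deciphers to "co".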

### 4. Consider extending Hill's (digraph) System to a trigraph system. That is, rather than enciphering and deciphering characters in pairs, you wish to encipher them three at a time. Describe how you would proceed. Be sure to consider whatever restrictions are necessary to impose on the key to assure that a message is decipherable. (Hint: See Exercises 3.2, #9.)

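The determinant of the $3 \times 3$ key matrix must be coprime to 26 (that is, odd and not divisible by 13), so that the key has an inverse mod 26 to use for deciphering.  The plaintext must also be padded to a multiple of 3.  The general implementation above already handles trigraph systems.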
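
A minimal sketch of the two trigraph requirements, using exact integer arithmetic for the determinant (floating-point `np.linalg.det` can round). The helper names `det3`, `is_valid_trigraph_key`, and `pad_to_multiple` are hypothetical, and the example key is just a convenient invertible matrix.

```python
import math

def det3(m: list[list[int]]) -> int:
    """Integer determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def is_valid_trigraph_key(m: list[list[int]]) -> bool:
    """Decipherable iff gcd(det, 26) == 1, i.e. det is odd and not a multiple of 13."""
    return math.gcd(det3(m) % 26, 26) == 1

def pad_to_multiple(text: str, n: int = 3, pad_char: str = "q") -> str:
    """Append pad_char until len(text) is a multiple of n."""
    return text + pad_char * (-len(text) % n)

print(is_valid_trigraph_key([[6, 24, 1], [13, 16, 10], [20, 17, 15]]))  # True (det = 441, 441 mod 26 = 25)
print(is_valid_trigraph_key([[2, 0, 0], [0, 1, 0], [0, 0, 1]]))         # False (det = 2)
print(pad_to_multiple("hello"))                                          # helloq
```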

### 3. Molly has intercepted the following message that was enciphered using Hill's System (digraph version):

```
 KFHYY GIGMC EJSST EBOEU GRWJT SDVYK ZOZLI ZKFHX KUUIC WXFWJ
 GAXQP BQAGV GXDVD GUEVG MIGYK QQPIP SCLLF YPMUL KFHXP MHGME
 VDKAV YQCEG UEALY YYZSZ MPXZO CTXTR IMDID VDGSX OZFFT SMEDV
 MEIMD VMPKO UJKOD UBOAX BOORS LPZCW IMDVY GJWMI FQ
```

Help her cryptanalyze and decipher it.

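We base our attack on a message enciphered with Hill's System on the assumption that English plaintext contains common quintgrams.  Two known plaintext-ciphertext digraph pairs are enough to determine the decryption key, provided the two ciphertext digraphs form a matrix that is invertible mod 26.  So we "slide" each common quintgram across the ciphertext and, at each position, identify the two pairs of plaintext-ciphertext digraphs it implies.  Because ciphertext digraphs start at even offsets, every five-letter window contains exactly two complete digraphs: the first four letters of the quintgram line up with them when the window starts on a digraph boundary, and the last four letters otherwise.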
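
If the plaintext digraphs are the columns of $P$ and the matching ciphertext digraphs are the columns of $C$, then $P \equiv K^{-1} C$, so $K^{-1} \equiv P\,C^{-1} \pmod{26}$ whenever $C$ is invertible mod 26. Here is a small, self-contained sketch of that step, checked against Exercise 1 ("it" → "JO", "is" → "GG"). The helpers `num`, `digraph_matrix`, `inverse_2x2_mod26`, and `recover_decrypt_key` are hypothetical, and the letter numbering a = 1, …, z = 26 ≡ 0 is inferred from this notebook's outputs.

```python
import math

def num(ch: str) -> int:
    """Letter to number, assuming a=1, ..., y=25, z=26 (≡ 0 mod 26)."""
    return (ord(ch.lower()) - ord("a") + 1) % 26

def digraph_matrix(dg1: str, dg2: str) -> list[list[int]]:
    """Place two digraphs side by side as the columns of a 2x2 matrix."""
    return [[num(dg1[0]), num(dg2[0])], [num(dg1[1]), num(dg2[1])]]

def inverse_2x2_mod26(m: list[list[int]]) -> list[list[int]]:
    (a, b), (c, d) = m
    det = (a * d - b * c) % 26
    if math.gcd(det, 26) != 1:
        raise ValueError("matrix is not invertible mod 26")
    k = pow(det, -1, 26)
    return [[k * d % 26, k * -b % 26], [k * -c % 26, k * a % 26]]

def recover_decrypt_key(p1: str, p2: str, c1: str, c2: str) -> list[list[int]]:
    """Solve K^-1 = P C^-1 (mod 26) from two known plaintext/ciphertext digraph pairs."""
    p = digraph_matrix(p1, p2)
    c_inv = inverse_2x2_mod26(digraph_matrix(c1, c2))
    return [[sum(p[r][k] * c_inv[k][col] for k in range(2)) % 26 for col in range(2)]
            for r in range(2)]

print(recover_decrypt_key("it", "is", "JO", "GG"))  # [[8, 23], [19, 6]], the inverse of [[6, 3], [7, 8]]
```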

With those pairs of digraphs, we can solve for a candidate decryption key, and with that candidate key we can "decrypt" the ciphertext to get a possible plaintext.  To automate checking whether each candidate plaintext is an intelligible message, we compute an n-gram score: the higher the score, the more English-like the candidate plaintext.  We save the best-scoring candidates to investigate manually; hopefully, one of them is the actual plaintext.
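
A toy sketch of log-probability n-gram scoring in the style the Practical Cryptography article describes: sum $\log_{10}$ of each overlapping n-gram's relative frequency, with a small floor for unseen n-grams. The function names are hypothetical, the corpus is a stand-in, and `NgramScorer`'s actual internals may differ.

```python
import math
from collections import Counter

def build_log_probs(corpus: str, n: int) -> tuple[dict[str, float], float]:
    """Log10 probability of each n-gram seen in corpus, plus a floor for unseen n-grams."""
    letters = "".join(ch for ch in corpus.lower() if ch.isalpha())
    counts = Counter(letters[i:i + n] for i in range(len(letters) - n + 1))
    total = sum(counts.values())
    log_probs = {gram: math.log10(count / total) for gram, count in counts.items()}
    floor = math.log10(0.01 / total)  # unseen n-grams get a small, fixed penalty
    return log_probs, floor

def ngram_score(text: str, log_probs: dict[str, float], floor: float, n: int) -> float:
    """Sum of log10 probabilities of every overlapping n-gram; higher means more English-like."""
    text = text.lower()
    return sum(log_probs.get(text[i:i + n], floor) for i in range(len(text) - n + 1))

# Toy corpus; a real scorer would use a much larger table of counts.
log_probs, floor = build_log_probs("the cat sat on the mat with the hat", n=3)
print(ngram_score("thecat", log_probs, floor, 3) > ngram_score("xqzjkv", log_probs, floor, 3))  # True
```

Because scores are sums of logarithms, they are negative and only meaningful relative to other texts of similar length, which matches the negative scores printed below.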

For more details, see the Practical Cryptography article on [cryptanalysis of the Hill cipher](http://practicalcryptography.com/cryptanalysis/stochastic-searching/cryptanalysis-hill-cipher/).

In [7]:
ciphertext_3 = "KFHYY GIGMC EJSST EBOEU GRWJT SDVYK ZOZLI ZKFHX KUUIC WXFWJ " + \
               "GAXQP BQAGV GXDVD GUEVG MIGYK QQPIP SCLLF YPMUL KFHXP MHGME " + \
               "VDKAV YQCEG UEALY YYZSZ MPXZO CTXTR IMDID VDGSX OZFFT SMEDV " + \
               "MEIMD VMPKO UJKOD UBOAX BOORS LPZCW IMDVY GJWMI FQ"
ciphertext_3 = strip_text(ciphertext_3).upper()  # remove non-letter characters and convert to uppercase

In [14]:
decrypt_keys_examined = set()
top_results = TopResults(max_size=100)  # only keep the top results
scorer = NgramScorer('english_quintgrams.txt.zip')
for quint in TOP_ENGLISH_QUINTGRAMS:
    for i in range(len(ciphertext_3) - 4):  # every five-letter window, including the last one
        # Depending on whether the window starts on a digraph (even) boundary, we line up either the
        # first four or the last four characters of the quint with two complete ciphertext digraphs
        if i % 2 == 0:
            p_dg_1 = quint[0:2]
            p_dg_2 = quint[2:4]
            c_dg_1 = ciphertext_3[i:i+2]
            c_dg_2 = ciphertext_3[i+2:i+4]
        else:
            p_dg_1 = quint[1:3]
            p_dg_2 = quint[3:5]
            c_dg_1 = ciphertext_3[i+1:i+3]
            c_dg_2 = ciphertext_3[i+3:i+5]
        # Given the candidate digraph pairs, solve for the candidate decrypt key
        c_col_1 = np.array([[pos(c_dg_1[0])], [pos(c_dg_1[1])]])
        c_col_2 = np.array([[pos(c_dg_2[0])], [pos(c_dg_2[1])]])
        c_matrix = np.hstack((c_col_1, c_col_2))
        # need to check if c_matrix is invertible
        if not is_invertible_mod_26(c_matrix):
            continue   # no need to check more
        c_matrix_inv = matrix_inverse_mod26(c_matrix)
        # set up for solving for the decryption key
        p_col_1 = np.array([[pos(p_dg_1[0])], [pos(p_dg_1[1])]])
        p_col_2 = np.array([[pos(p_dg_2[0])], [pos(p_dg_2[1])]])
        p_matrix = np.hstack((p_col_1, p_col_2))
        # compute the decryption key implied by this pair choice
        inv_key = np.dot(p_matrix, c_matrix_inv) % 26
        inv_key_tuple = tuple(inv_key.flatten())
        if inv_key_tuple in decrypt_keys_examined:
            continue  # no need to check a key twice
        else:
            decrypt_keys_examined.add(inv_key_tuple)
        # Now with the candidate decryption key, decrypt the text and score the key
        ptext = hills_transform(inv_key, ciphertext_3)
        score = scorer.score(ptext.upper())
        top_results.add(score, inv_key_tuple, ptext)  # add to heap; only the top results are kept

# After all that, print out the top results and see if any look viable
best_results = top_results.get_best()
for rank, result in enumerate(best_results, start=1):
    print(f"Rank: {rank} Score: {result.score:.4f}, Key: {result.key}")
    print("Plaintext:")
    print(format_plaintext(result.plaintext))
    print("\n")


Rank: 1 Score: -1683.0868, Key: (np.int64(7), np.int64(10), np.int64(6), np.int64(1))
Plaintext:
gttukaciqcenkchuhakyuharrintyetoplkbgtjtaiceqottargqzebtyyil
cnntteoapiyimomesnorgmvffiheghgtjthevckelfionasaakoawrisstcj
qplneopnhhkongntteihalxprikentkekontqpscmfscdshamdhaydsvhrqo
kontkaneyida


Rank: 2 Score: -1743.3555, Key: (np.int64(6), np.int64(1), np.int64(13), np.int64(17))
Plaintext:
tkuiabibclnacxugauyfhgraiktjeroulvbmtktrifejontxraqdecthydlw
netjeoatioijovepnyrymlfviiemhitktremcoetfpodaialkbatrisvtmjm
pynzolnrhtozgwtjeohelmpxikettjetoztjpychfachssaudeaudgvirzon
oztjabeaijac


Rank: 3 Score: -1748.6046, Key: (np.int64(15), np.int64(21), np.int64(6), np.int64(1))
Plaintext:
etuubavixcynhcougavyohirwibthecorlebetzthijehortirvqoevtpyul
knbtyedaiitifobesncrjmpfqiseuhetztsegcnexfdowafankdagrpsktyj
kpvnboxnbhroogbtyeihqlhpwinebtnerobtkplceflcgsgaydgaedqvfrho
robtbaietiea


Rank: 4 Score: -1759.8724, Key: (np.int64(18), np.int64(20), np.int64(19), np.int64(14))
Plaintext:
fgthrapihcdat