
The task: decrypting a message with an RNN.
You are given messages encrypted with a Caesar cipher, which is one of the simplest ciphers in cryptography.
 

The Caesar cipher works like this: each letter of the original alphabet is shifted K positions to the right, wrapping around to the start when the end of the alphabet is reached.
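
The shift can be written compactly. This is a sketch that assumes, as the `encrypt` function in this notebook does, that the alphabet is the list `vocab` and that $\text{idx}(c)$ denotes the position of character $c$ in it (the symbol $\text{idx}$ is introduced here only for illustration):

$$f(c, K) = \text{vocab}\big[(\text{idx}(c) + K) \bmod |\text{vocab}|\big]$$

A whole message is encrypted by applying $f$ to each of its characters independently.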

Suppose we are given the message message="RNN IS NOT AI". Encrypting it according to the rule f with K=2 gives:
f(message, K) = TPPAKUAPQVACK
Note that the space is part of the alphabet used here, so it is shifted as well (with K=2 a space becomes "A").

For convenience, every character outside the English alphabet is marked with a dash "-".

In [1]:
import random
import torch
import torch.nn as nn
import torch.nn.functional as F


In [2]:
# Define the key and the vocabulary
key = 2
vocab = [char for char in ' -ABCDEFGHIJKLMNOPQRSTUVWXYZ']

In [3]:
def encrypt(text, key):
    """Returns the encrypted form of 'text'."""
    indexes = [vocab.index(char) for char in text]
    encrypted_indexes = [(idx + key) % len(vocab) for idx in indexes]
    encrypted_chars = [vocab[idx] for idx in encrypted_indexes]
    encrypted = ''.join(encrypted_chars)
    return encrypted

print(encrypt('RNN IS NOT AI', key))

TPPAKUAPQVACK
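
Since encryption is a shift by K, decryption is the same shift by -K. The sketch below illustrates this ground truth that the network is expected to learn. The helpers `shift` and `decrypt` are hypothetical names introduced here and are not part of the original notebook; `vocab` is repeated so that the block runs on its own.

```python
# Vocabulary as defined in the notebook: space, dash, then A-Z.
vocab = list(' -ABCDEFGHIJKLMNOPQRSTUVWXYZ')


def shift(text, key):
    """Shift every character of 'text' by 'key' positions in vocab, wrapping around."""
    return ''.join(vocab[(vocab.index(ch) + key) % len(vocab)] for ch in text)


def decrypt(text, key):
    """Hypothetical helper: undo a shift by 'key' by shifting in the other direction."""
    return shift(text, -key)


cipher = shift('RNN IS NOT AI', 2)
print(cipher)              # TPPAKUAPQVACK
print(decrypt(cipher, 2))  # RNN IS NOT AI
```

Python's `%` operator returns a non-negative result for a positive modulus, so the negative shift wraps around correctly without extra handling.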


Now we need to generate a dataset for a supervised learning problem. The dataset can consist of random phrases paired with their encrypted versions, so its structure is as follows:
message --- encrypted message

This is an example of a parallel corpus from NLP.

To use the Embedding layer later, we need to represent each letter by its index in the vocabulary.

For simplicity, let's assume that all strings have the same length, *seq_len*.
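
To make the index representation concrete, here is a minimal sketch of turning a string into a tensor of vocabulary indices and passing it through an embedding layer. The helpers `encode` and `decode` are hypothetical and not part of the original notebook; mapping unknown characters to the dash follows the convention stated earlier, and the embedding size of 5 is only an illustrative choice.

```python
import torch
import torch.nn as nn

vocab = list(' -ABCDEFGHIJKLMNOPQRSTUVWXYZ')


def encode(text):
    """Hypothetical helper: map characters to vocab indices; unknown ones become '-'."""
    dash = vocab.index('-')
    return torch.tensor([vocab.index(ch) if ch in vocab else dash for ch in text])


def decode(indices):
    """Hypothetical helper: map a 1-D tensor of indices back to a string."""
    return ''.join(vocab[i] for i in indices.tolist())


ids = encode('RNN IS NOT AI')
embed = nn.Embedding(len(vocab), 5)  # one 5-dimensional vector per character
vectors = embed(ids)

print(ids.shape)      # torch.Size([13])
print(vectors.shape)  # torch.Size([13, 5])
print(decode(ids))    # RNN IS NOT AI
```

The embedding layer is a lookup table, which is why its input has to be integer indices rather than characters.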

In [4]:
num_examples = 256 # dataset size
seq_len = 18 


def encrypted_dataset(dataset_len, k):
    """
    Generate random messages of length seq_len and encrypt them with shift k.

    Return: List(Tuple(Tensor encrypted, Tensor source))
    """
    dataset = []
    for _ in range(dataset_len):
        random_message = ''.join(random.choice(vocab) for _ in range(seq_len))
        encrypted_message = encrypt(random_message, k)
        src = [vocab.index(ch) for ch in random_message]
        tgt = [vocab.index(ch) for ch in encrypted_message]
        # The encrypted text is the model input; the original text is the label
        dataset.append((torch.tensor(tgt), torch.tensor(src)))
    return dataset

**PyTorch RNN:**
$$h_t = \tanh(w_{ih} x_t + b_{ih} + w_{hh} h_{(t-1)} + b_{hh})$$

**where $h_t$ is the hidden state at time $t$, $x_t$ is the input at time $t$, and $h_{(t-1)}$ is the hidden state at time $t-1$ or the initial hidden state at time $0$.**
    
Args: 

        input_size: The number of expected features in the input $x$
        hidden_size: The number of features in the hidden state $h$
        num_layers: Number of recurrent layers. E.g., setting
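
The recurrence above can be checked against `nn.RNN` directly. The sketch below is an illustration under simplifying assumptions: it uses a single layer (`num_layers=1`) rather than the two layers used in the model below, a batch of size 1, and arbitrary sizes. It recomputes the hidden states by hand from the layer's own parameters (`weight_ih_l0`, `bias_ih_l0`, `weight_hh_l0`, `bias_hh_l0`) and compares them with the layer's output.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
input_size, hidden_size, seq_len = 5, 10, 18

rnn = nn.RNN(input_size, hidden_size, num_layers=1)
x = torch.randn(seq_len, 1, input_size)  # (seq_len, batch, input_size)
h0 = torch.zeros(1, 1, hidden_size)      # (num_layers, batch, hidden_size)

with torch.no_grad():
    out, h_n = rnn(x, h0)

    # Recompute every hidden state by hand with the formula for h_t.
    h = h0[0]
    manual = []
    for x_t in x:
        h = torch.tanh(x_t @ rnn.weight_ih_l0.T + rnn.bias_ih_l0
                       + h @ rnn.weight_hh_l0.T + rnn.bias_hh_l0)
        manual.append(h)
    manual = torch.stack(manual)

print(out.shape)  # torch.Size([18, 1, 10])
print(h_n.shape)  # torch.Size([1, 1, 10])
print(torch.allclose(out, manual, atol=1e-5))
```

For a single layer, `out` holds the hidden state at every time step and `h_n` holds only the last one.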

In [5]:
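class Decipher(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_dim,
                 rnn_type='simple'):
        """
        :param int vocab_size: number of characters in the vocabulary
        :param int embedding_dim: size of each character embedding
        :param int hidden_dim: number of features in the RNN hidden state
        :param str rnn_type: recurrent layer to use; only 'simple' is supported
        """
        super(Decipher, self).__init__()
        self.embed = nn.Embedding(vocab_size, embedding_dim)
        if rnn_type == 'simple':
            self.rnn = nn.RNN(embedding_dim, hidden_dim, num_layers=2)
        else:
            # Fail early instead of leaving self.rnn undefined
            raise ValueError('Unknown rnn_type: {}'.format(rnn_type))

        self.fc = nn.Linear(hidden_dim, vocab_size)
        # Zero initial hidden state: (num_layers, batch, hidden_dim)
        self.initial_hidden = torch.zeros(2, 1, hidden_dim)

    def forward(self, cipher):
        # cipher: LongTensor of shape (seq_len,)
        # Add a batch dimension of size 1: (seq_len, 1, embedding_dim)
        embd_x = self.embed(cipher).unsqueeze(1)
        out_rnn, hidden = self.rnn(embd_x, self.initial_hidden)
        # Apply the affine transform, then move the vocabulary dimension to
        # position 1 so that the softmax inside the loss is taken over the
        # vocabulary, giving a probability for every letter:
        # (seq_len, 1, vocab_size) -> (seq_len, vocab_size, 1)
        return self.fc(out_rnn).transpose(1, 2)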
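
The transpose at the end of `forward` exists because `nn.CrossEntropyLoss` expects the class scores in dimension 1. The following sketch uses random stand-in tensors (not the model's real outputs) with the same shapes as in this notebook, and shows that the transposed layout gives the same loss as flattening the sequence and batch dimensions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
seq_len, vocab_size = 18, 28

# Stand-ins with the shapes used in the notebook
logits = torch.randn(seq_len, 1, vocab_size)          # (seq_len, batch, vocab)
targets = torch.randint(0, vocab_size, (seq_len, 1))  # (seq_len, batch)

criterion = nn.CrossEntropyLoss()

# Layout used by the model: classes in dimension 1 -> (seq_len, vocab, batch)
loss = criterion(logits.transpose(1, 2), targets)

# Equivalent layout: one row of class scores per character
loss_flat = criterion(logits.reshape(-1, vocab_size), targets.reshape(-1))

print(torch.allclose(loss, loss_flat))
```

Both calls average the per-character losses, so either layout could be used; the transposed one simply keeps the batch dimension explicit.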
      

In [6]:
# set model parameters
embedding_dim = 5
hidden_dim = 10
vocab_size = len(vocab) 
lr = 1e-3

criterion = torch.nn.CrossEntropyLoss()

# Initialize model
model = Decipher(vocab_size, embedding_dim, hidden_dim)

# Initialize optimizer: Adam is recommended
optimizer = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=1e-4)

num_epochs = 10

In [7]:
k = 10  # shift used for training; note that it differs from key = 2 above
for epoch in range(num_epochs):
    print('Epoch: {}'.format(epoch))
    for encrypted, original in encrypted_dataset(num_examples, k):

        scores = model(encrypted)
        original = original.unsqueeze(1)
        # Calculate loss
        loss = criterion(scores, original)
        # Zero grads
        optimizer.zero_grad()
        # Backpropagate
        loss.backward()
        # Update weights
        optimizer.step()
    # Loss on the last training example of the epoch
    print('Loss: {:6.4f}'.format(loss.item()))

    with torch.no_grad():
        matches, total = 0, 0
        for encrypted, original in encrypted_dataset(num_examples, k):
            # Compute a softmax over the outputs
            predictions = F.softmax(model(encrypted), 1)
            # Choose the character with the maximum probability (greedy decoding)
            _, batch_out = predictions.max(dim=1)
            # Remove batch
            batch_out = batch_out.squeeze(1)
            # Calculate accuracy
            matches += torch.eq(batch_out, original).sum().item()
            total += torch.numel(batch_out)
        accuracy = matches / total
        print('Accuracy: {:4.2f}%'.format(accuracy * 100))

Epoch: 0
Loss: 2.5849
Accuracy: 30.97%
Epoch: 1
Loss: 1.8316
Accuracy: 70.72%
Epoch: 2
Loss: 1.3102
Accuracy: 89.80%
Epoch: 3
Loss: 0.8673
Accuracy: 98.61%
Epoch: 4
Loss: 0.5550
Accuracy: 100.00%
Epoch: 5
Loss: 0.4002
Accuracy: 100.00%
Epoch: 6
Loss: 0.3031
Accuracy: 100.00%
Epoch: 7
Loss: 0.2130
Accuracy: 100.00%
Epoch: 8
Loss: 0.1517
Accuracy: 100.00%
Epoch: 9
Loss: 0.1405
Accuracy: 100.00%
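
The greedy decoding step from the evaluation loop can also be turned back into text. The sketch below is self-contained and therefore uses hand-built stand-in scores rather than a trained model; `greedy_decode` is a hypothetical helper that is not part of the original notebook. It assumes scores of shape (seq_len, vocab_size, 1), the layout returned by `Decipher.forward`.

```python
import torch

vocab = list(' -ABCDEFGHIJKLMNOPQRSTUVWXYZ')


def greedy_decode(scores):
    """Hypothetical helper: pick the highest-scoring character at each position.

    scores: tensor of shape (seq_len, vocab_size, 1).
    """
    indices = scores.argmax(dim=1).squeeze(1)  # (seq_len,)
    return ''.join(vocab[i] for i in indices.tolist())


# Stand-in scores: a 1.0 at the intended character of each position, 0.0 elsewhere
target = 'RNN IS NOT AI'
scores = torch.zeros(len(target), len(vocab), 1)
for t, ch in enumerate(target):
    scores[t, vocab.index(ch), 0] = 1.0

print(greedy_decode(scores))  # RNN IS NOT AI
```

Softmax preserves the ordering of the scores, so taking the argmax of the raw scores selects the same characters as taking it after the softmax.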
