# Reproduction of the paper [Visualizing and Understanding Recurrent Networks](https://arxiv.org/abs/1506.02078)
Project for the course [344.063 Special Topics Natural Language Processing with Deep Learning](https://kusss.jku.at/kusss/sz-lvadetail-overview.action?courseId=344063%2C2020S)

Authors: M. Vogl, F. Shalaby

Limitation: We only worked with the 3 MB `warandpeace.txt`. The Linux kernel dataset has grown from 400 MB at the time the paper was written to 800 MB, which would not fit on the GPU as a `LongTensor` that can be one-hot encoded.
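A rough estimate shows why the larger corpus is a problem. The sketch below assumes one byte per character and that the whole corpus is kept on the GPU as int64 class indices (8 bytes each, the element size of a `LongTensor`). The helper name `index_tensor_bytes` is made up for this illustration.

```python
BYTES_PER_INT64 = 8  # size of one LongTensor element


def index_tensor_bytes(n_chars: int) -> int:
    """Bytes needed to store one int64 class index per character."""
    return n_chars * BYTES_PER_INT64


for label, n_chars in (("3 MB corpus", 3_000_000), ("800 MB corpus", 800_000_000)):
    gb = index_tensor_bytes(n_chars) / 1e9
    print(f"{label}: {gb:.3f} GB of int64 indices")
# 3 MB corpus: 0.024 GB of int64 indices
# 800 MB corpus: 6.400 GB of int64 indices
```

Under these assumptions the indices alone take about 6.4 GB, which leaves little room for the model and activations on a card with, say, 8 GB of memory.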

Training the 3x4x6 grid (with early stopping and the parameters listed in config.txt) on the 3 MB `warandpeace.txt` took ~4 h on an RTX 2070 Super.

**Improvements to other implementations**
* Uses PyTorch and Lightning instead of Lua Torch, unlike the [original karpathy char-rnn](https://github.com/karpathy/char-rnn)
* Still works with a current PyTorch version, unlike the great [huanghao-code implementation](https://github.com/huanghao-code/VisRNN_ICLR_2016_Text)
* Shorter code that is easier to understand: the old implementation has 1500+ LOC, the new one 400 LOC (counted with <br>`projects find proj -name "*.py" -print -exec cat {} \; | wc -l`)
* This version extends the torch built-in [LSTM](https://pytorch.org/docs/master/generated/torch.nn.LSTM.html)/[GRU](https://pytorch.org/docs/master/generated/torch.nn.GRU.html)/[RNN](https://pytorch.org/docs/master/generated/torch.nn.RNN.html) with gate activation extraction, which is 10-100x faster than the other implementations. It also supports apex 16-bit precision, CUDA and optimized data loading.
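To make "gate activation extraction" concrete, here is a minimal plain-Python sketch of one LSTM step that returns the gate values alongside the new hidden and cell state. It is not the repository's implementation: the function name `lstm_step_with_gates` is hypothetical and the two bias vectors of `torch.nn.LSTM` are merged into one for brevity. The only thing it borrows from PyTorch is the documented row order of the stacked weights (input, forget, cell candidate, output).

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(W, x):
    """Multiply a matrix (list of rows) with a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def lstm_step_with_gates(x, h, c, W_ih, W_hh, b):
    """One LSTM step that also returns the gate activations.

    W_ih has shape (4*H, len(x)) and W_hh has shape (4*H, H). Rows are
    stacked in the order input, forget, cell candidate, output.
    b is a single merged bias vector of length 4*H (a simplification).
    """
    H = len(h)
    pre = [a + r + bias for a, r, bias in zip(matvec(W_ih, x), matvec(W_hh, h), b)]
    i = [sigmoid(v) for v in pre[0:H]]
    f = [sigmoid(v) for v in pre[H:2 * H]]
    g = [math.tanh(v) for v in pre[2 * H:3 * H]]
    o = [sigmoid(v) for v in pre[3 * H:4 * H]]
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new, {"input": i, "forget": f, "cell": g, "output": o}


# Tiny example: 2 inputs, 1 hidden unit, all-zero weights.
H, D = 1, 2
W_ih = [[0.0] * D for _ in range(4 * H)]
W_hh = [[0.0] * H for _ in range(4 * H)]
b = [0.0] * (4 * H)
h, c, gates = lstm_step_with_gates([1.0, 0.0], [0.0], [2.0], W_ih, W_hh, b)
print(gates)  # sigmoid gates are 0.5 and the candidate is 0.0 for zero pre-activations
```

Recording `i`, `f`, `g`, `o` and `c_new` at every time step gives the kind of per-unit traces that the gate and cell visualizations later in this notebook work with.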

This file assumes you have downloaded the models (too big for github) from [http://öä.eu/models.zip](http://öä.eu/models.zip).

Hyperparameters from config:

In [1]:
print(open("config.py").read())  # hyperparameters used

import torch

# linux = https://cs.stanford.edu/people/karpathy/char-rnn/
_FILE_NAMES = ["shakespeare"] #, "warandpeace"]
FILE_NAME = _FILE_NAMES[0]
FILE_PATH = f"data/{FILE_NAME}.txt"
MODEL_PATH = "models/" #"/content/gdrive/MyDrive/DL-project/" #
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
N_LAYERS = 2
SEQ_LEN = 70
HIDDEN_SIZE = 32
BATCH_SIZE = 128
PREDICT_SEQ_LEN = SEQ_LEN
LR = 0.01
CLIP = 5
DROPOUT = 0.0
MAX_EPOCHS = 100
EMBEDDING_DIM = 0  # 0 = one-hot encoding
MODEL_NAME = "lstm"
PRECISION = 32  # 16 if DEVICE == "cuda" else 32
SPLITS = (0, 90, 95, 100)
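The `SPLITS` tuple reads like a list of percentage boundaries. The following sketch shows one way such boundaries could cut a corpus into consecutive segments. This is an assumption about the intent, not the code in `dataloader.py`; the function name `split_by_percent` is hypothetical, and which segment serves as training, validation or test data is not implied here.

```python
def split_by_percent(data, boundaries=(0, 90, 95, 100)):
    """Cut data into consecutive segments at the given percentage boundaries."""
    n = len(data)
    cuts = [n * p // 100 for p in boundaries]
    return [data[a:b] for a, b in zip(cuts, cuts[1:])]


text = "x" * 1000  # stand-in for the corpus
parts = split_by_percent(text)
print([len(p) for p in parts])  # [900, 50, 50]
```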



In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from IPython.display import display, IFrame
# from matplotlib_venn import venn3

import dataloader
from config import *
from visualize import visualize_gate, visualize_cell
from net.charrnn import CharRNN
%matplotlib inline

(trl, tel, val), vocab = dataloader.load(FILE_PATH, DEVICE, SPLITS, BATCH_SIZE, SEQ_LEN, unique=True)

# Text generation

In [24]:
MODEL_PATH = "models/"

In [3]:
net = CharRNN.load_from_file("lstm", 3, 256).eval()
for method in "rand", "max", "softrand":
    print(method, net.predict("Hello ", 200, vocab, method=method), sep="\t")

rand	Hello 3RRGTF?QiXT$HRZKuKXKJSXJ$$3pNQKP
MP.OKzBAHQ$BGW..Fa!uABNI.XaQKY!.WV
:XB$Fxi;F?TKJK&:u
kX?yZF[KL!RfVR;jZEAV?XSGAcUP-xTJZ$vLB$C;v$YKPM[x-:G;R&A$DRiF?HDS$jK]Rs$NX3x!PjP,MyzBPMsQ:POLa;RyIWsAQ3!NQDFJX]Aww&
max	Hello the son of the common son,
That the son of the common son of the son,
And the son of the son of the son of the son,
And the son of the son of the son of the son,
And the son of the son of the son of t
softrand	Hello be
Than sure it to these and most humouraratant have a many,
Whilst repute it when every sport safets.'

CLIFFORD:
Os
A perfection moved power to your a's hell in rebel bemp that shame;
For we have ve
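The three outputs suggest how the methods differ: `rand` looks like uniform noise, `max` loops on the most likely continuation, and `softrand` produces varied but plausible text. A minimal sketch of that interpretation follows. It is an assumption based on the outputs, not the code behind `net.predict`; `pick_next` and `softmax` are names introduced for this example.

```python
import math
import random


def softmax(logits):
    """Turn raw scores into probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pick_next(logits, method, rng=random):
    """Choose the index of the next character from the network's output scores."""
    if method == "rand":      # ignore the scores entirely
        return rng.randrange(len(logits))
    if method == "max":       # greedy: always the most likely character
        return max(range(len(logits)), key=logits.__getitem__)
    if method == "softrand":  # sample in proportion to the softmax probabilities
        return rng.choices(range(len(logits)), weights=softmax(logits), k=1)[0]
    raise ValueError(f"unknown method: {method}")


logits = [0.1, 2.5, -1.0, 0.3]
print(pick_next(logits, "max"))  # 1
```

Greedy selection is deterministic, which explains the repeating "the son of the son" loop, while sampling from the softmax keeps the text varied at the cost of occasional misspellings.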


# Model performance table - Training | Test - Loss | Accuracy

In [4]:
model_names = ["lstm", "rnn", "gru"]
n_layers = [1, 2, 3]
hidden_sizes = [64, 128, 256, 512]

for loader, loadername in (trl, "train data"), (tel, "test data"):
    corrects, losses = [], []
    display(loadername)
    # Lowercase loop variables avoid overwriting the constants imported from config.
    for model_name in model_names:
        for layers in n_layers:
            for hidden_size in hidden_sizes:
                net = CharRNN.load_from_file(model_name, layers, hidden_size).eval()
                loss = [] #correct, loss = [], []
                with torch.no_grad():  # evaluation only, no gradients needed
                    for x, y in loader:
                        out = net(x)
#                         correct += (out.argmax(-1) == y.flatten()).tolist()
                        loss.append(torch.nn.functional.cross_entropy(out, y.flatten()).item())
#                 corrects.append(np.mean(correct))
                losses.append(np.mean(loss))

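#     for title, arr in (("accuracy", corrects), ("loss", losses)):
    title, arr = "loss", losses
    # losses is ordered (model, layers, hidden size) with hidden size varying
    # fastest, so reshape to (model x layers, hidden size) and transpose.
    arr = np.array(arr).reshape((len(model_names) * len(n_layers), len(hidden_sizes))).T
    columns = pd.MultiIndex.from_product([model_names, n_layers])
    df = pd.DataFrame(arr, columns=columns, index=hidden_sizes)
    df.columns.name = title
    display(df.round(3))

'train data'

| hidden size | lstm, 1 layer | lstm, 2 layers | lstm, 3 layers | rnn, 1 layer | rnn, 2 layers | rnn, 3 layers | gru, 1 layer | gru, 2 layers | gru, 3 layers |
|---|---|---|---|---|---|---|---|---|---|
| 64 | 1.577 | 1.49 | 1.485 | 1.746 | 1.651 | 1.646 | 1.604 | 1.497 | 1.526 |
| 128 | 1.45 | 1.339 | 1.387 | 1.596 | 1.563 | 2.154 | 1.451 | 1.385 | 1.402 |
| 256 | 1.337 | 1.313 | 1.411 | 1.601 | 3.308 | 3.322 | 1.393 | 1.405 | 1.46 |
| 512 | 1.202 | 1.496 | 1.308 | 2.45 | 3.339 | 3.337 | 1.445 | 1.703 | 1.453 |


'test data'

| hidden size | lstm, 1 layer | lstm, 2 layers | lstm, 3 layers | rnn, 1 layer | rnn, 2 layers | rnn, 3 layers | gru, 1 layer | gru, 2 layers | gru, 3 layers |
|---|---|---|---|---|---|---|---|---|---|
| 64 | 1.659 | 1.586 | 1.584 | 1.808 | 1.723 | 1.717 | 1.675 | 1.571 | 1.609 |
| 128 | 1.542 | 1.453 | 1.485 | 1.67 | 1.642 | 2.175 | 1.539 | 1.482 | 1.491 |
| 256 | 1.448 | 1.433 | 1.521 | 1.669 | 3.305 | 3.32 | 1.478 | 1.5 | 1.548 |
| 512 | 1.378 | 1.581 | 1.442 | 2.459 | 3.334 | 3.335 | 1.538 | 1.769 | 1.541 |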
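The losses above are mean cross-entropy values in nats, since `torch.nn.functional.cross_entropy` uses the natural logarithm. Converting them to perplexity or bits per character can make them easier to read. The helper names below are introduced for this example only.

```python
import math


def perplexity(loss_nats: float) -> float:
    """Effective number of equally likely characters the model chooses between."""
    return math.exp(loss_nats)


def bits_per_char(loss_nats: float) -> float:
    """Cross-entropy converted from nats to bits."""
    return loss_nats / math.log(2)


for loss in (1.4, 3.3):
    print(f"loss {loss}: perplexity {perplexity(loss):.1f}, "
          f"{bits_per_char(loss):.2f} bits/char")
```

A loss near 3.3 corresponds to a perplexity of about 27, roughly what guessing uniformly among 27 characters would give. That suggests the models with such losses learned little beyond a flat character distribution.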


# Gate Visualization (test set)

In [None]:
net = CharRNN.load_from_file("lstm", 3, 512).eval()
input_gates, forget_gates, cell_gates, output_gates, cell_states = net.extract_from_loader(tel)
visualize_gate(input_gates, forget_gates, output_gates)
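The paper summarizes gates by how often each unit is left-saturated (activation below 0.1) or right-saturated (activation above 0.9). The sketch below computes those two fractions per unit on plain lists. It illustrates the statistic only; the internals of `visualize_gate` are not shown in this notebook, and `saturation_fractions` is a hypothetical helper.

```python
def saturation_fractions(activations, low=0.1, high=0.9):
    """activations: one list of gate values over time per hidden unit.

    Returns one (left_fraction, right_fraction) pair per unit.
    """
    result = []
    for series in activations:
        n = len(series)
        left = sum(v < low for v in series) / n
        right = sum(v > high for v in series) / n
        result.append((left, right))
    return result


gate = [
    [0.05, 0.02, 0.50, 0.95],  # unit 0: mostly closed, sometimes open
    [0.99, 0.97, 0.93, 0.91],  # unit 1: almost always open
]
print(saturation_fractions(gate))  # [(0.5, 0.25), (0.0, 1.0)]
```

Plotting the right-saturated fraction against the left-saturated fraction for every unit gives a scatter plot in the style of the paper's gate saturation figure.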

# Cell visualization (test set)

In [3]:
net = CharRNN.load_from_file("lstm", 2, 512).eval()
# Index 3 selects the output gates (order: input, forget, cell, output gates, cell states).
output_gates = net.extract_from_loader(tel)[3, :, :, :]
print(output_gates.shape)
visualize_cell(output_gates[:, 50:1000], dataloader.decode(tel.dataset.data[50:1000], vocab))

(2, 225400, 512)


# Showcase of the RNN cell visualization
![screenshot](visualization/screenshot.png)