Imports

In [25]:
import string
import json

import torch
from torch import nn

import numpy as np

from labml import experiment, logger, lab
from labml_helpers.module import Module
from labml.analytics import ModelProbe
from labml.logger import Text, Style, inspect
from labml.utils.pytorch import get_modules
from labml.utils.cache import cache
from labml_helpers.datasets.text import TextDataset

from python_autocomplete.train import Configs
from python_autocomplete.evaluate import Predictor
from python_autocomplete.evaluate.beam_search import NextWordPredictionComplete

We load the model from a training run. For this demo I'm loading from a run I trained at home.

[![View Run](https://img.shields.io/badge/labml-experiment-brightgreen)](https://web.lab-ml.com/run?uuid=39b03a1e454011ebbaff2b26e3148b3d)

If you have a locally trained model, load it directly with:

```python
run_uuid = 'RUN_UUID'
checkpoint = None # Get latest checkpoint
```
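Setting `checkpoint = None` selects the latest checkpoint of the run. The rule amounts to the hypothetical helper below, which assumes checkpoints are identified by their training step. It only illustrates the selection and is not labml's API:

```python
from typing import List, Optional


def pick_checkpoint(saved_steps: List[int], checkpoint: Optional[int] = None) -> int:
    """Hypothetical helper: resolve `checkpoint=None` to the latest saved step."""
    if not saved_steps:
        raise ValueError("run has no saved checkpoints")
    if checkpoint is None:
        # None means "use the most recent checkpoint"
        return max(saved_steps)
    if checkpoint not in saved_steps:
        raise ValueError(f"checkpoint {checkpoint} not found")
    return checkpoint


print(pick_checkpoint([1000, 5000, 3000]))        # 5000
print(pick_checkpoint([1000, 5000, 3000], 3000))  # 3000
```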

`load_bundle` will download an archive with a saved checkpoint (pretrained model).

In [2]:
# run_uuid = 'a6cff3706ec411ebadd9bf753b33bae6'
# checkpoint = None

run_uuid, checkpoint = experiment.load_bundle(
    lab.get_path() / 'saved_checkpoint.tar.gz',
    url='https://github.com/lab-ml/python_autocomplete/releases/download/0.0.5/bundle.tar.gz')

We initialize the `Configs` object defined in [`train.py`](https://github.com/lab-ml/python_autocomplete/blob/master/python_autocomplete/train.py).

In [3]:
conf = Configs()

Create a new experiment in evaluation mode. In evaluation mode, no new training run is created.

In [4]:
experiment.evaluate()

Load the custom configurations (hyper-parameters) used in the training run.

In [5]:
custom_conf = experiment.load_configs(run_uuid)
custom_conf

{'epochs': 32,
 'is_token_by_token': True,
 'mem_len': 256,
 'model': 'transformer_xl_model',
 'n_layers': 6,
 'optimizer.learning_rate': 0.000125,
 'optimizer.optimizer': 'AdamW',
 'state_updater': 'transformer_memory',
 'text.batch_size': 12,
 'text.is_shuffle': False,
 'text.seq_len': 256,
 'text.tokenizer': 'bpe'}

Set the custom configurations. Uncomment the line below to run on the CPU.

In [6]:
# custom_conf['device.use_cuda'] = False

In [7]:
experiment.configs(conf, custom_conf)
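`experiment.configs(conf, custom_conf)` overrides the defaults in `Configs` with the values loaded from the run, and dotted keys such as `'text.batch_size'` address options nested in sub-configurations. The sketch below only illustrates that dotted-key addressing on plain objects; `apply_overrides` and `toy_conf` are hypothetical, not labml's implementation. labml additionally treats some string values (for example `'model': 'transformer_xl_model'`) as the name of an option that computes the value, which this sketch ignores.

```python
from types import SimpleNamespace
from typing import Any, Dict


def apply_overrides(config: Any, overrides: Dict[str, Any]) -> None:
    """Hypothetical sketch: set dotted keys like 'text.batch_size' on nested objects."""
    for key, value in overrides.items():
        *parents, leaf = key.split('.')
        target = config
        for name in parents:
            target = getattr(target, name)  # walk down to the nested sub-config
        setattr(target, leaf, value)


toy_conf = SimpleNamespace(n_layers=2, text=SimpleNamespace(batch_size=32, seq_len=512))
apply_overrides(toy_conf, {'n_layers': 6, 'text.batch_size': 12, 'text.seq_len': 256})
print(toy_conf.n_layers, toy_conf.text.batch_size, toy_conf.text.seq_len)  # 6 12 256
```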

Register the models for saving and loading, so that `conf.model` is loaded from the specified run.

In [8]:
experiment.add_pytorch_models({'model': conf.model})

Specify which run (and checkpoint) to load from

In [9]:
experiment.load(run_uuid, checkpoint)

Start the experiment

In [10]:
experiment.start()

<labml.internal.experiment.watcher.ExperimentWatcher at 0x7f93f10eaf90>

Initialize the `Predictor` defined in [`evaluate.py`](https://github.com/lab-ml/python_autocomplete/blob/master/python_autocomplete/evaluate.py).

We load `stoi` and `itos` from cache, so that we don't have to read the dataset to generate them. `stoi` maps each token to an integer index and `itos` maps an integer index back to its token. These indices select each token's embedding in the model.

In [11]:
p = Predictor(conf.model, conf.text.tokenizer,
              state_updater=conf.state_updater,
              is_token_by_token=conf.is_token_by_token)
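Since this run uses a BPE tokenizer (`'text.tokenizer': 'bpe'`), vocabulary entries are subword tokens rather than single characters. A toy sketch of how `stoi` and `itos` relate, with a made-up vocabulary (the real one comes from the trained tokenizer); the ids produced by `encode` are the row indices the embedding layer looks up:

```python
from typing import Dict, List

# Hypothetical toy vocabulary, not the model's real one
itos: List[str] = ['<pad>', 'def', ' ', 'self', '(', ')', ':']
stoi: Dict[str, int] = {s: i for i, s in enumerate(itos)}


def encode(tokens: List[str]) -> List[int]:
    """Map tokens to integer indices."""
    return [stoi[t] for t in tokens]


def decode(ids: List[int]) -> List[str]:
    """Map integer indices back to tokens."""
    return [itos[i] for i in ids]


ids = encode(['def', ' ', 'self'])
print(ids)          # [1, 2, 3]
print(decode(ids))  # ['def', ' ', 'self']
```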

Set the model to evaluation mode (this disables dropout)

In [12]:
_ = conf.model.eval()

Set up probing to extract attentions

In [13]:
probe = ModelProbe(conf.model)

A Python prompt to test completion.

In [14]:
PROMPT = """from typing import Optional, Tuple

import torch
from torch import nn

from labml_nn.lstm import LSTM
from python_autocomplete.models import AutoregressiveModel


class LstmModel(AutoregressiveModel):
    def __init__(self, *,
                 n_tokens: int,
                 embedding_size: int,
                 hidden_size: int,
                 n_layers: int):
        super().__init__()

        self.embedding = nn.Embedding(n_tokens, embedding_size)
        self.lstm = LSTM(input_size=embedding_size,
                         hidden_size=hidden_size,
                         n_layers=n_layers)
        self.fc = nn.Linear(hidden_size, n_tokens)

    def __call__(self, x: torch.Tensor, state: Optional[Tuple[torch.Tensor, torch.Tensor]]):
        # shape of x is [seq, batch, feat]
        x = self.embedding(x)
        out, (hn, cn) = self.lstm(x, state)
        logits = self.fc(out)

        return logits, (hn, cn)
"""

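Get a token. `get_token` predicts greedily, character by character (no beam search), until it finds an end-of-token character (a non-alphanumeric character). The cell below prepares the input: `rest` is the trailing part of `PROMPT` that `p.rstrip` strips off, and the token ids it returns are converted into a `[seq_len, 1]` tensor.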

In [15]:
stripped, prompt = p.rstrip(PROMPT)
rest = PROMPT[len(stripped):]
prediction_complete = NextWordPredictionComplete(rest, 5)
prompt = torch.tensor(prompt, dtype=torch.long).unsqueeze(-1)
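The greedy loop described above can be sketched as follows. `next_char` is a hypothetical stand-in for the model that returns the single most likely next character; unlike beam search, only one candidate is kept at each step. `fake_next_char` simply replays a fixed string so the example runs without a model.

```python
from typing import Callable


def greedy_token(prefix: str, next_char: Callable[[str], str], max_len: int = 20) -> str:
    """Append predicted characters until a non-alphanumeric one ends the token."""
    token = ''
    for _ in range(max_len):
        c = next_char(prefix + token)  # assumed to return the most likely next character
        if not c.isalnum():
            break  # end-of-token character reached
        token += c
    return token


TARGET = 'self.embedding = nn.Embedding'


def fake_next_char(text: str) -> str:
    # Stand-in for the model: continue TARGET after `text` (assumed to be a prefix of it)
    return TARGET[len(text)] if len(text) < len(TARGET) else '\n'


print(greedy_token('self.', fake_next_char))  # embedding
```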

## Let's analyze attentions

In [16]:
tokens = [p.tokenizer.itos[i[0]] for i in prompt]
inspect(tokens)

Let's run the Transformer XL model without cached memory (a `None` state) to get the full attention matrix.

In [17]:
inspect(p._get_predictions(prompt, None)[0])
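Transformer XL concatenates the cached memory with the current segment to form the keys, so with a memory of `mem_len` positions each attention map is `seq_len × (mem_len + seq_len)`. With no memory, queries and keys are the same prompt tokens and the map is square. A minimal sketch of which keys each query can see under a causal mask, assuming memory positions are always visible; `attention_mask` is a hypothetical helper, not part of labml_nn:

```python
from typing import List


def attention_mask(seq_len: int, mem_len: int = 0) -> List[List[bool]]:
    """mask[i][j] is True if query i may attend to key j.

    Keys are the mem_len cached positions followed by the seq_len current ones,
    so current position i sits at key index mem_len + i.
    """
    return [[j <= mem_len + i for j in range(mem_len + seq_len)]
            for i in range(seq_len)]


full = attention_mask(seq_len=4)                  # no memory: 4 x 4, lower-triangular
with_mem = attention_mask(seq_len=4, mem_len=2)   # 4 x 6: memory columns always visible
for row in with_mem:
    print(''.join('x' if v else '.' for v in row))
```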

The probe captures the outputs of the [attention softmax](https://nn.labml.ai/transformers/mha.html#section-34).

In [18]:
inspect(probe.forward_output['*softmax*'])

In [19]:
attn = probe.forward_output['*softmax*'].get_list()
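The probe is indexed with `'*softmax*'`, which reads as a shell-style wildcard over module names; `get_list()` then yields one captured tensor per matching module, that is, one per attention layer. Assuming matching behaves like Python's `fnmatch`, this is how such a pattern would select modules (the module names below are hypothetical):

```python
from fnmatch import fnmatch

# Hypothetical module names; the real ones depend on how the model is built
module_names = [
    'transformer.layers.0.self_attn.softmax',
    'transformer.layers.0.self_attn.query',
    'transformer.layers.1.self_attn.softmax',
    'generator',
]

# Keep every module whose name contains "softmax"
matched = [name for name in module_names if fnmatch(name, '*softmax*')]
print(matched)
```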

Attentions have shape `[query, key, batch, heads]`; the softmax normalizes over the key dimension (`dim=1`).

In [20]:
inspect(attn[0].shape)
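In the linked labml_nn multi-head attention, the softmax runs over the second dimension, so for a fixed batch entry and head each row of a map (one query position) is a probability distribution over key positions, with masked future positions at zero. A pure-Python sketch for a single head; `masked_softmax_rows` and the scores are illustrative, not taken from the model:

```python
import math
from typing import List


def masked_softmax_rows(scores: List[List[float]]) -> List[List[float]]:
    """Softmax each row over its columns; -inf scores get exactly zero weight."""
    out = []
    for row in scores:
        m = max(row)                           # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in row]  # math.exp(-inf) == 0.0
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


NEG = float('-inf')
# scores[query][key] for 3 tokens, causal mask already applied
scores = [[0.5, NEG, NEG],
          [1.0, 2.0, NEG],
          [0.0, 0.0, 0.0]]
toy_attn = masked_softmax_rows(scores)
print(toy_attn[0])  # [1.0, 0.0, 0.0]
```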

In [21]:
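# Reorder each layer's attention to [batch, heads, seq, seq], keep the first (only)
# sample, then stack the layers into [layers, heads, seq, seq]
attn_maps = torch.stack([a.permute(2, 3, 0, 1)[0] for a in attn])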
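The permute-and-stack line is easier to follow as shape bookkeeping. The sketch below tracks only shapes, in pure Python, using the same rule as `torch.Tensor.permute`; the sequence length and head count are hypothetical values (the run's config above sets `n_layers` to 6 but does not list the number of heads).

```python
from typing import List, Sequence


def permute_shape(shape: Sequence[int], dims: Sequence[int]) -> List[int]:
    # Same rule as torch.Tensor.permute: output dim k takes input dim dims[k]
    return [shape[d] for d in dims]


n_layers, seq_len, batch, heads = 6, 10, 1, 8          # seq_len and heads are hypothetical
layer_shape = [seq_len, seq_len, batch, heads]         # one captured softmax output
permuted = permute_shape(layer_shape, (2, 3, 0, 1))    # [batch, heads, seq, seq]
per_layer = permuted[1:]                               # [0] keeps the first sample
stacked = [n_layers] + per_layer                       # torch.stack adds a layer dim
print(stacked)  # [6, 8, 10, 10]
```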

In [22]:
inspect(attn_maps)

In [23]:
torch.save(attn_maps, 'attentions.pt')

In [26]:
with open('tokens.json', 'w') as f:
    # Token labels for both sequence axes of the attention maps
    json.dump({'src': tokens, 'dst': tokens}, f)
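The two files are meant to be used together: `attentions.pt` holds the stacked maps and `tokens.json` labels their two sequence axes, which are the same tokens here because the model ran without memory. A sketch of the JSON round trip with toy tokens written to a temporary directory (the real `attentions.pt` can be read back with `torch.load`):

```python
import json
import os
import tempfile

toy_tokens = ['def', ' ', '__init__', '(']  # hypothetical tokens

path = os.path.join(tempfile.mkdtemp(), 'tokens.json')
with open(path, 'w') as f:
    json.dump({'src': toy_tokens, 'dst': toy_tokens}, f)

with open(path) as f:
    loaded = json.load(f)

# Both axes carry the same labels when no memory is used
print(loaded['src'] == loaded['dst'] == toy_tokens)  # True
```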