[![Github](https://img.shields.io/github/stars/lab-ml/python_autocomplete?style=social)](https://github.com/lab-ml/python_autocomplete)
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/lab-ml/python_autocomplete/blob/master/notebooks/evaluate.ipynb)

# Evaluate a model trained on predicting Python code

This notebook evaluates a model trained on Python code.

Here's a link to the [training notebook](https://github.com/lab-ml/python_autocomplete/blob/master/notebooks/train.ipynb)
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/lab-ml/python_autocomplete/blob/master/notebooks/train.ipynb)

### Install dependencies

In [None]:
!pip install labml labml_python_autocomplete

Imports

In [1]:
import string

import torch
from torch import nn

from labml import experiment, logger, lab
from labml_helpers.module import Module
from labml.logger import Text, Style
from labml.utils.pytorch import get_modules
from labml.utils.cache import cache
from labml_helpers.datasets.text import TextDataset

from python_autocomplete.train import Configs
from python_autocomplete.evaluate import evaluate, anomalies, complete, Predictor

We load the model from a training run. For this demo I'm loading from a run I trained at home.

[![View Run](https://img.shields.io/badge/labml-experiment-brightgreen)](https://web.lab-ml.com/run?uuid=39b03a1e454011ebbaff2b26e3148b3d)

If you have a locally trained model, load it directly with:

```python
run_uuid = 'RUN_UUID'
checkpoint = None # Get latest checkpoint
```

`load_bundle` will download an archive with a saved checkpoint (pretrained model).

In [1]:
run_uuid, checkpoint = experiment.load_bundle(
    lab.get_path() / 'saved_checkpoint.tar.gz',
    url='https://github.com/lab-ml/python_autocomplete/releases/download/0.0.4/transformer_checkpoint.tar.gz')

We initialize the `Configs` object defined in [`train.py`](https://github.com/lab-ml/python_autocomplete/blob/master/python_autocomplete/train.py).

In [3]:
conf = Configs()

Create a new experiment in evaluation mode. In evaluation mode, no new training run is created.

In [None]:
experiment.evaluate()

Load custom configurations/hyper-parameters used in the training run.

In [4]:
custom_conf = experiment.load_configs(run_uuid)
custom_conf

{'batch_size': 12,
 'epochs': 32,
 'model': 'transformer_model',
 'n_layers': 6,
 'optimizer.learning_rate': 1.0,
 'optimizer.optimizer': 'Noam',
 'seq_len': 512,
 'train_loader': 'shuffled_train_loader',
 'valid_loader': 'shuffled_valid_loader'}

Set the custom configurations

In [6]:
experiment.configs(conf, custom_conf)

Set models for saving and loading. This will load `conf.model` from the specified run.

In [7]:
experiment.add_pytorch_models({'model': conf.model})

Specify which run to load from

In [8]:
experiment.load(run_uuid, checkpoint)

Start the experiment

In [9]:
experiment.start()

<labml.internal.experiment.watcher.ExperimentWatcher at 0x7f655c41a400>

Initialize the `Predictor` defined in [`evaluate.py`](https://github.com/lab-ml/python_autocomplete/blob/master/python_autocomplete/evaluate.py).

We load `stoi` and `itos` from cache so that we don't have to read the dataset to generate them. `stoi` maps each character to an integer index, and `itos` maps each integer index back to its character. These indexes select the model's embedding for each character.

In [10]:
p = Predictor(conf.model, cache('stoi', lambda: conf.text.stoi), cache('itos', lambda: conf.text.itos))
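
For intuition, the two maps can be built from any text as in the sketch below. This is a minimal illustration, not the code used by the dataset class; the helper `build_vocab` and the sorted ordering of characters are assumptions made here.

```python
from typing import Dict, List, Tuple


def build_vocab(text: str) -> Tuple[Dict[str, int], List[str]]:
    """Build character<->index maps from a string (hypothetical helper)."""
    # Sort the distinct characters so the mapping is deterministic
    itos = sorted(set(text))
    # Invert the list: character -> its position in `itos`
    stoi = {ch: i for i, ch in enumerate(itos)}
    return stoi, itos


stoi, itos = build_vocab("def f(x):\n    return x\n")

# Encode a string to integer indexes and decode it back
encoded = [stoi[ch] for ch in "def"]
decoded = ''.join(itos[i] for i in encoded)
print(encoded, repr(decoded))
```

Because `itos` is a list and `stoi` is its inverse, `itos[stoi[ch]] == ch` holds for every character in the vocabulary.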

Set model to evaluation mode

In [None]:
_ = conf.model.eval()

A Python prompt to test completion.

In [11]:
PROMPT = """from torch import nn

from labml_helpers.module import Module
from labml_nn.lstm import LSTM


class LSTM(Module):
    def __init__(self, *,
                 n_tokens: int,
                 embedding_size: int,
                 hidden_size: int,
                 n_layers: int):"""

Get a token. `get_token` predicts greedily, character by character (no beam search), until it finds an end-of-token character (a non-alphanumeric character).

In [12]:
%%time
res = p.get_token(PROMPT)
print('"' + res + '"')

"
        super"
CPU times: user 950 ms, sys: 34.7 ms, total: 984 ms
Wall time: 254 ms
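
The greedy loop can be sketched without the real model. In the snippet below, `greedy_token` and `toy_next_char_probs` are hypothetical names, and the toy "model" simply replays a fixed string. The stopping rule (keep leading separators, then stop at the first non-alphanumeric character after at least one alphanumeric one) is an assumption chosen to be consistent with the output above; the actual rule is in `evaluate.py`.

```python
from typing import Callable, Dict

TARGET = "x = foo(bar)"


def toy_next_char_probs(text: str) -> Dict[str, float]:
    # Stand-in for the model: all probability on the next character of TARGET
    if len(text) < len(TARGET):
        return {TARGET[len(text)]: 1.0}
    return {'\n': 1.0}


def greedy_token(prompt: str,
                 next_char_probs: Callable[[str], Dict[str, float]],
                 max_len: int = 64) -> str:
    """Extend `prompt` one character at a time until a token boundary."""
    token = ''
    seen_alnum = False
    while len(token) < max_len:
        probs = next_char_probs(prompt + token)
        # Greedy step: take the single most likely character, no beam search
        ch = max(probs, key=probs.get)
        if ch.isalnum():
            seen_alnum = True
        elif seen_alnum:
            # Assumed end-of-token rule: a separator after token content
            break
        token += ch
    return token


print(greedy_token("x = ", toy_next_char_probs))     # foo
print(greedy_token("x = foo", toy_next_char_probs))  # (bar
```

Each call makes one model evaluation per generated character, which is why a single token takes a noticeable fraction of a second with the real model.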


Try another token

In [13]:
res = p.get_token(PROMPT + res)
print('"' + res + '"')

"(LSTM"


Load a sample Python file to test our model

In [14]:
with open(str(lab.get_data_path() / 'sample.py'), 'r') as f:
    sample = f.read()
print(sample[-50:])

ckpoint()


if __name__ == '__main__':
    main()



## Test the model on a sample Python file

The `evaluate` function, defined in
[`evaluate.py`](https://github.com/lab-ml/python_autocomplete/blob/master/python_autocomplete/evaluate.py),
predicts token by token using the `Predictor` and simulates editor autocompletion.

Colors:
* <span style="color:yellow">yellow</span>: the token predicted is wrong and the user needs to type that character.
* <span style="color:blue">blue</span>: the token predicted is correct and the user selects it with a special key press, such as TAB or ENTER.
* <span style="color:green">green</span>: autocompleted characters based on the prediction

In [15]:
%%time
evaluate(p, sample)

CPU times: user 1min 59s, sys: 62.9 ms, total: 1min 59s
Wall time: 1min 23s


`accuracy` is the fraction of characters predicted correctly. `key_strokes` is the number of key strokes required to write the code with the help of the model, and `length` is the number of characters in the code, that is, the number of key strokes required without the model.
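
To make the key stroke accounting concrete, here is a minimal sketch of the simulation under stated assumptions: an accepted prediction costs one key press regardless of its length, and every other character costs one key press. The names `count_key_strokes` and `always_print` are hypothetical, and the sketch covers only `key_strokes` and `length`, not how `evaluate` computes `accuracy`.

```python
from typing import Callable, Dict


def count_key_strokes(text: str,
                      predict_token: Callable[[str], str]) -> Dict[str, int]:
    """Simulate typing `text` with token suggestions (hypothetical helper)."""
    i = 0
    key_strokes = 0
    while i < len(text):
        token = predict_token(text[:i])
        if token and text.startswith(token, i):
            # Correct suggestion: one key press accepts the whole token
            i += len(token)
        else:
            # Wrong suggestion: the user types the next character
            i += 1
        key_strokes += 1
    return {'key_strokes': key_strokes, 'length': len(text)}


def always_print(prefix: str) -> str:
    # Toy predictor that suggests the same token at every position
    return 'print'


stats = count_key_strokes('print(1)\nprint(2)', always_print)
print(stats)  # {'key_strokes': 9, 'length': 17}
```

Each of the two `print` tokens is accepted with a single key press, so 17 characters are written with 9 key strokes.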

*Note that this sample is a classic MNIST example, and the model has most likely overfitted to similar code (except for its use of [LabML](https://github.com/lab-ml/labml) 😛).*

## Test anomalies in code

We run the model through the same sample code and visualize the probability it assigns to each character.
<span style="color:green">green</span> means the probability of that character is high and
<span style="color:red">red</span> means the probability is low.

In [16]:
anomalies(p, sample)
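
The idea behind this view can be sketched with a toy model: score every character by the probability the model gave it, then map that probability to a color. The names `char_probabilities`, `color_for`, and `toy_next_char_probs` are hypothetical, and both the linear red-to-green blend and the 0.5 cut-off are assumptions, not necessarily what `anomalies` does.

```python
from typing import Callable, Dict, List, Tuple

EXPECTED = "total = a + b"


def toy_next_char_probs(prefix: str) -> Dict[str, float]:
    # Stand-in model: 0.9 on the character EXPECTED has at this position
    if len(prefix) < len(EXPECTED):
        return {EXPECTED[len(prefix)]: 0.9}
    return {}


def char_probabilities(text: str,
                       next_char_probs: Callable[[str], Dict[str, float]]
                       ) -> List[Tuple[str, float]]:
    """Probability assigned to each actual character, given the text before it."""
    return [(ch, next_char_probs(text[:i]).get(ch, 0.0))
            for i, ch in enumerate(text)]


def color_for(p: float) -> str:
    # Assumed linear blend: p = 0 is pure red, p = 1 is pure green
    return f'rgb({round(255 * (1 - p))}, {round(255 * p)}, 0)'


scored = char_probabilities("total = a - b", toy_next_char_probs)
# Characters the model found unlikely are the candidate anomalies
anomalous = [(i, ch) for i, (ch, p) in enumerate(scored) if p < 0.5]
print(anomalous)  # [(10, '-')]
```

Only the `-` that replaced the expected `+` gets a low probability, so it is the one character that would be drawn in red.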

Here we try to autocomplete 100 characters.

In [17]:
sample = """import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torch.utils.data
from torchvision import datasets, transforms

from labml import lab


class Model(nn.Module):
"""

complete(p, sample, 100)