[![Github](https://img.shields.io/github/stars/labmlai/annotated_deep_learning_paper_implementations?style=social)](https://github.com/labmlai/annotated_deep_learning_paper_implementations)
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/transformers/basic/autoregressive_experiment.ipynb)

## Transformer Experiment

This notebook trains a simple transformer with
[multi-headed attention](https://nn.labml.ai/transformers/mha.html),
introduced in [Attention Is All You Need](https://arxiv.org/abs/1706.03762),
on an autoregressive NLP task using the Tiny Shakespeare dataset.
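
To make the autoregressive setup concrete, here is a minimal sketch of character-level tokenization and next-character targets. It is an illustration only and does not use the `labml_nn` data pipeline; the toy string, the `stoi`/`itos` names, and the small `seq_len` are assumptions chosen for readability.

```python
# Illustrative only: a toy character-level autoregressive setup.
text = "It is a tale"

# Build a character vocabulary (sorted so the mapping is deterministic)
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}  # character -> integer id
itos = {i: ch for ch, i in stoi.items()}      # integer id -> character

# Encode the text as a list of integer ids
ids = [stoi[ch] for ch in text]

# Toy context size (the experiment itself sets `seq_len` in its configs)
seq_len = 8

# The input is a window of characters; the target is the same window
# shifted one position to the left, so position t predicts character t + 1.
x = ids[:seq_len]
y = ids[1:seq_len + 1]


def decode(token_ids):
    """Map a list of integer ids back to a string."""
    return ''.join(itos[i] for i in token_ids)


print(repr(decode(x)))
print(repr(decode(y)))
```

Each target position holds the character that follows the corresponding input position, which is what the model is trained to predict.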

### Install the packages

In [1]:
!pip install labml-nn --quiet


### Imports

In [2]:
import torch
import torch.nn as nn

from labml import experiment
from labml.configs import option
from labml_nn.transformers.basic.autoregressive_experiment import Configs

### Create an experiment

In [3]:
experiment.create(name="transformer", writers={'screen'})

### Configurations

In [4]:
conf = Configs()

Set the experiment configurations, passing a dictionary of values that override the defaults

In [5]:
experiment.configs(conf, {
    # Use character level tokenizer
    'tokenizer': 'character',
    # Prompt separator is blank
    'prompt_separator': '',
    # Starting prompt for sampling
    'prompt': 'It is ',
    # Use Tiny Shakespeare dataset
    'text': 'tiny_shakespeare',

    # Use a context size of $512$
    'seq_len': 512,
    # Train for 32 epochs
    'epochs': 32,
    # Batch size $16$
    'batch_size': 16,
    # Switch between training and validation $10$ times
    # per epoch
    'inner_iterations': 10,

    # Model size
    'd_model': 256,
    'transformer.n_heads': 16,
    'transformer.ffn.d_ff': 1024,

    # Use [Noam optimizer](../../optimizers/noam.html)
    'optimizer.optimizer': 'Noam',
    'optimizer.learning_rate': 1.,
})
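
The configuration above selects the Noam optimizer. As a rough guide to what that implies, the sketch below computes the learning-rate schedule described in Attention Is All You Need: a linear warm-up followed by inverse square-root decay, scaled by `d_model ** -0.5`. This is a standalone illustration, not the `labml_nn` implementation. The function name `noam_lr`, the `warmup=4000` value (the paper's setting), and treating `optimizer.learning_rate` as the multiplier `factor` are assumptions.

```python
def noam_lr(step, d_model=256, warmup=4000, factor=1.0):
    """Learning rate at a given step under the Noam schedule (sketch).

    `warmup` and `factor` are assumed values for illustration; the
    experiment's actual defaults may differ.
    """
    step = max(step, 1)  # avoid raising 0 to a negative power
    return factor * d_model ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)


# The rate rises linearly during warm-up, peaks at `warmup`, then decays
for s in (1, 1000, 4000, 16000):
    print(s, noam_lr(s))
```

Under this schedule the peak rate depends on `d_model` and the warm-up length, which is why the base learning rate can be left at `1.` and used purely as a scaling factor.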

Set PyTorch models for loading and saving

In [6]:
experiment.add_pytorch_models({'model': conf.model})

### Start the experiment and run the training loop

In [None]:
# Start the experiment
with experiment.start():
    conf.run()