"""
---
title: Rotary Positional Embeddings (RoPE) Experiment
summary: This experiment trains a transformer model with Rotary Positional Embeddings (RoPE) on the Tiny Shakespeare dataset.
---

# Rotary Positional Embeddings (RoPE) Experiment

This is an annotated PyTorch experiment to train a transformer model with Rotary Positional Embeddings (RoPE).
"""

from labml import experiment
from labml.configs import option, calculate
from labml_nn.transformers import TransformerConfigs
from labml_nn.transformers.basic.autoregressive_experiment import AutoregressiveTransformer, Configs


# ### Rotary PE attention
def _rotary_pe_mha(c: TransformerConfigs):
    from labml_nn.transformers.rope import RotaryPEMultiHeadAttention
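    # The third argument (`rope_percentage` in labml_nn) is the fraction of each
    # head's features that RoPE rotates; `1.` applies the rotation to all of them
    return RotaryPEMultiHeadAttention(c.n_heads, c.d_model, 1.)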


# Configuration options
calculate(TransformerConfigs.encoder_attn, 'rotary', _rotary_pe_mha)
calculate(TransformerConfigs.decoder_attn, 'rotary', _rotary_pe_mha)
calculate(TransformerConfigs.decoder_mem_attn, 'rotary', _rotary_pe_mha)


@option(Configs.model, 'rotary_pe_transformer')
def _model(c: Configs):
    """
    Create an autoregressive model and move it to the configured device
    """
    m = AutoregressiveTransformer(c.transformer.encoder,
                                  c.transformer.src_embed,
                                  c.transformer.generator).to(c.device)

    return m


def main():
    # Create experiment
    experiment.create(name="rotary_pe_transformer", writers={'screen'})
    # Create configs
    conf = Configs()
    # Override configurations
    experiment.configs(conf, {
        # No fixed positional embeddings
        'transformer.src_embed': 'no_pos',
        'transformer.tgt_embed': 'no_pos',

        # Encoder with RoPE
        'transformer.encoder_attn': 'rotary',

        # Use the model option registered above
        'model': 'rotary_pe_transformer',

        # Use character level tokenizer
        'tokenizer': 'character',
        # Prompt separator is blank
        'prompt_separator': '',
        # Starting prompt for sampling
        'prompt': 'It is ',
        # Use Tiny Shakespeare dataset
        'text': 'tiny_shakespeare',

        # Use a context size of $512$
        'seq_len': 512,
        # Train for 32 epochs
        'epochs': 32,
        # Batch size $4$
        'batch_size': 4,
        # Switch between training and validation for $10$ times
        # per epoch
        'inner_iterations': 10,

        # Model size
        'd_model': 128,
        'transformer.ffn.d_ff': 512,
        'transformer.n_heads': 16,
        'transformer.dropout': 0.0,

        # Use [Noam optimizer](../../optimizers/noam.html)
        'optimizer.optimizer': 'Noam',
        'optimizer.learning_rate': 1.,

        'dataloader_shuffle_with_replacement': True
    })

    # Set models for saving and loading
    experiment.add_pytorch_models({'model': conf.model})

    # Start the experiment
    with experiment.start():
        # Run training
        conf.run()


#
if __name__ == '__main__':
    main()

This code sets up a LabML experiment that trains a Transformer model with Rotary Positional Embeddings (RoPE) on a text dataset. Below is a step-by-step explanation of the code:

1. **Importing Libraries**:
   - The code imports `experiment` from `labml`, `option` and `calculate` from `labml.configs`, and `TransformerConfigs`, `AutoregressiveTransformer`, and `Configs` from `labml_nn`. These provide the tools for configuring and running the experiment.

2. **Configuring Transformer with RoPE**:
   - The `_rotary_pe_mha` function builds a `RotaryPEMultiHeadAttention` module from the configured number of heads (`c.n_heads`) and model dimension (`c.d_model`). It is the factory that supplies RoPE attention to the Transformer.

3. **Configuration Options**:
   - The `calculate` function from LabML registers `_rotary_pe_mha` under the option name `'rotary'` for three configuration items: `TransformerConfigs.encoder_attn`, `TransformerConfigs.decoder_attn`, and `TransformerConfigs.decoder_mem_attn`.
   - Registering an option does not select it. It only makes `'rotary'` available as a choice; this experiment later selects it for `encoder_attn` only.

4. **Model Configuration**:
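   - The `_model` function is registered with `@option` as the `'rotary_pe_transformer'` choice for `Configs.model`. It builds an `AutoregressiveTransformer` from `c.transformer.encoder`, `c.transformer.src_embed`, and `c.transformer.generator`, then moves it to `c.device`. The function performs no explicit weight initialization, despite what its original docstring says.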

5. **Main Function**:
   - The `main` function is defined as the main entry point of the script.

6. **Creating an Experiment**:
   - `experiment.create` creates a LabML experiment named "rotary_pe_transformer", and `writers={'screen'}` sends the experiment logs to the screen.

7. **Configuration Setup**:
   - An instance of the `Configs` class, imported from `labml_nn.transformers.basic.autoregressive_experiment`, is created as `conf`. It holds the configuration options for the Transformer model and the training process.

8. **Overriding Configurations**:
   - The `experiment.configs` function overrides specific configurations within the `conf` object.
   - Notable settings include disabling fixed positional embeddings (`'no_pos'`) for the source and target embeddings, selecting `'rotary'` attention for the encoder, selecting the `'rotary_pe_transformer'` model, character-level tokenization, the Tiny Shakespeare dataset, a sequence length of 512, 32 epochs, a batch size of 4, and the Noam optimizer with a learning rate of 1.

9. **Adding Models to Experiment**:
   - `experiment.add_pytorch_models` is used to specify the PyTorch model to save and load during training. In this case, it adds the model defined in the `conf` object.

10. **Starting Experiment**:
    - The `experiment.start()` block begins the LabML experiment.

11. **Training Execution**:
    - Inside the experiment block, `conf.run()` is executed to initiate the training process with the configured options.

12. **Main Function Execution**:
    - The `if __name__ == '__main__':` guard calls `main` only when the script is executed directly, not when it is imported as a module.

In summary, this code sets up a LabML experiment that trains a Transformer model with Rotary Positional Embeddings on the Tiny Shakespeare dataset. It configures the model, the data, and the training process, and then runs the training within the LabML framework.
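
For intuition about what the rotary attention module does with positions, here is a minimal pure-Python sketch of the rotation itself. It is an illustration under stated assumptions, not the labml_nn code: it rotates adjacent feature pairs as in the RoFormer formulation (a library may pair features differently), uses a base of 10,000, and the helper names `rope` and `dot` are made up for this example.

```python
import math
from typing import List


def rope(x: List[float], position: int, base: float = 10_000.0) -> List[float]:
    """Rotate each adjacent feature pair of `x` by an angle proportional to `position`."""
    d = len(x)
    assert d % 2 == 0, "feature dimension must be even"
    out: List[float] = []
    for i in range(0, d, 2):
        # Pair k = i / 2 uses frequency base^(-2k/d) = base^(-i/d)
        theta = base ** (-i / d)
        angle = position * theta
        cos, sin = math.cos(angle), math.sin(angle)
        # Standard 2D rotation of the pair (x[i], x[i + 1])
        out.append(x[i] * cos - x[i + 1] * sin)
        out.append(x[i] * sin + x[i + 1] * cos)
    return out


def dot(a: List[float], b: List[float]) -> float:
    """Plain dot product, standing in for an attention score."""
    return sum(p * q for p, q in zip(a, b))


# Arbitrary example query and key vectors (assumed values)
q = [0.3, -1.2, 0.8, 0.5]
k = [1.0, 0.4, -0.7, 0.2]

# Both pairs of positions are 3 apart, so the scores should match
score_a = dot(rope(q, 5), rope(k, 2))
score_b = dot(rope(q, 13), rope(k, 10))
print(abs(score_a - score_b) < 1e-9)  # prints True
```

Because the score between a rotated query and a rotated key depends only on their position difference, the model receives relative position information inside attention. This is consistent with the experiment setting the token embeddings to `'no_pos'`.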

References: https://nn.labml.ai/transformers/rope/experiment.html and https://github.com/labmlai/annotated_deep_learning_paper_implementations/tree/master/labml_nn/transformers/rope