# Verified Integer Mathematics in Transformers - Train the Model

This CoLab defines and trains a Transformer model that performs integer addition, subtraction and multiplication, e.g. 133357+182243=+0315600, 123450-345670=-0222220 and 000345*000823=+283935. Each digit is a separate token. For 6-digit questions, the model is given 14 "question" (input) tokens and must then predict the corresponding 8 "answer" (output) tokens.
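
As a rough illustration of the token layout described above, the sketch below formats an addition or subtraction question with one character per token. The helper name `format_question` is hypothetical and is not part of QuantaTools. It assumes the answer is a sign followed by `n_digits + 1` zero-padded digits, as in the addition example.

```python
def format_question(x, y, op, n_digits=6):
    """Return question plus answer as a string, one character per token."""
    if op == "+":
        result = x + y
    elif op == "-":
        result = x - y
    else:
        raise ValueError("this sketch only covers '+' and '-'")

    sign = "+" if result >= 0 else "-"
    # Question: zero-padded operands, the operator and "=" (2 * n_digits + 2 tokens)
    question = f"{x:0{n_digits}d}{op}{y:0{n_digits}d}="
    # Answer: sign plus n_digits + 1 zero-padded digits (n_digits + 2 tokens)
    answer = f"{sign}{abs(result):0{n_digits + 1}d}"
    return question + answer


text = format_question(133357, 182243, "+")
print(text)       # 133357+182243=+0315600
print(len(text))  # 22 tokens: 14 question + 8 answer
```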


This CoLab trains the model and saves the results to temporary CoLab files. Useful models are then manually copied to HuggingFace.

## Tips for using the Colab
 * You can run and alter the code in this CoLab notebook yourself in Google CoLab ( https://colab.research.google.com/ ).
 * To run the notebook, in Google CoLab, **you will need to** go to Runtime > Change Runtime Type and select GPU as the hardware accelerator.
 * Some graphs are interactive!
 * Use the table of contents pane in the sidebar to navigate.
 * Collapse irrelevant sections with the dropdown arrows.
 * Search the page using the search in the sidebar, not CTRL+F.

# Part 0: Import libraries
Imports standard libraries. This section can be skipped.

Imports the "verified_transformers" public library as "qt" (the package itself is named QuantaTools). This library is specific to this CoLab's "QuantaTool" approach to transformer analysis. Refer to https://github.com/PhilipQuirke/verified_transformers/blob/main/README.md for more detail.

In [None]:
# Janky code to do different setup when run in a Colab notebook vs VSCode
DEVELOPMENT_MODE = True
try:
    import google.colab
    IN_COLAB = True
    print("Running as a Colab notebook")

    !pip install kaleido
    !pip install transformer_lens
    !pip install circuitsvis
    !pip install torchtyping
    !pip install transformers

except ImportError:
    IN_COLAB = False
    print("Running as a Jupyter notebook - intended for development only!")
    from IPython import get_ipython

    ipython = get_ipython()
    # Code to automatically update the HookedTransformer code as it's edited without restarting the kernel
    ipython.run_line_magic("load_ext", "autoreload")
    ipython.run_line_magic("autoreload", "2")

In [None]:
# Plotly needs a different renderer for VSCode/Notebooks vs Colab argh
import kaleido
import plotly.io as pio

if IN_COLAB or not DEVELOPMENT_MODE:
    pio.renderers.default = "colab"
else:
    pio.renderers.default = "notebook_connected"
print(f"Using renderer: {pio.renderers.default}")

import plotly.express as px
import plotly.graph_objects as go

In [None]:
pio.templates['plotly'].layout.xaxis.title.font.size = 20
pio.templates['plotly'].layout.yaxis.title.font.size = 20
pio.templates['plotly'].layout.title.font.size = 30

In [None]:
import json
import requests
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import numpy as np
import tqdm.auto as tqdm
import random
from torch.utils.data import DataLoader
from transformers import AutoModelForCausalLM, AutoConfig, AutoTokenizer
import circuitsvis as cv
import math
from huggingface_hub import hf_hub_download

In [None]:
import transformer_lens
import transformer_lens.utils as utils
from transformer_lens.hook_points import (
    HookedRootModule,
    HookPoint,
)  # Hooking utilities
from transformer_lens import HookedTransformer, HookedTransformerConfig, FactoredMatrix, ActivationCache

In [None]:
!pip install --upgrade git+https://github.com/PhilipQuirke/verified_transformers.git
import QuantaTools as qt

# Part 1A: Configuration


This CoLab can be configured to:
- Train a model using the traditional approach. For example, add_d6_l2_h3_t15K_s372001.pth is trained from scratch on 100% addition questions to give an "Addition" model with a very low loss (9e-9).
- Train a "mixed" model to do two tasks by inserting a "known good" model into the untrained composite model. For example, initialising ins1_mix_d6_l3_h4_t20K_s372001.pth with add_d6_l2_h3_dm510_dh170_t15K_s372001, and then training it on 80% subtraction questions and 20% addition questions.

If we are inserting (initialising) the model with existing addition model weights, they are loaded from HuggingFace.

We want to train very-low-loss models. This aids the subsequent analysis step by removing noise and red herrings.
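
The model names above encode the main hyperparameters. The sketch below shows one way such a name could be split into fields. It is not the QuantaTools `parse_model_name` implementation, and the function name `parse_model_name_sketch` is hypothetical. It assumes `d` = digits, `l` = layers, `h` = heads, `t` = training steps (with a `K` suffix meaning thousands) and `s` = training seed. The `dm` and `dh` fields are read as plain integers without assuming what they mean.

```python
import re

def parse_model_name_sketch(name):
    """Split a name such as 'add_d6_l2_h3_t15K_s372001' into numeric fields."""
    fields = {}
    for part in name.split("_"):
        match = re.fullmatch(r"(dm|dh|d|l|h|t|s)(\d+)(K?)", part)
        if match is None:
            continue  # a prefix such as "add", "sub", "mix" or "ins1"
        key, number, kilo = match.groups()
        fields[key] = int(number) * (1000 if kilo else 1)
    return fields


fields = parse_model_name_sketch("add_d6_l2_h3_t15K_s372001")
print(fields)  # {'d': 6, 'l': 2, 'h': 3, 't': 15000, 's': 372001}
```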

In [None]:
# Main configuration class for main model creation and training.
# Derived from MathsConfig > AlgoConfig > UsefulConfig > ModelConfig
class ColabConfig(qt.MathsConfig):

  def __init__(self):
    super().__init__()

    self.main_model = None

    # Batch size for training
    self.batch_size = 64


  def to_dict(self):
    return {
      "n_layers": self.n_layers,
      "n_heads": self.n_heads,
      "d_vocab": self.d_vocab,
      "d_mlp": self.d_mlp,
      "d_head": self.d_head,
      "training_seed": self.training_seed,
      "n_digits": self.n_digits,
      "n_ctx": self.n_ctx,
      "act_fn": self.act_fn,
      "batch_size": self.batch_size,
      "n_training_steps": self.n_training_steps,
      "lr": self.lr,
      "weight_decay": self.weight_decay,
      "perc_mult": self.perc_mult,
      "perc_sub": self.perc_sub,
      "insert_late": self.insert_late,
      "insert_mode": self.insert_mode,
      "insert_n_layers": self.insert_n_layers,
      "insert_n_heads": self.insert_n_heads,
      "insert_training_seed": self.insert_training_seed,
      "insert_n_training_steps": self.insert_n_training_steps,
    }

In [None]:
# Singleton QuantaTool "main" configuration class. MathsConfig is derived from the chain AlgoConfig > UsefulConfig > ModelConfig
cfg = ColabConfig()


# Which model do we want to train? Uncomment one line:

#cfg.model_name = "" # Use default configuration specified in cfg

# Addition models
cfg.model_name = "add_d5_l1_h3_t15K_s372001"  # AddAccuracy=Two9s. Inaccurate as only has one layer
#cfg.model_name = "add_d5_l2_h3_t15K_s372001"  # AddAccuracy=Six9s
#cfg.model_name = "add_d6_l2_h3_t15K_s372001"  # AddAccuracy=Six9s. MAIN FOCUS
#cfg.model_name = "add_d6_l2_h3_t20K_s173289"  # AddAccuracy=Six9s
#cfg.model_name = "add_d6_l2_h3_t20K_s572091"  # AddAccuracy=Six9s
#cfg.model_name = "add_d5_l2_h3_t40K_s372001"  # AddAccuracy=Six9s
#cfg.model_name = "add_d6_l2_h3_t40K_s372001"  # AddAccuracy=Six9s
#cfg.model_name = "add_d10_l2_h3_t40K_s572091" # AddAccuracy=Six9s

# Subtraction model
#cfg.model_name = "sub_d6_l2_h3_t30K_s372001"  # SubAccuracy=Six9s
#cfg.model_name = "sub_d10_l2_h3_t75K_s173289"  # SubAccuracy=Two9s

# Mixed (addition and subtraction) model
#cfg.model_name = "mix_d6_l3_h4_t40K_s372001"  # Add/SubAccuracy=Six9s/Six9s
#cfg.model_name = "mix_d10_l3_h4_t75K_s173289"  # Add/SubAccuracy=Five9s/Two9s

# Mixed models initialized with addition model.
#cfg.model_name = "ins1_mix_d6_l3_h4_t40K_s372001"  # Add/SubAccuracy=Six9s/Six9s. MAIN FOCUS
#cfg.model_name = "ins1_mix_d6_l3_h4_t40K_s173289"  # Add/SubAccuracy=Five9s/Five9s
#cfg.model_name = "ins1_mix_d6_l3_h4_t50K_s572091"  # Add/SubAccuracy=Six9s/Five9s
#cfg.model_name = "ins1_mix_d6_l3_h3_t40K_s572091"  # Add/SubAccuracy=Six9s/Five9s
#cfg.model_name = "ins1_mix_d10_l3_h3_t50K_s572091"  # Add/SubAccuracy=Five9s/Five9s
#cfg.model_name = "ins1_mix_d6_l2_h3_t40K_s572091"  # Add/SubAccuracy=Six9s/Five9s. Two layer
#cfg.model_name = "ins1_mix_d6_l3_h3_t80K_s572091"  # Add/SubAccuracy=Six9s/Five9s. Fewer nodes?

# Mixed model initialized with addition model. Reset useful heads every 100 training epochs.
#cfg.model_name = "ins2_mix_d6_l4_h4_t40K_s372001"  # Add/SubAccuracy=Five9s/Five9s

# Mixed model initialized with addition model. Reset useful heads & MLPs every 100 training epochs.
#cfg.model_name = "ins3_mix_d6_l4_h3_t40K_s372001"  # Add/SubAccuracy=Four9s/Two9s

# Mixed models initialized with addition model. Randomise heads without a known subtask
#cfg.model_name = "ins4_mix_d6_l3_h4_t30K_s775824"  # Add/SubAccuracy=???/???
#cfg.model_name = "ins4_mix_d6_l2_h4_t30K_s775824"  # Add/SubAccuracy=???/???

# Part 1B: Configuration: Input and Output file names

In [None]:
# Needed when user changes model_name and reruns this Colab a second time
cfg.reset_useful()
cfg.reset_algo()

if cfg.model_name != "":
  # Update cfg member data n_digits, n_layers, n_heads, n_training_steps, training_seed from model_name
  cfg.parse_model_name()

  # Addition model
  cfg.perc_sub = 0
  if cfg.model_name.startswith("sub_") :
      # Subtraction model
      cfg.perc_sub = 100
  elif cfg.model_name.startswith("mix") :
      # Mixed (addition and subtraction) model
      cfg.perc_sub = 66 # Train on 66% subtraction and 34% addition question batches
  elif cfg.model_name.startswith("ins") :
      # Mixed model initialised with an addition model (using insert mode 1, 2 or 3)
      cfg.perc_sub = 80 # Train on 80% subtraction and 20% addition question batches

  # We train multiple versions of some models, inserting different addition models.
  insert_model_name = ""
  if cfg.model_name.startswith("ins1_mix_d6_l3") :
      if cfg.model_name == "ins1_mix_d6_l3_h3_t80K_s572091":
          insert_model_name = "add_d6_l2_h3_t40K_s372001"
      elif cfg.training_seed == 372001:
          insert_model_name = "add_d6_l2_h3_t15K_s372001"
      else:
          insert_model_name = "add_d6_l2_h3_t20K_s173289"
  elif cfg.model_name.startswith("ins1_mix_d6_l2") :
      insert_model_name = "add_d6_l2_h3_t20K_s173289"
  elif cfg.model_name.startswith("ins1_mix_d10_l3") :
      insert_model_name = "add_d10_l2_h3_t40K_s572091"
  elif cfg.model_name == "ins4_mix_d6_l3_h4_t30K_s775824" or cfg.model_name == "ins4_mix_d6_l2_h4_t30K_s775824":
      insert_model_name = "ins1_mix_d6_l3_h4_t40K_s372001"
      cfg.perc_sub = 66
  if insert_model_name != "":
      cfg.parse_insert_model_name(insert_model_name)

cfg.initialize_maths_token_positions()
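
The `perc_sub` setting above controls how often a training batch contains subtraction questions. How `qt.maths_data_generator` actually samples the operation is not shown in this notebook, so the following is only a sketch under the assumption that each batch independently uses subtraction with probability `perc_sub` percent. The helper `choose_operation` is hypothetical.

```python
import random

def choose_operation(rng, perc_sub):
    """Pick the operation for one batch: '-' with probability perc_sub percent."""
    return "-" if rng.random() * 100 < perc_sub else "+"


rng = random.Random(372001)  # fixed seed so the sketch is repeatable
ops = [choose_operation(rng, perc_sub=80) for _ in range(10_000)]
sub_fraction = ops.count("-") / len(ops)
print(f"subtraction batches: {sub_fraction:.1%}")
```

With `perc_sub = 0` this always returns `"+"` (a pure addition model), and with `perc_sub = 100` it always returns `"-"`.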

In [None]:
def print_config():
  print("%Add=", cfg.perc_add, "%Sub=", cfg.perc_sub, "%Mult=", cfg.perc_mult, "InsertMode=", cfg.insert_mode, "File=", cfg.file_config_prefix)

In [None]:
main_fname = cfg.file_config_prefix
main_fname_pth = main_fname + '.pth'
main_fname_json = main_fname + '_train.json'

print_config()
print("weight_decay=", cfg.weight_decay, "lr=", cfg.lr, "batch_size=", cfg.batch_size)
print('Main model will save to Colab temporary file', main_fname_pth)
print('Main model config etc will save to Colab temporary file', main_fname_json)

In [None]:
# Singleton QuantaTool "ablation intervention" configuration class
acfg = qt.acfg
acfg.reset_ablate()

# Part 3A: Set Up: Vocabulary / Embedding / Unembedding

In [None]:
qt.set_maths_vocabulary(cfg)
qt.set_maths_question_meanings(cfg)
print(cfg.token_position_meanings)

# Part 3B: Create main_model
This section defines the token embedding / unembedding and creates the model.

In [None]:
# Transformer creation

# Structure is documented at https://neelnanda-io.github.io/TransformerLens/transformer_lens.html#transformer_lens.HookedTransformerConfig.HookedTransformerConfig
ht_cfg = HookedTransformerConfig(
    n_layers = cfg.n_layers,
    n_heads = cfg.n_heads,
    d_model = cfg.d_model,
    d_head = cfg.d_head,
    d_mlp = cfg.d_mlp,
    act_fn = cfg.act_fn,
    normalization_type = 'LN',
    d_vocab = cfg.d_vocab,
    d_vocab_out = cfg.d_vocab,
    n_ctx = cfg.n_ctx,
    init_weights = True,
    device = "cuda",
    seed = cfg.training_seed,
)

cfg.main_model = HookedTransformer(ht_cfg)

optimizer = optim.AdamW(cfg.main_model.parameters(),
                        lr = cfg.lr,
                        weight_decay = cfg.weight_decay,
                        betas = (0.9, 0.98))

max_iter = cfg.n_training_steps
warmup_iter = max_iter // 5
scheduler1 = torch.optim.lr_scheduler.LinearLR(optimizer, start_factor=0.01, total_iters=int(warmup_iter))
scheduler2 = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=int(np.ceil((max_iter-warmup_iter))))
scheduler  = torch.optim.lr_scheduler.SequentialLR(optimizer, schedulers=[scheduler1, scheduler2], milestones=[int(warmup_iter)])
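
The cell above chains a linear warm-up (the first fifth of training, starting at 1% of the learning rate) with cosine annealing for the remaining steps. The pure-Python sketch below approximates the resulting learning-rate multiplier. It is a closed-form approximation for intuition only and may differ slightly from PyTorch's `SequentialLR` around the milestone step. The function name `lr_multiplier` is hypothetical.

```python
import math

def lr_multiplier(step, max_iter, start_factor=0.01):
    """Approximate multiplier applied to the base learning rate at a step."""
    warmup_iter = max_iter // 5
    if step < warmup_iter:
        # Linear warm-up from start_factor up to 1.0
        return start_factor + (1.0 - start_factor) * step / warmup_iter
    # Cosine annealing from 1.0 down to 0.0 over the remaining steps
    t = step - warmup_iter
    t_max = max_iter - warmup_iter
    return 0.5 * (1.0 + math.cos(math.pi * t / t_max))


for step in (0, 1500, 3000, 9000, 15000):
    print(step, round(lr_multiplier(step, max_iter=15000), 4))
```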

# Part 4: Data Generator
This section defines the training/testing data generator.


In [None]:
# Define "iterator" maths "questions" data generator function. Invoked using next().
ds = qt.maths_data_generator( cfg )

In [None]:
# Test data generator
tokens = next(ds)
print(tokens[:3,:])

# Part 5: Read insert_model from HuggingFace (optional)

If we are initialising the untrained model with an existing model,
then we load the existing model from HuggingFace.
We load both the model weights and a json file stating which nodes in the model are actually doing useful calculations.

In [None]:
insert_weights_fname = ""
insert_nodes_fname = ""

In [None]:
main_repo_name="PhilipQuirke/VerifiedArithmetic"

In [None]:
# Read insert_model weights from HuggingFace
if cfg.insert_mode >= 1:
  insert_weights_fname = insert_model_name + ".pth"

  ht_cfg = HookedTransformerConfig(
      n_layers = cfg.insert_n_layers,
      n_heads = cfg.insert_n_heads,
      d_model = cfg.d_model, # Assume constant
      d_head = cfg.d_head, # Assume constant
      d_mlp = cfg.d_mlp, # Assume constant
      act_fn = cfg.act_fn, # Assume constant
      normalization_type = 'LN',
      d_vocab = cfg.d_vocab, # Assume constant
      d_vocab_out = cfg.d_vocab, # Assume constant
      n_ctx = cfg.n_ctx, # Assume constant
      init_weights = True, # Assume constant
      device = "cuda",
      seed = cfg.insert_training_seed,
  )

  insert_model = HookedTransformer(ht_cfg)

  print('Loading insertion model from', insert_weights_fname)
  insert_model.load_state_dict(utils.download_file_from_hf(repo_name=main_repo_name, file_name=insert_weights_fname, force_is_torch=True))
  insert_model.eval()

  print("Loaded insert model", insert_weights_fname)

In [None]:
if cfg.insert_mode >= 1:
  # Read insert_model useful node information from HuggingFace
  insert_fname_behavior_json = insert_model_name + "_behavior.json"
  file_path = hf_hub_download(repo_id=main_repo_name, filename=insert_fname_behavior_json, revision="main")
  cfg.useful_nodes.load_nodes(file_path)
  print( "Loaded:", len(cfg.useful_nodes.nodes), "Sample:", cfg.useful_nodes.nodes[0].tags)

# Part 6B: Transfer all of insert_model into main_model (optional)



In [None]:
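# Transfer all attention head weights from the small model to the main model,
# overwriting the first (lowest-index) from_n_heads heads of each destination layer.
# Note: transfer_one_head (Part 6C) instead targets the last from_n_heads heads.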
def transfer_all_heads(from_model, from_cfg, start_layer, end_layer, to_model):
  from_n_heads = from_cfg["n_heads"]
  for from_layer_no, to_layer_no in enumerate(range(start_layer, end_layer+1)):
    to_model.blocks[to_layer_no].attn.W_Q.data[:from_n_heads] = from_model.blocks[from_layer_no].attn.W_Q.clone().data
    to_model.blocks[to_layer_no].attn.W_K.data[:from_n_heads] = from_model.blocks[from_layer_no].attn.W_K.clone().data
    to_model.blocks[to_layer_no].attn.W_V.data[:from_n_heads] = from_model.blocks[from_layer_no].attn.W_V.clone().data

    to_model.blocks[to_layer_no].attn.b_Q.data[:from_n_heads] = from_model.blocks[from_layer_no].attn.b_Q.clone().data
    to_model.blocks[to_layer_no].attn.b_K.data[:from_n_heads] = from_model.blocks[from_layer_no].attn.b_K.clone().data
    to_model.blocks[to_layer_no].attn.b_V.data[:from_n_heads] = from_model.blocks[from_layer_no].attn.b_V.clone().data

In [None]:
# Transfer all MLP layer weights from the small model to the main model,
# overwriting the first from_d_mlp neurons of each destination layer
def transfer_all_mlps(from_model, from_cfg, start_layer, end_layer, to_model):
  from_d_mlp = from_cfg["d_mlp"]
  for from_layer_no, to_layer_no in enumerate(range(start_layer, end_layer+1)):
    to_model.blocks[to_layer_no].mlp.W_in.data[:, :from_d_mlp] = from_model.blocks[from_layer_no].mlp.W_in.clone().data
    to_model.blocks[to_layer_no].mlp.b_in.data[:from_d_mlp] = from_model.blocks[from_layer_no].mlp.b_in.clone().data
    to_model.blocks[to_layer_no].mlp.W_out.data[:from_d_mlp,] = from_model.blocks[from_layer_no].mlp.W_out.clone().data
    to_model.blocks[to_layer_no].mlp.b_out.data = from_model.blocks[from_layer_no].mlp.b_out.clone().data #PQR????

In [None]:
def transfer_all_ln(from_model, start_layer, end_layer, to_model):
  for from_layer_no, to_layer_no in enumerate(range(start_layer, end_layer+1)):
    to_model.blocks[to_layer_no].ln1.w.data = from_model.blocks[from_layer_no].ln1.w.clone().data
    to_model.blocks[to_layer_no].ln1.b.data = from_model.blocks[from_layer_no].ln1.b.clone().data

  to_model.ln_final.w.data = from_model.ln_final.w.clone().data

In [None]:
def transfer_all_embeds(from_model, to_model):
  to_model.embed.W_E.data = from_model.embed.W_E.clone().data
  to_model.pos_embed.W_pos.data = from_model.pos_embed.W_pos.clone().data
  to_model.unembed.W_U.data = from_model.unembed.W_U.clone().data

In [None]:
from_cfg = {}
to_cfg = {}

# Insert the "from" model weights into the "to" model
def transfer_full_model(from_model, to_model, start_layer, end_layer, transfer_ln=True, transfer_embeds=True):
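  """Args:
  from_model: The model to transfer weights from
  to_model: The model to transfer weights to
  start_layer: The first layer to transfer weights to
  end_layer: The last layer to transfer weights to (Note that this is end-inclusive!)
  transfer_ln: If True, also copy the layer norm weights
  transfer_embeds: If True, also copy the embedding, positional embedding and unembedding weights
  """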
  global from_cfg
  global to_cfg

  from_cfg = {k: v for k,v in from_model.cfg.__dict__.items() if k in ["d_head", "d_mlp", "d_model", "n_heads", "n_layers"]}
  to_cfg = {k: v for k,v in to_model.cfg.__dict__.items() if k in ["d_head", "d_mlp", "d_model", "n_heads", "n_layers"]}

  # Sanity checks for to_model size >= from model size
  assert from_cfg["d_model"] == to_cfg["d_model"]
  assert from_cfg["d_head"] == to_cfg["d_head"]
  assert from_cfg["n_layers"] <= to_cfg["n_layers"]
  assert from_cfg["n_heads"] <= to_cfg["n_heads"]
  assert from_cfg["d_mlp"] <= to_cfg["d_mlp"]

  assert 0 <= start_layer <= end_layer < to_cfg["n_layers"] # Make sure start_layer and end_layer are valid (end_layer is inclusive)
  assert end_layer - start_layer + 1 == from_cfg["n_layers"] # Make sure the number of layers to transfer is correct

  transfer_all_heads(from_model, from_cfg, start_layer, end_layer, to_model)
  transfer_all_mlps(from_model, from_cfg, start_layer, end_layer, to_model)
  if transfer_ln:
    transfer_all_ln(from_model, start_layer, end_layer, to_model)
  if transfer_embeds:
    transfer_all_embeds(from_model, to_model)

In [None]:
def insert_existing_model( first_time ):
  if cfg.insert_mode >= 1 :
    # Is the destination the first few or last few layers of the main_model?
    start_layer = max(0, cfg.n_layers - cfg.insert_n_layers) if cfg.insert_late else 0
    end_layer = min(cfg.n_layers-1, start_layer + cfg.insert_n_layers-1)

    if first_time:
      print( "Inserting trained from_model", insert_weights_fname)
      print( "into untrained main_model", main_fname)
      print( "destination layers:", start_layer, end_layer)

    transfer_full_model(insert_model, cfg.main_model, start_layer, end_layer, first_time, first_time)


insert_existing_model( True )
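
As a worked example of the destination-layer arithmetic used by `insert_existing_model`, the stand-alone sketch below reproduces the same two expressions without any model objects. The function name `destination_layers` is hypothetical.

```python
def destination_layers(n_layers, insert_n_layers, insert_late):
    """Return the inclusive (start_layer, end_layer) range in the larger model."""
    start_layer = max(0, n_layers - insert_n_layers) if insert_late else 0
    end_layer = min(n_layers - 1, start_layer + insert_n_layers - 1)
    return start_layer, end_layer


# Inserting a 2-layer model into a 3-layer model
print(destination_layers(3, 2, insert_late=False))  # (0, 1)
print(destination_layers(3, 2, insert_late=True))   # (1, 2)
```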

# Part 6C: Transfer useful heads of insert_model into main_model (optional)

Transfer just the useful attention heads from insert_model into main_model.

In [None]:
# Transfer one attention head's weights from the small to the main model.
# The right-most small.n_heads of main_model are updated
def transfer_one_head(from_model, from_layer_no, from_head_no, to_model, start_layer):
  to_layer_no = start_layer + from_layer_no
  to_head_no = to_cfg["n_heads"] - from_cfg["n_heads"] + from_head_no

  to_model.blocks[to_layer_no].attn.W_Q.data[to_head_no] = from_model.blocks[from_layer_no].attn.W_Q.clone().data[from_head_no]
  to_model.blocks[to_layer_no].attn.W_K.data[to_head_no] = from_model.blocks[from_layer_no].attn.W_K.clone().data[from_head_no]
  to_model.blocks[to_layer_no].attn.W_V.data[to_head_no] = from_model.blocks[from_layer_no].attn.W_V.clone().data[from_head_no]

  to_model.blocks[to_layer_no].attn.b_Q.data[to_head_no] = from_model.blocks[from_layer_no].attn.b_Q.clone().data[from_head_no]
  to_model.blocks[to_layer_no].attn.b_K.data[to_head_no] = from_model.blocks[from_layer_no].attn.b_K.clone().data[from_head_no]
  to_model.blocks[to_layer_no].attn.b_V.data[to_head_no] = from_model.blocks[from_layer_no].attn.b_V.clone().data[from_head_no]

In [None]:
def transfer_useful_heads(from_model, to_model):
  if cfg.insert_mode >= 2 and len(cfg.useful_nodes.nodes) > 0:
    # Is the destination the first few or last few layers of the main_model?
    start_layer = cfg.n_layers - cfg.insert_n_layers if cfg.insert_late else 0
    transfer_count = 0

    for use_cell in cfg.useful_nodes.nodes:
      if use_cell.is_head:
        transfer_one_head(from_model, use_cell.layer, use_cell.num, to_model, start_layer)
        transfer_count += 1

    print('Transferred', transfer_count, 'useful heads')

# Part 6D: Randomise some attention heads in main_model (optional)

In [None]:
if cfg.insert_mode == 4 :
    # We randomise any attention head that does not have an identified sub-task.
    # The hope is that this will eliminate some "low-value" or "noise" nodes from the new model.

    # Load inserted model's subtask data
    #      https://huggingface.co/PhilipQuirke/VerifiedArithmetic/raw/main/ins1_mix_d6_l3_h4_t40K_s372001_maths.json
    insert_fname_maths_json = insert_model_name + "_maths.json"
    file_path = hf_hub_download(repo_id=main_repo_name, filename=insert_fname_maths_json, revision="main")
    cfg.useful_nodes.load_nodes(file_path)

    randomize_count = 0

    # For each attention head in the model ...
    for node in cfg.useful_nodes.nodes:
        randomize = False
        algo_tags = node.filter_tags(qt.QType.ALGO.value)

        # Some models use token 0 in predictions (possibly as a heuristic that large Dn give a positive answer in subtraction).
        # We want an accurate model that does not depend on heuristics,
        # so randomise all attention heads at token position zero
        if node.position == 0:
            randomize = True

        # If node does not have any identified subtasks, randomise it
        elif len(algo_tags) == 0:
            randomize = True

        else:
            # The SC and MC subtasks are optional as they can be replaced by ST and MT subtasks.
            # For each attention head in the model that only does SC and MC subtasks, randomise it
            sc_tags = len([s for s in algo_tags if ".SC" in s])
            mc_tags = len([s for s in algo_tags if ".MC" in s])
            if (sc_tags == 1 or mc_tags == 1) and len(algo_tags) == sc_tags + mc_tags:
                randomize = True

        if randomize:
            # PQR TODO
            # transfer_one_head(from_model, from_layer_no, from_head_no, to_model, start_layer):
            randomize_count += 1

    print('Randomized', randomize_count, 'inserted nodes')

# Part 7: Train add/sub/mix main_model with Infinite Data
Train main_model for n_training_steps, storing the training loss for each step.

At each training step (of n_training_steps), new training data (a batch of batch_size questions) is generated, and the model is trained and its loss calculated on that batch. No separate "testing" data is needed, as the training data is freshly generated at each step. Memorisation of past training data by the model (if any) is minimally beneficial: for 6-digit addition or subtraction there are 1000 billion (10^12) possible questions.
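
The size of the question space can be checked with a few lines of arithmetic. The sketch below uses the default `batch_size` of 64 from `ColabConfig` and, as an example, the 15K training steps of the "t15K" models. The helper `count_questions` is hypothetical.

```python
def count_questions(n_digits):
    """Number of distinct (x, y) operand pairs for one operation."""
    return 10 ** (2 * n_digits)


n_training_steps = 15_000  # e.g. the "t15K" models
batch_size = 64            # ColabConfig default in this notebook
seen = n_training_steps * batch_size
total = count_questions(6)

print(total)                   # 1000000000000
print(seen)                    # 960000
print(f"{seen / total:.1e}")   # 9.6e-07
```

Under these assumptions, fewer than one in a million of the possible 6-digit questions is seen during training.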

In [None]:
print_config()

# Train the model
train_losses_list = []
batch_op_list = []
per_token_train_losses_list = []

for epoch in tqdm.tqdm(range(cfg.n_training_steps)):

  tokens = next(ds)
  logits = cfg.main_model(tokens)

  per_token_train_losses_raw, _ = qt.logits_to_tokens_loss(cfg, logits, tokens)
  per_token_train_losses = qt.loss_fn(per_token_train_losses_raw)
  per_token_train_losses_list.append(utils.to_numpy(per_token_train_losses))

  train_loss = per_token_train_losses.mean()
  train_loss.backward()
  train_losses_list.append(train_loss.item())
  batch_op_list.append(tokens[0][cfg.n_digits] == qt.MathsToken.PLUS)

  optimizer.step()
  scheduler.step()
  optimizer.zero_grad()

  if epoch % 100 == 0:
    print(epoch, train_loss.item())
    if cfg.insert_mode == 2:
      # Reset the useful attention heads to their insert_model weights
      transfer_useful_heads(insert_model, cfg.main_model)
    if cfg.insert_mode == 3:
      # Reset the inserted attention heads and MLP layers to their insert_model weights
      insert_existing_model( False )

print(epoch, train_loss.item())

In [None]:
final_training_loss = round((train_losses_list[-5]+train_losses_list[-4]+train_losses_list[-3]+train_losses_list[-2]+train_losses_list[-1])/5,9)

print( "AvgFinalLoss", final_training_loss)
print( "FinalLoss", train_losses_list[-1])

In [None]:
# These temporary Colab files can be manually downloaded from the Colab "Files" tab (at left).
# The download can be manually loaded into HuggingFace so the "VerifiedArithmeticAnalyse" Colab can access it.

print("Saving main model to temporary Colab file", main_fname_pth)
torch.save(cfg.main_model.state_dict(), main_fname_pth)

In [None]:
extra_data = {
    "Config": cfg.to_dict(),
    "AvgFinalLoss": final_training_loss,
    "FinalLoss": train_losses_list[-1],
    "TrainingLoss": train_losses_list
}

print( "Saving main model config etc to temporary Colab file:", main_fname_json)
save_cfg = cfg.to_dict()
with open(main_fname_json, 'w') as file:
    json.dump(extra_data, file)

# Part 9: Line Graphs

This section analyses the training loss by graphing it at a high level.

The loss curve for all digits shows visible inflection points (bumps), but is too high-level to help understand the algorithm.

When this graph is decomposed into 'per digit' graphs, distinct 'per digit' curves appear, showing that the model refines each digit semi-independently.
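
To make the decomposition concrete, the sketch below computes a per-position loss from the probabilities a model assigns to the correct answer tokens. The internals of `qt.logits_to_tokens_loss` and `qt.loss_fn` are not shown in this notebook, so this assumes a standard cross-entropy loss (the negative log probability of the correct token, averaged over the batch). The function name and the toy numbers are made up for illustration.

```python
import math

def per_position_losses(batch_probs):
    """batch_probs[b][p] is the probability given to the correct token for
    question b at answer position p. Returns the mean -log(p) per position."""
    n_questions = len(batch_probs)
    n_positions = len(batch_probs[0])
    return [
        sum(-math.log(batch_probs[b][p]) for b in range(n_questions)) / n_questions
        for p in range(n_positions)
    ]


# Toy batch: two questions, three answer positions (made-up numbers)
probs = [[0.99, 0.90, 0.50],
         [0.99, 0.80, 0.25]]
losses = per_position_losses(probs)
overall = sum(losses) / len(losses)  # the single "all digits" loss value
print([round(x, 3) for x in losses], round(overall, 3))
```

The overall loss is the mean of the per-position values, which is why one poorly learned digit can dominate the aggregate curve.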

In [None]:
steps_to_graph=1500

In [None]:
# Helper function to plot single lines
def line(tensor, renderer=None, xaxis="", yaxis="", **kwargs):
    px.line(utils.to_numpy(tensor), labels={"x":xaxis, "y":yaxis}, **kwargs).show(renderer)


# Helper function to plot multiple lines
def lines(raw_lines_list, x=None, mode='lines', labels=None, xaxis='Epoch', yaxis='Loss', title = '', log_y=False, hover=None, all_epochs=True, **kwargs):
    global steps_to_graph
    global cfg

    full_title, fig = qt.plot_loss_lines(cfg=cfg, steps_to_graph=steps_to_graph, raw_lines_list=raw_lines_list, x=x, mode=mode, labels=labels,
                                         xaxis=xaxis, yaxis=yaxis, log_y=log_y, hover=hover, all_epochs=all_epochs,
                                         title=title, title_font_size=32, tick_font_size=24)

    if cfg.graph_file_suffix != "":
        filename = full_title.replace(" ", "").replace("(", "").replace(")", "").replace("&", "").replace(",", "").replace("%", "")  + '.' + cfg.graph_file_suffix
        pio.write_image(fig, filename)

In [None]:
if cfg.perc_sub > 0 and cfg.perc_add > 0:
    add_points = [val if flag else None for val, flag in zip(train_losses_list, batch_op_list)]
    sub_points = [val if not flag else None for val, flag in zip(train_losses_list, batch_op_list)]

    fig = go.Figure()
    fig.add_trace(go.Scatter(x=list(range(len(add_points))), y=add_points, mode='markers', name='Addition', marker=dict(color='green')))
    fig.add_trace(go.Scatter(x=list(range(len(sub_points))), y=sub_points, mode='markers', name='Subtraction', marker=dict(color='red')))
    fig.update_layout(title='Training Loss Graph by operation',
                      xaxis_title='Training step',
                      yaxis_title='Loss',
                      showlegend=True)
    qt.plot_loss_lines_layout(cfg, fig, 14, np.arange(len(add_points)))
    fig.show()
    if cfg.graph_file_suffix != "":
        pio.write_image(fig, cfg.model_name + "_LossByOperation." + cfg.graph_file_suffix)

    fig.update_layout(title='Training Log Loss Graph by operation',
                      xaxis_title='Training step',
                      yaxis_title='Log loss',
                      showlegend=True)
    fig.update_layout(yaxis_type="log")
    fig.show()
    if cfg.graph_file_suffix != "":
        pio.write_image(fig, cfg.model_name + "_LogLossByOperation." + cfg.graph_file_suffix)

In [None]:
title_suffix = 'Digit Loss Curves ' + main_fname
per_token_losses = np.stack(per_token_train_losses_list, axis=0)

line(train_losses_list, title=title_suffix)

answer_digits = cfg.n_digits + 1
all_epochs = True
for _ in range(2):
  lines(raw_lines_list=[per_token_losses[:, i] for i in range(answer_digits)]+[train_losses_list],
        labels = [f'A{cfg.n_digits-j}' for j in range(answer_digits)]+['All'],
        title='Per digit'+title_suffix, all_epochs=all_epochs, log_y=False)

  lines(raw_lines_list=[per_token_losses[:, i] for i in range(answer_digits)]+[train_losses_list],
        labels = [f'A{cfg.n_digits-j}' for j in range(answer_digits)]+['All'],
        title='Per digit'+title_suffix, all_epochs=all_epochs, log_y=True)

  all_epochs = False

for i in range(answer_digits):
  print('Final Loss for A' + str(cfg.n_digits-i) + ' is ', per_token_losses[-1, i])

# Part 10: Questions Set Up

Create a set of sample questions (by task) for the model to answer.

In [None]:
def make_varied_questions():
  q0 = next(ds)
  q1 = next(ds)
  q2 = next(ds)
  q3 = next(ds)

  questions = torch.vstack((q0.cuda(), q1.cuda(), q2.cuda(), q3.cuda()))

  return questions

In [None]:
varied_questions = make_varied_questions()
num_varied_questions = varied_questions.shape[0]

qt.a_set_ablate_hooks(cfg)
qt.a_calc_mean_values(cfg, varied_questions)

cfg.main_model.reset_hooks()
cfg.main_model.set_use_attn_result(True)
sample_logits, sample_cache = cfg.main_model.run_with_cache(varied_questions.cuda())

# Part 11: Attention Patterns
Attention patterns show which token(s) each of the model's attention heads attends to at each token position of the prediction calculation.

For the default CoLab setup, the model has 3 attention heads and performs 5-digit addition. The attention pattern is 18 by 18 squares (as 54321+77779=132100 is 18 tokens). Time proceeds vertically downwards, with one additional token revealed horizontally at each token position, giving the overall triangle shape. This visualisation gives two insights. After the question is fully revealed (at token position 11), each head starts attending to pairs of question digits from left to right (i.e. high-value digits before lower-value digits), giving the "double staircase" shape. The three heads attend to a given digit pair at three different token positions, giving a time ordering of heads.
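
The triangle shape follows from causal masking: each token position can attend only to itself and to earlier positions. A minimal sketch of that mask for an 18-token sequence, using a hypothetical helper `causal_mask`:

```python
def causal_mask(n_tokens):
    """mask[i][j] is True when position i may attend to position j."""
    return [[j <= i for j in range(n_tokens)] for i in range(n_tokens)]


mask = causal_mask(18)
visible = sum(sum(row) for row in mask)
print(visible)  # 171 = 18 * 19 / 2
```

Only these 171 of the 324 squares can carry attention weight, so the upper-right half of each pattern is always empty.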

In [None]:
def show_token_attention_patterns(index, layer, token_at_index, use_case):

  the_tokens = [str(token) for token in token_at_index.tolist()]
  if layer == 0:
    tokens_str = qt.tokens_to_string(cfg, token_at_index)
    print("Attention patterns for", tokens_str)

  attention_pattern=sample_cache["pattern", layer, "attn"][index]
  display(cv.attention.attention_patterns(
      tokens=the_tokens,
      attention=attention_pattern,
      #attention_head_names=[f"L{layer}H{i}" for i in range(cfg.n_heads)],
  ))


sample_size = 3

# Show attention patterns for the first few sample questions.
# Index into varied_questions, since sample_cache was computed from it.
for i in range(sample_size):
  for layer in range(cfg.n_layers):
    show_token_attention_patterns(i, layer, varied_questions[i], "Misc")


In [None]:
if cfg.graph_file_suffix != "":

  tokens_str = []
  for i in range(cfg.n_heads):
    one_token_str = []
    for j in tokens[i]:
      one_token_str.append(str(utils.to_numpy(j)))
    tokens_str.append(one_token_str)

  # Refer https://github.com/callummcdougall/CircuitsVis/blob/main/python/circuitsvis/circuitsvis_demo.ipynb

  # html_object = cv.attention.from_cache(
  #    cache = sample_cache,
  #    tokens = tokens_str, # list of list of strings
  #    return_mode = "html",
  #)

  # Create a CoLab file containing the attention pattern(s) in HTML
  #filename = "AttentionPattern" + str(cfg.n_digits) + "Digits" + str(cfg.n_heads) + "Heads.html"
  #with open(filename, "w") as f:
  #    f.write(html_object.data)

  # Manually download the CoLab "html" file and open in your local browser.
  # Install and use the Edge extension "FireShot" to save a portion of the HTML page as a PDF