<a href="https://colab.research.google.com/github/chrisjmccormick/shared-subspaces/blob/main/subspace_decoder/scripts/run_experiments.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# ▂▂▂▂▂▂▂▂▂▂▂▂

# Overview

This notebook demonstrates how to run the pre-training and fine-tuning scripts from a command line, and is also set up so that you can run them from within the notebook.

**Running on Colab**

The current configurations require the 40GB A100 because of the pre-training batch size.

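(If you wanted to run on a T4, you could adjust the training arguments to use gradient accumulation: a smaller per-step batch, accumulated over several steps before each optimizer update. This preserves the effective batch size, and therefore the training behavior, without running out of memory.)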
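As a rough sketch of the arithmetic behind gradient accumulation: the effective batch size is the per-step (micro) batch size multiplied by the number of accumulation steps. The helper below is hypothetical and not part of the repository, and the numbers are placeholders; check the training section of your config file for the actual batch size and key names.

```python
def accumulation_plan(target_batch_size: int, max_micro_batch_size: int) -> tuple[int, int]:
    """Pick the largest micro-batch that fits in memory and divides the target evenly.

    Returns (micro_batch_size, accumulation_steps); their product always
    equals target_batch_size, so the effective batch size is unchanged.
    """
    if target_batch_size < 1 or max_micro_batch_size < 1:
        raise ValueError("batch sizes must be positive")

    # Search downward from the largest micro-batch that could fit.
    for micro in range(min(target_batch_size, max_micro_batch_size), 0, -1):
        if target_batch_size % micro == 0:
            return micro, target_batch_size // micro

    # A micro-batch of 1 always divides the target, so this is never reached.
    raise AssertionError("unreachable")


# Placeholder numbers: a target batch of 256 when only 64 samples fit per step.
micro, steps = accumulation_plan(256, 64)
print(micro, steps)
```

If the scripts are built on Hugging Face `TrainingArguments` (an assumption; check `train.py`), these two values would correspond to `per_device_train_batch_size` and `gradient_accumulation_steps`.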

What to expect:

* Pre-training runs take roughly 75 minutes. It's pushing the limit of what works well in Colab--you'll want to babysit the notebook a little to avoid disconnect issues.

* Fine-tuning is relatively fast, and completes in under 10 minutes.

**Training Arguments**

I designed the scripts such that everything is specified through `.json` config files rather than on the command line. However, there is a command line utility to define new configurations--see the "Defining a New Run" section at the end of this notebook.

The examples below run one of the existing configurations.

**Weights and Biases**

The scripts are set up to log to wandb by default. If you don't have an account or don't want to log online, set `use_wandb = False` in the Weights & Biases cell below, which switches `wandb_mode` to 'offline'.

The project names are currently hardcoded to:

* Pretraining: `decoder-pretrain-wiki103`



# ▂▂▂▂▂▂▂▂▂▂▂▂

# S1. Setup

## 1.1. Clone Repository

In [None]:
!git clone https://github.com/chrisjmccormick/shared-subspaces.git

Provide the full path to the subspace_decoder folder.

This will be added to the PYTHONPATH when executing the scripts so that they can import the classes from the local files.

This variable is also used to construct paths to config files and scripts.

In [None]:
base_path = "/content/shared-subspaces/subspace_decoder"

## 1.2. Weights & Biases

To provide your wandb API key for the script:
1. You could paste it in manually on the training command lines further down.
2. Or, use the secrets panel (the key symbol on the left edge of the notebook) and:
    * Define your wandb api key as `wandb_api_key`.
    * Grant access to this notebook.
    * Run the below cell to retrieve it.

In [None]:
# Get key from colab secrets
from google.colab import userdata

# Set to false if you don't want to use wandb.
# The scripts will still log using the wandb library, but to a local directory.
use_wandb = True

if use_wandb:
    # Enable Weights & Biases logging (online mode)
    wandb_mode = "online"

    # Get wandb API key from Colab secrets
    wandb_key = userdata.get("wandb_api_key")

# Set to offline if you don't want to log in.
else:
    wandb_mode = "offline"

    wandb_key = ""
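When running outside Colab, `google.colab.userdata` is not available. A minimal sketch of an alternative, assuming you have exported your key in the standard `WANDB_API_KEY` environment variable: the helper name `resolve_wandb_settings` is hypothetical and not part of the repository.

```python
import os


def resolve_wandb_settings(use_wandb: bool, secret_key=None):
    """Return (wandb_mode, wandb_key) for the training command line.

    `secret_key` is a key obtained some other way (e.g. Colab secrets);
    if it is missing, fall back to the WANDB_API_KEY environment variable.
    """
    if not use_wandb:
        return "offline", ""

    key = secret_key or os.environ.get("WANDB_API_KEY", "")
    if not key:
        # No key available anywhere: log locally rather than failing at login.
        return "offline", ""

    return "online", key


wandb_mode, wandb_key = resolve_wandb_settings(use_wandb=True)
print(wandb_mode)
```

Falling back to offline mode when no key is found is a design choice for this sketch; you may prefer to raise an error instead so that a missing key is noticed before a long run.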

## 1.3. Choose Configuration

In [None]:
import os
import json

#  Choose which config file to run
pretrain_config_path = f"{base_path}/configs/initial_mla.json"
#pretrain_config_path = f"{base_path}/configs/best_mla-o.json"

# Make sure it's a valid path
if not os.path.exists(pretrain_config_path):
    raise ValueError(f"Config file {pretrain_config_path} does not exist.")

# Print it out.
with open(pretrain_config_path, "r") as f:
    pretrain_config = json.load(f)

print(f"\n======== {pretrain_config_path} ========\n")

# Print out the configuration with spacing.
json_str = json.dumps(pretrain_config, indent=4)
print(json_str)



{
    "shorthand": "rd.32 - 6.mha - mlp.1024 - model.256.lyr.6 - ah.8.32",
    "notes": "Baseline using standard MHA with RoPE",
    "model": {
        "hidden_size": 256,
        "num_hidden_layers": 6,
        "intermediate_size": 1024,
        "hidden_dropout_prob": 0.1,
        "attention_dropout_prob": 0.1,
        "classifier_dropout": null,
        "initializer_range": 0.02,
        "layer_norm_eps": 1e-12,
        "rms_norm_eps": 1e-06,
        "vocab_size": 30522,
        "rope_theta": 10000.0,
        "rope_scaling": null,
        "max_position_embeddings": 128,
        "num_dense_layers": 6,
        "q_latent_dim": null,
        "kv_latent_dim": null,
        "num_attention_heads": 8,
        "head_dim": 32,
        "rope_dims": 32,
        "attention_bias": false,
        "output_subspace": false,
        "o_latent_dim": null,
        "attention_backend": "sdpa",
        "ffn_decompose": false,
        "ffn_rank": null,
        "vocab_subspace": false,
        "vocab_rank

# S2. Run Pre-Training

In [None]:
print("\n======= Pre-Train ========\n")

# Construct the command line
train_command = (
    f"PYTHONPATH={base_path} "
    f"WANDB_MODE={wandb_mode} "
    f'WANDB_API_KEY="{wandb_key}" '
    f"python {base_path}/scripts/train.py --config {pretrain_config_path}"
)

# Run pre-training
!{train_command}



Importing Packages...

2025-08-08 15:28:26.969088: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2025-08-08 15:28:26.987237: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1754666907.008364    4512 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1754666907.014845    4512 cuda_blas.cc:1407] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
W0000 00:00:1754666907.031469    4512 computation_placer.cc:177] computation placer already registered. Please check l
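Outside a notebook, the same launch can be done without the `!` shell syntax. The sketch below is one possible way to do it with `subprocess`, passing the settings through the environment rather than embedding the API key in a shell string. The helper name `build_train_invocation` is hypothetical; the script path and `--config` flag are taken from the command above.

```python
import os
import subprocess
import sys


def build_train_invocation(base_path, config_path, wandb_mode, wandb_key):
    """Return (argv, env) for launching train.py as a child process."""
    argv = [
        sys.executable,
        os.path.join(base_path, "scripts", "train.py"),
        "--config",
        config_path,
    ]

    # Start from the current environment so PATH, CUDA settings, etc. survive.
    env = dict(os.environ)
    env["PYTHONPATH"] = base_path
    env["WANDB_MODE"] = wandb_mode
    if wandb_key:
        env["WANDB_API_KEY"] = wandb_key

    return argv, env


# To actually launch training, uncomment:
# argv, env = build_train_invocation(base_path, pretrain_config_path, wandb_mode, wandb_key)
# subprocess.run(argv, env=env, check=True)
```

With `check=True`, a non-zero exit code from the training script raises `subprocess.CalledProcessError`, so a failed run does not pass silently.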

**Optional - Save the checkpoint to Google Drive**

In [None]:
import shutil

if False:
    # Copy whatever's in the checkpoints folder over to Google Drive.
    shutil.copytree(
        "/content/checkpoints",
        "/content/drive/MyDrive/decoder-pretrain-wiki103/checkpoints",
        dirs_exist_ok = True
    )


# ▂▂▂▂▂▂▂▂▂▂▂▂

# Defining a New Run

To modify the parameters from the command line, I created a command line utility in `/configs/create_new_config.py`, which copies one of the existing config files and lets you specify any parameter changes.

See the [script](https://github.com/chrisjmccormick/shared-subspaces/blob/main/subspace_encoder/configs/create_new_config.py) for documentation. The baseline config [here](https://github.com/chrisjmccormick/shared-subspaces/blob/main/subspace_encoder/configs/best_mla-o.json) shows all of the hyperparameters that are defined, and the [Config](https://github.com/chrisjmccormick/shared-subspaces/blob/main/subspace_encoder/models/shared_space_config.py#L81) class documents the model parameters.

Below is an example for defining a new run which increases the output latent size to 96.

In [None]:
!python {base_path}/configs/create_new_config.py \
    mla-o_baseline_o96 \
    --base {base_path}/configs/mla-o_baseline.json \
    --shorthand "rd.32 - 6.mla.64.32.96 - mlp.1024 - model.256.lyr.6 - ah.8.32" \
    --notes "Trying increasing the output subspace size from 64 to 96" \
    --set model.o_latent_dim=96

Wrote new config to /content/shared-subspaces/subspace_encoder/configs/mla-o_baseline_o96.json
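To illustrate what a dotted-key override such as `model.o_latent_dim=96` means, here is a minimal sketch of applying one to a nested config dictionary. This is an assumption about the general technique, not the actual implementation in `create_new_config.py`; the function name `apply_override` is hypothetical.

```python
import json


def apply_override(config: dict, assignment: str) -> dict:
    """Apply one 'dotted.key=value' assignment to a nested dict, in place."""
    dotted_key, sep, raw_value = assignment.partition("=")
    if not sep:
        raise ValueError(f"Expected KEY=VALUE, got: {assignment!r}")

    # Parse the value as JSON so 96 -> int, true -> bool, null -> None;
    # anything that is not valid JSON is kept as a plain string.
    try:
        value = json.loads(raw_value)
    except json.JSONDecodeError:
        value = raw_value

    # Walk down to the parent dict, creating sections that don't exist yet.
    *parents, leaf = dotted_key.split(".")
    node = config
    for name in parents:
        node = node.setdefault(name, {})
    node[leaf] = value
    return config


config = {"model": {"o_latent_dim": 64, "output_subspace": True}}
apply_override(config, "model.o_latent_dim=96")
print(json.dumps(config, indent=2))
```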


In [None]:
!cat {base_path}/configs/mla-o_baseline_o96.json

{
  "shorthand": "rd.32 - 6.mla.64.32.96 - mlp.1024 - model.256.lyr.6 - ah.8.32",
  "notes": "Trying increasing the output subspace size from 64 to 96",
  "model": {
    "hidden_size": 256,
    "num_hidden_layers": 6,
    "intermediate_size": 1024,
    "hidden_dropout_prob": 0.1,
    "attention_dropout_prob": 0.1,
    "classifier_dropout": null,
    "initializer_range": 0.02,
    "layer_norm_eps": 1e-12,
    "rms_norm_eps": 1e-06,
    "vocab_size": 30522,
    "rope_theta": 10000.0,
    "rope_scaling": null,
    "max_position_embeddings": 128,
    "num_dense_layers": 0,
    "q_latent_dim": 64,
    "kv_latent_dim": 32,
    "num_attention_heads": 8,
    "head_dim": 32,
    "rope_dims": 32,
    "attention_bias": false,
    "output_subspace": true,
    "o_latent_dim": 96,
    "attention_backend": "sdpa",
    "ffn_decompose": false,
    "ffn_rank": null,
    "vocab_subspace": false,
    "vocab_rank": 128
  },
  "pre_train": {
    "output_dir": "checkpoints/mla-o_baseline_o96",
    "seed": 42

# ▂▂▂▂▂▂▂▂▂▂▂▂