# OctupleMIDI Training Process Demo

This notebook walks through OctupleMIDI training end to end: setting up hyperparameters, loading the dataset, initializing the model and sampler, running a short training loop, and sampling.

In [12]:
import os
if "PATH" not in os.environ:
    os.environ["PATH"] = ""
import sys
import torch
import numpy as np

# Add the current working directory (expected to be the source root) to the import path
sys.path.append(os.getcwd())

from hparams.set_up_hparams import HparamsOctuple
from models.transformer import Transformer
from utils.data_utils import OctupleDataset
from utils.octuple import OctupleEncoding
from models.absorbing_diffusion import AbsorbingDiffusion

## 1. Setup Hyperparameters
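We initialize `HparamsOctuple` with the configuration specific to the Octuple format. The class expects an argparse-style object, so a small `MockParser` stands in for the command-line arguments.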

In [13]:
class MockParser:
    def __init__(self):
        self.model = 'octuple'
        self.n_vis = 1
        self.visdom_port = 8097
        self.batch_size = 4
        self.notes = 128
        self.bars = 8
        self.epochs = 1
        self.lr = 0.001
        self.load_dir = None
        self.log_base_dir = None
        self.tracks = 'string'
        self.ema = False
        self.amp = False
        self.load_step = 0
        self.validation_set_size = 0.1

parser = MockParser()
H = HparamsOctuple(parser)
print(f"Codebook Sizes: {H.codebook_size}")
print(f"Latent Shape: {H.latent_shape}")

Codebook Sizes: (256, 128, 129, 256, 128, 32, 256, 49)
Latent Shape: (128, 8)


## 2. Load Data
We use the `OctupleDataset` to load preprocessed `.npy` files from `data/processed`.

In [14]:
dataset_path = "data/processed"
# Create dummy data if it doesn't exist for demonstration purposes
if not os.path.exists(dataset_path) or len(os.listdir(dataset_path)) == 0:
    print(f"Path {dataset_path} empty or missing. Creating dummy data for demonstration.")
    os.makedirs(dataset_path, exist_ok=True)
    # Create a dummy file
    dummy_data = np.random.randint(0, 32, (200, 8)).astype(np.int64)
    np.save(os.path.join(dataset_path, "dummy.npy"), dummy_data)

dataset = OctupleDataset(dataset_path, H.NOTES)
print(f"Dataset size: {len(dataset)}")

loader = torch.utils.data.DataLoader(dataset, batch_size=H.batch_size, shuffle=True)
batch = next(iter(loader))
print(f"Batch shape: {batch.shape}")

Found 909 data files.
Dataset size: 909
Batch shape: torch.Size([4, 128, 8])
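
The dummy file in the cell above has 200 rows while each batch item has 128, so the dataset presumably crops or pads each file to `H.NOTES` rows. The internals of `OctupleDataset` are not shown in this notebook; the sketch below is only one plausible way to do this, and `crop_or_pad` and `pad_value` are hypothetical names, not part of the repository.

```python
import numpy as np

def crop_or_pad(seq, length, pad_value=0, rng=None):
    """Return a (length, n_attributes) window of a (T, n_attributes) array.

    Assumption: long sequences are cropped at a random offset and short
    ones are padded at the end. The real dataset may behave differently.
    """
    seq = np.asarray(seq)
    n = seq.shape[0]
    if n >= length:
        if rng is None:
            rng = np.random.default_rng()
        # Upper bound is exclusive, so start lies in [0, n - length].
        start = int(rng.integers(0, n - length + 1))
        return seq[start:start + length]
    # Sequence is too short: append rows filled with pad_value.
    pad = np.full((length - n, seq.shape[1]), pad_value, dtype=seq.dtype)
    return np.concatenate([seq, pad], axis=0)
```

For example, `crop_or_pad(np.load("data/processed/dummy.npy"), 128)` would return an array of shape `(128, 8)`, matching the per-item shape printed above.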


## 3. Initialize Model and Sampler
Initialize the Transformer model and wrap it in the AbsorbingDiffusion sampler.

In [15]:
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(f"Using device: {device}")

model = Transformer(H).to(device)
sampler = AbsorbingDiffusion(H, model, H.codebook_size).to(device)
print(f"Model initialized. Parameters: {sum(p.numel() for p in model.parameters())}")

Using device: cuda


AcceleratorError: CUDA error: device-side assert triggered
Search for `cudaErrorAssert' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
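
The execution counts suggest that the training cell (`In [7]`) ran before this cell (`In [15]`) in the same kernel. Once a device-side assert fires, the CUDA context of that process is no longer usable, so the error shown here may be a leftover from the earlier failure rather than a problem in model construction. A minimal debugging sketch, assuming a freshly restarted kernel: set `CUDA_LAUNCH_BLOCKING` before any CUDA work so errors are reported at the failing call, and optionally fall back to CPU, where out-of-range indices raise an ordinary Python exception. `FORCE_CPU` and `pick_device` are hypothetical helpers introduced for illustration.

```python
import os

# Must be set before the first CUDA call; the safest place is before
# `import torch` in the first cell of a restarted kernel.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# Hypothetical switch: run on CPU while debugging index errors.
FORCE_CPU = True

def pick_device(cuda_available, force_cpu=FORCE_CPU):
    """Choose the device string, preferring CPU while debugging."""
    if force_cpu or not cuda_available:
        return "cpu"
    return "cuda"
```

In the notebook this would replace the device line with `device = pick_device(torch.cuda.is_available())`.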


## 4. Training Loop Demonstration
We run five optimization steps to demonstrate the training loop.

In [7]:
optim = torch.optim.Adam(sampler.parameters(), lr=H.lr)
sampler.train()

print("Starting training loop demo (5 steps)...")
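# Create the iterator once; calling iter(loader) inside the loop would
# restart it every step, so StopIteration would never be reached.
batch_iter = iter(loader)
for step in range(5):
    # Get the next batch, restarting the iterator when the epoch ends
    try:
        x = next(batch_iter)
    except StopIteration:
        batch_iter = iter(loader)
        x = next(batch_iter)
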
    x = x.to(device).long()
    
    # Forward pass calculates loss internally in AbsorbingDiffusion.train_iter
    stats = sampler.train_iter(x)
    loss = stats['loss']
    
    # Backward pass
    optim.zero_grad()
    loss.backward()
    optim.step()
    
    print(f"Step {step+1}: Loss = {loss.item():.4f}")

print("Training loop demo complete.")

Starting training loop demo (5 steps)...


/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:163: operator(): block: [30,0,0], thread: [64,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "scatter gather kernel index out of bounds"` failed.

AcceleratorError: CUDA error: device-side assert triggered
Search for `cudaErrorAssert' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
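
The assertion `scatter gather kernel index out of bounds` means that an index tensor held a value outside the dimension it indexes. One possible cause, stated here as an assumption and not a confirmed diagnosis, is a token id that is negative or not smaller than the codebook size of its column. The sketch below checks a batch against the codebook sizes printed in section 1; `find_out_of_range` is a hypothetical helper.

```python
import numpy as np

# Codebook sizes as printed in section 1 of this notebook.
CODEBOOK_SIZES = (256, 128, 129, 256, 128, 32, 256, 49)

def find_out_of_range(tokens, codebook_sizes=CODEBOOK_SIZES):
    """Map each offending column to its (min, max) observed token id.

    `tokens` has shape (..., n_attributes); a column is reported when any
    id is negative or >= the codebook size for that column.
    """
    tokens = np.asarray(tokens)
    sizes = np.asarray(codebook_sizes)
    if tokens.shape[-1] != len(sizes):
        raise ValueError("last dimension must match the number of codebooks")
    flat = tokens.reshape(-1, len(sizes))
    col_min = flat.min(axis=0)
    col_max = flat.max(axis=0)
    return {
        col: (int(col_min[col]), int(col_max[col]))
        for col in range(len(sizes))
        if col_min[col] < 0 or col_max[col] >= sizes[col]
    }
```

Calling `find_out_of_range(batch.numpy())` on a batch from the loader returns an empty dict when every column is within range. If it is empty for all batches, the out-of-range index is produced somewhere inside the model or sampler instead.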


## 5. Sampling Demonstration
Generate samples from the model. After only 5 training steps the output will not be musical, but the cell demonstrates the sampling mechanism.

In [None]:
sampler.eval()
print("Attempting to sample... ")
with torch.no_grad():
    # Use only 10 sampling steps to keep the demo fast
    samples = sampler.sample(sample_steps=10, b=H.batch_size)
    
print(f"Samples shape: {samples.shape}")
print(f"Sample 0 (first 10 steps): \n{samples[0][:10]}")
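
The implementation of `sampler.sample` is not shown in this notebook. In absorbing-state diffusion, sampling generally starts from a fully masked sequence and fills in masked positions over several steps. The toy sketch below illustrates that idea with NumPy only, under clearly labeled assumptions: a uniform random draw stands in for the Transformer's predictions, `-1` stands in for the mask token, and the reveal probability `1/t` is one common schedule that this repository may or may not use. `toy_absorbing_sample` is a hypothetical name.

```python
import numpy as np

def toy_absorbing_sample(shape, codebook_sizes, steps, rng):
    """Toy reverse process: unmask positions step by step.

    shape is (batch, notes, n_attributes). Assumption: a uniform draw per
    column replaces the model's predicted distribution.
    """
    sizes = np.asarray(codebook_sizes)
    mask_id = -1  # stand-in for the absorbing (mask) token
    x = np.full(shape, mask_id, dtype=np.int64)
    for t in range(steps, 0, -1):
        masked = x == mask_id
        # Reveal each still-masked position with probability 1/t;
        # at t == 1 every remaining position is revealed.
        reveal = masked & (rng.random(shape) < 1.0 / t)
        # Stand-in for model sampling: uniform ids in [0, size) per column.
        proposal = (rng.random(shape) * sizes).astype(np.int64)
        x[reveal] = proposal[reveal]
    return x
```

With `shape=(4, 128, 8)` and the codebook sizes from section 1, the result has the same layout as the `samples` tensor printed above, with every id inside its column's range.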