# Trainer v5 code validation

Simple, minimal training runs to validate the v5 code.

> Important note: This example focuses only on how to configure your dataset and does not properly perform checkpointing. For trainer configurations, refer to the training notebooks.

In [6]:
DEEPSPEED_STRAT="deepspeed_stage_1"
GPU_DEVICES="auto"
ENABLE_WANDB=False
WANDB_PREFIX="trainer-v5-validation L6-D512"

print("DEEPSPEED_STRAT:", DEEPSPEED_STRAT)
print("ENABLE_WANDB:", ENABLE_WANDB)
print("GPU_DEVICES:", GPU_DEVICES)

if ENABLE_WANDB:
    WANDB_MODE="online"
else:
    WANDB_MODE="disabled"

DEEPSPEED_STRAT: deepspeed_stage_1
ENABLE_WANDB: False
GPU_DEVICES: auto
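
The `{NAME}` placeholders in the shell cells later in this notebook are filled in by IPython from the Python variables defined above. As a rough sketch of what that interpolation produces, the snippet below assembles the same `lightning_trainer.py fit` argument list in plain Python. The helper names `wandb_mode` and `build_fit_command` are hypothetical and exist only for illustration; the flags and paths are copied from the training cell in this notebook.

```python
import shlex

# Same values as the configuration cell above.
DEEPSPEED_STRAT = "deepspeed_stage_1"
GPU_DEVICES = "auto"
ENABLE_WANDB = False
WANDB_PREFIX = "trainer-v5-validation L6-D512"


def wandb_mode(enable_wandb):
    """Mirror the if/else in the configuration cell (hypothetical helper)."""
    return "online" if enable_wandb else "disabled"


def build_fit_command(config_path, run_label, strategy, devices):
    """Build the argument list for the trainer call (hypothetical helper).

    Only flags that appear in this notebook's training cell are used.
    """
    return [
        "python3", "lightning_trainer.py", "fit",
        "-c", config_path,
        f"--trainer.logger.init_args.name={run_label}",
        f"--trainer.strategy={strategy}",
        f"--trainer.devices={devices}",
    ]


cmd = build_fit_command(
    "../notebook/trainer-x-validation/mini-v5-enwiki.yaml",
    f"{WANDB_PREFIX} (full, train-ctx=4096, data-ctx=4096, {DEEPSPEED_STRAT})",
    DEEPSPEED_STRAT,
    GPU_DEVICES,
)

# Print a shell-quoted version of the command for inspection.
print(f"WANDB_MODE={wandb_mode(ENABLE_WANDB)}", shlex.join(cmd))
```

Building the command as a list keeps the run label, which contains spaces and parentheses, as a single argument without manual quoting.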


## Initial setup

Before we go into the dataset setup, let's perform an initial setup of all the folders we need, plus a small toy model that we will use throughout the examples in this notebook.

In [3]:
# Setup the folders we will need
!mkdir -p ../../model/
!mkdir -p ../../datapath/
!mkdir -p ../../checkpoint/

# Initialize a simple L6-D512 model
!cd ../../RWKV-v5/ && python3 ./init_model.py --n_layer 6 --n_embd 512 --vocab_size neox --skip-if-exists ../model/L6-D512-v5-neox-init.pth


[2023-08-05 09:30:06,188] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[RWKV.model] Running RWKV model using 'torch-jit' with torch '2.1.0.dev20230706'
---- Initializing model ----
No of layers: 6
Embedding size: 512
Output model path: ../model/L6-D512-v5-neox-init.pth
Vocab size: 50277
---- ----- ----
50277 512   -0.1 emb.weight
512   512   0    blocks.0.att.receptance.weight
512   512   0    blocks.0.att.key.weight
512   512   1.0  blocks.0.att.value.weight
512   512   0    blocks.0.att.output.weight
2048  512   1.0  blocks.0.ffn.key.weight
512   512   0    blocks.0.ffn.receptance.weight
512   2048  0    blocks.0.ffn.value.weight
512   512   0    blocks.1.att.receptance.weight
512   512   0    blocks.1.att.key.weight
512   512   1.0  blocks.1.att.value.weight
512   512   0    blocks.1.att.output.weight
2048  512   1.0  blocks.1.ffn.key.weight
512   512   0    blocks.1.ffn.receptance.weight
512   2048  0    blocks.1.ffn.value.weight
512

## Quick train for v5

Preload and train the mini-v5 model

In [5]:
# Let's preload the required dataset
!cd ../../RWKV-v5 && \
    python3 preload_datapath.py ../notebook/trainer-x-validation/mini-v5-enwiki.yaml

Found cached dataset parquet (/home/ubuntu/.cache/huggingface/datasets/teven___parquet/teven--enwiki_10k-de63a925546e70ab/0.0.0/14a00e99c0d15a23649d0db8944380ac81082d4b021f398733dd84f3a6c569a7)
100%|████████████████████████████████████████████| 1/1 [00:00<00:00, 945.30it/s]
                                                                                

In [15]:
# Validate the dataset is working, by doing a quick training run
!cd ../../RWKV-v5 && \
    export WANDB_MODE="{WANDB_MODE}" && \
    python3 lightning_trainer.py fit \
        -c ../notebook/trainer-x-validation/mini-v5-enwiki.yaml \
        --trainer.logger.init_args.name="{WANDB_PREFIX} (full, train-ctx=4096, data-ctx=4096, {DEEPSPEED_STRAT})" \
        --trainer.strategy="{DEEPSPEED_STRAT}" \
        --trainer.devices="{GPU_DEVICES}"

[2023-08-05 10:59:06,695] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[RWKV.model] Running RWKV model using 'torch-jit' with torch '2.1.0.dev20230706'
  rank_zero_warn(
  rank_zero_warn(f"No seed found, seed set to {seed}")
Global seed set to 51716742
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs


[RWKV.Trainer] Applying 'target_batch_size' with the following:
   - target_batch_size:       32
   - num_nodes:               1
   - num_devices:             1
   - accumulate_grad_batches: 32
   - effective_batch_size:    32

Found cached dataset parquet (/home/ubuntu/.cache/huggingface/datasets/teven___parquet/teven--enwiki_10k-de63a925546e70ab/0.0.0/14a00e99c0d15a23649d0db8944380ac81082d4b021f398733dd84f3a6c569a7)
100%|████████████████████████████████████████████| 1/1 [00:00<00:00, 964.43it/s]
Loading cached processed dataset at /home
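
The `target_batch_size` block in the log above reports `accumulate_grad_batches: 32` for one node with one device. A minimal sketch of arithmetic consistent with those numbers is shown below. The function name, the `microbatch_size` parameter, and the rounding rule are assumptions for illustration; the trainer's actual implementation may differ.

```python
def resolve_accumulate_grad_batches(target_batch_size, num_nodes, num_devices,
                                    microbatch_size=1):
    """Estimate gradient accumulation steps for a target batch size.

    Assumption: each optimizer step sees
    num_nodes * num_devices * microbatch_size samples per accumulation step.
    """
    samples_per_step = num_nodes * num_devices * microbatch_size
    # Never accumulate fewer than one batch.
    accumulate = max(1, round(target_batch_size / samples_per_step))
    effective_batch_size = accumulate * samples_per_step
    return accumulate, effective_batch_size


# Values taken from the log above: one node, one device.
accumulate, effective = resolve_accumulate_grad_batches(32, 1, 1)
print("accumulate_grad_batches:", accumulate)
print("effective_batch_size:   ", effective)
```

Under the same assumption, running on 8 devices would need only 4 accumulation steps to reach the same effective batch size of 32.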

In [16]:
# Export the trained checkpoint to a standalone model file
!cd ../../RWKV-v5 && \
    python3 export_checkpoint.py \
        ../checkpoint/trainer-validation/mini-v5-enwiki/last.ckpt/ \
        ../model/mini-v5-enwiki.pth

[2023-08-05 11:18:01,356] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Processing zero checkpoint '../checkpoint/trainer-validation/mini-v5-enwiki/last.ckpt/checkpoint'
Detected checkpoint of type zero stage ZeroStageEnum.optimizer_states, world_size: 1
Parsing checkpoint created by deepspeed==0.9.5
Reconstructed fp32 state dict with 126 params 71966816 elements
Saving fp32 state dict to ../model/mini-v5-enwiki.pth
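
As a quick sanity check on the exported file, the tensor and element counts can be compared against the export log above (126 params, 71966816 elements). The sketch below assumes the `.pth` file is a plain PyTorch state dict, which is what the log message suggests; the helper names are hypothetical.

```python
def count_params(state_dict):
    """Return (number of tensors, total number of elements) for a state dict."""
    total_elements = sum(tensor.numel() for tensor in state_dict.values())
    return len(state_dict), total_elements


def load_and_count(path):
    """Load a state dict from disk on the CPU and count its contents."""
    import torch  # imported lazily so count_params works without torch installed

    state_dict = torch.load(path, map_location="cpu")
    return count_params(state_dict)


# Usage, assuming the export cell above has been run from the RWKV-v5 folder:
# n_tensors, n_elements = load_and_count("../model/mini-v5-enwiki.pth")
# print(n_tensors, n_elements)
```

If the counts differ from the log, the export step most likely did not complete or a different checkpoint was loaded.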
