# Short Enwiki Train

Test that the model init code runs without issues.

**L6-D512 model with:**
- Layer count: 6
- Embed size: 512

## Preparing the init model and test dataset

In [8]:
GPU_DEVICES="auto"
ENABLE_WANDB=False
WANDB_PREFIX="infctx-v5-unit-test"
DEEPSPEED_STRAT="deepspeed_stage_1"

print("ENABLE_WANDB:", ENABLE_WANDB)
print("GPU_DEVICES:", GPU_DEVICES)

if ENABLE_WANDB:
    WANDB_MODE="online"
else:
    WANDB_MODE="disabled"

# Compute the notebook directory and the related project paths
import os
NOTEBOOK_DIR=os.path.dirname(os.path.abspath("__file__"))
PROJECT_DIR=os.path.abspath(os.path.join(NOTEBOOK_DIR, "../../"))
TRAINER_DIR=os.path.abspath(os.path.join(PROJECT_DIR, "./RWKV-v5/"))

print("NOTEBOOK_DIR:", NOTEBOOK_DIR)
print("TRAINER_DIR:", TRAINER_DIR)
print("PROJECT_DIR:", PROJECT_DIR)

ENABLE_WANDB: False
GPU_DEVICES: auto
NOTEBOOK_DIR: /home/ubuntu/dev-infctx/notebook/trainer-v5-unit-test
TRAINER_DIR: /home/ubuntu/dev-infctx/RWKV-v5
PROJECT_DIR: /home/ubuntu/dev-infctx
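
The path setup above can be checked in isolation. The following is a minimal sketch, assuming a POSIX filesystem; `resolve_dirs` is a hypothetical helper introduced only for illustration and is not part of the trainer. It also shows why `os.path.abspath("__file__")` works inside a notebook: the quoted string is treated as a relative file name, so its parent directory is the current working directory.

```python
import os

def resolve_dirs(notebook_dir):
    """Hypothetical helper: derive the project and trainer paths
    the same way the notebook cell does."""
    project_dir = os.path.abspath(os.path.join(notebook_dir, "../../"))
    trainer_dir = os.path.abspath(os.path.join(project_dir, "./RWKV-v5/"))
    return project_dir, trainer_dir

# "__file__" is a plain string here, so abspath() resolves it against the
# current working directory, and dirname() strips the fake file name.
notebook_dir = os.path.dirname(os.path.abspath("__file__"))
cwd_matches = notebook_dir == os.path.normpath(os.getcwd())

# Re-derive the paths from the NOTEBOOK_DIR value printed above.
project_dir, trainer_dir = resolve_dirs(
    "/home/ubuntu/dev-infctx/notebook/trainer-v5-unit-test"
)
print("PROJECT_DIR:", project_dir)
print("TRAINER_DIR:", trainer_dir)
```

Because the result depends on the working directory, the notebook should be launched from its own folder for `NOTEBOOK_DIR` to point at the right place.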


In [4]:
# First, let's set up the various directories
!mkdir -p "{PROJECT_DIR}/model/"
!mkdir -p "{PROJECT_DIR}/datapath/"
!mkdir -p "{PROJECT_DIR}/checkpoint/"
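
If shell escapes are not available, the same directories can be created from Python. This is a rough equivalent of the `mkdir -p` calls above, not something the notebook itself does; `ensure_dirs` is a hypothetical helper name, and a temporary directory stands in for `PROJECT_DIR` so the sketch can run anywhere.

```python
import os
import tempfile

def ensure_dirs(project_dir, names=("model", "datapath", "checkpoint")):
    """Hypothetical helper: create each sub-directory if it is missing,
    mirroring `mkdir -p` (no error when the directory already exists)."""
    created = []
    for name in names:
        path = os.path.join(project_dir, name)
        os.makedirs(path, exist_ok=True)
        created.append(path)
    return created

# A temporary directory stands in for PROJECT_DIR in this sketch.
with tempfile.TemporaryDirectory() as tmp:
    paths = ensure_dirs(tmp)
    ensure_dirs(tmp)  # a second call is a no-op, like `mkdir -p`
    all_exist = all(os.path.isdir(p) for p in paths)

print("all directories created:", all_exist)
```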

In [None]:
# Let's initialize the L6-D512 model with the init_model.py code
!cd "{TRAINER_DIR}" && python3 init_model.py \
    --n_layer 6 --n_embd 512 \
    --vocab_size world \
    --skip-if-exists --safe-init \
    ../model/L6-D512-world-init.pth

In [6]:
# Preload the dataset
!cd "{TRAINER_DIR}" && \
    python3 preload_datapath.py "{NOTEBOOK_DIR}/config/enwiki_10k-world-4096.yaml"

Downloading readme: 100%|██████████████████████| 424/424 [00:00<00:00, 2.81MB/s]
Downloading data files:   0%|                             | 0/1 [00:00<?, ?it/s]
Downloading data:   0%|                             | 0.00/15.2M [00:00<?, ?B/s]
Downloading data:  28%|█████▌              | 4.19M/15.2M [00:00<00:00, 13.4MB/s]
Downloading data: 100%|████████████████████| 15.2M/15.2M [00:00<00:00, 23.4MB/s]
Downloading data files: 100%|█████████████████████| 1/1 [00:00<00:00,  1.53it/s]
Extracting data files: 100%|████████████████████| 1/1 [00:00<00:00, 1572.08it/s]
Setting num_proc from 32 back to 1 for the train split to disable multiprocessing as it only contains one shard.
Generating train split: 100%|███| 10000/10000 [00:00<00:00, 68642.50 examples/s]
Map (num_proc=32): 100%|█████████| 10000/10000 [00:09<00:00, 1095.56 examples/s]
Filter (num_proc=32): 100%|██████| 10000/10000 [00:07<00:00, 1428.02 examples/s]
Map (num_proc=32): 100%|████████████| 1339/1339 [00:04<00:00, 285.37

In [9]:
!cd "{TRAINER_DIR}" && \
    export WANDB_MODE="{WANDB_MODE}" && \
    python3 lightning_trainer.py fit \
        -c "{NOTEBOOK_DIR}/config/enwiki_10k-world-4096.yaml" \
        --trainer.logger.init_args.name="{WANDB_PREFIX} (train-ctx=4096, data-ctx=4096, {DEEPSPEED_STRAT})" \
        --trainer.strategy="{DEEPSPEED_STRAT}" \
        --trainer.devices="{GPU_DEVICES}" \
        --model.load_model="../model/L6-D512-world-init.pth"

[2023-08-27 10:00:29,295] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[RWKV.model] Running RWKV model using 'torch-jit' with torch '2.0.1'
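
To make the variable substitution in the training command explicit, the sketch below assembles the same arguments as a Python list. It only rearranges values and flags already shown in the cell above; `build_trainer_args` is a hypothetical helper, and nothing is executed here.

```python
def build_trainer_args(notebook_dir, wandb_prefix, deepspeed_strat, gpu_devices):
    """Hypothetical helper: expand the notebook variables into the
    argument list used by the training cell."""
    run_name = f"{wandb_prefix} (train-ctx=4096, data-ctx=4096, {deepspeed_strat})"
    return [
        "python3", "lightning_trainer.py", "fit",
        "-c", f"{notebook_dir}/config/enwiki_10k-world-4096.yaml",
        f"--trainer.logger.init_args.name={run_name}",
        f"--trainer.strategy={deepspeed_strat}",
        f"--trainer.devices={gpu_devices}",
        "--model.load_model=../model/L6-D512-world-init.pth",
    ]

args = build_trainer_args(
    notebook_dir="/home/ubuntu/dev-infctx/notebook/trainer-v5-unit-test",
    wandb_prefix="infctx-v5-unit-test",
    deepspeed_strat="deepspeed_stage_1",
    gpu_devices="auto",
)
for arg in args:
    print(arg)

# To actually launch the run, one option (an assumption, not taken from the
# notebook) is subprocess.run(args, cwd=TRAINER_DIR,
# env={**os.environ, "WANDB_MODE": WANDB_MODE}).
```

Passing the arguments as a list avoids shell quoting issues with the spaces and parentheses in the run name.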
