# InfCtx trainer validation
The model being trained uses the same settings as the Raven 1B5 model:
- Layer count: 24
- Embed size: 2048

The goal is to validate how the loss changes across runs that share the exact same hyperparameters:
- 1024 data chunk size
- same learning rate / weight decay / seed
- "teven/enwiki_10k" dataset, chunked into 1024-token samples

The only change between runs is the training context size:
- 1024 context vs 128 context
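To make the comparison concrete, the number of segments each 1024-token sample is split into follows directly from the training context size. The snippet below is a minimal sketch of that arithmetic only; `segments_per_sample` is a hypothetical helper used for illustration and is not part of the trainer.

```python
import math

DATA_CHUNK_SIZE = 1024  # tokens per data sample, as listed above


def segments_per_sample(train_ctx: int, chunk_size: int = DATA_CHUNK_SIZE) -> int:
    """Hypothetical helper: number of training segments needed for one data sample."""
    return math.ceil(chunk_size / train_ctx)


for train_ctx in (1024, 128):
    count = segments_per_sample(train_ctx)
    print(f"train-ctx={train_ctx}: {count} segment(s) per sample")
```

With a 1024 context every sample fits in a single segment, while a 128 context requires 8 segments per sample.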

> This project assumes you have set up the rwkv-infctx conda env and are executing in that environment. See the main README.md for the conda env setup steps.
>
> All training runs (except the dryrun) are configured to log to Weights & Biases. Comment out the logger in the config file if you want to avoid this.
>
> Due to existing "hang" issues with multi-GPU training when bptt_length > 1, segmented training is limited to 1 GPU.

## Preparing the init model and test dataset

In [1]:
# First, let's set up the various directories and get the blank init model. This init model was generated
# using the original RWKV-LM repo (as of this writing, this repo cannot init a model).
# As such, I have preinitialized these blank models and uploaded them to HF for convenience.
!mkdir -p ../../model/
!mkdir -p ../../datapath/
!mkdir -p ../../checkpoint/
!rm -rf ../../model/Echo-A-1B5-Init.pth
!cd ../../model/ && wget https://huggingface.co/picocreator/memory-size-experiment-for-rwkv/resolve/main/Echo-A-1B5-Init.pth
!ls -alh ../../model/Echo-A-1B5-Init.pth

--2023-07-05 01:12:44--  https://huggingface.co/picocreator/memory-size-experiment-for-rwkv/resolve/main/Echo-A-1B5-Init.pth
Resolving huggingface.co (huggingface.co)... 99.84.108.70, 99.84.108.129, 99.84.108.55, ...
Connecting to huggingface.co (huggingface.co)|99.84.108.70|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://cdn-lfs.huggingface.co/repos/cb/ef/cbef09abb2634a3375b28868bffa285226dfeabedec89b28c2fb302221164d66/0ec7214ed16737a6348254e6f96d8cdc04d3b5efbd5f53fe9337607ea42b5b9f?response-content-disposition=attachment%3B+filename*%3DUTF-8%27%27Echo-A-1B5-Init.pth%3B+filename%3D%22Echo-A-1B5-Init.pth%22%3B&Expires=1688778765&Policy=eyJTdGF0ZW1lbnQiOlt7IlJlc291cmNlIjoiaHR0cHM6Ly9jZG4tbGZzLmh1Z2dpbmdmYWNlLmNvL3JlcG9zL2NiL2VmL2NiZWYwOWFiYjI2MzRhMzM3NWIyODg2OGJmZmEyODUyMjZkZmVhYmVkZWM4OWIyOGMyZmIzMDIyMjExNjRkNjYvMGVjNzIxNGVkMTY3MzdhNjM0ODI1NGU2Zjk2ZDhjZGMwNGQzYjVlZmJkNWY1M2ZlOTMzNzYwN2VhNDJiNWI5Zj9yZXNwb25zZS1jb250ZW50LWRpc3Bvc2l0aW9uPSoiLCJDb25ka

In [2]:
# Let's preload the required dataset
!cd ../../RWKV-v4neo && python3 preload_dataset.py ../notebook/trainer-validation/infctx-validation-dryrun.yaml

Found cached dataset parquet (/home/ubuntu/.cache/huggingface/datasets/teven___parquet/teven--enwiki_10k-de63a925546e70ab/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec)
100%|████████████████████████████████████████████| 1/1 [00:00<00:00, 983.42it/s]
                                                                                

# Trainer Code validation via dryrun

The following dryrun helps check that the existing trainer code changes are valid, using 2 * 2 data samples.
It does not log the run to W&B.

In [3]:
# Validate that the source code and env are working, by doing a short 2 sample dryrun
!cd ../../RWKV-v4neo && python3 new_train.py fit -c ../notebook/trainer-validation/infctx-validation-dryrun.yaml

Setting ds_accelerator to cuda (auto detect)
Global seed set to 3941088705
Using /home/ubuntu/.cache/torch_extensions/py311_cu117 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/ubuntu/.cache/torch_extensions/py311_cu117/wkv_128_bf16/build.ninja...
Building extension module wkv_128_bf16...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/3] /usr/local/cuda/bin/nvcc  -DTORCH_EXTENSION_NAME=wkv_128_bf16 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/ubuntu/anaconda3/envs/rwkv-exp/lib/python3.11/site-packages/torch/include -isystem /home/ubuntu/anaconda3/envs/rwkv-exp/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /home/ubuntu/anaconda3/envs/rwkv-exp/lib/python3.11/site-packages/torch/include/TH -isystem /home/ubuntu/anaconda3/envs/rwkv-exp/

# Baseline full context (1024) training

Perform a full 1 epoch training run with training context size = 1024, ensuring all data samples fit within the allocated training size.

In [4]:
# Full training run
!cd ../../RWKV-v4neo && python3 new_train.py fit -c ../notebook/trainer-validation/infctx-validation-full.yaml

Setting ds_accelerator to cuda (auto detect)
Global seed set to 3941088705
[34m[1mwandb[0m: Currently logged in as: [33mpicocreator[0m. Use [1m`wandb login --relogin`[0m to force relogin
[34m[1mwandb[0m: Tracking run with wandb version 0.15.4
[34m[1mwandb[0m: Run data is saved locally in [35m[1m./wandb/run-20230705_011558-adyn25wr[0m
[34m[1mwandb[0m: Run [1m`wandb offline`[0m to turn off syncing.
[34m[1mwandb[0m: Syncing run [33minfctx-validation-full (train-ctx=1024, data-ctx=1024, bs=12)[0m
[34m[1mwandb[0m: ⭐️ View project at [34m[4mhttps://wandb.ai/picocreator/RWKV-InfCtx-Validation[0m
[34m[1mwandb[0m: 🚀 View run at [34m[4mhttps://wandb.ai/picocreator/RWKV-InfCtx-Validation/runs/adyn25wr[0m
Using /home/ubuntu/.cache/torch_extensions/py311_cu117 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/ubuntu/.cache/torch_extensions/py311_cu117/wkv_1024_bf16/build.ninja...
Building extension module wkv_10

# Back-Propagation Through Time (512) training

Perform a full 1 epoch training run with training context size = 512. This is a less exaggerated version of the 128 training run.

In [5]:
# Full training run
!cd ../../RWKV-v4neo && python3 new_train.py fit -c ../notebook/trainer-validation/infctx-validation-segmented-512.yaml

Setting ds_accelerator to cuda (auto detect)
Global seed set to 3941088705
[34m[1mwandb[0m: Currently logged in as: [33mpicocreator[0m. Use [1m`wandb login --relogin`[0m to force relogin
[34m[1mwandb[0m: Tracking run with wandb version 0.15.4
[34m[1mwandb[0m: Run data is saved locally in [35m[1m./wandb/run-20230705_023113-6fszt795[0m
[34m[1mwandb[0m: Run [1m`wandb offline`[0m to turn off syncing.
[34m[1mwandb[0m: Syncing run [33minfctx-validation-512-segmented (train-ctx=512, data-ctx=1024, bs=12)[0m
[34m[1mwandb[0m: ⭐️ View project at [34m[4mhttps://wandb.ai/picocreator/RWKV-InfCtx-Validation[0m
[34m[1mwandb[0m: 🚀 View run at [34m[4mhttps://wandb.ai/picocreator/RWKV-InfCtx-Validation/runs/6fszt795[0m
Using /home/ubuntu/.cache/torch_extensions/py311_cu117 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/ubuntu/.cache/torch_extensions/py311_cu117/wkv_512_bf16/build.ninja...
Building extension module

Rank: 0 partition count [1, 1, 1] and sizes[(1515008000, False), (49152, False), (49152, False)] 
Using /home/ubuntu/.cache/torch_extensions/py311_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0006873607635498047 seconds

  | Name   | Type       | Params
--------------------------------------
0 | emb    | Embedding  | 102 M 
1 | blocks | ModuleList | 1.3 B 
2 | ln_out | LayerNorm  | 4.1 K 
3 | head   | Linear     | 102 M 
--------------------------------------
1.5 B     Trainable params
0         Non-trainable params
1.5 B     Total params
6,060.425 Total estimated model params size (MB)
Epoch 0: 100%|█| 5308/5308 [1:28:32<00:00,  1.00s/it, v_num=t795, train/loss=6.2
Validation: 0it [00:00, ?it/s]
Validation:   0%|                                        | 0/54 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|                           | 0/54 [00:00

# Back-Propagation Through Time (128) training

Perform a full 1 epoch training run with training context size = 128, forcing all data samples to be segmented 8 times via "Truncated Back-Propagation Through Time".
> PS: Weights & Biases logging is enabled
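The segmentation idea can be illustrated with a toy recurrence. This is only a sketch under stated assumptions: `run_recurrent` and `run_segmented` are hypothetical names, an exponential moving average stands in for the real RWKV state, and the actual trainer code is not reproduced here. It shows that carrying the state across segments leaves the forward pass unchanged; in truncated BPTT it is the backward pass that gets cut at each segment boundary.

```python
def run_recurrent(tokens, state=0.0, decay=0.9):
    """Toy recurrent layer (assumption): an exponential moving average as the state."""
    outputs = []
    for tok in tokens:
        state = decay * state + (1.0 - decay) * tok
        outputs.append(state)
    return outputs, state


def run_segmented(tokens, ctx_len, decay=0.9):
    """Process the sample in ctx_len-sized segments, carrying the state forward."""
    state = 0.0
    outputs = []
    for start in range(0, len(tokens), ctx_len):
        segment = tokens[start:start + ctx_len]
        seg_out, state = run_recurrent(segment, state, decay)
        # A real trainer would compute the loss for this segment here and
        # stop gradients from flowing back into earlier segments.
        outputs.extend(seg_out)
    return outputs


tokens = [float(i % 7) for i in range(1024)]  # stand-in for a 1024-token sample
full_out, _ = run_recurrent(tokens)
seg_out = run_segmented(tokens, ctx_len=128)

# Same operations in the same order, so the outputs match exactly.
print(full_out == seg_out)
```

Since the forward results are identical in this toy, any difference between the segmented and full-context runs would come from the truncated gradients rather than from the model seeing different inputs.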

In [6]:
# Full training run
!cd ../../RWKV-v4neo && python3 new_train.py fit -c ../notebook/trainer-validation/infctx-validation-segmented.yaml

Setting ds_accelerator to cuda (auto detect)
Global seed set to 3941088705
[34m[1mwandb[0m: Currently logged in as: [33mpicocreator[0m. Use [1m`wandb login --relogin`[0m to force relogin
[34m[1mwandb[0m: Tracking run with wandb version 0.15.4
[34m[1mwandb[0m: Run data is saved locally in [35m[1m./wandb/run-20230705_040135-arjrziht[0m
[34m[1mwandb[0m: Run [1m`wandb offline`[0m to turn off syncing.
[34m[1mwandb[0m: Syncing run [33minfctx-validation-segmented (train-ctx=128, data-ctx=1024, bs=12)[0m
[34m[1mwandb[0m: ⭐️ View project at [34m[4mhttps://wandb.ai/picocreator/RWKV-InfCtx-Validation[0m
[34m[1mwandb[0m: 🚀 View run at [34m[4mhttps://wandb.ai/picocreator/RWKV-InfCtx-Validation/runs/arjrziht[0m
Using /home/ubuntu/.cache/torch_extensions/py311_cu117 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/ubuntu/.cache/torch_extensions/py311_cu117/wkv_128_bf16/build.ninja...
Building extension module wkv

Rank: 0 partition count [1, 1, 1] and sizes[(1515008000, False), (49152, False), (49152, False)] 
Using /home/ubuntu/.cache/torch_extensions/py311_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0005953311920166016 seconds

  | Name   | Type       | Params
--------------------------------------
0 | emb    | Embedding  | 102 M 
1 | blocks | ModuleList | 1.3 B 
2 | ln_out | LayerNorm  | 4.1 K 
3 | head   | Linear     | 102 M 
--------------------------------------
1.5 B     Trainable params
0         Non-trainable params
1.5 B     Total params
6,060.425 Total estimated model params size (MB)
Epoch 0: 100%|█| 5308/5308 [8:32:06<00:00,  5.79s/it, v_num=ziht, train/loss=6.7
Validation: 0it [00:00, ?it/s]
Validation:   0%|                                        | 0/54 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|                           | 0/54 [00:00

# Last segment (128) training

Perform a full 1 epoch training run with training context size = 128, using only the last segment. (This replicates a previously known regression.)
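As a rough illustration of how little of each sample this setup would train on, the sketch below computes the share of tokens that contribute to the loss. It assumes that "only using the last segment" means the loss is taken from the final 128-token segment alone; that reading, and the `trained_token_fraction` helper, are assumptions made for illustration and are not taken from the trainer code.

```python
def trained_token_fraction(chunk_size: int, train_ctx: int, last_segment_only: bool) -> float:
    """Hypothetical helper: share of a sample's tokens that contribute to the loss."""
    if not last_segment_only:
        # Every segment is trained on, so all tokens contribute.
        return 1.0
    # Assumption: only the final segment (at most train_ctx tokens) contributes.
    return min(train_ctx, chunk_size) / chunk_size


print(trained_token_fraction(1024, 128, last_segment_only=False))  # 1.0
print(trained_token_fraction(1024, 128, last_segment_only=True))   # 0.125
```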

In [7]:
# Full training run
!cd ../../RWKV-v4neo && python3 new_train.py fit -c ../notebook/trainer-validation/infctx-validation-last-segment.yaml

Setting ds_accelerator to cuda (auto detect)
Global seed set to 3941088705
[34m[1mwandb[0m: Currently logged in as: [33mpicocreator[0m. Use [1m`wandb login --relogin`[0m to force relogin
[34m[1mwandb[0m: Tracking run with wandb version 0.15.4
[34m[1mwandb[0m: Run data is saved locally in [35m[1m./wandb/run-20230705_123606-lk8dbb2g[0m
[34m[1mwandb[0m: Run [1m`wandb offline`[0m to turn off syncing.
[34m[1mwandb[0m: Syncing run [33minfctx-validation-last-segment (train-ctx=128, data-ctx=1024, bs=12)[0m
[34m[1mwandb[0m: ⭐️ View project at [34m[4mhttps://wandb.ai/picocreator/RWKV-InfCtx-Validation[0m
[34m[1mwandb[0m: 🚀 View run at [34m[4mhttps://wandb.ai/picocreator/RWKV-InfCtx-Validation/runs/lk8dbb2g[0m
Using /home/ubuntu/.cache/torch_extensions/py311_cu117 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/ubuntu/.cache/torch_extensions/py311_cu117/wkv_128_bf16/build.ninja...
Building extension module 

Epoch 0: 100%|█| 5308/5308 [2:57:30<00:00,  2.01s/it, v_num=bb2g, train/loss=7.1
Validation: 0it [00:00, ?it/s]
Validation:   0%|                                        | 0/54 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|                           | 0/54 [00:00<?, ?it/s]
Validation DataLoader 0:   2%|▎                  | 1/54 [00:01<01:16,  1.44s/it]
Validation DataLoader 0:   4%|▋                  | 2/54 [00:02<01:13,  1.42s/it]
Validation DataLoader 0:   6%|█                  | 3/54 [00:04<01:12,  1.41s/it]
Validation DataLoader 0:   7%|█▍                 | 4/54 [00:05<01:10,  1.41s/it]
Validation DataLoader 0:   9%|█▊                 | 5/54 [00:07<01:08,  1.41s/it]
Validation DataLoader 0:  11%|██                 | 6/54 [00:08<01:07,  1.42s/it]
Validation DataLoader 0:  13%|██▍                | 7/54 [00:09<01:06,  1.41s/it]
Validation DataLoader 0:  15%|██▊                | 8/54 [00:11<01:04,  1.41s/it]
Validation DataLoader 0:  17%|███▏           