# MuseNet 1B4 (Base Model Training)
This is a custom 1.4B-parameter RWKV model with
- 96 layers
- 1024 embedding size
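
As a rough check on the 1.4B figure, the parameter count can be worked out from the two numbers above. This is only a sketch: it assumes the standard RWKV-v4 block layout and a 50277-token vocabulary (the GPT-NeoX tokenizer size commonly used with RWKV-v4). Neither assumption is stated in this notebook, and `rwkv_v4_param_count` is a hypothetical helper name.

```python
def rwkv_v4_param_count(n_layer: int, n_embd: int, vocab_size: int) -> int:
    """Estimate the number of weights in an RWKV-v4 style model.

    Assumptions (not taken from this notebook): a 4x wide channel-mix
    hidden size, and LayerNorms that carry both a weight and a bias.
    """
    d = n_embd
    # Time-mix: key, value, receptance, output projections (4 * d^2)
    # plus time_decay, time_first and three time_mix vectors (5 * d)
    time_mix = 4 * d * d + 5 * d
    # Channel-mix: key (d -> 4d), value (4d -> d), receptance (d -> d)
    # plus two time_mix vectors (2 * d)
    channel_mix = 9 * d * d + 2 * d
    # Two LayerNorms per block, each with weight and bias
    layer_norms = 2 * 2 * d
    per_block = time_mix + channel_mix + layer_norms
    # Token embedding and output head, plus the input and output LayerNorms
    outside_blocks = 2 * vocab_size * d + 2 * 2 * d
    return n_layer * per_block + outside_blocks


# vocab_size=50277 is an assumption, see the note above
total = rwkv_v4_param_count(n_layer=96, n_embd=1024, vocab_size=50277)
print(f"{total:,} parameters (~{total / 1e9:.2f}B)")
```

Under these assumptions the estimate comes to 1,412,675,584, the same element count that the checkpoint export log at the end of this notebook reports.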

It was initialized using the original RWKV trainer: https://github.com/PicoCreator/RWKV-LM-LoRA/blob/picocreator-init-memory-experiment/notebook/echo-B-1B4-training.ipynb

This is just a silly experiment in using RWKV with music.

> This project assumes you have the rwkv-infctx conda env set up and are executing in that environment. See the main README.md for the conda env setup steps.

# Basic Setup

In [1]:
# First, let's get the blank init model. This init model was generated
# using the original RWKV-LM repo (at the time of writing, this repo cannot init a model).
#
# As such, I have pre-initialized these blank models and uploaded them to HF for convenience
!mkdir -p ../../../model/
!mkdir -p ../../../datapath/
!mkdir -p ../../../checkpoint/
!cd ../../../model/ && wget https://huggingface.co/picocreator/memory-size-experiment-for-rwkv/resolve/main/Echo-B-1B4-Init.pth

--2023-07-19 17:56:18--  https://huggingface.co/picocreator/memory-size-experiment-for-rwkv/resolve/main/Echo-B-1B4-Init.pth
Resolving huggingface.co (huggingface.co)... 99.84.108.129, 99.84.108.87, 99.84.108.55, ...
Connecting to huggingface.co (huggingface.co)|99.84.108.129|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://cdn-lfs.huggingface.co/repos/cb/ef/cbef09abb2634a3375b28868bffa285226dfeabedec89b28c2fb302221164d66/aca2f7f217b1d21de5bbf528588684c3f8b2ea16d1b431c551f1681e58ec2de3?response-content-disposition=attachment%3B+filename*%3DUTF-8%27%27Echo-B-1B4-Init.pth%3B+filename%3D%22Echo-B-1B4-Init.pth%22%3B&Expires=1690048578&Policy=eyJTdGF0ZW1lbnQiOlt7IkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTY5MDA0ODU3OH19LCJSZXNvdXJjZSI6Imh0dHBzOi8vY2RuLWxmcy5odWdnaW5nZmFjZS5jby9yZXBvcy9jYi9lZi9jYmVmMDlhYmIyNjM0YTMzNzViMjg4NjhiZmZhMjg1MjI2ZGZlYWJlZGVjODliMjhjMmZiMzAyMjIxMTY0ZDY2L2FjYTJmN2YyMTdiMWQyMWRlNWJiZjUyODU4ODY4NGMzZjhiMmVhMTZkMWI0

In [2]:
DEEPSPEED_STRAT="deepspeed_stage_2_offload"
GPU_DEVICES="auto"
ENABLE_WANDB=False
WANDB_PREFIX="Musenet-1B4 L96-D1024"

print("DEEPSPEED_STRAT:", DEEPSPEED_STRAT)
print("ENABLE_WANDB:", ENABLE_WANDB)
print("GPU_DEVICES:", GPU_DEVICES)

if ENABLE_WANDB:
    WANDB_MODE="online"
else:
    WANDB_MODE="disabled"

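# Compute the notebook directory and the related project paths
import os
# "__file__" is a literal string here (notebooks do not define __file__),
# so this resolves to the current working directory
NOTEBOOK_DIR=os.path.dirname(os.path.abspath("__file__"))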
PROJECT_DIR=os.path.abspath(os.path.join(NOTEBOOK_DIR, "../../../"))
TRAINER_DIR=os.path.abspath(os.path.join(PROJECT_DIR, "./RWKV-v4neo/"))

print("NOTEBOOK_DIR:", NOTEBOOK_DIR)
print("TRAINER_DIR:", TRAINER_DIR)
print("PROJECT_DIR:", PROJECT_DIR)

DEEPSPEED_STRAT: deepspeed_stage_2_offload
ENABLE_WANDB: False
GPU_DEVICES: auto
NOTEBOOK_DIR: /home/ubuntu/breadbrowser-music/notebook/experiment/breadbrowser-music
TRAINER_DIR: /home/ubuntu/breadbrowser-music/RWKV-v4neo
PROJECT_DIR: /home/ubuntu/breadbrowser-music
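
Since every later cell depends on these paths, it can help to confirm that the expected files exist before starting a long run. The following is a minimal sketch, not part of the trainer: `find_missing_files` and `EXPECTED_FILES` are hypothetical names, and the file list is simply the set of scripts and the init model that this notebook's cells refer to.

```python
import os

# Relative paths referenced by the cells in this notebook
EXPECTED_FILES = [
    "RWKV-v4neo/preload_dataset.py",
    "RWKV-v4neo/new_train.py",
    "RWKV-v4neo/export_checkpoint.py",
    "model/Echo-B-1B4-Init.pth",
]


def find_missing_files(project_dir, expected=EXPECTED_FILES):
    """Return the expected relative paths that are not files under project_dir."""
    return [
        rel for rel in expected
        if not os.path.isfile(os.path.join(project_dir, rel))
    ]
```

In this notebook it would be called as `find_missing_files(PROJECT_DIR)`; an empty list means all four files were found.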


## Dataset preloading

In [4]:
# Let's preload the required dataset (the one configured in Musenet-1B4.yaml)
!cd "{TRAINER_DIR}" && \
    python3 preload_dataset.py "{NOTEBOOK_DIR}/Musenet-1B4.yaml"

Found cached dataset csv (/home/ubuntu/.cache/huggingface/datasets/breadlicker45___csv/breadlicker45--musenet-encoders-40k-44cc13ced585f16a/0.0.0/eea64c71ca8b46dd3f537ed218fc9bf495d5707789152eb2764f5c78fa66d59d)
100%|████████████████████████████████████████████| 1/1 [00:00<00:00, 674.00it/s]
Downloading (…)okenizer_config.json: 100%|█████| 264/264 [00:00<00:00, 3.09MB/s]
Downloading (…)/main/tokenizer.json: 100%|███| 750k/750k [00:00<00:00, 51.8MB/s]
Downloading (…)cial_tokens_map.json: 100%|███| 99.0/99.0 [00:00<00:00, 1.22MB/s]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
                                                                                

## Training process

In [4]:
# Foundation model training
!cd "{TRAINER_DIR}" && \
    export WANDB_MODE="{WANDB_MODE}" && \
    python new_train.py fit \
        -c "{NOTEBOOK_DIR}/Musenet-1B4.yaml" \
        --trainer.logger.init_args.name="{WANDB_PREFIX} - (train-ctx=4096, {DEEPSPEED_STRAT})" \
        --trainer.strategy="{DEEPSPEED_STRAT}" \
        --trainer.devices="{GPU_DEVICES}" 

Setting ds_accelerator to cuda (auto detect)
[RWKV.model] Running RWKV model using 'torch-jit' with torch '2.0.1+cu118'
  rank_zero_warn(f"No seed found, seed set to {seed}")
Global seed set to 1053079484
[34m[1mwandb[0m: Currently logged in as: [33mpicocreator[0m. Use [1m`wandb login --relogin`[0m to force relogin
[34m[1mwandb[0m: wandb version 0.15.5 is available!  To upgrade, please run:
[34m[1mwandb[0m:  $ pip install wandb --upgrade
[34m[1mwandb[0m: Tracking run with wandb version 0.15.4
[34m[1mwandb[0m: Run data is saved locally in [35m[1m./wandb/run-20230713_163624-p1orui5k[0m
[34m[1mwandb[0m: Run [1m`wandb offline`[0m to turn off syncing.
[34m[1mwandb[0m: Syncing run [33m(8x3090) Echo-B-1B4 - Enwiki Foundation (ctx=4096, deepspeed_stage_1)[0m
[34m[1mwandb[0m: ⭐️ View project at [34m[4mhttps://wandb.ai/picocreator/RWKV-Memory-Experiment[0m
[34m[1mwandb[0m: 🚀 View run at [34m[4mhttps://wandb.ai/picocreator/RWKV-Memory-Experiment/runs/p1o

In [19]:
# Let's export the model from the checkpoint
!cd "{TRAINER_DIR}" && \
    python export_checkpoint.py "../checkpoint/Musenet-1B4/last.ckpt" "../model/Musenet-1B4.pth"
!cd "{TRAINER_DIR}" && ls -alh "../model/Musenet-1B4.pth"

Setting ds_accelerator to cuda (auto detect)
Processing zero checkpoint '../checkpoint/Echo-B-1B4-enwiki/last.ckpt/checkpoint'
Detected checkpoint of type zero stage ZeroStageEnum.optimizer_states, world_size: 8
Parsing checkpoint created by deepspeed==0.9.3
Reconstructed fp32 state dict with 1734 params 1412675584 elements
Saving fp32 state dict to ../model/Echo-B-1B4-Stage1.pth
-rw-r--r-- 1 root root 5.3G Jul 14 03:17 ../model/Echo-B-1B4-Stage1.pth
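
To double-check an exported file against the "params" and "elements" numbers in a log like the one above, the state dict can be loaded and summarized. This is a sketch that assumes the exported `.pth` is a plain mapping from parameter names to tensors; `summarize_state_dict` and `summarize_checkpoint` are hypothetical helper names.

```python
def summarize_state_dict(state_dict):
    """Return (number of tensors, total number of elements) for a state dict."""
    n_tensors = len(state_dict)
    n_elements = sum(tensor.numel() for tensor in state_dict.values())
    return n_tensors, n_elements


def summarize_checkpoint(path):
    """Load a .pth file on CPU and summarize it.

    Assumes the file holds a plain name -> tensor mapping. Needs PyTorch and
    enough RAM to hold the fp32 weights (about 5.3G for the file listed above).
    """
    import torch  # imported lazily so summarize_state_dict has no dependencies

    state_dict = torch.load(path, map_location="cpu")
    return summarize_state_dict(state_dict)
```

For example, `summarize_checkpoint("../model/Musenet-1B4.pth")`, run from the trainer directory, should return a pair comparable to the two counts printed by the export script.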
