# Eagle 7B: Finetuning on textbooks!

The following example shows how to finetune the RWKV-v5 Eagle 7B model on the tiny-strange-textbooks dataset:
- https://huggingface.co/datasets/nampdn-ai/tiny-strange-textbooks

In this example, we will train the model with 16k sample sizes.
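The notebook trains with a `TRAINING_CTX_LEN` (for example 2048) that is much shorter than 16k. Assuming the infctx trainer handles longer samples by splitting them into consecutive `ctx_len`-sized chunks and carrying the RWKV state from one chunk to the next (our reading of the trainer, not something this notebook shows), a 16k-token sample at `ctx_len=2048` becomes 8 chunks. The hypothetical helper `split_into_chunks` below only illustrates that arithmetic; it is not the trainer's implementation.

```python
import math


def split_into_chunks(tokens, ctx_len):
    """Split a token sequence into consecutive chunks of at most ctx_len tokens.

    Hypothetical illustration of segmenting a long sample; not the
    trainer's actual code.
    """
    if ctx_len <= 0:
        raise ValueError("ctx_len must be positive")
    return [tokens[i:i + ctx_len] for i in range(0, len(tokens), ctx_len)]


sample = list(range(16 * 1024))  # a dummy 16k-token sample
chunks = split_into_chunks(sample, ctx_len=2048)
print(len(chunks))                    # 8
print(math.ceil(len(sample) / 2048))  # 8, the same count computed directly
```

When the sample length is not a multiple of `ctx_len`, the last chunk is simply shorter.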

## Configure the env variables below
The defaults should work for 8x4090s; tweak them to match your hardware.

In [65]:
# -----------------------------------------------------------------
# Your configurable settings
# -----------------------------------------------------------------

# WANDB settings
ENABLE_WANDB=True
WANDB_PREFIX="Eagle-X-Training"
WANDB_PROJECT="RWKV-v5-Finetune"

# Project directory offset (modify this if you move the notebook into another dir)
PROJECT_DIR_OFFSET="../../"

# Config dir (relative to the notebook, excluding the trailing slash)
# and the config filename to use (without the .yaml extension)
CONFIG_FILE_DIR="."
CONFIG_FILE_NAME="Eagle-x-tiny-textbook"

# The model to use
MODEL_NAME="RWKV-v5-Eagle-World-7B-v2-20240128-ctx4096.pth"
MODEL_URL="https://huggingface.co/RWKV/v5-Eagle-7B/resolve/main/RWKV-v5-Eagle-World-7B-v2-20240128-ctx4096.pth?download=true"

# GPU count to use
GPU_DEVICES="auto"

# -----------------------------------------------------------------
# Let's detect the GPU VRAM size, and suggest a reasonable default
# based on the detected VRAM size
# -----------------------------------------------------------------
import torch
GPU_0_VRAM_SIZE_GB=torch.cuda.get_device_properties(0).total_memory / 1024**3
GPU_COUNT=torch.cuda.device_count()
if GPU_DEVICES != "auto":
    GPU_COUNT=int(GPU_DEVICES)
print("GPU_COUNT:", GPU_COUNT)
print("GPU_0_VRAM_SIZE (GB):", GPU_0_VRAM_SIZE_GB)

# -----------------------------------------------------------------
# Auto select the strategy based on the detected VRAM size
# -----------------------------------------------------------------
if GPU_0_VRAM_SIZE_GB < 17:
    raise ValueError("For the Eagle-7B model, you need at least 18GB of VRAM")
elif GPU_0_VRAM_SIZE_GB < 23:
    # This takes about 17.5GB VRAM on a single GPU
    # We DO NOT recommend training with ctx_len=128, as the training
    # quality will degrade noticeably. But it will work!
    DEEPSPEED_STRAT="deepspeed_stage_2_offload"
    TRAINING_CTX_LEN=128
    MICROBATCH_SIZE=1
elif GPU_0_VRAM_SIZE_GB < 25:
    # This takes about 21GB VRAM on a single GPU
    DEEPSPEED_STRAT="deepspeed_stage_2_offload"
    TRAINING_CTX_LEN=2048
    MICROBATCH_SIZE=2
else:
    # This takes about 23GB VRAM on a single GPU
    # Also used for 50GB+ GPUs; use the overrides below to make use of the extra VRAM
    DEEPSPEED_STRAT="deepspeed_stage_2"
    TRAINING_CTX_LEN=2048
    MICROBATCH_SIZE=2

# -----------------------------------------------------------------
# Training settings you can use to override the "auto" defaults above
# -----------------------------------------------------------------
# DEEPSPEED_STRAT="deepspeed_stage_1"
# TRAINING_CTX_LEN=4096
# MICROBATCH_SIZE=8

# ---
print("ENABLE_WANDB:", ENABLE_WANDB)
print("GPU_DEVICES:", GPU_DEVICES)
print("DEEPSPEED_STRAT:", DEEPSPEED_STRAT)
print("TRAINING_CTX_LEN:", TRAINING_CTX_LEN)
if ENABLE_WANDB:
    WANDB_MODE="online"
else:
    WANDB_MODE="disabled"

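# Compute the notebook, project and trainer paths
import os
# Note: "__file__" is a string literal here. Notebooks have no __file__, so this
# resolves to the current working directory, which Jupyter sets to the notebook's dir
NOTEBOOK_DIR=os.path.dirname(os.path.abspath("__file__"))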
PROJECT_DIR=os.path.abspath(os.path.join(NOTEBOOK_DIR, PROJECT_DIR_OFFSET))
TRAINER_DIR=os.path.abspath(os.path.join(PROJECT_DIR, "./RWKV-v5/"))
print("NOTEBOOK_DIR:", NOTEBOOK_DIR)
print("TRAINER_DIR:", TRAINER_DIR)
print("PROJECT_DIR:", PROJECT_DIR)

# Check if the directory exists
if not os.path.exists(TRAINER_DIR):
    raise Exception("The trainer directory does not exist. Did you move the notebook?")

GPU_COUNT: 8
GPU_0_VRAM_SIZE (GB): 23.64971923828125
ENABLE_WANDB: True
GPU_DEVICES: auto
DEEPSPEED_STRAT: deepspeed_stage_2
TRAINING_CTX_LEN: 1024
NOTEBOOK_DIR: /home/recursal/RWKV-infctx-trainer/notebook/finetune-example
TRAINER_DIR: /home/recursal/RWKV-infctx-trainer/RWKV-v5
PROJECT_DIR: /home/recursal/RWKV-infctx-trainer
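The VRAM tiers in the configuration cell above can be expressed as a small pure function, which makes it easy to check which settings a given card would get without needing a GPU. This is a sketch that mirrors the cell's thresholds (with 50GB+ cards falling back to the last tier); `suggest_training_settings` is a hypothetical name, and the VRAM figures in the comments are the notebook's own estimates for Eagle-7B.

```python
def suggest_training_settings(vram_gb):
    """Return (deepspeed_strategy, ctx_len, microbatch_size) for a per-GPU VRAM size in GB."""
    if vram_gb < 17:
        raise ValueError("For the Eagle-7B model, you need at least 18GB of VRAM")
    if vram_gb < 23:
        # ~17.5GB per GPU; ctx_len=128 works but noticeably degrades quality
        return ("deepspeed_stage_2_offload", 128, 1)
    if vram_gb < 25:
        # ~21GB per GPU
        return ("deepspeed_stage_2_offload", 2048, 2)
    # ~23GB per GPU; larger cards reuse this tier unless overridden
    return ("deepspeed_stage_2", 2048, 2)


print(suggest_training_settings(23.65))  # a 24GB card such as the RTX 4090
print(suggest_training_settings(48.0))
```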


## Let's download the model

In [30]:
!cd "{PROJECT_DIR}" && mkdir -p "./model" && \
    cd "./model" && \
    wget -nc "{MODEL_URL}" -O "{MODEL_NAME}"

--2024-01-31 09:46:31--  https://huggingface.co/RWKV/v5-Eagle-7B/resolve/main/RWKV-v5-Eagle-World-7B-v2-20240128-ctx4096.pth?download=true
Resolving huggingface.co (huggingface.co)... 108.138.246.67, 108.138.246.71, 108.138.246.79, ...
Connecting to huggingface.co (huggingface.co)|108.138.246.67|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://cdn-lfs-us-1.huggingface.co/repos/d5/6f/d56f8718b68e0e1840ad1e209498db64132d773e8c85a1bf4f194501bc3cddcf/a88c7274184b211e5545c8f992f0b80d03c40a447980bbfcd0f6d5858982615a?response-content-disposition=attachment%3B+filename*%3DUTF-8%27%27RWKV-v5-Eagle-World-7B-v2-20240128-ctx4096.pth%3B+filename%3D%22RWKV-v5-Eagle-World-7B-v2-20240128-ctx4096.pth%22%3B&Expires=1706953591&Policy=eyJTdGF0ZW1lbnQiOlt7IkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTcwNjk1MzU5MX19LCJSZXNvdXJjZSI6Imh0dHBzOi8vY2RuLWxmcy11cy0xLmh1Z2dpbmdmYWNlLmNvL3JlcG9zL2Q1LzZmL2Q1NmY4NzE4YjY4ZTBlMTg0MGFkMWUyMDk0OThkYjY0MTMyZDc3M2U4Yzg1Y
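Optionally, you can verify the downloaded checkpoint before training. A minimal sketch using only the standard library: the hypothetical helper `sha256_of_file` streams the file in 1 MiB blocks, so the multi-gigabyte checkpoint never has to fit in memory. Compare its output against the SHA-256 that Hugging Face lists in the file's Git LFS details (we assume it is shown there for this file). The relative path assumes the notebook's working directory and `PROJECT_DIR_OFFSET="../../"` from above.

```python
import hashlib
import os


def sha256_of_file(path, block_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in fixed-size blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


# Assumed location: <PROJECT_DIR>/model/<MODEL_NAME>, seen from the notebook dir
model_path = os.path.join("..", "..", "model", "RWKV-v5-Eagle-World-7B-v2-20240128-ctx4096.pth")
if os.path.exists(model_path):
    print(sha256_of_file(model_path))
else:
    print("Model not found at", model_path)
```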

In [31]:
# Let's preload the required dataset
!cd "{TRAINER_DIR}" && python3 preload_datapath.py "{NOTEBOOK_DIR}/{CONFIG_FILE_DIR}/{CONFIG_FILE_NAME}.yaml"

Map (num_proc=160): 100%|███████| 10000/10000 [00:00<00:00, 10826.81 examples/s]
Filter (num_proc=160): 100%|████| 10000/10000 [00:00<00:00, 14926.90 examples/s]
Map (num_proc=160): 100%|██████████| 8061/8061 [00:01<00:00, 4165.24 examples/s]
num_proc must be <= 83. Reducing num_proc to 83 for dataset of size 83.
Map (num_proc=83): 100%|████████████████| 83/83 [00:00<00:00, 215.06 examples/s]
num_proc must be <= 83. Reducing num_proc to 83 for dataset of size 83.
Map (num_proc=83): 100%|████████████████| 83/83 [00:00<00:00, 218.22 examples/s]
num_proc must be <= 1. Reducing num_proc to 1 for dataset of size 1.
Map: 100%|████████████████████████████████| 1/1 [00:00<00:00, 124.21 examples/s]
Saving the dataset (1/1 shards): 100%|█| 83/83 [00:00<00:00, 1259.83 examples/s]
Saving the dataset (1/1 shards): 100%|████| 1/1 [00:00<00:00, 340.31 examples/s]


## Start the training run!

In [66]:
# Set up the checkpoint dir
!cd "{PROJECT_DIR}" && mkdir -p "./checkpoint/{CONFIG_FILE_NAME}/"

# Let's start the training
!cd "{TRAINER_DIR}" && \
    export WANDB_MODE="{WANDB_MODE}" && \
    python3 lightning_trainer.py fit \
        -c "{NOTEBOOK_DIR}/{CONFIG_FILE_DIR}/{CONFIG_FILE_NAME}.yaml" \
        --model.load_model="../model/{MODEL_NAME}" \
        --data.skip_datapath_setup=True \
        --trainer.callbacks.init_args.dirpath="../checkpoint/{CONFIG_FILE_NAME}/" \
        --trainer.logger.init_args.name="{WANDB_PREFIX} - {CONFIG_FILE_NAME} (tctxlen={TRAINING_CTX_LEN}, {DEEPSPEED_STRAT})" \
        --trainer.logger.init_args.project="{WANDB_PROJECT}" \
        --trainer.strategy="{DEEPSPEED_STRAT}" \
        --trainer.target_batch_size=64 \
        --trainer.microbatch_size={MICROBATCH_SIZE} \
        --model.ctx_len={TRAINING_CTX_LEN} \
        --trainer.devices="{GPU_DEVICES}"

[2024-01-31 10:50:54,386] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[RWKV.model] Running RWKV infctx using 'torch-jit' with torch '2.1.2'
/home/recursal/miniconda3/envs/rwkv-infctx/lib/python3.11/site-packages/lightning/pytorch/cli.py:518: LightningCLI's args parameter is intended to run from within Python like if it were from the command line. To prevent mistakes it is not recommended to provide both args and command line arguments, got: sys.argv[1:]=['fit', '-c', '/home/recursal/RWKV-infctx-trainer/notebook/finetune-example/./Eagle-x-tiny-textbook.yaml', '--model.load_model=../model/RWKV-v5-Eagle-World-7B-v2-20240128-ctx4096.pth', '--data.skip_datapath_setup=True', '--trainer.callbacks.init_args.dirpath=../checkpoint/Eagle-x-tiny-textbook/', '--trainer.logger.init_args.name=Eagle-X-Training - Eagle-x-tiny-textbook (tctxlen=1024, deepspeed_stage_2)', '--trainer.logger.init_args.project=RWKV-v5-Finetune', '--trainer.strategy=deepspeed
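The training command sets `--trainer.target_batch_size=64` independently of `MICROBATCH_SIZE` and the GPU count, so the gap has to be covered by gradient accumulation. Assuming the trainer derives the number of accumulation steps as `target_batch_size / (microbatch_size × GPU count)` (our reading of the option, not confirmed by this notebook; rounding down with a floor of 1 is also our choice), 8 GPUs with `MICROBATCH_SIZE=2` give 64 / (2 × 8) = 4 accumulation steps per optimizer step. The hypothetical helper below reproduces that arithmetic:

```python
def accumulation_steps(target_batch_size, microbatch_size, gpu_count, num_nodes=1):
    """Gradient-accumulation steps needed to reach target_batch_size samples per optimizer step."""
    samples_per_step = microbatch_size * gpu_count * num_nodes
    if samples_per_step <= 0:
        raise ValueError("microbatch_size, gpu_count and num_nodes must be positive")
    # Round down, but always accumulate at least one microbatch
    return max(1, target_batch_size // samples_per_step)


print(accumulation_steps(64, microbatch_size=2, gpu_count=8))  # 4
print(accumulation_steps(64, microbatch_size=1, gpu_count=1))  # 64
```

On a single GPU the same target batch size therefore means many more forward/backward passes per optimizer step, which is why the wall-clock time per step scales with fewer GPUs even though the effective batch stays the same.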