# Deepspeed 2 & 3 Validation
The model being trained uses the same settings as the raven 1B5 model.
- Layer count: 24
- Embed size: 2048
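
As a rough sanity check that these two settings correspond to a model of about 1.5B parameters, the sketch below estimates the parameter count. It assumes the RWKV-v4 block layout (four `n x n` matrices in time-mixing, plus `n x 4n`, `4n x n` and `n x n` matrices in channel-mixing) and a vocabulary size of 50277. Neither is stated in this notebook, and layer norms and small per-channel vectors are ignored, so treat the result as an approximation only.

```python
def estimate_rwkv4_params(n_layer: int, n_embd: int, vocab_size: int) -> int:
    """Rough parameter estimate; ignores layer norms and per-channel vectors."""
    time_mix = 4 * n_embd * n_embd        # key, value, receptance, output
    channel_mix = 9 * n_embd * n_embd     # key (n x 4n), value (4n x n), receptance (n x n)
    per_layer = time_mix + channel_mix
    embeddings = 2 * vocab_size * n_embd  # input embedding + output head
    return n_layer * per_layer + embeddings


# vocab_size=50277 is an assumption, not a value taken from this notebook
total = estimate_rwkv4_params(n_layer=24, n_embd=2048, vocab_size=50277)
print(f"~{total / 1e9:.2f}B parameters")
```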

The goal is to validate the trainer across deepspeed 2 & 3 - with and without offload. All other training params remain constant.

Note: you will need a setup with two or more GPUs that is capable of handling deepspeed 2.
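
Before starting the runs, it may help to confirm how many GPUs are visible. The following is a minimal sketch, assuming `torch` is installed in the conda env; the helper `has_enough_gpus` is a hypothetical name introduced here for illustration.

```python
def has_enough_gpus(available: int, required: int = 2) -> bool:
    """Return True when at least `required` GPUs are visible."""
    return available >= required


try:
    import torch
    gpu_count = torch.cuda.device_count()
except ImportError:
    # torch is expected in the training env; fall back to 0 if it is missing
    gpu_count = 0

if has_enough_gpus(gpu_count):
    print(f"OK: {gpu_count} GPUs visible")
else:
    print(f"Only {gpu_count} GPU(s) visible; the multi-GPU runs may not work as intended")
```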

> This project assumes you have set up the rwkv-infctx conda env and are executing in that environment; see the main README.md for the conda env setup steps.
>
> It also assumes you have completed `baseline-setup.ipynb`.

## Configure and apply your preferred settings

Adjust your GPU device count here. The deepspeed strategy itself is set per run, in the training cells further down.

Enable or disable WANDB here as well (enabled by default, as we need the loss curve for this experiment).

(Note: you will need to rerun this cell if you restart your env.)

In [1]:
GPU_DEVICES="auto"
ENABLE_WANDB=True
WANDB_PREFIX="infctx-deepspeed"

print("ENABLE_WANDB:", ENABLE_WANDB)
print("GPU_DEVICES:", GPU_DEVICES)

if ENABLE_WANDB:
    WANDB_MODE="online"
else:
    WANDB_MODE="disabled"

ENABLE_WANDB: True
GPU_DEVICES: auto
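
The four training runs in this notebook differ only in the strategy name passed to `--trainer.strategy` and in the matching run name. As an illustration of that pattern, the sketch below assembles the command line for each strategy. The `build_train_command` helper is hypothetical and not part of the trainer; it only prints the commands and does not launch anything.

```python
import shlex

STRATEGIES = [
    "deepspeed_stage_2",
    "deepspeed_stage_2_offload",
    "deepspeed_stage_3",
    "deepspeed_stage_3_offload",
]


def build_train_command(strategy: str, wandb_prefix: str, gpu_devices: str) -> str:
    """Assemble the `new_train.py fit` command line for one strategy."""
    run_name = f"{wandb_prefix} ({strategy}, train-ctx=1024, data-ctx=1024)"
    args = [
        "python3", "new_train.py", "fit",
        "-c", "../notebook/trainer-validation/config/baseline-1024.yaml",
        f"--trainer.logger.init_args.name={run_name}",
        f"--trainer.strategy={strategy}",
        f"--trainer.devices={gpu_devices}",
    ]
    # shlex.join quotes any argument containing spaces or parentheses
    return shlex.join(args)


for strategy in STRATEGIES:
    print(build_train_command(strategy, "infctx-deepspeed", "auto"))
```

Generating the run name and the strategy flag from the same variable keeps the two from drifting apart when a cell is copied and edited by hand.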


# Deepspeed 2
Perform a full 1-epoch training run with a training context size of 1024, using deepspeed 2.

In [None]:
!cd ../../RWKV-v4neo && \
    export WANDB_MODE="{WANDB_MODE}" && \
    python3 new_train.py fit \
        -c ../notebook/trainer-validation/config/baseline-1024.yaml \
        --trainer.logger.init_args.name="{WANDB_PREFIX} (deepspeed_stage_2, train-ctx=1024, data-ctx=1024)" \
        --trainer.strategy="deepspeed_stage_2" \
        --trainer.devices="{GPU_DEVICES}"

# Deepspeed 2 + Offload
Perform a full 1-epoch training run with a training context size of 1024, using deepspeed 2 + offload.

In [None]:
!cd ../../RWKV-v4neo && \
    export WANDB_MODE="{WANDB_MODE}" && \
    python3 new_train.py fit \
        -c ../notebook/trainer-validation/config/baseline-1024.yaml \
        --trainer.logger.init_args.name="{WANDB_PREFIX} (deepspeed_stage_2_offload, train-ctx=1024, data-ctx=1024)" \
        --trainer.strategy="deepspeed_stage_2_offload" \
        --trainer.devices="{GPU_DEVICES}"

# Deepspeed 3
Perform a full 1-epoch training run with a training context size of 1024, using deepspeed 3.

In [None]:
!cd ../../RWKV-v4neo && \
    export WANDB_MODE="{WANDB_MODE}" && \
    python3 new_train.py fit \
        -c ../notebook/trainer-validation/config/baseline-1024.yaml \
        --trainer.logger.init_args.name="{WANDB_PREFIX} (deepspeed_stage_3, train-ctx=1024, data-ctx=1024)" \
        --trainer.strategy="deepspeed_stage_3" \
        --trainer.devices="{GPU_DEVICES}"

# Deepspeed 3 + Offload
Perform a full 1-epoch training run with a training context size of 1024, using deepspeed 3 + offload.

In [None]:
!cd ../../RWKV-v4neo && \
    export WANDB_MODE="{WANDB_MODE}" && \
    python3 new_train.py fit \
        -c ../notebook/trainer-validation/config/baseline-1024.yaml \
        --trainer.logger.init_args.name="{WANDB_PREFIX} (deepspeed_stage_3_offload, train-ctx=1024, data-ctx=1024)" \
        --trainer.strategy="deepspeed_stage_3_offload" \
        --trainer.devices="{GPU_DEVICES}"
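
Once all four runs have finished, their loss curves can be compared against each other. A minimal sketch of such a comparison follows, assuming the per-step losses have already been exported from WANDB into plain Python lists. The helper name, the tolerance, and the sample numbers are hypothetical placeholders, not results from these runs.

```python
def max_abs_loss_gap(reference: list[float], candidate: list[float]) -> float:
    """Largest absolute per-step difference between two loss curves."""
    if len(reference) != len(candidate):
        raise ValueError("loss curves must have the same number of steps")
    return max(abs(r - c) for r, c in zip(reference, candidate))


# Placeholder values for illustration only, not measured results
stage_2_loss = [3.10, 2.85, 2.70]
stage_3_loss = [3.11, 2.84, 2.71]

gap = max_abs_loss_gap(stage_2_loss, stage_3_loss)
TOLERANCE = 0.05  # hypothetical threshold; choose one that suits the experiment
print("curves match within tolerance:", gap <= TOLERANCE)
```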