CloverLM

Description

This is the PyTorch codebase used to train and evaluate our model, CloverLM. Please see the full report for details.

The training harness is a heavily modified version of ant, with NVFP4 kernels from Quartet-II. The evaluation code is by Matin Ansaripour (@matinansaripour) and Andrei Panferov (@BlackSamorez).

Getting started

Training

  1. Clone the repo:
     git clone https://github.com/IST-DASLab/CloverLM.git

  2. Use uv to install dependencies from pyproject.toml:
     uv sync

  3. Install FlashAttention

  4. Download pretokenized ClimbMix (305B tokens, 610 GB)

  5. Train CloverLM:

OMP_NUM_THREADS=1 torchrun --standalone --nproc_per_node=8 ./src/train.py 4b-28h-29d-cm310b-v3 \
  --opt adam --micro_batch_size 32 --train_batches 590000 \
  --k_input 3e-3 --momentum 0.9 --beta2 0.95 --eps 1e-6 \
  --quartet true --info false --extra_freq 200 --backend flash2 \
  --dataset=climbmix10m --dataset_path=climbmix --dataset_seed=654356 \
  --num_blocks=29 --heads=28 --ratio=4 \
  --checkpoint_freq 20000 --model_stats_freq=5000 \
  --warmup 2000 --cooldown 20000 \
  --wandb_kwargs='{"project": "expedition44"}'
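
The command above sets --train_batches 590000, --warmup 2000 and --cooldown 20000. This README does not say how train.py shapes the learning rate, so the sketch below is only an illustration of how the three values could relate. It assumes a trapezoidal schedule (linear ramp up, constant plateau, linear ramp down to zero) and that all three flags count optimizer steps. The function name lr_multiplier is hypothetical and is not taken from the codebase.

```python
def lr_multiplier(step: int,
                  total: int = 590_000,
                  warmup: int = 2_000,
                  cooldown: int = 20_000) -> float:
    """Return a factor in [0, 1] that would scale the peak learning rate.

    Assumption: linear warmup, flat plateau, linear cooldown to zero.
    The real schedule in train.py may differ.
    """
    if not 0 <= step <= total:
        raise ValueError(f"step must be in [0, {total}], got {step}")
    if step < warmup:
        # Ramp up linearly from 0 to 1 over the first `warmup` steps.
        return step / warmup
    if step > total - cooldown:
        # Ramp down linearly from 1 to 0 over the last `cooldown` steps.
        return (total - step) / cooldown
    # Plateau: train at the peak learning rate.
    return 1.0


if __name__ == "__main__":
    for s in (0, 1_000, 2_000, 300_000, 580_000, 590_000):
        print(s, lr_multiplier(s))
```

Under these assumptions, the run would spend 2,000 steps ramping up, 568,000 steps at the peak rate, and the final 20,000 steps ramping down.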

Evaluation

See ./src/evals/README.md