LLM-Genie: an integrated framework for LLM training

Supported Models

Model	Model size	Default module	Template
LLaMA	7B/13B/33B/65B	q_proj,v_proj	-
LLaMA-2	7B/13B/70B	q_proj,v_proj	llama2
Falcon	7B/40B	query_key_value	-
Qwen	7B	c_attn	chatml
ChatGLM2	6B	query_key_value	chatglm2

Default module is used for the --lora_target argument. Please use python src/train_bash.py -h to see all available options.
For the "base" models, the --template argument can be chosen from default, alpaca, vicuna etc. But make sure to use the corresponding template for the "chat" models.

Supported Training Approaches

Approach	Full-parameter	Partial-parameter	LoRA	QLoRA
Pre-Training	✅	✅	✅	✅
Supervised Fine-Tuning	✅	✅	✅	✅
Reward Modeling			✅	✅
PPO Training			✅	✅
DPO Training	✅		✅	✅

Use --quantization_bit 4/8 argument to enable QLoRA.

Provided Datasets

Please refer to data/README.md for details.

Some datasets require confirmation before using them, so we recommend logging in with your Hugging Face account using these commands.

pip install --upgrade huggingface_hub
huggingface-cli login

Requirement

Python 3.8+ and PyTorch 1.13.1+
🤗Transformers, Datasets, Accelerate, PEFT and TRL
sentencepiece and tiktoken
jieba, rouge-chinese and nltk (used at evaluation)
gradio and matplotlib (used in web_demo.py)
uvicorn, fastapi and sse-starlette (used in api_demo.py)

Getting Started

Data Preparation (optional)

Please refer to data/example_dataset for checking the details about the format of dataset files. You can either use a single .json file or a dataset loading script with multiple files to create a custom dataset.

Note: please update data/dataset_info.json to use your custom dataset. About the format of this file, please refer to data/README.md.

Dependence Installation (optional)

git lfs install
git clone https://github.com/byte-genie/llm-genie.git
conda create -n llm-genie python=3.10
conda activate llm-genie
cd llm-genie
pip install -r requirements.txt

If you want to enable the quantized LoRA (QLoRA) on the Windows platform, you will be required to install a pre-built version of bitsandbytes library, which supports CUDA 11.1 to 12.1.

pip install https://github.com/jllllll/bitsandbytes-windows-webui/releases/download/wheels/bitsandbytes-0.39.1-py3-none-win_amd64.whl

Pre-Training

CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
    --stage pt \
    --model_name_or_path path_to_your_model \
    --do_train \
    --dataset wiki_demo \
    --template default \
    --finetuning_type lora \
    --output_dir path_to_pt_checkpoint \
    --overwrite_cache \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 5e-5 \
    --num_train_epochs 3.0 \
    --plot_loss \
    --fp16

Supervised Fine-Tuning

CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
    --stage sft \
    --model_name_or_path path_to_your_model \
    --do_train \
    --dataset alpaca_gpt4_en \
    --template default \
    --finetuning_type lora \
    --output_dir path_to_sft_checkpoint \
    --overwrite_cache \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 5e-5 \
    --num_train_epochs 3.0 \
    --plot_loss \
    --fp16

Reward Modeling

CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
    --stage rm \
    --model_name_or_path path_to_your_model \
    --do_train \
    --dataset comparison_gpt4_en \
    --template default \
    --finetuning_type lora \
    --resume_lora_training False \
    --checkpoint_dir path_to_sft_checkpoint \
    --output_dir path_to_rm_checkpoint \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 1e-5 \
    --num_train_epochs 1.0 \
    --plot_loss \
    --fp16

DPO Training

CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
    --stage dpo \
    --model_name_or_path path_to_your_model \
    --do_train \
    --dataset comparison_gpt4_en \
    --template default \
    --finetuning_type lora \
    --resume_lora_training False \
    --checkpoint_dir path_to_sft_checkpoint \
    --output_dir path_to_dpo_checkpoint \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 1e-5 \
    --num_train_epochs 1.0 \
    --plot_loss \
    --fp16

Distributed Training

Use Huggingface Accelerate

accelerate config # configure the environment
accelerate launch src/train_bash.py # arguments (same as above)

Example config.yaml for training with DeepSpeed ZeRO-2

compute_environment: LOCAL_MACHINE
deepspeed_config:
  gradient_accumulation_steps: 4
  gradient_clipping: 0.5
  offload_optimizer_device: none
  offload_param_device: none
  zero3_init_flag: false
  zero_stage: 2
distributed_type: DEEPSPEED
downcast_bf16: 'no'
machine_rank: 0
main_training_function: main
mixed_precision: fp16
num_machines: 1
num_processes: 4
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false

Use DeepSpeed

deepspeed --num_gpus 8 --master_port=9901 src/train_bash.py \
    --deepspeed ds_config.json \
    ... # arguments (same as above)

Example ds_config.json for training with DeepSpeed ZeRO-2

{
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "zero_allow_untested_optimizer": true,
  "fp16": {
    "enabled": "auto",
    "loss_scale": 0,
    "initial_scale_power": 16,
    "loss_scale_window": 1000,
    "hysteresis": 2,
    "min_loss_scale": 1
  },  
  "zero_optimization": {
    "stage": 2,
    "allgather_partitions": true,
    "allgather_bucket_size": 5e8,
    "reduce_scatter": true,
    "reduce_bucket_size": 5e8,
    "overlap_comm": false,
    "contiguous_gradients": true
  }
}

Evaluation (BLEU and ROUGE_CHINESE)

CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
    --stage sft \
    --model_name_or_path path_to_your_model \
    --do_eval \
    --dataset alpaca_gpt4_en \
    --template default \
    --finetuning_type lora \
    --checkpoint_dir path_to_checkpoint \
    --output_dir path_to_eval_result \
    --per_device_eval_batch_size 8 \
    --max_samples 100 \
    --predict_with_generate

We recommend using --per_device_eval_batch_size=1 and --max_target_length 128 at 4/8-bit evaluation.

Predict

CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
    --stage sft \
    --model_name_or_path path_to_your_model \
    --do_predict \
    --dataset alpaca_gpt4_en \
    --template default \
    --finetuning_type lora \
    --checkpoint_dir path_to_checkpoint \
    --output_dir path_to_predict_result \
    --per_device_eval_batch_size 8 \
    --max_samples 100 \
    --predict_with_generate

API Demo

python src/api_demo.py \
    --model_name_or_path path_to_your_model \
    --template default \
    --finetuning_type lora \
    --checkpoint_dir path_to_checkpoint

Visit http://localhost:8000/docs for API documentation.

CLI Demo

python src/cli_demo.py \
    --model_name_or_path path_to_your_model \
    --template default \
    --finetuning_type lora \
    --checkpoint_dir path_to_checkpoint

Web Demo

python src/web_demo.py \
    --model_name_or_path path_to_your_model \
    --template default \
    --finetuning_type lora \
    --checkpoint_dir path_to_checkpoint

Export model

python src/export_model.py \
    --model_name_or_path path_to_your_model \
    --template default \
    --finetuning_type lora \
    --checkpoint_dir path_to_checkpoint \
    --output_dir path_to_export

License

This repository is licensed under the Apache-2.0 License.

Please follow the model licenses to use the corresponding model weights:

Citation

Cite this work as:

@Misc{llm-genie,
  title = {LLM-Genie: an operating system for LLM training},
  author = {ByteGenie},
  howpublished = {\url{https://github.com/byte-genie/llm-genie}},
  year = {2023}
}

Name		Name	Last commit message	Last commit date
Latest commit History 37 Commits
assets		assets
configs		configs
data		data
docker		docker
shcripts		shcripts
src		src
tests		tests
.gitattributes		.gitattributes
.gitignore		.gitignore
Dockerfile		Dockerfile
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

LLM-Genie: an integrated framework for LLM training

Supported Models

Supported Training Approaches

Provided Datasets

Requirement

Getting Started

Data Preparation (optional)

Dependence Installation (optional)

Pre-Training

Supervised Fine-Tuning

Reward Modeling

DPO Training

Distributed Training

Use Huggingface Accelerate

Use DeepSpeed

Evaluation (BLEU and ROUGE_CHINESE)

Predict

API Demo

CLI Demo

Web Demo

Export model

License

Citation

About

Releases

Packages

Languages

License

byte-genie/llm-genie

Folders and files

Latest commit

History

Repository files navigation

LLM-Genie: an integrated framework for LLM training

Supported Models

Supported Training Approaches

Provided Datasets

Requirement

Getting Started

Data Preparation (optional)

Dependence Installation (optional)

Pre-Training

Supervised Fine-Tuning

Reward Modeling

DPO Training

Distributed Training

Use Huggingface Accelerate

Use DeepSpeed

Evaluation (BLEU and ROUGE_CHINESE)

Predict

API Demo

CLI Demo

Web Demo

Export model

License

Citation

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages