QLoRA #9340

cuichenx · 2024-05-29T18:01:41Z

What does this PR do?

Add QLoRA as a PEFT scheme for Megatron GPT fine-tuning, with base-model weights quantized to NF4.

Current Results

Convergence

QLoRA: Llama 7b, Squad, packed seq 2k, LoRA on all layers
gbs4, 2e-4, 2k iterations
exact_match 89.199 f1 94.791 rougeL 94.400 total 6064.000

Normal LoRA (qkv only)
exact_match 88.819 f1 94.415 rougeL 94.084 total 6064.000

Memory Usage

Llama 13b, Alpaca, seq len 512, LoRA on all layers, gbs256
Peak memory reserved (GB):

| Batch Size | LoRA (GB) | QLoRA (GB) | % Reduction |
|---|---|---|---|
| 2 | 37.5 | 18.3 | 51% |
| 4 | 46.7 | 27.4 | 41% |
| 8 | 65.4 | 46.4 | 29% |
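
The reduction column can be rechecked from the reported peaks. The sketch below recomputes both the absolute and the relative saving per batch size. The absolute saving stays close to 19 GB in every row, which is why the relative reduction shrinks as the batch size (and with it the activation memory) grows. The final estimate is an assumption that is not stated in this PR: it takes 13e9 parameters, 2 bytes per weight for 16-bit LoRA and 0.5 bytes per weight for 4-bit QLoRA, and ignores quantization metadata.

```python
# Peak memory reserved (GB) from the table above: batch size -> (LoRA, QLoRA)
peak_memory_gb = {2: (37.5, 18.3), 4: (46.7, 27.4), 8: (65.4, 46.4)}


def memory_savings(lora_gb, qlora_gb):
    """Return (absolute saving in GB, relative reduction in percent)."""
    saved = lora_gb - qlora_gb
    return saved, 100.0 * saved / lora_gb


for batch_size, (lora_gb, qlora_gb) in peak_memory_gb.items():
    saved, pct = memory_savings(lora_gb, qlora_gb)
    print(f"batch size {batch_size}: saves {saved:.1f} GB ({pct:.0f}%)")

# Back-of-envelope estimate (ASSUMPTION, not from the PR): 13e9 parameters,
# 2 bytes per weight in 16-bit versus 0.5 bytes per weight in 4-bit.
ASSUMED_PARAMS = 13e9
estimated_saving_gb = ASSUMED_PARAMS * (2.0 - 0.5) / 1e9
print(f"estimated weight saving: {estimated_saving_gb:.1f} GB")
```

Under those assumptions the estimate lands in the same ballpark as the measured difference, which suggests the saving comes mostly from the quantized base weights.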

Performance

Llama 13b, Alpaca, seq len 512, LoRA on all layers, gbs256
Global batch step timing (s):

| Batch Size | LoRA (s) | QLoRA (s) | % Increase |
|---|---|---|---|
| 2 | 2.7 | 6.7 | 148% |
| 4 | 2.3 | 4.4 | 91% |
| 8 | 2.2 | 3.2 | 46% |

Collection: NLP

Changelog

Add specific line by line info of high level changes in this PR.

Usage

Set this flag in megatron_gpt_finetuning_config.yaml to enable QLoRA:

model.peft.peft_scheme=qlora

To get the maximum memory savings, also set these flags:

++model.dist_ckpt_load_on_device=False         # load checkpoint on CPU
++model.use_cpu_initialization=True            # initialize model on CPU
++model.peft.lora_tuning.target_modules=[all]  # quantize all linear layers in the model
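
The flags above are Hydra-style overrides, where the `++` prefix adds the key or overrides it if it already exists. A minimal sketch of assembling them into a launch command follows. The script name `megatron_gpt_finetuning.py` and the helper `qlora_overrides` are assumptions for illustration, and a real run needs further arguments (model path, data, trainer settings) that are omitted here.

```python
import shlex


def qlora_overrides(max_memory_savings=True):
    """Build the override list described in the Usage section."""
    overrides = ["model.peft.peft_scheme=qlora"]
    if max_memory_savings:
        overrides += [
            "++model.dist_ckpt_load_on_device=False",         # load checkpoint on CPU
            "++model.use_cpu_initialization=True",            # initialize model on CPU
            "++model.peft.lora_tuning.target_modules=[all]",  # quantize all linear layers
        ]
    return overrides


# ASSUMPTION: hypothetical script name; adjust to your NeMo checkout.
SCRIPT = "megatron_gpt_finetuning.py"
command = ["python", SCRIPT, *qlora_overrides()]

# shlex.join quotes the [all] argument so the shell does not treat it as a glob.
print(shlex.join(command))
```

Building the command as a list and quoting it only at the end keeps the square brackets in `target_modules=[all]` intact when the line is pasted into a shell.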

GitHub Actions CI

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI, remove the label and add it again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".

Before your PR is "Ready for review"

Pre checks:

Make sure you read and followed Contributor guidelines
Did you write any new necessary tests?
Did you add or update any necessary documentation?
Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
- Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

New Feature
Bugfix
Documentation

If you haven't finished some of the above items, you can still open a "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs in various areas.

Additional Information

Related to # (issue)

Signed-off-by: Chen Cui <chcui@nvidia.com>

Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>


arendu

LGTM! awesome PR!


* temp qlora implementation
* swap nf4 after model instantiation
* load model on cpu and then quantize on gpu
* model init on cpu to prevent memory spike
* account for TE versions
* guard use_cpu_initialization
* fix layernorm autograd Function
* add unit tests
* Apply isort and black reformatting
* move cpu init to library code
* copyright header and nf4 quantize on GPU
* Apply isort and black reformatting
* fix cpu init
* comments
* fix test

---------

Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>
Co-authored-by: cuichenx <cuichenx@users.noreply.github.com>
Signed-off-by: Jan Lasek <janek.lasek@gmail.com>


cuichenx added 9 commits May 1, 2024 11:18

temp qlora implementation

a496abf

Signed-off-by: Chen Cui <chcui@nvidia.com>

Merge branch 'main' into chcui/qlora

a1e95b3

swap nf4 after model instantiation

3dc6f16

Signed-off-by: Chen Cui <chcui@nvidia.com>

load model on cpu and then quantize on gpu

ff7e7ef

Signed-off-by: Chen Cui <chcui@nvidia.com>

model init on cpu to prevent memory spike

8491ec1

Signed-off-by: Chen Cui <chcui@nvidia.com>

account for TE versions

b831498

Signed-off-by: Chen Cui <chcui@nvidia.com>

guard use_cpu_initialization

29ab0e9

Signed-off-by: Chen Cui <chcui@nvidia.com>

fix layernorm autograd Function

004e2b2

Signed-off-by: Chen Cui <chcui@nvidia.com>

add unit tests

922f2c4

Signed-off-by: Chen Cui <chcui@nvidia.com>

github-actions bot added the NLP label May 29, 2024

cuichenx and others added 5 commits May 29, 2024 18:02

Apply isort and black reformatting

8953e7f

Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>

move cpu init to library code

73da392

Signed-off-by: Chen Cui <chcui@nvidia.com>

copyright header and nf4 quantize on GPU

655dc99

Signed-off-by: Chen Cui <chcui@nvidia.com>

Apply isort and black reformatting

82536c5

Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>

fix cpu init

9f8f71f

Signed-off-by: Chen Cui <chcui@nvidia.com>

cuichenx marked this pull request as ready for review June 3, 2024 22:21

Merge branch 'main' into chcui/qlora

17e4f14

cuichenx requested review from ertkonuk and arendu June 3, 2024 22:29

comments

d1dc034

Signed-off-by: Chen Cui <chcui@nvidia.com>

arendu previously approved these changes Jun 6, 2024


cuichenx added the Run CICD label Jun 7, 2024

Merge branch 'main' into chcui/qlora

dc4ce5d

cuichenx added Run CICD and removed Run CICD labels Jun 7, 2024

fix test

ac84635

Signed-off-by: Chen Cui <chcui@nvidia.com>

cuichenx dismissed arendu’s stale review via ac84635 June 7, 2024 01:04

cuichenx added Run CICD and removed Run CICD labels Jun 7, 2024

cuichenx requested a review from arendu June 7, 2024 16:10

arendu approved these changes Jun 7, 2024


arendu merged commit ceffb49 into main Jun 7, 2024
135 checks passed

arendu deleted the chcui/qlora branch June 7, 2024 16:50

ko3n1g mentioned this pull request Jul 18, 2024

Release 2.0.0rc1 #9786

Closed

2 tasks
