Add support for GPTNeoX models #32

naubull2 · 2023-10-03T09:18:01Z

Adds Long-LoRA support for GPTNeoX models.

Tested on a single Colab A100 40GB instance with the scripts

fine-tune.py
supervised-fine-tune.py

Using a sample GPTNeoX model

EleutherAI/pythia-1.4b-deduped

As there was no specific guide on how to contribute, I've tried to make as few modifications as possible to the original structure.

GPTNeoX support is added through a new module, gptneox_attn_replace, modeled on the original llama_attn_replace.

How to apply

Usage is shown in the tested scripts fine-tune.py and supervised-fine-tune.py.

Add a model_type argument to switch between the llama and gpt-neox configurations.

Import the attention replacement:

from gptneox_attn_replace import replace_gpt_neox_attn

Adjust the LoRA target modules for low-rank training:

    if training_args.low_rank_training:
        if model_args.model_type == "gpt-neox":
            # Also target `dense` (the attention output projection) to match llama;
            # the vanilla peft config would only target `query_key_value`.
            targets = ["query_key_value", "dense"]
        else:
            # No trailing comma here: it would turn the list into a 1-tuple.
            targets = ["q_proj", "k_proj", "v_proj", "o_proj"]

        config = LoraConfig(
            r=8,
            lora_alpha=16,
            target_modules=targets,
            lora_dropout=0,
            bias="none",
            task_type="CAUSAL_LM",
        )
        model = get_peft_model(model, config)
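The target-module switch can be isolated into a small helper, which makes it easy to check without loading a model. This is only a sketch: `ModelArguments` here is a stand-in dataclass and `lora_target_modules` is a hypothetical helper name, neither is taken from the PR; only the module names come from the snippet above. Raising on an unknown model_type is a choice of this sketch, whereas the snippet above falls back to the llama targets.

```python
from dataclasses import dataclass, field


@dataclass
class ModelArguments:
    # Stand-in for the script's argument dataclass (assumption: only the
    # field relevant to this example is shown).
    model_type: str = field(default="llama")


def lora_target_modules(model_type):
    """Return the LoRA target module names for a given model family.

    Hypothetical helper; the module names mirror the PR description.
    """
    if model_type == "gpt-neox":
        # `dense` is the attention output projection in GPTNeoX.
        return ["query_key_value", "dense"]
    if model_type == "llama":
        return ["q_proj", "k_proj", "v_proj", "o_proj"]
    raise ValueError(f"unsupported model_type: {model_type!r}")


neox_targets = lora_target_modules(ModelArguments(model_type="gpt-neox").model_type)
llama_targets = lora_target_modules(ModelArguments().model_type)
```

The returned list can be passed as `target_modules` to `LoraConfig`. Note that it must be a list (or a string), so a stray trailing comma after the list literal would silently produce a tuple containing a list.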

Notes on flash-attention + GPTNeoX

As the Hugging Face implementation does not support flash attention out of the box, I modified parts of modeling_gpt_neox.py for the use_flash_attn=True case.
- transformers == 4.33.3 as of writing.
- The main change concerns the cached cos/sin rotary embeddings, which are kept in fp32, whereas flash-attn only accepts fp16/bf16 tensors.
For some reason, the original flash-attention 2 interface flash_attn_varlen_func raised an "in-place operation" runtime error inside the flash-attention code.
- So I opted for flash_attn_varlen_qkvpacked_func, which worked fine.
  - To adapt the tensor dimensions, I referenced code by Philipp Schmid🤗 (ref).
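
For reference, the varlen interfaces in flash-attn take sequences concatenated along the token axis plus a `cu_seqlens` vector of cumulative sequence lengths, and the qkv-packed variant expects a tensor shaped (total_tokens, 3, num_heads, head_dim). A minimal pure-Python sketch of that bookkeeping follows. The helper names and the example sizes are illustrative assumptions, not code from gptneox_attn_replace, and in practice `cu_seqlens` would be an int32 tensor on the GPU.

```python
from itertools import accumulate


def build_cu_seqlens(seq_lens):
    """Cumulative sequence lengths with a leading 0 (length: batch_size + 1)."""
    return [0, *accumulate(seq_lens)]


def packed_qkv_shape(seq_lens, num_heads, head_dim):
    """Shape of the packed qkv tensor: (total_tokens, 3, num_heads, head_dim)."""
    return (sum(seq_lens), 3, num_heads, head_dim)


# Illustrative batch: two unpadded sequences of 4 tokens each.
seq_lens = [4, 4]
cu_seqlens = build_cu_seqlens(seq_lens)   # boundaries of each sequence
max_seqlen = max(seq_lens)                # longest sequence in the batch
qkv_shape = packed_qkv_shape(seq_lens, num_heads=16, head_dim=128)
```

Sequence i then occupies rows `cu_seqlens[i]:cu_seqlens[i + 1]` of the packed tensor, which is how the kernel keeps attention from crossing sequence boundaries.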

yukang2017 · 2023-10-03T12:10:27Z

Hi,

Many thanks for your contribution. These commits are really helpful for this project. I have merged them into the main branch!

Regards,
Yukang Chen

naubull2 added 17 commits September 27, 2023 23:40

[add] gpt-neox support

7baa4f7

[update] readme

41977df

[fix] some of the bugs preventing fine-tune run

9c9d0a2

+ There's still bugs in the attention dimensions mismatch

[fix] dimension discrepancy between attention mask and the query length

a5111ef

+ group batch attention is skipped to avoid this problem for now

[fix] SFT to match the same mods in finetune.py

5862050

Merge branch 'forked-only'

0cf0dfd

[add] parallel group attention then reshape back to original form

1532c4b

[fix] non-contiguous dimensions changing view issue

6fdffbb

[add] attention mask to align with the grouped batching

fe97f86

[add] torch autocast for flash attention safety

9e30a15

+ flash attention only supports in fp16/bf16

[fix] HF built-in rotary embedding is not compatible with flash-attention

3f9c47c

+ cos/sin cache tensor is not trained parameter, so it's not autocast along with other model parameters through `torch_dtype`.

[add] missing local reference for rotate_half

b21e949

[rollback] torch.cuda autocast causes half precision error

b224273

+ Works fine without the torch.cuda autocast context, so rollback.

[fix] flash attention causing in-place operation runtime errors

9123e42

[fix] mixed use of tabs and spaces

7203de2

[change] readme back to where it came from the original repo

8a11ef8

[remove] unused comments

02e4c1c

yukang2017 merged commit 04c8db1 into dvlab-research:main Oct 3, 2023

gianlucamacri pushed a commit to gianlucamacri/LongLoRA that referenced this pull request Oct 31, 2023

Merge pull request dvlab-research#32 from naubull2/main

203a58e

Add support for GPTNeoX models
