Add flash-attention patch for falcon-7b #3580
Merged
Enable the `use_flash_attention` configuration flag for Falcon models. When `use_flash_attention` is set to `true`, the `FalconAttention.forward()` method is replaced with a variant that uses Tri Dao's flash_attention instead of PyTorch's `scaled_dot_product_attention` function.
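For illustration, here is a minimal sketch of the substitution the patch performs at the attention call site. This is not the code merged in this PR; it assumes flash-attn 2.x's `flash_attn_func` and a CUDA device, and the helper name `attention_via_flash_attn` is made up for this example:

```python
# Sketch only: how flash_attn_func can stand in for PyTorch's
# scaled_dot_product_attention. Assumes flash-attn 2.x is installed
# (pip install flash-attn) and tensors live on a CUDA device.
import torch
from flash_attn import flash_attn_func  # Tri Dao's FlashAttention kernel


def attention_via_flash_attn(query, key, value):
    """Causal attention with the same tensor layout PyTorch's
    scaled_dot_product_attention uses: (batch, heads, seq_len, head_dim).

    flash_attn_func expects (batch, seq_len, heads, head_dim) and
    fp16/bf16 inputs, so transpose and cast around the kernel call.
    """
    q = query.transpose(1, 2).to(torch.bfloat16)
    k = key.transpose(1, 2).to(torch.bfloat16)
    v = value.transpose(1, 2).to(torch.bfloat16)
    out = flash_attn_func(q, k, v, causal=True)  # causal mask, as in decoding
    return out.transpose(1, 2).to(query.dtype)


if __name__ == "__main__":
    # falcon-7b dimensions: 71 attention heads with head_dim 64
    q = torch.randn(1, 71, 128, 64, device="cuda")
    ref = torch.nn.functional.scaled_dot_product_attention(q, q, q, is_causal=True)
    fa = attention_via_flash_attn(q, q, q)
    print((ref - fa).abs().max())  # small difference, up to bf16 rounding
```

The actual patch additionally has to reuse the module's own QKV projection and rotary embeddings before the kernel call; the sketch only shows the kernel substitution itself.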
At the moment the patch works only for falcon-7b, but technically it will also work for falcon-40b with the right configuration. The Falcon model situation is currently a bit messy: the Falcon model was recently added to Hugging Face transformers (see PR transformers#24523), but the Falcon models on the Hugging Face Hub still use the code that ships together with the weights (a PR to change this was reverted). Falcon-7b and falcon-40b use slightly different code, which was unified in the HF transformers implementation and can there be controlled via a configuration member called `new_decoder_architecture` (see configuration_falcon.py#L65-L67). The HF Falcon implementation also uses different names in the configuration class, e.g. compare the new configuration_falcon.py with the old configuration_RW.py.

Model configurations compatible with the HF Falcon implementation can be found here:
7B: config.json
40B: config.json
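To see which configuration class a given checkpoint actually resolves to, one can inspect the configs directly. A small sketch, assuming network access to the Hub; whether `new_decoder_architecture` is present depends on the checkpoint revision (the old configuration_RW.py configs do not define it):

```python
from transformers import AutoConfig

# The hub checkpoints still ship their own code, hence trust_remote_code=True.
cfg_7b = AutoConfig.from_pretrained("tiiuae/falcon-7b", trust_remote_code=True)
cfg_40b = AutoConfig.from_pretrained("tiiuae/falcon-40b", trust_remote_code=True)

# The unified HF implementation keys the 7b/40b code paths off a single flag;
# old-style RW configs fall back to the default below.
for name, cfg in (("7b", cfg_7b), ("40b", cfg_40b)):
    flag = getattr(cfg, "new_decoder_architecture", "<not present in old config>")
    print(name, type(cfg).__name__, flag)
```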