TL;DR the current code for BloomAttention is surprisingly inefficient, with obvious problems that should be easy to fix
The current code for BloomAttention is surprisingly inefficient. It also misses many opportunities for in-place ops that would save memory, but that's the least of its problems.
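To illustrate the in-place-ops point, here is a minimal, hypothetical sketch (not code from BloomAttention itself): the out-of-place version allocates a fresh buffer for every intermediate, while the in-place version reuses the input's memory via numpy's `out=` parameter. The function names are invented for this example.

```python
import numpy as np

def scale_and_shift_out_of_place(x, scale, shift):
    """Each op allocates a new array, doubling peak memory per step."""
    y = x * scale   # new allocation
    y = y + shift   # another new allocation
    return y

def scale_and_shift_in_place(x, scale, shift):
    """Same math, but both ops write back into x's existing buffer."""
    np.multiply(x, scale, out=x)
    np.add(x, shift, out=x)
    return x
```

In PyTorch the same pattern is spelled with trailing-underscore methods (e.g. `mul_`, `add_`); the saving matters most for large activation tensors inside attention.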
Updates on alibi.py: https://gist.github.com/justheuristic/9751e02a2a5604a98a4fe0b6b688e808
Transformers PR developed by @younesbelkada: huggingface/transformers#17759
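For context on what alibi.py computes, here is a hedged sketch of the ALiBi bias following the formulation in the ALiBi paper (Press et al.) for a power-of-two head count; it is illustrative and not necessarily line-for-line identical to the gist or the transformers PR. Because the bias depends only on the head and the relative distance, it can be precomputed once rather than rebuilt on every forward pass.

```python
import math

def alibi_slopes(num_heads: int) -> list[float]:
    """Per-head slopes m_i = 2**(-8*i/n) for i = 1..n, n a power of two."""
    start = 2 ** (-8.0 / num_heads)
    return [start ** (i + 1) for i in range(num_heads)]

def alibi_bias(num_heads: int, seq_len: int) -> list[list[list[float]]]:
    """Additive attention bias: bias[h][i][j] = -slope_h * (i - j) for j <= i.

    The bias penalizes attention to distant past tokens linearly in distance,
    with a per-head slope; future positions (j > i) are left at 0 here since
    they are masked out anyway in causal attention.
    """
    slopes = alibi_slopes(num_heads)
    return [
        [[-m * (i - j) if j <= i else 0.0 for j in range(seq_len)]
         for i in range(seq_len)]
        for m in slopes
    ]
```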
I've copy-pasted the changes from @younesbelkada's PR above. As of right now (aaaf0c2), all four issues are solved.