Encapsulate FSDPA in GaudiLlamaAttention #129

Merged

1 commit merged into habana-main on Mar 24, 2024

Conversation

dudilester

  • Encapsulated FSDPA in GaudiLlamaAttention to allow quantization using HQT (see the sketch below)

  • Added use_flash_attention and flash_attention_recompute flags to run_lm_eval

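A minimal sketch of what "encapsulating FSDPA" can look like, not the actual optimum-habana implementation: the fused scaled-dot-product-attention call is wrapped in its own nn.Module so quantization tooling such as HQT can discover and patch it like any other submodule of the attention block. The names ModuleFusedSDPA and GaudiLlamaAttentionSketch, the use_flash_attention argument, and the use of torch.nn.functional.scaled_dot_product_attention as a stand-in for Habana's fused kernel are assumptions for illustration only.

```python
import torch


class ModuleFusedSDPA(torch.nn.Module):
    """Thin nn.Module wrapper around a fused SDPA kernel so that quantization
    tooling (e.g. HQT) can discover and patch it like any other submodule."""

    def __init__(self, fused_sdpa_fn):
        super().__init__()
        self._fused_sdpa_fn = fused_sdpa_fn

    def forward(self, query, key, value, attn_mask=None, dropout_p=0.0, is_causal=False):
        return self._fused_sdpa_fn(
            query, key, value, attn_mask=attn_mask, dropout_p=dropout_p, is_causal=is_causal
        )


class GaudiLlamaAttentionSketch(torch.nn.Module):
    """Illustrative attention block: the fused kernel is held as a named
    submodule instead of being called as a free function inside forward()."""

    def __init__(self, hidden_size: int, num_heads: int):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = hidden_size // num_heads
        self.q_proj = torch.nn.Linear(hidden_size, hidden_size, bias=False)
        self.k_proj = torch.nn.Linear(hidden_size, hidden_size, bias=False)
        self.v_proj = torch.nn.Linear(hidden_size, hidden_size, bias=False)
        self.o_proj = torch.nn.Linear(hidden_size, hidden_size, bias=False)
        # Stand-in kernel for illustration; on Gaudi this would wrap Habana's fused SDPA.
        self.fused_scaled_dot_product_attention = ModuleFusedSDPA(
            torch.nn.functional.scaled_dot_product_attention
        )

    def forward(self, hidden_states, use_flash_attention: bool = False):
        bsz, seq_len, _ = hidden_states.shape
        shape = (bsz, seq_len, self.num_heads, self.head_dim)
        q = self.q_proj(hidden_states).view(shape).transpose(1, 2)
        k = self.k_proj(hidden_states).view(shape).transpose(1, 2)
        v = self.v_proj(hidden_states).view(shape).transpose(1, 2)

        if use_flash_attention:
            # Fused path: goes through the encapsulated submodule.
            attn = self.fused_scaled_dot_product_attention(q, k, v, is_causal=True)
        else:
            # Eager path: explicit causal softmax attention.
            mask = torch.full((seq_len, seq_len), float("-inf"), device=q.device).triu(1)
            scores = q @ k.transpose(-2, -1) / (self.head_dim ** 0.5) + mask
            attn = torch.softmax(scores, dim=-1) @ v

        attn = attn.transpose(1, 2).reshape(bsz, seq_len, -1)
        return self.o_proj(attn)
```

With the kernel held as a named submodule, a quantization flow can wrap or replace fused_scaled_dot_product_attention without touching the surrounding projections; the use_flash_attention argument here only illustrates the kind of runtime switch that the PR also exposes through run_lm_eval.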
@MrGeva dismissed ulivne’s stale review on March 24, 2024 16:25:

Issues were addressed.

@MrGeva merged commit b7e74c1 into habana-main on Mar 24, 2024
dudilester added a commit that referenced this pull request Mar 31, 2024
astachowiczhabana pushed a commit that referenced this pull request Apr 5, 2024
astachowiczhabana pushed a commit that referenced this pull request Apr 5, 2024
astachowiczhabana pushed a commit that referenced this pull request Apr 19, 2024
astachowiczhabana pushed a commit that referenced this pull request Apr 22, 2024
astachowiczhabana pushed a commit that referenced this pull request Apr 24, 2024
astachowiczhabana pushed a commit that referenced this pull request Apr 24, 2024
dudilester added a commit that referenced this pull request May 7, 2024
dudilester added a commit that referenced this pull request May 8, 2024
dudilester added a commit that referenced this pull request May 13, 2024
@astachowiczhabana

huggingface#972

@dudilester
Author

Upstream URL: huggingface#976
