
Ensure KV cache is not returned as output tensor during decode phase for Falcon #208

Merged (1 commit merged into habana-main on May 18, 2024)

Conversation

schoi-habana

This PR follows the corresponding change for Llama in https://github.com/HabanaAI/optimum-habana-fork/pull/154/files.
It increased the maximum batch size of BF16 Falcon-180B inference from 250 to 316.
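To illustrate the idea behind the change: during the decode phase the KV cache is updated in place in a preallocated buffer that the caller already holds, so returning it again as a model output tensor only duplicates device memory. A minimal sketch of this pattern is below; the function and argument names (`forward`, `reuse_cache`, `token_idx`) are illustrative and do not reflect the actual optimum-habana Falcon implementation.

```python
def forward(hidden, cache, token_idx=None, reuse_cache=False):
    """Toy forward step that updates `cache` in place.

    When the cache is reused in place (decode phase), returning it as an
    output tensor is redundant, so only the logits are returned.
    """
    # Simulate writing the new key/value into the preallocated cache slot.
    if token_idx is not None:
        cache[token_idx] = hidden

    logits = hidden * 2  # stand-in for the real computation

    if reuse_cache:
        # Decode phase: the caller already holds `cache`; don't return it.
        return (logits,)
    # Prefill phase: the cache is newly built, so hand it back.
    return (logits, cache)
```

With `reuse_cache=True` the output tuple contains only the logits, while the caller's cache buffer is still updated in place; during prefill the cache is still returned so the caller can capture it once.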

Updated command (with `--reuse_cache` removed):

```shell
python ../gaudi_spawn.py \
    --use_deepspeed --world_size 8 run_generation.py \
    --model_name_or_path /root/data/falcon/falcon-180b/snapshots/d2ea5531862d4fe907280234990e6380d2befd97/ \
    --use_hpu_graphs \
    --use_kv_cache \
    --bf16 \
    --batch_size 316 \
    --max_new_tokens 128 \
    --max_input_tokens 128 \
    --limit_hpu_graphs \
    --n_iterations 3 \
    --trim_logits \
    --bucket_internal \
    --bucket_size 128 \
    --prompt "I've always managed to dodge the bullet and avoid the addictive pull of Pokemon. Leave it to a button-mashing brawler with plastic figurine accessories to finally get me hooked. At first glance, Pokemon Rumble U isn't much to look at. With its simplistic controls and repetitive gameplay, you might feel inclined to dismiss it as yet another cash-in of the popular Nintendo franchise. But despite its faults, there's actually much more to Rumble U than meets the eye, making this a satisfying and"
```

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@mandy-li mandy-li merged commit 3bd22e2 into habana-main May 18, 2024
@astachowiczhabana

huggingface#993
