
[Feature] Support Llama-2 with GQA #147

Merged: 8 commits merged into InternLM:main on Jul 21, 2023
Conversation

lzhangzz
Collaborator

@lzhangzz lzhangzz commented Jul 19, 2023

Support Llama-2 70B, which uses grouped-query attention (GQA).

Tested on 8 x A100 GPUs

TODO:

  • disable FMHA & set n_kv_heads for GQA models
  • compatibility fix for models in llama format
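In GQA, several query heads share a single key/value head (Llama-2 70B uses 64 query heads over 8 KV heads, so `n_kv_heads=8`), which shrinks the KV cache by the group factor. A minimal NumPy sketch of the idea, not TurboMind's actual kernel (function name and layout are illustrative):

```python
import numpy as np

def grouped_query_attention(q, k, v, n_heads, n_kv_heads):
    """GQA sketch: each group of query heads attends with one shared KV head.

    q: (n_heads, seq, d); k, v: (n_kv_heads, seq, d).
    """
    group = n_heads // n_kv_heads
    # Broadcast each KV head to the query heads in its group.
    k = np.repeat(k, group, axis=0)  # -> (n_heads, seq, d)
    v = np.repeat(v, group, axis=0)
    d = q.shape[-1]
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)
    # Numerically stable softmax over the key dimension.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v  # (n_heads, seq, d)
```

With `n_kv_heads == n_heads` this reduces to standard multi-head attention, and with `n_kv_heads == 1` to multi-query attention; the KV cache only ever stores the `n_kv_heads` copies.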

@lvhan028
Collaborator

Please resolve the linting error and update the 'News' section in the README.

@lzhangzz
Collaborator Author

> Please resolve the linting error and update the 'News' section in the README.

@lvhan028 done

Collaborator

@grimoire grimoire left a comment


LGTM

@lvhan028
Collaborator

Please update the Llama-2 7B/13B/70B serving methods in docs/en/serving.md and docs/zh_cn/serving.md.

@grimoire
Collaborator

internlm-7b is broken again.

@lvhan028 lvhan028 merged commit f07b697 into InternLM:main Jul 21, 2023
1 check passed

3 participants