Support setting inject_fused_attention and inject_fused_mlp to False #134

Merged

PanQiWei merged 2 commits into AutoGPTQ:main from TheBloke:TB_benchmark

Jun 5, 2023

Contributor

TheBloke commented Jun 3, 2023 (edited)

examples/benchmark/generation_speed.py: a small fix that allows inject_fused_attention and inject_fused_mlp to be set to False during benchmarking.
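For readers who want a concrete picture, here is a minimal sketch of one way a benchmark script could expose these two switches on the command line while keeping both enabled by default. It is not the code from this PR: the flag names `--no_inject_fused_attention` and `--no_inject_fused_mlp` and the helper `fused_kwargs` are hypothetical, chosen only for illustration. Only the two keyword names `inject_fused_attention` and `inject_fused_mlp` come from the PR description.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with opt-out flags (hypothetical flag names)."""
    parser = argparse.ArgumentParser(description="Generation speed benchmark (sketch)")
    # store_true flags default to False, so omitting them leaves injection enabled.
    parser.add_argument(
        "--no_inject_fused_attention",
        action="store_true",
        help="disable fused attention injection",
    )
    parser.add_argument(
        "--no_inject_fused_mlp",
        action="store_true",
        help="disable fused MLP injection",
    )
    return parser


def fused_kwargs(argv=None) -> dict:
    """Translate the opt-out flags into the two boolean keyword arguments."""
    args = build_parser().parse_args(argv)
    return {
        "inject_fused_attention": not args.no_inject_fused_attention,
        "inject_fused_mlp": not args.no_inject_fused_mlp,
    }


if __name__ == "__main__":
    # The resulting dict could be forwarded to whatever call loads the model,
    # assuming that call accepts these two keyword arguments.
    print(fused_kwargs())
```

Opt-out flags keep the defaults at True, so existing benchmark invocations behave as before and only runs that pass a flag turn a fused module off.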


          Support setting inject_fused_attention and inject_fused_mlp to False

TheBloke mentioned this pull request

[BUG] Recent changes increase VRAM consumption #105

Open


          Default inject_fused_attention and mlp to True, matching defaults

edb13d4

PanQiWei approved these changes

PanQiWei merged commit bf521cb into AutoGPTQ:main
