Fix bug caused by 'groupsize' vs 'group_size' and change all code to use 'group_size' consistently #58
Today I found a bug in the quantisation code caused by #43: in `auto_gptq/modeling/_utils.py` the parameter is named `groupsize`, but the code calling it passes `group_size`. I fixed that, and then thought I should update the whole repo to be consistent. I think it would be great if one name was used consistently throughout the repo; this will help avoid bugs and confusion. And `group_size` seems to be the best name to use, as that's what's used in `quantize_config.json`.
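To illustrate the failure mode, here's a minimal sketch of the keyword-argument mismatch; the function name and signature below are simplified stand-ins, not the exact code in `auto_gptq/modeling/_utils.py`:

```python
# Hypothetical sketch of the mismatch, not the exact code in _utils.py.
def pack_model(model, quantizers, bits, groupsize):
    # The definition spells the parameter 'groupsize' ...
    return groupsize

try:
    # ... but call sites pass 'group_size', so Python rejects the keyword.
    pack_model(model=None, quantizers={}, bits=4, group_size=128)
except TypeError as err:
    print(err)  # pack_model() got an unexpected keyword argument 'group_size'

def pack_model_fixed(model, quantizers, bits, group_size):
    # After the rename, definition and call sites agree.
    return group_size

pack_model_fixed(model=None, quantizers={}, bits=4, group_size=128)
```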
With this PR merged into faster-llama, all `*.py` files use `group_size`. There are no longer any references to `groupsize`.

Before: [screenshot not shown]
After: [screenshot not shown]
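As a quick sanity check, something like this hypothetical script (not part of the PR) can confirm that no `groupsize` references remain in the `.py` files:

```python
# Hypothetical check, not part of the PR: scan every .py file in the
# repo for remaining occurrences of 'groupsize'.
from pathlib import Path

hits = []
for path in Path(".").rglob("*.py"):
    text = path.read_text(encoding="utf-8", errors="ignore")
    for lineno, line in enumerate(text.splitlines(), start=1):
        if "groupsize" in line:
            hits.append(f"{path}:{lineno}: {line.strip()}")

print("\n".join(hits) if hits else "No references to 'groupsize' remain.")
```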
Note: I only changed `.py` files, not any CUDA kernels. I don't know CUDA code and don't want to risk touching anything I don't understand, in case there are any implications I'm not aware of.
I have tested Triton and CUDA inference, and Triton and CUDA quantisation, and all seems OK.
Hope this change is OK with you guys, @PanQiWei @qwopqwop200?