Fix Sym=False, new checkpoint_format = gptq_v2 #9
Merged
Commits:
- fix underflow cond reversed
- …ading of v1 sym=False by default.
- …eprecated by python. add packaging. depend
@qwopqwop200 This is the rebase of your PR at AutoGPTQ#559 with some modifications. Should be ready soon after we verify quantize, inference, and add some tests.
Reason For PR:

`sym=False` was practically unusable due to post-quantization avg_loss per layer/PPL vs `sym=True`. @qwopqwop200 fixed the bad/suboptimal math. Now `sym=False` will most likely match or decrease avg_loss per layer vs `sym=True` and improve post-quant PPL for many models.

Core Changes:
`main`: allow usable `sym=False` quantization and use `checkpoint_format=gptq_v2` to store the new checkpoint format.
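Not code from this PR, but a minimal usage sketch of how the two new knobs fit the existing AutoGPTQ API, assuming `checkpoint_format` lands on `BaseQuantizeConfig` as the PR text suggests (model name and calibration text are placeholders):

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_id = "facebook/opt-125m"  # placeholder model
tokenizer = AutoTokenizer.from_pretrained(model_id)

quantize_config = BaseQuantizeConfig(
    bits=4,
    group_size=128,
    sym=False,                    # asymmetric quant, usable after this fix
    checkpoint_format="gptq_v2",  # new serialization format from this PR
)

model = AutoGPTQForCausalLM.from_pretrained(model_id, quantize_config)
examples = [
    tokenizer(
        "auto-gptq is an easy-to-use model quantization library.",
        return_tensors="pt",
    )
]
model.quantize(examples)
model.save_quantized("opt-125m-4bit-gptq_v2")
```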
Compat: runtime dynamic conversion of all `checkpoint_format=gptq` checkpoints to `gptq_v2` on load.
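The PR's actual conversion code isn't shown here; the following is only a conceptual sketch of the load-time v1 → v2 rewrite, assuming the usual GPTQ v1 convention that packed zero-points are stored off by one (the `zeros + 1` applied by v1 dequant kernels). The function name is illustrative and overflow handling is elided:

```python
import torch

def convert_qzeros_v1_to_v2(qzeros: torch.Tensor, bits: int = 4) -> torch.Tensor:
    # v1 packs (zero - 1) into each bit-field of an int32 word; v2 stores the
    # true zero, which is what lets sym=False checkpoints round-trip exactly.
    fields_per_word = 32 // bits
    plus_one_per_field = sum(1 << (bits * i) for i in range(fields_per_word))
    # e.g. 0x11111111 for 4-bit: add 1 to all eight packed fields at once.
    return qzeros + plus_one_per_field  # assumes no packed field is at its max
```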
Misc Changes not directly related to sym=False code:
- `nsamples` + low `damp=0.005` to speed up quants.
- Save quant stats (`layer #`, `module name`, `avg loss`, `duration`) in dict/slice and return to user via `quant_log = model.quantize()`.
- Pass a saved log to `quantize(quant_log=saved_quant_log)` to generate an auto avg_loss diff in progress (rough sketch after this list). Sample diff output in later messages of this discussion.
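Continuing the sketch above, the stat-log round-trip would look roughly like this; the return value and keyword are named in the PR text, but the exact entry schema is an assumption:

```python
# First run: quantize() returns the per-layer stat log.
quant_log = model.quantize(examples)
for entry in quant_log:
    # assumed shape: {"layer": ..., "module": ..., "avg_loss": ..., "duration": ...}
    print(entry)

# Later run: feed the saved log back in, and quantize() reports an
# avg_loss diff against the previous run as it progresses.
model.quantize(examples, quant_log=quant_log)
```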
TODO:
- `sym=False` tests
- `gptq_v2` format
- `gptq` (v1) format using `sym=False` in this PR

PASSING TESTS:
- `checkpoint_format=gptq` (v1)
- `sym=False` consistently generates lower `avg_loss` than `sym=True`
- `sym=True` in PR generates same math/`avg_loss` for layers as `sym=True` in `main`
- `test_serialization.py`
- `test_quantization.py`
- `test_shared_loading.py`
- `test_awq_compatibility_generation.py` (note: the awq cache generated by `main` is not compatible with the PR; fixed by adding `v2` to the cache file name)
- `test_q4.py`
FAILING TESTS:
- `test_triton.py` (never got this to work on `main`)
- `test_repacking.py` (never got this to work on `main`)

Original PR AutoGPTQ#559 notes duplicated here for ref: