Reproduction of the results in the paper #1

Closed
caseus-viridis opened this issue Nov 15, 2022 · 2 comments

@caseus-viridis

@efrantar
Following the instructions in README.md, the baseline and RTN perplexities match the values listed in Tables 2-3 of the paper exactly.
However, the GPTQ perplexities do not.

Is this due to differences in the calibration samples, or are the results in the tables statistics over multiple runs with different random seeds?
Could you share the command that reproduces the results in the paper?

Much appreciated!

@efrantar
Member

Thank you for your interest!

How big are the differences you are seeing?

Assuming they are not that large, it is probably due to using slightly different GPUs / drivers. Note that GPU computations generally do not give exactly the same results, especially across different GPU models / drivers, because accumulations and rounding happen in slightly different orders. Since GPTQ accumulates the results of a very large number of GPU operations in multiple places, these very small differences add up and can lead to slightly different final results.
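
To make this concrete, here is a tiny standalone illustration (a sketch, not code from this repo): summing the same float32 values in a different order already changes the result slightly, and GPU matmul kernels choose their own tiling and accumulation order per architecture and driver.

```python
import torch

# Float32 addition is not associative, so the same values summed in a
# different order give (slightly) different results.
torch.manual_seed(0)
x = torch.randn(1_000_000, dtype=torch.float32)

s_forward = x.sum()                              # one accumulation order
s_shuffled = x[torch.randperm(x.numel())].sum()  # same values, another order
print((s_forward - s_shuffled).abs().item())     # typically small but nonzero

# GPU kernels (matmuls in particular) pick their own tiling and accumulation
# order per architecture / driver, so the same computation can differ slightly
# across GPUs; GPTQ chains many such operations, so these differences add up.
```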

Comparing, for example, the results on some OPT/4-bit models for PTB between an A100, an A6000 and a 3090, I get:

GPU     125M    1.3B    13B
A100    36.96   18.16   12.58
A6000   37.25   18.39   12.59
3090    37.96   18.31   12.58

The A100 results are precisely the numbers reported in the paper, whereas the other GPUs produce slightly different results, especially for the smaller models, with the gaps shrinking as the model size increases.
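
Expressed relatively, the largest gap at 125M (3090 vs. A100) is about (37.96 − 36.96) / 36.96 ≈ 2.7%, at 1.3B it is about 1.3%, while at 13B all three GPUs agree to within roughly 0.1%.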

@caseus-viridis
Author

Thank you @efrantar!

The set of GPUs I have access to does not overlap with yours, though.

Here are my results, comparable to yours above:

torch: tested on v1.12.1+cu114 (NOTE: newer than in README.md)
transformers: tested on v4.21.2
datasets: tested on v1.17.0

OPT/4-bit models for PTB:

GPU     125M    1.3B    13B
V100    37.91   18.29   12.56
T4      37.86   18.28   12.60

Further questions:

  • Are there empirical results that can bound the GPU-dependent differences?
  • Why is the difference smaller for larger models? Intuitively, the kind of errors you described above should accumulate over the width and depth of the model (though they could be either squashed or amplified at nonlinearities), so larger models should show more error, not less.
  • Could you provide an explanation of the underlying causes of the GPU-dependent differences, as well as practical advice on controlling them, in the appendix of the paper? (For context, the generic determinism settings I am aware of are sketched after this list.)
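
As far as I know, these are the standard PyTorch reproducibility knobs; a minimal sketch, assuming they only make repeated runs on the same GPU / driver deterministic and cannot remove the cross-GPU differences discussed above:

```python
import os
import torch

# Generic PyTorch reproducibility settings (not specific to this repo). They
# make repeated runs on the *same* GPU / driver reproducible, but do not
# remove the cross-GPU differences discussed above.
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"  # required by some deterministic cuBLAS ops
torch.manual_seed(0)                               # fix RNG, e.g. for calibration sampling
torch.use_deterministic_algorithms(True)           # raise an error on nondeterministic kernels
torch.backends.cudnn.benchmark = False             # avoid autotuned kernel selection
```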
