Reproduce perplexity #49

Closed
deciding opened this issue Mar 11, 2024 · 6 comments

@deciding

In the README the PPL is:

Llama-2-7b | 1x16 | 5.92 | 2.4

In the paper it is:

Llama-2-7b AQLM 2.29 6.29 8.11

When I run locally using the same command as in the README,

CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py $MODEL_PATH $DATASET_PATH --nsamples=1024 \
 --num_codebooks=1 --nbits_per_codebook=16 --in_group_size=8 \
 --relative_mse_tolerance=0.01 --finetune_relative_mse_tolerance=0.001 \
 --finetune_batch_size=32 --local_batch_size=1 --offload_activations \
 --wandb --save $SAVE_PATH

it gives me

Llama-2-7b AQLM 2.29 6.45 8.39

May I ask why there is such a mismatch? Thanks for any clarification.

@Vahe1994
Owner

Vahe1994 commented Mar 12, 2024

Hi!
There are 2 different factors contributing to the mismatch.

  1. I believe the difference between your results and those in the paper is mainly due to differences in hyperparameters. The results reported in the paper were achieved using the following settings: --nsamples=1024 --num_codebooks=1 --nbits_per_codebook=16 --in_group_size=8 --relative_mse_tolerance=0.01 --finetune_lr=1e-5 --finetune_adam_beta1=0.90 --finetune_adam_beta2=0.95 --finetune_keep_best --finetune_relative_mse_tolerance=0.001 --finetune_batch_size=32 --local_batch_size=4 --save save_path --wandb (see the command sketch after this list). Additionally, results may vary slightly from run to run due to randomness. For more details, please refer to Table 8 in the paper's appendix.

  2. The result of 5.92 from the README/HF was achieved through full fine-tuning on top of the obtained quantization; please see Appendix A and Global finetuning? #30. The code for fine-tuning can be found in Added global funetuning & validation loss early stopping & gemma support #50.
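
For reference, the settings from point 1 assemble into a single invocation in the same style as the README command quoted above. This is only a sketch built from the flags listed there; $MODEL_PATH, $DATASET_PATH, $SAVE_PATH and the GPU list are placeholders to adapt to your own setup:

# Sketch: paper settings for Llama-2-7b 1x16 (flags copied from the list above).
CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py $MODEL_PATH $DATASET_PATH \
    --nsamples=1024 \
    --num_codebooks=1 --nbits_per_codebook=16 --in_group_size=8 \
    --relative_mse_tolerance=0.01 \
    --finetune_lr=1e-5 --finetune_adam_beta1=0.90 --finetune_adam_beta2=0.95 \
    --finetune_keep_best --finetune_relative_mse_tolerance=0.001 \
    --finetune_batch_size=32 --local_batch_size=4 \
    --save $SAVE_PATH --wandb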

Hope this helps. If you have any additional questions, please feel free to ask.

@deciding
Author

Very clear, thanks so much. I will try to reproduce it.

@Godofnothing
Collaborator

@deciding The current Llama-2-7b checkpoint with wikitext2 ppl=5.91 was obtained as follows.

Quantization with blockwise fine-tuning yields 6.22 ppl. Compared to the version in the main branch, it adds early stopping on a validation set. The run (with main.py) used the following hyperparameters:

python main.py \
    $MODEL_PATH \
    $DATASET_PATH \
    --nsamples=2048 \
    --val_size=256 \
    --model_seqlen=4096 \
    --num_codebooks=1 \
    --nbits_per_codebook=16 \
    --in_group_size=8 \
    --out_group_size=1 \
    --relative_mse_tolerance=0.01 \
    --finetune_lr=1e-4 \
    --finetune_adam_beta1=0.90 \
    --finetune_adam_beta2=0.999 \
    --finetune_keep_best \
    --finetune_batch_size=8 \
    --finetune_max_epochs=20 \
    --finetune_early_stop=3 \
    --local_batch_size=4 \
    --offload_activations

The final model was obtained via end-to-end fine-tuning (script finetune.py) on top of the model above (i.e., $INPUT_PATH points to the quantized model produced by the blockwise run), with the following hyperparameters:

python finetune.py \
  --base_model $MODEL_PATH \
  --quant_model $INPUT_PATH \
  --dataset $DATASET_PATH \
  --nsamples=1024 \
  --val_size=256 \
  --lr=1e-5 \
  --adam_beta1=0.90 \
  --adam_beta2=0.999 \
  --epochs=5 \
  --early_stop=3 \
  --batch_size=8 \
  --microbatch_size=4 \
  \
  --temperature=1.0 \
  \
  --save $DATA_PATH \
  \
  --gradient_checkpointing

@deciding
Author

@Godofnothing I really appreciate the tuning details! Also, may I know the number of A100 GPU hours required for this fine-tuning script?

@Godofnothing
Collaborator

@deciding I do not remember the exact numbers, but I think the first part took about 1 day on 2 A100s and the second one about 6 hours on a single A100.

@deciding
Author

@Godofnothing Cool. Thanks a lot for the information 👍
