Rollback model loading to match the code from the paper #23

Merged: 3 commits into Vahe1994:main on Jul 5, 2023

Conversation

justheuristic (Contributor) commented on Jul 5, 2023

This code fixes the bad perplexity that was found with the following config:

    CUDA_VISIBLE_DEVICES=3 OMP_NUM_THREADS=16 MKL_NUM_THREADS=16 python main.py decapoda-research/llama-7b-hf custom --custom_data_path data/red_pajama_n=1024.pth --nsamples 128 --wbits 3 --perchannel --percdamp 1.0 --groupsize 16 --qq_scale_bits 3 --qq_zero_bits 3 --qq_groupsize 64 --outlier_threshold=0.7 --permutation_order act_order

... and with all dependency versions set by requirements.txt

P.S. Kind thanks to the authors (esp. @Godofnothing and @Vahe1994) for helping me figure out what was causing the problem.

justheuristic (Contributor, Author) commented:

@Vahe1994 I'm re-running the main config now; results will be available in 40-ish minutes.

Would you like me to run any additional tests to make sure this PR does not introduce more bugs?

Vahe1994 (Owner) commented on Jul 5, 2023

> @Vahe1994 I'm re-running the main config now; results will be available in 40-ish minutes.
>
> Would you like me to run any additional tests to make sure this PR does not introduce more bugs?

I think your experiments are sufficient.

Vahe1994 (Owner) left a comment


I looked at the code and the experiments you provided; all seems good. Thank you for the bug fix!

Vahe1994 merged commit e75c55b into Vahe1994:main on Jul 5, 2023
poedator (Collaborator) commented on Sep 25, 2023

I tried to reproduce the problem fixed here. It turned out to come from the omission of this code:

    if dtype == "auto":
        dtype = AutoConfig.from_pretrained(model_path).torch_dtype or "auto"  # force transformers 4.29.2 to follow the same rules as 4.30.x

which was still necessary to keep while we were still testing the code with transformers==4.29.2.
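
For context, here is a minimal sketch of where that snippet sits in the model-loading path; the load_model helper name and the else branch are illustrative assumptions, not the repo's exact code. In transformers 4.30.x, torch_dtype="auto" already resolves the dtype from the checkpoint config, so resolving it manually makes 4.29.2 behave the same way:

    import torch
    from transformers import AutoConfig, AutoModelForCausalLM

    def load_model(model_path: str, dtype: str = "auto"):
        if dtype == "auto":
            # Resolve the dtype from the checkpoint config ourselves so that
            # transformers 4.29.2 follows the same rules as 4.30.x, where
            # torch_dtype="auto" does this resolution internally.
            dtype = AutoConfig.from_pretrained(model_path).torch_dtype or "auto"
        else:
            dtype = getattr(torch, dtype)  # e.g. "float16" -> torch.float16
        return AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=dtype)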
