Allow user to use custom calibration data for quantization #27

boehm-e · 2023-09-05T14:53:05Z

Hi,

If you have some time, could you review these changes?
They should allow a custom dataset (List[str]) to be used for the calibration step.

Thx :)

casper-hansen · 2023-09-05T14:58:07Z

Thanks for this PR, TheBloke also asked for this. I will review it later. Before this is merged, I would also like to create two examples of how to use the functionality: one with a string pointing to a Hugging Face dataset and one with a list of preprocessed data.
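For the "list of preprocessed data" case mentioned above, a minimal sketch of what such preprocessing could look like is shown here. The function name, the `min_words` threshold, and the fixed seed are illustrative assumptions, not part of this PR; the sketch only uses the standard library and produces a plain list of strings.

```python
import random
from typing import Iterable, List, Optional


def preprocess_calibration_texts(
    texts: Iterable[str],
    min_words: int = 20,          # assumed threshold, tune for your data
    max_samples: Optional[int] = None,
    seed: int = 42,               # assumed seed, only for reproducibility
) -> List[str]:
    """Filter, shuffle, and optionally truncate raw calibration strings."""
    # Drop empty or very short lines, which carry little calibration signal.
    cleaned = [t.strip() for t in texts if len(t.split()) >= min_words]
    # Shuffle with a private RNG so the global random state is untouched.
    random.Random(seed).shuffle(cleaned)
    if max_samples is not None:
        cleaned = cleaned[:max_samples]
    return cleaned
```

The resulting list could then be used as the custom List[str] calibration input that this PR proposes.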

mhenrichsen · 2023-09-06T10:49:58Z

awq/utils/calib_data.py

+        if data == "pileval":
+            dataset = load_dataset("mit-han-lab/pile-val-backup", split="validation")
+        else:
+            raise NotImplementedError


Should work. Might want to find a way to define the split instead of defaulting to train, though.

Suggested change

-            raise NotImplementedError
+            dataset = load_dataset(data, split="train")

The defaulting to train might be solved by adding a kwarg that defaults to validation, which could be used in L9 and L11.

casper-hansen · 2023-09-07

I agree that we should not raise an exception here. Instead, we should try to load the dataset using the actual string that was passed, and load the split named by a separate argument. We could default to the validation split: it should be small enough for calibration, yet still scientifically sound, since we would use the test split to measure perplexity.
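A rough sketch of the behaviour discussed in this review thread, under stated assumptions: the function name `load_calibration_texts`, the `loader` injection point, and the exact defaults are hypothetical and not the code that was merged. `split` and `text_column` mirror the argument names from the later commit titles. The only library call used is `datasets.load_dataset(name, split=...)`, imported lazily.

```python
from typing import Callable, Iterable, List, Optional, Union


def load_calibration_texts(
    data: Union[str, List[str]] = "pileval",
    split: str = "validation",
    text_column: str = "text",
    loader: Optional[Callable[..., Iterable[dict]]] = None,
) -> List[str]:
    """Return raw calibration strings from a dataset name or a list of strings."""
    # Custom data: the caller already supplies the calibration strings.
    if isinstance(data, list):
        if not all(isinstance(item, str) for item in data):
            raise TypeError("custom calibration data must be a list of strings")
        return list(data)
    if not isinstance(data, str):
        raise TypeError("data must be a dataset name or a list of strings")

    if loader is None:
        # Imported lazily so the list-of-strings path needs no extra dependency.
        from datasets import load_dataset as loader

    # Keep the existing "pileval" shortcut; otherwise treat the string as a
    # dataset name instead of raising NotImplementedError.
    name = "mit-han-lab/pile-val-backup" if data == "pileval" else data
    rows = loader(name, split=split)
    return [row[text_column] for row in rows]
```

Passing `loader` explicitly is only there so the sketch can be exercised without network access; in normal use it would fall back to `load_dataset`.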

aadnesd · 2024-02-12T17:57:32Z

What's the benefit of using custom data?

allow user to use custom calibration data for quantization

1712ce2

mhenrichsen reviewed Sep 6, 2023


casper-hansen mentioned this pull request Sep 6, 2023

📌 AutoAWQ Roadmap #32

Closed

30 tasks

casper-hansen added 7 commits September 14, 2023 13:51

Allow huggingface datasets

faedf51

Merge branch 'main' into pr/27

077f39a

Pass split and text_column arguments to calib_data function

84e8274

Custom data example

69d31ed

Actually pass args to calib function

a9cef34

Move shuffling up for datasets loaded with load_dataset

d832a21

Add wikitext example

b9ab9a6

casper-hansen merged commit 2a3e0fa into casper-hansen:main Sep 15, 2023

Sakusakumura mentioned this pull request Nov 14, 2023

Calibration dataset sample size and its sequence length are always fixed. Why? #191

Closed


