Enable weights-only and activations-only post-training quantization for conv/linear modules #439

guyjacob · 2019-12-05T12:18:32Z

Provides the same functionality as #356, but takes a different approach that reuses the existing PTQ wrapper modules. This reduces code duplication and keeps the quantized model similar in structure (the same wrapper modules are used) whether or not activations are quantized.

* Weights-only PTQ:
  * Allow `RangeLinearQuantWrapper` to accept `num_bits_acts = None`, in which case it acts as a simple pass-through during forward.
  * In `RangeLinearQuantParamLayerWrapper`, if `bits_activations` is `None` and `num_bits_params > 0`, quantize and then de-quantize the parameters instead of only quantizing them.
* Activations-only PTQ:
  * Enable activations-only quantization for conv/linear modules. When `PostTrainLinearQuantizer` detects a bit-width that is not `None` for activations and `None` for weights, it uses a fake-quantization wrapper.
* Allow passing 0 in the `--qe-bits-acts` and `--qe-bits-wts` command-line arguments to invoke weights-only / activations-only quantization, respectively.
* Minor refactoring for clarity in `PostTrainLinearQuantizer`'s internal `replace_*` functions.
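The weights-only path can be illustrated with a minimal pure-Python sketch. It assumes symmetric, per-tensor range-based linear quantization (scale = (2^(n-1) - 1) / max|w|), and all helper names below are hypothetical, not Distiller's API. It shows why the parameters are de-quantized when `bits_activations` is `None`: the wrapped layer then computes in floating point, so the weights must return to their original scale, carrying only the quantization error.

```python
def symmetric_scale(values, num_bits):
    # Symmetric, per-tensor scale (assumed mode for this sketch): map max|v|
    # onto the largest signed integer, e.g. 127 for 8 bits.
    max_abs = max(abs(v) for v in values)
    q_max = 2 ** (num_bits - 1) - 1
    return q_max / max_abs if max_abs > 0 else 1.0


def quantize(values, scale, num_bits):
    # Round to the nearest integer and clamp to [-q_max, q_max].
    q_max = 2 ** (num_bits - 1) - 1
    return [max(-q_max, min(q_max, round(v * scale))) for v in values]


def dequantize(q_values, scale):
    # Map integers back to the original floating-point scale.
    return [q / scale for q in q_values]


def quantize_params(weights, num_bits_params, bits_activations):
    """Return the parameters as a (hypothetical) param-layer wrapper would store them."""
    scale = symmetric_scale(weights, num_bits_params)
    q = quantize(weights, scale, num_bits_params)
    if bits_activations is None:
        # Weights-only: the layer computes in floating point, so quantize and
        # then de-quantize ("fake-quantize") the parameters.
        return dequantize(q, scale)
    # Weights + activations: keep integer values; the integer-domain forward
    # pass is responsible for applying the scale factors.
    return q


weights = [0.5, -1.0, 0.25]
print(quantize_params(weights, 8, 8))     # [64, -127, 32]
print(quantize_params(weights, 8, None))  # floats close to the original weights
```

In the weights-only case the result has the same dtype and scale as the original weights, so the downstream float computation is unchanged apart from the rounding error introduced by the quantization grid.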
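The module-replacement rule for conv/linear layers can be summarized as a small dispatch on the two bit-widths. This is a hypothetical sketch: `choose_wrapper` and `WrapperKind` are illustrative names, and treating "both `None`" as "leave the module unchanged" is an assumption rather than something stated in this PR.

```python
from enum import Enum
from typing import Optional


class WrapperKind(Enum):
    # Hypothetical labels; the PR itself only names the wrapper classes.
    UNCHANGED = "leave the module as-is"
    FULL = "RangeLinearQuantParamLayerWrapper (weights + activations)"
    WEIGHTS_ONLY = "RangeLinearQuantParamLayerWrapper with bits_activations=None"
    ACTS_ONLY = "fake-quantization wrapper"


def choose_wrapper(bits_activations: Optional[int],
                   bits_weights: Optional[int]) -> WrapperKind:
    if bits_activations is None and bits_weights is None:
        return WrapperKind.UNCHANGED      # assumption: nothing to quantize
    if bits_weights is None:
        return WrapperKind.ACTS_ONLY      # activations != None, weights == None
    if bits_activations is None:
        return WrapperKind.WEIGHTS_ONLY   # quant + de-quant of the parameters
    return WrapperKind.FULL


for acts, wts in [(8, 8), (None, 8), (8, None), (None, None)]:
    print(acts, wts, "->", choose_wrapper(acts, wts).value)
```

Reusing the same parameter-layer wrapper for both the full and weights-only cases is what keeps the quantized model's module structure similar regardless of whether activations are quantized.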
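The command-line convention of passing 0 could be implemented as an `argparse` `type` callable that maps 0 to `None`. This is a sketch only: the helper `none_if_zero` and the 8-bit defaults are assumptions, not necessarily how Distiller parses these flags.

```python
import argparse


def none_if_zero(value: str):
    """argparse type: parse a bit-width, mapping 0 to None (= don't quantize)."""
    bits = int(value)
    if bits < 0:
        raise argparse.ArgumentTypeError("bit-width must be >= 0")
    return None if bits == 0 else bits


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # Default of 8 bits is an assumption for this sketch.
    parser.add_argument("--qe-bits-acts", type=none_if_zero, default=8)
    parser.add_argument("--qe-bits-wts", type=none_if_zero, default=8)
    return parser


args = build_parser().parse_args(["--qe-bits-acts", "0"])
print(args.qe_bits_acts, args.qe_bits_wts)  # None 8
```

With this mapping, `--qe-bits-acts 0` selects weights-only PTQ and `--qe-bits-wts 0` selects activations-only PTQ.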


guyjacob requested a review from levzlotnik December 5, 2019 12:18

guyjacob changed the title from "Enable weights-only post-training quantization" to "Enable weights-only and activations-only post-training quantization for conv/linear modules" Dec 8, 2019

levzlotnik approved these changes Dec 8, 2019


guyjacob merged commit 952028d into master Dec 8, 2019

This was referenced Dec 8, 2019:

support quantize weights/bias only #356 (closed)

The problem of post-training quantization #311 (closed)

nzmora deleted the ptq_weights_only branch April 20, 2020 11:59
