[Relay][Quantization] KL-divergence-based per-layer calibration #3538

Merged: 12 commits merged into dmlc:master from vinx13:feature/calibration_v2 on Aug 2, 2019

Conversation

@vinx13 (Member) commented Jul 12, 2019:

  • KL-divergence threshold search ported from MXNet quantization (a condensed sketch follows this list)
  • A CollectStats pass that gathers the input of every simulated_quantize in the annotated graph into a single tuple output
  • Support for floating-point scales
  • max_scale as an alternative to power2_scale for weight quantization
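
For context, here is a condensed sketch of the histogram-based threshold search as MXNet's quantization implements it (the algorithm this PR ports); bin counts, helper names, and minor details are illustrative rather than the exact merged code:

```python
import numpy as np
import scipy.stats

def _smooth_distribution(p, eps=0.0001):
    # Shift a little probability mass onto empty bins so KL(p || q) stays defined.
    is_zeros = (p == 0).astype(np.float32)
    is_nonzeros = (p != 0).astype(np.float32)
    n_zeros = int(is_zeros.sum())
    n_nonzeros = p.size - n_zeros
    if n_nonzeros == 0:
        raise ValueError("all-zero distribution")
    eps1 = eps * float(n_zeros) / float(n_nonzeros)
    return p.astype(np.float32) + eps * is_zeros - eps1 * is_nonzeros

def find_scale_by_kl(arr, num_bins=8001, num_quantized_bins=255):
    """Return the clipping threshold on |arr| that minimizes the KL divergence
    between the fp32 histogram and its int8 (255-level) approximation."""
    max_val = np.abs(arr).max()
    hist, hist_edges = np.histogram(arr, bins=num_bins, range=(-max_val, max_val))
    zero_bin = num_bins // 2
    best_div, best_thres = float("inf"), max_val
    # Grow a symmetric clipping window around zero and score each candidate.
    for i in range(num_quantized_bins // 2, zero_bin + 1):
        start, stop = zero_bin - i, zero_bin + i + 1
        sliced = hist[start:stop].astype(np.float64)
        # Reference distribution p: clipped histogram, outliers folded into the edges.
        p = sliced.copy()
        p[0] += hist[:start].sum()
        p[-1] += hist[stop:].sum()
        nonzero = p != 0
        # Quantized distribution q: merge into 255 buckets, then spread each
        # bucket's mass uniformly back over its nonzero source bins.
        merged = sliced.size // num_quantized_bins
        q = np.zeros_like(sliced)
        for j in range(num_quantized_bins):
            lo = j * merged
            hi = sliced.size if j == num_quantized_bins - 1 else lo + merged
            norm = nonzero[lo:hi].sum()
            if norm:
                q[lo:hi] = sliced[lo:hi].sum() / norm
        q[~nonzero] = 0.0
        try:
            div = scipy.stats.entropy(_smooth_distribution(p), _smooth_distribution(q))
        except ValueError:
            continue
        if div < best_div:
            best_div, best_thres = div, hist_edges[stop]
    return best_thres
```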

Evaluation code

https://gist.github.com/vinx13/6f1eb1f9e2c0a8786149ee881bfcd6aa

What's left:

  • I added QAnnotateKind.BIAS, but I'm not sure it is necessary. There are currently a few tricks in handling bias (nbit_bias, valid_range, ...); it would be good to find a better solution and avoid them.
  • In my evaluation script I have to write the quantization workflow myself (optimize, annotate, calibrate, realize; a rough outline follows this list). Please also share your thoughts on the design of the calibrate function: we need to decide how users specify the different quantization strategies (max, power2, KLD, ...).
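
For reference, the manual flow mentioned above looks roughly like this (a hypothetical outline only; the stage names mirror the passes, and the exact entry points and signatures in quantize.py may differ):

```python
from tvm import relay
from tvm.relay import quantize as qtz

# Hypothetical outline; given a Relay mod/params and calibration batches,
# the function names and signatures here are illustrative.
mod = qtz.prerequisite_optimize(mod, params)   # fold constants / scale axes first
mod = qtz.annotate(mod)                        # insert simulated_quantize at chosen points
mod = qtz.calibrate(mod, dataset=calib_data)   # pick scales: max / power2 / KLD
mod = qtz.realize(mod)                         # rewrite to real integer arithmetic
```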

Evaluation results on ImageNet:

max_scale for weights, KL divergence for activations (top-1 / top-5):

resnet18_v1, 0.70642 / 0.89702
resnet50_v1, 0.73682 / 0.91664
resnet101_v1, 0.74484 / 0.9208
resnet18_v2, 0.70794 / 0.89832
resnet50_v2, 0.7691 / 0.93268
resnet101_v2, 0.78204 / 0.94124

power2 for weights, KL divergence restricted to power-of-2 values for activations (use the --eval-power2 option in my evaluation script; top-1 / top-5):

resnet18_v1, 0.70332 / 0.89526
resnet50_v1, 0.73426 / 0.9146
resnet101_v1, 0.72434 / 0.91058
resnet18_v2, 0.70314 / 0.89618
resnet50_v2, 0.76486 / 0.93108
resnet101_v2, 0.78066 / 0.94002

These experiments were run with opt_level=2. With opt_level=3, FoldScaleAxis may produce outliers in the bias vector and cause significant accuracy drops; in that case the bias scale should be chosen differently than simply taking the maximum.
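
One simple way to restrict a KL-derived threshold to a power-of-2 scale, in the spirit of the --eval-power2 mode above (whether the script rounds up or to the nearest power is an assumption here):

```python
import math

def power2_scale(threshold, num_levels=127):
    # Assumed rounding rule: take the scale implied by the threshold and
    # round it up to the next power of two.
    scale = threshold / num_levels
    return 2.0 ** math.ceil(math.log(scale, 2))
```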

cc @tqchen @ZihengJiang @eqy @ajtulloch @antinucleon @FrozenGene

@vinx13 force-pushed the vinx13:feature/calibration_v2 branch from 0377167 to 9d71db8 Jul 16, 2019
@vinx13 marked this pull request as ready for review Jul 16, 2019
@vinx13 force-pushed the vinx13:feature/calibration_v2 branch from 9d71db8 to 3d1d4cf Jul 16, 2019
@vinx13 (Member, Author) commented Jul 16, 2019:

This one is ready. Please review and share your thoughts on the calibration API design.

@vinx13 force-pushed the vinx13:feature/calibration_v2 branch from 3d1d4cf to 99ffbc2 Jul 16, 2019
@vinx13 force-pushed the vinx13:feature/calibration_v2 branch from 99ffbc2 to 0e55518 Jul 16, 2019
@vinx13 force-pushed the vinx13:feature/calibration_v2 branch from 76b8b76 to 577387d Jul 16, 2019
@ZihengJiang self-assigned this Jul 17, 2019
Review thread on src/relay/pass/quantize.cc:
// =============
// calibration

class StatsCollector : private ExprMutator {

@ZihengJiang (Member) commented Jul 18, 2019:

If we just need to collect stats, an ExprVisitor should be enough.

@vinx13 (Member, Author) commented Jul 19, 2019:

ExprMutator is actually needed: this mutator transforms the annotated expression back to the original expression by removing each simulated_quantize. For example, given the Relay program:

%1 = ..
%2 = simulate_quantize(%1)
%3 = op(%2)
%4 = simulate_quantize(%3)

We need to profile %1 and %3. But %3 takes %2 as input, so we need to replace the input of %3 with %1: the simulated_quantize in %2 is not in passthrough mode after the Annotate pass, so we must either remove it or rewrite it in passthrough mode.
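
A rough Python analogue of what the mutator does (illustrative only; the real pass is the C++ StatsCollector in src/relay/pass/quantize.cc, and its op matching differs in detail):

```python
from tvm import relay
from tvm.relay.expr_functor import ExprMutator

class StatsCollectorSketch(ExprMutator):
    """Strip each simulated_quantize while remembering its fp32 input,
    so the original-precision values can be profiled."""

    def __init__(self):
        super().__init__()
        self.profile_points = []  # expressions whose runtime values we want

    def visit_call(self, call):
        new_call = super().visit_call(call)
        if new_call.op == relay.op.get("relay.op.annotation.simulated_quantize"):
            data = new_call.args[0]
            self.profile_points.append(data)
            return data  # splice the node out: %3 now consumes %1 directly
        return new_call
```

The pass then builds a function whose output is a tuple of all profile points and runs it on calibration batches, as described in the feature list above.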

@vinx13 (Member, Author) commented Jul 22, 2019:

@ZihengJiang I was thinking that the other PR #3543 actually breaks this pass, because the result of this pass contains annotations and casts.

@ZihengJiang (Member) commented Jul 23, 2019:

@vinx13 Why not collect stats before annotate?

@vinx13 (Member, Author) commented Jul 23, 2019:

@ZihengJiang Annotations tell us which nodes should be profiled. If we collected stats before annotate, we would have to duplicate logic similar to annotate to decide which nodes should be quantized.

@ZihengJiang (Member) commented Jul 25, 2019:

Okay, let's keep the current approach. #3543 will not break this pass, since annotation.cast_hint and annotation.stop_fusion do not change the running result; they are just annotations and can be viewed as identity. One thing, though: instead of detecting and skipping simulated_quantize inside the IRMutator, let's add an option like simulated_quantize(kind=kIdentity) to neutralize the effect of simulated_quantize.
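
The identity kind did land; the final commit list below includes `kQIdentity = 0`. On the Python side the kinds would then look like this (IDENTITY = 0 is from that commit; the other values are assumed from the existing bindings):

```python
class QAnnotateKind(object):
    IDENTITY = 0    # simulated_quantize acts as a passthrough during profiling
    INPUT = 1
    WEIGHT = 2
    ACTIVATION = 3
```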

@vinx13 (Member, Author) commented Aug 1, 2019:

@ZihengJiang updated

@ZihengJiang (Member) commented Jul 22, 2019:

@vinx13 Could you please address the other comments?
We can change the calibrate API as you did for now. In the long term, we should think about something like calibrate(graph, mod, ctx, fcalibrate), where fcalibrate(sq_op, stats) is a callback function that the user can provide.
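
A hypothetical shape for that callback, combining the strategies evaluated above and reusing QAnnotateKind and find_scale_by_kl from the sketches earlier in this thread (the sq_op/stats arguments are illustrative, not a merged interface):

```python
import numpy as np

def my_fcalibrate(sq_op, stats):
    """Choose the scale for one simulated_quantize node from the profiled
    fp32 values of its input (stats: a flat numpy array)."""
    if sq_op.attrs.kind == QAnnotateKind.WEIGHT:
        return np.abs(stats).max() / 127.0   # max_scale for weights
    return find_scale_by_kl(stats) / 127.0   # KL divergence for activations

# qgraph = calibrate(graph, mod, ctx, fcalibrate=my_fcalibrate)
```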

@vinx13 force-pushed the vinx13:feature/calibration_v2 branch 2 times, most recently from 5f0406e to 16b27d4 Jul 22, 2019
@vinx13 force-pushed the vinx13:feature/calibration_v2 branch from 16b27d4 to 7ba8f30 Jul 22, 2019
@tqchen (Member) commented Jul 24, 2019:

@ZihengJiang @vinx13 please follow up on this and let us merge soon.

A Contributor left a review comment:

I left some comments; please ping me if I have misunderstood anything.

vinx13 added 2 commits Jul 27, 2019
@vinx13 vinx13 force-pushed the vinx13:feature/calibration_v2 branch from cc34c02 to 0dd38e7 Jul 27, 2019
@vinx13 vinx13 force-pushed the vinx13:feature/calibration_v2 branch from 10ef14a to 493d14b Jul 30, 2019
A Contributor left a review comment:

LGTM basically.

@ZihengJiang merged commit 33ab3c6 into dmlc:master Aug 2, 2019
5 checks passed:

  • continuous-integration/jenkins/pr-merge: This commit looks good
  • windows_mac_build: Build #20190730.10 succeeded
  • windows_mac_build (MacOS_XCode9): succeeded
  • windows_mac_build (Windows_VS2017_x64): succeeded
  • windows_mac_build (Windows_VS2017_x86): succeeded
@tqchen referenced this pull request Aug 2, 2019
wweic added a commit to wweic/tvm that referenced this pull request Aug 9, 2019
…#3538)

* [Relay][Quantization] Support floating-point scale

* [Relay][Quantization] KL-divergence calibration on dataset

* Fix unhandled LeftShift case in QuantizeRealize

* Fix lint

* drop QBias

* fix lint

* address comments

* address comments

* Update comments

* address comments

* lint

* kQIdentity = 0
wweic added a commit to neo-ai/tvm that referenced this pull request Sep 6, 2019