[OPT] Low-bit Quantization #2116
Has there been an RFC posted btw? This comment probably belongs there.
FWIW, I'm a little concerned about some of the directions this PR is taking, or at least there are some use cases that would be good to see handled and I don't currently see how they fit in.
For background on my perspective, a standard training flow for quantized models in TF/C2 (at least the frameworks I'm familiar with that implement this) is to:
Does this workflow make sense to folks? If not, could folks please elaborate on where we differ?
Given this flow, we'd like to insert TVM into this process. One key use case that I'd like TVM to consider supporting is to allow frameworks to continue to use their existing approaches for Steps 1-5, and involve TVM in Step 6. There are several reasons for this, such as the fact that calibration-based quantization isn't always sufficient, and that we'd like to support importing from existing int8 graph IRs like TFLite or C2.
I think requiring TVM to take on Steps 4 and 5 in order to implement quantized models is unnecessarily opinionated, and moves it towards being a fully-fledged framework in its own right (which I thought was not the goal).
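To make the calibration point concrete, here is a minimal NumPy sketch (not TVM or framework API; function names are my own) of what a calibration-based scheme typically computes: a per-tensor scale derived from activation ranges observed on sample inputs. Outliers inflate that range, which is one reason calibration alone isn't always sufficient and some models fall back to quantization-aware training.

```python
import numpy as np

def calibrate_scale(activations, num_bits=8):
    """Derive a symmetric per-tensor scale from observed activation ranges.

    `activations` is a list of arrays collected by running the fp32 model
    on calibration inputs; a single outlier can dominate the range.
    """
    max_abs = max(np.abs(a).max() for a in activations)
    qmax = 2 ** (num_bits - 1) - 1          # 127 for int8
    return max_abs / qmax

def quantize(x, scale, num_bits=8):
    qmax = 2 ** (num_bits - 1) - 1
    return np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int8)

# A few random batches stand in for real calibration inputs.
calib = [np.random.randn(64, 256).astype(np.float32) for _ in range(8)]
scale = calibrate_scale(calib)
q = quantize(calib[0], scale)
```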
I would have thought one natural (and minimalistic) direction for TVM to support quantized models (which isn't precluded by this diff, but I want to see what folks think about this) would be something like:
Concretely, my concerns with this approach (assuming the goal is to be 'the one true way' to execute quantized models in TVM) are that it:

a) integrates too early in the pipeline, which unnecessarily bakes in some assumptions;
b) makes assumptions that aren't the most general ones, i.e. it requires symmetric quantization (as used by e.g. MKLDNN), which precludes asymmetric quantization as in TF, TFLite, C2, GEMMLOWP, and QNNPACK, and channel-wise quantization as in TF/C2, which is very useful for pushing bit-widths lower (see e.g. https://arxiv.org/pdf/1806.08342.pdf);
c) is less modular than other approaches, which makes it harder to target from existing frameworks that already support quantization.

I don't think our goals are in conflict; I just thought I should put this on the radar. Happy to send out an RFC (and dedicate engineering effort) to the alternative approach as well if folks are on board?
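For readers less familiar with the distinction raised in (b), here is a small illustrative NumPy sketch (my own naming, not tied to any of the frameworks or to this PR) of symmetric, asymmetric (zero-point), and per-channel quantization of a weight tensor:

```python
import numpy as np

def quantize_symmetric(w, num_bits=8):
    # Scale maps [-max|w|, max|w|] onto the signed int range; zero-point is 0.
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.abs(w).max() / qmax
    return np.round(w / scale).astype(np.int8), scale

def quantize_asymmetric(w, num_bits=8):
    # Affine scheme (TFLite/GEMMLOWP style): a scale plus an integer
    # zero-point, so a skewed range like [-0.1, 2.3] still uses all of uint8.
    qmin, qmax = 0, 2 ** num_bits - 1
    lo, hi = w.min(), w.max()
    scale = (hi - lo) / (qmax - qmin)
    zero_point = int(round(qmin - lo / scale))
    q = np.clip(np.round(w / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def quantize_per_channel(w, axis=0, num_bits=8):
    # Channel-wise symmetric scales: one scale per output channel, which
    # helps when channel ranges differ widely (useful at lower bit-widths).
    qmax = 2 ** (num_bits - 1) - 1
    scales = np.abs(w).max(axis=tuple(i for i in range(w.ndim) if i != axis)) / qmax
    shape = [1] * w.ndim
    shape[axis] = -1
    return np.round(w / scales.reshape(shape)).astype(np.int8), scales

w = np.random.randn(16, 3, 3, 3).astype(np.float32)  # OIHW conv weight
q_sym, s_sym = quantize_symmetric(w)
q_asym, s_asym, zp = quantize_asymmetric(w)
q_pc, s_pc = quantize_per_channel(w, axis=0)
```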
Running on inception-v3 produces a segmentation fault. It seems to be a memory issue (some freed memory accessed by
Currently if conv2d has bias enabled, we have two
When I wondered why Jenkins was still silent, I found that the test script sits in
@ZihengJiang sorry, this is a basic question, but is there support for mixed quantization levels? It looks like we currently specify only a global weight and activation precision. Since we can already skip the first k conv layers, it seems this would be a useful generalization.
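To illustrate what "mixed quantization levels" could look like, here is a purely hypothetical sketch (not existing TVM API; all names invented) of a per-layer precision table that falls back to the global setting:

```python
# Hypothetical per-layer precision table: map layer names to
# (weight_bits, activation_bits), falling back to a global default.
global_bits = (8, 8)
layer_bits = {
    "conv1": (8, 8),   # keep the first conv at higher precision
    "conv2": (4, 8),
    "fc":    (4, 4),
}

def bits_for(layer_name):
    return layer_bits.get(layer_name, global_bits)
```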