Design doc of fixed-point quantization. #10553

qingqing01 · 2018-05-10T03:40:06Z

You can also view the rendered document at https://github.com/qingqing01/Paddle/blob/quantization_doc/doc/fluid/design/quantization/fixed_point_quantization.md

kuke

Thanks for the design doc. Here are some comments about the writing.

kuke · 2018-05-25T07:17:35Z

doc/fluid/design/quantization/fixed_point_quantization.md

@@ -0,0 +1,115 @@
+Fixed-point quantization is to use lower bit, for example, 2 bit, 3 bit or 8 bit fixed-point to represent weights and activations, which usually are singe float point with 32 bit. The fixed-point representation has advantages in reducing memory bandwidth, lowering power consumption and computational resources as well as the model storage requirements.  It is especially import for the inference in embedded device deployment.


is to use lower bit --> uses lower bits
2 bit, 3 bit or 8 bit fixed-point --> 2-bit, 3-bit or 8-bit fixed point
singe float point with 32 bit --> in single-precision floating-point format with 32 bits
import --> important
embedded device --> embedded-device

kuke · 2018-05-25T08:01:02Z

doc/fluid/design/quantization/fixed_point_quantization.md

@@ -0,0 +1,115 @@
+Fixed-point quantization is to use lower bit, for example, 2 bit, 3 bit or 8 bit fixed-point to represent weights and activations, which usually are singe float point with 32 bit. The fixed-point representation has advantages in reducing memory bandwidth, lowering power consumption and computational resources as well as the model storage requirements.  It is especially import for the inference in embedded device deployment.
+
+According some experiments, the apporach to quantize the model trained in float point directly works sufficiently on the large model, like the over-parameterized VGG model. But the accuracy drops a lot for the small model. In order to improve the tradeoff be-tween accuracy and latency, many quantized training apporaches are proposed.


According --> According to
sufficiently --> effectively
on the large model -> on the large models
like the over-parameterized VGG model --> like the VGG model having many parameters. The word over-parameterized has other meanings.
be-tween --> between

kuke · 2018-05-25T08:03:16Z

doc/fluid/design/quantization/fixed_point_quantization.md

+
+According some experiments, the apporach to quantize the model trained in float point directly works sufficiently on the large model, like the over-parameterized VGG model. But the accuracy drops a lot for the small model. In order to improve the tradeoff be-tween accuracy and latency, many quantized training apporaches are proposed.
+
+This document is to design a quantized training framework on Fluid. The first part will introduce how to quantize, The second part will describe the quantized training framework. The last part will describe how to the quantization range.


The last part will describe how to the quantization range --> The last part will illustrate how to calculate the quantization range

kuke · 2018-05-25T08:04:22Z

doc/fluid/design/quantization/fixed_point_quantization.md

+
+### How to quantize
+
+There are many ways to quantizate the float value to fixed-point value. For example:


quantizate --> quantize

kuke · 2018-05-25T08:23:26Z

doc/fluid/design/quantization/fixed_point_quantization.md

+where, $x$ is the float value to be quantized, $[a, b]$ is the quantization range, $a$ is the minimum value and $b$ is the maximal value. $\left \lfloor \right \rceil$  denotes rounding to the nearest integer. If the quantization level is $k$, $n$ is $2^k$, for example, $k$ is 8 and $n$ is 256. $q$ is the quantized integer. 
+
+
+The quantization we apllied is parameterized by the number of quantization levels and maximum absolute value:


apllied --> applied
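
For readers of this thread, here is a minimal NumPy sketch of the two schemes the quoted paragraph describes. The formulas themselves are not visible in this hunk, so the mapping below is an assumption: a standard uniform mapping of $[a, b]$ onto $n = 2^k$ levels, and a symmetric max-abs variant that (also an assumption) reserves one bit for the sign. The function names are hypothetical and are not part of Fluid.

```python
import numpy as np


def quantize_range(x, a, b, k=8):
    """Map floats in [a, b] onto the integers 0 .. n-1, with n = 2**k."""
    n = 2 ** k
    x = np.clip(x, a, b)
    # np.rint rounds to the nearest integer (ties go to the even neighbour).
    return np.rint((x - a) / (b - a) * (n - 1)).astype(np.int64)


def quantize_max_abs(x, k=8):
    """Symmetric quantization parameterized by max(|x|) and the bit width k.

    Assumption: one bit is kept for the sign, so values map to
    -(2**(k-1) - 1) .. 2**(k-1) - 1.
    """
    levels = 2 ** (k - 1) - 1
    scale = float(np.max(np.abs(x)))
    if scale == 0.0:
        return np.zeros_like(x, dtype=np.int64), scale
    q = np.rint(x / scale * levels).astype(np.int64)
    return q, scale


def dequantize_max_abs(q, scale, k=8):
    """Inverse of quantize_max_abs, up to rounding error."""
    levels = 2 ** (k - 1) - 1
    return q.astype(np.float32) * scale / levels
```

With `k=8`, the round trip `dequantize_max_abs(*quantize_max_abs(x))` differs from `x` by at most half a quantization step, that is `scale / 127 / 2`.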

kuke · 2018-05-25T09:04:53Z

doc/fluid/design/quantization/fixed_point_quantization.md

+
+#### Backward pass
+
+See the figure 3. The gradients are calculated by dequantized weights and activations. All inputs and outputs are float point with 32 bit. And in the weight updating process, the gradients will be added to the original weight, not the quantized or dequantized weights.


See the figure 3 --> See Figure 3
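
The backward-pass rule in the quoted paragraph can be illustrated with a small NumPy sketch. It is only an illustration under stated assumptions: a single linear layer with a squared-error loss, max-abs quantization with one sign bit, and hypothetical helper names (`fake_quant`, `train_step`). None of this is Fluid code.

```python
import numpy as np


def fake_quant(v, k=8):
    """Quantize then dequantize, so the result is float but lies on the grid."""
    levels = 2 ** (k - 1) - 1
    scale = float(np.max(np.abs(v)))
    if scale == 0.0:
        return v.copy()
    return np.rint(v / scale * levels) * scale / levels


def train_step(w, x, y, lr=0.1, k=8):
    """One SGD step for y_hat = x @ w with a mean squared-error loss."""
    w_dq = fake_quant(w, k)  # dequantized weights
    x_dq = fake_quant(x, k)  # dequantized activations
    y_hat = x_dq @ w_dq
    grad_out = 2.0 * (y_hat - y) / x.shape[0]
    # The gradient is computed from the dequantized weights and activations ...
    grad_w = x_dq.T @ grad_out
    # ... but the update is applied to the original float weights.
    return w - lr * grad_w
```

Keeping the update on the original float weights means small gradient steps accumulate across iterations instead of being rounded away by the quantization grid.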

kuke · 2018-05-25T09:07:32Z

doc/fluid/design/quantization/fixed_point_quantization.md

+
+So the quantization transipler will change some inputs of the corresponding backward operators. 
+
+### How to calculate quantization scale


scale or range, should they be consistent?

qingqing01 · 2018-05-28T06:44:13Z

Use scale; change the description above.

kuke · 2018-05-25T09:11:08Z

doc/fluid/design/quantization/fixed_point_quantization.md

+
+### How to calculate quantization scale
+
+There are two strategies to calculate quantization scale, we call them dynamic and static strategy. The dynamic strategy is to calculate the quantization scale value each iteration. The static strategy is to fix the quantization scale for different inputs.


is to calculate --> calculates
is to fix the quantization scale --> keeps the same quantization scale
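
A minimal sketch of the difference between the two strategies, assuming the scale is the maximum absolute value of the tensor (the class names are hypothetical):

```python
import numpy as np


class DynamicScale:
    """Recomputes the scale from the current tensor on every call."""

    def __call__(self, x):
        return float(np.max(np.abs(x)))


class StaticScale:
    """Returns the same, previously fixed scale regardless of the input."""

    def __init__(self, scale):
        self.scale = float(scale)

    def __call__(self, x):
        return self.scale
```

For two batches with different value ranges, `DynamicScale` returns two different scales, while `StaticScale(2.0)` returns `2.0` for both.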

kuke · 2018-05-25T09:12:18Z

doc/fluid/design/quantization/fixed_point_quantization.md

+
+There are two strategies to calculate quantization scale, we call them dynamic and static strategy. The dynamic strategy is to calculate the quantization scale value each iteration. The static strategy is to fix the quantization scale for different inputs.
+
+For weights, we apply the dynamic strategy for weights in the training, that is to say, the quantization scale will recalculate during each iteration until the traning is finished.


remove for weights
will recalculate --> will be recalculated

kuke · 2018-05-25T09:15:22Z

doc/fluid/design/quantization/fixed_point_quantization.md

+
+For weights, we apply the dynamic strategy for weights in the training, that is to say, the quantization scale will recalculate during each iteration until the traning is finished.
+
+For activations, the quantization scales are estimated during training, then use them in inference. There are several different ways to estimat:


then use them --> then used
estimat --> estimate them
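
The list of estimation methods is cut off in this hunk, so the following is only one plausible option, shown as an assumption: an exponential moving average of the per-batch maximum absolute value, updated during training and then frozen for inference. The class name and the `decay` parameter are hypothetical.

```python
import numpy as np


class MovingAverageScale:
    """Tracks a running estimate of an activation's max-abs scale."""

    def __init__(self, decay=0.9):
        self.decay = decay
        self.scale = None  # no estimate until the first batch is seen

    def update(self, x):
        """Call once per training iteration with the activation tensor."""
        current = float(np.max(np.abs(x)))
        if self.scale is None:
            self.scale = current
        else:
            self.scale = self.decay * self.scale + (1.0 - self.decay) * current
        return self.scale
```

At inference time the stored `scale` would be read without calling `update`, so every input is quantized with the same value.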

panyx0718 · 2018-05-27T13:39:48Z

doc/fluid/design/quantization/fixed_point_quantization.md

+
+</p>
+
+We use this equivalent workflow in the training. In our desigin, there is a quantization transipler to insert the quantization operator and the de-quantization operator in the Fluid `ProgramDesc`.


transipler -> transpiler

panyx0718 · 2018-05-27T13:43:52Z

doc/fluid/design/quantization/fixed_point_quantization.md

+
+</p>
+
+So the quantization transipler will change some inputs of the corresponding backward operators. 


It seems the current plan is to insert quant op first, generate backward, then change backward ops.
What if we first generate backward, then insert quant ops in forward? Then we don't need to update backward ops?

qingqing01 · 2018-05-28T06:43:59Z

> What if we first generate backward, then insert quant ops in forward?

Yes, the implementation in https://github.com/PaddlePaddle/Paddle/pull/10693/files works this way. Usage looks like this:

    import paddle.fluid as fluid

    main = fluid.Program()
    startup = fluid.Program()
    with fluid.program_guard(main, startup):
        loss = network(3)  # network() is the user's model-building function
        opt = fluid.optimizer.Adam(learning_rate=0.001)
        opt.minimize(loss)  # the backward ops are generated here
    t = fluid.QuantizeTranspiler()
    t.transpile(main)  # the quantization ops are inserted afterwards

However, the backward pass needs to use the dequantized weights and activations (see Figure 3 in https://github.com/qingqing01/Paddle/blob/quantization_doc/doc/fluid/design/quantization/fixed_point_quantization.md), so we still need to rewrite the backward ops.
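
To make the point about rewriting backward ops concrete, here is a toy sketch of such a rewrite. It is not the `QuantizeTranspiler` from the PR: the program is modeled as a plain list of dicts, and the op types, the `.quantized` / `.dequantized` name suffixes, and the `QUANTIZABLE` set are all hypothetical.

```python
QUANTIZABLE = {"mul", "conv2d"}  # hypothetical set of op types to quantize


def transpile(ops):
    """Insert quantize/dequantize ops and redirect forward and backward inputs."""
    renamed = {}  # original variable name -> dequantized variable name
    result = []
    for op in ops:
        op = {"type": op["type"],
              "inputs": list(op["inputs"]),
              "outputs": list(op["outputs"])}
        is_grad = op["type"].endswith("_grad")
        if op["type"] in QUANTIZABLE:
            for i, name in enumerate(op["inputs"]):
                if name not in renamed:
                    q, dq = name + ".quantized", name + ".dequantized"
                    result.append({"type": "quantize", "inputs": [name], "outputs": [q]})
                    result.append({"type": "dequantize", "inputs": [q], "outputs": [dq]})
                    renamed[name] = dq
                op["inputs"][i] = renamed[name]
        elif is_grad and op["type"][:-len("_grad")] in QUANTIZABLE:
            # Backward ops must read the dequantized tensors as well.
            op["inputs"] = [renamed.get(n, n) for n in op["inputs"]]
        # Any other op (for example the optimizer) keeps the original weights.
        result.append(op)
    return result
```

Running it on a three-op program (`mul`, `mul_grad`, `sgd`) leaves the `sgd` op reading the original weight `w`, while both `mul` and `mul_grad` read `w.dequantized`. This holds even though the backward op already existed before the quantization ops were inserted.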

qingqing01

@kuke @panyx0718 Thanks for your detailed review. Thanks very much.

qingqing01 · 2018-07-10T03:57:15Z

@panyx0718 @kuke Is there any problem with this PR?

kuke

LGTM

* Design doc of fixed-point quantization.
* Update fixed point quantization desigin doc.
* Fix doc format.
* Update the backward part.
* Fix the grammatical.

qingqing01 added 2 commits May 9, 2018 13:29

Design doc of fixed-point quantization.

21c557a

Update fixed point quantization desigin doc.

40f6a18

qingqing01 force-pushed the quantization_doc branch 3 times, most recently from 9de2945 to 02508da Compare May 23, 2018 05:28

Fix doc format.

a79a36f

qingqing01 force-pushed the quantization_doc branch from 02508da to a79a36f Compare May 23, 2018 05:47

Update the backward part.

882e6f4

qingqing01 force-pushed the quantization_doc branch from ce4af0b to 882e6f4 Compare May 23, 2018 06:38

qingqing01 changed the title ~~[WIP] Design doc of fixed-point quantization.~~ Design doc of fixed-point quantization. May 23, 2018

qingqing01 requested review from panyx0718, kuke and wanghaoshuang May 23, 2018 06:41

kuke reviewed May 25, 2018

panyx0718 reviewed May 27, 2018

Fix the grammatical.

773e566

qingqing01 force-pushed the quantization_doc branch from d3d743f to 773e566 Compare May 28, 2018 07:10

qingqing01 commented May 28, 2018

kuke approved these changes Jul 10, 2018

qingqing01 merged commit 79d797f into PaddlePaddle:develop Jul 11, 2018

qingqing01 deleted the quantization_doc branch November 14, 2019 05:29
