Commit

Update the backward part.

qingqing01 committed May 23, 2018
1 parent a79a36f commit 882e6f4
Showing 4 changed files with 18 additions and 11 deletions.
29 changes: 18 additions & 11 deletions doc/fluid/design/quantization/fixed_point_quantization.md

### Training Framework

#### Forward pass

The forward pass is simulated quantization, see figure 1.


<p align="center">
<img src="quantization_training_framework.png" align="center"/><br/>
<img src="quantization_forward.png" width="300" height="340" /><br/>

Fig 1. Forward in training with simulated quantization.
</p>

- At first, both the input and the weight are quantized to 8 bits.
- Then, the multiplication (or convolution) operation is done with integers.
- Then, the multiplication (or convolution) result is dequantized to 32-bit floating point, as in the sketch below.
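To make the three steps concrete, here is a minimal NumPy sketch of the simulated-quantization forward pass. It assumes max-abs quantization where $n = 2^{k-1}$ for $k$-bit quantization and $X_m$, $W_m$ are the maximum absolute values, matching the formulas in this document; the helper `quantize` and the tensor shapes are illustrative, not part of Fluid.

```python
import numpy as np

def quantize(x, n):
    # Max-abs quantization: map [-x_m, x_m] linearly onto integers in [-(n-1), n-1].
    x_m = np.max(np.abs(x))
    x_q = np.round(x / x_m * (n - 1)).astype(np.int32)
    return x_q, x_m

n = 2 ** (8 - 1)  # 8-bit quantization
X = np.random.randn(4, 8).astype(np.float32)  # activation
W = np.random.randn(8, 2).astype(np.float32)  # weight

# Step 1: quantize both input and weight to 8 bits.
X_q, X_m = quantize(X, n)
W_q, W_m = quantize(W, n)

# Step 2: do the multiplication with integers.
Y = X_q @ W_q

# Step 3: dequantize the result back to 32-bit floating point.
Y_dq = Y.astype(np.float32) / ((n - 1) * (n - 1)) * X_m * W_m
```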
Dequantize $Y$:
$$
\begin{align}
Y_{dq} &=\frac{Y}{(n - 1) * (n - 1)} * X_m * W_m \\\
&=\frac{X_q * W_q}{(n - 1) * (n - 1)} * X_m * W_m \\\
&=(\frac{X_q}{n - 1} * X_m) * (\frac{W_q}{n - 1} * W_m)
\end{align}
$$

These formulas show that the dequantization can also be moved before the GEMM: dequantize $X_q$ and $W_q$ first, then do the GEMM in floating point. The forward workflow in training is therefore equivalent to the following framework.

<p align="center">
<img src="quantization_forward.png" width="300" height="300" /><br/>
<img src="quantization_forward.png" width="300" height="330" /><br/>

Fig 2. Equivalent forward in training with simulated quantization.

</p>
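Continuing the NumPy sketch above, the equivalence can be checked directly; the tolerance is only to absorb floating-point rounding:

```python
# Dequantize X_q and W_q first, then do the GEMM in float.
X_dq = X_q.astype(np.float32) / (n - 1) * X_m
W_dq = W_q.astype(np.float32) / (n - 1) * W_m
Y_dq2 = X_dq @ W_dq

# Same result as integer GEMM followed by dequantization.
assert np.allclose(Y_dq, Y_dq2, rtol=1e-5)
```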

We use this equivalent workflow in the training. In our design, a quantization transpiler inserts the quantization operators and the de-quantization operators into the Fluid `ProgramDesc`.
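As an illustration only, the transpiler's rewrite can be sketched as a pass over an operator list; the dict-based program representation and the op names `quantize`/`dequantize` are assumptions for the sketch, not the actual Fluid `ProgramDesc` API.

```python
QUANTIZABLE_OPS = {'mul', 'conv2d'}

def transpile(ops):
    # Insert a quantize op before, and a dequantize op after, each
    # quantizable operator, rewiring its inputs to the quantized tensors.
    new_ops = []
    for op in ops:
        if op['type'] not in QUANTIZABLE_OPS:
            new_ops.append(op)
            continue
        quantized_inputs = []
        for name in op['inputs']:
            new_ops.append({'type': 'quantize',
                            'inputs': [name],
                            'outputs': [name + '.quantized']})
            quantized_inputs.append(name + '.quantized')
        out = op['outputs'][0]
        new_ops.append({'type': op['type'],
                        'inputs': quantized_inputs,
                        'outputs': [out + '.int']})
        new_ops.append({'type': 'dequantize',
                        'inputs': [out + '.int'],
                        'outputs': [out]})
    return new_ops

program = [{'type': 'mul', 'inputs': ['X', 'W'], 'outputs': ['Y']},
           {'type': 'relu', 'inputs': ['Y'], 'outputs': ['Z']}]
print(transpile(program))  # relu is left untouched; mul is wrapped.
```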

#### Backward pass

See figure 3. The gradients are calculated from the dequantized weights and activations; all inputs and outputs of the backward operators are 32-bit floating point. In the weight-updating step, the gradients are added to the original full-precision weights, not to the quantized or dequantized weights (see the sketch after figure 3).

<p align="center">
<img src="quantization_backward_and_optimization.png" /><br/>

Fig 3. Backward and weight updating in training with simulated quantization.

</p>
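Continuing the NumPy sketch, one training step under these rules might look as follows; the learning rate and the upstream gradient are made up for illustration:

```python
lr = 0.01
grad_Y = np.random.randn(*Y_dq.shape).astype(np.float32)  # pretend upstream gradient

# Backward is computed with the dequantized tensors, all in float32.
grad_W = X_dq.T @ grad_Y
grad_X = grad_Y @ W_dq.T

# The update is applied to the original full-precision weight, not to
# W_q or W_dq; the weight is re-quantized on the next forward pass.
W -= lr * grad_W
```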

So the quantization transpiler will change some inputs of the corresponding backward operators, feeding them the dequantized weights and activations instead of the original ones.

### How to calculate quantization scale

Binary file modified doc/fluid/design/quantization/quantization_forward.png
