Merge branch 'main' into opt-to-upstream
gramalingam committed Dec 5, 2023
2 parents e1f896d + 7e0c267 commit e66fe8d
Showing 114 changed files with 1,413 additions and 526 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/pages.yml
@@ -53,4 +53,4 @@ jobs:
path: 'docs/docsgen/build/html'
- name: Deploy to GitHub Pages
id: deployment
uses: actions/deploy-pages@9dbe3824824f8a1377b8e298bafde1a50ede43e5 # v2.0.4
uses: actions/deploy-pages@de14547edc9944350dc0481aa5b7afb08e75f254 # v2.0.5
59 changes: 59 additions & 0 deletions docs/Changelog.md
@@ -24582,6 +24582,65 @@ This version of the operator has been available since version 21 of the default
<dd>'x_scale' determines the output type.</dd>
</dl>

### <a name="QLinearMatMul-21"></a>**QLinearMatMul-21**</a>

Matrix product that behaves like numpy.matmul: https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.matmul.html.
It consumes two quantized input tensors, their scales and zero points, and the scale and zero point of the output,
and computes the quantized output. The quantization formula is y = saturate((x / y_scale) + y_zero_point),
where (x / y_scale) is rounded to the nearest value, ties to even. Refer to https://en.wikipedia.org/wiki/Rounding for details.
Scale and zero point must have the same shape. They must be either a scalar (per-tensor quantization) or an N-D tensor
(per-row quantization for 'a', per-column quantization for 'b'). If the input is 2-D with shape [M, K], then the zero point
and scale tensors may be an M-element vector [v_1, v_2, ..., v_M] for per-row quantization or a K-element vector
[v_1, v_2, ..., v_K] for per-column quantization. If the input is an N-D tensor with shape [D1, D2, M, K], then the zero point
and scale tensors may have shape [D1, D2, M, 1] for per-row quantization or shape [D1, D2, 1, K] for per-column quantization.
The per-element products must never overflow, and the accumulation may overflow only if it is performed in 32 bits.
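
As an illustration of the formula above, the computation can be viewed as dequantize, matmul, then requantize. Below is a minimal NumPy sketch of the per-tensor uint8 case; the helper name `qlinear_matmul_ref` is ours, and this is not the normative ONNX reference implementation (real kernels typically avoid the float32 round trip).

```python
import numpy as np

def qlinear_matmul_ref(a, a_scale, a_zero_point,
                       b, b_scale, b_zero_point,
                       y_scale, y_zero_point):
    # Dequantize both inputs. Per-row / per-column scales broadcast here
    # too, once they are given a trailing or leading unit dimension.
    a_fp = (a.astype(np.float32) - a_zero_point) * a_scale
    b_fp = (b.astype(np.float32) - b_zero_point) * b_scale
    # Real-valued matrix product, then requantize: round (x / y_scale) to
    # nearest, ties to even, add the zero point, saturate to uint8.
    y = np.rint((a_fp @ b_fp) / y_scale) + y_zero_point
    return np.clip(y, 0, 255).astype(np.uint8)

# Per-tensor (scalar) scales and zero points, reusing the 2-D uint8 test data.
a = np.array([[208, 236, 0, 238], [3, 214, 255, 29]], dtype=np.uint8)
b = np.array([[152, 51, 244], [60, 26, 255],
              [0, 127, 246], [127, 254, 247]], dtype=np.uint8)
print(qlinear_matmul_ref(a, 0.0066, 113, b, 0.00705, 114, 0.0107, 118))
# -> [[168 115 255]
#     [  1  66 151]]   (the expected 2-D uint8 output in the test below)
```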

#### Version

This version of the operator has been available since version 21 of the default ONNX operator set.

#### Inputs

<dl>
<dt><tt>a</tt> (non-differentiable) : T1</dt>
<dd>N-dimensional quantized matrix a</dd>
<dt><tt>a_scale</tt> (non-differentiable) : TS</dt>
<dd>scale of quantized input a</dd>
<dt><tt>a_zero_point</tt> (non-differentiable) : T1</dt>
<dd>zero point of quantized input a</dd>
<dt><tt>b</tt> (non-differentiable) : T2</dt>
<dd>N-dimensional quantized matrix b</dd>
<dt><tt>b_scale</tt> (non-differentiable) : TS</dt>
<dd>scale of quantized input b</dd>
<dt><tt>b_zero_point</tt> (non-differentiable) : T2</dt>
<dd>zero point of quantized input b</dd>
<dt><tt>y_scale</tt> (non-differentiable) : TS</dt>
<dd>scale of quantized output y</dd>
<dt><tt>y_zero_point</tt> (non-differentiable) : T3</dt>
<dd>zero point of quantized output y</dd>
</dl>

#### Outputs

<dl>
<dt><tt>y</tt> (non-differentiable) : T3</dt>
<dd>Quantized matrix multiply results from a * b</dd>
</dl>

#### Type Constraints

<dl>
<dt><tt>TS</tt> : tensor(float), tensor(float16), tensor(bfloat16)</dt>
<dd>Constrain scales.</dd>
<dt><tt>T1</tt> : tensor(int8), tensor(uint8), tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz)</dt>
<dd>The type of input a and its zero point.</dd>
<dt><tt>T2</tt> : tensor(int8), tensor(uint8), tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz)</dt>
<dd>The type of input b and its zero point.</dd>
<dt><tt>T3</tt> : tensor(int8), tensor(uint8), tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz)</dt>
<dd>The type of the output and its zero point.</dd>
</dl>
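
With opset 21 the scales are constrained by TS rather than fixed to tensor(float), so float16 and bfloat16 scales become representable. A minimal sketch of declaring such a node follows; the graph name, tensor names, and shapes are our own, and it assumes a sufficiently recent onnx release (one whose default domain includes opset 21).

```python
import onnx
from onnx import TensorProto, helper

node = helper.make_node(
    "QLinearMatMul",
    inputs=["a", "a_scale", "a_zero_point",
            "b", "b_scale", "b_zero_point",
            "y_scale", "y_zero_point"],
    outputs=["y"],
)

graph = helper.make_graph(
    [node],
    "qlinearmatmul_fp16_scales",  # illustrative graph name
    inputs=[
        helper.make_tensor_value_info("a", TensorProto.UINT8, [2, 4]),
        helper.make_tensor_value_info("a_scale", TensorProto.FLOAT16, [1]),
        helper.make_tensor_value_info("a_zero_point", TensorProto.UINT8, [1]),
        helper.make_tensor_value_info("b", TensorProto.UINT8, [4, 3]),
        helper.make_tensor_value_info("b_scale", TensorProto.FLOAT16, [1]),
        helper.make_tensor_value_info("b_zero_point", TensorProto.UINT8, [1]),
        helper.make_tensor_value_info("y_scale", TensorProto.FLOAT16, [1]),
        helper.make_tensor_value_info("y_zero_point", TensorProto.UINT8, [1]),
    ],
    outputs=[helper.make_tensor_value_info("y", TensorProto.UINT8, [2, 3])],
)

# Declaring opset 21 is what permits the non-float32 scales.
model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 21)])
onnx.checker.check_model(model)
```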

### <a name="QuantizeLinear-21"></a>**QuantizeLinear-21**</a>

The linear quantization operator. It consumes a high precision tensor, a scale, and a zero point to compute the low precision / quantized tensor.
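
For the default uint8 output the computation amounts to y = saturate(round(x / y_scale) + y_zero_point), with round-half-to-even. Below is a minimal NumPy sketch; the helper name `quantize_linear_ref` is ours, and per-axis, blocked, and non-uint8 modes are omitted.

```python
import numpy as np

def quantize_linear_ref(x, y_scale, y_zero_point):
    # Round x / y_scale half to even, shift by the zero point,
    # then saturate to the uint8 range.
    y = np.rint(x / y_scale) + y_zero_point
    return np.clip(y, 0, 255).astype(np.uint8)

x = np.array([0.0, 2.0, 3.0, 1000.0, -254.0, -1000.0], dtype=np.float32)
print(quantize_linear_ref(x, y_scale=2.0, y_zero_point=128))
# [128 129 130 255   1   0]
```
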
250 changes: 144 additions & 106 deletions docs/Operators.md
@@ -106,7 +106,7 @@ For an operator input/output's differentiability, it can be differentiable,
|<a href="#Pad">Pad</a>|<a href="Changelog.md#Pad-19">19</a>, <a href="Changelog.md#Pad-18">18</a>, <a href="Changelog.md#Pad-13">13</a>, <a href="Changelog.md#Pad-11">11</a>, <a href="Changelog.md#Pad-2">2</a>, <a href="Changelog.md#Pad-1">1</a>|
|<a href="#Pow">Pow</a>|<a href="Changelog.md#Pow-15">15</a>, <a href="Changelog.md#Pow-13">13</a>, <a href="Changelog.md#Pow-12">12</a>, <a href="Changelog.md#Pow-7">7</a>, <a href="Changelog.md#Pow-1">1</a>|
|<a href="#QLinearConv">QLinearConv</a>|<a href="Changelog.md#QLinearConv-10">10</a>|
|<a href="#QLinearMatMul">QLinearMatMul</a>|<a href="Changelog.md#QLinearMatMul-10">10</a>|
|<a href="#QLinearMatMul">QLinearMatMul</a>|<a href="Changelog.md#QLinearMatMul-21">21</a>, <a href="Changelog.md#QLinearMatMul-10">10</a>|
|<a href="#QuantizeLinear">QuantizeLinear</a>|<a href="Changelog.md#QuantizeLinear-21">21</a>, <a href="Changelog.md#QuantizeLinear-19">19</a>, <a href="Changelog.md#QuantizeLinear-13">13</a>, <a href="Changelog.md#QuantizeLinear-10">10</a>|
|<a href="#RNN">RNN</a>|<a href="Changelog.md#RNN-14">14</a>, <a href="Changelog.md#RNN-7">7</a>, <a href="Changelog.md#RNN-1">1</a>|
|<a href="#RandomNormal">RandomNormal</a>|<a href="Changelog.md#RandomNormal-1">1</a>|
@@ -19770,24 +19770,26 @@ expect(

#### Version

This version of the operator has been available since version 10 of the default ONNX operator set.
This version of the operator has been available since version 21 of the default ONNX operator set.

Other versions of this operator: <a href="Changelog.md#QLinearMatMul-10">10</a>

#### Inputs

<dl>
<dt><tt>a</tt> (non-differentiable) : T1</dt>
<dd>N-dimensional quantized matrix a</dd>
<dt><tt>a_scale</tt> (non-differentiable) : tensor(float)</dt>
<dt><tt>a_scale</tt> (non-differentiable) : TS</dt>
<dd>scale of quantized input a</dd>
<dt><tt>a_zero_point</tt> (non-differentiable) : T1</dt>
<dd>zero point of quantized input a</dd>
<dt><tt>b</tt> (non-differentiable) : T2</dt>
<dd>N-dimensional quantized matrix b</dd>
<dt><tt>b_scale</tt> (non-differentiable) : tensor(float)</dt>
<dt><tt>b_scale</tt> (non-differentiable) : TS</dt>
<dd>scale of quantized input b</dd>
<dt><tt>b_zero_point</tt> (non-differentiable) : T2</dt>
<dd>zero point of quantized input b</dd>
<dt><tt>y_scale</tt> (non-differentiable) : tensor(float)</dt>
<dt><tt>y_scale</tt> (non-differentiable) : TS</dt>
<dd>scale of quantized output y</dd>
<dt><tt>y_zero_point</tt> (non-differentiable) : T3</dt>
<dd>zero point of quantized output y</dd>
@@ -19803,129 +19805,165 @@ This version of the operator has been available since version 10 of the default
#### Type Constraints

<dl>
<dt><tt>T1</tt> : tensor(int8), tensor(uint8)</dt>
<dd>Constrain input a and its zero point data type to 8-bit integer tensor.</dd>
<dt><tt>T2</tt> : tensor(int8), tensor(uint8)</dt>
<dd>Constrain input b and its zero point data type to 8-bit integer tensor.</dd>
<dt><tt>T3</tt> : tensor(int8), tensor(uint8)</dt>
<dd>Constrain output y and its zero point data type to 8-bit integer tensor.</dd>
<dt><tt>TS</tt> : tensor(float), tensor(float16), tensor(bfloat16)</dt>
<dd>Constrain scales.</dd>
<dt><tt>T1</tt> : tensor(int8), tensor(uint8), tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz)</dt>
<dd>The type of input a and its zero point.</dd>
<dt><tt>T2</tt> : tensor(int8), tensor(uint8), tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz)</dt>
<dd>The type of input b and its zero point.</dd>
<dt><tt>T3</tt> : tensor(int8), tensor(uint8), tensor(float8e4m3fn), tensor(float8e4m3fnuz), tensor(float8e5m2), tensor(float8e5m2fnuz)</dt>
<dd>The type of the output and its zero point.</dd>
</dl>


#### Examples

<details>
<summary>qlinearmatmul</summary>
<summary>int</summary>

```python
node = onnx.helper.make_node(
"QLinearMatMul",
inputs=[
"a",
"a_scale",
"a_zero_point",
"b",
"b_scale",
"b_zero_point",
"y_scale",
"y_zero_point",
],
outputs=["y"],
)
for quant_type_name in ["uint8", "int8"]:
quant_type = getattr(np, quant_type_name)
for dtype_name in ["float32", "float16"]:
dtype = getattr(np, dtype_name)
node = onnx.helper.make_node(
"QLinearMatMul",
inputs=[
"a",
"a_scale",
"a_zero_point",
"b",
"b_scale",
"b_zero_point",
"y_scale",
"y_zero_point",
],
outputs=["y"],
)

# 2D
a = np.array(
[
[208, 236, 0, 238],
[3, 214, 255, 29],
],
dtype=np.uint8,
)
# 2D
a = np.array([[208, 236, 0, 238], [3, 214, 255, 29]])
if quant_type == np.int8:
a -= 127
a = a.astype(quant_type)

a_scale = np.array([0.0066], dtype=np.float32)
a_zero_point = np.array([113], dtype=np.uint8)
a_scale = np.array([0.0066], dtype=dtype)
a_zero_point = np.array(
[113 - 127] if quant_type == np.int8 else [113], dtype=quant_type
)

b = np.array(
[[152, 51, 244], [60, 26, 255], [0, 127, 246], [127, 254, 247]],
dtype=np.uint8,
)
b = np.array(
[[152, 51, 244], [60, 26, 255], [0, 127, 246], [127, 254, 247]]
)
if quant_type == np.int8:
b -= 127
b = b.astype(quant_type)

b_scale = np.array([0.00705], dtype=np.float32)
b_zero_point = np.array([114], dtype=np.uint8)
b_scale = np.array([0.00705], dtype=dtype)
b_zero_point = np.array(
[114 - 127] if quant_type == np.int8 else [114], dtype=quant_type
)

y_scale = np.array([0.0107], dtype=np.float32)
y_zero_point = np.array([118], dtype=np.uint8)
y_scale = np.array([0.0107], dtype=dtype)
y_zero_point = np.array(
[118 - 127] if quant_type == np.int8 else [118], dtype=quant_type
)

output = np.array(
[
[168, 115, 255],
[1, 66, 151],
],
dtype=np.uint8,
)
if quant_type == np.int8:
output = np.array([[41, -12, -9], [1, -75, 20]])
else:
output = np.array([[168, 115, 255], [1, 66, 151]])
output = output.astype(quant_type)

expect(
node,
inputs=[
a,
a_scale,
a_zero_point,
b,
b_scale,
b_zero_point,
y_scale,
y_zero_point,
],
outputs=[output],
name="test_qlinearmatmul_2D",
)
expect(
node,
inputs=[
a,
a_scale,
a_zero_point,
b,
b_scale,
b_zero_point,
y_scale,
y_zero_point,
],
outputs=[output],
name=f"test_qlinearmatmul_2D_{quant_type_name}_{dtype_name}",
)

# 3D
a = np.array(
[
[[208, 236, 0, 238], [3, 214, 255, 29]],
[[208, 236, 0, 238], [3, 214, 255, 29]],
],
dtype=np.uint8,
)
# 3D
a = np.array(
[
[[208, 236, 0, 238], [3, 214, 255, 29]],
[[208, 236, 0, 238], [3, 214, 255, 29]],
],
)
if quant_type == np.int8:
a -= 127
a = a.astype(quant_type)

a_scale = np.array([0.0066], dtype=np.float32)
a_zero_point = np.array([113], dtype=np.uint8)
a_scale = np.array([0.0066], dtype=dtype)
a_zero_point = np.array(
[113 - 127] if quant_type == np.int8 else [113], dtype=quant_type
)

b = np.array(
[
[[152, 51, 244], [60, 26, 255], [0, 127, 246], [127, 254, 247]],
[[152, 51, 244], [60, 26, 255], [0, 127, 246], [127, 254, 247]],
],
dtype=np.uint8,
)
b = np.array(
[
[[152, 51, 244], [60, 26, 255], [0, 127, 246], [127, 254, 247]],
[[152, 51, 244], [60, 26, 255], [0, 127, 246], [127, 254, 247]],
],
)
if quant_type == np.int8:
b -= 127
b = b.astype(quant_type)

b_scale = np.array([0.00705], dtype=np.float32)
b_zero_point = np.array([114], dtype=np.uint8)
b_scale = np.array([0.00705], dtype=dtype)
b_zero_point = np.array([114], dtype=quant_type)

y_scale = np.array([0.0107], dtype=np.float32)
y_zero_point = np.array([118], dtype=np.uint8)
y_scale = np.array([0.0107], dtype=dtype)
y_zero_point = np.array(
[118 - 127] if quant_type == np.int8 else [118], dtype=quant_type
)

output = np.array(
[[[168, 115, 255], [1, 66, 151]], [[168, 115, 255], [1, 66, 151]]],
dtype=np.uint8,
)
if quant_type == np.int8:
if dtype == np.float32:
output = np.array(
[
[[-86, 117, 120], [115, 39, -121]],
[[-86, 117, 120], [115, 39, -121]],
]
)
else:
output = np.array(
[
[[-86, 116, 119], [115, 39, -121]],
[[-86, 116, 119], [115, 39, -121]],
]
)
else:
output = np.array(
[
[[168, 115, 255], [1, 66, 151]],
[[168, 115, 255], [1, 66, 151]],
]
)
output = output.astype(quant_type)

expect(
node,
inputs=[
a,
a_scale,
a_zero_point,
b,
b_scale,
b_zero_point,
y_scale,
y_zero_point,
],
outputs=[output],
name="test_qlinearmatmul_3D",
)
expect(
node,
inputs=[
a,
a_scale,
a_zero_point,
b,
b_scale,
b_zero_point,
y_scale,
y_zero_point,
],
outputs=[output],
name=f"test_qlinearmatmul_3D_{quant_type_name}_{dtype_name}",
)
```

</details>
