UID | title | description | helpviewer_keywords | old-location | tech.root | ms.assetid | ms.date | req.header | req.include-header | req.target-type | req.target-min-winverclnt | req.target-min-winversvr | req.kmdf-ver | req.umdf-ver | req.ddi-compliance | req.unicode-ansi | req.idl | req.max-support | req.namespace | req.assembly | req.type-library | req.lib | req.dll | req.irql | targetos | req.typenames | req.redist | ms.custom | f1_keywords | dev_langs | topic_type | api_type | api_location | api_name | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
NS:directml.DML_GEMM_OPERATOR_DESC |
DML_GEMM_OPERATOR_DESC |
Performs a general matrix multiplication function of the form `Output = FusedActivation(Alpha * TransA(A) x TransB(B) + Beta * C)`, where `x` denotes matrix multiplication, and `*` denotes multiplication with a scalar. |
|
direct3d12\dml_gemm_operator_desc.htm |
directml |
11482420-678E-4914-90F0-9F952BC09FF7 |
12/01/2022 |
directml.h |
Windows |
Windows |
19H1 |
|
|
|
|
|
|
Performs a general matrix multiplication function of the form `Output = FusedActivation(Alpha * TransA(A) x TransB(B) + Beta * C)`, where `x` denotes matrix multiplication, and `*` denotes multiplication with a scalar.
This operator requires 4D tensors with layout `{ BatchCount, ChannelCount, Height, Width }`, and it performs BatchCount * ChannelCount independent matrix multiplications.
For example, if ATensor has Sizes of `{ BatchCount, ChannelCount, M, K }`, BTensor has Sizes of `{ BatchCount, ChannelCount, K, N }`, and OutputTensor has Sizes of `{ BatchCount, ChannelCount, M, N }`, then this operator performs BatchCount * ChannelCount independent matrix multiplications of dimensions {M,K} x {K,N} = {M,N}.
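The batched behavior described above can be sketched as a plain CPU reference, which may be handy for checking results. This is a minimal sketch under stated assumptions, not DirectML code: `ReferenceGemm` and `MatrixTransform` are hypothetical names, the buffers are assumed to be densely packed and row-major, and the fused activation is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for DML_MATRIX_TRANSFORM so the sketch compiles without DirectML.h.
enum class MatrixTransform { None, Transpose };

// CPU reference for Output = Alpha * TransA(A) x TransB(B) + Beta * C (no fused activation).
// Assumes densely packed, row-major buffers with logical layout { batch, channel, rows, cols }.
// An empty `c` mirrors passing nullptr for CTensor and is treated as all zeros.
std::vector<float> ReferenceGemm(
    const std::vector<float>& a, const std::vector<float>& b, const std::vector<float>& c,
    std::size_t batchCount, std::size_t channelCount,
    std::size_t m, std::size_t k, std::size_t n,
    MatrixTransform transA, MatrixTransform transB,
    float alpha, float beta)
{
    const std::size_t matrixCount = batchCount * channelCount;
    assert(a.size() == matrixCount * m * k);
    assert(b.size() == matrixCount * k * n);
    assert(c.empty() || c.size() == matrixCount * m * n);

    std::vector<float> output(matrixCount * m * n, 0.0f);
    for (std::size_t mat = 0; mat < matrixCount; ++mat) {
        const float* aMat = a.data() + mat * m * k;
        const float* bMat = b.data() + mat * k * n;
        for (std::size_t i = 0; i < m; ++i) {
            for (std::size_t j = 0; j < n; ++j) {
                float sum = 0.0f;
                for (std::size_t p = 0; p < k; ++p) {
                    // A is stored as {M,K} when untransformed and {K,M} when transposed.
                    const float aVal = (transA == MatrixTransform::None) ? aMat[i * k + p]
                                                                         : aMat[p * m + i];
                    // B is stored as {K,N} when untransformed and {N,K} when transposed.
                    const float bVal = (transB == MatrixTransform::None) ? bMat[p * n + j]
                                                                         : bMat[j * k + p];
                    sum += aVal * bVal;
                }
                const std::size_t outIndex = mat * m * n + i * n + j;
                const float cVal = c.empty() ? 0.0f : c[outIndex];
                output[outIndex] = alpha * sum + beta * cVal;
            }
        }
    }
    return output;
}
```

Each of the BatchCount * ChannelCount matrices is processed independently, so the outer loop over `mat` could run in any order.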
`ATensor`
Type: const DML_TENSOR_DESC*
A tensor containing the A matrix. This tensor's Sizes should be `{ BatchCount, ChannelCount, M, K }` if TransA is DML_MATRIX_TRANSFORM_NONE, or `{ BatchCount, ChannelCount, K, M }` if TransA is DML_MATRIX_TRANSFORM_TRANSPOSE.
`BTensor`
Type: const DML_TENSOR_DESC*
A tensor containing the B matrix. This tensor's Sizes should be `{ BatchCount, ChannelCount, K, N }` if TransB is DML_MATRIX_TRANSFORM_NONE, or `{ BatchCount, ChannelCount, N, K }` if TransB is DML_MATRIX_TRANSFORM_TRANSPOSE.
`CTensor`
Type: _Maybenull_ const DML_TENSOR_DESC*
A tensor containing the C matrix, or `nullptr`. Values default to 0 when not provided. If provided, this tensor's Sizes should be `{ BatchCount, ChannelCount, M, N }`.
`OutputTensor`
Type: const DML_TENSOR_DESC*
The tensor to write the results to. This tensor's Sizes are `{ BatchCount, ChannelCount, M, N }`.
`TransA`
Type: DML_MATRIX_TRANSFORM
The transform to be applied to ATensor; either a transpose, or no transform.
`TransB`
Type: DML_MATRIX_TRANSFORM
The transform to be applied to BTensor; either a transpose, or no transform.
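The way the two transforms change the expected Sizes of ATensor and BTensor can be sketched as a small helper that recovers M, K, and N from 4D shapes. The names `InferGemmDims`, `GemmDims`, and `MatrixTransform` are hypothetical and not part of DirectML, and the sketch assumes that the batch and channel sizes of the two inputs must match, as the shapes above suggest.

```cpp
#include <array>
#include <cstdint>

// Hypothetical stand-in for DML_MATRIX_TRANSFORM so the sketch compiles without DirectML.h.
enum class MatrixTransform { None, Transpose };

struct GemmDims {
    std::uint32_t m;
    std::uint32_t k;
    std::uint32_t n;
};

// Derives M, K, and N from 4D Sizes laid out as { BatchCount, ChannelCount, rows, cols }.
// Returns false when the batch or channel sizes differ, or when A and B disagree on K.
bool InferGemmDims(const std::array<std::uint32_t, 4>& aSizes,
                   const std::array<std::uint32_t, 4>& bSizes,
                   MatrixTransform transA, MatrixTransform transB,
                   GemmDims& dims)
{
    if (aSizes[0] != bSizes[0] || aSizes[1] != bSizes[1]) {
        return false;
    }
    const bool aTransposed = (transA == MatrixTransform::Transpose);
    const bool bTransposed = (transB == MatrixTransform::Transpose);

    // A is {M,K} untransformed and {K,M} transposed.
    const std::uint32_t m = aTransposed ? aSizes[3] : aSizes[2];
    const std::uint32_t kFromA = aTransposed ? aSizes[2] : aSizes[3];
    // B is {K,N} untransformed and {N,K} transposed.
    const std::uint32_t kFromB = bTransposed ? bSizes[3] : bSizes[2];
    const std::uint32_t n = bTransposed ? bSizes[2] : bSizes[3];

    if (kFromA != kFromB) {
        return false;
    }
    dims = GemmDims{m, kFromA, n};
    return true;
}
```

The resulting `{ BatchCount, ChannelCount, M, N }` is the shape that OutputTensor, and CTensor when provided, is expected to have.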
`Alpha`
Type: FLOAT
The value of the scalar multiplier for the product of inputs ATensor and BTensor.
`Beta`
Type: FLOAT
The value of the scalar multiplier for the optional input CTensor. If CTensor is not provided, then this value is ignored.
`FusedActivation`
Type: _Maybenull_ const DML_OPERATOR_DESC*
An optional fused activation layer to apply after the GEMM. For more info, see Using fused operators for improved performance.
This operator was introduced in `DML_FEATURE_LEVEL_1_0`.
- ATensor, BTensor, CTensor, and OutputTensor must have the same DataType and DimensionCount.
- CTensor and OutputTensor must have the same Sizes.
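The two constraints above can be checked before creating the operator. The following is a minimal sketch that uses simplified, hypothetical stand-in types (`TensorInfo`, `DataType`) rather than the real `DML_TENSOR_DESC`, with the length of `sizes` standing in for DimensionCount.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-ins for DML_TENSOR_DATA_TYPE and a tensor description.
enum class DataType { Float32, Float16 };

struct TensorInfo {
    DataType dataType;
    std::vector<std::uint32_t> sizes;  // sizes.size() plays the role of DimensionCount
};

// Checks the two listed constraints. `c` may be nullptr because CTensor is optional.
bool SatisfiesGemmConstraints(const TensorInfo& a, const TensorInfo& b,
                              const TensorInfo* c, const TensorInfo& output)
{
    // Same DataType and DimensionCount as the output tensor.
    const auto matchesOutput = [&output](const TensorInfo& t) {
        return t.dataType == output.dataType && t.sizes.size() == output.sizes.size();
    };
    if (!matchesOutput(a) || !matchesOutput(b)) {
        return false;
    }
    if (c != nullptr) {
        // CTensor must additionally have exactly the same Sizes as OutputTensor.
        if (!matchesOutput(*c) || c->sizes != output.sizes) {
            return false;
        }
    }
    return true;
}
```

This covers only the constraints listed here; it does not check that the M, K, and N extents of the inputs agree.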
DML_FEATURE_LEVEL_4_0 and above:

Tensor | Kind | Dimensions | Supported dimension counts | Supported data types |
---|---|---|---|---|
ATensor | Input | { [BatchCount], [ChannelCount], M, K } | 2 to 4 | FLOAT32, FLOAT16 |
BTensor | Input | { [BatchCount], [ChannelCount], K, N } | 2 to 4 | FLOAT32, FLOAT16 |
CTensor | Optional input | { [BatchCount], [ChannelCount], M, N } | 2 to 4 | FLOAT32, FLOAT16 |
OutputTensor | Output | { [BatchCount], [ChannelCount], M, N } | 2 to 4 | FLOAT32, FLOAT16 |

DML_FEATURE_LEVEL_1_0 and above:

Tensor | Kind | Dimensions | Supported dimension counts | Supported data types |
---|---|---|---|---|
ATensor | Input | { BatchCount, ChannelCount, M, K } | 4 | FLOAT32, FLOAT16 |
BTensor | Input | { BatchCount, ChannelCount, K, N } | 4 | FLOAT32, FLOAT16 |
CTensor | Optional input | { BatchCount, ChannelCount, M, N } | 4 | FLOAT32, FLOAT16 |
OutputTensor | Output | { BatchCount, ChannelCount, M, N } | 4 | FLOAT32, FLOAT16 |
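
Where only a dimension count of 4 is supported, a lower-rank shape such as `{ M, K }` can be described by right-aligning it and filling the leading dimensions with 1, giving `{ 1, 1, M, K }`. The helper below is a minimal sketch of that padding; `PadSizesTo4D` is a hypothetical name, not a DirectML function, and the sketch assumes the bracketed leading dimensions in the tables are the ones that may be omitted.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Right-aligns a 2D, 3D, or 4D shape into 4 dimensions, filling leading dimensions with 1.
std::array<std::uint32_t, 4> PadSizesTo4D(const std::vector<std::uint32_t>& sizes)
{
    assert(sizes.size() >= 2 && sizes.size() <= 4);
    std::array<std::uint32_t, 4> padded = {1, 1, 1, 1};
    const std::size_t offset = padded.size() - sizes.size();
    std::copy(sizes.begin(), sizes.end(), padded.begin() + offset);
    return padded;
}
```

For a densely packed tensor, adding leading dimensions of size 1 does not change how the elements are laid out in memory.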