# oneDNN

## Introduction

oneDNN is Intel's open-source, cross-platform performance library for deep learning; its documentation describes which primitives are supported. oneDNN has been integrated into DeepRec and is enabled by adding a compile option to the build command: `--config=mkl_threadpool` turns on oneDNN-accelerated operator computation. Adding the compile option `--config=opt` additionally enables `--copt=-march=native`, which further accelerates computation on CPUs that support AVX-512, such as Skylake, Cascade Lake, and Ice Lake.
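For reference, a build command along these lines turns both options on when compiling the DeepRec pip package. This is a sketch: the exact bazel target and any extra flags depend on your DeepRec checkout.

```bash
# Build DeepRec with oneDNN (Eigen threadpool variant) and native-CPU optimizations.
# Because --config=opt implies --copt=-march=native, build on the same CPU
# generation (e.g. an AVX-512 machine) that you will run on.
bazel build -c opt --config=opt --config=mkl_threadpool \
    //tensorflow/tools/pip_package:build_pip_package
```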

Tips: MKL-DNN was first renamed to DNNL and later to oneDNN. TensorFlow initially used MKL to accelerate operator computation; in subsequent versions oneDNN gradually took the place of MKL, but the MKL macro definitions were retained.

Macro definitions of oneDNN in DeepRec:

| Macro Definition | Values (**bold** = default) | Explanation |
| --- | --- | --- |
| `TF_MKL_PRIMITIVE_ONLY_FOR_RECO` | **1/true**, 0/false | 1: only replace the operators used in recommendation models with their oneDNN versions; 0: replace all operators that oneDNN supports. |
| `TF_MKL_OPTIMIZE_PRIMITIVE_MEMUSE` | **1/true**, 0/false | 1: reduce main-memory usage by releasing primitives; 0: do not release primitives. |
| `TF_DISABLE_MKL` | **0**, 1 | 0: enable MKL; 1: disable MKL. |
| `TF_MKL_NUM_INTRAOP` | Integer, e.g. 14; not set by default | Integer: the number of intra-op threads used by oneDNN; not set: at most the number of TF intra-op threads is used. |
| `ONEDNN_VERBOSE` | 0/1/2 | Verbosity level of the log output by oneDNN primitives. |
| `DNNL_MAX_CPU_ISA` | ALL, AVX512_CORE_AMX, AVX512_CORE_BF16, … | The highest ISA used by oneDNN (for oneDNN versions below 2.5.0). |
| `ONEDNN_MAX_CPU_ISA` | ALL, AVX512_CORE_AMX, AVX512_CORE_BF16, … | The highest ISA used by oneDNN (for oneDNN versions 2.5.0 and above). |
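These macros are read from the environment at run time, so they can be exported before launching the training or serving process. The sketch below is only illustrative; the values and the `train.py` script are placeholders, not recommendations:

```bash
# Illustrative settings; tune the values for your own model and hardware.
export TF_MKL_PRIMITIVE_ONLY_FOR_RECO=1     # only rewrite operators common in recommendation models
export TF_MKL_OPTIMIZE_PRIMITIVE_MEMUSE=1   # release primitives to reduce main-memory usage
export TF_MKL_NUM_INTRAOP=14                # number of intra-op threads used by oneDNN
export ONEDNN_VERBOSE=1                     # log every executed oneDNN primitive
export ONEDNN_MAX_CPU_ISA=AVX512_CORE_BF16  # cap the ISA (oneDNN >= 2.5.0)
python train.py                             # placeholder for your own entry script
```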

Primitives supported by oneDNN:

| Primitive | Available Types | Available Post Operations |
| --- | --- | --- |
| Matrix Multiplication | f32, bf16, f16, u8, s8 | Scale, Zero Point, Eltwise, Sum, Binary |
| Inner Product | f32, bf16, f16, u8, s8 | Scale, Eltwise, Sum, Binary |
| Layer Normalization | f32, bf16, f16 | / |
| Batch Normalization | f32, bf16, f16, s8 | Eltwise |
| Local Response Normalization (LRN) | f32, bf16, f16 | / |
| Binary (+, -, *, /, >, <, min, max, …) | f32, bf16, f16, u8, s8 | Scale, Eltwise, Sum, Binary |
| Eltwise (relu, gelu, tanh, linear, …) | f32, s32, bf16, f16, u8, s8 | Binary |
| PReLU | f32, s32, bf16, s8, u8 | / |
| Sum | f32, s32, bf16, f16, u8, s8 | / |
| Reduction | f32, bf16, u8, s8 | Eltwise, Sum, Binary |
| Softmax | f32, bf16, f16 | / |
| LogSoftmax | f32, bf16 | / |
| Reorder | f32, s32, bf16, f16, u8, s8 | Scale, Sum |
| Concat | f32, s32, bf16, f16, u8, s8 | / |
| Convolution | f32, bf16, f16, u8, s8 | Scale, Zero Point, Eltwise, Sum, Binary |
| Pooling | f32, s32, bf16, f16, u8, s8 | Binary |
| RNN (LSTM, GRU, Vanilla RNN, …) | f32, bf16, f16, u8, s8 | / |
| Resampling | f32, s32, bf16, f16, s8, u8 | Eltwise, Sum, Binary |
| Shuffle | f32, s32, bf16, s8, u8 | / |
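To confirm which of these primitives a model actually dispatches, the `ONEDNN_VERBOSE` macro from the table above can be enabled for a single run; each executed primitive is then logged with its kind, data types, and the ISA it used. The script name below is again a placeholder, and the exact log prefix depends on the oneDNN version:

```bash
# One-off run with oneDNN primitive logging; keep only the verbose lines
# (prefixed with onednn_verbose in recent versions, dnnl_verbose in older ones).
ONEDNN_VERBOSE=1 python train.py 2>&1 | grep -E 'onednn_verbose|dnnl_verbose'
```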