necla-ml/gen-dnn

oneAPI Deep Neural Network Library (oneDNN)

The default branch changed from "master" to ve-v1.4.

The old master branch is around v0.16 and is quite dated.

The new default branch is ve-v1.4; many reference implementations have been rewritten to use long vector length. The single most important technique is to vectorize the offset calculations of memory_desc_wrapper, since nc++ will not vectorize loops containing function calls at all, and nc++ scalar code is very slow.
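
A minimal sketch of that idea, assuming a plain dense nchw layout and invented names (the real memory_desc_wrapper also handles blocked and padded formats):

    #include <cstddef>
    // Slow on VE: calling mdw.off(n, c, h, w) inside the element loop is a
    // function call, so nc++ refuses to vectorize the loop at all.
    // Faster: precompute a batch of offsets with inlined strided arithmetic;
    // both this loop and the consumer loop that reads off[] can vectorize.
    static void offsets_nchw(std::size_t n, std::size_t c, std::size_t h,
                             std::size_t W,
                             std::size_t sn, std::size_t sc,
                             std::size_t sh, std::size_t sw,
                             std::size_t *off) {
        const std::size_t base = n * sn + c * sc + h * sh;
        for (std::size_t w = 0; w < W; ++w)   // call-free: vectorizable
            off[w] = base + w * sw;
    }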

VE Aurora port: v1.4

  • nc++ full build OK: examples, tests, and benchdnn targets all pass (except one test that is no longer possible with a 'vanilla' build)

  • nc++-3.0.27 work:

    • many layers now vectorize the offset calculations
    • some vectorized math functions now depart (as documented) from the expected +/-inf, NaN, and +/-0 behavior, despite efforts to manually correct many cases (see the first sketch after this list).
      • this is a documented side effect of vectorizing the math functions.
    • compilation is VERY sensitive to nominally no-op code changes.
      • ./build.sh -vdd (an -O3 debug build) now passes all tests without segfaults.
  • there was still quite a bit of debug machinery in the CMake files.

    • REMOVED
  • the last set of nc++ bugs were: (i) complicated '&&' expressions being misevaluated, and (ii) vector floating-point (VFCP) compares with NaN giving wrong results.

    • plus some segfaults, especially in the convolution backward-weights code
    • several other files need modified compiler options (found by hand: try options until a set skirts the bad code generation).
  • the reorder compile takes a long time.

    • breaking it apart into several src/cpu/ve/ files compiles much faster, BUT this introduces a bug -- the lengthy, monolithic compile seems to avoid segfaults. This suggests that the compiler's 'ipa' (interprocedural) optimization phase is introducing the bugs: with so many reorder functions in one translation unit, the compiler eventually gives up on ipa optimizations, and the library then avoids segfaults in many benchdnn tests. (So the monolithic "forever" compile is still used.)
  • older issues include workarounds for incorrect C++11 zero-initialization

  • many non-convolution reference impls have VE-specific optimizations.

    • some are based on the v0.16 versions, some use new approaches and tools.
      • e.g. manual loop splitting to hoist conditionals out of loops (see the second sketch after this list)
      • nc++ does not inline lambda functions nicely, leaving an "unvectorizable" function call --- manually inlining the lambda body is one way to improve vectorization.
      • threading and loop order are often adapted so that the "channels" loop is innermost, allowing long VE vector length.
      • typical speedups from vectorizing the offset calculations are 10--200x.
    • not yet hooked up to libvednn and libblas, as was done in v0.16
      • did some preliminary testing with libblas (./build.sh -Cvdd or so).
  • the current design can extend or replace files by adding them to a ve/ subdirectory when in-place modifications would be too ugly or too full of debug code.

  • still have to change the 'any' memory format to prefer (say) nchw, and retest.

  • convolution WIP:

    • several compiler workarounds are needed for col2im, im2col, and especially convolution backward-weights to avoid segfaults or wrong answers.
      • the code is quite fragile: nominally no-op changes can reintroduce buggy compilation, and behavior is quite sensitive to compile options.
    • vectorize offset calculations (for blocked memory formats)
    • pull in older speedups for im2col and col2im
  • convolution forward vectorization work:

    • split source files into src/cpu/ve/
    • offset and eltwise vectorizations
    • nc++-3.0.27 mods
      • several convolution loops were dumbed down to sidestep compiler optimizations that sometimes segfault.
      • build.sh -v, -vd, and -vdd pass all tests/examples/benchdnn (e.g. build.sh -vdTttttt)
        • i.e. -O3 [+ asserts [+ debug]]
      • ./build.sh -vdddd [-O0] may hit a compiler ICE (internal compiler error)
      • ./build.sh -v may also hit an nc++-3.0.31 compiler ICE (apparently not present with nc++-3.0.27)
      • reinstated gemm convolutions after focusing on ref:any testing & optimization
      • changed the minimum 'malloc' alignment to 8, in preparation for vednnx_convolution
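
First sketch: an illustration of the manual special-value corrections mentioned in the nc++-3.0.27 notes above. The exp example and names are assumptions, not code from this library -- the point is to let nc++ vectorize the bulk math and patch the IEEE special cases in a separate clean-up pass:

    #include <cmath>
    void vexp(const float *x, float *y, int n) {
        for (int i = 0; i < n; ++i)      // vectorized, but may mishandle NaN/inf
            y[i] = std::exp(x[i]);
        for (int i = 0; i < n; ++i) {    // clean-up pass for special inputs
            if (std::isnan(x[i])) y[i] = x[i];                         // NaN propagates
            else if (std::isinf(x[i])) y[i] = x[i] > 0.f ? x[i] : 0.f; // exp(-inf)=0
        }
    }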
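
Second sketch: a hypothetical illustration of the manual loop splitting mentioned above. The boundary conditional is hoisted out of the hot loop by splitting the index range, so nc++ gets a long, branch-free, call-free inner loop (the shift/padding scenario and names are made up):

    #include <algorithm>
    void gather_shifted(const float *src, int n_src,
                        float *dst, int n, int offset) {
        // Before: one loop with a per-element branch, which hurts VE:
        //   dst[i] = (0 <= i + offset && i + offset < n_src) ? src[i + offset] : 0.f;
        int lo = std::max(0, -offset);                      // first in-bounds index
        int hi = std::max(lo, std::min(n, n_src - offset)); // one past last in-bounds
        for (int i = 0; i < lo; ++i) dst[i] = 0.f;          // out-of-bounds prologue
        for (int i = lo; i < hi; ++i) dst[i] = src[i + offset]; // branch-free, vectorizable
        for (int i = hi; i < n; ++i) dst[i] = 0.f;          // out-of-bounds epilogue
    }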

Original README:

This software was previously known as Intel(R) Math Kernel Library for Deep Neural Networks (Intel(R) MKL-DNN) and Deep Neural Network Library (DNNL).

With the launch of oneAPI we changed the project name and repository location to be consistent with the rest of oneAPI libraries:

  • Short library name changed to oneDNN.
  • Repository moved from intel/mkl-dnn to oneapi-src/oneDNN. Existing links to the code and documentation will continue to work.

There are no changes to the API, environment variables, or build options planned at this point.

oneAPI Deep Neural Network Library (oneDNN) is an open-source cross-platform performance library of basic building blocks for deep learning applications. The library is optimized for Intel Architecture Processors and Intel Processor Graphics. Support for other architectures such as Arm* 64-bit Architecture (AArch64) is experimental. See the System Requirements section below.

oneDNN is intended for deep learning applications and framework developers interested in improving application performance on Intel CPUs and GPUs. Deep learning practitioners should use one of the applications enabled with oneDNN.
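
For orientation, a minimal sketch of typical use of these building blocks, assuming the v1.x C++ API (dnnl.hpp); the tensor shape and the choice of a ReLU primitive are arbitrary:

    #include "dnnl.hpp"
    // Create a CPU engine and stream, then run one ReLU (eltwise) primitive.
    int main() {
        using namespace dnnl;
        engine eng(engine::kind::cpu, 0);
        stream s(eng);
        memory::desc md({2, 16, 8, 8}, memory::data_type::f32,
                        memory::format_tag::nchw);
        memory src_mem(md, eng), dst_mem(md, eng);
        // (a real application would fill src_mem, e.g. via get_data_handle())
        auto relu_d = eltwise_forward::desc(prop_kind::forward_inference,
                algorithm::eltwise_relu, md, /*alpha=*/0.f, /*beta=*/0.f);
        auto relu_pd = eltwise_forward::primitive_desc(relu_d, eng);
        eltwise_forward(relu_pd).execute(s,
                {{DNNL_ARG_SRC, src_mem}, {DNNL_ARG_DST, dst_mem}});
        s.wait();
        return 0;
    }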

Documentation

  • Developer guide explains the programming model, supported functionality, and implementation details of the primitives, and includes annotated examples.
  • API reference provides comprehensive reference of the library API.

Installation

Pre-built binaries for Linux*, Windows*, and macOS* are available for download in the releases section. Package names use the following convention:

OS Package name
Linux dnnl_lnx_<version>_cpu_<cpu runtime>[_gpu_<gpu runtime>].tgz
Windows dnnl_win_<version>_cpu_<cpu runtime>[_gpu_<gpu runtime>].zip
macOS dnnl_mac_<version>_cpu_<cpu runtime>.tgz

Several packages are available for each operating system to ensure interoperability with CPU or GPU runtime libraries used by the application.

Configuration Dependency
cpu_iomp Intel OpenMP runtime
cpu_gomp GNU* OpenMP runtime
cpu_vcomp Microsoft Visual C OpenMP runtime
cpu_tbb Threading Building Blocks (TBB)

The packages do not include library dependencies and these need to be resolved in the application at build time. See the System Requirements section below and the Build Options section in the developer guide for more details on CPU and GPU runtimes.

If the configuration you need is not available, you can build the library from source.

System Requirements

oneDNN supports platforms based on the following architectures:

WARNING

Arm 64-bit Architecture (AArch64) support is experimental with limited testing validation.

The library is optimized for the following CPUs:

  • Intel Atom processor with Intel SSE4.1 support
  • 4th, 5th, 6th, 7th, and 8th generation Intel(R) Core(TM) processor
  • Intel(R) Xeon(R) processor E3, E5, and E7 family (formerly Sandy Bridge, Ivy Bridge, Haswell, and Broadwell)
  • Intel(R) Xeon Phi(TM) processor (formerly Knights Landing and Knights Mill)
  • Intel Xeon Scalable processor (formerly Skylake and Cascade Lake)
  • future Intel Xeon Scalable processor (code name Cooper Lake)

On a CPU based on Intel 64 or AMD64 architecture, oneDNN detects the instruction set architecture (ISA) at runtime and uses just-in-time (JIT) code generation to deploy the code optimized for the latest supported ISA.

WARNING

On macOS, applications that use oneDNN may need to request special entitlements if they use the hardened runtime. See the linking guide for more details.

The library is optimized for the following GPUs:

  • Intel HD Graphics
  • Intel UHD Graphics
  • Intel Iris Plus Graphics

Requirements for Building from Source

oneDNN supports systems meeting the following requirements:

  • Operating system with Intel 64 architecture support
  • C++ compiler with C++11 standard support
  • CMake 2.8.11 or later
  • Doxygen 1.8.5 or later to build documentation

Configurations of CPU and GPU engines may introduce additional build time dependencies.

CPU Engine

oneDNN CPU engine is used to execute primitives on Intel Architecture Processors, 64-bit Arm Architecture (AArch64) processors, and compatible devices.

The CPU engine is built by default and cannot be disabled at build time. The engine can be configured to use the OpenMP or TBB threading runtime. The following additional requirements apply:

Some implementations rely on OpenMP 4.0 SIMD extensions. For the best performance results on Intel Architecture Processors we recommend using the Intel C++ Compiler.

GPU Engine

Intel Processor Graphics is supported by the oneDNN GPU engine. The GPU engine is disabled in the default build configuration. The following additional requirements apply when the GPU engine is enabled:

  • OpenCL* runtime library (OpenCL version 1.2 or later)
  • OpenCL driver (with kernel language support for OpenCL C 2.0 or later) with Intel subgroups extension support

Runtime Dependencies

When oneDNN is built from source, the library runtime dependencies and specific versions are defined by the build environment.

Linux

Common dependencies:

  • System C/C++ runtime (libc.so, libstdc++.so)
  • Dynamic Linking Library (libdl.so)
  • C Math Library (libm.so)
  • POSIX Threads Library (libpthread.so)

Runtime specific dependencies:

Runtime configuration Compiler Dependency
DNNL_CPU_RUNTIME=OMP GCC GNU OpenMP runtime (libgomp.so)
DNNL_CPU_RUNTIME=OMP Intel C/C++ Compiler Intel OpenMP runtime (libiomp5.so)
DNNL_CPU_RUNTIME=OMP Clang Intel OpenMP runtime (libiomp5.so)
DNNL_CPU_RUNTIME=TBB any TBB (libtbb.so)
DNNL_GPU_RUNTIME=OCL any OpenCL runtime (libOpenCL.so)

Windows

Common dependencies:

  • Microsoft Visual C++ Redistributable (msvcrt.dll)

Runtime specific dependencies:

Runtime configuration Compiler Dependency
DNNL_CPU_RUNTIME=OMP Microsoft Visual C++ Compiler No additional requirements
DNNL_CPU_RUNTIME=OMP Intel C/C++ Compiler Intel OpenMP runtime (iomp5.dll)
DNNL_CPU_RUNTIME=TBB any TBB (tbb.dll)
DNNL_GPU_RUNTIME=OCL any OpenCL runtime (OpenCL.dll)

macOS

Common dependencies:

  • System C/C++ runtime (libc++.dylib, libSystem.dylib)

Runtime specific dependencies:

Runtime configuration Compiler Dependency
DNNL_CPU_RUNTIME=OMP Intel C/C++ Compiler Intel OpenMP runtime (libiomp5.dylib)
DNNL_CPU_RUNTIME=TBB any TBB (libtbb.dylib)

Validated Configurations

CPU engine was validated on RedHat* Enterprise Linux 7 with

  • GNU Compiler Collection 4.8, 5.4, 6.1, 7.2, and 8.1
  • Clang* 3.8.0
  • Intel C/C++ Compiler 17.0, 18.0, and 19.0

on Windows Server* 2012 R2 with

on macOS 10.13 (High Sierra) with

GPU engine was validated on Ubuntu* 18.04 with

on Windows Server 2019 with

Requirements for Pre-built Binaries

See the README included in the corresponding binary package.

Support

Please submit your questions, feature requests, and bug reports on the GitHub issues page.

You may reach out to project maintainers privately at dnnl.maintainers@intel.com.

Contributing

We welcome community contributions to oneDNN. If you have an idea on how to improve the library, please share it.

For additional details, see contribution guidelines.

This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the Contributor Covenant code of conduct.

License

oneDNN is licensed under Apache License Version 2.0. Refer to the "LICENSE" file for the full license text and copyright notice.

This distribution includes third party software governed by separate license terms: the 3-clause BSD license, the Apache License Version 2.0, and the Boost Software License, Version 1.0.

This third party software, even if included with the distribution of the Intel software, may be governed by separate license terms, including without limitation, third party license terms, other Intel software license terms, and open source software license terms. These separate license terms govern your use of the third party programs as set forth in the "THIRD-PARTY-PROGRAMS" file.


Legal Information

About

A port of Intel(R) MKL-DNN for a non-JIT chip (NEC SX)
