Release 18.08
This release of Arm NN integrates the latest Compute Library and adds improvements to thread-safety, memory consumption and overall performance.
New Features:
- The amount of system memory needed for a loaded network has been reduced compared to Release 18.05.
- Support for the LSTM operator.
- Support for 16-bit floating point including:
  - Support for 16-bit floating point weights and bias tensors in the ModelBuilder (INetwork) API.
  - Optimiser option to automatically convert 32-bit floating point models to 16-bit floating point where supported.
  - Support for computing inference in 16-bit floating point precision.
- Support for the TensorFlow Lite parser, including additional operator support for:
  - AVERAGE_POOL_2D
  - CONV_2D
  - DEPTHWISE_CONV_2D
  - SOFTMAX
  - SQUEEZE
- Support for the ONNX parser, including additional layer support for:
  - Addition
  - Convolution
  - MatMul
  - Max Pool
  - Constant
  - Relu
  - Reshape
- More detailed profiling with JSON output format support.
  - Captures CL and Neon kernel-level events.
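To give a feel for what converting a 32-bit floating point value to 16-bit floating point involves, the sketch below narrows a float to IEEE 754 half-precision bits using round-to-nearest-even. It is a standalone illustration and is not taken from Arm NN; the function name FloatToHalfBits is an assumption made for this example.

```cpp
#include <cstdint>
#include <cstring>

// Illustrative only: FloatToHalfBits is a hypothetical helper, not an Arm NN function.
// Converts a 32-bit float to IEEE 754 binary16 bits with round-to-nearest-even.
std::uint16_t FloatToHalfBits(float value)
{
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof(bits));   // reinterpret the float safely

    const std::uint32_t sign = (bits >> 16) & 0x8000u;
    const std::uint32_t exponent = (bits >> 23) & 0xFFu;
    std::uint32_t mantissa = bits & 0x7FFFFFu;

    if (exponent == 0xFFu)
    {
        // Infinity stays infinity; NaN keeps a non-zero mantissa.
        return static_cast<std::uint16_t>(sign | 0x7C00u | (mantissa ? 0x200u : 0u));
    }

    // Re-bias the exponent from float (127) to half (15).
    const std::int32_t halfExponent = static_cast<std::int32_t>(exponent) - 127 + 15;

    if (halfExponent >= 31)
    {
        // Too large for half precision: overflow to infinity.
        return static_cast<std::uint16_t>(sign | 0x7C00u);
    }

    if (halfExponent <= 0)
    {
        if (halfExponent < -10)
        {
            // Too small even for a half subnormal: flush to signed zero.
            return static_cast<std::uint16_t>(sign);
        }
        // Subnormal result: restore the implicit leading 1, then shift right.
        mantissa |= 0x800000u;
        const std::uint32_t shift = static_cast<std::uint32_t>(14 - halfExponent);
        std::uint32_t half = mantissa >> shift;
        const std::uint32_t remainder = mantissa & ((1u << shift) - 1u);
        const std::uint32_t halfway = 1u << (shift - 1u);
        if (remainder > halfway || (remainder == halfway && (half & 1u)))
        {
            ++half;
        }
        return static_cast<std::uint16_t>(sign | half);
    }

    // Normal result: keep the top 10 mantissa bits and round on the 13 dropped bits.
    std::uint32_t half = (static_cast<std::uint32_t>(halfExponent) << 10) | (mantissa >> 13);
    const std::uint32_t remainder = mantissa & 0x1FFFu;
    if (remainder > 0x1000u || (remainder == 0x1000u && (half & 1u)))
    {
        ++half;   // a carry into the exponent field is the correct behaviour here
    }
    return static_cast<std::uint16_t>(sign | half);
}
```

Values beyond the half-precision range (magnitudes above 65504) become infinity and very small values become zero, which is one reason automatic conversion is only applied where supported.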
Public API Changes:
- The API for creating a Runtime object has changed. It no longer takes an armnn::Compute argument but instead requires a CreationOptions object. (See include/armnn/IRuntime.hpp.)
- The Optimize function now takes two additional parameters (see include/armnn/INetwork.hpp):
  - backendPreferences: a vector of compute devices on which the user wants to execute the workloads, in order of preference. The Optimize function attempts to use the first backend in the list and falls back to subsequent backends only if the first does not support the layer. For example, a preference list of GpuAcc, CpuAcc attempts to execute on the Mali GPU, falling back to a v7/v8 ARM CPU if the workload in question is not supported by the GPU.
  - (Optional) OptimizerOptions: contains the flag to convert a 32-bit floating point model to 16-bit floating point automatically.
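The backend fallback described for backendPreferences can be sketched as a simple first-match search per layer. The code below is a minimal standalone model of that idea and does not use Arm NN headers: the Backend enum, SupportTable and SelectBackend are hypothetical names introduced for illustration, and the support data is made up by the caller.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a compute device identifier; not an Arm NN type.
enum class Backend { CpuRef, CpuAcc, GpuAcc };

// Hypothetical table: which layer types each backend is able to run.
using SupportTable = std::map<Backend, std::set<std::string>>;

// Walks the preference list in order and picks the first backend that
// supports the given layer type. Returns false if no backend supports it,
// in which case 'chosen' is left unchanged.
bool SelectBackend(const std::string& layerType,
                   const std::vector<Backend>& preferences,
                   const SupportTable& support,
                   Backend& chosen)
{
    for (Backend candidate : preferences)
    {
        const auto entry = support.find(candidate);
        if (entry != support.end() && entry->second.count(layerType) > 0)
        {
            chosen = candidate;   // first supporting backend wins
            return true;
        }
    }
    return false;
}
```

With a preference list of GpuAcc, CpuAcc, a layer supported by both backends is assigned to GpuAcc, while a layer supported only by CpuAcc falls back to CpuAcc. How the real Optimize function reports a layer that no listed backend supports is not covered by this sketch.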
Other changes:
- This release of Arm NN requires at least release 18.08 of the Compute Library.
- Fixed an issue where a 4D softmax caused the entire network to fail conversion.
- Fixed ParseFlatbuffersFixture to pass quantized input/output properly.
- Fixed thread-safety of runtime.
- Fixed the MobileNet Caffe model crashing when GpuAcc is selected as the compute device.
- Fixed failing NetworkTests when CL support is on but Neon support is off.