@nikraj01 released this Jan 17, 2020

ArmNN 19.08.01 Release Notes

This is an incremental release of ArmNN 19.08 to fix CTS issues.

ArmNN SDK

New Features:

TfLite Parser:

Public API Changes:

Backend API Changes:

Other changes:

Known issues:

Android NNAPI driver

Deprecated features:

New Features:

Other changes:

All errors and crashes occurring on the 19.08 release when running the Android Compliance Test Suite (CTS) R2 on Android 10 (Android Q) have been fixed, including:

  • Driver termination during TestRandomGraph when using GPU acceleration (i.e. ARMNN_COMPUTE_CL_ENABLE:=1)
  • Some TestRandomGraph/RandomGraphTest tests which include CONCATENATION and L2_POOLING_2D operators.
  • Some TestRandomGraph/RandomGraphTest tests which include operators taking the optional data layout argument if the argument is present and set to NCHW.
  • Some TestRandomGraph/RandomGraphTest tests which include operators using FLOAT16 input.
  • Some TestRandomGraph/RandomGraphTest tests which include RESIZE_BILINEAR operators.
  • Some TestRandomGraph/RandomGraphTest tests which include RESIZE operators.
  • Some TestRandomGraph/RandomGraphTest tests which include RESIZE_NEAREST_NEIGHBOR operators.
  • Some TestRandomGraph/RandomGraphTest tests which include SPACE_TO_DEPTH operators.
  • TestRandomGraph/SingleOperationTest#ADD_V1_0/31
  • TestRandomGraph/SingleOperationTest#MUL_V1_0/31
  • TestRandomGraph/SingleOperationTest#SUB_V1_2/31
  • TestRandomGraph/SingleOperationTest#STRIDED_SLICE_V1_2/17
  • TestRandomGraph/SingleOperationTest#PRELU_V1_2/14
  • Several other errors occurring in Activations when debug.nn.partition is set to 2

Backend API Changes:

Known Issues:

@nikraj01 released this Dec 5, 2019

New Features:

  • Added Abs support to the CpuRef, CpuAcc and GpuAcc backends.
  • Added Comparison support to CpuRef, covering the following operations: Equal, Greater, GreaterOrEqual, Less, LessOrEqual, NotEqual. The existing Equal and Greater layers have been refactored in terms of the new Comparison layer.
  • Added Rsqrt support to the CpuAcc and GpuAcc backends.
  • Added ArgMinMax support to the CpuRef, CpuAcc and GpuAcc backends.
  • Added InstanceNormalization support to the CpuRef, CpuAcc and GpuAcc backends.
  • Added LogSoftmax support to the CpuRef backend.
  • Added Slice support to the CpuAcc backend.
  • Added DepthToSpace support to the CpuRef, CpuAcc and GpuAcc backends.
  • Added a StandIn layer to represent "unknown" or "unsupported" operations in the input graph. The StandIn layer has a configurable number of input and output slots, and no workloads are created for it.
  • Added QSymm8PerAxis support for Encoder and Decoder.
  • Added per-channel quantization support for Convolution2d, DepthwiseConvolution2d and TransposeConvolution2d on the CpuRef backend.
  • Added FSRCNN support to the CpuRef (fp32 and uint8), CpuAcc (fp32 and uint8) and GpuAcc (fp32) backends.
  • Added initial external profiling support. A new ProfilingService class makes it possible to connect to an external profiling service, exchange an initial set of counter metadata (such as advertising the list of counters the client can select from), and periodically send the values of the selected counters to the client. The profiling support is compatible with DS-5 and Streamline clients, and relies on gatord to forward the packets to the external profiling server.
  • Added utility functions for creating Timeline Packets:
    • Timeline Label Binary Packet
    • Timeline Entity Binary Packet
    • Timeline Event Class Binary Packet
    • Timeline Message Directory Package
    • Timeline Event Binary Packet
  • Added SendTimelinePacket implementation to send Timeline Packets:
    • Timeline Label Binary Packet
    • Timeline Entity Binary Packet
    • Timeline Event Class Binary Packet
    • Timeline Message Directory Package
    • Timeline Event Binary Packet
  • Added TimelineUtilityMethods class to manage profiling entities
  • Added utility function to create a named typed entity
  • Added utility function to create a named typed child entity
  • Added utility function to create a typed label
  • Added utility function to declare a label
  • Added utility function to record an event
  • Added Timeline Decoder
  • Added ITimelineDecoder C interface
  • Added an example implementation of ITimelineDecoder
  • Added command handlers for the timeline directory and objects

TfLite Parser:

  • Added support for Transpose.
  • Added support for parsing unsupported layers by representing them as a placeholder StandInLayer in the resulting Arm NN network (a usage sketch follows this list). Please note that such networks will not be executable, as there are no workloads for StandInLayer – its only purpose is to maintain the original network topology.
  • Fixed a bug in parsing custom layers that caused the TfLiteParser to attempt to parse all custom layers as a DetectionPostProcess layer. Unsupported custom layers are now parsed as a StandInLayer – similarly to unsupported built-in layers.
  • Added support for Slice.
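
As a rough sketch of what the placeholder looks like from the graph-building side, the snippet below adds a StandInLayer through the INetwork interface. The AddStandInLayer/StandInDescriptor names are assumed to match this release's headers, and the layer name is purely illustrative:

    #include <armnn/ArmNN.hpp>

    // Build a network whose middle node is a placeholder for an
    // unsupported operation with one input and one output slot.
    armnn::INetworkPtr BuildNetworkWithStandIn()
    {
        armnn::INetworkPtr network = armnn::INetwork::Create();
        armnn::IConnectableLayer* input = network->AddInputLayer(0);

        // StandInDescriptor(numInputs, numOutputs) sets the slot counts.
        armnn::StandInDescriptor standInDesc(1, 1);
        armnn::IConnectableLayer* standIn =
            network->AddStandInLayer(standInDesc, "UnsupportedCustomOp");

        input->GetOutputSlot(0).Connect(standIn->GetInputSlot(0));
        // The topology is preserved, but the network cannot be executed:
        // no workload exists for the StandIn layer.
        return network;
    }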

Public API Changes:

Backend API Changes:

  • New CreateTensorHandle functions have been added to ITensorHandleFactory to allow for the creation of TensorHandles with unmanaged memory.
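
A minimal sketch of the unmanaged variant, assuming the boolean parameter described above (the header path and parameter name are assumptions):

    #include <armnn/Tensor.hpp>
    #include <armnn/backends/ITensorHandleFactory.hpp> // header path assumed
    #include <memory>

    // Request a handle whose memory is NOT owned by the backend's internal
    // memory manager; the caller allocates or imports memory before use.
    std::unique_ptr<armnn::ITensorHandle> CreateUnmanagedHandle(
        const armnn::ITensorHandleFactory& factory,
        const armnn::TensorInfo& info)
    {
        const bool isMemoryManaged = false; // parameter name assumed
        return factory.CreateTensorHandle(info, isMemoryManaged);
    }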

Other changes:

  • Modified ExecuteNetwork so that it can generate dummy input data if no input data files are specified. This can be useful when the user is not interested in inference results, but in performance metrics or if they only wish to see whether Arm NN can execute a certain network.
  • Fixed a CTS bug in pooling layers when assessing whether the kernel covers only padding values.
  • Changed the algorithm for calculating the subgraphs submitted to backends for optimisation, removing dependency cycles and unwanted subgraph splitting.
  • Added Encoder and Decoder support to Dequantize layer.

Known issues:

@Surmeh released this Sep 2, 2019 · 11 commits to branches/armnn_19_08 since this release

New Features:

  • Added Dequantize layer support to the CpuAcc and GpuAcc backends
  • Added Quantize layer support to the CpuAcc and GpuAcc backends
  • Added Quantized_LSTM layer support to the CpuAcc and GpuAcc backends
  • Added PReLU layer support to the CpuRef, CpuAcc and GpuAcc backends
  • Added ResizeNearestNeighbor layer support to the CpuRef, CpuAcc and GpuAcc backends
  • Added SpaceToDepth layer support to the CpuRef, CpuAcc and GpuAcc backends
  • Added StridedSlice layer support to the CpuAcc backend
  • Added TransposeConvolution2d layer support to the CpuRef and GpuAcc backends
  • Added customizable Padding support to the CpuAcc and GpuAcc backends.
  • Added QuantisedAsymm8 support to the following reference workloads:
    • L2Normalization
    • PReLU
    • Rsqrt
    • SpaceToDepth
  • Added QuantisedSymm16 support to the following reference workloads:
    • BatchNormalization
    • BatchToSpaceNd
    • DetectionPostProcess
    • Floor
    • FullyConnected
    • Gather
    • L2Normalization
    • Mean
    • Normalization
    • Pad
    • Permute
    • Pooling2d
    • PReLU
    • Reshape
    • Resize
    • Rsqrt
    • Softmax
    • SpaceToBatchNd
    • SpaceToDepth
    • Splitter
    • StridedSlice
  • Added layer normalization support to the Ref, CL and Neon LSTM workloads
  • Added dilated Convolution2d support to CL and Neon
  • Added axis support to Softmax for the Ref backend.
  • The reference backend can now be optionally built, like all the other backends; unlike the others, it is enabled by default in ArmNN's global CMake file (cmake/GlobalConfig.cmake).
    To enable or disable it, use the new ARMNNREF CMake option (for example, add "-DARMNNREF=0" to disable it).
    Alternatively, to make the change permanent, edit cmake/GlobalConfig.cmake accordingly:
    option(ARMNNREF "Build with ArmNN reference support" ON) is the default, and
    option(ARMNNREF "Build with ArmNN reference support" OFF) disables the reference backend.
    Disabling the reference backend will affect some of the unit tests built with ArmNN, as many of them use the reference backend for cross-verification and end-to-end tests.
    Follow the usage of ARMNNREF through the makefiles, and of ARMNNREF_ENABLED in the code, to see which unit tests are excluded when the reference backend is disabled.
  • Added dynamic backend loading support: backends can now be loaded dynamically at runtime (a rough sketch of the exported entry points follows this list).
    Updated the readme file at src/backends/README.md to explain the feature.
    A public design note with technical details on the implementation: https://developer.mlplatform.org/w/arm_nn/design_notes/dynamic_backend_loading/
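
As a rough illustration, a dynamic backend is a shared object exporting a small C-style interface that the runtime discovers and loads at startup. The entry-point names below follow the description in src/backends/README.md, but treat the exact signatures as assumptions; MyBackend and CreateMyBackend are hypothetical stand-ins for a real IBackendInternal implementation:

    #include <cstdint>

    // Hypothetical backend implementing armnn::IBackendInternal (defined elsewhere).
    class MyBackend;
    MyBackend* CreateMyBackend();

    extern "C"
    {
        // Unique id the runtime registers the backend under.
        const char* GetBackendId() { return "MyDynamicBackend"; }

        // Version of the Backend API this object was built against; the
        // runtime uses it to check ABI compatibility before loading.
        void GetVersion(uint32_t* outMajor, uint32_t* outMinor)
        {
            *outMajor = 1;
            *outMinor = 0;
        }

        // Factory returning the backend instance; the runtime takes
        // ownership of the returned pointer.
        void* BackendFactory() { return CreateMyBackend(); }
    }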

TfLite Parser:

  • Added support for L2Normalization and TransposeConvolution2d

Public API Changes:

Backend API Changes:

  • Added GetAPIVersion method to retrieve the current version of the Backend API.
  • Added BackendVersion object to handle the Backend API and the dynamic backend versions.
  • Added notes in the README file to describe the base interface for dynamic backends and the versioning strategy to enforce ABI compatibility.
  • Added notes in the README file to describe how to specify the paths where to load the dynamic backends from.
  • Added notes in the README file to describe the naming convention the dynamic backends files should comply with in order to be processed by ArmNN.
  • Any available/valid dynamic backend is now loaded during the Runtime object creation, and added to the Backend Registry.
  • Added a reference dynamic backend as an example, including unit tests, that customers can use as a guideline for creating their own dynamic backends.

Other changes:

Known issues:

  • Some Neon Quantized LSTM and Reference unit test failures can occur under Raspberry Pi configurations. We are continuing to investigate this and will fix the problem in a future release.

@Surmeh released this May 31, 2019

New Features:

  • Added Caffe, ONNX, and TfLite support to the ArmNN Converter executable.
  • Added support for the QuantisedSymm16 data type.
  • Added a new quantization scheme for the QuantisedSymm16 quantization target.
  • Added QuantisedSymm16 support to the following workloads:
    • Reference Elementwise Workload (Addition, Subtraction, Division, Multiplication, Maximum, Minimum, Greater, and Equal operators).
    • Reference Activation Workloads (Linear, Sigmoid, ReLU, SoftReLU, BoundedReLU, LeakyReLU, Sqrt, Square, Abs, Tanh).
    • Reference LSTM Workload.
    • Reference Concat Workload.
    • Reference Constant Workload.
    • Reference Convolution2D Workload.
    • Reference DepthwiseConvolution2d Workload.
  • Extended the QuantizerVisitor class to support a customizable quantization scheme (selected based on a parameter in the QuantizerOptions struct).
  • Extended the QuantizerVisitor class to support an option to preserve input and output types by inserting Quantize and Dequantize layers into the quantized network.
  • Added Dequantize layer support for the CpuRef backend.
  • Added Quantize layer support for the CpuRef backend.
  • Added support for attaching a custom callback function to the Debug layer.
  • Added a new method, RegisterDebugCallback(...), to IRuntime, which allows a custom callback function to be attached to the Debug layer (a sketch follows at the end of this New Features list).
  • Added CpuAcc and GpuAcc support for merging along the height dimension in NCHW, or the width dimension in NHWC.
  • Added CpuAcc support for the sigmoid activation function.
  • Added TfLite Parser support for:
    • Rank-0 operands.
    • Split operator.
    • Unpack operator.
    • TanH operator.
  • Added support for TfLite DeepSpeech v1 model.
  • Support for Serialization / Deserialization of the following ArmNN layers:  
    • Normalization
    • BatchNormalization
    • L2 Normalization
    • Minimum
    • Maximum
    • Equal
    • Rsqrt
    • Floor
    • Greater
    • ResizeBilinear
    • Subtraction
    • StridedSlice
    • Mean
    • Merger (concat)
    • Splitter
    • DetectionPostProcess
    • LSTM
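
The debug callback mentioned above might be attached as in this minimal sketch; the DebugCallbackFunction signature (LayerGuid, output slot index, ITensorHandle*) is an assumption based on this release's headers:

    #include <armnn/IRuntime.hpp>
    #include <armnn/Types.hpp>
    #include <iostream>

    // Log every firing of a Debug layer in an already-loaded network.
    void AttachDebugCallback(armnn::IRuntime& runtime, armnn::NetworkId networkId)
    {
        runtime.RegisterDebugCallback(networkId,
            [](armnn::LayerGuid guid,
               unsigned int slotIndex,
               armnn::ITensorHandle* /*tensorHandle*/)
            {
                std::cout << "Debug layer " << guid
                          << ", output slot " << slotIndex << std::endl;
            });
    }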


Public API Changes:

  • Implemented the QuantizerOptions struct to enable customization of the network quantization process (see the sketch after this list).
  • Updated the Create(...) and CreateRaw(...) static methods of the INetworkQuantizer class to take an additional QuantizerOptions argument (default value provided).
  • Added a quantization scheme parameter to the ArmNN Quantizer command-line tool; valid options are QAsymm8 and QSymm16, with QAsymm8 as the default.
  • Added a type preservation parameter to the ArmNN Quantizer tool.
  • Updated the EraseLayer methods in the Graph API; they no longer return an iterator.
  • The GetOutput method of the ISubgraphViewConverter interface has been renamed to CompileNetwork.
  • Updated the Backend API: the old OptimizeSubgraph method is now deprecated in favor of the new OptimizeSubgraphView, which returns a more comprehensive OptimizationViews object, containing:
    • A list of successful optimizations, in the form of substitution pairs associating a SubgraphView (representing a portion of the original graph) with a replacement SubgraphView containing the substitution layers
    • A list of failed optimizations, in the form of SubgraphView objects
    • A list of untouched subgraphs, in the form of SubgraphView objects
  • The SubGraph class has been renamed (and improved) to SubgraphView; the old name is kept as a deprecated alias of SubgraphView
  • The CreateSubGraphConverter method of the backend API has been deprecated and is no longer used by any backend implementation
  • INetwork.hpp: AddMergerLayer has been deprecated and replaced by AddConcatLayer
  • ILayerSupport.hpp and LayerSupport.hpp: IsMergerSupported has been deprecated and replaced by IsConcatSupported
  • ILayerVisitor.hpp: a default implementation of VisitConcatLayer which calls VisitMergerLayer has been provided to ease migration
  • LayerVisitorBase.hpp: VisitConcatLayer method added
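
A hedged sketch of driving the quantizer with the new QuantizerOptions; the header path and the m_ActivationFormat/m_PreserveType field names are assumptions consistent with the notes above:

    #include <armnn/INetworkQuantizer.hpp> // header path assumed

    // Quantize a float32 network to QSymm16 while preserving the original
    // input/output types via inserted Quantize/Dequantize layers.
    armnn::INetworkPtr QuantizeToSymm16(armnn::INetwork& floatNetwork)
    {
        armnn::QuantizerOptions options;
        options.m_ActivationFormat = armnn::DataType::QuantisedSymm16;
        options.m_PreserveType     = true;

        armnn::INetworkQuantizerPtr quantizer =
            armnn::INetworkQuantizer::Create(&floatNetwork, options);
        return quantizer->ExportNetwork();
    }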


Backend API Changes:

  • The SubGraph class has been renamed SubgraphView.
  • The method "SubgraphUniquePtr IBackendInternal::OptimizeSubgraph(const Subgraph& subgraph, bool& optimizationAttempted) const" has been deprecated and should be replaced with the new method "OptimizationViews IBackendInternal::OptimizeSubgraphView(const SubgraphView& subgraph) const"
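
For backend authors, the minimal migration is to report the whole subgraph as untouched; a sketch under the assumption that OptimizationViews exposes an AddUntouchedSubgraph helper (MyBackend is a hypothetical IBackendInternal implementation):

    #include <armnn/backends/OptimizationViews.hpp> // header path assumed

    // A backend that performs no substitutions yet: ArmNN will schedule
    // the returned untouched subgraph exactly as it arrived.
    armnn::OptimizationViews MyBackend::OptimizeSubgraphView(
        const armnn::SubgraphView& subgraph) const
    {
        armnn::OptimizationViews views;
        views.AddUntouchedSubgraph(armnn::SubgraphView(subgraph));
        return views;
    }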


Other changes:

  • ExecuteNetwork improvements:
    • Added support for FP16 turbo mode
    • Outputs total inference time
    • Added threshold time
    • Improvements to error reporting
  • Fixed an issue where layers could fail on the CpuAcc or GpuAcc backends due to the Merger layer corrupting data layouts
  • Fixed an issue relating to the requirement that tensor numDimensions must be greater than 0
  • Fixed an issue where the backend optimizer would create subgraphs with circular dependencies. This is relevant for backend developers using the SubGraphSelector (now renamed to SubgraphViewSelector) API.
  • There is some initial work on supporting flow control in neural networks via Switch and Merge layers. This API has been added to the release for the purpose of gathering feedback, and is currently non-functional. Switch and Merge layers can be added via the INetwork interface but attempting to load a network with these operators via the Optimize and IRuntime interfaces will fail.


Known issues:

  • NeonTimerMeasure unit test error on Raspberry Pi. We are continuing to investigate and will fix the problem in a future release.
  • The TensorFlow version must be pinned to commit 8f593c48c84a2d52d2ba8becf2eaef20250325a0 for Raspberry Pi.

@TelmoARM released this Mar 8, 2019 · 6 commits to branches/armnn_18_11 since this release

New Features:

  • Maximum operator support for the CpuRef and CpuAcc backends.
  • Minimum operator support for the CpuRef, CpuAcc and GpuAcc backends.
  • Maximum operator support for TensorFlow parser.
  • Pad operator support for TensorFlow parser.
  • ExpandDims operator support for TensorFlow parser.
  • Sub operator support for TensorFlow parser.
  • BatchToSpace operator support for GpuAcc backend.
  • StridedSlice operator support for the CpuRef, GpuAcc and CpuAcc backends.
  • SpaceToBatchNd operator support for the GpuAcc backend. Some padding configurations are currently not interpreted correctly.
  • Greater operator support for the CpuRef, GpuAcc and CpuAcc backends.
  • Greater operator support for TensorFlow parser.
  • Equal operator support for CpuRef backend.
  • Equal operator support for TensorFlow parser.
  • AddN operator support for TensorFlow parser.
  • Split operator support for TensorFlow parser.
  • Reciprocal of square root (Rsqrt) operator support for CpuRef backend.
  • Mean operator support for TensorFlow parser.
  • ResizeBilinear operator support for CpuAcc backend.
  • Logistic support for TensorFlow Lite parser.
  • Logistic support for GpuAcc backend.
  • Gather operator support for CpuRef backend.
  • Gather operator support for TensorFlow parser.
  • TensorFlow Lite parser support for BatchToSpace operator.
  • TensorFlow Lite parser support for Maximum operator.
  • TensorFlow Lite parser support for Minimum operator.
  • TensorFlow Lite parser support for ResizeBilinear operator.
  • TensorFlow Lite parser support for SpaceToBatch operator.
  • TensorFlow Lite parser support for StridedSlice operator.
  • TensorFlow Lite parser support for Sub operator.
  • TensorFlow Lite parser support for concatenation on tensors with rank other than 4
  • TensorFlow Lite parser support for Detection Post Process.
  • TensorFlow Lite parser support for Reciprocal of square root (Rsqrt).
  • Detection Post Process custom operator Reference implementation added.
  • Support for Serialization / Deserialization of the following ArmNN layers:
    • Activation
    • Addition
    • Constant
    • Convolution2d
    • DepthwiseConvolution2d
    • FullyConnected
    • Multiplication
    • Permute
    • Pooling2d
    • Reshape
    • Softmax
    • SpaceToBatchNd
  • New executable to convert networks from TensorFlow Protocol Buffers to ArmNN format
  • New C++ Quantization API, supported layers are:
    • Input
    • Output
    • Addition
    • Activation
    • BatchNormalization
    • FullyConnected
    • Convolution2d
    • DepthwiseConvolution2d
    • Softmax
    • Permute
    • Constant
    • StridedSlice
    • Splitter
    • Pooling2d
    • Reshape
    • Merger
    • SpaceToBatch
    • ResizeBilinear

Public API Changes:

  • Support for the Boolean data type. Boolean values are specified as 8-bit unsigned integers, where zero (all bits off) represents false and any non-zero value (any bits on) represents true.
  • AddRsqrtLayer() method added to the graph builder API.
  • The profiling event now uses BackendId instead of Compute to identify the backend. BackendId is a wrapper class for the string that identifies a backend, and it is provided by the backend itself, rather than being statically enumerated like Compute.
  • Added the new method OptimizeSubGraph to the backend interface, allowing backends to apply their specific optimizations to a given sub-graph.
  • The old way for backends to provide a list of optimizations to the Optimizer (through the GetOptimizations method) is still in place for backward compatibility, but it is now considered deprecated and will be removed in a future release.
  • Added the new interface class INetworkQuantizer for the Quantization API, exposing two methods (see the sketch below):
    • OverrideInputRange: allows the caller to replace the quantization range for a specific input layer
    • ExportNetwork: returns the quantized version of the loaded network
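
A minimal sketch of the new interface (the binding id and range values are illustrative; the header path is assumed):

    #include <armnn/INetworkQuantizer.hpp> // header path assumed

    // Override the quantization range of input layer 0, then export the
    // quantized version of the network.
    armnn::INetworkPtr QuantizeWithRange(armnn::INetwork& floatNetwork)
    {
        armnn::INetworkQuantizerPtr quantizer =
            armnn::INetworkQuantizer::Create(&floatNetwork);
        quantizer->OverrideInputRange(/*layerId=*/0, /*min=*/-1.0f, /*max=*/1.0f);
        return quantizer->ExportNetwork();
    }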

Known issues:

  • Large graphs with many branches and joins can take an excessive time to load, or cause a software hang while loading into ArmNN. This issue affects versions of ArmNN from 18.11 onwards. We are continuing to investigate and will fix the problem in a future release. Models known to be affected include Inception v4 and Resnet V2 101.

  • Merge layer with 8-bit quantized data where the tensors to be merged have different quantization parameters does not work on the GpuAcc or CpuAcc backends. This is known to affect quantised Mobilenet-SSD models, and some quantized Mobilenet v2 models.

@Surmeh released this Nov 28, 2018

New Features:

  • Addition support for 8-bit tensors on the GpuAcc backend.
  • FullyConnected support for 8-bit tensors on the GpuAcc backend.
  • Division support for the GpuAcc backend.
  • Subtraction support for the GpuAcc and CpuAcc backends.
  • Arithmetic Mean operator support for the GpuAcc backend.
  • Pad operator support for the GpuAcc and CpuRef backends.
  • SpaceToBatchNd operator support for the CpuRef backend.
  • BatchToSpaceNd operator support for the CpuRef backend.
  • Added support for NHWC Normalization with the 'cross channels' method, including CpuRef backend support. NHWC data layout is not yet supported for the 'within channels' normalization method on any backend.
  • Added support for NHWC ResizeBilinear for the CpuRef and GpuAcc backends.
  • Added support for NHWC Convolution2d for the CpuRef and GpuAcc backends.
  • Added support for NHWC DepthwiseConvolution.
  • Added support for NHWC Pooling2d for the CpuRef, GpuAcc and Neon backends.
  • Added support for NHWC L2Normalization.
  • Added support for NHWC BatchNormalization.
  • Added support for Float32 LSTM for the CpuRef backend.
  • Added CONCATENATION, FULLY_CONNECTED, MAX_POOL_2D, RELU, RELU6 and RESHAPE operator support to the TfLite Parser.
  • Added FullyConnected support for 8-bit tensors on the CpuAcc backend.
  • Added arbitrary axis support for the Merger Layer.

Public API Changes:

  • The armnn::Optional helper class was introduced and used in the IsDepthwiseConvolutionSupported(...) and IsConvolution2dSupported(...) functions to represent optional biases (see the sketch after this list).
  • The IsXXXSupported(...) free functions now take a BackendId instead of the Compute enum. Backward compatibility is maintained through the automatic conversion from the Compute to the BackendId type.
  • The Compute enum and the IsXXXSupported(...) free functions are being deprecated in favor of the IBackend and ILayerSupport interfaces, which provide the same functionality in a more flexible and extensible manner. The deprecated functions will be removed in a future release.
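
The optional-bias convention from the first bullet might be used as in this small sketch (the helper name is hypothetical):

    #include <armnn/Optional.hpp>
    #include <armnn/Tensor.hpp>

    // Biases are optional: pass a populated Optional when present,
    // or EmptyOptional() when the layer has no bias tensor.
    armnn::Optional<armnn::TensorInfo> MakeBiasInfo(bool hasBias,
                                                    const armnn::TensorInfo& biasInfo)
    {
        if (hasBias)
        {
            return armnn::Optional<armnn::TensorInfo>(biasInfo);
        }
        return armnn::EmptyOptional();
    }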

Other changes:

  • An issue has been fixed where the Profiler JSON output would report units of milliseconds but the data was actually in microseconds.

@TelmoARM released this Aug 31, 2018 · 1363 commits to master since this release

This release of Arm NN integrates the latest Compute Library and adds improvements to thread-safety, memory consumption and overall performance.

New Features:

  • The amount of system memory needed for a loaded network has been reduced compared to Release 18.05.
  • Support for LSTM operator.
  • Support for 16-bit floating point, including:
    • Support for 16-bit floating point weights and bias tensors in the ModelBuilder (INetwork) API.
    • An optimiser option to automatically convert 32-bit floating point models to 16-bit floating point where supported.
    • Support for computing inference in 16-bit floating point precision.
  • Support for the TensorFlow Lite parser, including additional operator support for:
    • AVERAGE_POOL_2D
    • CONV_2D
    • DEPTHWISE_CONV_2D
    • SOFTMAX
    • SQUEEZE
  • Support for ONNX parser including additional layer support for:
    • Addition
    • Convolution
    • MatMul
    • Max Pool
    • Constant
    • Relu
    • Reshape
  • More detailed profiling with JSON output format support (see the sketch below).
    • Captures CL and Neon kernel-level events.
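
The JSON output might be obtained as in this hedged sketch; the IProfiler method names are assumptions based on this description:

    #include <armnn/IRuntime.hpp>
    #include <iostream>
    #include <memory>

    // Enable profiling for a loaded network, run inference, then print
    // the JSON-formatted event tree.
    void DumpProfilingJson(armnn::IRuntime& runtime, armnn::NetworkId networkId)
    {
        std::shared_ptr<armnn::IProfiler> profiler = runtime.GetProfiler(networkId);
        profiler->EnableProfiling(true);
        // ... run inference via runtime.EnqueueWorkload(...) ...
        profiler->Print(std::cout);
    }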

Public API Changes:

  • The API for creating a Runtime object has changed. It no longer takes an armnn::Compute argument but instead requires a CreationOptions object (see include/armnn/IRuntime.hpp).
  • The Optimize function now takes 2 additional parameters (see include/armnn/INetwork.hpp and the sketch after this list):
    • backendPreferences: a vector of compute devices on which the user wants to execute the workloads, in preference order. The Optimize function will attempt to use the first backend in the list, only falling back to subsequent backends if the first does not support the layer; e.g. a preference list of GpuAcc, CpuAcc will attempt to execute on the Mali GPU, falling back to a v7/v8 Arm CPU if the workload in question is not supported by the GPU.
    • An optional OptimizerOptions parameter, which contains the flag to convert a 32-bit floating point model to 16-bit floating point automatically.
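
Putting the two API changes together, caller code might now look like the sketch below (the BackendId type shown follows later public headers, and the m_ReduceFp32ToFp16 field name is an assumption):

    #include <armnn/ArmNN.hpp>
    #include <vector>

    // Create a runtime with the new CreationOptions, then optimize with a
    // backend preference list and automatic FP32->FP16 reduction.
    armnn::IOptimizedNetworkPtr CreateRuntimeAndOptimize(armnn::INetwork& network)
    {
        armnn::IRuntime::CreationOptions creationOptions;
        armnn::IRuntimePtr runtime = armnn::IRuntime::Create(creationOptions);

        // Preference order: try the Mali GPU first, fall back to the CPU.
        std::vector<armnn::BackendId> preferences = { armnn::Compute::GpuAcc,
                                                      armnn::Compute::CpuAcc };

        armnn::OptimizerOptions options;
        options.m_ReduceFp32ToFp16 = true; // assumed field name

        return armnn::Optimize(network, preferences,
                               runtime->GetDeviceSpec(), options);
    }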

Other changes:

  • This release of ArmNN requires at least release 18.08 of the Compute Library.
  • Fixed an issue where a 4d softmax caused the entire network to fail conversion.
  • Fixed ParseFlatbuffersFixture to pass quantized input/output properly.
  • Fixed thread-safety of the runtime.
  • Fixed the MobileNet Caffe model crashing when GpuAcc is selected as the compute device.
  • Fixed failing NetworkTests when CL support is on but Neon support is off.

@Surmeh released this Jul 5, 2018 · 1364 commits to master since this release

This patch release updates the Arm NN makefiles so that Arm NN builds on both Android O and P.

@Surmeh released this Jun 11, 2018 · 1365 commits to master since this release

  • Fixed broken links to the developer.arm.com guides
  • Added a build guide for building ArmNN using the Android NDK

@TelmoARM released this May 23, 2018 · 1366 commits to master since this release

This release of Arm NN integrates the latest Compute Library and adds improvements to thread-safety, memory consumption and overall performance.

New Features:

  • In general, the amount of RAM needed for a loaded network has been reduced by 20-30% compared to release 18.03.
  • The latest 8-bit quantized operations from Compute Library have been integrated. In testing, 8-bit quantized MobileNet models are 3x faster compared to release 18.03.
  • Graphs can now be loaded and unloaded simultaneously from multiple threads; in other words, the methods IRuntime::LoadNetwork() and IRuntime::UnloadNetwork() are thread-safe (a sketch follows this list).
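
A small sketch of what the thread-safety guarantee permits (error handling omitted; the LoadNetwork signature is assumed to take a NetworkId out-parameter plus an IOptimizedNetworkPtr):

    #include <armnn/ArmNN.hpp>
    #include <thread>
    #include <utility>

    // Two threads loading independently optimized networks into the same
    // runtime; IRuntime::LoadNetwork is now safe to call concurrently.
    void LoadConcurrently(armnn::IRuntime& runtime,
                          armnn::IOptimizedNetworkPtr netA,
                          armnn::IOptimizedNetworkPtr netB)
    {
        armnn::NetworkId idA = 0;
        armnn::NetworkId idB = 0;
        std::thread t1([&] { runtime.LoadNetwork(idA, std::move(netA)); });
        std::thread t2([&] { runtime.LoadNetwork(idB, std::move(netB)); });
        t1.join();
        t2.join();
    }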

Public API Changes:

  • IsConvolution2dSupported requires additional TensorInfo arguments describing the output and bias tensors.

Other changes:

  • This release of ArmNN requires at least release 18.05 of the Compute Library.
  • Fixed an issue where pooling operations with different pooling width and height would produce the wrong output.
  • Fixed an issue in the Caffe parser where BatchNormalization would return the wrong results when the rolling average factor was non-zero.
  • Fixed the known issue in 18.03 where the multiplication layer could not support tensors of different shapes.