Release 21.08

@nikraj01 nikraj01 released this 26 Aug 16:14

Summary

Arm NN 21.08 focused on providing new capabilities and improving performance:

  • Added the ability to import protected DMA buffers and allow Arm NN to run inferences that are in Protected GPU Memory. A Custom Memory Allocator is also provided, which supports importing malloc, Dma_buf and protected DMA buffers.
  • Users with multi-core NPUs have been given the ability to pin inferences to selected cores, allowing them to balance parallel workloads across the NPU and increase throughput.
  • Boost has been completely removed from the code base making Arm NN easier to integrate into other software stacks.
  • Added support for non-constant weights and biases on FullyConnected, which lays the groundwork for supporting more models.
  • More operators supported on Arm NN, TfLite Parser, TfLite Delegate and Android NNAPI driver.

New Features

  • Moved unit tests from BOOST to doctest.
  • UNIDIRECTIONAL_SEQUENCE_LSTM Operator support added on CpuRef backend.
  • Changed weights layout for Depthwise Convolution Operator from [M,I,H,W] to [1,H,W,I*M].
  • Reduce Operator can now support multiple axes.
  • Optimisation added to fuse PAD Operator into Depthwise Convolution Operator.
  • Added SIN and LOG support to ElementWiseUnary Operator on CpuRef, CpuAcc (only LOG is supported) and GpuAcc backends.
  • Added SHAPE Operator support on CpuRef backend.
  • Moved useful test utilities to new static library (libarmnnTestUtils.a).
  • Added ability to create multiple LoadedNetworks from one OptimizedNetwork.
  • Arm NN TfLite Delegate Image Classification sample application added to samples directory.
  • Added fully comprehensive Arm NN Operator list page to Doxygen.
  • Added support to allow Arm NN to run inferences that are in Protected GPU Memory.
    • Creation of Protected Memory is handled via a Custom Memory Allocator which supports importing malloc, Dma_buf and protected DMA buffers.
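
To make the multi-axis Reduce item above concrete, here is a minimal, self-contained sketch of a sum reduction over several axes of a row-major tensor. It is an illustration only: `ReduceSum` is a hypothetical helper, not an Arm NN function, and it assumes the reduced dimensions are dropped from the output rather than kept with size 1.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the Arm NN API): sums a row-major tensor
// over every axis listed in `axes`; reduced dimensions are dropped.
std::vector<float> ReduceSum(const std::vector<float>& data,
                             const std::vector<std::size_t>& shape,
                             const std::vector<std::size_t>& axes)
{
    const std::size_t rank = shape.size();
    std::vector<bool> reduced(rank, false);
    for (std::size_t a : axes)
    {
        assert(a < rank);
        reduced[a] = true;
    }

    // The output holds one element per combination of the kept dimensions.
    std::size_t outSize = 1;
    for (std::size_t d = 0; d < rank; ++d)
    {
        if (!reduced[d]) { outSize *= shape[d]; }
    }

    std::vector<float> out(outSize, 0.0f);
    std::vector<std::size_t> idx(rank, 0); // coordinates of the current input element
    for (std::size_t flat = 0; flat < data.size(); ++flat)
    {
        // Flatten only the kept coordinates to find the output element.
        std::size_t outOffset = 0;
        for (std::size_t d = 0; d < rank; ++d)
        {
            if (!reduced[d]) { outOffset = outOffset * shape[d] + idx[d]; }
        }
        out[outOffset] += data[flat];

        // Advance the coordinates in row-major order (last dimension fastest).
        for (std::size_t d = rank; d-- > 0;)
        {
            if (++idx[d] < shape[d]) { break; }
            idx[d] = 0;
        }
    }
    return out;
}
```

For a tensor of shape [2,3,2], reducing over axes 0 and 2 in one call leaves a single dimension of size 3.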

TfLite Parser

  • EXPAND_DIMS Operator support added.
  • PRELU Operator support added.
  • SHAPE Operator support added.
  • Comparison Operator support added (EQUAL, GREATER, GREATER_EQUAL, LESS, LESS_EQUAL and NOT_EQUAL).
  • Changed weights layout for Depthwise Convolution Operator from [M,I,H,W] to [1,H,W,I*M].
  • Added support for shape_signature, which will now be the preferred way to detect dynamic tensors.
    • If creating an instance of the ITfLiteParser and the model used is dynamic, then please ensure that m_InferAndValidate is set in the TfLiteParserOptions and m_shapeInferenceMethod is set to InferAndValidate in the OptimizerOptions.

ArmNN Serializer/Deserializer

  • Changed weights layout for Depthwise Convolution Operator from [M,I,H,W] to [1,H,W,I*M].
  • Added SIN and LOG support to ElementWiseUnary Operator.
  • UNIDIRECTIONAL_SEQUENCE_LSTM Operator support added.

ExecuteNetwork App Changes

  • Added option to specify the size of the Arm NN thread pool to use when running inferences asynchronously.
  • Added support for qasymms8 (int8) and added qasymmu8 (uint8) as alias for qasymm8.
  • Added option to specify different input data for every iteration of ExecuteNetwork.
  • Added option to print additional information such as the TensorInfo, Descriptor and Convolution method when profiling is enabled.
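
The qasymms8 and qasymmu8 options above name asymmetric 8-bit quantized types (signed and unsigned). As a rough sketch of what such a type represents, the code below applies the common affine scheme, `q = round(x / scale) + offset`, clamped to the range of the storage type. The helper names and the exact rounding behaviour are assumptions made for illustration; they are not taken from the Arm NN sources.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helper: affine (asymmetric) quantization into QuantizedT,
// where QuantizedT is std::int8_t (qasymms8) or std::uint8_t (qasymmu8).
template <typename QuantizedT>
QuantizedT QuantizeAsymmetric(float value, float scale, std::int32_t offset)
{
    assert(scale > 0.0f);
    const float lo = static_cast<float>(std::numeric_limits<QuantizedT>::lowest());
    const float hi = static_cast<float>(std::numeric_limits<QuantizedT>::max());
    const float q  = std::round(value / scale) + static_cast<float>(offset);
    // Saturate instead of wrapping when the value falls outside the 8-bit range.
    return static_cast<QuantizedT>(std::min(hi, std::max(lo, q)));
}

// Hypothetical helper: the inverse mapping back to a real value.
template <typename QuantizedT>
float Dequantize(QuantizedT quantized, float scale, std::int32_t offset)
{
    return scale * static_cast<float>(static_cast<std::int32_t>(quantized) - offset);
}
```

The same real value maps to different stored bytes depending on the signedness and offset, which is why the data type has to be stated explicitly when supplying input data.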

NOTE: To run dynamic models through ExecuteNetwork the --infer-output-shape flag should be set.

Bug Fixes

  • Removed duplicate check for Dequantize input type when checking if operator is supported.
  • Fixed undefined behaviour in PolymorphicDowncast.
  • Fixed binding of reference to null pointer in RefFullyConnectedWorkload.
  • Fixed PermutationVector.end() to cope with dimensions < 5 in PermutationVector class.
  • Fixed cl_ext.h include path in CL backend.
  • Fixed bugs in PreCompiledLayer. For example, a new shared_ptr was being created instead of allowing std::move to convert the unique_ptr into a shared_ptr.
  • Fixed gcc 9.3.0 compiler warning in TfLiteParser.
  • Fixed issue so that the BackendRegistry is cleaned up correctly following negative tests.
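
The PreCompiledLayer fix above relies on a standard C++ idiom: a `std::unique_ptr` can be converted into a `std::shared_ptr` by moving it, which transfers ownership of the existing object. The sketch below shows that idiom in isolation. `PreCompiledObject` and `ShareOwnership` are hypothetical names, and this is not the actual PreCompiledLayer code.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for whatever object the layer owns.
struct PreCompiledObject
{
    int value;
};

// Converts unique ownership into shared ownership of the same object.
std::shared_ptr<PreCompiledObject> ShareOwnership(std::unique_ptr<PreCompiledObject> owned)
{
    // shared_ptr has a converting constructor taking unique_ptr&&: the pointer
    // (and its deleter) is handed over; the pointee is neither copied nor
    // re-allocated.
    return std::shared_ptr<PreCompiledObject>(std::move(owned));
}
```

After the call, the original `unique_ptr` is empty and the returned `shared_ptr` is the sole owner of the same address.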

Other Changes

  • Print Elementwise and Comparison Operator descriptors in a dot graph.
  • Added IsConstant flag to TensorInfo. This should be set if using the new AddFullyConnectedLayer Graph API when weights and bias are constant. An example of this can be found in samples/SimpleSample.cpp.
  • Added support for qasymms8 (int8) and added qasymmu8 (uint8) as alias for qasymm8 to ImageTensorGenerator.

ABI/API Changes

The following front-end API changes have occurred during the implementation of 21.08 that users should be aware of before upgrading. Due to these changes we have bumped our ARMNN_VERSION to 26.0.0 while also bumping our Parsers and Delegate to 24.2.0 following Semantic Versioning guidelines.

Feature SHA Gerrit Review Resultant ABI/API changes
Rework the async threadpool f364d53 https://review.mlplatform.org/c/ml/armnn/+/5801
    Be aware that these classes are in the experimental namespace and should be treated as such.
    struct INetworkProperties: Field m_NumThreads has been removed from the middle position of this structural type. Size of this type has been changed from 32 bytes to 24 bytes.
    class IWorkingMemHandle: Pure virtual method GetInferenceId ( ) has been removed from this class.
    class IAsyncExecutionCallback: The following methods have been removed:
      • GetEndTime ( ) const
      • GetStartTime ( ) const
      • Wait ( ) const
      • GetStatus ( ) const
Add IsConstant flag to TensorInfo b082ed0 https://review.mlplatform.org/c/ml/armnn/+/5842
    class TensorInfo: Size of this class has been increased from 80 bytes to 88 bytes. This is due to the addition of private member bool m_IsConstant.
    An object of this class can be allocated by applications, in which case the old size will have been hardcoded when they were originally compiled. A call to any exported constructor will then corrupt the memory of neighboring objects on the stack or heap.
    struct BindingPointInfo: Size of field m_TensorInfo has been changed from 80 bytes to 88 bytes. The fields or parameters of such data type may be incorrectly initialized or accessed by old client applications.
Add protected mode to ArmNN CreationOptions 15fcc7e https://review.mlplatform.org/c/ml/armnn/+/5963
    struct IRuntime::CreationOptions: Field m_ProtectedMode has been added at the middle position of this structural type. Size of the inclusive type has been changed. Layout of structure fields has been changed and therefore fields at higher positions of the structure definition may be incorrectly accessed by applications.
Add the Custom Memory Allocator interface definition 801e2d5 https://review.mlplatform.org/c/ml/armnn/+/5967
    struct IRuntime::CreationOptions: Field m_CustomAllocator has been added at the middle position of this structural type. Size of the inclusive type has been changed. Layout of structure fields has been changed and therefore fields at higher positions of the structure definition may be incorrectly accessed by applications.
Add front end support for UnidirectionalSequenceLstm on ArmNN 8ed39ae https://review.mlplatform.org/c/ml/armnn/+/5956
    struct LstmDescriptor: Field m_TimeMajor has been added to this type. This field will not be initialized by old clients. Size of the inclusive type has been changed.
JSON profiling output 554fa09 https://review.mlplatform.org/c/ml/armnn/+/5968
    struct INetworkProperties: Field m_ProfilingEnabled has been added to this type. This field will not be initialized by old clients.
ConstTensorsAsInput: FullyConnected 81beae3 https://review.mlplatform.org/c/ml/armnn/+/5942
    class ILayerVisitor: Pure virtual method VisitFullyConnectedLayer ( IConnectableLayer const*, struct FullyConnectedDescriptor const&, char const* ) has been added to this class. The layout of v-table has been changed. Call of any virtual method at higher position in this class or its subclasses may result in crash or incorrect behavior of applications.
    The following previously deprecated functions have been removed:
      • INetwork::AddFullyConnectedLayer(struct FullyConnectedDescriptor const& fullyConnectedDescriptor, ConstTensor const& weights, ConstTensor const& biases, char const* name)
      • INetwork::AddFullyConnectedLayer(struct FullyConnectedDescriptor const& fullyConnectedDescriptor, ConstTensor const& weights, char const* name)
Adds CustomAllocator interface and Sample App c1c872f https://review.mlplatform.org/c/ml/armnn/+/5987
    struct IRuntime::CreationOptions: Field m_CustomAllocatorMap has been added at the middle position of this structural type. Size of the inclusive type has been changed. Layout of structure fields has been changed and therefore fields at higher positions of the structure definition may be incorrectly accessed by applications.
    class BackendRegistry: Field m_CustomMemoryAllocatorMap has been added to this type. Size of this type has been changed from 80 bytes to 136 bytes.
Allow profiling details to be switched off during profiling f487486 https://review.mlplatform.org/c/ml/armnn/+/6069
    struct INetworkProperties: Field m_OutputNetworkDetails has been added at the middle position of this structural type. Layout of structure fields has been changed and therefore fields at higher positions of the structure definition may be incorrectly accessed by applications.
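
Several entries above warn that a field added "at the middle position" of a struct shifts the fields that follow it. The sketch below illustrates why that breaks binaries compiled against the old headers. `PropertiesV1` and `PropertiesV2` are hypothetical structs invented for this example; they do not mirror the real layout of `INetworkProperties` or `IRuntime::CreationOptions`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical "old" layout that a client was compiled against.
struct PropertiesV1
{
    std::uint32_t m_First;
    std::uint32_t m_Last;
};

// Hypothetical "new" layout: one field inserted in the middle.
struct PropertiesV2
{
    std::uint32_t m_First;
    std::uint32_t m_Added;
    std::uint32_t m_Last;
};

// Reads "m_Last" the way code compiled against V1 would: at V1's byte offset.
std::uint32_t ReadLastWithV1Offset(const PropertiesV2& current)
{
    std::uint32_t value = 0;
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(&current);
    std::memcpy(&value, bytes + offsetof(PropertiesV1, m_Last), sizeof(value));
    return value;
}
```

With the new layout, the old offset lands on `m_Added` rather than `m_Last`, so a client that is not recompiled silently reads the wrong field. Rebuilding against the 21.08 headers avoids this.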

The following back-end API changes have occurred during the implementation of 21.08 that users should be aware of before upgrading.

Feature SHA Gerrit Review Resultant ABI/API changes
Refactor the reporting of capabilities from backends b9af86e https://review.mlplatform.org/c/ml/armnn/+/5728
    class IBackendInternal: virtual function GetCapabilities() const has been added, replacing the now deprecated HasCapability() function.
    The layout of v-table has been changed. Call of any virtual method at higher position in this class or its subclasses may result in crash or incorrect behavior of applications.
Add protected mode to ArmNN CreationOptions 15fcc7e https://review.mlplatform.org/c/ml/armnn/+/5963
    class IBackendInternal: virtual function UseCustomMemoryAllocator() has been added.
    The layout of v-table has been changed. Call of any virtual method at higher position in this class or its subclasses may result in crash or incorrect behavior of applications.

TfLite Delegate

New features

  • PRELU Operator Support added.
  • SHAPE Operator support added.
  • Added Asynchronous Network Execution.
  • Changed weights layout for Depthwise Convolution Operator from [M,I,H,W] to [1,H,W,I*M].
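
For code that still holds Depthwise Convolution weights in the old [M,I,H,W] layout, the change above amounts to a permutation of the flat buffer. The following sketch shows one way to do it, assuming row-major storage and assuming the combined last dimension is indexed as `i * M + m`, as the notation I*M suggests. That ordering is an assumption and should be checked against the Arm NN sources. `ConvertMIHWTo1HWO` is a hypothetical helper, not an Arm NN function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: permutes depthwise weights from [M,I,H,W] to [1,H,W,I*M].
// Assumes row-major storage and output channel index o = i * M + m.
std::vector<float> ConvertMIHWTo1HWO(const std::vector<float>& src,
                                     std::size_t M, std::size_t I,
                                     std::size_t H, std::size_t W)
{
    assert(src.size() == M * I * H * W);
    std::vector<float> dst(src.size());
    for (std::size_t m = 0; m < M; ++m)
    {
        for (std::size_t i = 0; i < I; ++i)
        {
            for (std::size_t h = 0; h < H; ++h)
            {
                for (std::size_t w = 0; w < W; ++w)
                {
                    const std::size_t srcIdx = ((m * I + i) * H + h) * W + w;
                    const std::size_t o      = i * M + m;            // combined channel
                    const std::size_t dstIdx = (h * W + w) * (I * M) + o;
                    dst[dstIdx] = src[srcIdx];
                }
            }
        }
    }
    return dst;
}
```

The leading dimension of size 1 does not affect the flat buffer, so only the tensor shape needs to be updated alongside the permuted data.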

Build Dependencies

Tools | Supported Version
Git | 2.17.1 or later
SCons | 2.4.1 (Ubuntu) and 2.5.1 (Debian)
CMake | 3.5.1 (Ubuntu) and 3.7.2 (Debian)
TensorFlow | 2.3.1
ONNX | 1.6.0
FlatBuffers | 1.12.0
Protobuf | 3.12.0
Android NDK | r20b
mapbox/variant | 1.2.0

Android 12 Compatibility Testing was performed using the following:

Android Tag | Android Build ID | Mali Driver | Android Compatibility Test Suite | Android Vendor Test Suite
android-12 | SP1A.210812.003 | r32p1_01eac0 | 12_r1 (eng.upr473.20210901.005349) [1] | 12_r1 (eng.upr473.20210901.024841)

[1] CtsNNAPITestCases with Mali Driver r32p1_01eac0. The following test is known to be failing: AddTwoWithHardwareBufferInputWithGPUUsage. Investigations indicate this failure is due to Android NN HAL utilizing Gralloc functionality not required by the Gralloc API. This issue has been raised with the Google Android team and is tracked as https://partnerissuetracker.corp.google.com/issues/202025253. Please quote Arm reference MIDCET-3783 when discussing this issue.

Android 11 Compatibility Testing was performed using the following:

Android Tag | Android Build ID | Mali Driver | Android Compatibility Test Suite | Android Vendor Test Suite
android-11.0.0_r1 | RP1A.200720.009 | r31p0_01eac0 | 11_r4 (7352019) | 11_r4 (7337463)
android-11.0.0_r6 | RPM1.210413.002 | r32p0_01eac0 | 11_r4 (7352019) | 11_r4 (7337463)
android-11.0.0_r6 | RPM1.210413.002 | r33p0_01eac0 | 11_r4 (7352019) | 11_r4 (7337463)

Android 10 Compatibility Testing was performed using the following:

Android Tag | Android Build ID | Mali Driver
android-10.0.0_r39 | QQ3A.200605.002.A1 | R23P0_01REL0