
09_02_06_00

@vtrip97 released this 08 Apr 14:42

New in this Release

| Description | Notes |
| --- | --- |
| Support for new device J722S/AM67A | |
| Support for vision transformer models (DeiT, Swin, DETR) | Added/optimized new operators: Matmul, broadcasted Matmul/Eltwise, 2D Softmax, LayerNorm, patch embedding, patch merging, GeLU, SiLU. TDA4VM has limited validation; refer to TIDL-3867 |
| Support for ConvNext and YOLOv8 model architectures | Added/optimized new operators: object detection layer for YOLOv8. TDA4VM has limited validation for Matmul with variable input; refer to TIDL-3867 |
| Improved robustness for low-latency inference mode (advanced_options:inference_mode = TIDL_infereneModeLowLatency) | Only applicable to AM69A/J784S4 |
| Support for non-linear activation functions (Tanh, Sigmoid, Softmax, GELU, ELU, SiLU) on AM62A and J722S | Other devices already support these from previous release(s) |
| Optimization of the ScatterND (sum) operator | |
| Migration to TFLite-RT version 2.12 | |
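For context, the inference mode called out above is selected through the runtime's compilation options. The snippet below is a hedged sketch only: the option names are taken from this page, the value 2 comes from TIDL-3641's "inference mode = 2", and the default-mode value and surrounding compile/runtime API are assumptions that depend on the edgeai-tidl-tools release.

```python
# Hedged sketch: selecting TIDL's low-latency inference mode via the
# advanced_options keys mentioned in these notes. The value 2 corresponds
# to "inference mode = 2" (low latency) per TIDL-3641; the default value 0
# is an assumption. The compile/runtime API that consumes this dict is
# release-specific and not shown here.
TIDL_INFERENCE_MODE_DEFAULT = 0      # assumed default-mode value
TIDL_INFERENCE_MODE_LOW_LATENCY = 2  # "inference mode = 2" per TIDL-3641

compile_options = {
    "advanced_options:inference_mode": TIDL_INFERENCE_MODE_LOW_LATENCY,
    # Per TIDL-3868, high_resolution_optimization must stay disabled
    # when compiling vision transformer models.
    "advanced_options:high_resolution_optimization": 0,
}
```

Note that per the tables below, low-latency mode is only applicable to AM69A/J784S4 and is not supported on J722S (TIDL-3871).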

Fixed in this Release

| ID | Description | Affected Platforms |
| --- | --- | --- |
| TIDL-2950 | 7x7 depthwise separable convolution with number of groups greater than panelWidth / kernel rows results in wrong output on target/EVM | All except AM62, TDA4VM |
| TIDL-3873 | Transpose behavior is not stable across different input combinations | All except AM62 |
| TIDL-3874 | Matmul operator has issues (A) with variables/activations as inputs and (B) with different dimensions | All except AM62 |
| TIDL-3833 | Model inference gets stuck in a conv layer on target/EVM but works in host emulation, with the following warning during the init stage of inference: "WARNING: srcJoint freq greater than mapping buffer space. Might cause overflow!" | All except TDA4VM, AM62 |
| TIDL-3831 | Max Pool with asymmetric stride on target/EVM has a functional mismatch with host emulation; the target behavior is incorrect | All except AM62 |
| TIDL-3812 | TVM_DLR: models with two-dimensional Softmax have a functional issue | All except AM62 |
| TIDL-3773 | Layers with multiple consumers running in asymmetric quantization give wrong output if any of the consumers does not support asymmetric quantization | All except AM62, TDA4VM |
| TIDL-3747 | Resize layer with "coordinate_transformation_mode": "align_corners" not supported in TIDL | All except AM62 |
| TIDL-3714 | Protobuf version is not in sync with what is required for model compilation | All except AM62 |
| TIDL-3679 | Model compilation fails with quantization_scale_type:4 and tensor_bits:16 | All except AM62 |
| TIDL-3659 | Concat layer along height/width gives wrong output on target when the number of input channels is one | All except AM62 |
| TIDL-3648 | Concat layer gives wrong output on target/EVM with the following message: "WorkloadUnit_XfrLinkInit: Error: Out of channel:" | All except AM62 |
| TIDL-3641 | Low-latency inference mode (inference mode = 2) had undergone limited functional validation | AM69A (J784S4) |
| TIDL-3010 | Data convert layer that performs a layout change (from NHWC to NCHW) hangs on target/EVM when the input tensor shape is of the form 1x1x1xN | All except AM62 |
| TIDL-2878 | Object detection post-processing crashes if not all convolution heads are part of the same subgraph | All except AM62 |
| TIDL-2821 | Non-depthwise-separable convolution layers with input pad = 0 running in 16-bit hang on EVM | All except AM62 |
| TIDL-1878 | Custom layer with float output results in an error during compilation | All except AM62 |

Known Issues

| ID | Description | Affected Platforms | Occurrence | Workaround in this Release |
| --- | --- | --- | --- | --- |
| TIDL-3863 | Networks with 7x7 depthwise separable layers and a very large number of layers fail to compile. Refer to the error message "Memory limit exceeded for Workload Creation. Max number of Workload Limit per core is" during the compilation stage | All except AM62 | Very rare | Modify the network to avoid 7x7 DWS layers |
| TIDL-3866 | Vision transformers with the LayerNorm operator in 16-bit data type can have a bit mismatch between host emulation and target/EVM. This mismatch (1-bit delta) is harmless given correct functional behavior and can be ignored | All except AM62 | Frequent | None |
| TIDL-3867 | Vision transformers and Matmul with variable input have undergone limited validation on TDA4VM/J721E | TDA4VM/J721E | Frequent | None |
| TIDL-3864 | Matmul with broadcast is supported only with the following constraint: if the input tensor dimensions to Matmul are B1 x N1 x C1 x H1 x W1 and B2 x N2 x C2 x H2 x W2, then either B1 = N1 = C1 = 1 or B2 = N2 = C2 = 1 must hold | All except AM62 | Frequent | None |
| TIDL-3865 | Eltwise with broadcast is supported only with the following constraints: if the input tensor dimensions to Eltwise are B1 x N1 x C1 x H1 x W1 and B2 x N2 x C2 x H2 x W2, then either B1 = N1 = C1 = 1 or B2 = N2 = C2 = 1 must hold, and W1 must equal W2. Examples: (1) 1 x 1 x 1 x 1 x 5 and 1 x 1 x 1 x 10 x 5 – supported; (2) 1 x 1 x 1 x 10 x 5 and 1 x 1 x 20 x 10 x 5 – supported; (3) 1 x 1 x 1 x 10 x 1 and 1 x 1 x 1 x 10 x 5 – not supported | All except AM62 | Frequent | None |
| TIDL-3868 | Vision transformer support has the following constraints: (1) advanced_options:inference_mode = TIDL_infereneModeLowLatency is not supported; (2) advanced_options:high_resolution_optimization = 1 is not supported; (3) mixed precision is not supported | All except AM62 | Frequent | None |
| TIDL-3870 | Partial network with batch dimension is not supported | All except AM62 | Frequent | Use the scripts provided as part of model optimization |
| TIDL-2991 | Non-strided row-flow convolution with top pad > 1 and procSize < inWidth can produce incorrect outputs | All except AM62 | Rare | Modify the network to avoid this situation |
| TIDL-2947 | Convolution with pad greater than the input width produces incorrect outputs | All except AM62 | Rare | Modify the network to avoid this situation |
| TIDL-3704 | CPP inference mode of ONNX RT throws the error below and behaves functionally incorrectly: "onnxruntime::common::Status onnxruntime::IExecutionFrame::GetOrCreateNodeOutputMLValue(int, int, const onnxruntime::TensorShape*, OrtValue*&, const onnxruntime::Node&) shape && tensor.Shape() == *shape was false. OrtValue shape verification failed." | All except AM62 | Rare | Set export TIDL_RT_ONNX_VARDIM=1 to prevent this |
| TIDL-2592 | TFLite-RT with TIDL-RT delegation supports models with only 4-dimensional tensors | All except AM62 | Rare | None |
| TIDL-3872 | Preemption of a network by another network is not supported | J722S | Frequent | None |
| TIDL-3871 | Low-latency inference mode (a single network instance split across multiple C7x cores), expressed by the option advanced_options:inference_mode = TIDL_infereneModeLowLatency, is not supported | J722S | Frequent | None |
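The broadcast constraints in TIDL-3864/TIDL-3865 can be sketched as a quick shape check. This is an illustrative Python helper, not part of TIDL: shapes are assumed to be 5-tuples (B, N, C, H, W) as written in the issue descriptions, and the three example shape pairs come from TIDL-3865.

```python
# Illustrative sketch of the broadcast constraints in TIDL-3864/3865:
# both ops require one operand's B, N and C dims to all be 1;
# Eltwise additionally requires the W dims to match.
# Shapes are 5-tuples (B, N, C, H, W).

def matmul_broadcast_ok(a, b):
    """TIDL-3864: either B1 = N1 = C1 = 1 or B2 = N2 = C2 = 1 must hold."""
    return a[:3] == (1, 1, 1) or b[:3] == (1, 1, 1)

def eltwise_broadcast_ok(a, b):
    """TIDL-3865: the Matmul constraint plus W1 == W2."""
    return matmul_broadcast_ok(a, b) and a[4] == b[4]

# The three examples listed under TIDL-3865:
print(eltwise_broadcast_ok((1, 1, 1, 1, 5), (1, 1, 1, 10, 5)))    # supported -> True
print(eltwise_broadcast_ok((1, 1, 1, 10, 5), (1, 1, 20, 10, 5)))  # supported -> True
print(eltwise_broadcast_ok((1, 1, 1, 10, 1), (1, 1, 1, 10, 5)))   # not supported -> False
```

Running such a check on a model's Matmul/Eltwise input shapes before compilation can flag graphs that would hit these two issues.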