Releases: Xilinx/Vitis-AI

Vitis AI 3.5 Release

29 Jun 14:15

Release Notes 3.5

Version Compatibility

Vitis™ AI v3.5 and the DPU IP released with the v3.5 branch of this repository are verified as compatible with Vitis, Vivado™, and PetaLinux version 2023.1. If you are using a previous release of Vitis AI, you should review the version compatibility matrix for that release.

Documentation and Github Repository

  • Merged UG1313 into UG1414
  • Streamlined UG1414 to remove redundant content
  • Streamlined UG1414 to focus exclusively on core tool usage. Core tools such as the Optimizer, Quantizer, and Compiler are now used across multiple targets (e.g., Ryzen™ AI, EPYC™), and this change makes UG1414 more portable to those targets
  • Migrated Adaptable SoC and Alveo specific content from UG1414 to Github.IO
  • New Github.IO Toctree structure
  • Integrated VART Runtime APIs in Doxygen format

Docker Containers and GPU Support

  • Removed Anaconda dependency from TensorFlow 2 and PyTorch containers in order to address Anaconda commercial license requirements
  • Updated Docker container to disable Ubuntu 18.04 support (which was available in Vitis AI but not officially supported). This was done to address CVE-2021-3493.

Model Zoo

  • Add more classic models without modification such as YOLO series and 2D Unet
  • Provide model info card for each model and Jupyter Notebook tutorials for new models
  • New copyleft repo for GPL license models

ONNX CNN Quantizer

  • Initial release
  • This is a new quantizer that supports the direct PTQ quantization of ONNX models for DPU. It is a plugin built for the ONNXRuntime native quantizer.
  • Supports power-of-two quantization with both QDQ and QOP format.
  • Supports Non-overflow and Min-MSE quantization methods.
  • Supports various quantization configurations in power-of-two quantization in both QDQ and QOP format.
  • Supports signed and unsigned configurations.
  • Supports symmetry and asymmetry configurations.
  • Supports per-tensor and per-channel configurations.
  • Supports ONNX models in excess of 2GB.
  • Supports the use of the CUDAExecutionProvider for calibration in quantization.
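
As a rough illustration of power-of-two quantization and the two calibration methods named above, the sketch below picks a fractional bit position for a signed 8-bit tensor in plain Python. The function names (`quantize_pof2`, `non_overflow_pos`, `min_mse_pos`), the search window, and the sample data are hypothetical and are not the quantizer's API; the released plugin may select positions differently.

```python
import math

def quantize_pof2(values, pos, bits=8):
    """Quantize then dequantize with scale 2**-pos (signed, symmetric)."""
    scale = 2.0 ** -pos
    qmin, qmax = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return [max(qmin, min(qmax, round(v / scale))) * scale for v in values]

def non_overflow_pos(values, bits=8):
    """Largest pos for which the scaled largest magnitude stays within qmax."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return 0
    qmax = (1 << (bits - 1)) - 1
    return math.floor(math.log2(qmax / max_abs))

def mse(values, pos, bits=8):
    """Mean squared error between the input and its quantized version."""
    deq = quantize_pof2(values, pos, bits)
    return sum((a - b) ** 2 for a, b in zip(values, deq)) / len(values)

def min_mse_pos(values, bits=8, window=4):
    """Try a few positions above the non-overflow one; keep the lowest MSE."""
    base = non_overflow_pos(values, bits)
    return min(range(base, base + window + 1), key=lambda p: mse(values, p, bits))

# Made-up data: mostly small values plus one outlier.
data = [0.01 * i for i in range(-50, 51)] + [4.0]
print(non_overflow_pos(data), min_mse_pos(data))
```

The non-overflow choice never clips the largest value, while the Min-MSE search may accept some clipping of outliers in exchange for finer resolution on the bulk of the values.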

PyTorch CNN Quantizer

  • PyTorch 1.13 and 2.0 support
  • Mixed precision quantization support, supporting float32/float16/bfloat16/intx mixed quantization
  • Support of bit-wise accuracy cross check between quantizer and ONNX-runtime
  • Split and chunk operators are automatically converted to slicing
  • Dict input/output support for model forward function
  • Keywords argument support for model forward function
  • Matmul subroutine support
  • Add support for BFP data type quantization
  • QAT supports training on multiple GPUs
  • QAT supports operations with multiple inputs or outputs
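
For readers unfamiliar with BFP (block floating point), the following is a minimal sketch of the general idea: values in a block share one exponent and each keeps a small signed mantissa. It is not the quantizer's implementation; the function names, block size, and mantissa width are assumptions chosen for illustration.

```python
import math

def bfp_quantize(block, mantissa_bits=8):
    """Quantize then dequantize one block using a single shared exponent."""
    max_abs = max(abs(v) for v in block)
    if max_abs == 0.0:
        return list(block)
    # Shared exponent taken from the largest magnitude in the block.
    shared_exp = math.floor(math.log2(max_abs)) + 1
    # Step size so that mantissas fit in a signed mantissa_bits integer.
    step = 2.0 ** (shared_exp - (mantissa_bits - 1))
    qmin, qmax = -(1 << (mantissa_bits - 1)), (1 << (mantissa_bits - 1)) - 1
    return [max(qmin, min(qmax, round(v / step))) * step for v in block]

def bfp_quantize_tensor(values, block_size=4, mantissa_bits=8):
    """Apply bfp_quantize to consecutive blocks of a flat list."""
    out = []
    for start in range(0, len(values), block_size):
        out.extend(bfp_quantize(values[start:start + block_size], mantissa_bits))
    return out

# A small value sharing a block with large ones loses precision...
print(bfp_quantize([1.0, 0.5, -0.25, 0.001]))
# ...but keeps it when its block contains only small values.
print(bfp_quantize_tensor([1.0, 0.5, 0.001, 0.002], block_size=2))
```

Smaller blocks track local magnitude more closely at the cost of storing more exponents.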

TensorFlow 2 CNN Quantizer

  • Updated to support TensorFlow 2.12 and Python 3.8.
  • Adds support for quantizing subclass models.
  • Adds support for mixed precision with layer-wise data type configuration, covering float32, float16, bfloat16, and int quantization.
  • Adds support for BFP data types, and adds a new quantize strategy called 'bfp'.
  • Adds support to quantize Keras nested models.
  • Adds experimental support for quantizing the frozen pb format model in TensorFlow 2.x.
  • Adds a new 'gpu' quantize strategy which uses float scale quantization and is used in GPU deployment scenarios.
  • Adds support for exporting the quantized model to frozen pb format or ONNX format.
  • Adds support for exporting the quantized model with power-of-two scales to frozen pb format with "FixNeuron" inside, for compatibility with some compilers that take pb format input.
  • Adds support for splitting Avgpool and Maxpool with large kernel sizes into smaller kernel sizes.
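
To see why a large pooling kernel can be split into smaller ones, the sketch below checks the simplest case in one dimension: a non-overlapping pool with kernel 4 gives the same result as two stacked pools with kernel 2. This is only an illustration of the equivalence under those assumptions (equal window sizes, stride equal to kernel, no padding); the helper names are hypothetical and the quantizer's actual graph transformation may handle more general cases.

```python
def pool_1d(xs, kernel, stride, reduce_fn):
    """Apply reduce_fn to each window of a 1-D list (no padding)."""
    return [reduce_fn(xs[i:i + kernel])
            for i in range(0, len(xs) - kernel + 1, stride)]

def mean(window):
    return sum(window) / len(window)

xs = [3, 1, 4, 1, 5, 9, 2, 6]

# One large pooling step.
max_large = pool_1d(xs, 4, 4, max)
avg_large = pool_1d(xs, 4, 4, mean)

# The same result from two smaller pooling steps applied in sequence.
max_split = pool_1d(pool_1d(xs, 2, 2, max), 2, 2, max)
avg_split = pool_1d(pool_1d(xs, 2, 2, mean), 2, 2, mean)

print(max_large, max_split)
print(avg_large, avg_split)
```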

Bug Fixes:

  1. Fixes a gradient bug in the 'pof2s_tqt' quantize strategy.
  2. Fixes a bug in which the fast fine-tuning process after PTQ changed quantization positions.
  3. Fixes a graph transformation bug when a TFOpLambda op has multiple inputs.

TensorFlow 1 CNN Quantizer

  • Adds support for fast fine-tuning that improves PTQ accuracy.
  • Adds support for folding Reshape and ResizeNearestNeighbor operators.
  • Adds support for splitting Avgpool and Maxpool with large kernel sizes into smaller kernel sizes.
  • Adds support for quantizing Sum, StridedSlice, and Maximum operators.
  • Adds support for setting the input shape of the model, which is useful in the deployment of models with undefined input shapes.
  • Adds support for setting the opset version in exporting onnx format.

Bug Fixes:

  1. Fixes a bug where the AddV2 operation is misidentified as a BiasAdd.

Compiler

  • New operators supported: Broadcast add/mul, Bilinear downsample, Trilinear downsample, Group conv2d, Strided-slice
  • Performance improved on XV2DPU
  • Improved error messages
  • Reduced compilation time

PyTorch Optimizer

  • Removed requirement for license purchase
  • Migrated to Github open-source
  • Supports PyTorch 1.11, 1.12 and 1.13
  • Supports pruning of grouped convolution
  • Supports setting the number of channels to be a multiple of the specified number after pruning
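
The channel-multiple option can be pictured with the small sketch below, which rounds a pruned channel count up to a chosen multiple. The function names, the round-up direction, and the cap at the original width are assumptions for illustration; the optimizer's own rounding rule is not described in these notes.

```python
def round_up(value, multiple):
    """Round value up to the nearest multiple."""
    if multiple <= 0:
        raise ValueError("multiple must be positive")
    return ((value + multiple - 1) // multiple) * multiple

def pruned_channels(original, ratio, multiple):
    """Channels kept after pruning `ratio` of them, aligned to `multiple`."""
    kept = int(original * (1.0 - ratio))
    aligned = max(multiple, round_up(kept, multiple))
    # Never exceed the original layer width.
    return min(aligned, original)

print(pruned_channels(64, 0.3, 8))
print(pruned_channels(64, 0.5, 8))
```

Aligned channel counts are commonly chosen to match the parallelism of the target hardware.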

TensorFlow 2 Optimizer

  • Removed requirement for license purchase
  • Migrated to Github open-source
  • Supports TensorFlow 2.11 and 2.12
  • Supports pruning of tf.keras.layers.SeparableConv2D
  • Fixed tf.keras.layers.Conv2DTranspose pruning bug
  • Supports setting the number of channels to be a multiple of the specified number after pruning

Runtime

  • Supports Versal AI Edge VEK280 evaluation kit
  • Buffer optimization for multi-batches to improve performance
  • Add new tensor buffer interface to enhance zero copy

Vitis ONNX Runtime Execution Provider (VOE)

  • Supports ONNX Opset version 18, ONNX Runtime 1.16.0 and ONNX version 1.13
  • Supports both C++ and Python APIs (Python version 3)
  • Supports VitisAI EP and other EPs to work together to deploy the model
  • Provides ONNX examples based on C++ and Python APIs
  • VitisAI EP is open source and upstreamed to ONNX public repo on Github

Library

  • Added three new model libraries and support for five additional models

Model Inspector

  • Supports inspection for new DPU IPs

Profiler

  • Added Profiler support for DPUCV2DX8G

DPU IP - Versal AIE-ML Targets DPUCV2DX8G (Versal AI Edge / Core)

  • First general access release
  • Configurable from C20B1 to C20B14
  • Support most 2D operators required to deploy models found in the Model Zoo
  • General support for the VE2802/VC2802 and V70
  • Early access support for the VE2302 via this lounge

DPU IP - Zynq Ultrascale+ DPUCZDX8G

  • IP has reached maturity
  • No updates for this release
  • No updated reference design (DPU TRD) will be published for minor (i.e., x.5) releases
  • No updated pre-built board image will be published for minor (i.e., x.5) releases

DPU IP - Versal AIE Targets DPUCVDX8H

  • IP has reached maturity
  • No updates for this release
  • No updated reference design (DPU TRD) will be published for minor (i.e., x.5) releases
  • No updated pre-built board image will be published for minor (i.e., x.5) releases

DPU IP - CNN - Alveo Data Center DPUCVDX8G

  • IP has reached maturity
  • No updates for this release
  • No updated reference design (DPU TRD) will be published for minor (i.e., x.5) releases
  • No updated pre-built board image will be published for minor (i.e., x.5) releases

WeGO

  • Enhanced WeGO to support V70 DPU GA release.
  • Upgraded WeGO to provide support for PyTorch 1.13.1 and TensorFlow r2.12.
  • Enhanced WeGO-Torch to support PyTorch 2.0 as a preview feature.
  • Introduced new C++ API support for WeGO-Torch in addition to Python APIs.
  • Implemented WeGO-TF1 and WeGO-TF2 as out-of-tree plugins.

Known Issues

  • Engineering to add comments

AMD, the AMD Arrow logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc.

Vitis AI 3.0 Release

13 Jan 10:04

Release Notes 3.0

Documentation and Github Repository

  • Migrated core documentation to Github.IO.

  • Incorporated offline HTML documentation for air-gapped users.

  • Restructured user documentation.

  • Restructured repository directory structure for clarity and ease-of-use.

Docker Containers and GPU Support

  • Migrated from multi-framework to per framework Docker containers.

  • Enabled Docker ROCm GPU support for quantization and pruning.

Model Zoo

  • Updated Model Zoo with commentary regarding dataset licensing restrictions

  • Added 14 new models and deprecated 28 models for a total of 130 models

  • Added super resolution 4x, as well as 2D and 3D semantic segmentation for Medical applications

  • Optimized models for benchmarks:

    • MLPerf: 3D-Unet

    • FAMBench: MaskRCNN

  • Provides optimized backbones supporting YoloX, v4, v5, v6 and EfficientNet-Lite

  • Ease-of-use enhancements, including replacing markdown-format performance tables with a downloadable Model Zoo spreadsheet

  • Added 72 PyTorch and TensorFlow models for AMD EPYC™ CPUs, targeting deployment with ZenDNN

  • Added models to support AMD GPU architectures based on ROCm and MLGraphX

TensorFlow 2 CNN Quantizer

  • Based on TensorFlow 2.10

  • Updated the Model Inspector for improved accuracy of the partitioning results expected from the DPU compiler.

  • Added support for datatype conversions for float models, including FP16, BFloat16, FP32, and double.

  • Added support for exporting quantized ONNX format models (to support the ONNX Runtime).

  • Added support for new layer types including SeparableConv2D and PReLU.

  • Added support for unsigned integer quantization.

  • Added support for automatic modification of input shapes for models with variable input shapes.

  • Added support to align the input and output quantize positions for Concat and Pooling layers.

  • Added error codes and improved the readability of the error and warning messages.

  • Various bug fixes.

TensorFlow 1 CNN Quantizer

  • Separated the quantizer code from the TensorFlow code, making it a plug-in module to the official TensorFlow code base.

  • Added support for exporting quantized ONNX format models (to support the ONNX Runtime).

  • Added support for datatype conversions for float models, including FP16, BFloat16, FP32 and double.

  • Added support for additional operations, including Max, Transpose, and DepthToSpace.

  • Added support for aligning input and output quantize positions of Concat and Pooling operations.

  • Added support for automatic replacement of Softmax with DPU-accelerated Softmax.

  • Added error codes and improved the readability of the error and warning messages.

  • Various bug fixes.

PyTorch CNN Quantizer

  • Support PyTorch 1.11 and 1.12.

  • Support exporting torch script format quantized model.

  • QAT supports exporting trained model to ONNX and torch script.

  • Support FP16 model quantization.

  • Optimized Inspector to support more pattern types and backward-compatible device assignment.

  • Cover more PyTorch operators: more than 560 types of PyTorch operators are supported.

  • Enhanced parsing to support control flow parsing.

  • Enhanced message system with more useful message text.

  • Support fusing and quantization of BatchNorm without affine calculation.

Compiler

  • Added support for new operators, including: strided_slice, cost volume, correlation 1D & 2D, argmax, group conv2d, reduction_max, reduction_mean

  • Added support for Versal™ AIE-ML architectures DPUCV2DX8G (V70 and Versal AI Edge)

  • Focused effort to improve the intelligibility of error and partitioning messages

PyTorch Optimizer

  • Added support for fine-grained model pruning (sparsity)

  • OFA support for convolution layers with kernel sizes = (1,3) and dilation

  • OFA support for ConvTranspose2D

  • Added pruning configuration that allows users to specify pruning hyper-parameters

  • Specific exception types are defined for each type of error

  • Enhanced parallel model analysis with increased robustness

  • Support for PyTorch 1.11 and 1.12

TensorFlow 2 Optimizer

  • Added support for Keras ConvTranspose2D, Conv3D, ConvTranspose3D

  • Added support for the TFOpLambda operator

  • Added pruning configuration that allows users to specify pruning hyper-parameters

  • Specific exception types are defined for each type of error

  • Added support for TensorFlow 2.10

Runtime and Library

  • Added support for Versal AI Edge VEK280 evaluation kit and Alveo™ V70 accelerator cards (Early Access)

  • Added support for ONNX runtime, with eleven ONNX-specific examples

  • Added four new model libraries to the Vitis™ AI Library and support for fifteen additional models

  • Focused effort to improve the intelligibility of error messages

Profiler

  • Added Profiler support for DPUCV2DX8G (VEK280 Early Access)

  • Added Profiler support for Versal DDR bandwidth profiling

DPU IP - Zynq Ultrascale+ DPUCZDX8G

  • Upgraded to enable Vivado™ and Vitis 2022.2 release

  • Added support for 1D and 2D Correlation, Argmax and Max

  • Reduced resource utilization

  • Timing closure improvements

DPU IP - Versal AIE Targets DPUCVDX8G

  • Upgraded to enable Vivado and Vitis 2022.2 release

  • Added support for 1D and 2D Correlation

  • Added support for Argmax and Max along the channel dimension

  • Added support for Cost-Volume

  • Reduced resource utilization

  • Timing closure improvements

DPU IP - Versal AIE-ML Targets DPUCV2DX8G (Versal AI Edge)

  • Early access release supporting early adopters with an early, unoptimized AIE-ML DPU

  • Supports most 2D operators (currently does not support 3D operators)

  • Supports batch sizes from 1 to 13

  • Supports more than 90 Model Zoo models

DPU IP - CNN - Alveo Data Center DPUCVDX8H

  • Upgraded to enable Vitis 2022.2 release

  • Timing closure improvements via scripts supplied for .xo workflows

DPU IP - CNN - V70 Data Center DPUCV2DX8G

  • Early access release supporting early adopters with an unoptimized DPU

  • Supports most 2D operators (currently does not support 3D operators)

  • Batch size 13 support

  • Supports more than 70 Model Zoo models

WeGO

  • Integrated WeGO with the Vitis-AI Quantizer to enable on-the-fly quantization and improve ease-of-use

  • Introduced serialization and deserialization with the WeGO flow to offer the capability of building once and running anytime

  • Incorporated AMD ZenDNN into WeGO, enabling additional optimization for AMD EPYC CPU targets

  • Improved WeGO robustness to offer a better developer experience and support a wider range of models

Known Issues

  • Bitstream loading error occurs when the AIE-ML DPU application running on the VEK280 kit is interrupted manually

  • HDMI not functional for the early access VEK280 image. The issue will be fixed in the next release

AMD, the AMD Arrow logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc.

Vitis AI 2.5 Release

16 Jun 01:34

New Features/Highlights

AI Model Zoo added 14 new models, including BERT-based NLP, Vision Transformer (ViT), Optical Character Recognition (OCR), Simultaneous Localization and Mapping (SLAM), and more Once-for-All (OFA) models
Added 38 base & optimized models for AMD EPYC server processors
AI Quantizer added a model inspector and now supports TensorFlow 2.8 and PyTorch 1.10
Whole Graph Optimizer (WeGO) supports PyTorch 1.x and TensorFlow 2.x
Deep Learning Processing Unit (DPU) for Versal® ACAP supports multiple compute units (CUs) and a new Arithmetic Logic Unit (ALU) engine; depthwise convolution and more operators are supported by the DPUs on VCK5000 and Alveo™ data center accelerator cards
Inference server supports ZenDNN as backend on AMD EPYC™ server processors
New examples added to Whole Application Acceleration (WAA) for VCK5000 Versal development card and Zynq® UltraScale+™ evaluation kits

Release Notes

AI Model Zoo

Added 14 new models, for a total of 134 models
Expanded model categories for diverse AI workloads:
    Added models for data center application requirements including text detection and end-to-end OCR
    Added BERT-based NLP and Vision Transformer (ViT) models on VCK5000
    More OFA-optimized models, including OFA-RCAN for Super-Resolution and OFA-YOLO for Object Detection
    Added models for industrial vision and SLAM, including Interest Point Detection & Description model and Hierarchical Localization model.
Added 38 base & optimized models for AMD EPYC CPU
EoU enhancement:
    Improved model index by application categories

AI Quantizer-CNN

Added Model Inspector that inspects a float model and shows partition results
Support TensorFlow 2.8 and PyTorch 1.10
Support float-scale and per-channel quantization
Support configuration for different quantize strategies
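
The difference between a single float scale per tensor and one float scale per channel can be sketched as follows. This is a generic illustration in plain Python with made-up weights; the function names are hypothetical and do not correspond to the quantizer's configuration options.

```python
def float_scale(values, bits=8):
    """Float scale that maps the largest magnitude onto the int range."""
    qmax = (1 << (bits - 1)) - 1
    max_abs = max(abs(v) for v in values)
    return max_abs / qmax if max_abs else 1.0

def per_tensor_scales(weight):
    """One scale shared by every channel."""
    scale = float_scale([v for channel in weight for v in channel])
    return [scale] * len(weight)

def per_channel_scales(weight):
    """One scale per output channel."""
    return [float_scale(channel) for channel in weight]

def channel_errors(weight, scales, bits=8):
    """Largest quantization error in each channel for the given scales."""
    qmax = (1 << (bits - 1)) - 1
    errors = []
    for channel, s in zip(weight, scales):
        deq = [max(-qmax - 1, min(qmax, round(v / s))) * s for v in channel]
        errors.append(max(abs(a - b) for a, b in zip(channel, deq)))
    return errors

# Two output channels with very different magnitudes (made-up values).
weight = [[0.5, -0.25], [0.01, -0.02]]
print(channel_errors(weight, per_tensor_scales(weight)))
print(channel_errors(weight, per_channel_scales(weight)))
```

With per-channel scales, the small-magnitude channel is no longer forced to share the step size dictated by the large one.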

AI Optimizer

OFA enhancement:
    Support even kernel size of convolution
    Support ConvTranspose2d
    Updated examples
One-step and iterative pruning enhancement:
    Resume model analysis or search after an exception

AI Compiler

Support ALU for DPUCZDX8G
Support new models

AI Library / VART

Added 6 new model libraries and support 17 new models
Custom Op Enhancement
Added new CPU operators
Xdputil Tool Enhancement
Two new demos on VCK190 Versal development board

AI Profiler

Full support on custom OP and Graph Runner
Stability optimization

Edge DPU-DPUCZDX8G

New ALU engine to replace pool engine and DepthWiseConv engine in MISC:
    ALU: support new features, e.g. large-kernel-size MaxPool, AveragePool, rectangle-kernel-size AveragePool, 16bit const weights
    ALU: support HardSigmoid and HardSwish
    ALU: support DepthwiseConv + LeakyReLU
    ALU: support the parallelism configuration
DPU IP and TRD on ZCU102 with encrypted RTL IP based on 2022.1 Vitis platform

Edge DPU-DPUCVDX8G

Optimized ALU to better support features like channel-attention
Support multiple compute units
Support DepthwiseConv + LeakyReLU
Support Versal DPU IP and TRD on VCK190 with encrypted RTL and AIE code, which remain at C32B1-6/C64B1-5 and are based on the 2022.1 Vitis platform

Cloud DPU-DPUCVDX8H

Enlarged DepthWise convolution kernel size that ranges from 1x1 to 8x8
Support AIE based pooling and ElementWise add & multiply, and big kernel size pooling
Support more DepthWise convolution kernel sizes

Cloud DPU-DPUCADF8H

Support ReLU6/LeakyReLU and MobileNet series models
Fixed the issue of missing directories in some cases in the .XO flow

Whole Graph Optimizer (WeGO)

Support PyTorch 1.x and TensorFlow 2.x in-framework inference
Added 19 PyTorch 1.x/TensorFlow 2.x/TensorFlow 1.x examples, including classification, object detection, segmentation and more

Inference Server

Added gRPC API to inference server flow
Support TensorFlow/PyTorch
Support AMD ZenDNN as backend

WAA

New examples for VCK5000 & ZCU104 - ResNet & adas_detection
New ResNet example containing AIE based pre-processing kernel
Xclbin generation using Pre-built DPU flow for ZCU102/U50 ResNet and adas_detection applications
Xclbin generation using build flow for ZCU104/VCK190 ResNet and adas_detection applications
Porting of all VCK190 examples from ES1 to production version and use base platform instead of custom platform

Vitis AI 2.0 Release

20 Jan 07:21

Release 2.0

New Features/Highlights

  1. General Availability (GA) for VCK190 (Production Silicon), VCK5000 (Production Silicon) and U55C
  2. Add support for newer PyTorch and TensorFlow versions: PyTorch 1.8-1.9, TensorFlow 2.4-2.6
  3. Add 22 new models, including Solo, Yolo-X, UltraFast, CLOCs, PSMNet, FairMOT, SESR, DRUNet, SSR as well as 3 NLP models and 2 OFA (Once-for-all) models
  4. Add the new custom OP flow to run models with DPU un-supported OPs with enhancement across quantizer, compiler and runtime
  5. Add more layers and configurations of DPU for VCK190 and DPU for VCK5000
  6. Add OFA pruning and TF2 keras support for AI optimizer
  7. Run inference directly from Tensorflow (Demo) for cloud DPU

Release Notes

Model Zoo

  • 22 new models added, 130 total
    • 19 new PyTorch models including 3 NLP and 2 OFA models
    • 3 new TensorFlow models
  • Added new application models
    • AD/ADAS: Solo for instance segmentation, Yolo-X for traffic sign detection, UltraFast for lane detection, CLOCs for sensor fusion
    • Medical: SESR for super resolution, DRUNet for image denoising, SSR for spectral remove
    • Smart city and industrial vision: PSMNet for binocular depth estimation, FairMOT for joint detection and Re-ID
  • EoU Enhancements
    • Updated automatic script to search and download required models

Quantizer

  • TF2 quantizer
    • Add support for TF 2.4-2.6
    • Add support for custom OP flow, including shape inference, quantization and dumping
    • Add support for CUDA 11
    • Add support for input_shape assignment when deploying QAT models
    • Improve support for TFOpLambda layers
    • Update support for hardware simulation, including sigmoid layer, leaky_relu layer, global and non-global average pooling layer
    • Bug fixes for sequential models and quantize position adjustment
  • TF1 quantizer
    • Add quantization support for new ops, including hard-sigmoid, hard-swish, element-wise multiply ops
    • Add support for replacing normal sigmoid with hard sigmoid
    • Update support for float weights dumping when dumping golden results
    • Bug fixes for inconsistencies between Python APIs and CLI APIs
  • PyTorch quantizer
    • Add support for PyTorch 1.8 and 1.9
    • Support CUDA 11
    • Support custom OP flow
    • Improve fast finetune performance on memory consumption and accuracy
    • Reduce feature-map memory consumption during quantization
    • Improve QAT functions including better initialization of quantization scale and new API for getting quantizer’s parameters
    • Support quantization of more operations: some 1D and 3D ops, DepthwiseConvTranspose2D, pixel-shuffle, pixel-unshuffle, const
    • Support CONV/BN merging in pattern of CONV+CONCAT+BN
    • Message enhancements to help users locate problems
    • Bug fixes for consistency with hardware
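
The CONV+CONCAT+BN merging mentioned above relies on the standard batch-norm folding algebra. The sketch below is an assumption about how such a pattern can be handled, not the quantizer's code: each branch feeding the concat folds the slice of BN parameters that matches its own channels. A per-channel scalar multiply stands in for a real convolution, and all names and numbers are made up.

```python
import math

def batch_norm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-time batch norm applied per channel."""
    return [(v - m) / math.sqrt(s + eps) * g + b
            for v, g, b, m, s in zip(y, gamma, beta, mean, var)]

def fold_bn(weights, biases, gamma, beta, mean, var, eps=1e-5):
    """Fold per-channel BN parameters into per-channel weights and biases."""
    folded_w, folded_b = [], []
    for w, b, g, bt, m, s in zip(weights, biases, gamma, beta, mean, var):
        k = g / math.sqrt(s + eps)
        folded_w.append(w * k)
        folded_b.append((b - m) * k + bt)
    return folded_w, folded_b

def conv(x, weights, biases):
    """Toy stand-in for a convolution: one scalar weight per output channel."""
    return [w * x + b for w, b in zip(weights, biases)]

# Two branches feed a concat, followed by one BN over all three channels.
wa, ba = [0.5, -1.0], [0.1, 0.2]
wb, bb = [2.0], [-0.3]
gamma, beta = [1.0, 0.5, 2.0], [0.0, 0.1, -0.1]
mean, var = [0.2, -0.1, 0.4], [1.0, 4.0, 0.25]

x = 1.5
reference = batch_norm(conv(x, wa, ba) + conv(x, wb, bb), gamma, beta, mean, var)

# Each branch folds the slice of BN parameters matching its channels.
fwa, fba = fold_bn(wa, ba, gamma[:2], beta[:2], mean[:2], var[:2])
fwb, fbb = fold_bn(wb, bb, gamma[2:], beta[2:], mean[2:], var[2:])
folded = conv(x, fwa, fba) + conv(x, fwb, fbb)

print(reference)
print(folded)
```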

Optimizer

  • TensorFlow 1.15
    • Support tf.keras.Optimizer for model training
  • TensorFlow 2.x
    • Support TensorFlow 2.3-2.6
    • Add iterative pruning
  • PyTorch
    • Support PyTorch 1.4-1.9.1
    • Support shared parameters in pruning
    • Add one-step pruning
    • Add once-for-all (OFA)
    • Unified APIs for iterative and one-step pruning
    • Enable pruned model to be used by quantizer
    • Support nn.Conv3d and nn.ConvTranspose3d

Compiler

  • DPU on embedded platforms
    • Support and optimize conv3d, transposedconv3d, upsample3d and upsample2d for DPUCVDX8G(xvDPU)
    • Improve the efficiency of high resolution input for DPUCVDX8G(xvDPU)
    • Support ALUv2 new features
  • DPU on Alveo/Cloud
    • Support depthwise-conv2d, h-sigmoid and h-swish for DPUCVDX8H(DPUv4E)
    • Support depthwise-conv2d for DPUCAHX8H(DPUv3E)
    • Support high resolution model inference
  • Support custom OP flow

AI Library and VART

  • Support all the new models in Model Zoo: end-to-end deployment in Vitis AI Library
  • Improved GraphRunner to better support custom OP flow
  • Add examples on how to integrate custom OPs
  • Add more pre-implemented CPU OPs
  • DPU driver/runtime update to support Xilinx Device Tree Generator (DTG) for Vivado flow

AI Profiler

  • Support CPU tasks tracking in graph runner
  • Better memory bandwidth analysis in text summary
  • Better performance to enable the analysis of large models

Custom OP Flow

  • Provides new capability of deploying models with DPU unsupported OPs
    • Define custom OPs in quantization
    • Register and implement custom OPs before the deployment by graph runner
  • Add two examples
    • Pointpillars Pytorch model
    • MNIST Tensorflow 2 model

DPU

  • CNN DPU for Zynq SoC / MPSoC, DPUCZDX8G (DPUv2)
    • Upgraded to 2021.2
    • Update interrupt connection in Vivado flow
  • CNN DPU for Alveo-HBM, DPUCAHX8H (DPUv3E)
    • Support depth-wise convolution
    • Support U55C
  • CNN DPU for Alveo-DDR, DPUCADF8H (DPUv3Int8)
    • Updated U200/U250 xclbins with XRT 2021.2
    • Released XO Flow
    • Released IP Product Guide (PG400)
  • CNN DPU for Versal, DPUCVDX8G (xvDPU)
    • C32 (32-aie cores for a single batch) and C64 (64-aie cores for a single batch) configurable
    • Support configurable batch size 1~5 for C64
    • Support and optimize new OPs: conv3d, transposedconv3d, upsample3d and upsample2d
    • Reduce Conv bubbles and compute redundancy
    • Support 16-bit const weights in ALUv2
  • CNN DPU for Versal, DPUCVDX8H (DPUv4E)
    • Support depth-wise convolution with 6 PE configuration
    • Support h-sigmoid and h-swish

Whole App Acceleration

  • Upgrade to Vitis and Vivado 2021.2
  • Custom plugin example: PSMNet using Cost Volume (RTL Based) accelerator on VCK190
  • New accelerator for Optical Flow (TV-L1) on U50
  • High resolution segmentation application on VCK190
  • Options to compare throughput & accuracy between FPGA and CPU Versions
    • Throughput improvements ranging from 25% to 368%
  • Reorganized for better usability and visibility

TVM

  • Add support of DPUs for U50 and U55C

WeGO (Whole Graph Optimizer)

  • Run inference directly from Tensorflow framework for cloud DPU
    • Automatically perform subgraph partitioning and apply optimization/acceleration for DPU subgraphs
    • Dispatch non-DPU subgraphs to TensorFlow running on CPU
  • Resnet50 and Yolov3 demos on VCK5000
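
The partition-and-dispatch idea described above can be sketched on a simplified linear chain of operators. This is not WeGO's algorithm: real graphs are DAGs, and the operator names and the supported-op set below are hypothetical.

```python
def partition(ops, dpu_supported):
    """Group consecutive ops into (device, [ops]) subgraphs."""
    subgraphs = []
    for op in ops:
        device = "DPU" if op in dpu_supported else "CPU"
        if subgraphs and subgraphs[-1][0] == device:
            # Same device as the previous op: extend the current subgraph.
            subgraphs[-1][1].append(op)
        else:
            # Device changed: start a new subgraph.
            subgraphs.append((device, [op]))
    return subgraphs

# Hypothetical model and hypothetical set of DPU-supported operators.
model_ops = ["conv2d", "relu", "conv2d", "topk", "softmax"]
supported = {"conv2d", "relu", "maxpool"}

for device, ops in partition(model_ops, supported):
    print(device, ops)
```

Each DPU subgraph would then be compiled for the accelerator, while CPU subgraphs stay in the framework.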

Inference Server

  • Support xmodel serving in cloud / on-premise (EA)

Known Issues

  • vai_q_caffe hangs when TRAIN and TEST phases point to the same LMDB file
  • TVM compiled Inception_v3 model gives low accuracy with DPUCADF8H (DPUv3Int8)
  • TensorFlow 1.15 quantizer error in QAT caused by an incorrect pattern match

Vitis AI 1.4.1 Release

27 Oct 14:06

Release 1.4.1

New Features/Highlights

  • Vitis AI RNN docker public release, including RNN quantizer and compiler
  • New unified xRNN runtime for U25 & U50LV based on VART Runner interface and XIR xmodels
  • Release Versal DPU TRD based on 2021.1
  • Versal WAA app updated to provide better throughput using the new XRT C++ APIs and zero copy
  • TVM ease-of-use improvement
  • Support VCK190 and VCK5000 production boards
  • Bug fixes, including an xCompiler data alignment issue affecting WAA and a quantizer bug

Vitis AI 1.4 Release

27 Jul 06:05

New Features/Highlights

  • Support new platforms, including Versal ACAP platforms VCK190, VCK5000 and Kria SoM
  • Better PyTorch and TensorFlow model support: PyTorch 1.5-1.7.1, improved quantization for TensorFlow 2.x models
  • New models, including 4D Radar detection, Image-Lidar sensor fusion, 3D detection & segmentation, multi-task, depth estimation, super resolution for automotive, smart medical and industrial vision applications
  • New Graph Runner API to deploy models with multiple subgraphs
  • DPUCADX8G (DPUv1) deprecated in favor of DPUCADF8H (DPUv3Int8)
  • DPUCAHX8H (DPUv3E) and DPUCAHX8L (DPUv3ME) release with xo
  • Classification & Detection WAA examples for Versal (VCK190)

v1.3.2 Release

21 Apr 09:39
  • Enable Ubuntu 20.04 on MPSoC (Vitis AI Runtime and Vitis AI Library)
  • Added environment variable for Vitis AI Library’s model search path
  • Bug fixes for pytorch / LSTM and log improvement

Vitis-AI 1.3.1 Release

23 Mar 07:48
5f40342
  • Updated compiler to improve performance by 5% on average for most models
  • Added zero copy support (new APIs in VART / Vitis AI Library)
  • Added cross-layer equalization support in TensorFlow v1.15
  • Added WAA U50 TRD
  • Updated U280 Pre-processing using Multi-preprocessing JPEG decode kernels
  • Bug fixes and improvements for v1.3
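
For context on the cross-layer equalization item above, the sketch below shows the general technique on two small fully connected layers joined by a ReLU: per-channel scaling balances the weight ranges of the two layers without changing the output. It is a generic illustration with made-up weights and hypothetical function names, not the TensorFlow quantizer's implementation.

```python
import math

def matvec(w, x):
    return [sum(a * b for a, b in zip(row, x)) for row in w]

def forward(w1, w2, x):
    hidden = [max(0.0, v) for v in matvec(w1, x)]  # ReLU
    return matvec(w2, hidden)

def equalize(w1, w2):
    """Rescale channel i of layer 1 and the matching input column of layer 2."""
    new_w1 = [row[:] for row in w1]
    new_w2 = [row[:] for row in w2]
    for i in range(len(w1)):
        r1 = max(abs(v) for v in w1[i])        # range of output channel i
        r2 = max(abs(row[i]) for row in w2)    # range of input column i
        if r1 == 0.0 or r2 == 0.0:
            continue
        s = math.sqrt(r1 / r2)
        new_w1[i] = [v / s for v in w1[i]]
        for row in new_w2:
            row[i] *= s
    return new_w1, new_w2

w1 = [[4.0, -2.0], [0.1, 0.05]]
w2 = [[0.5, 8.0], [-0.25, 2.0]]
e1, e2 = equalize(w1, w2)

x = [1.0, -1.0]
print(forward(w1, w2, x))
print(forward(e1, e2, x))
```

Because ReLU commutes with positive scaling, dividing a channel by `s` in the first layer and multiplying the matching column by `s` in the second leaves the network function unchanged while making both ranges equal.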

Vitis-AI 1.3 Release

17 Dec 22:08
  • Added support for PyTorch and TensorFlow 2.3 frameworks
  • Added more ready-to-use AI models for a wider range of applications, including 3D point cloud detection and segmentation, COVID-19 chest image segmentation and other reference models
  • Unified XIR-based compilation flow from edge to cloud
  • Vitis AI Runtime (VART) fully open source
  • New RNN overlay for NLP applications
  • New CNN DPUs for the low-latency and higher throughput applications on Alveo cards
  • EoU enhancement with Beta version model partitioning and custom layer/operators plug-in

Vitis-AI Release 1.2.1

30 Jul 03:29
0353caa

v1.2.1

  • Added Zynq Ultrascale Plus Whole App examples
  • Updated U50 XRT and shell to Xilinx-u50-gen3x4-xdma-2-202010.1-2902115
  • Updated docker launch instructions
  • Updated TRD makefile instructions