Release DALI v1.0.0 · NVIDIA/DALI

Key Features and Enhancements

This DALI release includes the following key features and enhancements.

The API documentation has been improved:
- The functional API has become the main DALI API (#2653).
- Rewrote all examples to use the functional API (#2761, #2755, #2744, #2748, #2745, and #2716).
- Applied layout and editorial changes (#2729, #2730, #2713, #2710, #2703, and #2694).
New operators:
- A GridMask GPU operator for GridMask data augmentation (#2652).
- A RandomObjectBBox operator with caching to randomly select a bounding box (#2718, #2696, #2677, and #2657).
- A MultiPaste operator, which is required to implement Mosaic augmentation (#2583).
External Source can now run the per-sample callbacks in parallel (#2543).
Added the pipeline_def decorator, which is an easier way to define a pipeline with the functional API (#2757 and #2629).
Moved all decoders to a dedicated Python module (#2741, #2743, and #2725).
Moved all readers to a dedicated Python module (#2720, #2721, #2717, #2715, and #2722).
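As an illustration of the pipeline_def decorator together with the new readers and decoders modules, here is a minimal sketch. It assumes DALI 1.0.0 is installed and a GPU is available. The names make_image_pipeline and image_pipeline, and the data_dir argument, are hypothetical and chosen only for this example. DALI is imported inside the helper so that the snippet can be loaded on a machine without DALI.

```python
def make_image_pipeline():
    """Return a pipeline factory built with the pipeline_def decorator.

    Hypothetical helper for illustration; DALI is imported lazily so this
    module can be loaded even where DALI is not installed.
    """
    from nvidia.dali import pipeline_def, fn

    @pipeline_def
    def image_pipeline(data_dir):
        # Readers now live in the fn.readers module (formerly fn.file_reader).
        jpegs, labels = fn.readers.file(file_root=data_dir)
        # Decoders now live in the fn.decoders module (formerly fn.image_decoder).
        images = fn.decoders.image(jpegs, device="mixed")
        return images, labels

    return image_pipeline


# Usage (requires DALI, a GPU, and a directory of images, assumed here to be "images/"):
#   image_pipeline = make_image_pipeline()
#   pipe = image_pipeline("images/", batch_size=8, num_threads=2, device_id=0)
#   pipe.build()
#   images, labels = pipe.run()
```

The decorated function only describes the processing graph. Pipeline arguments such as batch_size, num_threads, and device_id are passed as keyword arguments when the decorated function is called.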
Exposed the pipeline output names in the C API (#2665).
Introduced the following named Slice operator arguments (#2625):
- start/rel_start
- end/rel_end
- shape/rel_shape
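The relationship between the absolute and relative arguments can be sketched in plain Python for a single axis. This is not DALI's implementation: resolve_window and to_absolute are hypothetical helpers, and the assumptions are that relative values are fractions of the input extent and that they are rounded to the nearest integer.

```python
def to_absolute(extent, absolute=None, relative=None):
    """Convert one coordinate to absolute units along an axis of length `extent`."""
    if (absolute is None) == (relative is None):
        raise ValueError("pass exactly one of the absolute or relative value")
    if absolute is not None:
        return int(absolute)
    # Assumption: relative coordinates are fractions of the extent, rounded to nearest.
    return int(round(relative * extent))


def resolve_window(extent, *, start=None, rel_start=None,
                   end=None, rel_end=None, shape=None, rel_shape=None):
    """Return the (begin, end) window described by start plus either end or shape."""
    begin = to_absolute(extent, start, rel_start)
    has_end = end is not None or rel_end is not None
    has_shape = shape is not None or rel_shape is not None
    if has_end == has_shape:
        raise ValueError("pass either an end or a shape, but not both")
    if has_end:
        stop = to_absolute(extent, end, rel_end)
    else:
        stop = begin + to_absolute(extent, shape, rel_shape)
    return begin, stop


# The same window on a 200-pixel axis, described three ways.
window_abs = resolve_window(200, start=50, end=150)
window_rel = resolve_window(200, rel_start=0.25, rel_shape=0.5)
window_mix = resolve_window(200, rel_start=0.25, rel_end=0.75)
```

All three calls describe the window from 50 to 150, which shows why each argument comes in an absolute and a relative form.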
Enabled additional codecs and demuxers in FFmpeg (#2651).
Added an option to disable the first batch preparation during the iterator construction (#2664).

Fixed issues

This DALI release includes the following fixes:

Fixed the JPEG 2000 ROI decoding (#2692).
Fixed the layout length check in Transpose (#2693).
Fixed the .gpu() usage detection and error for CPU-only pipelines (#2682).

Improvements

Rework frameworks notebooks to fn API (#2761)
Bump up OpenCV-python version in tests (#2749)
Enhance deprecated argument documentation (#2755)
Convert notebooks to fn API: audio_processing, custom_operator, serialization (#2744)
Expose all pipeline constructor arguments as properties. (#2757)
Convert notebooks to fn API: sequence_processing (#2748)
Gridmask Gpu (#2652)
Run external source callback in parallel (#2543)
Bump up nvidia-tensorflow version to 1.15.5 21.02 (#2738)
Rewrite image processing examples to fn api. (#2745)
Update augmentation gallery (#2716)
Remove dynlink CUDA libs from the build image (#2739)
Rework getting started (#2729)
Adjust Python decoders tests to decoders module (#2741)
Adjust notebooks to new decoder module (#2743)
Update memory resource interfaces. (#2742)
Move decoders to decoders module (#2725)
Add Examples and Tutorials metadata title (#2730)
Adjust test to new readers module (#2720)
Adjust examples to new readers module (#2721)
Documentation home update (#2713)
Move tfrecord reader to readers module (#2722)
Move readers to dedicated submodule (#2717)
Add hash-based caching to RandomObjectBBox. (#2718)
Add break of VideoReader loop when keyframe past requested has been reached (#2706)
Improve set_outputs to accept list or tuple of data nodes as well (#2698)
Documentation: New layout of Examples and Tutorials section (#2710)
Rename test files for readers (#2715)
Add error checking if provided shape to tfrecord can house underlying data (#2705)
Documentation editorial changes: Init caps for all headings, Copyright update (#2703)
Add documentation to functional API (all fn.*) + New documentation layout (#2653)
Parallel random object BBox (#2677)
Rework ThreadPool and spinlock (#2696)
Improvements in Dockerfile.deps so that RUN commands are easily run in a non-docker environment (#2686)
Fix formatting of Resnet-N with Tensorflow example (#2694)
Operator RandomObjectBBox (#2657)
MultiPaste operator (#2583)
Add better exception granularity to memory::alloc_shared and memory::alloc_unique (#2683)
Make DALI pipeline use default seed (-1) when None is set to seed (#2676)
Make preparation of the first batch during the iterator construction optional (#2664)
Parallelize commands in bundle-wheel.sh (#2672)
Pipeline decorator (#2629)
Move to CUDA 11.2 update 1 (#2668)
Make sure that OpenCV decoding fallback follows EXIF information handling (#2666)
Expose names of Pipeline outputs in C API (#2665)
Enable named Slice arguments: start/rel_start, end/rel_end, shape/rel_shape (#2625)
Update nvidia-tensorflow in qa scripts to 20.12 (#2654)
Enable more codecs and demuxers in FFmpeg (#2651)

Bug fixes

Fix paddle ssd (#2765)
Fix Gluon example (#2764)
Remove redundant dimension from Optical Flow example. (#2762)
Fix 403 error when downloading MNIST dataset in PyTorch Lightning example (#2759)
Fix documentation instances of deprecated fn.image_decoder (#2754)
Shutdown executor when an error occurs in the executor itself, not in one of operators. (#2750)
Fix libcufile.so name to have *.0 suffix (#2735)
Fix test exclude pattern for Xavier (#2731)
Fix auto replacement of deprecated args for schema inheritance (#2733)
Fix constant input promotion for mixed backend. (#2726)
Fix type of slice's rel_shape argument (#2714)
Fix a regression in RandomObjectBBox: weights not set to default. (#2719)
Update TensorFlow ResNet50 example to work with the latest TF 2.4.x version (#2704)
Add auto generated docs files to .gitignore (#2711)
Update DALI PyTorch Lightning example to work with the newest Lightning (#2697)
Fix JPEG2K fused decoding (with ROI), add native tests for JP2k decoding (#2692)
Fix TL1_tensorflow-dali_test (#2687)
Remove unnecessary cuda runtime dependency from alloc.h (#2691)
Fix layout length check in Transpose. (#2693)
Replace eval with safer ast.literal_eval (#2690)
Fix .gpu usage detection and error for CPU only pipelines (#2682)
Add support for TensorFlow 2.4.1 in tests and for TF plugin (#2679)
Fix wrong early exit in function inside bundle-wheel.sh (#2675)
Fix apex compilation on Ubuntu 20.04 in TL1_ssd_training (#2671)
Fix cmake installation in TL1 for Ubuntu 20.04 (#2669)
Remove the split stages implementation of the hybrid image decoder (#2753)

Breaking API changes

There are no breaking changes in this DALI release.

Deprecated features

fn.audio_decoder / ops.AudioDecoder has been renamed to fn.decoders.audio / ops.decoders.Audio.
fn.image_decoder / ops.ImageDecoder has been renamed to fn.decoders.image / ops.decoders.Image.
fn.image_decoder_crop / ops.ImageDecoderCrop has been renamed to fn.decoders.image_crop / ops.decoders.ImageCrop.
fn.image_decoder_random_crop / ops.ImageDecoderRandomCrop has been renamed to fn.decoders.image_random_crop / ops.decoders.ImageRandomCrop.
fn.image_decoder_slice / ops.ImageDecoderSlice has been renamed to fn.decoders.image_slice / ops.decoders.ImageSlice.
fn.caffe2_reader / ops.Caffe2Reader has been renamed to fn.readers.caffe2 / ops.readers.Caffe2.
fn.caffe_reader / ops.CaffeReader has been renamed to fn.readers.caffe / ops.readers.Caffe.
fn.coco_reader / ops.CocoReader has been renamed to fn.readers.coco / ops.readers.Coco.
fn.file_reader / ops.FileReader has been renamed to fn.readers.file / ops.readers.File.
fn.mxnet_reader / ops.MXNetReader has been renamed to fn.readers.mxnet / ops.readers.MXNet.
fn.nemo_asr_reader / ops.NemoAsrReader has been renamed to fn.readers.nemo_asr / ops.readers.NemoAsr.
fn.numpy_reader / ops.NumpyReader has been renamed to fn.readers.numpy / ops.readers.Numpy.
fn.sequence_reader / ops.SequenceReader has been renamed to fn.readers.sequence / ops.readers.Sequence.
fn.tfrecord_reader / ops.TFRecordReader has been renamed to fn.readers.tfrecord / ops.readers.TFRecord.
fn.video_reader / ops.VideoReader has been renamed to fn.readers.video / ops.readers.Video.
fn.video_reader_resize / ops.VideoReaderResize has been renamed to fn.readers.video_resize / ops.readers.VideoResize.
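
Because the renames listed above are mechanical, older scripts can often be updated with a plain text substitution. The following is a minimal sketch, not a tool shipped with DALI. FN_RENAMES and modernize are hypothetical names, and only the fn.* spellings are covered; the ops.* names could be handled with a second table built the same way.

```python
import re

# Deprecated fn.* names and their replacements, taken from the list above.
FN_RENAMES = {
    "fn.audio_decoder": "fn.decoders.audio",
    "fn.image_decoder": "fn.decoders.image",
    "fn.image_decoder_crop": "fn.decoders.image_crop",
    "fn.image_decoder_random_crop": "fn.decoders.image_random_crop",
    "fn.image_decoder_slice": "fn.decoders.image_slice",
    "fn.caffe2_reader": "fn.readers.caffe2",
    "fn.caffe_reader": "fn.readers.caffe",
    "fn.coco_reader": "fn.readers.coco",
    "fn.file_reader": "fn.readers.file",
    "fn.mxnet_reader": "fn.readers.mxnet",
    "fn.nemo_asr_reader": "fn.readers.nemo_asr",
    "fn.numpy_reader": "fn.readers.numpy",
    "fn.sequence_reader": "fn.readers.sequence",
    "fn.tfrecord_reader": "fn.readers.tfrecord",
    "fn.video_reader": "fn.readers.video",
    "fn.video_reader_resize": "fn.readers.video_resize",
}

# Longest names first, so fn.image_decoder_crop is never matched as fn.image_decoder.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(name) for name in sorted(FN_RENAMES, key=len, reverse=True))
    + r")\b"
)


def modernize(source):
    """Replace deprecated fn.* operator names in a source string."""
    return _PATTERN.sub(lambda match: FN_RENAMES[match.group(1)], source)


old_code = "images = fn.image_decoder_crop(fn.file_reader(file_root=d)[0])"
new_code = modernize(old_code)
```

A text substitution like this does not understand Python scoping or import aliases, so the result should be reviewed before it is committed.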

Known issues:

The video loader operator requires that the key frames occur at least once every 10 to 15 frames of the video stream. If the key frames occur less frequently, the returned frames may be out of sync.
The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.
To use DALI with a TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that was used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows best performance when run in Docker with escalated privileges, for example:
- privileged=yes in Extra Settings for AWS data points
- --privileged or --security-opt seccomp=unconfined for bare Docker

Binary builds

Install via pip for CUDA 10:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda100==1.0.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda100==1.0.0

or for CUDA 11:

The CUDA 11.0 build uses CUDA toolkit enhanced compatibility. It is built with the latest CUDA 11.x toolkit
but can run on the latest stable CUDA 11.0 capable drivers (450.80 or later).
Using the latest driver may enable additional functionality.
More details can be found in the enhanced CUDA compatibility guide.

pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==1.0.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==1.0.0

Or use direct download links (CUDA 10.0):

Or use direct download links (CUDA 11.0):

FFmpeg source code:

This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here

Libsndfile source code:

https://developer.download.nvidia.com/compute/redist/nvidia-dali/libsndfile-1.0.28.tar.gz
