DALI v0.11.0

Pre-release
@JanuszL JanuszL released this 26 Jun 20:14

Bug fixes

  • Fix propagation of DALI build SHA, flavor and timestamp (#948)
  • Fix warning (#947)
  • Fix data race in displacement filter (#945)
  • Fix OF sequence number bug (#896)
  • Drop TF 1.14rc0 from test as it doesn't have working TensorBoard (#941)
  • Make Transpose operator as one supporting sequences (#928)
  • Update aarch64 build docs (#931)
  • Fix lint error (#932)
  • Fix lint result being ignored for include/dali. Fix linter errors in include/dali. (#923)
  • Fix floating point precision error to calculate width and height for resizing (#917)
  • Fix wrong registration of python operators after loading plugin (#910)
  • Bound installed torchvision version with present CUDA version in tests (#912)
  • Update README and iterator docs (#889)
  • Fix SSD example and tests (#908)
  • Disable threading inside the OpenCV (#887)
  • Fix lint error printing in Python 3. (#907)
  • Fix compilation error in assert(size(shample_shape)). (#901)
  • Fix CMake warning (#886)
  • Restore performance in JoC RN50 inference (#962)

Improvements

  • Change CPU to batch processing (#936)
  • Add specializations of Operator class for all backends (#934)
  • Replace the displacement flip with dedicated operator. (#849)
  • Replace current crop and slice with new version based on slice kernel (#930)
  • Add multiple inputs and outputs in the python operator (#942)
  • Add ThreadPool to Host Workspace (#935)
  • Make test_detection_pipeline to use DALI extra as an option (#922)
  • Add the sequence reader example (#895)
  • Box encoder gpu offsets (#939)
  • Add cascading notify in thread pool (#933)
  • Add optional offset computation to BoxEncoder (#921)
  • Add sanity test for PyTorch SuperRes example (#633)
  • Remove prebuilt TensorFlow plugins from DALI (#920)
  • New slice operator (#913)
  • Remove unnecessary copies by using const ref or move (#655)
  • view_as_tensor_gpu utility function & copy tensor (#658)
  • Use SmallVector in TensorShape. (#915)
  • Add GTC 2019 video and presentation to the documentation (#926)
  • Optimize slice kernel. (#924)
  • Update nvJPEG version (#919)
  • Rework DeviceGuard to restore original context upon the exit (#882)
  • Slice GPU batched kernel (#905)
  • Add ability to use docker based build for insource-builds (#891)
  • NewCrop: support for 4D inputs (#900)
  • Upgrade PyTorch to 1.1.0 in QA tests config (#909)
  • Device-usable TensorsShape and core utils. (#903)
  • Add SmallVector class. (#902)
  • Add N-dimensional Slice CPU kernel (#893)
  • DALI for aarch64-linux platform (#856)
  • Make linter to work with Python3 (#904)
  • VideoReader stride (#755)
  • Device-side testing. (#897)
  • Update docs of the Readers (#894)
  • Add L1 test for split queues executor (#780)
  • Generic N-dimensional GPU slice kernel (#877)
  • Update info about operators supporting sequences (#885)
  • Move error handling to DALI core. (#867)
  • Add possibility to build debug dali using build.sh (#857)
  • Add proper errors to the ExternalSource (#875)
  • Use raw ImageNet data for RN50 convergence test (#636)
  • Simplified README with links to NVIDIA docs
  • Add as_tensor with provided shape method to python API (#953)

Breaking API changes

  • CPU operators have moved from per-sample processing (the pipeline processes one sample after another, each all the way through the pipeline) to batch processing (all samples are processed by the first operator before moving on to the next operator). This may cause a small performance degradation in some use cases. In the long term, however, it enables optimizations that are currently unavailable, as well as operations that need to see the whole batch during processing (such as random sample blending inside a batch).
  • CropCastPermute is removed. CropMirrorNormalize should be used instead (with the default values for normalization).
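
The change in CPU execution order described above can be pictured with a small plain-Python sketch. It does not use DALI at all: `run_per_sample`, `run_per_batch`, `decode` and `resize` are hypothetical stand-ins invented for illustration, and the real executor is considerably more involved (thread pools, queues, GPU stages).

```python
# Illustrative only: hypothetical stand-ins, not DALI operators or executor code.

def run_per_sample(operators, batch):
    """Old CPU order: each sample passes through every operator
    before the next sample is started."""
    trace = []
    outputs = []
    for index, sample in enumerate(batch):
        for op in operators:
            sample = op(sample)
            trace.append((op.__name__, index))
        outputs.append(sample)
    return outputs, trace


def run_per_batch(operators, batch):
    """New CPU order: each operator processes the whole batch
    before the next operator is started."""
    trace = []
    outputs = list(batch)
    for op in operators:
        for index in range(len(outputs)):
            outputs[index] = op(outputs[index])
            trace.append((op.__name__, index))
    return outputs, trace


def decode(x):
    # Placeholder "operator": any per-sample transformation would do.
    return x + 1


def resize(x):
    # Second placeholder "operator".
    return x * 2


old_outputs, old_trace = run_per_sample([decode, resize], [1, 2])
new_outputs, new_trace = run_per_batch([decode, resize], [1, 2])
```

Both orders produce the same outputs for independent per-sample operators; only the trace differs. In the batch order an operator sees every sample before the next operator runs, which is what makes whole-batch operations possible.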

Known issues

  • The new video reader operator requires NVIDIA VIDEO CODEC SDK support on the platform. NVIDIA GPU Cloud (NGC) optimized containers lack this functionality in the default configuration prior to 19.01. To enable it, run the container with the ‘video’ capability enabled, i.e.:
    -e "NVIDIA_DRIVER_CAPABILITIES=compute,utility,video"
  • The video loader operator requires that key frames occur at least once every 10 to 15 frames of the video stream. If key frames occur less frequently, the returned frames may be out of sync.
  • The DALI TensorFlow plugin may not be compatible with the TensorFlow 1.14.0 release. The plugin requires that the gcc compiler matching the one used to build TensorFlow (gcc 4.8.4 or gcc 4.8.5, depending on the particular version) be present on the system.
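
The key-frame requirement above can be checked before feeding a file to the video loader. The sketch below is plain Python and independent of DALI: `max_keyframe_gap` and `keyframes_dense_enough` are hypothetical helpers, the list of key-frame indices is assumed to come from whatever stream-inspection tool you already use, and the default limit of 10 is simply the conservative end of the "10 to 15 frames" range quoted above.

```python
# Hypothetical helpers for illustration; not part of DALI.

def max_keyframe_gap(keyframe_indices, total_frames):
    """Return the longest distance, in frames, from one key frame to the
    next (or to the start/end of the stream)."""
    if not keyframe_indices:
        # No key frames at all: the whole stream is one gap.
        return total_frames
    ordered = sorted(keyframe_indices)
    boundaries = ordered + [total_frames]
    gaps = [later - earlier for earlier, later in zip(boundaries, boundaries[1:])]
    # Frames before the first key frame also count as a gap.
    return max([ordered[0]] + gaps)


def keyframes_dense_enough(keyframe_indices, total_frames, limit=10):
    """True if no gap exceeds `limit` frames (assumed threshold)."""
    return max_keyframe_gap(keyframe_indices, total_frames) <= limit


dense = keyframes_dense_enough([0, 10, 20], total_frames=30)
sparse = keyframes_dense_enough([0, 30, 40], total_frames=50)
```

Here `dense` is true because every gap is 10 frames, while `sparse` is false because the first gap spans 30 frames; a stream like the second one would likely need re-encoding with more frequent key frames.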

Binary builds

Install via pip for CUDA 9:
pip install --extra-index-url http://developer.download.nvidia.com/compute/redist/cuda/9.0 nvidia-dali==0.11.0
or for CUDA 10:
pip install --extra-index-url http://developer.download.nvidia.com/compute/redist/cuda/10.0 nvidia-dali==0.11.0

Or use direct download links (CUDA 9.0):

Or use direct download links (CUDA 10.0):

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here