
Releases: AmusementClub/vs-mlrt

v12.2

23 Nov 09:19
Pre-release

Update vsmlrt.py:

  • Introduce a new release artifact ext-models.v12.2.7z, which contains the models from the External Models release; it is not bundled into the full binary release packages (i.e. the cpu, cuda and vk packages). Please refer to the External Models release notes for details on how to use those models.

  • Export a new API, vsmlrt.inference, for running inference with custom models:

    import vsmlrt
    output = vsmlrt.inference(clips, "path/to/onnx", backend=vsmlrt.Backend.TRT(fp16=True))

    If you encounter an error like Cannot find input tensor with name "input" in the network inputs! Please make sure the input tensor names are correct., you can pass vsmlrt.inference(..., input_name=None) or export the model with its input name set to "input".
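
    For example (a minimal sketch; the model path is a placeholder and clips is assumed to be a suitable clip or list of clips):

        import vsmlrt

        # fall back to the model's declared input name instead of assuming "input"
        output = vsmlrt.inference(clips, "path/to/onnx", backend=vsmlrt.Backend.TRT(fp16=True), input_name=None)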

  • Fix trt inference of cugan-pro (3x) models. (#15)

External Models

07 Dec 07:20
Pre-release

More models!

In addition to the bundled models, vs-mlrt can also be used to run a number of external models, with more to come.

Also check the ONNX models provided by the avs-mlrt community.

Usage

If an external model is not supported by the Python wrapper, you can use the generic vsmlrt.inference API to run these models (requires release v12.2 or later).

import vsmlrt
output = vsmlrt.inference(rgbs, "path/to/onnx", backend=vsmlrt.Backend.TRT(fp16=True))

The rife models require auxiliary inputs and should be used via the vsmlrt.RIFE or vsmlrt.RIFEMerge interfaces.

v12.1

16 Nov 10:43
Pre-release

This minor release fixes #9: if vsort/vstrt fail to load the required CUDA DLLs, they no longer crash the entire process.

However, if vs-mlrt is correctly installed, this shouldn't happen. Please report an issue if you can't access the core.trt or core.ort namespaces. A common mistake is forgetting to extract the vsmlrt-cuda.v12.1.7z package alongside the VSORT-Windows-x64.v12.1.7z or VSTRT-Windows-x64.v12.1.7z packages. If in doubt, CUDA users should use the fully bundled release vsmlrt-windows-x64-cuda.v12.1.7z.

Note: we explicitly do not support using both the pytorch and vs-mlrt plugins in the same vpy script, as pytorch ships its own set of CUDA DLLs, which may conflict with the ones vs-mlrt uses. As those DLLs are not explicitly versioned (e.g. nvinfer.dll instead of nvinfer-x.yz.dll), there is nothing we can do.

v12: latest CUDA libraries

01 Nov 10:57

Compared to v11, this release updates the CUDA dependencies to CUDA 11.8.0, cuDNN 8.6.0 and TensorRT 8.5.1:

  • Added support for the NVIDIA 40 series GPUs.
  • Added support for RIFE on the trt backend.

Known issues

  • Performance of the OV_CPU and ORT_CUDA(fp16=True) backends for RIFE is lower than expected; this is under investigation. Please consider ORT_CPU or ORT_CUDA(fp16=False) for now (see the sketch after this list).
  • The NCNN_VK backend does not support RIFE.
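
A minimal sketch of the suggested workaround (rgbs is assumed to be an RGBS clip with scene changes marked; the multi value is illustrative):

    import vsmlrt

    # use full-precision CUDA inference for RIFE until the fp16 slowdown is resolved
    output = vsmlrt.RIFE(rgbs, multi=2, backend=vsmlrt.Backend.ORT_CUDA(fp16=False))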

Installation Notes

For some advanced features, vsmlrt.py requires the numpy and onnx packages to be available. You might need to run pip install onnx numpy.

Benchmark

previous benchmark

Configuration: NVIDIA RTX 3090, driver 526.47, Windows Server 2019, vs r60, python 3.11.0, 1080p fp16

Backends: ort-cuda, trt from vs-mlrt v12.

For the trt backend, the engine is built without the CUDA_MODULE_LOADING=LAZY environment variable, but benchmarked with it set, in order to reduce device memory consumption.
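
The variable must be set before CUDA is initialized; a minimal sketch of enabling it from within a script (setting it in the shell before launching VapourSynth also works):

    import os

    # enable lazy loading of CUDA modules (CUDA 11.7+); must run before any CUDA library initializes
    os.environ["CUDA_MODULE_LOADING"] = "LAZY"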

Data format: fps / GPU memory usage (MB)

rife(model=44, 1920x1088)

backend     1 stream        2 streams
ort-cuda    53.62 / 1771    83.34 / 2748
trt         71.30 /  626    107.3 /  962

dpir color

backend     1 stream        2 streams
ort-cuda     4.64 / 3230    -
trt         10.32 / 1992    11.61 / 3475

waifu2x upconv_7

backend     1 stream        2 streams
ort-cuda    11.07 / 5916    15.04 / 10899
trt         18.38 / 2092    31.64 /  3848

waifu2x cunet

backend     1 stream        2 streams
ort-cuda     4.63 / 8541     5.32 / 16148
trt         11.44 / 4771    15.59 /  8972

realesrgan v2/v3

backend     1 stream        2 streams
ort-cuda     8.84 / 2283    11.10 / 4202
trt         14.59 / 1324    21.37 / 2174

v11 RIFE support

26 Oct 00:37

Added support for the RIFE video frame interpolation algorithm.

There are two APIs for RIFE:

  • vsmlrt.RIFE is a high-level API for interpolating a clip. Set the multi argument to specify the fps factor, and remember to perform scene detection on the input clip (see the sketch after this list).
  • vsmlrt.RIFEMerge is a novel temporal std.MaskedMerge-like interface for RIFE. Use it if you want to precisely control the frames and/or time point for the interpolation.
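
A minimal sketch of the high-level API (the scene-detection call and backend choice are illustrative, rgbs is assumed to be a vs.RGBS clip, and parameter names may vary slightly between vsmlrt.py versions):

    import vapoursynth as vs
    import vsmlrt

    core = vs.core
    # mark scene changes so RIFE does not interpolate across cuts
    sc = core.misc.SCDetect(rgbs, threshold=0.1)
    # double the frame rate
    output = vsmlrt.RIFE(sc, multi=2, backend=vsmlrt.Backend.ORT_CUDA())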

Known issues

  • vstrt doesn't support RIFE for the moment [1]. The next release of TensorRT should include the required support, and we will release v12 when that happens.

  • The vstrt backend also doesn't yet support the latest RTX 4000 series GPUs. This will be fixed after upgrading to the upcoming TensorRT 8.5 release. RTX 4000 series GPU owners, please use the other CUDA backends for now.

  • Users of the OV_GPU backend may experience errors like Exceeded max size of memory object allocation: Requested 11456040960 bytes but max alloc size is 4294959104 bytes. Please consider tiling for now (see the sketch below).

    The reason is that the openvino library follows the OpenCL standard's restriction on memory object allocation (CL_DEVICE_MAX_MEM_ALLOC_SIZE). For most existing Intel GPUs (Gen9 and later), the driver imposes a maximum allocation size of ~4GiB [2].
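
    A minimal sketch of tiling as a workaround (the filter, model and tiles value are illustrative; tiles=2 splits each frame into 2x2 tiles so that no single allocation exceeds the limit):

        import vsmlrt

        output = vsmlrt.Waifu2x(rgbs, model=vsmlrt.Waifu2xModel.upconv_7_anime_style_art_rgb, tiles=2, backend=vsmlrt.Backend.OV_GPU())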

  [1] TensorRT is missing grid_sample operator support; see https://github.com/onnx/onnx-tensorrt/blob/main/docs/operators.md.

  [2] This value is derived from here, which states that a device not supporting sharedSystemMemCapabilities has a maximum allowed allocation size of 4294959104 bytes.

v11.test

23 Sep 07:08
Pre-release

Internal testing only.

Added support for the RIFE video frame interpolation algorithm. Some features are still being implemented. The Python RIFE model wrapper interface is still subject to change.

Known issue

  • Users of the OV_GPU backend may experience errors like Exceeded max size of memory object allocation: Requested 11456040960 bytes but max alloc size is 4294959104 bytes. Please consider tiling for now.

    The reason is that the openvino library follows the OpenCL standard's restriction on memory object allocation (CL_DEVICE_MAX_MEM_ALLOC_SIZE). For most existing Intel GPUs (Gen9 and later), the driver imposes a maximum allocation size of ~4GiB [1].

  [1] This value is derived from here, which states that a device not supporting sharedSystemMemCapabilities has a maximum allowed allocation size of 4294959104 bytes.

Model Release 20220923, RIFE model

23 Sep 07:22
Pre-release

New models (compared to the previous model release):

  • RIFE v4.0 from vs-rife v2.0.0. rife/rife_v4.0.onnx, config: fastmode=True, ensemble=False
  • RIFE v4.2, v4.3, v4.4, v4.5, v4.6, v4.7, v4.8, v4.9, v4.10 from Practical-RIFE. rife/rife_{v4.2,v4.3,v4.4,v4.5,v4.6,v4.7,v4.8,v4.9,v4.10}.onnx, config: fastmode=True, ensemble=False
  • Other provided RIFE models can be found here, including v2 representations of the RIFE v4.7-v4.10 models. Sorry for the inconvenience.

Notes:

  • For RIFE on ort-gpu, vs-mlrt v11 or later is suggested for best performance. As of v11, only ov-cpu, ort-cpu, ort-cuda and trt (pending a new TensorRT release) support RIFE; in particular, ncnn-vk does not support RIFE due to the missing gridsample op.

v10: new vulkan based vsncnn (AMD GPU supported)

15 Sep 11:02

Release Highlight

Vulkan based AMD GPU support added with the new vsncnn-vk backend.

Major features

  • Introduced the ncnn-based vsncnn plugin that supports any GPU with Vulkan support (NVIDIA, AMD, and Intel integrated & discrete).
    • Good news for AMD GPU users! vs-mlrt has finally achieved full platform coverage: from x86 CPUs to GPUs of all three major vendors.
    • Please refer to the benchmark below for performance details. TL;DR: it's comparable to vsort-cuda on most networks (except waifu2x-cunet), but (significantly) slower than vstrt. Owing to its C++ implementation, it's generally faster than Python-based ncnn implementations.
    • Hint: if your GPU has enough memory, please consider setting num_streams>1 to extract more performance (see the sketch after this list).
    • Even though it's possible to use software-based Vulkan implementations (as we did in the GHA tests), if you want to do CPU-only inference, it's much better to use vsov-cpu (or vsort-cpu).
  • Introduced a new, smaller Vulkan-based GPU binary package (vsmlrt-windows-x64-vk.v10.7z) that only includes vsov-{cpu,gpu}, vsort-cpu and vsncnn-vk. Use this if you only use an Intel/AMD GPU, or if you don't want to download 1 GB of data in exchange for a backend that is merely 2~8x faster. Now there shouldn't be any reason not to use vs-mlrt.
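
A minimal sketch of the new backend (the model choice and num_streams value are illustrative; rgbs is assumed to be an RGBS clip):

    import vsmlrt

    # waifu2x on any Vulkan-capable GPU; num_streams=2 trades GPU memory for throughput
    output = vsmlrt.Waifu2x(rgbs, model=vsmlrt.Waifu2xModel.upconv_7_anime_style_art_rgb, backend=vsmlrt.Backend.NCNN_VK(fp16=True, num_streams=2))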

Benchmark

Configuration: NVIDIA RTX 3090, driver 516.94, Windows Server 2019, vs r60, python 3.10.7, 1080p fp16

Backends: ncnn-vk, ort-cuda, trt from vs-mlrt v10, dpir-ncnn v2.0.0, w2xncnnvk r2

Data format: fps / GPU memory usage (MB)

dpir color

backend      1 stream        2 streams
ncnn-vk       4.33 / 3347     4.72 / 6119
ort-cuda      4.56 / 3595     -
trt          10.64 / 2595    11.10 / 4593
dpir-ncnn     3.68 / 3326     -

waifu2x upconv_7

backend      1 stream        2 streams
ncnn-vk       9.46 / 6820    14.71 / 13468
ort-cuda     12.10 / 6411    13.98 / 11273
trt          21.32 / 3317    29.10 /  5053
w2xncnnvk     6.68 / 6931    12.70 / 13626

waifu2x cunet

backend      1 stream         2 streams
ncnn-vk       1.46 / 11908     1.53 / 23574
ort-cuda      4.85 /  8793     5.18 / 16231
trt          11.60 /  4960    15.60 /  9057
w2xncnnvk     1.38 / 11966     1.58 / 23687

realesrgan v2/v3

backend      1 stream        2 streams
ncnn-vk       7.23 / 2781     8.35 / 5330
ort-cuda      9.05 / 2669    10.18 / 4539
trt          15.93 / 1667    19.58 / 2543

v10.pre

14 Sep 10:20
Pre-release

This is a pre-release for testing & benchmarking purposes only.
For production use, please use the official v10 release.

Release Highlight

Vulkan based AMD GPU support added with the new vsncnn-vk backend.

Major features

  • Introduced the ncnn-based vsncnn plugin that supports any GPU with Vulkan support (NVIDIA, AMD, and Intel integrated & discrete). Good news for AMD GPU users! vs-mlrt has finally achieved full platform coverage: from x86 CPUs to GPUs of all three major vendors.
  • Introduced a new, smaller Vulkan-based GPU binary package (vsmlrt-windows-x64-vk.v10.pre.7z) that only includes vsov-{cpu,gpu}, vsort-cpu and vsncnn-vk. Use this if you only use an Intel/AMD GPU, or if you don't want to download 1 GB of data in exchange for a backend that is merely 3x faster. Now there shouldn't be any reason not to use vs-mlrt.

v9.2

07 Aug 07:48

Fixed issues

  • In vs-mlrt v9 and v9.1 on Windows, the ORT_CUDA backend may fail with an out-of-memory error when processing a non-initial frame. This has been fixed, and performance should be improved.
  • The use_cuda_graph parameter of the ORT_CUDA backend now works properly on Windows. However, it is currently not recommended for use.

Full Changelog: v9.1...v9.2