Releases: AmusementClub/vs-mlrt
v12.2
Update vsmlrt.py:
- Introduce a new release artifact ext-models.v12.2.7z, which comes from External Models and is not bundled into the full binary release packages (i.e. the cpu, cuda and vk packages). Please refer to their release notes for details on how to use those models.
- Export a new API, vsmlrt.inference, for inference of custom models:

  import vsmlrt
  output = vsmlrt.inference(clips, "path/to/onnx", backend=vsmlrt.Backend.TRT(fp16=True))

  If you encounter errors like "Cannot find input tensor with name "input" in the network inputs! Please make sure the input tensor names are correct.", you could use vsmlrt.inference(..., input_name=None) or export the model with its input name set to "input".
- Fix trt inference of cugan-pro (3x) models. (#15)
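For example, a minimal sketch of the input-name workaround described above (the model path is a placeholder; reading the name from the graph may require the onnx package, see Installation Notes):

```python
import vsmlrt

# input_name=None lets vsmlrt read the input tensor's name from the
# ONNX graph instead of assuming it is called "input".
output = vsmlrt.inference(
    clips,
    "path/to/onnx",
    backend=vsmlrt.Backend.TRT(fp16=True),
    input_name=None,
)
```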
External Models
More models!
In addition to bundled models, vs-mlrt can also be used to run these models:
- anime-segmentation/isnet_is.onnx: anime character segmentation at a0a563c, RGBS -> GRAYS, requires mod64 input
- oidn/rt_ldr.onnx: image denoising from the Intel® Open Image Denoise library, RGBS, requires mod16 input
- ppocr/ml_PP-OCRv3_det.onnx: multilingual text detection model from PaddleOCR, RGBS -> GRAYS, requires mod32 input
- waifu2x swin_unet: waifu2x's swin_unet models, supported by the Python wrapper via vsmlrt.Waifu2xModel.{swin_unet_art,swin_unet_art_scan,swin_unet_photo{,_v2}}.
  - file list:
    - waifu2x/swin_unet_art/{scale2x, scale4x, noise0, noise0_scale2x, ..., noise3_scale4x}.onnx
    - waifu2x/swin_unet_art_scan/{scale4x, noise0_scale4x, ..., noise3_scale4x}.onnx
    - waifu2x/swin_unet_photo{_v2}/{scale4x, noise0_scale4x, ..., noise3_scale4x}.onnx
  - v2 models handle paddings internally, which reduces PCIe traffic.
- safa/safa_{v0.1,v0.2,v0.3,v0.4}_{non_adaptive,adaptive1x,adaptive}.onnx: SAFA video enhancement models. Individually packaged.
- ArtCNN/ArtCNN_{C4F32,C16F64}{_Chroma,_DS}: ArtCNN models for anime super-resolution and restoration.
With more to come.
Also check onnx models provided by the avs-mlrt community.
Usage
If an external model is not supported by the Python wrapper, you can use the generic vsmlrt.inference
API to run these models (requires release v12.2 or later).
import vsmlrt
output = vsmlrt.inference(rgbs, "path/to/onnx", backend=vsmlrt.Backend.TRT(fp16=True))
The rife model requires auxiliary inputs and should be used through the vsmlrt.RIFE or vsmlrt.RIFEMerge interfaces.
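Some external models listed above have mod requirements on the input size. A minimal sketch of padding to mod16 for oidn/rt_ldr.onnx and cropping back afterwards (the border-based padding scheme is an assumption, not a requirement of the model):

```python
from vapoursynth import core
import vsmlrt

# rt_ldr.onnx requires mod16 input: pad on the right/bottom,
# run inference, then crop the padding away again.
pad_w = -rgbs.width % 16   # rgbs: an RGBS input clip
pad_h = -rgbs.height % 16
padded = core.std.AddBorders(rgbs, right=pad_w, bottom=pad_h)
denoised = vsmlrt.inference(padded, "oidn/rt_ldr.onnx",
                            backend=vsmlrt.Backend.TRT(fp16=True))
output = core.std.Crop(denoised, right=pad_w, bottom=pad_h)
```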
v12.1
This minor release fixes #9: now if vsort/vstrt fails to load required cuda DLLs, they won't crash the entire process.
However, if vs-mlrt is correctly installed, this shouldn't happen. Please report an issue if you can't access the core.trt or core.ort namespaces. A common mistake is forgetting to extract the vsmlrt-cuda.v12.1.7z package for the VSORT-Windows-x64.v12.1.7z or VSTRT-Windows-x64.v12.1.7z packages. If in doubt, CUDA users should use the fully bundled release vsmlrt-windows-x64-cuda.v12.1.7z.
Note: we explicitly do not support using both PyTorch and vs-mlrt plugins in the same vpy script, as PyTorch uses its own set of CUDA DLLs, which might conflict with the ones vs-mlrt uses. As those DLLs are not explicitly versioned (e.g. nvinfer.dll instead of nvinfer-x.y.z.dll), there is nothing we can do.
v12: latest CUDA libraries
Compared to v11, this release updated CUDA dependencies to CUDA 11.8.0, cuDNN 8.6.0 and TensorRT 8.5.1:
- Added support for the NVIDIA 40 series GPUs.
- Added support for RIFE on the trt backend.
Known issues
- Performance of the OV_CPU or ORT_CUDA(fp16=True) backends for RIFE is lower than expected, which is under investigation. Please consider ORT_CPU or ORT_CUDA(fp16=False) for now.
- The NCNN_VK backend does not support RIFE.
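A sketch of the suggested workaround (RIFE usage as described in the v11 notes below):

```python
import vsmlrt

# clip: an RGBS clip that already carries scene-change props.
# fp16=False sidesteps the ORT_CUDA(fp16=True) slowdown noted above.
output = vsmlrt.RIFE(clip, multi=2, backend=vsmlrt.Backend.ORT_CUDA(fp16=False))
```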
Installation Notes
For some advanced features, vsmlrt.py requires the numpy and onnx packages to be available. You might need to run pip install onnx numpy.
Benchmark
Configuration: NVIDIA RTX 3090, driver 526.47, windows server 2019, vs r60, python 3.11.0, 1080p fp16
Backends: ort-cuda, trt from vs-mlrt v12.
For the trt backend, the engine is created without the CUDA_MODULE_LOADING=LAZY environment variable set, and benchmarking is then run with it set, to reduce device memory consumption.
Data format: fps / GPU memory usage (MB)
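To get the same memory-saving behavior in your own scripts, the variable can be set at the top of the script; a sketch (the assumption here is that it takes effect only if set before the CUDA runtime is initialized, i.e. before the first trt filter is created):

```python
import os

# Must run before any CUDA-backed filter is instantiated.
os.environ["CUDA_MODULE_LOADING"] = "LAZY"
```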
rife(model=44, 1920x1088)
backend | 1 stream | 2 streams |
---|---|---|
ort-cuda | 53.62/1771 | 83.34/2748 |
trt | 71.30/ 626 | 107.3/ 962 |
dpir color
backend | 1 stream | 2 streams |
---|---|---|
ort-cuda | 4.64/3230 | |
trt | 10.32/1992 | 11.61/3475 |
waifu2x upconv_7
backend | 1 stream | 2 streams |
---|---|---|
ort-cuda | 11.07/5916 | 15.04/10899 |
trt | 18.38/2092 | 31.64/ 3848 |
waifu2x cunet
backend | 1 stream | 2 streams |
---|---|---|
ort-cuda | 4.63/8541 | 5.32/16148 |
trt | 11.44/4771 | 15.59/ 8972 |
realesrgan v2/v3
backend | 1 stream | 2 streams |
---|---|---|
ort-cuda | 8.84/2283 | 11.10/4202 |
trt | 14.59/1324 | 21.37/2174 |
v11 RIFE support
Added support for the RIFE video frame interpolation algorithm.
There are two APIs for RIFE:
- vsmlrt.RIFE is a high-level API for interpolating a clip. Set the multi argument to specify the fps factor. Just remember to perform scene detection on the input clip.
- vsmlrt.RIFEMerge is a novel temporal std.MaskedMerge-like interface for RIFE. Use it if you want to precisely control the frames and/or time point of the interpolation.
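A minimal sketch of the high-level API (scene detection via the misc plugin is one option; the threshold is illustrative):

```python
from vapoursynth import core
import vsmlrt

# RIFE consults scene-change props to avoid interpolating across cuts.
clip = core.misc.SCDetect(rgbs, threshold=0.1)  # rgbs: an RGBS clip
# multi=2 doubles the frame rate.
output = vsmlrt.RIFE(clip, multi=2, backend=vsmlrt.Backend.ORT_CUDA(fp16=False))
```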
Known issues
- vstrt doesn't support RIFE for the moment [1]. The next release of TensorRT should include RIFE support and we will release v12 when that happens.
- The vstrt backend also doesn't yet support the latest RTX 4000 series GPUs. This will be fixed after upgrading to the upcoming TensorRT 8.5 release. RTX 4000 series GPU owners, please use the other CUDA backends for now.
- Users of the OV_GPU backend may experience errors like "Exceeded max size of memory object allocation: Requested 11456040960 bytes but max alloc size is 4294959104 bytes". Please consider tiling for now (see the sketch below). The reason is that the OpenVINO library follows the OpenCL standard's restriction on memory object allocation (CL_DEVICE_MAX_MEM_ALLOC_SIZE). For most existing Intel GPUs (Gen9 and later), the driver imposes a maximum allocation size of ~4 GiB [2].

[1] The grid_sample operator is not yet supported; see https://github.com/onnx/onnx-tensorrt/blob/main/docs/operators.md.
[2] This value is derived from here, which states that a device not supporting sharedSystemMemCapabilities has a maximum allowed allocation size of 4294959104 bytes.
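As a sketch of the tiling workaround mentioned above (tile count and overlap are illustrative; parameter names follow vsmlrt.py's tiles/tilesize/overlap arguments):

```python
import vsmlrt

# Splitting each frame into 2x2 tiles keeps every OpenCL allocation
# well below the ~4 GiB limit; overlap hides seams between tiles.
output = vsmlrt.Waifu2x(
    rgbs, noise=-1, scale=2,
    tiles=2, overlap=16,
    backend=vsmlrt.Backend.OV_GPU(fp16=True),
)
```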
v11.test
internal testing only.
Added support for the RIFE video frame interpolation algorithm. Some features are still being implemented. The Python RIFE model wrapper interface is still subject to change.
Known issue
- Users of the OV_GPU backend may experience errors like "Exceeded max size of memory object allocation: Requested 11456040960 bytes but max alloc size is 4294959104 bytes". Please consider tiling for now. The reason is that the OpenVINO library follows the OpenCL standard's restriction on memory object allocation (CL_DEVICE_MAX_MEM_ALLOC_SIZE). For most existing Intel GPUs (Gen9 and later), the driver imposes a maximum allocation size of ~4 GiB [1].

[1] This value is derived from here, which states that a device not supporting sharedSystemMemCapabilities has a maximum allowed allocation size of 4294959104 bytes.
Model Release 20220923, RIFE model
New models (compared to the previous model release):
- RIFE v4.0 from vs-rife v2.0.0: rife/rife_v4.0.onnx, config: fastmode=True, ensemble=False
- RIFE v4.2, v4.3, v4.4, v4.5, v4.6, v4.7, v4.8, v4.9, v4.10 from Practical-RIFE: rife/rife_{v4.2,v4.3,v4.4,v4.5,v4.6,v4.7,v4.8,v4.9,v4.10}.onnx, config: fastmode=True, ensemble=False
- Other provided RIFE models can be found here, including the v2 representation of the RIFE v4.7-v4.10 models. Sorry for the inconvenience.
Notes:
- For RIFE on ort-gpu, vs-mlrt v11 or later is suggested for best performance. And (as of v11), only ov-cpu, ort-cpu, ort-cuda and trt (pending a new TensorRT release) support RIFE. Specifically, ncnn-vk does not support RIFE due to the missing gridsample op.
v10: new vulkan based vsncnn (AMD GPU supported)
Release Highlight
Vulkan based AMD GPU support added with the new vsncnn-vk backend.
Major features
- Introduced the ncnn-based vsncnn plugin, which supports any GPU with Vulkan support (NVidia, AMD, Intel integrated & discrete).
  - Good news for AMD GPU users! vs-mlrt has finally achieved full platform coverage: from x86 CPUs to GPUs of all three major vendors.
  - Please refer to the benchmark below for performance details. TL;DR: it's comparable to vsort-cuda on most networks (except waifu2x-cunet), but (significantly) slower than vstrt. Owing to its C++ implementation, it's generally faster than Python-based ncnn implementations.
  - Hint: if your GPU has enough memory, please consider setting num_streams>1 to extract more performance.
  - Even though it's possible to use software-based Vulkan implementations (as we did in the GHA tests), if you want to do CPU-only inference, it's much better to use vsov-cpu (or vsort-cpu).
- Introduced a new, smaller Vulkan-based GPU binary package (vsmlrt-windows-x64-vk.v10.7z) that only includes vsov-{cpu,gpu}, vsort-cpu and vsncnn-vk. Use this if you only use an Intel/AMD GPU or don't want to download 1GB of data in exchange for a backend that is merely 2~8x faster. Now there shouldn't be any reason not to use vs-mlrt.
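A sketch of selecting the new backend (the model choice mirrors the benchmark below; num_streams=2 follows the hint above):

```python
import vsmlrt

# Vulkan-based ncnn backend: works on NVidia, AMD and Intel GPUs alike.
# A second stream trades extra GPU memory for throughput.
output = vsmlrt.Waifu2x(
    rgbs, noise=-1, scale=2,
    model=vsmlrt.Waifu2xModel.upconv_7_anime_style_art_rgb,
    backend=vsmlrt.Backend.NCNN_VK(fp16=True, num_streams=2),
)
```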
Benchmark
Configuration: NVIDIA RTX 3090, driver 516.94, windows server 2019, vs r60, python 3.10.7, 1080p fp16
Backends: ncnn-vk, ort-cuda, trt from vs-mlrt v10, dpir-ncnn v2.0.0, w2xncnnvk r2
Data format: fps / GPU memory usage (MB)
dpir color
backend | 1 stream | 2 streams |
---|---|---|
ncnn-vk | 4.33/3347 | 4.72/6119 |
ort-cuda | 4.56/3595 | |
trt | 10.64/2595 | 11.10/4593 |
dpir-ncnn | 3.68/3326 |
waifu2x upconv_7
backend | 1 stream | 2 streams |
---|---|---|
ncnn-vk | 9.46/6820 | 14.71/13468 |
ort-cuda | 12.10/6411 | 13.98/11273 |
trt | 21.32/3317 | 29.10/ 5053 |
w2xncnnvk | 6.68/6931 | 12.70/13626 |
waifu2x cunet
backend | 1 stream | 2 streams |
---|---|---|
ncnn-vk | 1.46/11908 | 1.53/23574 |
ort-cuda | 4.85/ 8793 | 5.18/16231 |
trt | 11.60/ 4960 | 15.60/ 9057 |
w2xncnnvk | 1.38/11966 | 1.58/23687 |
realesrgan v2/v3
backend | 1 stream | 2 streams |
---|---|---|
ncnn-vk | 7.23/2781 | 8.35/5330 |
ort-cuda | 9.05/2669 | 10.18/4539 |
trt | 15.93/1667 | 19.58/2543 |
v10.pre
This is a pre-release for testing & benchmarking purposes only.
For production use, please use the official v10 release.
Release Highlight
Vulkan based AMD GPU support added with the new vsncnn-vk backend.
Major features
- Introduced ncnn-based vsncnn plugin that supports any GPU with Vulkan support (NVidia, AMD, Intel integrated & discrete). Good news for AMD GPU users! vs-mlrt has finally achieved full platform coverage: from x86 CPU to GPU of all three major vendors.
- Introduced a new, smaller Vulkan-based GPU binary package (vsmlrt-windows-x64-vk.v10.pre.7z) that only includes vsov-{cpu,gpu}, vsort-cpu and vsncnn-vk. Use this if you only use an Intel/AMD GPU or don't want to download 1GB of data in exchange for a backend that is merely 3x faster. Now there shouldn't be any reason not to use vs-mlrt.
v9.2
Fixed issues
- In vs-mlrt v9 and v9.1 on Windows, the ORT_CUDA backend may fail with out-of-memory errors when processing a non-initial frame. This has been fixed and performance should be improved.
- The use_cuda_graph parameter of the ORT_CUDA backend now works properly on Windows. It is, however, currently not recommended for use.
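A sketch of where the parameter lives (kept at its default, per the recommendation above; the surrounding DPIR call is illustrative):

```python
import vsmlrt

# use_cuda_graph now functions on Windows, but the advice above
# is to leave it disabled for now.
backend = vsmlrt.Backend.ORT_CUDA(fp16=True, use_cuda_graph=False)
output = vsmlrt.DPIR(rgbs, strength=5, backend=backend)
```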
Full Changelog: v9.1...v9.2