v2 implementation models for RIFE v4.9 #66

Closed
charlessuh opened this issue Nov 1, 2023 · 8 comments

Comments

@charlessuh

RIFE v2 models, which handle padding internally and reduce memory transactions on heterogeneous devices.

@WolframRhodium Would it be possible for you to generate v2 implementation models for RIFE v4.9, or describe how someone could generate them themselves?

Thanks!

@WolframRhodium
Contributor

WolframRhodium commented Nov 1, 2023

Thanks for your interest.
rife_v4.9 (v2).zip

The RIFE v1 and v2 ONNX representations in vs-mlrt differ in how some constant tensors are represented. The tensors involved are tenHorizontal, tenVertical, (tenInput.shape[3] - 1.0) / 2.0 and (tenInput.shape[2] - 1.0) / 2.0.

In the v1 representation, these tensors are provided from outside the onnx. Since the vstrt plugin currently provides no way to cache input data, it copies them from host memory to GPU memory for every frame even though their values are identical across frames, which generates significant useless memory and PCIe traffic.
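
For illustration, here is roughly where those constants come from in the RIFE PyTorch code (a paraphrased sketch; the helper name is mine). They depend only on the frame size, so in v1 they are computed host-side like this and fed to the network as extra inputs:

    import torch

    def make_backwarp_constants(n, h, w):
        # Sampling-grid constants used by RIFE's backwarp (paraphrased):
        # they depend only on the resolution, never on frame content.
        tenHorizontal = torch.linspace(-1.0, 1.0, w).view(1, 1, 1, w).expand(n, -1, h, -1)
        tenVertical = torch.linspace(-1.0, 1.0, h).view(1, 1, h, 1).expand(n, -1, -1, w)
        grid = torch.cat([tenHorizontal, tenVertical], 1)  # shape (n, 2, h, w)
        # Flow normalizers, likewise constant per resolution:
        return grid, (w - 1.0) / 2.0, (h - 1.0) / 2.0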

In the v2 representation, these tensors are generated inside the onnx graph on the fly, but it seems the accuracy loss is not tolerable.
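
For intuition (an illustrative aside, consistent with the fp16 findings later in this thread): when the engine is built with --fp16, those generated coordinate ramps can end up computed in half precision, and float16's 11-bit significand cannot represent integers above 2048 exactly:

    import numpy as np

    # float16 represents integers exactly only up to 2048; beyond that,
    # values round to the nearest representable number.
    print(np.float16(2049))  # 2048.0
    print(np.float16(3839))  # 3840.0 -- a full pixel off on a 3840-wide frame
    # A 0..3839 pixel-coordinate ramp built in fp16 is therefore inexact,
    # while the same ramp precomputed in fp32 on the host (v1) stays exact.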

To generate the onnx, I take the RIFE PyTorch implementation, manually optimize it, and convert it to onnx. The graph optimization is not unique; others may come up with different onnx representations.
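
The final conversion step is ordinary torch.onnx.export. A minimal sketch, where the stand-in module, opset, and dynamic axes are my assumptions and only the 1x7xHxW layout and the input/output names are taken from the commands later in this thread:

    import torch

    # Stand-in for the manually optimized RIFE module; the real IFNet
    # takes 7 input channels (two RGB frames plus a timestep plane).
    model = torch.nn.Conv2d(7, 3, 3, padding=1).eval()
    dummy = torch.randn(1, 7, 2144, 3840)
    torch.onnx.export(
        model, dummy, "rife_v4.9.onnx",
        input_names=["input"], output_names=["output"],
        opset_version=16,
        dynamic_axes={"input": {2: "height", 3: "width"},
                      "output": {2: "height", 3: "width"}},
    )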

@charlessuh
Copy link
Author

charlessuh commented Nov 2, 2023

Awesome, thanks!

After some trial and error, I think the jaggedness / aliasing is correlated with layers that cast int64 -> float32.

I tried using polygraphy with TensorRT v9 to generate an engine that runs those layers (and a few operations immediately after them) in higher precision (float32):

%LOCALAPPDATA%\Programs\Python\Python310\Scripts\polygraphy.exe convert --convert-to trt --precision-constraints obey --trt-npps process.py --input-shapes=input:[1,7,2144,3840] --fp16 --tensor-dtypes input:float16 output:float16 -o saved.engine rife_v4.9.onnx

I just have a dumb process.py script for now which is hardcoded to use range(100, 104), range(110, 114), range(325, 332), range(372, 378).
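
For anyone who wants to reproduce this, a guess at what that process.py might look like, assuming polygraphy's network postprocessing scripts (--trt-npps) use the default postprocess(network) entry point; the indices are the hardcoded ranges above:

    import tensorrt as trt

    # Layer indices (as shown by `polygraphy inspect model ... --display-as=trt`)
    # that should stay in float32 in an otherwise-fp16 engine.
    FP32_LAYERS = [*range(100, 104), *range(110, 114),
                   *range(325, 332), *range(372, 378)]

    def postprocess(network):
        # Invoked by polygraphy (--trt-npps) on the parsed TensorRT
        # network before the engine is built.
        for i in FP32_LAYERS:
            layer = network.get_layer(i)
            layer.precision = trt.float32
            for j in range(layer.num_outputs):
                layer.set_output_type(j, trt.float32)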

If anyone else would like to try this, or can point out any issues with doing so, please do!

@charlessuh
Author

Alternatively, if using trtexec, add the following arguments:

[
	"--precisionConstraints=obey",
	"--layerPrecisions=/Cast_2:fp32,/Cast_3:fp32,/Cast_5:fp32,/Cast_7:fp32,/Reciprocal:fp32,/Reciprocal_1:fp32,/Mul:fp32,/Mul_1:fp32,/Mul_8:fp32,/Mul_10:fp32,/Sub_5:fp32,/Sub_6:fp32,ONNXTRT_Broadcast_236:fp32,ONNXTRT_Broadcast_238:fp32,ONNXTRT_Broadcast_275:fp32"
]
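
For context, a complete trtexec invocation with these arguments might look like the following (mirroring the polygraphy command above; an untested sketch, with the --layerPrecisions list elided to the one given above):

    trtexec --onnx=rife_v4.9.onnx --saveEngine=saved.engine --fp16 ^
        --shapes=input:1x7x2144x3840 ^
        --inputIOFormats=fp16:chw --outputIOFormats=fp16:chw ^
        --precisionConstraints=obey ^
        --layerPrecisions=/Cast_2:fp32,/Cast_3:fp32,...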

@WolframRhodium
Contributor

Thanks!

WolframRhodium added a commit that referenced this issue Nov 3, 2023
…e v4.7-v4.9 model with v2 representation for TRT backend

#66 (comment)
@charlessuh
Author

Oh, cool.

  1. It looks like I forgot ONNXTRT_Broadcast_273:fp32 during my conversion to layer names, but maybe it doesn't matter... :P

  2. Note that I'm using polygraphy.exe inspect model rife_v4.9.onnx --show layers --display-as=trt to get the layer names.

@WolframRhodium
Contributor

WolframRhodium commented Nov 3, 2023

Regarding the seek performance of v1, what if you install this plugin by placing it next to vstrt.dll?

It looks like I forgot ONNXTRT_Broadcast_273:fp32 during my conversion to layer names, but maybe it doesn't matter...

Based on some simple experiments, I think it is not related. I have removed the other similar constraints because they are not layer names from the onnx, but names invented by the onnx-tensorrt backend, which may introduce compatibility issues in the future.

@charlessuh
Author

  1. Oh, doing LoadPlugin("akarin.dll") noticeably improves seek performance with v1. This looks like a good alternative! (The only caveat is that v2 seems less sensitive to constrained PCIe bandwidth, e.g. if you're running at 4.0 x8 instead of x16.)

  2. Yeah, it's not great that the names are invented and could change. Unfortunately, I think removing them all results in some minor pixel shifting between frames (if you test a video with perfectly pixel-positioned elements like a logo...).

@WolframRhodium
Contributor

WolframRhodium commented Nov 4, 2023

doing LoadPlugin("akarin.dll") noticeably improves seek performance with v1

Without it, a lot of data is computed in native Python without any acceleration.

I think removing them all results in some minor pixel shifting between frames

Yes, I also observe that. I am considering it...

EDIT: since TensorRT 9.0, layer name matching with one wildcard is allowed, which should work for some time.
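
(For illustration: a pattern along the lines of --layerPrecisions=ONNXTRT_Broadcast_*:fp32 would then keep matching those invented names even if their numeric suffixes change; the exact spelling here is my guess, not a tested pattern.)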
