v2 implementation models for RIFE v4.9 #66

Closed
charlessuh opened this issue Nov 1, 2023 · 8 comments

Comments

@charlessuh

RIFE v2 models, which handle padding internally and reduce memory transactions on heterogeneous devices.

@WolframRhodium Would it be possible for you to generate v2 implementation models for RIFE v4.9, or describe how someone could generate them themselves?

Thanks!

@WolframRhodium
Contributor

WolframRhodium commented Nov 1, 2023

Thanks for your interest.
rife_v4.9 (v2).zip

The RIFE v1 and v2 ONNX representations in vs-mlrt differ in how some constant tensors are represented. The tensors involved are tenHorizontal, tenVertical, (tenInput.shape[3] - 1.0) / 2.0 and (tenInput.shape[2] - 1.0) / 2.0.

In the v1 representation, these tensors are provided from outside the onnx. Since the vstrt plugin currently provides no way to cache input data, it copies them from host memory to GPU memory for every frame even though their values are identical across frames, which generates significant useless memory and PCIe traffic.
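
For illustration, here is roughly where those constants come from in the RIFE PyTorch code (a paraphrased sketch; the helper name is mine). They depend only on the frame size, so in v1 they are computed host-side like this and fed to the network as extra inputs:

    import torch

    def make_backwarp_constants(n, h, w):
        # Sampling-grid constants used by RIFE's backwarp (paraphrased):
        # they depend only on the resolution, never on frame content.
        tenHorizontal = torch.linspace(-1.0, 1.0, w).view(1, 1, 1, w).expand(n, -1, h, -1)
        tenVertical = torch.linspace(-1.0, 1.0, h).view(1, 1, h, 1).expand(n, -1, -1, w)
        grid = torch.cat([tenHorizontal, tenVertical], 1)  # shape (n, 2, h, w)
        # Flow normalizers, likewise constant per resolution:
        return grid, (w - 1.0) / 2.0, (h - 1.0) / 2.0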

In the v2 representation, these tensors are generated inside the onnx graph on the fly, but it seems the accuracy loss is not tolerable.
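
For intuition (an illustrative aside, consistent with the fp16 findings later in this thread): when the engine is built with --fp16, those generated coordinate ramps can end up computed in half precision, and float16's 11-bit significand cannot represent integers above 2048 exactly:

    import numpy as np

    # float16 represents integers exactly only up to 2048; beyond that,
    # values round to the nearest representable number.
    print(np.float16(2049))  # 2048.0
    print(np.float16(3839))  # 3840.0 -- a full pixel off on a 3840-wide frame
    # A 0..3839 pixel-coordinate ramp built in fp16 is therefore inexact,
    # while the same ramp precomputed in fp32 on the host (v1) stays exact.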

To generate the onnx, I take the RIFE PyTorch implementation, manually optimize it, and convert it to onnx. The graph optimization is not unique; others may come up with different onnx representations.
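
The final conversion step is ordinary torch.onnx.export. A minimal sketch, where the stand-in module, opset, and dynamic axes are my assumptions and only the 1x7xHxW layout and the input/output names are taken from the commands later in this thread:

    import torch

    # Stand-in for the manually optimized RIFE module; the real IFNet
    # takes 7 input channels (two RGB frames plus a timestep plane).
    model = torch.nn.Conv2d(7, 3, 3, padding=1).eval()
    dummy = torch.randn(1, 7, 2144, 3840)
    torch.onnx.export(
        model, dummy, "rife_v4.9.onnx",
        input_names=["input"], output_names=["output"],
        opset_version=16,
        dynamic_axes={"input": {2: "height", 3: "width"},
                      "output": {2: "height", 3: "width"}},
    )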

@charlessuh
Copy link
Author

charlessuh commented Nov 2, 2023

Awesome, thanks!

After some trial and error, I think the jaggedness / aliasing is correlated with layers that cast int64 -> float32.

I tried using polygraphy with TensorRT v9 to generate an engine that runs those layers (and a few operations immediately after them) in higher precision (float32):

%LOCALAPPDATA%\Programs\Python\Python310\Scripts\polygraphy.exe convert --convert-to trt --precision-constraints obey --trt-npps process.py --input-shapes=input:[1,7,2144,3840] --fp16 --tensor-dtypes input:float16 output:float16 -o saved.engine rife_v4.9.onnx

I just have a dumb process.py script for now which is hardcoded to use range(100, 104), range(110, 114), range(325, 332), range(372, 378).
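
For anyone who wants to reproduce this, a guess at what that process.py might look like, assuming polygraphy's network postprocessing scripts (--trt-npps) use the default postprocess(network) entry point; the indices are the hardcoded ranges above:

    import tensorrt as trt

    # Layer indices (as shown by `polygraphy inspect model ... --display-as=trt`)
    # that should stay in float32 in an otherwise-fp16 engine.
    FP32_LAYERS = [*range(100, 104), *range(110, 114),
                   *range(325, 332), *range(372, 378)]

    def postprocess(network):
        # Invoked by polygraphy (--trt-npps) on the parsed TensorRT
        # network before the engine is built.
        for i in FP32_LAYERS:
            layer = network.get_layer(i)
            layer.precision = trt.float32
            for j in range(layer.num_outputs):
                layer.set_output_type(j, trt.float32)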

If anyone else would like to try this, or can point out any issues with doing so, please do!

@charlessuh
Author

Alternatively, if using trtexec, add the following arguments:

[
	"--precisionConstraints=obey",
	"--layerPrecisions=/Cast_2:fp32,/Cast_3:fp32,/Cast_5:fp32,/Cast_7:fp32,/Reciprocal:fp32,/Reciprocal_1:fp32,/Mul:fp32,/Mul_1:fp32,/Mul_8:fp32,/Mul_10:fp32,/Sub_5:fp32,/Sub_6:fp32,ONNXTRT_Broadcast_236:fp32,ONNXTRT_Broadcast_238:fp32,ONNXTRT_Broadcast_275:fp32"
]
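
For context, a complete trtexec invocation with these arguments might look like the following (mirroring the polygraphy command above; an untested sketch, with the --layerPrecisions list elided to the one given above):

    trtexec --onnx=rife_v4.9.onnx --saveEngine=saved.engine --fp16 ^
        --shapes=input:1x7x2144x3840 ^
        --inputIOFormats=fp16:chw --outputIOFormats=fp16:chw ^
        --precisionConstraints=obey ^
        --layerPrecisions=/Cast_2:fp32,/Cast_3:fp32,...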

@WolframRhodium
Contributor

Thanks!

WolframRhodium added a commit that referenced this issue Nov 3, 2023
…e v4.7-v4.9 model with v2 representation for TRT backend

#66 (comment)
@charlessuh
Author

Oh, cool.

  1. It looks like I forgot ONNXTRT_Broadcast_273:fp32 during my conversion to layer names, but maybe it doesn't matter... :P

  2. Note that I'm using polygraphy.exe inspect model rife_v4.9.onnx --show layers --display-as=trt to get the layer names.

@WolframRhodium
Contributor

WolframRhodium commented Nov 3, 2023

Regarding the seek performance of v1, what if you install this plugin by placing it next to vstrt.dll?

It looks like I forgot ONNXTRT_Broadcast_273:fp32 during my conversion to layer names, but maybe it doesn't matter...

Based on some simple experiments, I think it is not related. I have removed the other similar constraints because they are not layer names from the onnx, but names invented by the onnx-tensorrt backend, which may introduce compatibility issues in the future.

@charlessuh
Author

  1. Oh, doing LoadPlugin("akarin.dll") noticeably improves seek performance with v1. This looks like a good alternative! (The only caveat is that v2 seems less sensitive to constrained PCIe bandwidth, e.g. if you're running at 4.0 x8 instead of x16.)

  2. Yeah, it's not great that the names are invented and could change. Unfortunately, I think removing them all results in some minor pixel shifting between frames (if you test a video with perfectly pixel-positioned elements like a logo...).

@WolframRhodium
Contributor

WolframRhodium commented Nov 4, 2023

doing LoadPlugin("akarin.dll") noticeably improves seek performance with v1

Without it, a lot of data is computed in native Python without any acceleration.

I think removing them all results in some minor pixel shifting between frames

Yes, I also observe that. I am considering it...

EDIT: since TensorRT 9.0, layer name matching with one wildcard is allowed, which should work for some time.
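
(For illustration: a pattern along the lines of --layerPrecisions=ONNXTRT_Broadcast_*:fp32 would then keep matching those invented names even if their numeric suffixes change; the exact spelling here is my guess, not a tested pattern.)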
