RIFE v2 model 4.7+ not working with static_shape=False #72

Open
aloola18 opened this issue Nov 18, 2023 · 15 comments

Comments

@aloola18

With static_shape=False:
All RIFE v1 models work.
RIFE v2 models 4.6 and earlier work.

RIFE v2 models 4.7+ do not work.

I tried setting workspace=1024, but it still did not work.

trtexec_231118_143522.log

@WolframRhodium
Contributor

WolframRhodium commented Nov 18, 2023

Thanks, I can reproduce the problem.

[11/18/2023-16:04:51] [V] [TRT] --------------- Timing Runner: /encode/encode.6/ConvTranspose (CaskDeconvolution[0x8000000a])
[11/18/2023-16:04:51] [V] [TRT] CaskDeconvolution has no valid tactics for this config, skipping

It would be better if you could set verbose=True in backend.TRT() for a more detailed log.

I will now check whether the problem is related to a specific version of TensorRT.

@aloola18
Author

Here are my full logs:
trtexec_231118_155338.log

@WolframRhodium
Contributor

I have reported this issue to NVIDIA. Let's see how they reply.

@WolframRhodium
Contributor

They said they're working on it.

TensorRT 9.2.0 released today still suffers from this problem.

@netExtra

netExtra commented Dec 4, 2023

What difference would static_shape=False make? I've looked into the differences between static and dynamic shapes and I kind of get the idea, but practically speaking, will it make any difference when I'm running these with SVP?

@KLC04

KLC04 commented Dec 4, 2023

> What difference would static_shape=False make? I've looked into the differences between static and dynamic shapes and I kind of get the idea, but practically speaking, will it make any difference when I'm running these with SVP?

It simply means that you don't have to build a new engine every time you change resolution.
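A toy Python sketch of why that is. The `EngineCache` class and its default shape bounds are hypothetical and are not vs-mlrt's API or defaults. With static shapes, the engine cache is effectively keyed by the exact resolution, so each new resolution triggers a (slow) build; with a dynamic shape profile, a single engine built for a min..max range serves every resolution inside that range.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

Shape = Tuple[int, int]  # (width, height)


@dataclass
class EngineCache:
    """Toy model of engine reuse; hypothetical, not vs-mlrt's implementation."""
    static_shape: bool
    min_shape: Shape = (64, 64)      # assumed lower bound of the dynamic profile
    max_shape: Shape = (4096, 2176)  # assumed upper bound of the dynamic profile
    builds: int = 0
    engines: Dict[tuple, str] = field(default_factory=dict)

    def key_for(self, width: int, height: int) -> tuple:
        if self.static_shape:
            # Static shapes: the engine is only valid for this exact resolution.
            return ("static", width, height)
        (min_w, min_h), (max_w, max_h) = self.min_shape, self.max_shape
        if not (min_w <= width <= max_w and min_h <= height <= max_h):
            raise ValueError(f"{width}x{height} is outside the dynamic profile")
        # Dynamic shapes: one engine covers every resolution in [min, max].
        return ("dynamic", self.min_shape, self.max_shape)

    def get_engine(self, width: int, height: int) -> str:
        key = self.key_for(width, height)
        if key not in self.engines:
            self.builds += 1  # stands in for a slow TensorRT engine build
            self.engines[key] = f"engine{key}"
        return self.engines[key]


if __name__ == "__main__":
    clips = [(1920, 1080), (1280, 720), (3840, 2160), (1920, 1080)]
    for static in (True, False):
        cache = EngineCache(static_shape=static)
        for w, h in clips:
            cache.get_engine(w, h)
        # prints 3 builds for static_shape=True and 1 build for static_shape=False
        print(f"static_shape={static}: {cache.builds} engine build(s)")
```

The trade-off in this model is that a dynamic engine only accepts resolutions inside its profile; anything outside it raises instead of silently triggering a rebuild.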

@KLC04

KLC04 commented Dec 4, 2023

As a side note, RIFE v4.13 has been re-released with a new architecture by hzwer. It might be useful to re-export the ONNX models to see whether that fixes this issue.

@WolframRhodium
Contributor

I have already implemented a fix in a similar way to the re-release. It does not change the model architecture; it simply renames weights in the model. This is an issue with TensorRT rather than with RIFE itself.

@WolframRhodium
Contributor

TensorRT 9.3.0 released today still suffers from this problem.

@WolframRhodium
Contributor

TensorRT 10.0.0 released today still suffers from this problem.

@WolframRhodium
Contributor

WolframRhodium commented Mar 27, 2024

The ONNX files remain unchanged.

For TRT, you only need to update vstrt.dll and vsmlrt.py, and the entire vsmlrt-cuda folder.

Optionally, you can go to the rife(_v2) folders and delete all .engine, .cache and .lock files, because engines built by an older version of TRT cannot (by default) be used by a newer version of TRT.
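A minimal Python sketch of that optional cleanup. It assumes the cached files live in `rife` and `rife_v2` subfolders of a models directory and end in `.engine`, `.cache` or `.lock`; the folder layout, the function name `clean_stale_engines` and its `dry_run` flag are assumptions for illustration, not part of vs-mlrt. The `.onnx` files are never touched.

```python
from pathlib import Path
from typing import Iterable, List, Union

# Assumed suffixes of cached TensorRT artifacts tied to a specific TRT version.
STALE_SUFFIXES = {".engine", ".cache", ".lock"}


def clean_stale_engines(
    models_dir: Union[str, Path],
    subfolders: Iterable[str] = ("rife", "rife_v2"),
    dry_run: bool = True,
) -> List[Path]:
    """List (and, unless dry_run, delete) cached engine files under models_dir.

    Hypothetical helper: .onnx files are kept, since only the built engines
    depend on the TensorRT version.
    """
    stale: List[Path] = []
    for name in subfolders:
        folder = Path(models_dir) / name
        if not folder.is_dir():
            continue  # skip folders that don't exist in this install
        # Collect everything first so nothing is deleted during the directory walk.
        stale.extend(
            p for p in folder.rglob("*") if p.is_file() and p.suffix in STALE_SUFFIXES
        )
    if not dry_run:
        for path in stale:
            path.unlink()
    return stale
```

Calling it once with the default `dry_run=True` returns the list for review; calling it again with `dry_run=False` deletes those files, and TensorRT rebuilds the engines on the next run.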

@netExtra

netExtra commented Mar 27, 2024

> Optionally, you can go to folder models/rife(_v2) and delete all .engine, .cache and .lock files, because engines for older versions of trt cannot (by default) be used by newer versions of trt.

Thanks, this is the answer I was looking for. I remember deleting the engines for previous versions, but I just wanted to be clear.

@netExtra

> onnx files remain unchanged.
>
> For trt, you only need to update the files vstrt.dll and vsmlrt.py, and the whole folder vsmlrt-cuda.
>
> Optionally, you can go to folders rife(_v2) and delete all .engine, .cache and .lock files, because engines for older versions of trt cannot (by default) be used by newer versions of trt.

Apologies, but do we know why TensorRT 10.0 affects RIFE performance so negatively?

@WolframRhodium
Contributor

I don't know.

@WolframRhodium
Contributor

WolframRhodium commented Apr 1, 2024

The original problem should be fixed in TensorRT 10.0.1.

On the other hand, I have not received a response to the performance regression bug report. I suspect the regression is due to a premature compiler optimization that offloads parts of the computational graph (related to /GridSample_3) to a worker stream and breaks operator fusion.

4 participants