The inference time is slower than that reported in the paper #9

Open
danqu130 opened this issue Sep 26, 2021 · 4 comments

@danqu130

I have tested MIMO-UNet and MIMO-UNet+ on a single 2080Ti card (whose theoretical performance is higher than the Titan Xp's), and they take about 15 ms and 30 ms, respectively. I didn't make any changes to the open-source code; I just ran the test command (https://github.com/chosj95/MIMO-UNet#test) directly.

python main.py --model_name "MIMO-UNet" --mode "test" --data_dir "dataset/GOPRO" --test_model "MIMO-UNet.pkl"

Namespace(batch_size=4, data_dir='dataset/GOPRO', gamma=0.5, learning_rate=0.0001, lr_steps=[500, 1000, 1500, 2000, 2500, 3000], mode='test', model_name='MIMO-UNet', model_save_dir='results/MIMO-UNet/weights/', num_epoch=3000, num_worker=8, print_freq=100, result_dir='results/MIMO-UNet/result_image/', resume='', save_freq=100, save_image=False, test_model='MIMO-UNet.pkl', valid_freq=100, weight_decay=0)

For MIMO-UNet:

==========================================================
The average PSNR is 31.73 dB
Average time: 0.015028

And for MIMO-UNet+

==========================================================
The average PSNR is 32.45 dB
Average time: 0.030238
  1. Why can't the 8 ms / 17 ms inference times reported in the paper be reproduced?
  2. Why are the asynchronous inference times on the 2080Ti or 3090 (https://github.com/chosj95/MIMO-UNet#gpu-syncronization-issue-on-measuring-inference-time) slower than those on the Titan XP (https://github.com/chosj95/MIMO-UNet#performance)?

In addition, I think the CUDA-synchronized time should be used when reporting timing performance. The unsynchronized time cannot correctly measure the speed and complexity of a model.
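
For illustration, here is a minimal, self-contained sketch of the two timing schemes (the model and input below are placeholders, not the repository's code): without synchronization the host timer stops as soon as the kernels are launched, not when the GPU has actually finished.

    import time
    import torch

    # Placeholder model and input, for illustration only.
    model = torch.nn.Conv2d(3, 64, 3, padding=1).cuda().eval()
    x = torch.rand(1, 3, 720, 1280, device='cuda')

    with torch.no_grad():
        # Warm-up so the measurement is not dominated by initialization.
        model(x)
        torch.cuda.synchronize()

        # Unsynchronized: the timer stops right after the kernels are launched.
        t0 = time.time()
        model(x)
        async_time = time.time() - t0

        # Synchronized: the timer stops only after the GPU has finished.
        torch.cuda.synchronize()
        t0 = time.time()
        model(x)
        torch.cuda.synchronize()
        sync_time = time.time() - t0

    print('unsynchronized: %.6f s, synchronized: %.6f s' % (async_time, sync_time))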

@danqu130
Author

I also tested the synchronized time on the 2080Ti.

            input_img = input_img.to(device)
            # Synchronize before starting the timer so pending GPU work
            # (e.g. the host-to-device copy above) is not counted.
            torch.cuda.synchronize()
            tm = time.time()

            pred = model(input_img)[2]
            # Synchronize again so the timer stops only after the GPU has
            # actually finished the forward pass.
            torch.cuda.synchronize()
            elapsed = time.time() - tm
            adder(elapsed)

            pred_clip = torch.clamp(pred, 0, 1)

For MIMO-UNet:

==========================================================
The average PSNR is 31.73 dB
Average time: 0.209198

And for MIMO-UNet+

==========================================================
The average PSNR is 32.45 dB
Average time: 0.459141

This result is consistent with the performance gap between the 2080Ti and the 3090, but I am still confused about the performance on the Titan XP.
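
As an aside, CUDA events are a common alternative to host-side timers for this kind of measurement. A sketch of how the same loop body could be timed with events (not part of the repository code, reusing the variable names from the snippet above):

            start = torch.cuda.Event(enable_timing=True)
            end = torch.cuda.Event(enable_timing=True)

            start.record()
            pred = model(input_img)[2]
            end.record()

            # Wait for the recorded events before reading the elapsed time.
            torch.cuda.synchronize()
            elapsed = start.elapsed_time(end) / 1000.0  # milliseconds -> seconds
            adder(elapsed)

            pred_clip = torch.clamp(pred, 0, 1)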

@danqu130
Author

I also tested MT-RNN and MPRNet on the same 2080Ti PC.

For MT-RNN, the asynchronous and synchronized inference times are 46 ms and 480 ms, respectively. The time reported in the MT-RNN paper is 0.07 s on a Titan V.
For MPRNet, the asynchronous and synchronized inference times are 150 ms and >1500 ms, respectively. The time reported in the MPRNet paper is 0.18 s on a Titan XP.
The theoretical performance of those two graphics cards is lower than the 2080Ti's, so it makes sense that I got faster asynchronous inference times.

What confuses me is that my asynchronous times (15 ms / 30 ms on the 2080Ti) are longer than the results (8 ms / 17 ms on the Titan XP) reported in your paper.

@chosj95
Owner

chosj95 commented Oct 12, 2021

Thank you for your interest in our work.

The inference times reported in the manuscript were measured in the following HW/SW environment, and the log files for these experiments can be found at the links below.

Hardware: TITAN XP (GPU), Intel i5-8400 (CPU)
Software: PyTorch 1.4, CUDA 10.0, Ubuntu 18.04
Log files: MIMO-UNet, MIMO-UNet+

Please note that, depending on the PyTorch or CUDA version, the change in inference time may differ for each network, as discussed in CUDA and PyTorch.
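
When comparing timings across machines, it may help to record the environment alongside the numbers; a minimal sketch (not part of the repository code):

    import torch

    # Report the software/hardware environment alongside any measured timings.
    print('PyTorch:', torch.__version__)
    print('CUDA:', torch.version.cuda)
    print('cuDNN:', torch.backends.cudnn.version())
    print('GPU:', torch.cuda.get_device_name(0))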

Best,

@danqu130
Author

danqu130 commented Oct 12, 2021

I can't view the log directly.

This is not only about your paper but about the whole image-deblurring community: in your opinion, which time should be reported in an academic paper?

I think the unsynchronized times reported by existing methods can cause misunderstanding, especially times below 30 ms that appear to meet real-time requirements. In fact, as the experiments above show, these models run at less than 5 FPS (1/0.209 s ≈ 4.8 FPS for MIMO-UNet, 1/0.459 s ≈ 2.2 FPS for MIMO-UNet+) and cannot be applied in practical applications with real-time requirements.

I will also raise this issue with other authors.
