The inference time is slower than that reported in the paper #9

Open
danqu130 opened this issue Sep 26, 2021 · 4 comments

@danqu130

I have tested MIMO-UNet and MIMO-UNet+ on a single 2080Ti card (whose theoretical performance is higher than the Titan Xp's), and they take about 15 ms and 30 ms, respectively. I didn't make any changes to the open-source code; I just ran the test command (https://github.com/chosj95/MIMO-UNet#test) directly.

python main.py --model_name "MIMO-UNet" --mode "test" --data_dir "dataset/GOPRO" --test_model "MIMO-UNet.pkl"

Namespace(batch_size=4, data_dir='dataset/GOPRO', gamma=0.5, learning_rate=0.0001, lr_steps=[500, 1000, 1500, 2000, 2500, 3000], mode='test', model_name='MIMO-UNet', model_save_dir='results/MIMO-UNet/weights/', num_epoch=3000, num_worker=8, print_freq=100, result_dir='results/MIMO-UNet/result_image/', resume='', save_freq=100, save_image=False, test_model='MIMO-UNet.pkl', valid_freq=100, weight_decay=0)

For MIMO-UNet:

==========================================================
The average PSNR is 31.73 dB
Average time: 0.015028

And for MIMO-UNet+

==========================================================
The average PSNR is 32.45 dB
Average time: 0.030238
  1. Why can't the 8 ms / 17 ms inference times reported in the paper be reproduced?
  2. Why are the asynchronous inference times on the 2080Ti or 3090 (https://github.com/chosj95/MIMO-UNet#gpu-syncronization-issue-on-measuring-inference-time) slower than those on the Titan XP (https://github.com/chosj95/MIMO-UNet#performance)?

In addition, I think the CUDA-synchronized time should be used when reporting timing performance. The unsynchronized time cannot correctly measure the speed and complexity of a model.
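
For illustration, here is a minimal, self-contained sketch of the two timing schemes (the model and input below are placeholders, not the repository's code): without synchronization the host timer stops as soon as the kernels are launched, not when the GPU has actually finished.

    import time
    import torch

    # Placeholder model and input, for illustration only.
    model = torch.nn.Conv2d(3, 64, 3, padding=1).cuda().eval()
    x = torch.rand(1, 3, 720, 1280, device='cuda')

    with torch.no_grad():
        # Warm-up so the measurement is not dominated by initialization.
        model(x)
        torch.cuda.synchronize()

        # Unsynchronized: the timer stops right after the kernels are launched.
        t0 = time.time()
        model(x)
        async_time = time.time() - t0

        # Synchronized: the timer stops only after the GPU has finished.
        torch.cuda.synchronize()
        t0 = time.time()
        model(x)
        torch.cuda.synchronize()
        sync_time = time.time() - t0

    print('unsynchronized: %.6f s, synchronized: %.6f s' % (async_time, sync_time))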

@danqu130
Author

I also tested the synchronized time on the 2080Ti.

            input_img = input_img.to(device)
            # Synchronize before starting the timer so pending GPU work
            # (e.g. the host-to-device copy above) is not counted.
            torch.cuda.synchronize()
            tm = time.time()

            pred = model(input_img)[2]
            # Synchronize again so the timer stops only after the GPU has
            # actually finished the forward pass.
            torch.cuda.synchronize()
            elapsed = time.time() - tm
            adder(elapsed)

            pred_clip = torch.clamp(pred, 0, 1)

For MIMO-UNet:

==========================================================
The average PSNR is 31.73 dB
Average time: 0.209198

And for MIMO-UNet+

==========================================================
The average PSNR is 32.45 dB
Average time: 0.459141

This result is consistent with the performance gap between the 2080Ti and the 3090, but I am still confused about the performance on the Titan XP.
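
As an aside, CUDA events are a common alternative to host-side timers for this kind of measurement. A sketch of how the same loop body could be timed with events (not part of the repository code, reusing the variable names from the snippet above):

            start = torch.cuda.Event(enable_timing=True)
            end = torch.cuda.Event(enable_timing=True)

            start.record()
            pred = model(input_img)[2]
            end.record()

            # Wait for the recorded events before reading the elapsed time.
            torch.cuda.synchronize()
            elapsed = start.elapsed_time(end) / 1000.0  # milliseconds -> seconds
            adder(elapsed)

            pred_clip = torch.clamp(pred, 0, 1)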

@danqu130
Author

I also tested MT-RNN and MPRNet on the same 2080Ti PC.

For MT-RNN, the asynchronous and synchronized inference times are 46 ms and 480 ms, respectively. The time reported in the MT-RNN paper is 0.07 s on a Titan V.
For MPRNet, the asynchronous and synchronized inference times are 150 ms and >1500 ms, respectively. The time reported in the MPRNet paper is 0.18 s on a Titan XP.
The theoretical performance of those two graphics cards is lower than the 2080Ti's, so it makes sense that I got faster asynchronous inference times.

What confuses me is that my asynchronous times (15 ms / 30 ms on the 2080Ti) are longer than the results (8 ms / 17 ms on the Titan XP) reported in your paper.

@chosj95
Owner

chosj95 commented Oct 12, 2021

Thank you for your interest in our work.

The inference times reported in the manuscript were measured in the following HW/SW environment, and the log files for these experiments can be found at the links below.

Hardware: TITAN XP (GPU), Intel i5-8400 (CPU)
Software: PyTorch 1.4, CUDA 10.0, Ubuntu 18.04
Log files: MIMO-UNet, MIMO-UNet+

Please note that, depending on the PyTorch or CUDA version, the change in inference time may differ for each network, as discussed in CUDA and PyTorch.
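
When comparing timings across machines, it may help to record the environment alongside the numbers; a minimal sketch (not part of the repository code):

    import torch

    # Report the software/hardware environment alongside any measured timings.
    print('PyTorch:', torch.__version__)
    print('CUDA:', torch.version.cuda)
    print('cuDNN:', torch.backends.cudnn.version())
    print('GPU:', torch.cuda.get_device_name(0))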

Best,

@danqu130
Author

danqu130 commented Oct 12, 2021

I can't view the log directly.

This is not only about your paper but about the whole image-deblurring community: in your opinion, which time should be reported in an academic paper?

I think the unsynchronized times reported by existing methods can cause misunderstanding, especially times below 30 ms that appear to meet real-time requirements. In fact, as the experiments above show, these models run at less than 5 FPS (1/0.209 s ≈ 4.8 FPS for MIMO-UNet, 1/0.459 s ≈ 2.2 FPS for MIMO-UNet+) and cannot be applied in practical applications with real-time requirements.

I will also raise this issue with other authors.
