Question about the FPS of RITnet #11

Closed
QJieWang opened this issue Apr 15, 2023 · 8 comments

Comments

@QJieWang

Hello, I have some questions about the speed of RITnet. The paper reports a speed of 301 Hz on a 1080ti, but when I tested it on a 3090, the highest speed I achieved was only 191 FPS.
I am very curious about this huge difference. Is the 301 FPS figure the result of further model compression? Here is the code I used to calculate the FPS.

import numpy as np
import time
import torch
from model import model_dict
import os


model = model_dict["densenet"]


model = model.to(device="cuda:1")
# random input
dummy_input = torch.randn(1, 1, 640, 400, dtype=torch.float).to(device="cuda:1")
_ = model(dummy_input)

model.eval()
# torch.cuda.synchronize()

# Run model, calculate FPS
num_iterations = 100  
total_time = 0
for i in range(num_iterations):
    start_time = time.time()
    output_tensor = model(dummy_input)
    end_time = time.time()
    total_time += end_time - start_time

fps = num_iterations / total_time
print("FPS: {:.2f}".format(fps))
@QJieWang
Author

Actually, there is a third-party speed test for RITnet. In "Semantic Segmentation of the Eye With a Lightweight Deep Network and Shape Correction," it was mentioned that they tested RITnet on a 1080ti with 1440 images, which took 22.75 seconds, roughly equivalent to around 63.3 FPS. Therefore, I'm very curious about how the segmentation speed of 301 Hz mentioned in the paper was achieved. Was the model compressed or quantized? Or, perhaps, was the batch size mistakenly included when calculating FPS?
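The conversion from the third-party figures to FPS can be reproduced in a few lines. This is a minimal sketch: the function name `throughput_fps` is hypothetical, and the only inputs are the two numbers quoted above.

```python
def throughput_fps(num_images: int, elapsed_seconds: float) -> float:
    """Average frames per second over a whole run."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return num_images / elapsed_seconds


fps = throughput_fps(1440, 22.75)   # figures quoted from the third-party test
latency_ms = 1000.0 / fps           # average time spent per image
print(f"{fps:.1f} FPS, {latency_ms:.1f} ms per image")
# prints: 63.3 FPS, 15.8 ms per image
```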

@gabrielDiaz-performlab
Collaborator

gabrielDiaz-performlab commented Apr 15, 2023 via email

@RSKothari

Hello @gabrielDiaz-performlab, @QJieWang, I did respond to @QJieWang via email, but it seems you weren't CC'ed on his original email. Please see my response below.

I reviewed your code sample and have a suggestion. Please consider changing your dummy data to torch.float32. As far as I recall, the model was designed to work with 32-bit eye images (or possibly 16-bit images—I'm not certain, but Aayush can confirm this). Your approach to computing FPS is also correct.

After casting your data to either 32-bit or 16-bit, please get back to us with the results.

As a follow-up to my previous email, I also recommend running approximately 10k iterations and recording the time intervals in a list. Once you've collected the delta time intervals, you can calculate the median duration and report the FPS based on that median. The median is a more robust measure for assessing real-time capability.
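The median-based procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the method used for the paper: the helper names `benchmark` and `median_fps` are hypothetical, and the optional `sync` argument is an addition that goes beyond the thread. CUDA work is launched asynchronously, so for a GPU model one would typically pass `torch.cuda.synchronize` as `sync` so that the clock stops only after the work has finished.

```python
import statistics
import time


def benchmark(step, sync=None, warmup=10, iterations=1000):
    """Time `step()` repeatedly and return the per-call durations in seconds.

    `sync` is an optional callable that blocks until queued work is done,
    e.g. torch.cuda.synchronize for a CUDA model (assumption: without it,
    the timer may mostly measure kernel launch cost).
    """
    sync = sync or (lambda: None)
    for _ in range(warmup):          # warm-up calls are not recorded
        step()
    sync()
    durations = []
    for _ in range(iterations):
        start = time.perf_counter()
        step()
        sync()                       # wait for the work before stopping the clock
        durations.append(time.perf_counter() - start)
    return durations


def median_fps(durations):
    """FPS implied by the median per-call duration."""
    return 1.0 / statistics.median(durations)


# Stand-in CPU workload; for the model, use e.g. `lambda: model(dummy_input)`.
durations = benchmark(lambda: sum(range(10_000)), iterations=200)
print(f"median FPS: {median_fps(durations):.1f}")
```

Keeping the raw durations in a list also makes it easy to inspect the minimum, maximum, or percentiles afterwards.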

@QJieWang
Author

RITnet is designed to work with 32-bit eye images. In PyTorch, torch.float32 and torch.float are equivalent. For clarity, I modified the experimental code to print the data type of dummy_input during the experiment. However, the speed still did not change.
I also added some additional metrics, namely the maximum, minimum, first, last, and median FPS values, to evaluate the model's FPS. Unfortunately, I still cannot achieve a speed above 191 FPS on a 3090.
Here is my test code :

import numpy as np
import time
import torch
from model import model_dict
import os
from tqdm import tqdm

model = model_dict["densenet"]


model = model.to(device="cuda:2")
# random input
dummy_input = torch.randn(1, 1, 640, 400, dtype=torch.float).to(device="cuda:2")
# dummy_input = torch.randn(1, 1, 640, 400, dtype=torch.float32).to(device="cuda:2")
print(F"the Type of dummy_input is {dummy_input.dtype} ")
_ = model(dummy_input)

model.eval()
# torch.cuda.synchronize()

# Run model, calculate FPS
FPS_list = []
for test in tqdm(range(100)):
    torch.cuda.synchronize()
    num_iterations = 100
    total_time = 0
    for i in range(num_iterations):
        start_time = time.time()
        output_tensor = model(dummy_input)
        end_time = time.time()
        total_time += end_time - start_time

    fps = num_iterations / total_time
    FPS_list.append(fps)
FPS = np.array(FPS_list)
print("The First Test FPS: {:.2f}".format(FPS[0]))
print("The MAX FPS: {:.2f}".format(max(FPS)))
print("The MIN FPS: {:.2f}".format(min(FPS)))
print("The Last Test FPS: {:.2f}".format(FPS[-1]))
print("The Median FPS: {:.2f}".format(np.median(FPS)))
# if set dummy_input torch.float16 it will raise error
# Input type (torch.cuda.HalfTensor) and weight type (torch.cuda.FloatTensor) should be the same
# random input
# dummy_input = torch.randn(1, 1, 640, 400, dtype=torch.float16).to(device="cuda:2")
# print(F"the Type of dummy_input is {dummy_input.dtype} ")
# _ = model(dummy_input)

# model.eval()
# # torch.cuda.synchronize()

# # Run model, calculate FPS
# num_iterations = 100
# total_time = 0
# for i in range(num_iterations):
#     start_time = time.time()
#     output_tensor = model(dummy_input)
#     end_time = time.time()
#     total_time += end_time - start_time

# fps = num_iterations / total_time
# print("FPS: {:.2f}".format(fps))

@AayushKrChaudhary
Owner

AayushKrChaudhary commented Apr 16, 2023

The code looks good. The forward pass runs at 300 FPS, and this was tested multiple times. I have a couple of suggestions.

  1. Compute for around 10000 iterations instead of 100.
  2. Ignore the timing of the first few iterations; I have seen it differ during startup. If possible, just check end_time - start_time for the last completed iteration to get a sense of the final per-frame computation time.

Regardless, the time was computed in a similar fashion to yours, except that the number of iterations was larger.

Regarding the comparison in the other papers, those tests mostly covered the complete pipeline of image reading, image preprocessing, and the forward pass, which drops the speed to around 60 FPS. The code is in Python; the speed can be improved by using C++ and by using half the resolution.
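To make the two figures comparable, it can help to convert them to per-frame time. The sketch below assumes the stages run one after another for each frame (no overlap between loading and inference), which is an assumption and not something stated in this thread; the helper names are hypothetical and the inputs are the rounded 300 FPS and 60 FPS figures mentioned above.

```python
def fps_to_ms(fps: float) -> float:
    """Per-frame time in milliseconds for a given frame rate."""
    return 1000.0 / fps


def pipeline_fps(stage_ms) -> float:
    """End-to-end FPS when all stages run sequentially for each frame."""
    return 1000.0 / sum(stage_ms)


forward_ms = fps_to_ms(300.0)      # forward pass alone, as stated above
end_to_end_ms = fps_to_ms(60.0)    # approximate full-pipeline figure stated above
overhead_ms = end_to_end_ms - forward_ms  # time left for reading and preprocessing

print(f"forward pass: {forward_ms:.2f} ms per frame")
print(f"implied overhead: {overhead_ms:.2f} ms per frame")
print(f"check: {pipeline_fps([forward_ms, overhead_ms]):.1f} FPS end to end")
```

Under these assumptions, reading and preprocessing would have to account for roughly 13 ms per frame for both figures to hold.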

@QJieWang
Author

  1. Regarding the 10,000 iterations: I do not believe that increasing the iterations from 100 to 10,000 can raise the speed from a maximum of 190 FPS to over 300 FPS, but since you think it needs to be verified, I have increased the number of iterations. To avoid interference from slower runs, I selected the top 10 results out of the 10,000 runs for analysis. The fact is that even on the 3090, a speed of 300 FPS cannot be achieved.

  2. Regarding the experiments in the other paper, which achieved only about 63 FPS on the same device (1080ti): they added image preprocessing and postprocessing. However, RITnet itself has no postprocessing step. "Semantic Segmentation of the Eye With a Lightweight Deep Network and Shape Correction" pursued a small number of model parameters, which led to very poor segmentation results, so postprocessing was added to improve them. RITnet is different: its segmentation results are very good and need no postprocessing. Therefore, their measurement of RITnet's FPS adds only the image-loading step. I do not believe that this step alone can reduce 300 FPS to 63 FPS; that is simply unrealistic.

  3. Finally, the RITnet paper does not describe the specific experimental procedure; it only states the conclusion that a segmentation speed of 301 Hz was achieved on the 1080ti, which is very different from my experimental results. Although RITnet's segmentation results are very good and rank first, I personally tend to believe that the batch size was included in the calculation when the authors measured the FPS, which inflated the number. For example, sending 5 images to the 1080ti at a time and returning the results in real time would inflate the FPS figure by a factor of 5. Although this mode of operation is reasonable and correct for eye-related equipment, it does not conform to the definition of FPS in CV.
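The distinction raised in point 3 can be illustrated with a short sketch. The numbers are hypothetical and chosen only for illustration; they are not measurements of RITnet, and the function names are made up for this example.

```python
def batches_per_second(seconds_per_batch: float) -> float:
    """How many forward passes complete per second."""
    return 1.0 / seconds_per_batch


def images_per_second(batch_size: int, seconds_per_batch: float) -> float:
    """Throughput when every image in the batch is counted as a frame."""
    return batch_size / seconds_per_batch


# Hypothetical numbers for illustration only, not measurements of RITnet.
batch_size = 5
seconds_per_batch = 0.020

passes = batches_per_second(seconds_per_batch)
throughput = images_per_second(batch_size, seconds_per_batch)

print(f"forward passes per second: {passes:.0f}")
print(f"images per second: {throughput:.0f}")
print(f"latency until any result is available: {seconds_per_batch * 1000:.0f} ms")
```

Counting images instead of forward passes multiplies the reported rate by the batch size, while the time until each result is available stays at the duration of one batch.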

RITnet is great work and has given me a lot of inspiration, but my doubts about its remarkable 300 FPS segmentation speed have not been resolved.
Here is my test code :

import numpy as np
import time
import torch
from model import model_dict
import os
from tqdm import tqdm
import pickle
model = model_dict["densenet"]


model = model.to(device="cuda:2")
# random input
dummy_input = torch.randn(1, 1, 640, 400, dtype=torch.float).to(device="cuda:2")
# dummy_input = torch.randn(1, 1, 640, 400, dtype=torch.float32).to(device="cuda:2")
print(F"the Type of dummy_input is {dummy_input.dtype} ")
_ = model(dummy_input)

model.eval()
# torch.cuda.synchronize()

# Run model, calculate FPS
FPS_list = []
for test in tqdm(range(10000)):
    torch.cuda.synchronize()
    num_iterations = 100
    total_time = 0
    for i in range(num_iterations):
        start_time = time.time()
        output_tensor = model(dummy_input)
        end_time = time.time()
        total_time += end_time - start_time

    fps = num_iterations / total_time
    FPS_list.append(fps)
# sort all the FPS,and take the top 10
FPS_list = sorted(FPS_list, reverse=True)[:10]
FPS = np.array(FPS_list)
print("The First Test FPS: {:.2f}".format(FPS[0]))
print("The MAX FPS: {:.2f}".format(max(FPS)))
print("The MIN FPS: {:.2f}".format(min(FPS)))
print("The Last Test FPS: {:.2f}".format(FPS[-1]))
print("The Median FPS: {:.2f}".format(np.median(FPS)))
# if set dummy_input torch.float16 it will raise error
# Input type (torch.cuda.HalfTensor) and weight type (torch.cuda.FloatTensor) should be the same
# random input
# dummy_input = torch.randn(1, 1, 640, 400, dtype=torch.float16).to(device="cuda:2")
# print(F"the Type of dummy_input is {dummy_input.dtype} ")
# _ = model(dummy_input)

# model.eval()
# # torch.cuda.synchronize()

# # Run model, calculate FPS
# num_iterations = 100
# total_time = 0
# for i in range(num_iterations):
#     start_time = time.time()
#     output_tensor = model(dummy_input)
#     end_time = time.time()
#     total_time += end_time - start_time

# fps = num_iterations / total_time
# print("FPS: {:.2f}".format(fps))

@RSKothari

@QJieWang Your approach to computing FPS seems correct. I personally wouldn't be opposed if you reported RITnet's FPS using the above-mentioned approach. We will conduct another round of verification on our end at a later date and post an updated RITnet FPS figure on GitHub (especially regarding the batch size). Since the FPS does not change our message, academic contributions, or core concept, we still stand by RITnet's evaluation. We hope this unblocks you and your analysis.

@QJieWang
Author

QJieWang commented Apr 18, 2023

Thank you very much. RITnet is excellent work. I apologize for bothering you.
