
Latency, throughput, GPU performance #3

Closed
chris-ha458 opened this issue Jul 6, 2020 · 3 comments
@chris-ha458 commented Jul 6, 2020

Since ReXNet is based on MnasNet-style architectures (MobileNet v1 and v2), I guess it suffers from the same low-throughput, low-GPU-performance issue.
Can you provide any numbers?

I am specifically interested in per-image and per-batch latency, and throughput (images/sec), depending on hardware such as CPU, ARM processors, GPU, etc.

I know this is a lot to ask, but I believe it will be valuable to other researchers too, and any kind of numbers would be helpful.

Thankfully, this paper focuses more on the design principles, which may be applicable to other GPU-friendly SOTA architectures such as TResNet or ResNeSt.

Hello,
Thank you for the great paper and code.
Since the ReXNets presented in the paper are MobileNet-based, I expect they retain the GPU performance limitations of the original models.

Has such a comparison been done? I would appreciate it if actual numbers (e.g., images processed per second per GPU batch size) could be provided.

In fact, even if the models themselves are limited in that respect, it is encouraging that the design methodology seems applicable to other GPU-efficient models.

Thank you.

@dyhan0920 (Collaborator)

Thank you for your interest in our work! As you mentioned, the degree to which depthwise convolution is optimized affects the latency of MobileNet-style models.

Before updating all ReXNets' latencies, we would like to provide the latency of ReXNet-1.0x tested on an M40 GPU via PyTorch: about 17.2 ms per single image. Additionally, we tested EfficientNet-B0 in the same setting and got about 19.0 ms per single image.

We will update more detailed numbers for all ReXNets soon.
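For reproducing this kind of per-image measurement, a minimal timing harness can look like the sketch below. This is not the authors' benchmark script; the function name and parameters are illustrative, and for a real GPU model you would additionally need to synchronize the device (e.g. `torch.cuda.synchronize()`) before reading the clock.

```python
import time

def measure_latency_ms(fn, warmup: int = 5, iters: int = 50) -> float:
    """Average wall-clock latency of fn() in milliseconds.

    Runs a few warmup iterations first so one-time setup costs
    (allocator warmup, kernel compilation, caches) are excluded,
    then averages over `iters` timed calls.
    """
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters * 1000.0

# Dummy workload standing in for model(x) on a single image:
latency = measure_latency_ms(lambda: sum(range(10_000)))
```

Averaging over many iterations (and discarding warmup) matters because single-image latencies in the tens of milliseconds are easily distorted by one-off startup costs.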

@shinya7y

Hi,
Thank you for the wonderful work! I'm interested in rank-based channel configuration (https://arxiv.org/abs/1909.04021) and ReXNets' latencies.

Do latency/accuracy trade-offs improve if we round channels to hardware-friendly multiples?
Example: changing `int(round(inplanes * width_mult))` to `int(round(inplanes * width_mult / 4) * 4)`
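The rounding scheme in the example above can be sketched as a small helper. This is an illustrative implementation of the suggestion, not code from the repository; the function name `round_channels` and the `divisor` default are assumptions.

```python
def round_channels(inplanes: int, width_mult: float, divisor: int = 4) -> int:
    """Round the width-scaled channel count to the nearest multiple
    of `divisor`, clamped so the result is never below `divisor`."""
    scaled = inplanes * width_mult
    return max(divisor, int(round(scaled / divisor)) * divisor)

# Plain rounding vs. hardware-friendly rounding for 16 channels at 1.3x:
plain = int(round(16 * 1.3))        # 21 channels (awkward for vectorized kernels)
friendly = round_channels(16, 1.3)  # 20 channels (multiple of 4)
```

Many GPU and SIMD kernels are fastest when channel counts are multiples of 4 or 8, which is why this kind of rounding can improve latency at a small (often negligible) cost in accuracy.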

In that case, I would appreciate it if you could provide the results and pretrained models.

@dyhan0920 (Collaborator)

@shinya7y Thanks for the great suggestion. Providing pretrained models with hardware-friendly channel settings is actually one of the highest priorities on our task list. The models will be released soon through this repository.
