
Latency, throughput, GPU performance #3

Closed
chris-ha458 opened this issue Jul 6, 2020 · 3 comments
@chris-ha458 commented Jul 6, 2020

Since ReXNet is based on MnasNet-style architectures (MobileNet v1 and v2), I guess it suffers from the same low-throughput, low-GPU-performance issue.
Can you provide any numbers?

I am specifically interested in per-image and per-batch latency, and throughput (images/sec), depending on hardware such as CPU, ARM processors, GPU, etc.

I know this is a lot to ask, but I believe it will be valuable to other researchers too, and any kind of numbers would be helpful.

Thankfully, this paper focuses more on the design principles, which may be applicable to other GPU-friendly SOTA architectures such as TResNet or ResNeSt.

Hello,
Thank you for the great paper and code.
Since the ReXNets presented in the paper are MobileNet-based, I expect they retain the GPU performance limitations of the original models.

Has such a comparison been done? I would appreciate it if actual numbers (e.g., images processed per second per GPU batch size) could be provided.

In fact, even if the models themselves are limited in that respect, it is encouraging that the design methodology seems applicable to other GPU-efficient models.

Thank you.

@dyhan0920 (Collaborator)

Thank you for your interest in our work! As you mentioned, the degree to which depthwise convolution is optimized affects the latency of MobileNet-style models.

Before updating all ReXNets' latencies, we would like to provide the latency of ReXNet-1.0x tested on an M40 GPU via PyTorch: about 17.2 ms per single image. Additionally, we tested EfficientNet-B0 in the same setting and got about 19.0 ms per single image.

We will update more detailed numbers for all ReXNets soon.
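For reproducing this kind of per-image measurement, a minimal timing harness can look like the sketch below. This is not the authors' benchmark script; the function name and parameters are illustrative, and for a real GPU model you would additionally need to synchronize the device (e.g. `torch.cuda.synchronize()`) before reading the clock.

```python
import time

def measure_latency_ms(fn, warmup: int = 5, iters: int = 50) -> float:
    """Average wall-clock latency of fn() in milliseconds.

    Runs a few warmup iterations first so one-time setup costs
    (allocator warmup, kernel compilation, caches) are excluded,
    then averages over `iters` timed calls.
    """
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters * 1000.0

# Dummy workload standing in for model(x) on a single image:
latency = measure_latency_ms(lambda: sum(range(10_000)))
```

Averaging over many iterations (and discarding warmup) matters because single-image latencies in the tens of milliseconds are easily distorted by one-off startup costs.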

@shinya7y

Hi,
Thank you for the wonderful work! I'm interested in rank-based channel configuration (https://arxiv.org/abs/1909.04021) and ReXNets' latencies.

Do latency/accuracy trade-offs improve if we round channels to hardware-friendly multiples?
Example: changing `int(round(inplanes * width_mult))` to `int(round(inplanes * width_mult / 4) * 4)`
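The rounding scheme in the example above can be sketched as a small helper. This is an illustrative implementation of the suggestion, not code from the repository; the function name `round_channels` and the `divisor` default are assumptions.

```python
def round_channels(inplanes: int, width_mult: float, divisor: int = 4) -> int:
    """Round the width-scaled channel count to the nearest multiple
    of `divisor`, clamped so the result is never below `divisor`."""
    scaled = inplanes * width_mult
    return max(divisor, int(round(scaled / divisor)) * divisor)

# Plain rounding vs. hardware-friendly rounding for 16 channels at 1.3x:
plain = int(round(16 * 1.3))        # 21 channels (awkward for vectorized kernels)
friendly = round_channels(16, 1.3)  # 20 channels (multiple of 4)
```

Many GPU and SIMD kernels are fastest when channel counts are multiples of 4 or 8, which is why this kind of rounding can improve latency at a small (often negligible) cost in accuracy.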

In that case, I would appreciate it if you could provide the results and pretrained models.

@dyhan0920 (Collaborator)

@shinya7y Thanks for the great suggestion. Providing pretrained models with hardware-friendly channel settings is actually one of the highest priorities on our task list. The models will be released soon through this repository.
