Very slow inference in tensorflow #6

jianlong-yuan · 2018-05-17T05:28:24Z

Before using your loss function: 2.5 sec/step
After using your loss function: 32.0 sec/step

I use TensorFlow 1.6.0.

bermanmaxim · 2018-05-17T08:14:08Z

That slowdown seems quite drastic. Do you have, e.g., many categories or images to evaluate in one step? I suspect that a dedicated CUDA kernel would speed up the implementation a lot (better than the long succession of masking/selecting/sorting operations). However, I don't plan to tackle this in the near future; contributions in this direction are welcome.

jianlong-yuan · 2018-05-17T13:18:13Z

I just use it on the Cityscapes dataset, with distributed computing. The model is DeepLab v3+.

bermanmaxim · 2018-05-19T15:36:26Z

I did some profiling. It seems that in TensorFlow the tf.cumsum operation is extremely slow on GPU and takes a huge amount of time (~99% of the total time).

In PyTorch, as expected, the sort operation is the one that takes the most time; cumsum is virtually instant on GPU.

I will investigate a bit more; it might warrant an issue report for TensorFlow.
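For readers wondering where cumsum enters the loss at all, here is a minimal pure-Python sketch. It assumes the usual formulation of the Jaccard-loss gradient used by Lovász-style losses, applied to labels already sorted by decreasing error. The function name `jaccard_gradient_sketch` is hypothetical and this is not the repository's code; it only illustrates that the computation needs one sort (done beforehand) and two cumulative sums over all pixels.

```python
from itertools import accumulate


def jaccard_gradient_sketch(gt_sorted):
    """Illustrative only: gt_sorted holds 0/1 labels ordered by decreasing error."""
    n = len(gt_sorted)
    if n == 0:
        return []
    total_fg = sum(gt_sorted)
    # The two cumulative sums below are the cumsum calls discussed in this thread.
    cum_fg = list(accumulate(gt_sorted))
    cum_bg = list(accumulate(1 - g for g in gt_sorted))
    jaccard = []
    for k in range(n):
        intersection = total_fg - cum_fg[k]
        union = total_fg + cum_bg[k]
        jaccard.append(1.0 - intersection / union)
    # Turn the running Jaccard loss into per-element increments.
    return [jaccard[0]] + [jaccard[k] - jaccard[k - 1] for k in range(1, n)]


grad = jaccard_gradient_sketch([1, 0, 1, 0])
```

Because both cumulative sums run over every pixel in the batch, a slow cumsum kernel can dominate the step time even though the arithmetic itself is trivial.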

bermanmaxim · 2018-05-19T18:14:56Z

This Python notebook summarizes the problem: tensorflow/profile_ops.ipynb
The cumsum operation is ~4000x slower in TensorFlow than in PyTorch for a typical number of pixels per batch.
tensorflow/tensorflow#813 mentions that the current implementation of cumsum in Eigen (used by TensorFlow) is naïve. A solution would be to write a custom CUDA op for this operation. Pointers for cumsum on GPU are given at https://stackoverflow.com/a/25251434/805502.
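To illustrate why a dedicated GPU kernel can beat a sequential cumsum, here is a small pure-Python simulation of one well-known parallel scan scheme (Hillis-Steele). It is only a sketch: it is not claimed to be what Eigen, the linked answer, or any TensorFlow kernel actually does, and the function names are made up for this example. Each pass of the `while` loop stands for one synchronized parallel step on the GPU.

```python
def sequential_cumsum(values):
    """Reference inclusive cumsum: n dependent steps, one after another."""
    out, running = [], 0
    for v in values:
        running += v
        out.append(running)
    return out


def hillis_steele_scan(values):
    """Inclusive scan in about log2(n) parallel steps (simulated serially)."""
    out = list(values)
    offset = 1
    while offset < len(out):
        prev = out  # all "threads" read the previous buffer, then write a new one
        out = [
            prev[i] + prev[i - offset] if i >= offset else prev[i]
            for i in range(len(prev))
        ]
        offset *= 2
    return out


data = [1, 2, 3, 4, 5]
result = hillis_steele_scan(data)
```

The sequential version needs n dependent additions, while the scan needs only about log2(n) rounds in which every element can be updated independently, which is the property a GPU can exploit.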

bermanmaxim · 2018-05-24T20:16:25Z

After looking more into it, it seems the easiest way is to create a custom TensorFlow op using the CUB exclusive sum instead of the native tf.cumsum operation. Note that there are already operators defined in TensorFlow using CUB, e.g. topK, so it shouldn't be too difficult to implement this.
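Since an exclusive sum differs from the default (inclusive) cumsum, a replacement op would need a small conversion step. The sketch below shows the relationship in plain Python; the helper names are hypothetical and nothing here calls CUB or TensorFlow.

```python
from itertools import accumulate


def inclusive_sum(values):
    """out[i] = values[0] + ... + values[i]"""
    return list(accumulate(values))


def exclusive_sum(values):
    """out[i] = values[0] + ... + values[i-1], with out[0] = 0"""
    values = list(values)
    if not values:
        return []
    return [0] + list(accumulate(values))[:-1]


def inclusive_from_exclusive(values):
    """Recover the inclusive result by adding each input back elementwise."""
    return [e + v for e, v in zip(exclusive_sum(values), values)]


x = [3, 1, 4, 1, 5]
```

With `x` as above, the exclusive result is shifted right by one position relative to the inclusive one, so adding the input back elementwise recovers the inclusive cumsum.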

I will not implement this for now as I'm mainly using PyTorch. I might do it one day, but in the meantime I'll tag this as contributions welcome.

ekelsen · 2018-11-17T05:32:53Z

The speed of cumsum has been improved significantly; I'm going to close this. Feel free to re-open if you feel it still isn't fast enough.

ben2789 · 2018-11-18T20:00:28Z

@ekelsen Which version of TensorFlow has these improvements?

ekelsen · 2018-11-18T21:16:20Z

Currently just HEAD: tensorflow/tensorflow@73e3215

bermanmaxim · 2018-11-19T12:29:07Z

Thanks for the pointer, @ekelsen. Closing this issue.

stillwaterman · 2019-01-15T10:23:31Z

@ekelsen Hello, I'm using Keras (backend: TensorFlow 1.12) and CUDA 9.0, but training is still slow with this loss function. Can you give me some advice? My GPU is a GTX 1080 Ti.

ben2789 · 2019-01-15T19:02:59Z

@stillwaterman I expect the build of TensorFlow you are using was made before the changes to cumsum were implemented. Building TensorFlow from source might be a reasonable option to speed up training.

Z-Ianthe · 2019-03-25T02:26:39Z

@jianlong-yuan Hi, I want to use the Lovász-Softmax loss in DeepLab v3+ but failed. Could you give me some references or demos? Thanks.

jianlong-yuan · 2019-03-25T10:35:23Z

@Z-Ianthe I put my solution to the above problems here: https://github.com/jianlong-yuan/LovaszSoftmax_tf/tree/master

bermanmaxim · 2019-04-09T13:07:39Z

I don't have time to investigate TensorFlow issues for now, but I am at least reopening the issue.

bermanmaxim added the enhancement label May 17, 2018

bermanmaxim added the tensorflow label and removed the enhancement label May 19, 2018

bermanmaxim changed the title ~~training is very slow~~ Very slow inference in tensorflow May 19, 2018

bermanmaxim added the contributions-welcome label May 24, 2018

jianlong-yuan mentioned this issue Jun 2, 2018

[Bug] Very slow operation of Cumsum in tensorflow 1.6 tensorflow/tensorflow#19570

Closed

jianlong-yuan closed this as completed Jun 15, 2018

bermanmaxim reopened this Sep 14, 2018

bermanmaxim mentioned this issue Sep 14, 2018

About the slow speed on tensorflow #9

Closed

bermanmaxim mentioned this issue Nov 15, 2018

plug'n play implentation for tensorflow/keras #12

Closed

bermanmaxim added a commit that referenced this issue Nov 19, 2018

Mention TF master for cumsum; closes issue #6

3ed8c94

bermanmaxim closed this as completed Nov 19, 2018

bermanmaxim reopened this Apr 9, 2019

Xreki mentioned this issue Mar 16, 2020

Add script file to benchmark cumsum. PaddlePaddle/benchmark#334

Merged
