Apparently, always using 1024 threads per block (the maximum) is not the best choice, because it can overload the threads. This is what cudarc currently does.
For example, after a small change to use 128 threads per block instead, I see a slight speedup on a small ResNet model.
What are ways we could improve this?
Additionally, could some kernels be improved by using 2D or 3D block/grid dims?
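To make the trade-off concrete, here is a minimal sketch in Rust of what choosing a launch configuration involves. The helper names (`launch_cfg_1d`, `launch_cfg_2d`) are hypothetical, not cudarc's actual API; the point is only that once a block size is fixed (e.g. 128 instead of 1024), the grid dimension must be rounded up so every element is still covered, and that the same ceiling-division idea extends to 2D shapes such as image-like tensors.

```rust
// Hypothetical helpers, not cudarc's real API: compute launch
// dimensions for a kernel covering `n` elements (or a `w` x `h`
// grid), given a chosen block size.

/// 1-D launch config: returns (grid_dim, block_dim).
/// The grid dimension is rounded up (ceiling division) so that
/// grid_dim * block_dim >= n, covering all elements.
fn launch_cfg_1d(n: u32, block_dim: u32) -> (u32, u32) {
    let grid_dim = (n + block_dim - 1) / block_dim;
    (grid_dim, block_dim)
}

/// 2-D launch config: returns ((grid_x, grid_y), (block_x, block_y)).
/// Each axis is covered independently with its own ceiling division.
fn launch_cfg_2d(w: u32, h: u32, block_x: u32, block_y: u32) -> ((u32, u32), (u32, u32)) {
    let grid_x = (w + block_x - 1) / block_x;
    let grid_y = (h + block_y - 1) / block_y;
    ((grid_x, grid_y), (block_x, block_y))
}

fn main() {
    // 1,000,000 elements with 128 threads per block -> 7813 blocks
    // (the last block is partially idle, so kernels still need a
    // bounds check on the global index).
    let (grid, block) = launch_cfg_1d(1_000_000, 128);
    println!("1D: grid = {grid}, block = {block}");

    // A 1920x1080 image tiled by 16x16 blocks -> 120 x 68 blocks.
    let ((gx, gy), (bx, by)) = launch_cfg_2d(1920, 1080, 16, 16);
    println!("2D: grid = ({gx}, {gy}), block = ({bx}, {by})");
}
```

Note that any fixed default (128, 256, ...) is still a heuristic; the occupancy-optimal block size depends on the kernel's register and shared-memory usage, which is why CUDA exposes occupancy-query APIs for picking it per kernel.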
#526 Adding launch_cfg which uses 128 threads by default
4c328f5
#526 Adding launch_cfg which uses 128 threads by default (#599)
d0bdc75