
Using 128 threads by default for cuda kernels #599

Merged 1 commit into main on Mar 22, 2023
Conversation

coreylowman
Copy link
Owner

Resolves #526

This shaves a bit of time off the forward and backward passes by letting CUDA split each operation across multiple thread blocks, which the GPU can then schedule across its streaming multiprocessors instead of serializing the work in a single block.

It shaves about a millisecond off the conv2d benchmark, and when benchmarking with https://github.com/coreylowman/image-classification at batch size 64, forward time drops from roughly 36ms to 28ms and backward time from roughly 60ms to 51ms.
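For context, the launch-configuration arithmetic looks something like the sketch below. This is an illustrative Rust snippet, not the actual dfdx code: the `launch_cfg` function name is hypothetical, and it just shows how a fixed block size of 128 threads determines the grid size needed to cover an n-element kernel launch.

```rust
// Hypothetical sketch: a fixed 128-thread block size, with enough
// blocks to cover all n elements (the last block may be partially full).
const NUM_THREADS: u32 = 128;

/// Returns (num_blocks, threads_per_block) for an n-element kernel.
fn launch_cfg(n: u32) -> (u32, u32) {
    // Ceiling division so every element gets a thread.
    let num_blocks = (n + NUM_THREADS - 1) / NUM_THREADS;
    (num_blocks, NUM_THREADS)
}

fn main() {
    assert_eq!(launch_cfg(1), (1, 128));
    assert_eq!(launch_cfg(128), (1, 128));
    assert_eq!(launch_cfg(129), (2, 128));
    println!("{:?}", launch_cfg(1_000_000));
}
```

Kernels launched this way typically guard with `if idx < n { ... }` (or use a grid-stride loop), since the final block can run past the end of the data.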

@coreylowman coreylowman merged commit d0bdc75 into main Mar 22, 2023
@coreylowman coreylowman deleted the 526-launch-cfg branch March 22, 2023 14:48
Linked issue: Figure out a better number of threads to launch kernels with (#526)