Figure out a better number of threads to launch kernels with #526

Closed
coreylowman opened this issue Mar 7, 2023 · 0 comments · Fixed by #599
Labels
gpu Related to GPU support optimization

Comments

@coreylowman
Owner

Apparently always launching with 1024 threads per block (the maximum) is not the best choice, because it can oversubscribe the hardware. This is what cudarc currently does.

For example, after a small change to use 128 threads per block instead, I see a slight speedup on a small resnet model.

What are ways we could improve this?
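One simple approach is to stop hard-coding the maximum and instead derive the grid size from the element count and a smaller fixed block size. A minimal sketch, assuming a hypothetical `launch_cfg` helper (not part of cudarc's actual API) and the 128-thread block size mentioned above:

```rust
/// Hypothetical helper: compute (grid_dim, block_dim) for a 1-D kernel
/// over `n` elements, using a fixed block size instead of the 1024 max.
fn launch_cfg(n: u32) -> (u32, u32) {
    // Assumed block size; in practice this would be tuned per kernel/GPU.
    const BLOCK: u32 = 128;
    // Ceiling division so every element is covered by some thread.
    let grid = (n + BLOCK - 1) / BLOCK;
    (grid, BLOCK)
}

fn main() {
    // 1000 elements -> 8 blocks of 128 threads (1024 threads total).
    let (grid, block) = launch_cfg(1000);
    println!("grid={grid} block={block}");
}
```

A fancier version could query occupancy (CUDA's `cudaOccupancyMaxPotentialBlockSize`) rather than using one fixed constant.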

Additionally, could some kernels be improved by using 2d or 3d block/grid dims?
