
Native GPU support #65

Open: wants to merge 42 commits into master



MilesCranmer (Member) commented Feb 3, 2024

This PR adds native GPU support: a single CUDA kernel that evaluates an expression directly on the GPU!

It also allows evaluating multiple trees at once, which can save time in the CUDA kernel.
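To make the idea of evaluating several trees in one batch concrete, here is a minimal CPU-only sketch in Python. It is not the PR's CUDA kernel; `Node`, `flatten`, `eval_batched`, `UNARY_OPS`, and `BINARY_OPS` are hypothetical names for illustration. It assumes all trees are flattened into one node array (children before parents) and that each node is scheduled by its height, so one emulated "kernel launch" computes every node of that height across all trees, with each (node, row) pair standing in for one GPU thread.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence

# Hypothetical operator tables for this sketch (not DynamicExpressions' API).
UNARY_OPS = [math.cos, math.exp]
BINARY_OPS = [lambda a, b: a + b, lambda a, b: a * b]


@dataclass
class Node:
    degree: int                  # 0 = leaf, 1 = unary, 2 = binary
    op: int = 0                  # index into UNARY_OPS / BINARY_OPS
    feature: int = 0             # leaf: column of X to read (if not constant)
    constant: bool = False       # leaf: use `val` instead of a feature
    val: float = 0.0
    l: Optional["Node"] = None
    r: Optional["Node"] = None


def flatten(trees: Sequence[Node]):
    """Flatten several trees into one array, children before parents.

    height[i] is the launch in which node i can be computed (leaves = 1);
    left[i] / right[i] are child indices in the same array (-1 if absent).
    """
    nodes, height, left, right, roots = [], [], [], [], []

    def visit(n: Node) -> int:
        li = visit(n.l) if n.degree >= 1 else -1
        ri = visit(n.r) if n.degree == 2 else -1
        h = 1 + max(height[li] if li >= 0 else 0, height[ri] if ri >= 0 else 0)
        nodes.append(n)
        height.append(h)
        left.append(li)
        right.append(ri)
        return len(nodes) - 1

    for t in trees:
        roots.append(visit(t))
    return nodes, height, left, right, roots


def eval_batched(trees: Sequence[Node], X: List[List[float]]):
    """Evaluate every tree on every row; X is indexed as X[feature][row]."""
    nodes, height, left, right, roots = flatten(trees)
    n_rows = len(X[0])
    buf = [[0.0] * n_rows for _ in nodes]   # one result vector per node
    n_launches = max(height)
    for launch in range(1, n_launches + 1):  # one emulated launch per level
        for i, n in enumerate(nodes):
            if height[i] != launch:
                continue
            for j in range(n_rows):          # on a GPU: one thread per (i, j)
                if n.degree == 0:
                    buf[i][j] = n.val if n.constant else X[n.feature][j]
                elif n.degree == 1:
                    buf[i][j] = UNARY_OPS[n.op](buf[left[i]][j])
                else:
                    buf[i][j] = BINARY_OPS[n.op](buf[left[i]][j], buf[right[i]][j])
    return [buf[r] for r in roots], n_launches


# Usage: two trees evaluated together on two rows of data.
x0, x1 = Node(0, feature=0), Node(0, feature=1)
two = Node(0, constant=True, val=2.0)
t1 = Node(2, op=1, l=x0, r=two)                    # x0 * 2        (height 2)
t2 = Node(1, op=0, l=Node(2, op=0, l=x0, r=x1))    # cos(x0 + x1)  (height 3)
X = [[1.0, 2.0], [0.0, -2.0]]                      # X[feature][row]
out, launches = eval_batched([t1, t2], X)
print(out, launches)  # launches == 3: the max height, not 2 + 3
```

In this sketch the batch needs as many launches as the tallest tree (three here) rather than the sum over trees (five if evaluated one at a time), which is one plausible way batching trees can amortize per-launch overhead.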


TODO:

  • See whether `CUDA.@captured` helps at all
    • Nope...
  • Explore whether manually manipulating CUDA streams will help at all
  • See whether I need to use `@sync` anywhere
  • Consider adding Optim support now or later

github-actions bot (Contributor) commented Feb 3, 2024

Benchmark Results

| Benchmark | master | 9f49619... | master/9f49619e658053... |
|---|---|---|---|
| eval/ComplexF32/evaluation | 7.48 ± 0.48 ms | 7.54 ± 0.44 ms | 0.993 |
| eval/ComplexF64/evaluation | 9.83 ± 0.73 ms | 9.85 ± 0.74 ms | 0.998 |
| eval/Float32/derivative | 10.9 ± 1.8 ms | 10.9 ± 2 ms | 1 |
| eval/Float32/derivative_turbo | 11.3 ± 2.5 ms | 11.1 ± 2.4 ms | 1.01 |
| eval/Float32/evaluation | 2.78 ± 0.21 ms | 2.8 ± 0.22 ms | 0.993 |
| eval/Float32/evaluation_bumper | 0.582 ± 0.015 ms | 0.588 ± 0.015 ms | 0.99 |
| eval/Float32/evaluation_turbo | 0.721 ± 0.038 ms | 0.718 ± 0.038 ms | 1 |
| eval/Float32/evaluation_turbo_bumper | 0.581 ± 0.013 ms | 0.586 ± 0.015 ms | 0.991 |
| eval/Float64/derivative | 14.8 ± 0.9 ms | 15.2 ± 1.1 ms | 0.973 |
| eval/Float64/derivative_turbo | 15 ± 0.84 ms | 15.3 ± 1 ms | 0.985 |
| eval/Float64/evaluation | 2.99 ± 0.25 ms | 2.99 ± 0.24 ms | 0.999 |
| eval/Float64/evaluation_bumper | 1.3 ± 0.046 ms | 1.3 ± 0.046 ms | 1 |
| eval/Float64/evaluation_turbo | 1.25 ± 0.073 ms | 1.24 ± 0.075 ms | 1.01 |
| eval/Float64/evaluation_turbo_bumper | 1.31 ± 0.048 ms | 1.29 ± 0.047 ms | 1.01 |
| utils/combine_operators/break_sharing | 0.0387 ± 0.00066 ms | 0.0394 ± 0.00075 ms | 0.984 |
| utils/convert/break_sharing | 23.3 ± 1.1 μs | 22.9 ± 1.1 μs | 1.02 |
| utils/convert/preserve_sharing | 0.125 ± 0.0035 ms | 0.125 ± 0.0035 ms | 1 |
| utils/copy/break_sharing | 23.8 ± 1.1 μs | 23.7 ± 1.1 μs | 1.01 |
| utils/copy/preserve_sharing | 0.128 ± 0.0043 ms | 0.127 ± 0.0043 ms | 1.01 |
| utils/count_constant_nodes/break_sharing | 9.05 ± 0.13 μs | 9.07 ± 0.14 μs | 0.998 |
| utils/count_constant_nodes/preserve_sharing | 0.111 ± 0.0028 ms | 0.112 ± 0.0034 ms | 0.988 |
| utils/count_depth/break_sharing | 12.9 ± 0.33 μs | 13.2 ± 0.36 μs | 0.976 |
| utils/count_nodes/break_sharing | 8.41 ± 0.12 μs | 8.39 ± 0.13 μs | 1 |
| utils/count_nodes/preserve_sharing | 0.111 ± 0.0032 ms | 0.114 ± 0.004 ms | 0.974 |
| utils/get_set_constants!/break_sharing | 0.0349 ± 0.0014 ms | 0.0345 ± 0.0015 ms | 1.01 |
| utils/get_set_constants!/preserve_sharing | 0.226 ± 0.0059 ms | 0.231 ± 0.0064 ms | 0.979 |
| utils/get_set_constants_parametric | 0.0496 ± 0.0026 ms | 0.0489 ± 0.0026 ms | 1.01 |
| utils/has_constants/break_sharing | 4.27 ± 0.066 μs | 4.18 ± 0.065 μs | 1.02 |
| utils/has_operators/break_sharing | 1.96 ± 0.036 μs | 1.94 ± 0.024 μs | 1.01 |
| utils/hash/break_sharing | 25.4 ± 0.54 μs | 25.5 ± 0.48 μs | 0.999 |
| utils/hash/preserve_sharing | 0.133 ± 0.004 ms | 0.136 ± 0.0042 ms | 0.982 |
| utils/index_constant_nodes/break_sharing | 22.7 ± 0.79 μs | 23.3 ± 0.8 μs | 0.975 |
| utils/index_constant_nodes/preserve_sharing | 0.127 ± 0.0042 ms | 0.128 ± 0.004 ms | 0.988 |
| utils/is_constant/break_sharing | 4.18 ± 0.07 μs | 4.13 ± 0.06 μs | 1.01 |
| utils/simplify_tree/break_sharing | 0.168 ± 0.002 ms | 0.174 ± 0.0016 ms | 0.967 |
| utils/simplify_tree/preserve_sharing | 0.295 ± 0.0059 ms | 0.29 ± 0.0061 ms | 1.02 |
| utils/string_tree/break_sharing | 0.407 ± 0.025 ms | 0.407 ± 0.018 ms | 1 |
| utils/string_tree/preserve_sharing | 0.545 ± 0.027 ms | 0.547 ± 0.021 ms | 0.998 |
| time_to_load | 0.223 ± 0.0015 s | 0.235 ± 0.0028 s | 0.948 |

coveralls commented Feb 25, 2024

Pull Request Test Coverage Report for Build 8042273246

Details

  • -2 of 137 (98.54%) changed or added relevant lines in 3 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.3%) to 94.965%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| ext/DynamicExpressionsCUDAExt.jl | 78 | 80 | 97.5% |

| Totals | Coverage Status |
|---|---|
| Change from base Build 7996123220: | 0.3% |
| Covered Lines: | 1754 |
| Relevant Lines: | 1847 |

💛 - Coveralls
