Parameterized abstractions that work on CPU and GPU architectures. #147
Conversation
`@assert` might get compiled out at certain optimization levels. It's really only for debugging.
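For context, a minimal sketch (not from the PR) of the point above: Julia's `@assert` is a debugging aid that may be elided under certain optimization settings, so user-facing validation should throw an explicit error instead. The `set_timestep` names are hypothetical.

```julia
# Hedged sketch (illustrative names): `@assert` is a debugging aid that
# may be elided at certain optimization levels, so user-facing
# validation should throw an explicit error instead.

function set_timestep_debug(Δt)
    @assert Δt > 0 "time step must be positive"  # may be compiled out
    return Δt
end

function set_timestep(Δt)
    Δt > 0 || throw(ArgumentError("time step must be positive"))  # always checked
    return Δt
end

set_timestep(0.1)  # returns 0.1
```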
Codecov Report
@@ Coverage Diff @@
## master #147 +/- ##
==========================================
+ Coverage 66.03% 69.92% +3.89%
==========================================
Files 19 18 -1
Lines 630 645 +15
==========================================
+ Hits 416 451 +35
+ Misses 214 194 -20
Continue to review full report at Codecov.
This removes the dependence on `::ModelMetadata` and moves closer to an `isbitstype` Field abstraction, but a lot of refactoring will be needed now.
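A hedged sketch of what an `isbitstype` field abstraction buys (the type and field names are illustrative, not the actual Oceananigans API): a struct whose members are all isbits is itself isbits, so it can be passed by value straight into GPU kernels.

```julia
# Hedged sketch (illustrative names, not the actual Oceananigans API):
# a struct whose members are all isbits is itself isbits, so it can be
# passed by value into GPU kernels without pointer chasing.

struct Grid{T}
    Δx :: T
    Δy :: T
    Δz :: T
end

struct Field{T, G}
    grid  :: G   # isbits grid metadata travels with the field
    scale :: T   # array data would live separately in a device array
end

g = Grid(1.0, 1.0, 1.0)
f = Field(g, 2.0)

isbitstype(typeof(f))  # true: every member is isbits
```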
Skipping the field operations for now, we don't even use them...
Also getting rid of redundant test_ prefixes for all test functions.
Marking this PR ready for review even though it is not ready, so the GitLab CI pipeline will start checking this branch.
Nuked the two separate time stepping kernels.
I could keep cleaning things up, but I think I've done enough to close #59 (and six other issues!). The only big feature missing is turning our …

Note that tests will fail on dev/nightly builds (Julia 1.2) because something changed which broke Cassette (which GPUifyLoops depends on). Will release v0.5.0 once this is merged.
So I just ran the performance benchmarks again (different CPU but same GPU) and the model has gotten slower since the last time we ran benchmarks (PR #116), by about 30-40%. Details pasted below, but for the larger GPU models: 256^3 Float32 went up from 85.9 ms to 119 ms, and 128^3 Float32 went up from 7.70 ms to 10.1 ms. These are wall-clock times per model iteration. To be fair, we had no boundary conditions API last time we benchmarked. I won't worry about this for now though, as we haven't begun to optimize for performance.
Pushed a fix for that this morning, and also tagged GPUifyLoops v0.2.1.
Awesome, will update packages. |
Hmm... if the boundary conditions are default, the only thing that happens is that a function is called which is something like
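A plausible sketch of the pattern being described (all names hypothetical, not the actual code): default boundary conditions dispatch to a no-op, so the only cost in the default case is a function call that usually inlines away.

```julia
# Hedged sketch (hypothetical names): default boundary conditions
# dispatch to a no-op method, so the default case costs only a
# function call that usually inlines away.

struct DefaultBC end

struct ValueBC{T}
    value :: T
end

apply_bc!(field, ::DefaultBC) = nothing                        # no-op default
apply_bc!(field, bc::ValueBC) = (field[1] = bc.value; nothing) # real work

field = zeros(4)
apply_bc!(field, DefaultBC())    # leaves field untouched
apply_bc!(field, ValueBC(1.5))   # writes the boundary value
```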
That's true, there's been a ton of refactoring so it's probably more than just one thing that's changed (I can't say for sure that my changes didn't slow things down). No point worrying too much until we get around to profiling (#162). |
This PR refactors many of the existing abstractions so that they can be passed to GPU kernels #59. It massively cleans up the operators and time-stepping loop, allowing us to keep using beautiful abstractions even in GPU kernels.
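A hedged sketch of the general idea (not the PR's actual code): writing the time-stepping update against generic array types lets one function body serve both architectures, with the array type selecting CPU or GPU at compile time. The `step!` name is illustrative.

```julia
# Hedged sketch (illustrative names, not the PR's actual code): a
# kernel written against generic arrays runs on a CPU `Array` or a
# GPU array alike; the array type picks the architecture.

function step!(u, rhs, Δt)
    @. u += Δt * rhs   # broadcasting lowers to a GPU kernel for GPU arrays
    return nothing
end

u   = zeros(8)         # on a GPU this would be a device array instead
rhs = ones(8)
step!(u, rhs, 0.5)     # u is now filled with 0.5
```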
Note that after this PR is merged, `CUDAnative#master` and `GPUifyLoops#vc/interactive` will be required until the respective fixes are merged into an updated release. I will update the `Project.toml` when we can move to a new release. Once this is merged I will also release v0.5.0.
When completed and merged this PR will:
Resolve #19
Resolve #59
Resolve #60
Resolve #66
Resolve #74
Resolve #133
Resolve #153