Semi-Transparent Buffers #1016
base: master
Codecov Report

```diff
@@            Coverage Diff             @@
##           master    #1016      +/-   ##
==========================================
- Coverage   13.80%   13.68%   -0.13%
==========================================
  Files         268      269       +1
  Lines       14998    15110     +112
==========================================
- Hits         2071     2068       -3
- Misses     12927    13042     +115
```
I did not check everything in detail, but the performance on LUMI increases with this PR and the results are correct 👍
`@@ -171,8 +172,11 @@ ProxyOutput runProxy(ProxyConfig config) {`

```cpp
initDataStructuresOnDevice(enableDynamicRupture);
#endif // ACL_DEVICE

if (config.verbose)
runtime = new seissol::parallel::runtime::StreamRuntime();
```
shared_ptr?
I agree, but IMO it's probably better to do that in a general proxy refactor. We currently have several such raw-pointer `new` statements; all of them could eventually be removed.
`@@ -262,6 +270,8 @@ ProxyOutput runProxy(ProxyConfig config) {`

```cpp
delete m_dynRupTree;
delete m_allocator;

delete runtime;
```
shared_ptr?
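The smart-pointer suggestion from the review could look roughly like the following minimal sketch. `StreamRuntime` here is a hypothetical stand-in that only counts live instances, and `runProxySketch` is a hypothetical, heavily simplified driver, not SeisSol's actual `runProxy`. It uses `std::unique_ptr` instead of the suggested `std::shared_ptr`, assuming the proxy is the runtime's only owner; `shared_ptr` would work the same way if ownership had to be shared.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for seissol::parallel::runtime::StreamRuntime.
// It only tracks how many instances are alive, so leaks become observable.
struct StreamRuntime {
  static inline int alive = 0;
  StreamRuntime() {
    // Sketch invariant: a previous proxy run must have released its runtime.
    assert(alive == 0);
    ++alive;
  }
  ~StreamRuntime() { --alive; }
  StreamRuntime(const StreamRuntime&) = delete;
  StreamRuntime& operator=(const StreamRuntime&) = delete;
};

// Hypothetical, heavily simplified proxy driver (not the real runProxy).
// The runtime is owned by a std::unique_ptr, so no `delete runtime;` is needed,
// and it is released on every exit path, including early returns and exceptions.
int runProxySketch(bool exitEarly) {
  auto runtime = std::make_unique<StreamRuntime>();
  if (exitEarly) {
    return -1;  // runtime is destroyed automatically here
  }
  // ... launch kernels, passing runtime.get() wherever a raw pointer is expected ...
  return StreamRuntime::alive;  // 1: the runtime is still in scope at this point
}
```

Calling `runProxySketch` repeatedly never trips the assertion in the constructor, because each run releases its runtime before returning.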
Or: No More Mandatory USM.
That's right: we finally switch to pure device buffers, at least in a first implementation (that is, we still keep many indexing arrays etc. as they are).
For now, we allocate each buffer twice: once on the host and once on the device. Any initialization is done on the host first; then the buffer is copied over. For output, we transfer the data in the reverse direction. We do so to allow explicit host-device memory while keeping open the possibility of using CPU and GPU at the same time for different clusters (especially on systems like Grace Hopper or the MI300A). That is, unless it turns out that running two instances of SeisSol is better in that case anyway.
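The allocate-twice scheme described above can be sketched as follows. This is a minimal sketch under stated assumptions, not SeisSol's implementation: `DualBuffer`, `copyToDevice`, and `copyToHost` are hypothetical names, and the "device" allocation is simulated by a second host `std::vector` so the example runs without a GPU. A real version would allocate and copy through the GPU runtime instead.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical buffer that is allocated twice: once on the host, once on the
// "device". The device side is simulated by a second host vector here.
template <typename T>
class DualBuffer {
 public:
  explicit DualBuffer(std::size_t size) : host_(size), device_(size) {}

  T* host() { return host_.data(); }      // initialization happens here first
  T* device() { return device_.data(); }  // kernels would only touch this side
  std::size_t size() const { return host_.size(); }

  // Host -> device, e.g. after initialization.
  void copyToDevice() {
    assert(host_.size() == device_.size());
    std::copy(host_.begin(), host_.end(), device_.begin());
  }

  // Device -> host, e.g. right before writing output.
  void copyToHost() {
    assert(host_.size() == device_.size());
    std::copy(device_.begin(), device_.end(), host_.begin());
  }

 private:
  std::vector<T> host_;
  std::vector<T> device_;  // stand-in for a pure device allocation
};
```

A typical sequence is: fill `host()`, call `copyToDevice()`, let kernels work on `device()`, and call `copyToHost()` only when output is needed. Since host and device data are exchanged only through these explicit copies, nothing in this scheme relies on a shared address space.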
Consequently, performance should improve for simulations with I/O on systems that, unlike Grace Hopper and the MI300A, do not have a shared memory space. As a bonus, we can now also support AMD GPUs without the XNACK functionality (or with `HSA_XNACK=0`). That includes in particular consumer cards, especially the RDNA-based series (since, to my knowledge, those did not support shared memory yet). It also allows, e.g., a more recent ROCm to be used on LUMI.

Next, we also remove synchronization as far as possible and refactor the graph-handling code a bit.
Furthermore, we allow CPU and GPU to execute at the same time; this currently only works correctly with unified memory enabled.