Greatly improve the performance of form_tree when regridding. #443

Merged
merged 1 commit into from May 19, 2023


JiakunYan
Contributor

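This PR changes the HPX launch policy of the dataflow in form_tree from sync to async. This change greatly improves the scalability of Octo-Tiger: in the 32-node run below, the time to form the tree drops from 14.56 s to 6.41 s, and the total time drops from 61.06 s to 51.44 s.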
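To illustrate what switching the launch policy changes, here is a toy Python model of a dataflow node under the two policies. It is a sketch under an assumption about HPX semantics, not HPX code: with sync, the continuation is assumed to run inline on whichever thread delivers the last input; with async, it is scheduled as a separate task. ToyDataflow and where_continuation_runs are hypothetical names, and a ThreadPoolExecutor stands in for the HPX scheduler.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ToyDataflow:
    """Toy model (hypothetical, not HPX): run `fn` once `n_inputs` inputs arrive.

    Assumed semantics being modeled:
      "sync"  -> fn runs inline on the thread that delivers the last input.
      "async" -> fn is submitted to an executor and runs as a separate task.
    """

    def __init__(self, n_inputs, fn, policy, executor=None):
        if policy not in ("sync", "async"):
            raise ValueError("policy must be 'sync' or 'async'")
        if policy == "async" and executor is None:
            raise ValueError("the async policy needs an executor")
        self._remaining = n_inputs
        self._lock = threading.Lock()
        self._fn = fn
        self._policy = policy
        self._executor = executor

    def input_ready(self):
        # Count down under a lock so exactly one caller sees the last input.
        with self._lock:
            self._remaining -= 1
            is_last = self._remaining == 0
        if not is_last:
            return
        if self._policy == "sync":
            self._fn()  # continuation occupies the delivering thread
        else:
            self._executor.submit(self._fn)  # continuation becomes a new task


def where_continuation_runs(policy, n_inputs=3):
    """Return (thread that ran the continuation, set of producer thread names)."""
    ran_on = []
    with ThreadPoolExecutor(max_workers=1, thread_name_prefix="worker") as pool:
        node = ToyDataflow(
            n_inputs,
            lambda: ran_on.append(threading.current_thread().name),
            policy,
            pool,
        )
        producers = [
            threading.Thread(target=node.input_ready, name=f"producer-{i}")
            for i in range(n_inputs)
        ]
        for t in producers:
            t.start()
        for t in producers:
            t.join()
    # Leaving the `with` block waits for any task submitted to the pool.
    return ran_on[0], {t.name for t in producers}


print("sync :", where_continuation_runs("sync")[0])   # one of producer-0..2
print("async:", where_continuation_runs("async")[0])  # worker_0
```

Under sync, the continuation ties up the producer thread that happened to finish last; under async, it is handed to the scheduler. The sketch only shows where the continuation runs; it does not by itself explain the speedup measured for form_tree.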

Experiment setting: SDSC Expanse, AMD EPYC 7742 (128 cores/node), 32 nodes, max level 7, stop step 5, rotating star, HPX LCI parcelport.

Before this change, the execution time is

   Total: 61.0584
   Computation: 42.8371 (70.1575 %)
   Regrid: 34.1035 (55.8538 %)
   Computation + Regrid: 76.9405 (126.011 %)

In particular, the regrid log shows:

checking for refinement
regridding
Regridded tree in 0.048255 seconds
rebalancing 196809 nodes with 172208 leaves
Rebalanced tree in 0.127481 seconds
forming tree connections
32248 amr boundaries
Formed tree in 14.557350 seconds
solving gravity
regrid done in 15.936172 seconds

Forming the tree alone takes about 14.6 s of the 15.9 s regrid.

After the change, the execution time is

   Total: 51.4378
   Computation: 43.717 (84.9899 %)
   Regrid: 15.2881 (29.7216 %)
   Computation + Regrid: 59.0051 (114.712 %)

In particular, the regrid log shows:

regridding
Regridded tree in 0.049329 seconds
rebalancing 196809 nodes with 172208 leaves
Rebalanced tree in 0.120191 seconds
forming tree connections
32248 amr boundaries
Formed tree in 6.408102 seconds
solving gravity
regrid done in 7.681241 seconds
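For reference, the before/after ratios implied by the numbers reported above are:

$$
\frac{14.557350\,\mathrm{s}}{6.408102\,\mathrm{s}} \approx 2.27 \ (\text{form tree}), \qquad
\frac{34.1035}{15.2881} \approx 2.23 \ (\text{regrid}), \qquad
\frac{61.0584}{51.4378} \approx 1.19 \ (\text{total})
$$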

Contributor Author

JiakunYan commented May 18, 2023

More details:

I also implemented tracing for HPX. In the plots, the blue bars show the number of messages sent in each 0.1-second interval, and the orange line shows the total bytes sent in each 0.1-second interval.

[Figures: message traces for rank 0, rank 21, and rank 31, each shown before and after the change.]

Before the change, there are periods during which Octo-Tiger sends almost no messages. Using print statements, I found that Octo-Tiger was performing “form tree” during those periods. Within these “form tree” periods there are also small spikes of messages, and the timing of these spikes varies from rank to rank. Therefore, I think Octo-Tiger parallelizes the “form tree” task poorly across ranks, and this PR greatly improves that.
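The tracing code itself is not part of this PR. As a hypothetical sketch of the post-processing behind such plots, assuming each rank logs one (send time in seconds, message size in bytes) record per message, the per-0.1-second message counts and byte totals could be computed as follows (bin_messages is an illustrative name):

```python
from collections import defaultdict


def bin_messages(events, bin_width=0.1):
    """Aggregate (send_time_s, size_bytes) records into fixed-width time bins.

    Hypothetical helper: returns (bin_start_s, message_count, total_bytes) for
    every bin from the first to the last non-empty one, so quiet periods show
    up as bins with zero messages.
    """
    if not events:
        return []
    counts = defaultdict(int)
    nbytes = defaultdict(int)
    for t, size in events:
        b = int(t // bin_width)  # index of the bin containing time t
        counts[b] += 1
        nbytes[b] += size
    first, last = min(counts), max(counts)
    return [
        (round(b * bin_width, 10), counts[b], nbytes[b])
        for b in range(first, last + 1)
    ]


print(bin_messages([(0.05, 100), (0.07, 200), (0.25, 50)]))
# [(0.0, 2, 300), (0.1, 0, 0), (0.2, 1, 50)]
```

Keeping empty bins in the output matters here: the quiet “form tree” periods described above appear as runs of zero-count bins rather than disappearing from the plot.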

@JiakunYan
Contributor Author

I also tested the HPX MPI parcelport with max_level=6 (the max_level=7 run didn't finish within 5 minutes). The total time improved from ~14 seconds to 11.7 seconds.

Member

G-071 commented May 19, 2023

Thanks for all your work on this! I think we can merge this (the one failing test should be unrelated to this PR).

G-071 merged commit 50f1325 into STEllAR-GROUP:master on May 19, 2023
13 of 14 checks passed
@JiakunYan
Contributor Author

@hkaiser Actually I am curious: what does it mean to have a sync launch policy for dataflow? Based on my understanding, the dataflow just creates a thread that will be ready to run once all the input futures are ready?
