
Modifies converse scheduler to prioritize NodeGroup messages #3676

Open

wants to merge 16 commits into base: main

Conversation

Contributor

@lvkale commented Dec 3, 2022

Modifies the converse scheduler's getNextMessage so nodeGroup messages can run at higher priority than local messages.
closes #3674
As it is, nodeGroup messages are not checked until all local and regular Charm queue (prio Q) messages have been checked, which causes issues when the application is using nodeGroup messages in the hope that some PE will attend to them quickly. The change makes getNextMessage check the nodeGroup queue every 2^nodeGrpFreq iterations at high priority, in addition to its usual check after exhausting the local queues (except the task Q). This commit has not been tested at all, but I am pushing it to allow others to help me test/fix it.
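For intuition, here is a minimal self-contained toy of the gating arithmetic described above (a sketch only, not the actual Converse patch: nodeGrpFreq is the knob named above, everything else is a stand-in):

```cpp
// Toy model of the polling policy: with a power-of-two period,
// "every 2^nodeGrpFreq iterations" reduces to a single bitwise AND
// per scheduler call, so the common no-check case adds almost no work.
#include <cstdio>

int main() {
  const unsigned nodeGrpFreq = 3;                 // poll every 2^3 = 8 iterations
  const unsigned mask = (1u << nodeGrpFreq) - 1;  // 0b111
  unsigned polls = 0;
  for (unsigned iter = 1; iter <= 100; ++iter) {
    if ((iter & mask) == 0) {
      ++polls;  // here the real scheduler would poll the nodeGroup queue first
    }
    // ... otherwise fall through to the usual local / prio-queue order ...
  }
  std::printf("nodeGroup queue polled %u times in 100 iterations\n", polls);
  return 0;
}
```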

Laxmikant Kale and others added 2 commits December 3, 2022 09:46
Modifies converse scheduler's getNextMessage so nodeGroup messages can run with higher priority over local

Contributor Author

lvkale commented Dec 4, 2022

One thing to worry about is whether this change causes performance degradation by making the scheduler check the nodequeue too often (depending on whether the check is expensive, even for an empty queue, because of locking). It'd be nice if someone were to run a performance regression test.
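One standard mitigation, if the empty-queue check does turn out to be expensive, is to keep a counter readable without the lock and only lock when the queue looks non-empty. A minimal sketch of that idea (illustrative only; these are not Converse's queue APIs):

```cpp
// Sketch: avoid taking the node-queue lock on the (common) empty case.
// A stale "empty" read merely delays a message by one scheduler
// iteration; correctness is preserved by re-checking under the lock.
#include <atomic>
#include <deque>
#include <mutex>

struct NodeQueue {
  std::mutex lock;
  std::atomic<int> length{0};   // readable without the lock
  std::deque<void*> items;

  void push(void* msg) {
    std::lock_guard<std::mutex> g(lock);
    items.push_back(msg);
    length.fetch_add(1, std::memory_order_release);
  }

  void* tryPop() {
    if (length.load(std::memory_order_acquire) == 0)
      return nullptr;                   // lock-free fast path for "empty"
    std::lock_guard<std::mutex> g(lock);
    if (items.empty()) return nullptr;  // lost a race; re-checked under lock
    void* msg = items.front();
    items.pop_front();
    length.fetch_sub(1, std::memory_order_relaxed);
    return msg;
  }
};
```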

Contributor Author

lvkale commented Dec 13, 2022

@ericjbohm @ericmikida @ZwFink review please.

src/conv-core/convcore.C: resolved review threads
Contributor Author

lvkale commented Nov 1, 2023

I thought this was already merged. @ZwFink @ericjbohm, will you please take a look?

@nilsdeppe
Contributor

@lvkale this is the code we discussed during the meeting today

Collaborator

jbakosi commented Dec 11, 2023

> @lvkale this is the code we discussed during the meeting today

Sorry for the spam, but where is the meeting usually announced?

@ericjbohm
Contributor

Should I consider all the older unresolved comments here as acceptably resolved?

@ericjbohm
Contributor

In order for this to be mergeable, it should be modified to no longer be a draft, and the reviewer comments should be addressed and resolved. @lvkale

@ZwFink changed the title from "Modifies converse schedeuler to prioritize so nodeGroup messages" to "Modifies converse schedeuler to prioritize NodeGroup messages" Apr 17, 2024
@ZwFink marked this pull request as ready for review April 17, 2024 19:35
@ZwFink self-requested a review April 17, 2024 20:03
@ericjbohm
Contributor

> One thing to worry about is whether this change causes performance degradation by making the scheduler check the nodequeue too often (depending on whether the check is expensive, even for an empty queue, because of locking). It'd be nice if someone were to run a performance regression test.

Was that performance analysis done? If so, what were the results?

src/conv-core/convcore.C: resolved review thread
src/conv-core/converse.h: resolved review thread
Contributor

ZwFink commented May 3, 2024

> One thing to worry about is whether this change causes performance degradation by making the scheduler check the nodequeue too often (depending on whether the check is expensive, even for an empty queue, because of locking). It'd be nice if someone were to run a performance regression test.
>
> Was that performance analysis done? If so, what were the results?

Here are results from a benchmark that has one chare sending 10 messages to itself 500,000 times (SMP mode, higher is better). The entry method just increments a counter to decide whether it should send another 10 messages to itself, so the run stresses the queueing system. The result shows an overhead of 0.44% averaged over 50 runs, which translates to a few ns of overhead per message. Any realistic small task is on the order of a few µs per message. I think we should pay more attention to the fact that the scheduler can only churn through ~1.2 million local entry methods per second than to the addition of a few nanoseconds of overhead.
[boxplot: throughput comparison over 50 runs, baseline vs. this change]
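For reference, a hypothetical reconstruction of that kind of microbenchmark in Charm++ (a sketch under my own naming, not the actual code behind the plot):

```cpp
// selfsend.C: one chare sends itself 10 messages and re-sends after each
// batch completes, 500,000 times, so the runtime is dominated by the
// scheduler's queueing path.  Build with charmc; the matching
// selfsend.ci would be:
//   mainmodule selfsend {
//     mainchare Main { entry Main(CkArgMsg*); entry void ping(); };
//   }
#include "selfsend.decl.h"

static const int BATCH  = 10;
static const int ROUNDS = 500000;

class Main : public CBase_Main {
  int delivered = 0, rounds = 0;
  double t0;
public:
  Main(CkArgMsg* m) : t0(CkWallTimer()) { delete m; sendBatch(); }
  void sendBatch() { for (int i = 0; i < BATCH; ++i) thisProxy.ping(); }
  void ping() {
    if (++delivered < BATCH) return;  // count deliveries in the current batch
    delivered = 0;
    if (++rounds < ROUNDS) { sendBatch(); return; }
    double rate = double(BATCH) * ROUNDS / (CkWallTimer() - t0);
    CkPrintf("throughput: %.3g entry methods/s\n", rate);
    CkExit();
  }
};

#include "selfsend.def.h"
```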

@nilsdeppe
Contributor

Another possibly interesting point: we run with 6 virtual nodes per physical node on our 192-cores-per-node machine because it gives better communication performance. That seems inconsistent with the model I think Sanjay described, in which intranode communication shouldn't be affected by the comm thread. Maybe there's a lot of locking going on?

src/conv-core/convcore.C: resolved review thread
s->iter++;

#if CMK_NODE_QUEUE_AVAILABLE
// we use nodeGrpFreq == 0 to mean
Collaborator

Indentation

src/conv-core/converse.h Outdated Show resolved Hide resolved
@@ -1720,7 +1722,19 @@ void CsdSchedulerState_new(CsdSchedulerState_t *s)
*/
void *CsdNextMessage(CsdSchedulerState_t *s) {
void *msg;
if((*(s->localCounter))-- >0)
Collaborator

The new code should be below the localCounter branch, right above the CmiGetNonLocal() call. The default CsdLocalMax is 0 so normally it won't matter, but if the user says to prioritize local messages then they should come first.
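To make the suggested ordering concrete, here is a runnable toy (stand-in queues, not Converse code; only the order of the checks is the point):

```cpp
// Toy scheduler loop illustrating the ordering suggested above:
// (1) honor the CsdLocalMax/localCounter budget for PE-local work,
// (2) then the new periodic high-priority nodeGroup check,
// (3) then the existing remote poll (CmiGetNonLocal in the real code),
// (4) then the usual drain order for whatever remains.
#include <cstdio>
#include <deque>

static std::deque<const char*> localQ, nodeQ, remoteQ;

static const char* pop(std::deque<const char*>& q) {
  if (q.empty()) return nullptr;
  const char* m = q.front(); q.pop_front(); return m;
}

struct SchedState { int localCounter; unsigned iter; };

static const char* nextMessage(SchedState& s, unsigned nodeGrpFreq) {
  const char* m;
  s.iter++;
  if (s.localCounter-- > 0 && (m = pop(localQ))) return m;          // (1)
  if ((s.iter & ((1u << nodeGrpFreq) - 1)) == 0 && (m = pop(nodeQ)))
    return m;                                                       // (2)
  if ((m = pop(remoteQ))) return m;                                 // (3)
  if ((m = pop(localQ))) return m;                                  // (4)
  return pop(nodeQ);
}

int main() {
  localQ = {"local1", "local2"}; nodeQ = {"node1"}; remoteQ = {"remote1"};
  SchedState s{1, 0};                       // budget of one local message
  for (const char* m; (m = nextMessage(s, 1)); ) std::puts(m);
  return 0;
}
```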

@@ -1720,7 +1722,19 @@ void CsdSchedulerState_new(CsdSchedulerState_t *s)
*/
Collaborator

The long expository comment preceding the CsdNextMessage definition should be updated to match the new behavior, and to cover any other behavior it doesn't describe, such as csdLocalMax querying the PE onnode FIFO (localQ) and scheduler queues first.

Collaborator

Even better, break up the exposition so that the explanations are adjacent to the code they are intended to describe.

Development

Successfully merging this pull request may close these issues:

Nodegroup messages are too low a priority (#3674)
8 participants