Log load-balancing info on rank 0 #448
jenkins build this please
Interesting approach. Do we need the sum?
I would have used `cc.gather(&num_cells, num_cells.data(), 1, 0);`
BTW: Is it just me, or is GitHub not working correctly at the moment?
I had problems logging in (HTTP 500) and cannot comment on the diff online.
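To make the two options concrete, here is a serial simulation of what each collective hands back. It assumes the sum-based version has each rank write its own count into slot `rank` of a zero-initialised array of length P before the element-wise sum; how the PR fills that array is not shown in this thread. `simulateSum` and `simulateGather` are hypothetical names, and no MPI or Dune calls are involved.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Serial simulation only: local_counts[r] is the cell count owned by rank r.

// Assumed sum-based pattern: every rank contributes a P-sized array that is
// zero except at its own index; an element-wise sum over all contributions
// leaves the complete array on every rank.
std::vector<int> simulateSum(const std::vector<int>& local_counts)
{
    const std::size_t P = local_counts.size();
    std::vector<int> result(P, 0);
    for (std::size_t rank = 0; rank < P; ++rank) {
        std::vector<int> contribution(P, 0);
        contribution[rank] = local_counts[rank];
        for (std::size_t i = 0; i < P; ++i) {
            result[i] += contribution[i];
        }
    }
    return result; // available on all ranks
}

// Gather pattern: each rank sends one value (len = 1) and the root (rank 0)
// stores the values in rank order.
std::vector<int> simulateGather(const std::vector<int>& local_counts)
{
    std::vector<int> on_root;
    on_root.reserve(local_counts.size());
    for (int count : local_counts) {
        on_root.push_back(count);
    }
    return on_root; // available on the root only
}
```

Both produce the same array; they differ in which ranks receive it and in how much data has to move.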
Something does seem not quite right. The GitHub status page also reports similar problems.
I'm just going to rerun the build test. The earlier test, though successful, did not propagate its successful status back to the PR.
jenkins build this please
Thanks for the code review. I have fixed the typo. I will leave it up to Atgeirr to decide whether I should change more.
I think you are right, in that gather() is perhaps more like how it is usually done. It is interesting though that sum() is simpler to understand (do not need the 1, 0 arguments). Perhaps "single item from each rank -> array on root/all" overloads of gather()/allgather() could be added to the collective communication class for cases like this. As is I think that using sum() is fine here.
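A hypothetical sketch of such a "one value from each rank, array on the root" helper. `gatherOne` and `SerialComm` are illustrative names, not part of Dune or OPM; the helper only assumes a communicator exposing `size()` and a Dune-style `gather(const T* in, T* out, int len, int root)`. `SerialComm` is a single-process stand-in so the sketch compiles and runs without MPI.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: gathers one value from every rank into a vector of
// length cc.size(); only the root's copy holds meaningful data.
template <class Comm, class T>
std::vector<T> gatherOne(const Comm& cc, const T& value, int root)
{
    std::vector<T> result(cc.size());
    cc.gather(&value, result.data(), 1, root);
    return result;
}

// Single-process stand-in for a communicator (an assumption, not a real
// library type): with one rank, gather simply copies the input.
struct SerialComm {
    int size() const { return 1; }

    template <class T>
    int gather(const T* in, T* out, int len, int /*root*/) const
    {
        for (int i = 0; i < len; ++i) {
            out[i] = in[i];
        }
        return 0;
    }
};
```

With a helper like this, the call site would shrink to something like `auto counts = gatherOne(cc, local_count, 0);`, hiding the `1` (length) argument while still naming the root.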
Looks good, asking for minor fixes only.
OK, squash the commits and we are done!
Branch updated from 9cd3878 to 1a3860b.
jenkins build this please
It turned out there were a few more issues with the latest changes.
opm/grid/cpgrid/CpGrid.cpp (outdated)
```cpp
str += "----------------\n";
Opm::OpmLog::info(str);
}
for(const auto& cc_num_cells: nc) {
```
You must put nc before cc_num_cells: the loop variable comes first, the range after the colon.
Also some nitpicks: a space after `for`, spaces around the colon, and move the definition of cc_num_cells to line 251 instead of 100 lines above.
Done.
I think that formatting should be handled automatically via some hooks.
Removed unused variable my_num
Abort all processes if at least one process has zero cells
Branch updated from fc6f756 to bca8c36.
jenkins build this please
It appears you (currently?) don't have the ability to launch Jenkins build tests. Don't worry, I'll do it for you.
jenkins build this please
Thank you
> I think you are right, in that gather() is perhaps more like how it is usually done. It is interesting though that sum() is simpler to understand (do not need the 1, 0 arguments).

1 is the size, and I think you are passing the size of the array to sum, too. In addition, you need to tell it where to gather; that is the 0.

> As is I think that using sum() is fine here.

I don't.
It is an all-to-all communication versus an all-to-one communication. That means it sends twice as many messages as gather (although that might depend on the implementation, of course). In addition, each message has size $P$, the number of processes, versus size 1 for the gather.
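A back-of-the-envelope version of this argument, under a deliberately naive cost model that is an assumption and not how any particular MPI library works: the all-to-all sum is treated as a reduce to one rank followed by a broadcast, each costing P - 1 messages that carry the whole P-sized array, while the gather costs P - 1 messages of one integer. Tree-based or other algorithms change the constants. `CommCost`, `naiveAllSum` and `naiveGather` are hypothetical names.

```cpp
#include <cassert>

struct CommCost {
    long messages; // point-to-point messages sent
    long integers; // total integers transferred
};

// Naive model of an element-wise sum over a P-sized array, result on all
// ranks: reduce to one rank (P - 1 messages) plus broadcast (P - 1 messages),
// every message carrying the whole array of P integers.
CommCost naiveAllSum(long P)
{
    const long messages = 2 * (P - 1);
    return {messages, messages * P};
}

// Gather of one integer per rank to the root: P - 1 messages of size 1
// (the root does not send to itself).
CommCost naiveGather(long P)
{
    const long messages = P - 1;
    return {messages, messages * 1};
}
```

With P = 4 this gives 6 messages and 24 integers for the sum versus 3 messages and 3 integers for the gather; in this model the transferred volume grows quadratically in P for the sum and linearly for the gather.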
Point taken, I am convinced. Let's use gather().
At the moment, I am using sum so that an exception is thrown on all processes if at least one process has zero cells. I mean these lines:
```cpp
for (const auto& nc : cc_num_cells) {
    if (nc == 0) {
        throw std::runtime_error("At least one process has zero cells. Aborting.");
    }
}
```
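Why this check relies on the all-to-all sum rather than a gather can be shown with a serial simulation: after the sum every rank holds all the counts and throws, whereas after a gather to rank 0 only the root can see the zero, so the other ranks would not know to stop. `checkNoEmptyProcess` wraps the loop quoted above; `ranksThatThrow` and its `all_ranks_see_counts` flag are hypothetical simulation helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// The check from the PR: throws if any process owns zero cells.
void checkNoEmptyProcess(const std::vector<int>& cc_num_cells)
{
    for (const auto& nc : cc_num_cells) {
        if (nc == 0) {
            throw std::runtime_error("At least one process has zero cells. Aborting.");
        }
    }
}

// Counts how many of the simulated ranks would throw, given what each rank
// can see after the collective: with the all-to-all sum every rank sees all
// counts; with a gather to rank 0 only the root does (others see nothing).
int ranksThatThrow(const std::vector<int>& counts, bool all_ranks_see_counts)
{
    int throwing = 0;
    for (std::size_t rank = 0; rank < counts.size(); ++rank) {
        const bool sees = all_ranks_see_counts || rank == 0;
        const std::vector<int> visible = sees ? counts : std::vector<int>{};
        try {
            checkNoEmptyProcess(visible);
        } catch (const std::runtime_error&) {
            ++throwing;
        }
    }
    return throwing;
}
```

With counts `{5, 0, 7, 3}`, all four ranks throw in the sum case but only rank 0 does in the gather case, which is why switching to gather would also require some other way of telling the remaining ranks to abort.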
Was that throw there before, and is it needed? I thought the bug with one process having no cells had been fixed at some point. Am I mistaken?
Yes, the check and throw were already there. I think adding it was the fix; it used to just crash.
OK, then let's merge it as it is, but please open an issue saying that we should retest whether there is still a problem with zero elements on one process and, if not, remove the sum.
I think this was observed with MPI but not with Zoltan, because without Zoltan a Cartesian partitioning of the domain is used. Of course, running a 3-cell case on 4 processes should also trigger it, Zoltan or not...
Issue opened as requested; I'll merge now.
Load balancing info is collected on rank 0 and logged with OpmLog.
Also removed the unused variable my_num.
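A sketch of how the per-rank counts collected on rank 0 might be turned into a log message. Only the dashed separator line and the `Opm::OpmLog::info(str)` call come from the diff in this PR; the header text, column layout and the name `formatLoadBalancingInfo` are hypothetical, and the function just returns the string so it runs without OPM.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical formatter for the counts collected on rank 0.
// cc_num_cells[r] is the number of cells owned by rank r.
std::string formatLoadBalancingInfo(const std::vector<int>& cc_num_cells)
{
    std::string str = "Load balancing distributes the cells as follows:\n";
    str += "  rank   cells\n";
    str += "----------------\n";
    long total = 0;
    for (std::size_t rank = 0; rank < cc_num_cells.size(); ++rank) {
        str += "  " + std::to_string(rank) + "   " + std::to_string(cc_num_cells[rank]) + "\n";
        total += cc_num_cells[rank];
    }
    str += "----------------\n";
    str += "  sum   " + std::to_string(total) + "\n";
    // In CpGrid.cpp the resulting string is passed to Opm::OpmLog::info(str).
    return str;
}
```

Keeping the formatting in a pure function like this makes it testable without MPI; only the surrounding code needs to know that it runs on rank 0.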