Speed up ComputeIndexOwner::Dictionary #8785
The second point is easily addressed by replacing `dofs_per_process = (size + n_procs - 1) / n_procs;` with `dofs_per_process = std::max((size + n_procs - 1) / n_procs, min_dofs_per_process);`. In a second step, though, we should distribute the load among nodes/islands more intelligently...
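A minimal sketch of what the lower bound does, assuming a 64-bit index type standing in for `types::global_dof_index`. The helper names `compute_dofs_per_process` and `n_dict_ranks` are hypothetical, and the value chosen for `min_dofs_per_process` is left to the caller.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Stand-in for deal.II's types::global_dof_index (assumed 64-bit here).
using global_dof_index = std::uint64_t;

// Hypothetical helper: number of indices per dictionary rank. This is the
// ceiling of size / n_procs, but never less than min_dofs_per_process, so a
// small problem on many ranks is stored on a few ranks in larger blocks.
global_dof_index
compute_dofs_per_process(const global_dof_index size,
                         const unsigned int     n_procs,
                         const global_dof_index min_dofs_per_process)
{
  assert(n_procs > 0 && min_dofs_per_process > 0);
  return std::max<global_dof_index>((size + n_procs - 1) / n_procs,
                                    min_dofs_per_process);
}

// Hypothetical helper: number of ranks that hold a nonempty part of the
// dictionary. With the lower bound in place this can be far below n_procs.
unsigned int
n_dict_ranks(const global_dof_index size,
             const global_dof_index dofs_per_process)
{
  assert(dofs_per_process > 0);
  return static_cast<unsigned int>((size + dofs_per_process - 1) /
                                   dofs_per_process);
}
```

For example, 10,000 indices on 1,000 ranks give 10 indices per rank without the bound; with a bound of 4096, only ranks 0, 1, and 2 hold dictionary entries. All of them are low ranks, which is the skew the follow-up step is meant to address.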
The question is how expensive that would be, both in terms of implementation and in terms of communication. My rationale is that it probably won't matter much even if the communication is heavily skewed toward the lower ranks. This is a setup routine, so we don't need an optimal decision here. Point-to-point communication is also cheap when 90% of it goes to rank 0, because in that case the global problem is small and does not use many ranks either. But if you know of something that takes less than half an hour of work, feel free to think about it.
Essentially, we would need to introduce

```cpp
unsigned int
dof_to_dict_rank(const types::global_dof_index i)
{
  // group (e.g. compute node) whose block of the index space contains i
  const unsigned int i_group = i / dofs_per_group;
  // rank within that group, based on the offset of i inside the group block
  const unsigned int i_local = (i % dofs_per_group) / dofs_per_process;
  return i_group * group_size + i_local;
}
```

Note: This took me 5 minutes (and I did not try it out)...

Note: In one of my next PRs, I will introduce a function to determine the size of a compute node.
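Below is a self-contained sketch of the two-level mapping behind `dof_to_dict_rank`. It assumes that the ranks are split into groups of `group_size` consecutive ranks (e.g. one group per compute node) and that `n_procs` is a multiple of `group_size`. The struct `DictLayout` and the way `dofs_per_group` and `dofs_per_process` are derived here are illustrative assumptions, not existing code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

using global_dof_index = std::uint64_t; // stand-in for types::global_dof_index

// Hypothetical layout: [0, size) is cut into one block per group, and each
// group block is cut into blocks of dofs_per_process for that group's ranks.
struct DictLayout
{
  global_dof_index dofs_per_group;
  global_dof_index dofs_per_process;
  unsigned int     group_size;

  DictLayout(const global_dof_index size,
             const unsigned int     n_procs,
             const unsigned int     group_size_in,
             const global_dof_index min_dofs_per_process)
    : group_size(group_size_in)
  {
    assert(n_procs > 0 && group_size > 0 && n_procs % group_size == 0);
    assert(min_dofs_per_process > 0);
    const unsigned int n_groups = n_procs / group_size;
    dofs_per_group = (size + n_groups - 1) / n_groups;
    // The lower bound now applies within each group, so every group still
    // receives a share of the dictionary.
    dofs_per_process = std::max<global_dof_index>(
      (dofs_per_group + group_size - 1) / group_size, min_dofs_per_process);
  }

  unsigned int
  dof_to_dict_rank(const global_dof_index i) const
  {
    const auto i_group = static_cast<unsigned int>(i / dofs_per_group);
    const auto i_local =
      static_cast<unsigned int>((i % dofs_per_group) / dofs_per_process);
    return i_group * group_size + i_local;
  }
};
```

With 1000 indices, 8 ranks in groups of 4, and a lower bound of 200, index 999 maps to rank 6, i.e. onto the second group. A single-level layout with the same bound would place all 1000 indices on ranks 0 to 4.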
There is some infrastructure for this here; you might want to look for something similar and/or adapt it so that we keep this functionality in one place.
While working on #8772, I noticed two things we should improve in the dictionary class of `ComputeIndexOwner`:

1. `reinit()` is slower than it needs to be (slow enough to show up in profiles) because it expands the indices of the locally owned range one by one and computes the owning rank for each index individually. We should do this in terms of intervals instead. That requires a bit of interval intersection logic, but it is still manageable.
2. `dofs_per_process`.

Related to #8293.
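A sketch of the interval-based approach from the first point, assuming the single-level layout in which dictionary rank `r` owns the contiguous block `[r * dofs_per_process, (r + 1) * dofs_per_process)`. The locally owned range is passed as plain `[begin, end)` pairs, and `Chunk` and `split_by_dict_rank` are hypothetical names. Each owned interval is split only at block boundaries, so the work scales with the number of resulting pieces rather than with the number of indices.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

using global_dof_index = std::uint64_t; // stand-in for types::global_dof_index

// Part of an owned interval that lies in a single rank's dictionary block.
struct Chunk
{
  unsigned int     rank;
  global_dof_index begin; // first index (inclusive)
  global_dof_index end;   // one past the last index
};

// Intersects each owned interval [begin, end) with the dictionary blocks
// of size dofs_per_process, producing one chunk per nonempty intersection.
std::vector<Chunk>
split_by_dict_rank(
  const std::vector<std::pair<global_dof_index, global_dof_index>> &owned,
  const global_dof_index dofs_per_process)
{
  assert(dofs_per_process > 0);
  std::vector<Chunk> chunks;
  for (const auto &[begin, end] : owned)
    {
      global_dof_index first = begin;
      while (first < end)
        {
          const global_dof_index block = first / dofs_per_process;
          // End of this block, clipped to the end of the owned interval.
          const global_dof_index last =
            std::min(end, (block + 1) * dofs_per_process);
          chunks.push_back({static_cast<unsigned int>(block), first, last});
          first = last;
        }
    }
  return chunks;
}
```

For the owned intervals `[90, 260)` and `[400, 410)` with 100 indices per rank, this yields four chunks, for ranks 0, 1, 2, and 4, instead of 180 individual rank lookups.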