For applications that send many small messages, aggregation is critical for achieving high performance on commodity networks, and it improves performance on HPC networks as well. Copy aggregators are used extensively in Arkouda to improve small-message performance. As a rough guideline when sending highly concurrent small (8-byte) messages:
Ethernet and non-HPC networks: generally have very low small-message rates. Aggregation can provide a ~5000x speedup.
InfiniBand: Chapel maps poorly to InfiniBand in this regard (see Improve gasnet-ibv performance #14438 -- "Improve Fine-grained Comm Performance"). So while InfiniBand hardware has decent small-message rates, Chapel over InfiniBand does not. Aggregation can provide a ~1000x speedup.
Cray Aries: Chapel maps well to Aries, which has high small-message rates, but there is still a 2-3x speedup from using aggregation.
Arkouda only has copy aggregators (assignment between trivially copyable types). They must be created on a per-task basis, and you have to specify whether the LHS (destination) or the RHS (source) is remote. We would like to add aggregation to the standard library so that it is available to all users, but as part of that effort we would like to improve the ergonomics and support arbitrary aggregators, not just copy aggregators.
Example Usage:
A copy of the aggregators was added to the test directory in #16726 for users who want to experiment. From https://github.com/chapel-lang/chapel/tree/master/test/studies/bale/aggregation, copy AggregationPrimitives.chpl and CopyAggregation.chpl, and see ig.chpl (indexgather) for an example of how to use them. For a SrcAggregator (Src / RHS is remote), aggregation would look something like:
use BlockDist, Random, CopyAggregation;

const numTasks = numLocales * here.maxTaskPar;
config const N = 1000000; // number of updates per task
config const M = 10000;   // number of entries in the table per task
const numUpdates = N * numTasks;
const tableSize = M * numTasks;

proc main() {
  const D = newBlockDom(0..#tableSize);
  var A: [D] int = D;

  const UpdatesDom = newBlockDom(0..#numUpdates);
  var Rindex: [UpdatesDom] int;

  fillRandom(Rindex, 208);
  Rindex = mod(Rindex, tableSize);
  var tmp: [UpdatesDom] int;

  // Unaggregated
  forall (t, r) in zip(tmp, Rindex) do
    t = A[r];

  // Aggregated
  forall (t, r) in zip(tmp, Rindex) with (var agg = new SrcAggregator(int)) do
    agg.copy(t, A[r]);
}
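The example above covers the case where the source (RHS) is remote. For the opposite case, where the destination (LHS) is remote, a minimal sketch using the DstAggregator from the same CopyAggregation.chpl might look like the following. The array names and the reversal pattern are illustrative assumptions, and `newBlockDom` is used only to match the example above (newer Chapel releases may spell this differently).

```chapel
use BlockDist, CopyAggregation;

config const n = 1000; // illustrative problem size (assumption)

const D = newBlockDom(0..#n);
var src: [D] int = D; // src[i] == i
var dst: [D] int;

// Scatter: each task reads its local src element and writes it to a
// (potentially remote) dst element. The aggregator is task-private,
// so buffered copies never contend with other tasks.
forall (s, i) in zip(src, D) with (ref dst, var agg = new DstAggregator(int)) do
  agg.copy(dst[n-1-i], s);

// The buffered copies are flushed when each task's aggregator goes out of
// scope at the end of the forall, so dst now holds src in reverse order.
```

Because the copies are buffered, the values written through the aggregator should not be relied upon inside the loop body; they are only guaranteed to be in place once the loop has finished.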
More info:
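For a high-level overview of existing aggregation efforts in Arkouda see:
And for a more detailed history of the code in Arkouda see:
For providing arbitrary aggregators I expect to draw upon previous work from CAL (Chapel Aggregation Library):
A primary difference between this work and CAL is that these aggregators are created on a per-task basis, so there is no contention between competing tasks, which is important for performance.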
#17657 proposes adding a Communication module. The user-facing aggregators should fit well with that interface when we add them. Under that issue, I proposed a direction that is slightly different from what's outlined in the OP, where instead of
forall (t, r) in zip(tmp, Rindex) with (var agg = new SrcAggregator(int)) do
  agg.copy(t, A[r]);
we'd have
forall (t, r) in zip(tmp, Rindex) with (var agg = new SrcAggregator(int)) do
  copy(t, A[r], aggregator=agg);
I don't feel too strongly about this, but I think it would be interesting, and it would probably feel more unified with the copy function proposed in that issue.
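To make the comparison concrete, here is a rough sketch of how the free-function form could be layered on top of the existing aggregators. The `copy` wrapper below is hypothetical and is not part of any existing module; it only forwards to the aggregator's own `copy` method. The array names and sizes are likewise assumptions for illustration.

```chapel
use BlockDist, CopyAggregation;

// Hypothetical wrapper (an assumption, not an existing API): forwards to
// the aggregator's copy method so call sites use the free-function form.
proc copy(ref dst, const ref src, ref aggregator) {
  aggregator.copy(dst, src);
}

config const n = 1000; // illustrative problem size (assumption)

const D = newBlockDom(0..#n);
var A: [D] int = D; // A[i] == i
var tmp: [D] int;

// Gather with the source (RHS) potentially remote, as in the OP.
forall (t, i) in zip(tmp, D) with (var agg = new SrcAggregator(int)) do
  copy(t, A[n-1-i], aggregator=agg);
```

The aggregator is still created per task in the `with` clause, so this form would change only the call syntax, not the ownership or flushing behavior.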
Migrate copy aggregators to a package module
[reviewed by @bradcray and @e-kayrakli]
We've long wanted to make aggregators user-facing (#16963), but haven't
made much progress on that. Longer term we want better names, support
for third-party operations, arbitrary user-defined operations, and
probably some other things, but in the short term this just exposes the
implementation we have now and adds a short module doc with examples.
Closes Cray/chapel-private#3178