I'm seeing these logs:

2741087.out:[192 - 20143958f8b0] 154.205468 {4}{runtime}: [warning 1098] LEGION WARNING: Internal runtime performance warning: equivalence set e00000000001d1b of region (1283,256,3) has 64 different users which is the same as the sampling rate of 64. Region requirement 3 of operation task_5 (UID 155584) triggered this warning. Please report this application use case to the Legion developers mailing list. (from file /g/g15/yadav2/taco/legion/legion/runtime/legion/legion_analysis.cc:11156)
Based on our conversations about this from last time, I have a pretty clear diagnosis of why this behavior is occurring. I'm trying to implement the algorithm here: https://ieeexplore.ieee.org/document/8425209, which performs a tensor contraction of the following form: A(i, l) = B(i, j, k) * C(j, l) * D(k, l). At a high level, the algorithm creates a 3-d processor grid and partitions the B tensor across the processors in the grid, one piece per processor. Next, it partitions the A, C, and D matrices into rows and places them on different axes of the processor grid. The algorithm proceeds with a 3-d index launch over the grid, where every processor in a slice of the grid along the i dimension receives the same piece of A, every j slice receives the same piece of C, and every k slice receives the same piece of D.
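To make the launch structure concrete, here is a simplified sketch of what the index launch looks like. This is not the actual taco-generated code: the task ID, the PROJ_I/PROJ_J/PROJ_K projection functor IDs, and the use of the built-in sum reduction on A are placeholders, and field IDs and functor registration are omitted.

```cpp
#include "legion.h"
using namespace Legion;

// Placeholder IDs for illustration -- the real taco-generated code uses its
// own task IDs and registers custom projection functors for the row
// partitions of A, C, and D.
enum TaskIDs { TID_CONTRACT = 5 };
enum ProjIDs { PROJ_I = 1, PROJ_J = 2, PROJ_K = 3 };

void launch_contraction(Context ctx, Runtime *runtime,
                        LogicalRegion A, LogicalPartition A_rows,
                        LogicalRegion B, LogicalPartition B_grid,
                        LogicalRegion C, LogicalPartition C_rows,
                        LogicalRegion D, LogicalPartition D_rows)
{
  // 3-d launch domain matching the processor grid (8x8x4 at 256 nodes).
  Rect<3> grid(Point<3>(0, 0, 0), Point<3>(7, 7, 3));
  IndexTaskLauncher launcher(TID_CONTRACT, grid,
                             TaskArgument(NULL, 0), ArgumentMap());

  // B is partitioned over the full 3-d grid: the identity projection (0)
  // gives each point task its own disjoint piece.
  launcher.add_region_requirement(
      RegionRequirement(B_grid, 0, READ_ONLY, EXCLUSIVE, B));

  // A, C, and D are partitioned into row blocks. PROJ_I/PROJ_J/PROJ_K stand
  // for projection functors (registered elsewhere) that select a block using
  // only one coordinate of the launch point, so every task in an i-slice
  // maps to the same piece of A, every j-slice to the same piece of C, and
  // every k-slice to the same piece of D.
  launcher.add_region_requirement(      // partial sums reduced into A
      RegionRequirement(A_rows, PROJ_I, LEGION_REDOP_SUM_FLOAT64,
                        EXCLUSIVE, A));
  launcher.add_region_requirement(
      RegionRequirement(C_rows, PROJ_J, READ_ONLY, EXCLUSIVE, C));
  launcher.add_region_requirement(
      RegionRequirement(D_rows, PROJ_K, READ_ONLY, EXCLUSIVE, D));
  // (Field IDs omitted for brevity.)

  runtime->execute_index_space(ctx, launcher);
}
```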
At 256 nodes, the processor cube I'm running on is 8x8x4. A slice in the k dimension has 64 processors in it, so if a single piece of a region is replicated among those 64 nodes then it seems like the equivalence set for that region will hit the 64 different users cap.
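Spelling out the counts (the names here just mirror the description above): the number of point tasks that share one piece of a region is the product of the other two grid extents, so only the piece shared across a k-slice reaches the sampling rate of 64 from the warning.

```cpp
// Processor grid extents at 256 nodes: 8 x 8 x 4.
const int GRID_I = 8, GRID_J = 8, GRID_K = 4;

// Point tasks sharing a single piece of each region:
const int users_per_A_piece = GRID_J * GRID_K;  // 8 * 4 = 32 (an i-slice)
const int users_per_C_piece = GRID_I * GRID_K;  // 8 * 4 = 32 (a j-slice)
const int users_per_D_piece = GRID_I * GRID_J;  // 8 * 8 = 64 (a k-slice),
                                                // equal to the sampling rate
```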
Let me know if my analysis sounds right. I'm not sure there's anything to change in my code -- it's a flat index launch with tight bounds on the subregions that each task accesses. There probably isn't an easy fix, but I wanted to report the use case as the warning asks.
That analysis looks accurate. This will be resolved in the future by choosing to use collective instances, which will also permit the runtime to replicate the equivalence set meta-data to avoid unnecessary communication.