# dcr: too many users of an equivalence set #1125

Open · rohany opened this issue on Aug 9, 2021 · 1 comment

**rohany** (Contributor) commented on Aug 9, 2021:

I'm seeing these logs:

```
2741087.out:[192 - 20143958f8b0]  154.205468 {4}{runtime}: [warning 1098] LEGION WARNING: Internal runtime performance warning: equivalence set e00000000001d1b of region (1283,256,3) has 64 different users which is the same as the sampling rate of 64. Region requirement 3 of operation task_5 (UID 155584) triggered this warning. Please report this application use case to the Legion developers mailing list. (from file /g/g15/yadav2/taco/legion/legion/runtime/legion/legion_analysis.cc:11156)
```

Based on our conversations about this last time, I have a pretty clear diagnosis of why this behavior is occurring. I'm trying to implement the algorithm here: https://ieeexplore.ieee.org/document/8425209, which performs a tensor contraction of the form A(i, l) = B(i, j, k) * C(j, l) * D(k, l). At a high level, the algorithm creates a 3-d processor grid and partitions the B tensor across the processors in the grid. Next, it partitions the A, C, and D matrices into rows and places those row pieces along different axes of the processor grid. The algorithm then performs a 3-d index launch over the grid, where every processor in a given i-slice of the grid receives the same piece of A, every processor in a j-slice receives the same piece of C, and every processor in a k-slice receives the same piece of D. A sequential sketch of the computation is below.
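
For concreteness, here is a minimal sequential version of the contraction that the index launch computes. The dimension sizes and row-major layout are illustrative, not taken from the actual TACO-generated code:

```cpp
#include <vector>

// Sequential semantics of the contraction the 3-d index launch computes:
//   A(i, l) = sum over j, k of B(i, j, k) * C(j, l) * D(k, l)
// Layout and extents here are illustrative only.
void contract(int I, int J, int K, int L,
              const std::vector<double> &B,  // I x J x K, row-major
              const std::vector<double> &C,  // J x L, row-major
              const std::vector<double> &D,  // K x L, row-major
              std::vector<double> &A) {      // I x L, row-major
  for (int i = 0; i < I; i++)
    for (int l = 0; l < L; l++) {
      double acc = 0.0;
      for (int j = 0; j < J; j++)
        for (int k = 0; k < K; k++)
          acc += B[(i * J + j) * K + k] * C[j * L + l] * D[k * L + l];
      A[i * L + l] = acc;
    }
}
```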

At 256 nodes, the processor grid I'm running on is 8x8x4. A slice in the k dimension contains 8 x 8 = 64 processors, so when a single piece of D is replicated among those 64 processors, the equivalence set for that piece accumulates 64 distinct users and hits the 64-user sampling rate from the warning. The sketch below makes the counting explicit.
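
Here is a small standalone sketch of that counting. The grid shape is from this run; the projection that keeps only the k coordinate is my reading of how the pieces of D are placed, not the actual projection functor:

```cpp
#include <cstdio>
#include <map>

int main() {
  // 8x8x4 processor grid from the 256-node run.
  const int NI = 8, NJ = 8, NK = 4;
  std::map<int, int> users;  // color of D's piece -> number of point tasks
  for (int i = 0; i < NI; i++)
    for (int j = 0; j < NJ; j++)
      for (int k = 0; k < NK; k++)
        users[k]++;  // placing D on k-slices drops the i and j coordinates
  for (const auto &u : users)
    std::printf("D piece %d is read by %d point tasks\n", u.first, u.second);
  // Prints 4 pieces, each read by 8 * 8 = 64 tasks -- exactly the
  // equivalence set sampling rate that fires the warning.
  return 0;
}
```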

Let me know if my analysis sounds right. I'm not sure there's anything I can change in my code -- it's a flat index launch with tight bounds on the subregions that each task accesses. There probably isn't an easy fix, but I thought I'd report the use case, as the warning asks.

**lightsighter** (Contributor) commented:

That analysis looks accurate. This will be resolved in the future by choosing to use collective instances, which will also permit the runtime to replicate the equivalence set meta-data to avoid unnecessary communication.

lightsighter self-assigned this on Aug 18, 2021.