Fuzzer assertion in refine_node (now with concrete mappings) #1663
Comments
The first actual bug in the physical analysis that's been found: |
Not suspicious at all. It depends a lot on how read-only regions traverse the equivalence set look-up tree and end up making new equivalence sets. |
Confirmed the fix on the original reproducer. |
I'm not sure if this is the same issue or a different one, but in a branch with both
Command line:
Note this is a newer commit of the fuzzer: StanfordLegion/fuzzer@3057d03. I would note that it takes longer to find crashes now, and I had to build Legion DEBUG with
|
Is this deterministic? I'm having trouble reproducing it. |
Seems pretty deterministic on my Mac. Keep in mind I modified the runtime so that |
I actually got it in debug mode first try without any optimizations on in Linux. |
Fortunately this is not a real bug. I'm forgetting to tighten a mask at some point, but that only results in a performance issue (which the assertion is flagging) and not a correctness issue. |
Pushed it as another commit to the fix invalidation branch since it falls under the same category: https://gitlab.com/StanfordLegion/legion/-/commit/e81a670057f854c89e0c2b40dd6c9f66b3e6389b |
Yes, it goes away now. |
Here's a new failure mode in the same function:
Fuzzer version: StanfordLegion/fuzzer@a0c55df
Command line:
|
Running the fuzzer in debug mode I am no longer able to observe any failures with 10,000 tests on either my Mac or Sapling. |
I spoke too soon. Using a longer trace length, I was able to produce a freeze. This is non-deterministic, so run it in a loop until it hangs. Each non-frozen run should take about 2 seconds (on Sapling).
There is a hung process right now on
Backtraces: bt.log
A couple minutes later: bt2.log
Fuzzer version: StanfordLegion/fuzzer@5018ce3 |
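For anyone reproducing a non-deterministic freeze like this, the "run it in a loop until it hangs" step can be sketched as a small Python driver. This is a minimal sketch, not part of the fuzzer: `run_until_hang` is a hypothetical helper, the command you pass is whatever reproducer line you are using, and the timeout is an assumption you should set well above the normal run time (about 2 seconds per run on Sapling). On a timeout the child is deliberately left running so that a debugger can attach and collect backtraces.

```python
import subprocess


def run_until_hang(cmd, timeout_s, max_runs):
    """Run `cmd` repeatedly until one run hangs or fails.

    Returns (status, attempt, proc) where status is "hang", "crash" or "ok".
    On "hang" the child process is NOT killed, so its PID (proc.pid) can be
    inspected with a debugger; the caller is responsible for killing it.
    """
    for attempt in range(1, max_runs + 1):
        proc = subprocess.Popen(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        try:
            code = proc.wait(timeout=timeout_s)
        except subprocess.TimeoutExpired:
            # The run exceeded the time budget: treat it as a freeze and
            # leave the process alive for inspection.
            return "hang", attempt, proc
        if code != 0:
            # A non-zero exit (for example an assertion) also ends the loop.
            return "crash", attempt, proc
    return "ok", max_runs, None
```

A call such as `run_until_hang(reproducer_cmd, timeout_s=60.0, max_runs=1000)` would then report the first attempt that froze; `reproducer_cmd` here stands for your own argument list.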
You can kill the process. I see what is going wrong. |
I ran the original reproducer 400 times and it did not freeze. I ran 10,000 tests × 100 ops with no errors and no freezes. I ran 10,000 tests × 1,000 ops with no errors and no freezes. I guess I'll keep fuzzing, but for now this seems pretty solid. |
There are still more dimensions to test:
|
|
I think we'll probably want more variability than that, such as mixing precise and imprecise use of regions on the same instances.
Putting a mapping fence after all the tasks have been issued, and then issuing all the deletions after the mapping fence should be sufficient to work around the issue for now. |
Actually just issuing all the deletions at the end is probably sufficient I think. |
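The two workaround variants suggested above (deletions after a mapping fence, or simply deletions at the end) can be illustrated on a toy operation list. This is only a sketch of the reordering idea, assuming the test generator produces a flat list of operations; the tuple encoding and the `defer_deletions` name are hypothetical and do not reflect the fuzzer's or Legion's actual data structures.

```python
def defer_deletions(ops, fence=True):
    """Move every ("delete", ...) op to the end of the stream.

    ops   -- list of tuples whose first element names the operation kind
    fence -- if True, place a single ("fence",) marker before the deletions
    The relative order of tasks, and of deletions, is preserved.
    """
    others = [op for op in ops if op[0] != "delete"]
    deletions = [op for op in ops if op[0] == "delete"]
    if not deletions:
        # Nothing to defer, so no fence is needed either.
        return others
    separator = [("fence",)] if fence else []
    return others + separator + deletions
```

With `fence=False` this models the simpler variant in which all deletions are just issued last.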
Ok, I worked around the issue by just not deleting fields eagerly. With this, and temporarily disabling inner tasks again (to avoid tripping #1668), I am unable to induce any failures with either:
That's on my laptop. I'll try on Sapling in a bit here to be sure. |
Probably time to bust out gcov and see what we're not hitting. |
I hit a new failure on Sapling: PID 2711882 is currently frozen on c0001. Backtrace:
Command line:
Fuzzer version: StanfordLegion/fuzzer@679604e |
Fortunately I already know what that one is. It is an overzealous assertion. I updated it in |
It's much less frequent now but I still hit:
You can take a look at process 3096943 on c0001. Command line (note I had to run 100,000 tests to hit this):
I'm still on a locally modified version of StanfordLegion/fuzzer@679604e that disabled inner tasks (and thus virtual mappings). |
Pull the |
I finished the following without any errors:
|
I'm about half-way through running 1M tests in the configuration above. No failures so far. I'll let it run overnight to be sure, but I think we got this one. |
Completed the following without failures:
|
I think this is different from the bug in #1659, but let me know if it is a duplicate.
Now, there should be no more virtual mappings anywhere in the fuzzer.
I am hitting this assert:
When reproducing, use this commit of the fuzzer: StanfordLegion/fuzzer@3d77d6c
Command to reproduce:
Note that adding -lg:inorder makes it stop reproducing, which is highly suspicious.
Backtrace: