
Dynamic tracing TODO #407

Open · 7 of 27 tasks
magnatelee opened this issue Jul 9, 2018 · 6 comments
Assignees: magnatelee
Labels: best effort · enhancement · Legion

Comments

@magnatelee (Contributor) commented Jul 9, 2018

Here is the list of features that are missing from the current dynamic tracing implementation and will likely be added. I'll check off the boxes as I add them to the code. (A minimal sketch of the tracing API these items exercise follows the lists below.)

  • Integration with dynamic control replication
  • Remotely mapped tasks (for now the runtime will raise an error if you have them in a trace)
  • Index operations (e.g. IndexCopyOp and IndexFillOp)
  • TimingOp
  • Fill operations issued on restricted instances
  • Execution fences
  • Close operations mapped to physical instances
  • Gather/scatter copies
  • Predication (at the very least, the runtime should reject a trace that contains predicated operations); see the test case here: https://gitlab.com/StanfordLegion/legion/-/blob/master/language/tests/regent/run_pass/optimize_tracing_invalidate2.rg
  • AttachOp/DetachOp
  • Holding a reference to the region node in each instruction object (to handle the case where the region is deleted in the middle of a trace)
  • Checking restrictions on physical instances in the precondition check (to avoid replaying a recording from an execution with restrictions in a non-restricted context)
  • Precondition checks that use only physical states
  • Supporting non-replayable templates (and their dynamic extensions)
  • Tracing for atomic coherence: the current tracing does not capture Realm reservation acquires/releases
  • Future arguments passed between operations, both within a trace and across traces. I think these are handled "correctly" right now because they aren't traced at all and Legion is still managing them, but that should change if we want to trace them.
  • Restricted coherence on index space tasks is terribly broken (really, really broken); we will silently get wrong answers if people try to use it.
  • Dead code elimination for user event triggers with no preconditions.
  • Safety checks similar to dynamic control replication that will hash all the arguments passed to everything in the trace and ensure that the user isn't violating the conditions of tracing.
  • Compute preimage/image to compute more precise sets of source/destination instances on indirect gather/scatter copies.
  • Force copy aggregation across all equivalence sets on all nodes instead of just across all equivalence sets on a local node when performing trace capture.
  • Add ability to "forget" a trace if the application knows it's never going to use it again (from @opensdh)
  • Detect when two operations are independent and have different priorities, and issue them to Realm in the appropriate order to ensure execution consistent with their priorities (@bandokihiro). @streichler also thinks we might be able to get this functionality automatically by lowering to Realm graphs.
  • The tracing code is not currently general in its capture mechanism. We need to check all the find_event calls because some of them are overly strict. Preconditions can come from outside the trace anywhere, not just in calls to merge events.

These features need some discussion before we decide to add them to the code.

  • Map operations (just to handle the case when the runtime issues them on behalf of the user)
  • Traces that end with reduction tasks (Regent PENNANT generates them when static control replication is turned off)
  • Must epoch tasks
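
For reference, here is a minimal sketch of the tracing API that the items above concern, assuming the standard Legion C++ begin_trace/end_trace calls; `launch_one_timestep` is a hypothetical helper that issues the same sequence of tasks/copies/fills every iteration.

```cpp
// Minimal sketch (assumed, not taken from this issue) of Legion C++ tracing.
#include "legion.h"
using namespace Legion;

void launch_one_timestep(Context ctx, Runtime *runtime);  // hypothetical helper

void time_step_loop(Context ctx, Runtime *runtime, int num_steps) {
  const TraceID TRACE_ID = 42;  // hypothetical application-chosen trace ID
  for (int step = 0; step < num_steps; step++) {
    // Operations issued between begin_trace and end_trace are captured on the
    // first iteration and replayed on later iterations when the captured
    // template's preconditions hold.
    runtime->begin_trace(ctx, TRACE_ID);
    launch_one_timestep(ctx, runtime);
    runtime->end_trace(ctx, TRACE_ID);
  }
}
```
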
@magnatelee self-assigned this Jul 9, 2018
@lightsighter (Contributor) commented

This is not actually dead code:
https://gitlab.com/StanfordLegion/legion/blob/master/runtime/legion/legion_views.cc#L7933
It happens when we do an explicit copy operation, virtual-map the source region to construct a composite view, and that composite view has reduction instances in it. My guess is that we don't have a test case for this scenario, although it shouldn't be too hard to construct one.

@lightsighter (Contributor) commented

I had to disable the memoize tests for circuit_sparse.rg and pennant_fast.rg in the nopaint branch because they both hit the "dead code" assertion from the previous comment.

@magnatelee added the planned label Oct 3, 2019
@magnatelee (Contributor, Author) commented

I checked off the items that I know are handled by either me or Mike. I propose we revisit the list and create a new issue for each outstanding item.

@magnatelee added this to the 20.06 milestone Oct 3, 2019
@magnatelee added the Legion label Oct 3, 2019
@magnatelee modified the milestones: 20.06, 20.09 Jun 8, 2020
@magnatelee added the best effort label and removed the planned label Jun 8, 2020
@streichler modified the milestones: 20.09, 20.12 Oct 1, 2020
@streichler modified the milestones: 20.12, 21.03 Dec 27, 2020
@alexaiken commented

Per Mike's suggestion, I'm adding a comment. We run into non-idempotent traces regularly with S3D when making small extensions (e.g., adding a new boundary condition). I don't think we can expect users to understand and debug this, so I'd suggest a mode under mapper control that allows non-replayable traces to be replayed by issuing copies to satisfy the preconditions when necessary. There is a question whether the system would just pick some instances to move, or whether we should have mapper calls for establishing preconditions that pick which of multiple instances to use.
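
A purely hypothetical sketch of the kind of mapper control being suggested; none of these names exist in the current Legion mapper interface. The idea is that when a captured template's preconditions do not hold, the runtime would ask the mapper whether to issue copies to re-establish them, and which of several candidate instances to use, instead of falling back to a full re-capture.

```cpp
// Hypothetical mapper-facing structures; nothing here is real Legion API.
#include <vector>
#include "legion.h"

struct TraceRepairRequest {
  // hypothetical: one entry per unsatisfied trace precondition, listing the
  // candidate instances that could be copied from to satisfy it
  std::vector<std::vector<Legion::Mapping::PhysicalInstance>> candidate_sources;
};

struct TraceRepairResponse {
  bool issue_repair_copies = false;      // let the runtime move data to satisfy preconditions
  std::vector<unsigned> chosen_sources;  // index of the chosen source for each precondition
};

// Hypothetical mapper callback, analogous in shape to existing map_* calls:
//   virtual void repair_trace_preconditions(const Legion::Mapping::MapperContext ctx,
//                                           const TraceRepairRequest &input,
//                                           TraceRepairResponse &output) = 0;
```
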

@rohany (Contributor) commented Nov 8, 2023

@lightsighter and I have been thinking about the composability of programs that use tracing (especially in a high-level context, such as when user programs in cuNumeric might try to use tracing). There seem to be two main problems around tracing in this area:

  1. If tracing annotations are added by the user, then when composing code it is possible for a user to trace a loop that calls another function which is itself tracing some internal part of its code behind an API call.
  2. End users are generally not knowledgeable enough to put traces around code / understand when tracing is possible / understand when changes inside their loop structure may invalidate traces.

We've been thinking so far about two solutions to these problems.

The first solution is supporting nested traces, which fixes problem 1 but doesn't address problem 2. Supporting nested traces would allow arbitrary composition of code that uses tracing, since composing two pieces of traced code corresponds to simply nesting one trace record/replay inside an existing record/replay. I don't know that much about the implementation of tracing, but Mike says that something like this would not be the hardest thing to do.
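
To illustrate, here is a sketch of the composition scenario from problem 1, assuming the standard Legion C++ begin_trace/end_trace calls. `library_solve` is a hypothetical library routine (e.g., something a cuNumeric-backed library might call) that already traces its own internal loop with its own TraceID; making this nesting legal is what nested-trace support would provide.

```cpp
#include "legion.h"
using namespace Legion;

void library_solve(Context ctx, Runtime *runtime);  // hypothetical: calls begin_trace/end_trace internally

void user_program(Context ctx, Runtime *runtime, int num_steps) {
  const TraceID OUTER_TRACE = 1;  // hypothetical user-level trace ID
  for (int step = 0; step < num_steps; step++) {
    runtime->begin_trace(ctx, OUTER_TRACE);
    // The user cannot see that library_solve traces behind its API; with
    // nested-trace support the inner record/replay would simply nest inside
    // this outer one.
    library_solve(ctx, runtime);
    runtime->end_trace(ctx, OUTER_TRACE);
  }
}
```
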

The second solution is to move towards more automation inside Legion, where we automatically detect when programs are replaying the same sequence of operations and replay traces when we identify memoizable operation sequences. This solution solves both problems 1 and 2. Mike is already planning to build infrastructure that would help with identifying when repeated sequences of operations occur. The main difficulty is deciding what to do when the runtime has decided that a trace should be replayed, but the application's operation stream then diverges from what the runtime predicted would happen. A potential solution, inspired by JIT compilers, is the following:

  1. Given an operation stream $O_1, \ldots, O_n$, annotate the memoized trace graph with frontiers for each operation $O_i$.
  2. When the application issues an operation $O_j$ that diverges from the memoized operation stream, replay the memoized graph up to operation $O_{j-1}$, which we know how to do from the frontiers created in the prior step (see the sketch after this list).
    Maintaining this information through optimizations on the trace graph seems possible, except for one optimization (I forget the name of this one; do you remember, @lightsighter?).
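
A toy model (plain C++, not Legion code) of the frontier idea above, under the assumption that each captured operation records a cutoff into the replayable graph marking how much must be replayed to reach the state just after that operation:

```cpp
#include <cstddef>
#include <string>
#include <vector>
#include <iostream>

struct MemoizedTrace {
  std::vector<std::string> ops;      // the operation stream that was captured
  std::vector<std::size_t> frontier; // frontier[i]: replay cutoff after ops[i]
};

// Returns how far into the captured graph we can replay before handing the
// rest of `incoming` back to the normal dynamic analysis pipeline.
std::size_t replay_prefix(const MemoizedTrace &trace,
                          const std::vector<std::string> &incoming) {
  std::size_t i = 0;
  while (i < trace.ops.size() && i < incoming.size() && trace.ops[i] == incoming[i])
    ++i;                              // longest matching prefix O_1..O_{j-1}
  return (i == 0) ? 0 : trace.frontier[i - 1];
}

int main() {
  MemoizedTrace t{{"task_A", "copy_B", "task_C"}, {3, 7, 12}};
  std::vector<std::string> incoming{"task_A", "copy_B", "fill_D"};  // diverges at O_3
  std::cout << "replay up to graph node " << replay_prefix(t, incoming) << "\n";
  return 0;
}
```
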

There is the potential to take this further and push the granularity of memoization down to the operation level: if the preconditions for an individual operation are satisfied, we could skip/replay the physical analysis for just that operation. Since checking preconditions is the expensive part, when we see a prefix of operations that we have seen before, followed by operations that we haven't, we could replay the analysis of the operations in the prefix and just have the physical analysis effects (equivalence set updates, etc.) replayed for the final operation in the stream, so that everything after the prefix goes through the pipeline normally. Something like this is reminiscent of what @magnatelee wanted to see in Legate, where if the Legate runtime is consistently making the same decisions, analysis costs should go down.

The second solution (and the extension to it) is more forward-looking than the first, but at the same time we don't currently have any programs that would not be handled by nested tracing but would be handled by automatic tracing.

@lightsighter (Contributor) commented

> Maintaining this information through optimizations on the trace graph seems possible, except for one optimization (I forget the name of this one, do you remember @lightsighter).

Dead code elimination. Just because something is dead code in the context of an entire trace doesn't mean it is actually dead when the trace is replayed with different downstream operations.

I think we don't necessarily need to pick between the two approaches. The important thing is to create a framework for tracing that allows us to explore the trade-offs; the current implementation is too rigid for that. I think we can make the current implementation work just as efficiently with a more "operation-based" implementation that looks backwards at the operations that came before each operation and infers whether it can be replayed or needs to redo its analysis. If we do that, then I think we can explore both nested tracing and dynamic discovery of traces.
