Legion issues more frames than requested by mapper #1680

Closed
elliottslaughter opened this issue Apr 6, 2024 · 3 comments
elliottslaughter commented Apr 6, 2024

Here is a reproducer for an application that uses frames, along with a mapper that uses configure_context to control the number of outstanding frames.

The relevant code snippets are the application loop, which issues one frame per iteration:

  for i = 0, 20 do
    format.println("task main iter {}", i)
    x = f(x)
    regentlib.c.legion_runtime_complete_frame(__runtime(), __context())
  end

And the mapper, which sets 1 min and 2 max frames outstanding:

void TestFramesMapper::configure_context(MapperContext               ctx,
                                         const Task&                 task,
                                               ContextConfigOutput&  output) {
  output.min_frames_to_schedule = 1;
  output.max_outstanding_frames = 2;
}

The full code is here: https://gist.github.com/elliottslaughter/4497a197cb51b9ba0afca57313f3531c

Run it like this:

$ ./regent.py test_frames.rg
task main iter 0
task main iter 1
task main iter 2
task main iter 3
task f(0) going to sleep
task f(0) waking up, returning
task f(1) going to sleep
task f(1) waking up, returning
task f(2) going to sleep
task f(2) waking up, returning
task main iter 4
task f(3) going to sleep
task f(3) waking up, returning
task main iter 5
task f(4) going to sleep
task f(4) waking up, returning
task main iter 6
task f(5) going to sleep
task f(5) waking up, returning
task main iter 7
...

You can clearly see that the application immediately schedules 4 frames, contrary to the mapper's requested maximum of 2 outstanding. We then complete 3 frames and issue 1 more, bringing us down to the requested count of 2. From then on we complete one and issue one, repeating for the rest of the run (as expected).

elliottslaughter commented Apr 6, 2024

While this minimal reproducer eventually hits the expected steady state, I want to mention that my real application does not.

Trace for the full application:
main: before timestep 0
main: complete frame (end of timestep 0), group of 1
main: before timestep 1
main: complete frame (end of timestep 1), group of 1
main: before timestep 2
main: complete frame (end of timestep 2), group of 1
main: before timestep 3
main: complete frame (end of timestep 3), group of 1
00:48:35.055        1  6.0000E-09  6.000E-09  1.203E-06  1747.53    1.00
00:48:39.009        2  1.2000E-08  6.000E-09  1.175E-06  1747.62    1.00
main: before timestep 4
main: complete frame (end of timestep 4), group of 1
main: before timestep 5
main: complete frame (end of timestep 5), group of 1
00:48:43.565        3  1.8000E-08  6.000E-09  1.200E-06  1747.71    1.00
main: before timestep 6
main: complete frame (end of timestep 6), group of 1
main: before timestep 7
main: complete frame (end of timestep 7), group of 1
main: before timestep 8
main: complete frame (end of timestep 8), group of 1
main: before timestep 9
main: complete frame (end of timestep 9), group of 1
00:48:47.326        4  2.4000E-08  6.000E-09  1.177E-06  1747.80    1.00
00:48:51.177        5  3.0000E-08  6.000E-09  1.151E-06  1747.88    1.00
00:48:54.928        6  3.6000E-08  6.000E-09  1.084E-06  1747.97    1.00
00:48:58.735        7  4.2000E-08  6.000E-09  9.601E-07  1748.05    1.00
main: before timestep 10
main: begin trace (start of timestep 10)
main: before timestep 11
main: before timestep 12
main: before timestep 13
main: before timestep 14
main: before timestep 15
main: before timestep 16
main: before timestep 17
main: before timestep 18
main: before timestep 19
main: end trace (end of timestep 19)
main: complete frame (end of timestep 19), group of 10
main: before timestep 20
main: begin trace (start of timestep 20)
main: before timestep 21
main: before timestep 22
main: before timestep 23
main: before timestep 24
main: before timestep 25
main: before timestep 26
main: before timestep 27
main: before timestep 28
main: before timestep 29
main: end trace (end of timestep 29)
main: complete frame (end of timestep 29), group of 10
main: before timestep 30
main: begin trace (start of timestep 30)
main: before timestep 31
main: before timestep 32
main: before timestep 33
main: before timestep 34
main: before timestep 35
main: before timestep 36
main: before timestep 37
main: before timestep 38
main: before timestep 39
main: end trace (end of timestep 39)
main: complete frame (end of timestep 39), group of 10
main: before timestep 40
main: begin trace (start of timestep 40)
main: before timestep 41
main: before timestep 42
main: before timestep 43
main: before timestep 44
main: before timestep 45
main: before timestep 46
main: before timestep 47
main: before timestep 48
main: before timestep 49
main: end trace (end of timestep 49)
main: complete frame (end of timestep 49), group of 10
00:49:02.570        8  4.8000E-08  6.000E-09  8.320E-07  1748.12    1.00
00:49:06.408        9  5.4000E-08  6.000E-09  6.971E-07  1748.20    1.00
00:49:24.202       10  6.0000E-08  6.000E-09  6.394E-07  1748.27    1.00
00:51:21.627       20  1.2000E-07  6.000E-09  1.160E-06  1748.61    1.00

Summarizing what I see in the trace:

  • Issue 4 frames
  • Complete 2
  • Issue 2 frames
  • Complete 1
  • Issue 4 frames
  • Complete 4
  • Issue 4 frames (note these are longer frames now, they include 10 timesteps each, but there are still 4 frames)

In the rest of the application run (not shown in the excerpt), the number of outstanding frames seems to be unpredictable, and the application issues anywhere from 2 to 3 frames at a time. Note that these are quite long-running frames (about 30 seconds each), so there's plenty of time for Legion to get ahead. Note also that the full application has no blocking calls and that if I don't put in the frame calls Legion will happily run the entire top-level task to completion and exit before any timesteps execute.

In short, the application never reaches a predictable steady state: the number of frames issued at a time varies, and the number of frames outstanding goes as high as 4, double the mapper's requested maximum.

The configure_context implementation in the full application is very similar to the one in the reproducer:

void RHSTMapper::configure_context(const MapperContext ctx,
                                   const Task& task,
                                   ContextConfigOutput& output)
{
  output.min_tasks_to_schedule = 256;
  output.min_frames_to_schedule = 1;

  output.max_window_size = 1024;
  output.max_outstanding_frames = 2;

  output.hysteresis_percentage = 0;
}

@lightsighter

This should be fixed now that the inordercommit branch has merged into the master branch. Assigning back to @elliottslaughter to confirm.

@elliottslaughter

Confirmed that S3D is behaving properly with frames now.
