spans inside frames inside spans #6

Open
daniel-vainsencher opened this issue May 24, 2016 · 10 comments
Comments

@daniel-vainsencher

I am not writing a game but numerical code, in which algorithms have loops with some interesting parts inside the loop and some outside.

So I want a span before the loop starts, a span on the whole loop, inside the loop each iteration is a frame, and more spans inside the loop, partitioning each frame.

Currently this results in errors like:
thread 'logisticregression3' panicked at 'flame::end("SublinearAveragingSolver") called without a currently running span!', /home/danielv/.cargo/registry/src/github.com-88ac128001ac3a9a/flame-0.1.5/src/lib.rs:257
(presumably because my next_frame inside the loop is hiding the loop-scope flame::start("Sublinear...");).

Does this make sense?

@TyOverby
Collaborator

Yeah... Sadly frames are intended only to be used at the top level of performance collecting.

What is the pattern that you are trying to look at in your code, and what do you want frames to help you do?

@daniel-vainsencher
Author

My optimization algorithms are typically some initialization and then a loop. The body of the loop has several actions. My main goal is to understand the cost of the in-loop actions, but I want to know (not guess) that the initialization isn't too expensive either.

Having removed the outer span, I now have a next_frame at the beginning of each loop iteration and a bunch of span_of calls inside it. I find the resulting flame graphs surprising: the spans are sorted by name, and where I expected the times for a particular named span to be summed over the frames, I instead get 10 copies of each (one per loop iteration). Is this the expected behavior?

@daniel-vainsencher
Author

On second thought, what I should probably do is give up on wrapping the whole loop and instead wrap the initialization code in its own span. OK, that's reasonable.
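A minimal sketch of that workaround, for illustration only. It uses `std::time::Instant` rather than FLAME's API, and the `Recorder` type, the `run` function, and the toy "solver" arithmetic are all hypothetical stand-ins: initialization gets one span outside the loop, and each part of the loop body gets its own span.

```rust
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a profiler: records (name, elapsed) pairs.
struct Recorder {
    spans: Vec<(String, Duration)>,
}

impl Recorder {
    fn new() -> Self {
        Recorder { spans: Vec::new() }
    }

    /// Time `f`, store the elapsed time under `name`, and return `f`'s result.
    fn span<T>(&mut self, name: &str, f: impl FnOnce() -> T) -> T {
        let start = Instant::now();
        let out = f();
        self.spans.push((name.to_string(), start.elapsed()));
        out
    }
}

/// Toy "solver": one initialization span, then two spans per iteration.
fn run(iterations: usize) -> (f64, Recorder) {
    let mut rec = Recorder::new();
    // Initialization is measured on its own, outside the loop.
    let mut x = rec.span("init", || 1.0_f64);
    for _ in 0..iterations {
        // Each part of the loop body is measured separately.
        let grad = rec.span("gradient", || 2.0 * x);
        x = rec.span("update", || x - 0.25 * grad);
    }
    (x, rec)
}
```

With this layout there is no span wrapping the whole loop, so nothing outside the loop needs to stay open across iterations.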

@TyOverby
Collaborator

The by-name sorting thing surprised me too. I've pushed 0.1.6 which fixes this.

Also, right now, none of the exporters support drawing frames. I'm writing my own viewer though, so when that's done, it'll support frames and viewing multi-threaded computation!

@daniel-vainsencher
Author

It seems each user has different expectations about what frames mean... to me it was obvious we sum over frames: since frames are likely to be similar, I want to average over them to more precisely measure the different spans in it. Looking at hprof, they only show the spans in the current frame.

It is not even obvious what sort order is best: by size and by order of occurrence both make sense.

@daniel-vainsencher
Author

How do you plan to treat spans over a sequence of frames? Summing, as I'd prefer, or do you have something else in mind?

@TyOverby
Collaborator

For the flamegraph, I've opted to order by occurrence.

Honestly, the only reason I included the concept of a frame was to make visualizing perf for a game possible from inside the game itself. This is why I haven't built frames support into the dump visualizations yet.

Frames will probably always be the outermost unit of measurement in FLAME and they'll happen on a per-thread basis. What would it mean to have nested frames? Or a frame that doesn't

If you want to collapse multiple frames into each other, you can use flame::end_collapse(..) or my_guard.end_collapse(). This method only really works for leaf nodes that share the same name, but it can be handy.

@daniel-vainsencher
Author

Of course you will keep the design fit for your own purposes, but I will explain my point of view; maybe you'll find it useful.

Frames are a special case of loop bodies. Loops are sometimes nested. What should a profile of a program with (maybe nested) loops look like? (The visualization-inside-the-game scenario is kind of orthogonal; I will get back to it.)

If your loop runs only a small number of times, and each iteration is completely different because of differing input etc., you really want to ignore the loop entirely and display the profiles of the loop bodies one after the other for comparison.

But it is pretty common that loops have many iterations (so the display proposed above is impractical), that different iterations are pretty similar to one another, and that what we really care about is the relative costs of the different parts of the loop body. In this case, you want to treat passes through the loop body (I will call them frames for convenience from now on) as just samples from a process that you are trying to study. What can we do then? The most natural thing to do is average over them: construct a single profile that has the union of all spans that occurred over the different frames, and divide the total measured time for each by the number of frames. But you can show other statistics as well: how many frames we are averaging over, the +/- 25% percentiles in addition to the mean, pinpointed outliers, etc.
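The averaging described above can be sketched as follows. This is an illustration under stated assumptions, not part of FLAME: the `Frame` type alias and `average_over_frames` are hypothetical names, durations are plain `f64` milliseconds, and a span that is missing from a frame simply contributes zero time to that frame.

```rust
/// One frame is a list of (span name, duration in milliseconds).
type Frame = Vec<(String, f64)>;

/// Average each named span over all frames: total time / number of frames.
/// Names are reported in order of first occurrence across the frames.
fn average_over_frames(frames: &[Frame]) -> Vec<(String, f64)> {
    let mut totals: Vec<(String, f64)> = Vec::new();
    for frame in frames {
        for (name, ms) in frame {
            // Union of span names: add to an existing total or start a new one.
            if let Some(pos) = totals.iter().position(|(n, _)| n == name) {
                totals[pos].1 += *ms;
            } else {
                totals.push((name.clone(), *ms));
            }
        }
    }
    let count = frames.len() as f64;
    totals
        .into_iter()
        .map(|(name, total)| (name, total / count))
        .collect()
}
```

Dividing by the number of frames (rather than by the number of frames in which a span appeared) follows the description above; a per-span occurrence count would be a different statistic.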

What about nested loops? Typically, you'll want to average over each of them separately. The inner loop will be its own part of the outer one.

Of course, this applies to recurrences of different types, like iterators, not just loops.

In fact, a very natural way to implement all of this is to always "summarize" the contents of any span in this way. When its children occur only once, you don't notice the difference. When a span has children repeating many times, the summary is valuable.
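One way to read "summarize the contents of any span" is as a recursive merge of same-named siblings. The sketch below is an assumption about what such a summary could look like, not FLAME's data model: `Span`, `Summary`, `leaf`, and `summarize` are hypothetical names, and durations are `f64` milliseconds.

```rust
/// A measured span: name, duration (ms), and nested child spans.
#[derive(Debug, Clone, PartialEq)]
struct Span {
    name: String,
    ms: f64,
    children: Vec<Span>,
}

/// Summary node: all same-named siblings merged into one entry.
#[derive(Debug, Clone, PartialEq)]
struct Summary {
    name: String,
    count: usize,
    total_ms: f64,
    children: Vec<Summary>,
}

/// Convenience constructor for a span without children.
fn leaf(name: &str, ms: f64) -> Span {
    Span { name: name.to_string(), ms, children: Vec::new() }
}

/// Merge same-named siblings, then summarize their pooled children recursively.
fn summarize(spans: &[Span]) -> Vec<Summary> {
    // Group siblings by name, keeping order of first occurrence.
    let mut groups: Vec<(String, Vec<&Span>)> = Vec::new();
    for span in spans {
        match groups.iter().position(|(n, _)| *n == span.name) {
            Some(i) => groups[i].1.push(span),
            None => groups.push((span.name.clone(), vec![span])),
        }
    }
    groups
        .into_iter()
        .map(|(name, members)| {
            // Pool the children of every occurrence before recursing.
            let mut pooled: Vec<Span> = Vec::new();
            for s in &members {
                pooled.extend(s.children.iter().cloned());
            }
            Summary {
                name,
                count: members.len(),
                total_ms: members.iter().map(|s| s.ms).sum(),
                children: summarize(&pooled),
            }
        })
        .collect()
}
```

A span whose children each occur once comes out with `count: 1` everywhere, so the summary looks the same as the raw profile; repeated children collapse into one entry whose mean is `total_ms / count`.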

To visualize inside the game: you can treat a particular span as the frame, and then for that span either use only the last one as in hprof, or average over the last 30 for a somewhat more stable display, or whatever is useful. But for any in-frame loops, you still probably want the kind of summary I suggested above.
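The "average over the last 30" idea amounts to a rolling mean over a fixed window of frame times. A minimal sketch, assuming frame times arrive as `f64` milliseconds; `RollingMean` is a hypothetical helper, not something FLAME or hprof provides.

```rust
use std::collections::VecDeque;

/// Keeps the last `capacity` frame times and reports their mean.
struct RollingMean {
    capacity: usize,
    window: VecDeque<f64>,
    sum: f64,
}

impl RollingMean {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "window must hold at least one frame");
        RollingMean {
            capacity,
            window: VecDeque::with_capacity(capacity),
            sum: 0.0,
        }
    }

    /// Record one frame's time (ms) and return the mean over the window.
    fn push(&mut self, ms: f64) -> f64 {
        if self.window.len() == self.capacity {
            // Drop the oldest frame once the window is full.
            if let Some(old) = self.window.pop_front() {
                self.sum -= old;
            }
        }
        self.window.push_back(ms);
        self.sum += ms;
        self.sum / self.window.len() as f64
    }
}
```

`RollingMean::new(30)` would give the 30-frame display mentioned above, and `RollingMean::new(1)` reduces to showing only the last frame. The running sum can accumulate floating-point drift over very long sessions, which a real implementation might correct by recomputing it periodically.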

Sorry for the wall of text.

@TyOverby
Collaborator

TyOverby commented May 26, 2016

> Sorry for the wall of text.

Thanks for the wall of text; I understand your use case much more now.

I think that there are two (equally important) parts to FLAME: the API, and the viewer. With the alpha release, I'm proud of how small the API is, but the visualizer was thrown together at the last moment. I think in code with high amounts of repetition, there is certainly a need for the viewer to be intelligent; able to detect repetition, and have options for collapsing and summarizing.

Detection of the pattern produced by code like this

for _ in 0 .. 30 {
    ::flame::start("foo");
    do_something();
    ::flame::end("foo");
}

would be trivial, and could be very user-friendly without any API changes to account for it.

I'm going to start writing my own visualizer this weekend (the current one is 3rd party), and when I do, I'll keep loop-detection in mind. If the experiment works out well, I also might deprecate the "frame" API as it would be unnecessary.
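As a rough illustration of why detecting that pattern is simple, the sketch below collapses adjacent same-named spans into runs. It is not the planned visualizer: `Run` and `collapse_runs` are hypothetical names, and the input is assumed to be a flat list of sibling spans as `(name, milliseconds)` pairs in order of occurrence.

```rust
/// A run of consecutive sibling spans that share a name.
#[derive(Debug, PartialEq)]
struct Run {
    name: String,
    count: usize,
    total_ms: f64,
}

/// Collapse adjacent same-named spans into runs, preserving order.
fn collapse_runs(spans: &[(&str, f64)]) -> Vec<Run> {
    let mut runs: Vec<Run> = Vec::new();
    for &(name, ms) in spans {
        match runs.last_mut() {
            // Same name as the previous span: extend the current run.
            Some(run) if run.name == name => {
                run.count += 1;
                run.total_ms += ms;
            }
            // Different name (or first span): start a new run.
            _ => runs.push(Run {
                name: name.to_string(),
                count: 1,
                total_ms: ms,
            }),
        }
    }
    runs
}
```

Thirty consecutive "foo" spans from the loop above would collapse into a single run with `count: 30`, which a viewer could draw once with a repetition count. Loop bodies made of several alternating spans would need a smarter detector than this adjacent-name check.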

@daniel-vainsencher
Author

Cool, glad to clarify my use case, and glad to hear it is (IIUC) within scope :)

