Include sampling profiler in julia Base #2597

Closed
ViralBShah opened this Issue Mar 18, 2013 · 49 comments

Comments

Projects
None yet
7 participants
Owner

ViralBShah commented Mar 18, 2013

Include @timholy 's sampling profiler in julia Base. This was the general consensus in #1865 and I am opening a new issue to track this.

@ghost ghost assigned timholy Mar 18, 2013

Owner

timholy commented Mar 18, 2013

I haven't forgotten this. There's at least one thing holding me back: I think to be really useful, sprofile_tree needs graphic visualization. I haven't implemented that yet, but it's on my TODO list.

Until we have graphics in base, I wonder if it's better left in a package.

Owner

ViralBShah commented Mar 18, 2013

Ok. I personally would be happy with a text based profiler, and can certainly use it from the package for now. Changing this to the 1.0 milestone.

Owner

JeffBezanson commented Mar 18, 2013

Agree; I am happy picking through obscure text output. And a simple rough summary of where time is spent can be quite valuable, even without call tree information.

Owner

ViralBShah commented Mar 18, 2013

The sprofile_flat example shown in the SProfile README is good enough for a lot of use cases. I would feel a lot better if this was available in base. The graphical version can come later when we have visualization.

Owner

timholy commented Mar 18, 2013

OK, I can push what's available to core in time for 0.2. Sometime later I'll experiment with graphical visualization in the package.

As far as interpreting the results of text-based output from sprofile_tree, the best trick is to store the results to a file and use a text editor's cursor to help follow the indentation. (I had to switch to single-character indentation just because the call tree is often so deeply nested, but that's made it quite hard to parse visually.)

Owner

timholy commented Mar 19, 2013

There's a new sprofile branch. Feel free to make changes, and merge into master when you deem it ready. (Or give feedback and I can do the cleanups.)

Owner

timholy commented Mar 19, 2013

I should have added: sprofile.c needs a timer implementation for Windows. All it needs to do is the same thing that the Darwin & Linux #if defined functions do.

Owner

ViralBShah commented Mar 20, 2013

I was able to compile on mac. Can we call this profile instead of sprofile? The original profile would then need a new name.

Owner

ViralBShah commented Mar 20, 2013

Can we use the libuv timers here?

Member

vtjnash commented Mar 20, 2013

@ViralBShah we need fully async timers, which libuv does not provide

I would be very annoyed to have this merged until windows support can be realized (since it tends to cause a bug report headache -- and windows already has enough open issues). I don't think that is feasible for 0.2.

I can probably help some there. However, brief google searching doesn't reveal the existence of any asynchronous timers, so we probably need a helper thread. But more brief google searching hasn't been very fruitful in finding a version of RtlCaptureStackBackTrace which accepts a thread PHANDLE or CONTEXT argument (although gdb is able to pretend to do it, and fail miserably at succeeding)

Owner

ViralBShah commented Mar 20, 2013

That is fair enough. It would be great to have this working on Windows, and we can certainly wait for one more release for that.

Owner

timholy commented Mar 20, 2013

Didn't realize that timers on Windows would be so problematic, given that it's a dozen lines on Unix. Sorry about that. I'll do some investigating myself.

If necessary this can sit and stew until 0.3.

Owner

timholy commented Mar 20, 2013

Wow, it's shockingly hard to find a "signaling timer" on Windows that has better than several-millisecond accuracy.

I can't quite be sure, but I wonder if this might do the trick:
http://msdn.microsoft.com/en-us/library/windows/desktop/ms686289(v=vs.85).aspx

But see:
http://stackoverflow.com/questions/12199872/resolution-of-waitforsingleobject-with-timesetevent-setwaitabletimer
The last reply about migrating between cores is interesting, but I wonder whether that would, say, reduce BLAS to using just one core?

Owner

timholy commented Mar 20, 2013

Can we call this profile instead of sprofile? The original profile would then need a new name.

The original profile has already been renamed iprofile for "instrumenting profiler." So this wouldn't be a problem. But I'm a little hesitant because there are in fact quite a few possible types of profilers:
http://en.wikipedia.org/wiki/Profiling_(computer_programming)

If, for example, we ever develop a call-graph or hypervisor profiler and want to include that in base (that would basically have to be in base), then having the sampling profiler that's simply called "profile" might be annoying.

Owner

timholy commented Mar 20, 2013

A couple more thoughts (then I'm going to be away for a couple days with uncertain access to WiFi):

  • If the timer occasionally has much longer gaps, that's not a catastrophe: it just means that you don't collect as many samples as you intended. It only needs to be decently accurate, and preferably able to run at relatively short intervals. (My measurements on Linux indicated that taking a backtrace consumes ~30 microseconds, so running once per millisecond yields a profiling penalty of just a few percent, which seems about the right balance.)
  • If it would help and if you think the function I identified will do the trick, I'm happy to put together a draft Windows implementation. But don't expect too much benefit from that, since it would be untested, and testing/fixing bugs would have to be done on a Windows machine.
  • Alternatively, in sysimg.jl we could wrap the include("sprofile.jl") in @unix_only. Don't know how you'd feel about that, Jameson; it would prevent "bug reports" but might be begging for "feature requests."

timholy added a commit that referenced this issue Mar 20, 2013

Minimal changes needed to un-break package version of sprofile
Until #2597 gets resolved, at least these changes are needed for the
0.2 branch of the package to work.
Member

vtjnash commented Mar 20, 2013

Those timers are possibly correct, but need to be used from a separate thread, as they are not asynchronous in the linux sense. Window's will only call the callback function when the application has gone into a sleep state. It wouldn't mess with openblas in that case to set processor affinity for that thread. But the timers aren't necessarily the critical issue -- I'm not sure how to record a backtrace from the another thread (other than doing code injection).

Announcing 0.2 has a new feature is bound to result in bug reports when people try to copy paste the example code into their repl and it doesn't work. I think they have a hard time accepting that linux might just be better.

Member

vtjnash commented Mar 20, 2013

edit: I think it can be accomplished with StackWalk64 being called from a timer thread

Owner

timholy commented Mar 21, 2013

Announcing 0.2 has a new feature is bound to result in bug reports when
people try to copy paste the example code into their repl and it doesn't
work.

You're absolutely right, of course. Let's just wait until it can be addressed
properly. The package exists for people who want to use it.

Owner

ViralBShah commented Apr 11, 2013

@vtjnash It seems that 0.2 should be released in the next month or so. So is 0.3 the right milestone?

Member

vtjnash commented Apr 11, 2013

@ViralBShah it still depends on when the windows support can be added to it. also, someone should pick a real date for 0.2 and update all the future milestones appropriately.

On another tangent, anyone know how to suggest features to github? I really want to be able to assign multiple people to issues.

Member

pao commented Apr 11, 2013

@vtjnash Contact support for that--https://github.com/contact

Member

vtjnash commented Jun 2, 2013

The windows backtrace code now uses StackWalk64, so we are half way to having this. Note that StackWalk64 & friends are not thread-safe, so we just need to be careful that we don't try to profile the backtrace.

Owner

timholy commented Jun 2, 2013

Exciting! Thanks, Jameson!

@timholy timholy referenced this issue in timholy/IProfile.jl Jun 14, 2013

Closed

Busy loop on profiling external Cmd's #16

Owner

ViralBShah commented Jul 10, 2013

Is it possible to include this in Base for 0.2, given the recent progress?

Owner

ViralBShah commented Jul 24, 2013

@timholy Trying to gently bring the profiler into Base, I am wondering if as a start, we can at least move the C code for sampling into Base? In the early phases, it may not be compiled on Windows, but it would make installing the profiler package much easier.

Owner

timholy commented Jul 24, 2013

I have no objections, but I will defer to @vtjnash on this. Certainly the status quo seems quite satisfactory on Linux, although I guess compilers are less common on other platforms.

Member

vtjnash commented Jul 24, 2013

It's probably easier for me if it is in base -- fewer places to update code for adding windows support. I just asked that it not be part of the advertised features in base until we can get windows support ready.

Owner

timholy commented Jul 24, 2013

We can leave the Julia code in the package, that way the advertised state won't be an issue.

Owner

ViralBShah commented Jul 24, 2013

Thanks for adding it. Mentally, it does feel like it is easier to make it work on Windows now that it is in Base. Any chance this can happen in the 0.2 release timeframe?

For the record, relevant commit is 159203b

Owner

StefanKarpinski commented Jul 24, 2013

I strongly feel that the sampling profiler should go in base and always be available. It should also be renamed to just @profile and maybe have a little API work done on it, but having it in base seems essential to me.

Owner

timholy commented Jul 24, 2013

I was originally reluctant to name it @profile because there are so many possible types of profilers. However, at this point to me this seems like the One True Profiler, and I have difficulty seeing the need for anything else. (Well, a call-graph analyzer would be nice, essentially @which on steroids.)

So I agree that it should probably be renamed. As for merging the Julia code into base, I'm sensitive to @vtjnash's concern that there's still work to be done on the Windows side (high-resolution timers are clearly a major pain on Windows). And since he's such a genius helping make Julia better in so many ways, his word is like, gospel, to me :-).

Owner

StefanKarpinski commented Jul 24, 2013

Well, @vtjnash's only objection was announcing this as a feature that's supported on Windows. In fact, he feels it would be easier for him to get the profiler working on Windows if it's part of Base. So I think we should include it and just be very clear that it doesn't yet work on Windows.

Owner

ViralBShah commented Jul 24, 2013

That is a useful point. We can merge the sampling profiler into base, but not document it or advertise it until it is ready. It is basically a new feature under development in master, and can be announced when complete.

Let's just call it @profile.

Member

quinnj commented Jul 24, 2013

I'd be happy to help test/work on the 64-bit Windows side if needed. This has been at the top of my wish list as of late :)

Owner

timholy commented Jul 24, 2013

Maybe I'm misinterpreting, but I think he's wanting it not to be an advertised feature of any base until the Windows support is ready. They've already worked hard to close a lot of gaps, and this just moves the finish line. So even though I want it in base too, I'm quite sympathetic. It's pretty easy to insert "using Profile" into your .juliarc.jl. (I'm not advocating this as a long-term solution, of course.)

One thing I don't see a reason to wait on, however: @StefanKarpinski, I'd be happy to start working on API changes in preparation for a merge to base. Care to file issues at Profile.jl, so I know what you have in mind? Then it will be ready whenever we decide to pull the trigger.

@karbarcca, certainly any help getting it implemented on Windows is the cleanest solution.

Owner

StefanKarpinski commented Jul 24, 2013

Let's just move it into Base and make it clear that it doesn't yet work on Windows. Not having it in Base doesn't make it work on Windows any more. Let's do API changes after that.

vtjnash added a commit that referenced this issue Jul 25, 2013

Member

vtjnash commented Jul 25, 2013

I'll leave it to @timholy to fix the implementation in the Profile package:

  • removing references to libprofile
  • fixing the ccall types status = ccall((:sprofile_init, libprofile), Cint, (Uint, Uint64), nsamples, delay)
Owner

ViralBShah commented Jul 25, 2013

Wow this landed much faster than expected. We can now move the sampling profiler to base!

Owner

ViralBShah commented Jul 25, 2013

I am marking this as milestone 0.2, since it may actually be possible. We can move it further out if necessary.

Member

vtjnash commented Jul 25, 2013

I'm not sure it should be 0.2, since we really should be tagging that immediately, if Pkg2 is ready.

Owner

ViralBShah commented Jul 25, 2013

This perhaps needs to be discussed elsewhere, but we should ideally go through a couple of release candidates, so that we do not have to scamper around and release updates soon after.

Owner

timholy commented Jul 25, 2013

Many thanks, @vtjnash, for writing the Windows timers.

The move of the Julia code is done. Given the question about whether this is arriving in time for 0.2, I pushed it to the new "profile" branch. (I'll delete the sprofile branch shortly.)

Quite a lot of API cleanup, too. See 39cd8dc. @StefanKarpinski, please do let me know of any other issues.

Owner

ViralBShah commented Jul 25, 2013

The thing is that until this is in master, it is unlikely to get tested out widely, especially on Windows. Perhaps that only happens after the 0.2 release.

Owner

timholy commented Jul 25, 2013

I'm not really in the loop about the plans for the 0.2 release; the Boston/IRC crowd probably knows more. So somebody else had better make a decision about this, sooner better than later.

At least outside of base, it is pretty well-tested (on Unix). Since there are still one or two more things left in Pkg2, I personally think this should be able to make it in. But I won't merge profile->master until someone tells me to.

Owner

timholy commented Jul 26, 2013

Merged. With docs, too!

@timholy timholy closed this Jul 26, 2013

Member

quinnj commented Jul 26, 2013

Awesome! Many thanks to @timholy and @vtjnash. I just gave it a test run on windows 64-bit and it's running great.

Quick question I didn't see mentioned in the docs:
If I get a result like the following, what does the "+1" and "+7" mean in the bottom two backtraces?

julia> Profile.print()
4    ...me.jl\src\Datetime.jl; setoffset; line: 120
2    profile.jl; anonymous; line: 9
3    profile.jl; anonymous; line: 14
 662  ...me.jl\src\Datetime.jl; datetime; line: 132
       541 ....jl\src\Datetime.jl; setoffset; line: 120
  10   ...me.jl\src\Datetime.jl; datetime; line: 132
        9 ....jl\src\Datetime.jl; setoffset; line: 120
   1    stream.jl; process_events; line: 481
                2526 none; test; line: 5
                       1   array.jl; splice!; line: 831
                       1   inference.jl; inlineable; line: 1642
                                     +1 3   ...ime.jl; datetime; line: 132
                                     +7 2 ...time.jl; setoffset; line: 120
Member

vtjnash commented Jul 26, 2013

Cool. I like when stuff actually works on windows, without first needing testing and modification (I only wrote it against wine32).

Owner

timholy commented Jul 26, 2013

Great to hear! Nice work, Jameson!

If I get a result like the following, what does the "+1" and "+7" mean in the bottom two backtraces?

It's briefly described in the section on the cols option. Looks like I should expand that a bit.

Member

quinnj commented Jul 26, 2013

Ah, got it. I think a quick code snippet showing what you were talking about is all that's needed. Reading through the first time, it didn't click and scanning the code examples I didn't see any examples of that. Great work though, I love having this functionality in Base.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment