Profiling native threads? #332

Open
SimonSapin opened this issue Dec 21, 2020 · 5 comments · May be fixed by #637

@SimonSapin

Does py-spy record ignore threads that don’t contain any Python stack frames by default?

I have a Python program with a native extension (that happens to be written in Rust). That extension starts a thread (with Rust’s std::thread::spawn) to do some CPU-intensive work in parallel with other work. The child thread never runs a Python interpreter. The SVG output of the profiler is missing everything in the second thread. --native does show Rust stack frames, but only in the parent thread. Adding --threads adds the ID of the parent thread to the output but nothing else. Adding --idle doesn’t seem to change anything for this program.
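A minimal Rust sketch of the setup described above may help anyone trying to reproduce it. It is a stand-in under stated assumptions, not the extension's actual code: there are no Python bindings, and `busy_sum` and `run_in_parallel` are hypothetical names. The spawned thread only ever executes Rust code, so no Python frame can appear on its stack, while the calling thread keeps doing its own work in parallel (in the real extension it would eventually return to Python).

```rust
use std::thread;

const MODULUS: u64 = 1_000_000_007;

/// Hypothetical CPU-bound workload standing in for the extension's real work:
/// the sum of i*i for i in 0..n, modulo a large prime.
fn busy_sum(n: u64) -> u64 {
    (0..n).fold(0, |acc, i| {
        let i = i % MODULUS;
        (acc + i * i) % MODULUS
    })
}

/// Run `worker_n` iterations on a thread created with `std::thread::spawn`
/// (a purely native thread) while the calling thread computes `parent_n`
/// iterations, then join. Returns (parent_result, worker_result).
fn run_in_parallel(parent_n: u64, worker_n: u64) -> (u64, u64) {
    let handle = thread::spawn(move || busy_sum(worker_n));
    // The parent thread keeps working while the child runs.
    let parent = busy_sum(parent_n);
    let worker = handle.join().expect("worker thread panicked");
    (parent, worker)
}
```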

However, when using py-spy dump --pid (at the right time), the stacks of both threads are printed correctly.

Can I use py-spy to profile both threads?

@benfred
Owner

benfred commented Dec 31, 2020

Not right now =( We merge the native stack traces into Python frames, but not vice versa. You'll have to profile the native thread with other native profiling tools, like perf.
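To make the merging direction concrete, here is a rough Rust sketch. It is not py-spy's implementation; it assumes both stacks are ordered innermost frame first, that each Python frame corresponds to exactly one native call into the evaluation loop, and that such calls can be recognized by the substring `PyEval_EvalFrame` (as in `_PyEval_EvalFrameDefault`). `Frame`, `ThreadSample` and both functions are hypothetical. Each evaluation-loop frame in the native stack is replaced by the Python frame it was running, and other native frames stay in place. Because the merge is driven by the Python side, a thread with no Python state is skipped entirely in this sketch, which would match the behavior reported in the question.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Frame {
    Native(String),
    Python(String),
}

/// Assumption: CPython's evaluation-loop symbols contain this substring.
fn is_eval_frame(symbol: &str) -> bool {
    symbol.contains("PyEval_EvalFrame")
}

/// Merge native frames *into* Python frames (both innermost first): every
/// eval-loop frame is replaced by the next Python frame, while other native
/// frames (C or Rust extension code) are kept where they are.
fn merge_native_into_python(native: &[&str], python: &[&str]) -> Vec<Frame> {
    let mut py = python.iter();
    let mut merged = Vec::new();
    for &symbol in native {
        if is_eval_frame(symbol) {
            if let Some(&name) = py.next() {
                merged.push(Frame::Python(name.to_string()));
            }
        } else {
            merged.push(Frame::Native(symbol.to_string()));
        }
    }
    merged
}

/// Hypothetical per-thread sample; `python` is `None` for a thread that has
/// no Python thread state (e.g. one created with `std::thread::spawn`).
struct ThreadSample<'a> {
    tid: u64,
    native: Vec<&'a str>,
    python: Option<Vec<&'a str>>,
}

/// Driving the merge from the Python side: native-only threads are dropped.
fn merge_samples(samples: &[ThreadSample<'_>]) -> Vec<(u64, Vec<Frame>)> {
    samples
        .iter()
        .filter_map(|s| {
            s.python
                .as_ref()
                .map(|py| (s.tid, merge_native_into_python(&s.native, py)))
        })
        .collect()
}
```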

@SimonSapin
Author

That’s unfortunate. Can you say more about this merging? Does it need to happen?

@ogrisel

ogrisel commented Aug 4, 2021

Indeed, it would be very helpful for py-spy to include native threads in its reports. That would help in understanding the performance of CPU-intensive Python programs that use data science libraries like numpy, which rely on multi-threaded native linear algebra libraries such as OpenBLAS, MKL and co.

The same goes for machine learning libraries like scikit-learn, lightgbm and xgboost, which use OpenMP threads in the CPU-intensive sections of their code written in Cython or C++.
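The OpenMP pattern mentioned above (and Cython's prange, which compiles to an OpenMP parallel loop) splits a loop's iteration range across a team of worker threads. The sketch below is only a rough Rust analogue using the standard library: `std::thread::scope` stands in for the OpenMP runtime, and `parallel_sum_of_squares` is a hypothetical function. It illustrates why such workers carry no Python frames: they run only compiled code. In this simplified version the calling thread just waits, whereas in OpenMP it usually takes a share of the iterations itself.

```rust
use std::thread;

/// Sum of squares of `data`, split into contiguous chunks that are processed
/// on scoped worker threads (a static schedule, one chunk per thread).
fn parallel_sum_of_squares(data: &[u64], n_threads: usize) -> u64 {
    assert!(n_threads > 0, "need at least one thread");
    // Ceiling division so every element lands in a chunk; `chunks(0)` would panic.
    let chunk_len = ((data.len() + n_threads - 1) / n_threads).max(1);
    thread::scope(|s| {
        // Spawn all workers first so they run concurrently.
        let handles: Vec<_> = data
            .chunks(chunk_len)
            .map(|chunk| s.spawn(move || chunk.iter().map(|x| x * x).sum::<u64>()))
            .collect();
        // The calling thread only joins; each worker executed native code only.
        handles
            .into_iter()
            .map(|h| h.join().expect("worker thread panicked"))
            .sum::<u64>()
    })
}
```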

At the moment, the output of py-spy --native --threads --format speedscope, loaded into the speedscope visualizer, makes no sense to me...
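For comparison, a common way to present multi-threaded samples is the collapsed ("folded") stack format that flame graph tools, and speedscope's importer, understand: one line per unique stack, frames separated by `;` from outermost to innermost, followed by a sample count. The hypothetical `fold_samples` below prefixes each stack with a thread label as its root frame, roughly the idea behind `--threads`; the exact label format is an assumption. A native-only thread would then show up as its own root, provided the profiler sampled it at all.

```rust
use std::collections::BTreeMap;

/// Hypothetical sample: OS thread ID plus its stack, innermost frame first.
struct Sample {
    tid: u64,
    stack: Vec<String>,
}

/// Aggregate samples into collapsed-stack lines such as
/// "thread (0x1);outer;inner 2". A BTreeMap keeps the output sorted.
fn fold_samples(samples: &[Sample]) -> Vec<String> {
    let mut counts: BTreeMap<String, u64> = BTreeMap::new();
    for s in samples {
        let mut frames = vec![format!("thread ({:#x})", s.tid)];
        // The folded format lists the root first, so reverse the stack.
        frames.extend(s.stack.iter().rev().cloned());
        *counts.entry(frames.join(";")).or_insert(0) += 1;
    }
    counts
        .into_iter()
        .map(|(stack, n)| format!("{stack} {n}"))
        .collect()
}
```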

@Jongy
Contributor

Jongy commented Aug 6, 2021

We're using libunwind-ptrace in PyPerf, and we just place the native frames on top of the Python frames (stopping at the first native frame that is a PyEval_EvalFrame* call, which belongs to the topmost Python function). For a truly native thread with no Python frames, we just report its native stack.

IIRC py-spy uses libunwind-ptrace as well? So this rather simple scheme could work.
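The scheme described above can be sketched as follows. The types, the function name `native_on_top` and the `PyEval_EvalFrame` substring test are illustrative assumptions, not PyPerf's or py-spy's code; stacks are assumed to be ordered innermost frame first, with plain symbol names from the unwinder. Native frames are kept from the top of the stack down to the first evaluation-loop frame, the thread's Python frames go underneath, and a thread with no Python frames keeps its full native stack instead of being dropped.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Frame {
    Native(String),
    Python(String),
}

/// Assumed marker for CPython's evaluation loop (e.g. `_PyEval_EvalFrameDefault`).
fn is_eval_frame(symbol: &str) -> bool {
    symbol.contains("PyEval_EvalFrame")
}

/// Place the native frames above the first eval-loop frame on top of the
/// Python frames. Both inputs are ordered innermost frame first.
fn native_on_top(native: &[&str], python: &[&str]) -> Vec<Frame> {
    if python.is_empty() {
        // Purely native thread (e.g. a Rust or OpenMP worker): keep everything.
        return native.iter().map(|s| Frame::Native(s.to_string())).collect();
    }
    native
        .iter()
        .take_while(|s| !is_eval_frame(s))
        .map(|s| Frame::Native(s.to_string()))
        .chain(python.iter().map(|s| Frame::Python(s.to_string())))
        .collect()
}
```

The trade-off for this simplicity is that native frames sitting between two Python frames (for example a C extension calling back into Python) are not shown.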

@ogrisel

ogrisel commented Aug 18, 2021

> Not right now =( We merge the native stack traces into python frames - but not vice versa. You'll have to profile with other native profiling tools like perf etc to profile the native thread

@benfred It would be great to have native thread support in py-spy. In my case, some of those native threads are managed by OpenMP via Cython prange loops; since those threads can call Cython functions, py-spy's Cython support would be very handy there.

Furthermore, if speedscope ever supports multitrack views with time-aligned traces, it would be very helpful to understand when those native threads come into play and interact with the calling Python code.
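If such per-thread, time-aligned views existed, the data behind them could be as simple as the structure sketched below: samples grouped into one track per thread, sorted by timestamp and rebased on the earliest sample overall so that all tracks share one time axis. `TimedSample` and `tracks` are hypothetical and do not correspond to speedscope's file format.

```rust
use std::collections::BTreeMap;

/// Hypothetical timestamped sample: thread ID, time in microseconds, leaf frame.
struct TimedSample {
    tid: u64,
    t_us: u64,
    leaf: String,
}

/// One track per thread, each sorted by time and rebased on the earliest
/// sample across all threads, so tracks can be drawn on a shared axis.
fn tracks(samples: &[TimedSample]) -> BTreeMap<u64, Vec<(u64, String)>> {
    let origin = samples.iter().map(|s| s.t_us).min().unwrap_or(0);
    let mut by_thread: BTreeMap<u64, Vec<(u64, String)>> = BTreeMap::new();
    for s in samples {
        by_thread
            .entry(s.tid)
            .or_default()
            .push((s.t_us - origin, s.leaf.clone()));
    }
    for track in by_thread.values_mut() {
        track.sort_by_key(|(t, _)| *t);
    }
    by_thread
}
```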

Would @Jongy's suggested solution above work?

Labels
None yet
Projects
None yet

4 participants