This repository has been archived by the owner on Dec 9, 2021. It is now read-only.

Signal handling may be delayed, which leads to dropped samples #5

Open
mgedmin opened this issue Jan 27, 2016 · 6 comments

Comments

@mgedmin

mgedmin commented Jan 27, 2016

I've previously mentioned this issue in the comments of PR #1, but now that I understand what the problem is, I think it's worth creating a separate issue, not tied to any particular PR.

djdt-flamegraph uses interval timers that periodically generate signals, and registers a signal handler to sample the current stack frame. Now, the way CPython implements signal handlers is that they set an internal flag, which is checked the next time the CPython interpreter enters the main eval loop.

A consequence of this is that if the main thread is blocked in some C code, signal handling gets delayed for an unbounded time, and you're getting a skewed profile picture because you're missing a significant number of samples. Example: executing an SQL query via psycopg2 may delay stack sampling for hundreds or thousands of milliseconds.
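To make the delay concrete, here is a minimal sketch of this kind of sampler. It is not the actual djdt-flamegraph code: `Sampler` and `demo` are hypothetical names, it assumes a Unix platform (`signal.setitimer` and `SIGALRM`), and it uses `sum(range(...))` as a stand-in for a long call into C, on the assumption that CPython does not check for pending signals inside that call.

```python
import collections
import signal
import time
import traceback


class Sampler:
    """Illustrative signal-based stack sampler (hypothetical, Unix only)."""

    def __init__(self, interval=0.001):
        self.interval = interval
        self.samples = collections.Counter()

    def _sample(self, signum, frame):
        # CPython calls this Python-level handler only once the main thread
        # is back in the eval loop, not at the moment the signal arrives.
        stack = traceback.extract_stack(frame)
        self.samples[";".join(f.name for f in stack)] += 1

    def start(self):
        signal.signal(signal.SIGALRM, self._sample)
        # Fire after `interval` seconds, then every `interval` seconds.
        signal.setitimer(signal.ITIMER_REAL, self.interval, self.interval)

    def stop(self):
        # Disarm the timer; the handler stays installed so that a signal
        # that is still pending is handled harmlessly.
        signal.setitimer(signal.ITIMER_REAL, 0, 0)


def demo():
    sampler = Sampler(interval=0.001)
    sampler.start()
    started = time.monotonic()
    sum(range(50_000_000))  # one long call into C (assumed: no signal checks)
    elapsed = time.monotonic() - started
    sampler.stop()
    expected = int(elapsed / sampler.interval)
    recorded = sum(sampler.samples.values())
    print(f"timer ticks during the call: ~{expected}, samples recorded: {recorded}")
```

Calling `demo()` should show far fewer recorded samples than timer ticks if the assumption about `sum` holds on your interpreter; the exact numbers depend on the machine.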

@mgedmin
Author

mgedmin commented Jan 27, 2016

I've implemented a workaround in my fork that synthesizes fake stack samples for all the missed sampling intervals. Now all I need to do is clean it up and create a fresh PR, without the extra debugging gunk.
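For readers who want a rough idea of what such a workaround could look like, here is a sketch of one possible approach. It is not the code from the fork. `GapFillingSampler` is a hypothetical name, and attributing the missed intervals to the stack seen when the handler finally runs is an assumption made for this example.

```python
import collections
import time
import traceback


class GapFillingSampler:
    """Hypothetical handler that back-fills samples for missed timer ticks."""

    def __init__(self, interval, clock=time.monotonic):
        self.interval = interval
        self.clock = clock  # injectable so the logic can be checked without timers
        self.samples = collections.Counter()      # samples actually taken
        self.synthesized = collections.Counter()  # fake samples for missed ticks
        self.last = None

    def start(self):
        self.last = self.clock()

    def handle(self, signum, frame):
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        # Number of timer intervals that passed since the previous sample.
        ticks = max(1, round(elapsed / self.interval))
        key = ";".join(f.name for f in traceback.extract_stack(frame))
        self.samples[key] += 1
        missed = ticks - 1
        if missed:
            # Assumption: the main thread spent the whole gap blocked
            # underneath the stack that is visible now.
            self.synthesized[key] += missed
```

To use it with a real timer, one would call `start()`, register `handle` with `signal.signal(signal.SIGALRM, ...)` and arm `signal.setitimer(signal.ITIMER_REAL, interval, interval)`. Keeping real and synthesized samples in separate counters lets a report show how much of the profile is estimated.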

I seem to be taking unreasonable amounts of time doing this, because I'm busy with other things. :/

@blopker
Owner

blopker commented Jan 27, 2016

Thank you for investigating this and the detailed explanation. Can't wait to see the PR!

@guettli

guettli commented Feb 27, 2018

Any progress on this one? Is this issue still valid, or do you use other tools today?

@mgedmin
Author

mgedmin commented Feb 27, 2018

I'm sorry, I drifted away to other projects without finding the time to clean this up. :(

@guettli

guettli commented Feb 28, 2018

@mgedmin I once wrote a similar tool called live-trace (https://github.com/guettli/live-trace), which dumps the current stack trace every N milliseconds. It has the same fundamental problem that you noted: if the main thread is blocked in some C code, signal handling gets delayed for an unbounded time. I tried to find a low-impact tool for profiling production environments, but so far I have not found a solution. Do you have a hint?

@guettli

guettli commented Feb 28, 2018

Just for the record, I asked here to find a solution: https://stackoverflow.com/questions/49030629/statistical-profiling-in-python
