clone() syscall infinitely restarts because of SIGPROF signals #97

Closed
advancedxy opened this issue Mar 20, 2018 · 7 comments

@advancedxy

When profiling a Spark application with a large memory footprint that executes subprocesses (launched either from the JVM or from native library code), the whole process hung in fork forever.

After some debugging, I believe it is similar to https://bugzilla.redhat.com/show_bug.cgi?id=645528.
The workaround is simple: increase the sampling interval to 20 ms.

You can add this to the README, or I can send a PR for it.

@apangin
Collaborator

apangin commented Mar 20, 2018

Interesting. Can you reproduce the issue with a reduced test case (to make sure this is exactly the problem you referred to)?
If so, a paragraph in the troubleshooting section would be helpful.
Thanks.

@advancedxy
Author

Let's set up a minimal reproduction case first, then. I will post back when I have one.

@advancedxy
Author

advancedxy commented Mar 20, 2018

Sorry, I tried to set up a minimal reproduction case using only the JVM, but the scenario I described could not be reproduced that way.

I reproduced the case by simplifying my real workload, and it does indeed hang at clone. I cannot post my workload here, but the important debugging steps can be shared:

After I found the thread that was hanging forever, I used strace -p $lwpid to generate the following output:

--- SIGPROF (Profiling timer expired) @ 0 (0) ---
read(52, "w\322\3\0\0\0\0\0", 8)        = 8
gettid()                                = 32558
ioctl(52, 0x2403, 0)                    = 0
ioctl(52, 0x2402, 0x1)                  = 0
rt_sigreturn(0x34)                      = 56
clone(child_stack=0, flags=CLONE_PARENT_SETTID|SIGCHLD, parent_tidptr=0x7feae49b19bc) = ? ERESTARTNOINTR (To be restarted)
--- SIGPROF (Profiling timer expired) @ 0 (0) ---
read(52, "\345\323\3\0\0\0\0\0", 8)     = 8
gettid()                                = 32558
ioctl(52, 0x2403, 0)                    = 0
ioctl(52, 0x2402, 0x1)                  = 0
rt_sigreturn(0x34)                      = 56
clone(child_stack=0, flags=CLONE_PARENT_SETTID|SIGCHLD, parent_tidptr=0x7feae49b19bc) = ? ERESTARTNOINTR (To be restarted)
--- SIGPROF (Profiling timer expired) @ 0 (0) ---
read(52, "l\321\3\0\0\0\0\0", 8)        = 8
gettid()                                = 32558
ioctl(52, 0x2403, 0)                    = 0
ioctl(52, 0x2402, 0x1)                  = 0
rt_sigreturn(0x34)                      = 56
clone(child_stack=0, flags=CLONE_PARENT_SETTID|SIGCHLD, parent_tidptr=0x7feae49b19bc) = ? ERESTARTNOINTR (To be restarted)
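The raw numbers in the trace can be decoded to see what the SIGPROF handler does between clone attempts. The sketch below is a small helper, not part of async-profiler; it assumes fd 52 is a perf_events file descriptor, that the event uses the default read format (so read() returns a single little-endian u64 count on this x86-64 host), and the names `perf_ioctl_name`, `decode_counter` and `TRACE_READS` are hypothetical. The ioctl numbering itself follows the kernel's `_IO('$', nr)` definitions.

```python
import struct

# perf_event ioctl requests are defined with _IO('$', nr): no direction or size
# bits, so the request number is simply (ord('$') << 8) | nr, i.e. 0x2400 + nr.
PERF_IOC_MAGIC = ord("$")  # 0x24
PERF_IOCTL_NAMES = {
    0: "PERF_EVENT_IOC_ENABLE",
    1: "PERF_EVENT_IOC_DISABLE",
    2: "PERF_EVENT_IOC_REFRESH",
    3: "PERF_EVENT_IOC_RESET",
}


def perf_ioctl_name(request):
    """Map a raw ioctl request from strace to one of the four no-argument-size perf ioctls."""
    if request >> 8 != PERF_IOC_MAGIC:
        return None
    return PERF_IOCTL_NAMES.get(request & 0xFF)


def decode_counter(raw):
    """Decode an 8-byte read() buffer as a little-endian u64 (assumes x86-64, default read format)."""
    (value,) = struct.unpack("<Q", raw)
    return value


# The three read(52, ...) buffers from the trace, pasted as Python bytes literals.
TRACE_READS = [b"w\322\3\0\0\0\0\0", b"\345\323\3\0\0\0\0\0", b"l\321\3\0\0\0\0\0"]

print(perf_ioctl_name(0x2403), perf_ioctl_name(0x2402))  # PERF_EVENT_IOC_RESET PERF_EVENT_IOC_REFRESH
print([decode_counter(raw) for raw in TRACE_READS])      # [250487, 250853, 250220]
```

Read this way, each handler invocation reads the counter, resets it (PERF_EVENT_IOC_RESET) and re-arms the event for one more overflow (PERF_EVENT_IOC_REFRESH with argument 1) before rt_sigreturn; the retried clone is then interrupted by the next overflow.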

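It looks like clone is repeatedly interrupted by SIGPROF: each time the signal arrives, clone fails with ERESTARTNOINTR, the signal handler runs, and the kernel restarts clone, only for the next SIGPROF to interrupt it again.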
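The $lwpid passed to strace above is the Linux thread ID (LWP) of the hung thread, not the process ID of the JVM. A minimal sketch for listing a process's threads, assuming Linux /proc; `list_threads` is a hypothetical helper, not part of any tool mentioned in this thread.

```python
import os


def list_threads(pid):
    """Return (tid, name) pairs for every thread of process `pid` (Linux /proc only)."""
    task_dir = f"/proc/{pid}/task"
    threads = []
    for entry in sorted(os.listdir(task_dir), key=int):
        try:
            # Each thread has its own directory; "comm" holds its (possibly truncated) name.
            with open(os.path.join(task_dir, entry, "comm")) as f:
                threads.append((int(entry), f.read().strip()))
        except FileNotFoundError:
            # The thread exited between listdir() and open(); skip it.
            continue
    return threads
```

For example, `list_threads(12345)` lists every thread of process 12345; any of the returned thread IDs can be passed to `strace -p` to attach to that single thread.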

@apangin apangin changed the title from "Provide a gotcha/troubleshooting experience" to "clone() syscall infinitely restarts because of SIGPROF signals" Mar 20, 2018
@apangin
Collaborator

apangin commented Mar 20, 2018

Thank you for a great analysis! Feel free to add a README paragraph or leave it to me if you prefer.

@advancedxy
Author

I will send a PR for the README then.

@apangin
Collaborator

apangin commented Mar 27, 2018

Let me close this one. Thanks again for pointing out this issue.

@apangin apangin closed this as completed Mar 27, 2018
ktoso pushed a commit to sbt/sbt-jmh that referenced this issue May 9, 2018
… to latest sbt version (#148)

* Add optional sampling interval parameter for Async profiler to avoid issues like: async-profiler/async-profiler#97

* Switch to latest sbt version
@apangin apangin mentioned this issue Jun 12, 2019
@Jongy
Contributor

Jongy commented Jan 15, 2022

I was skimming through vmprof-python's code today and found a solution they've implemented for the same issue. I suppose it is something we could add to async-profiler as well (as an opt-in feature).
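The vmprof code referred to above is not reproduced in this thread. As a separate, hypothetical illustration of what an opt-in mitigation could look like (not vmprof's or async-profiler's actual code), the sketch below blocks SIGPROF in the forking thread for the duration of fork(). On Linux, a blocked signal stays pending without marking the thread as having a deliverable signal, so it cannot force the clone to restart; it is delivered once the previous mask is restored. `sigprof_blocked` and `fork_without_sigprof` are hypothetical names.

```python
import os
import signal
from contextlib import contextmanager


@contextmanager
def sigprof_blocked():
    """Block SIGPROF in the calling thread, restoring the previous mask on exit."""
    previous = signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGPROF})
    try:
        yield
    finally:
        # Runs in both parent and child after fork(); in the parent, any SIGPROF
        # that fired while blocked is delivered once the mask is restored.
        signal.pthread_sigmask(signal.SIG_SETMASK, previous)


def fork_without_sigprof():
    """Hypothetical helper: fork() with SIGPROF blocked so it cannot force a restart."""
    with sigprof_blocked():
        return os.fork()
```

This only covers forks that go through the wrapper. Forks issued from JVM or native library code, as in the original report, would need the same treatment at a lower level, for example with pthread_atfork handlers in C.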
