ptrace wait loop: signal forwarding, FORK/VFORK events, group-stop suppression#153
Merged
…race

While investigating an empty trace from libsns_jxf22.so on hi3518ev200, two real bugs in the wait loop turned up that are worth fixing independently of jxf22's specific issue.

* Signal forwarding. The loop ended every iteration with ptrace(PTRACE_SYSCALL, pid, 1, NULL). The fourth arg is the signal to inject when resuming the tracee, and NULL meant "drop the signal entirely". So if a child stopped on a real signal (anything other than SIGTRAP - SIGCHLD, SIGRT*, SIGUSR*, etc.), ipctool swallowed it instead of forwarding it. The HiSilicon SDK uses realtime signals heavily for video pipeline coordination; dropping them under trace can deadlock a streamer. Now: if the stop signal is SIGTRAP it stays at 0 (nothing to forward); if it's a genuine signal-delivery stop, the original signal gets re-injected when the tracee resumes.

* PTRACE_EVENT_FORK / PTRACE_EVENT_VFORK weren't handled. #152 added the matching PTRACE_O_TRACEFORK/VFORK options but the wait loop only matched PTRACE_EVENT_CLONE. So a forked child fired PTRACE_EVENT_FORK in its parent (ignored), then on its first syscall stop the lookup against `pids` returned NULL and we hit the "BAD lookup" branch which `break`'d out of the wait loop - killing the whole trace. Now: the same CLONE handling block matches all three events (CLONE | FORK | VFORK). Plus the BAD-lookup case no longer breaks - it just continues, since under TRACEFORK there's a brief window where a child can hit a syscall stop before its parent's EVENT_FORK arrives and we register it.

* Exit handling for unknown PIDs no longer breaks the loop either. If a child exits before we observed its creation event, we just skip the bookkeeping and keep tracing the rest.

tools/sns_init_probe.c added: a tiny dlopen+dlsym wrapper that loads a libsns_*.so directly and calls its sensor init function. Lets a future researcher exercise sensor I/O paths in isolation from the streamer (handy for narrowing down "empty trace" issues to the .so vs the surrounding application). Build instructions in the file header.

Verified:

* SC2315E + Majestic regression: 100/100/100% diff against widgetii/smart_sc2315e unchanged.

* hi3518ev200 + jxf22 still produces an empty trace despite the signal/fork fixes. Strace confirms the streamer DOES make 79 write() calls of 2 bytes to a /dev/i2c-0 fd (opened TWICE: first at fd 18 by the probe code, then a second open at fd 25 by libsns_jxf22.so itself - that second open is what we're missing). The bug is somewhere else in the trace path on this specific camera/build combo; tracked separately.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
b29dd89 to
db53495
Compare
This was referenced May 3, 2026
ipctool trace produces empty output on Hi3518EV200 + libsns_jxf22.so despite write() being used
#154
Closed
While investigating an empty trace from libsns_jxf22.so on hi3518ev200, three real bugs in the wait loop turned up that are worth fixing independently of jxf22's specific issue.
Bugs fixed
1. Signal forwarding
The loop ended every iteration with ptrace(PTRACE_SYSCALL, pid, 1, NULL). The fourth arg is the signal to inject when resuming the tracee, and NULL meant "drop the signal entirely". So if a child stopped on a real signal (anything other than SIGTRAP, such as SIGCHLD, SIGRT*, SIGUSR*, etc.), ipctool swallowed it instead of forwarding it. The HiSilicon SDK uses realtime signals heavily; dropping them under trace can deadlock a streamer.
Now: if the stop signal is SIGTRAP, the resume signal stays at 0; if it's a genuine signal-delivery stop, the original signal gets re-injected when the tracee resumes.
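For illustration, the resume-signal choice described above could be sketched roughly as follows. This is not the code from this PR: resume_signal and resume_tracee are hypothetical names, and the extra SIGTRAP | 0x80 check assumes PTRACE_O_TRACESYSGOOD might be enabled, which this description does not state.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper: given a waitpid() status, pick the signal to pass
 * as the fourth ptrace() argument when resuming the tracee. */
int resume_signal(int status)
{
    if (!WIFSTOPPED(status))
        return 0; /* tracee exited or was killed: nothing to re-inject */

    int sig = WSTOPSIG(status);

    /* Plain SIGTRAP covers syscall stops and PTRACE_EVENT_* stops.
     * SIGTRAP | 0x80 is how syscall stops are reported when
     * PTRACE_O_TRACESYSGOOD is set (an assumption here). */
    if (sig == SIGTRAP || sig == (SIGTRAP | 0x80))
        return 0;

    return sig; /* signal-delivery stop: hand the signal back */
}

/* Hypothetical wrapper: resume until the next syscall stop,
 * injecting sig (0 means "no signal"). */
long resume_tracee(pid_t pid, int sig)
{
    return ptrace(PTRACE_SYSCALL, pid, NULL, (void *)(long)sig);
}
```

In a wait loop this would be used as resume_tracee(pid, resume_signal(status)) at the end of each iteration, in place of the unconditional NULL.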
2. Group-stop / post-clone init-stop suppression
Subtle interaction with #1: the kernel SIGSTOPs a newly cloned tracee as part of TRACECLONE bookkeeping. With my naive forwarding fix above, we'd forward SIGSTOP back to the new clone, keeping it permanently stopped and producing the exact "empty trace" symptom on multi-threaded streamers.
Now: SIGSTOP, SIGTSTP, SIGTTIN, SIGTTOU are recognised as group-stops / job-control / clone-init stops and explicitly suppressed (signal=0 on resume). Other real signals still forward.
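A minimal sketch of that suppression rule, assuming the stop signal has already been extracted from the wait status. The names is_stop_class_signal and filter_forwarded_signal are hypothetical and not taken from the ipctool sources.

```c
#include <signal.h>
#include <stdbool.h>

/* Hypothetical predicate: signals treated as group-stop / job-control /
 * clone-init stops, i.e. the four listed in this description. */
bool is_stop_class_signal(int sig)
{
    switch (sig) {
    case SIGSTOP:
    case SIGTSTP:
    case SIGTTIN:
    case SIGTTOU:
        return true;
    default:
        return false;
    }
}

/* Hypothetical filter: returns the signal to inject on resume.
 * SIGTRAP and the stop-class signals are suppressed (0);
 * every other signal is forwarded unchanged. */
int filter_forwarded_signal(int sig)
{
    if (sig == SIGTRAP || is_stop_class_signal(sig))
        return 0;
    return sig;
}
```

Keeping the stop-class check in its own predicate makes the suppressed set easy to audit if the list ever needs to change.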
3. PTRACE_EVENT_FORK / PTRACE_EVENT_VFORK ignored
#152 added the matching PTRACE_O_TRACEFORK/VFORK options but the wait loop only matched PTRACE_EVENT_CLONE. So a forked child fired PTRACE_EVENT_FORK in its parent (ignored), then on its first syscall stop the lookup against `pids` returned NULL and we hit the "BAD lookup" branch which `break`'d out of the wait loop, killing the whole trace. The exit handler also broke on missing-pid lookup. Both cases now continue gracefully.
Investigation tooling
tools/sns_init_probe.c: dlopen+dlsym wrapper to load a libsns_*.so directly and call its sensor init function. Lets future researchers exercise sensor I/O paths in isolation from the streamer.
What this PR does NOT fix
The hi3518ev200 + jxf22 specific empty-trace remains. Diagnostic data:
Same kernel, same camera, same libsns_jxf22.so, same ptrace primitive: strace gets the writes, ipctool doesn't. The bug must be in ipctool's specific event handling on this combo. Without on-target step-debug it's hard to narrow further; reserving for a follow-up.
Test plan
widgetii/smart_sc2315e unchanged
test-extraction-pipeline passes
🤖 Generated with Claude Code