ptrace wait loop: signal forwarding, FORK/VFORK events, group-stop suppression#153

Merged
widgetii merged 1 commit into master from feat/wait-loop-signal-fix on May 3, 2026

widgetii commented May 3, 2026

While investigating an empty trace from libsns_jxf22.so on hi3518ev200, three real bugs in the wait loop turned up that are worth fixing independently of jxf22's specific issue.

Bugs fixed

1. Signal forwarding

The loop ended every iteration with ptrace(PTRACE_SYSCALL, pid, 1, NULL). The fourth argument (data) is the signal to deliver when the tracee resumes, and NULL (0) means "deliver no signal". So whenever a child stopped on a real signal (anything other than SIGTRAP: SIGCHLD, SIGRT*, SIGUSR*, etc.), ipctool swallowed it instead of forwarding it. The HiSilicon SDK uses realtime signals heavily; dropping them under trace can deadlock a streamer.

Now: if the stop signal is SIGTRAP, the resume signal stays at 0; if it's a genuine signal-delivery stop, the original signal gets re-injected when the tracee resumes.
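
A minimal sketch of that decision, assuming Linux wait-status encoding. The helper name `resume_signal` is hypothetical and is not taken from ipctool's source. The check for bit 7 only matters if the tracer sets PTRACE_O_TRACESYSGOOD, which this PR does not state.

```c
#include <signal.h>
#include <sys/wait.h>

/* Hypothetical helper (not ipctool's actual code): choose the signal to pass
 * as the fourth ptrace() argument when resuming a tracee, given the status
 * returned by waitpid(). */
int resume_signal(int status)
{
    if (!WIFSTOPPED(status))
        return 0;               /* tracee exited or was killed: nothing to inject */

    int sig = WSTOPSIG(status);

    /* SIGTRAP marks a ptrace stop (syscall stop, event stop), not a signal
     * meant for the tracee. With PTRACE_O_TRACESYSGOOD (an assumption here)
     * syscall stops report SIGTRAP | 0x80, so mask that bit first. */
    if ((sig & ~0x80) == SIGTRAP)
        return 0;

    return sig;                 /* genuine signal-delivery stop: re-inject it */
}
```

The result would be handed back on resume, for example `ptrace(PTRACE_SYSCALL, pid, 0, (void *)(long)resume_signal(status))`.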

2. Group-stop / post-clone init-stop suppression

Subtle interaction with fix 1: when PTRACE_O_TRACECLONE auto-attaches a newly cloned tracee, the kernel starts it with a SIGSTOP. With the naive forwarding fix above, we'd forward that SIGSTOP back to the new clone, keeping it permanently stopped and producing the exact "empty trace" symptom on multi-threaded streamers.

Now: SIGSTOP, SIGTSTP, SIGTTIN, SIGTTOU are recognised as group-stops / job-control / clone-init stops and explicitly suppressed (signal=0 on resume). Other real signals still forward.
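
The suppression rule can be sketched as a filter over the stop signal. Both function names below are hypothetical illustrations, not ipctool's code.

```c
#include <signal.h>
#include <stdbool.h>

/* Hypothetical helper: true for the four stopping signals this PR treats as
 * group-stop / job-control / clone-init stops. */
bool is_suppressed_stop(int sig)
{
    switch (sig) {
    case SIGSTOP:
    case SIGTSTP:
    case SIGTTIN:
    case SIGTTOU:
        return true;
    default:
        return false;
    }
}

/* Hypothetical helper: map the signal a tracee stopped on to the signal that
 * should be injected on resume (0 means "inject nothing"). */
int filter_resume_signal(int sig)
{
    if (sig == SIGTRAP || is_suppressed_stop(sig))
        return 0;       /* ptrace stop or stop signal: do not forward */
    return sig;         /* any other real signal is forwarded */
}
```

This filter is deliberately coarse: it also drops a stopping signal that was genuinely sent to the tracee. A stricter tracer could tell the cases apart with PTRACE_GETSIGINFO, which ptrace(2) documents as failing with EINVAL on a group-stop, but that is not what this PR describes.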

3. PTRACE_EVENT_FORK / PTRACE_EVENT_VFORK ignored

#152 added the matching PTRACE_O_TRACEFORK / PTRACE_O_TRACEVFORK options, but the wait loop only matched PTRACE_EVENT_CLONE. So a fork raised PTRACE_EVENT_FORK in the parent (ignored), the child was never registered, and on the child's first syscall stop the lookup against pids returned NULL. That hit the "BAD lookup" branch, which broke out of the wait loop and killed the whole trace. The exit handler also broke on a missing-pid lookup. Now the CLONE handling block matches all three events (CLONE, FORK, VFORK), and both missing-pid cases continue gracefully.
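
For illustration, the event code sits in bits 16 and up of the waitpid() status, as ptrace(2) describes. The sketch below decodes it and checks for the three child-creation events; the function names are hypothetical and not taken from ipctool.

```c
#include <signal.h>
#include <stdbool.h>
#include <sys/ptrace.h>
#include <sys/wait.h>

/* Hypothetical helper: return the PTRACE_EVENT_* code carried by a waitpid()
 * status, or 0 if the stop is not a ptrace event stop. */
int ptrace_event(int status)
{
    if (!WIFSTOPPED(status) || WSTOPSIG(status) != SIGTRAP)
        return 0;
    return (int)((unsigned int)status >> 16);
}

/* Hypothetical helper: true when the stop announces a new child that the
 * tracer needs to register, whichever way it was created. */
bool is_new_child_event(int status)
{
    switch (ptrace_event(status)) {
    case PTRACE_EVENT_FORK:
    case PTRACE_EVENT_VFORK:
    case PTRACE_EVENT_CLONE:
        return true;
    default:
        return false;
    }
}
```

On such a stop, PTRACE_GETEVENTMSG returns the new child's PID, which the tracer can add to its table before resuming the parent.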

Investigation tooling

tools/sns_init_probe.c: a dlopen+dlsym wrapper that loads a libsns_*.so directly and calls its sensor init function. It lets future researchers exercise sensor I/O paths in isolation from the streamer.
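
The general shape of such a wrapper might look like the sketch below. It is not the contents of tools/sns_init_probe.c: the function name `call_sensor_init`, the `void (*)(void)` prototype, and the symbol passed in are all assumptions, since the real init signature depends on the vendor SDK.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Assumed prototype for the sensor init entry point; the real one is defined
 * by the vendor SDK and is not given in this PR. */
typedef void (*sensor_init_fn)(void);

/* Hypothetical helper: load lib_path, resolve symbol, and call it once.
 * Returns 0 on success, -1 if the library or the symbol cannot be found. */
int call_sensor_init(const char *lib_path, const char *symbol)
{
    void *handle = dlopen(lib_path, RTLD_NOW | RTLD_GLOBAL);
    if (!handle) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return -1;
    }

    dlerror();  /* clear any stale error before calling dlsym() */
    sensor_init_fn init;
    *(void **)(&init) = dlsym(handle, symbol);  /* POSIX idiom for function pointers */
    const char *err = dlerror();
    if (err || !init) {
        fprintf(stderr, "dlsym: %s\n", err ? err : "symbol resolved to NULL");
        dlclose(handle);
        return -1;
    }

    init();             /* the sensor I/O under investigation happens here */
    dlclose(handle);
    return 0;
}
```

Running a program built around this under the tracer would show only the sensor library's own I/O. Older glibc versions need `-ldl` at link time.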

What this PR does NOT fix

The hi3518ev200 + jxf22 specific empty-trace remains. Diagnostic data:

strace -f /usr/bin/majestic  : 79 writes captured, all on TID 1671 (main thread)
strace    /usr/bin/majestic  : 0  writes (clone'd worker activity invisible)
ipctool trace ...            : 0  writes despite TRACECLONE | TRACEFORK
                                 and the three fixes in this PR

Same kernel, same camera, same libsns_jxf22.so, same ptrace primitive — strace gets the writes, ipctool doesn't. The bug must be in ipctool's specific event handling on this combo. Without on-target step-debug it's hard to narrow further; reserving for a follow-up.

Test plan

  • SC2315E + Majestic regression: 100/100/100% diff against widgetii/smart_sc2315e unchanged
  • CI test-extraction-pipeline passes
  • No regression on cv300+IMX291 / av200+IMX385

🤖 Generated with Claude Code

…race

While investigating an empty trace from libsns_jxf22.so on hi3518ev200,
two real bugs in the wait loop turned up that are worth fixing
independently of jxf22's specific issue.

* Signal forwarding. The loop ended every iteration with
  ptrace(PTRACE_SYSCALL, pid, 1, NULL). The fourth arg is the signal
  to inject when resuming the tracee, and NULL meant "drop the
  signal entirely". So if a child stopped on a real signal (anything
  other than SIGTRAP - SIGCHLD, SIGRT*, SIGUSR*, etc.), ipctool
  swallowed it instead of forwarding it. The HiSilicon SDK uses
  realtime signals heavily for video pipeline coordination; dropping
  them under trace can deadlock a streamer.

  Now: if the stop signal is SIGTRAP it stays at 0 (nothing to
  forward); if it's a genuine signal-delivery stop, the original
  signal gets re-injected when the tracee resumes.

* PTRACE_EVENT_FORK / PTRACE_EVENT_VFORK weren't handled. #152 added
  the matching PTRACE_O_TRACEFORK/VFORK options but the wait loop
  only matched PTRACE_EVENT_CLONE. So a forked child fired
  PTRACE_EVENT_FORK in its parent (ignored), then on its first
  syscall stop the lookup against `pids` returned NULL and we hit
  the "BAD lookup" branch which `break`'d out of the wait loop -
  killing the whole trace.

  Now: the same CLONE handling block matches all three events
  (CLONE | FORK | VFORK). Plus the BAD-lookup case no longer
  breaks - it just continues, since under TRACEFORK there's a brief
  window where a child can hit a syscall stop before its parent's
  EVENT_FORK arrives and we register it.

* Exit handling for unknown PIDs no longer breaks the loop either.
  If a child exits before we observed its creation event, we just
  skip the bookkeeping and keep tracing the rest.

tools/sns_init_probe.c added: a tiny dlopen+dlsym wrapper that
loads a libsns_*.so directly and calls its sensor init function.
Lets a future researcher exercise sensor I/O paths in isolation
from the streamer (handy for narrowing down "empty trace" issues
to the .so vs the surrounding application). Build instructions in
the file header.

Verified:
* SC2315E + Majestic regression: 100/100/100% diff against
  widgetii/smart_sc2315e unchanged.
* hi3518ev200 + jxf22 still produces an empty trace despite the
  signal/fork fixes. Strace confirms the streamer DOES make 79
  write() calls of 2 bytes to a /dev/i2c-0 fd (opened TWICE: first
  at fd 18 by the probe code, then a second open at fd 25 by
  libsns_jxf22.so itself - that second open is what we're missing).
  The bug is somewhere else in the trace path on this specific
  camera/build combo; tracked separately.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@widgetii widgetii force-pushed the feat/wait-loop-signal-fix branch from b29dd89 to db53495 Compare May 3, 2026 18:25
@widgetii widgetii changed the title ptrace wait loop: forward signals, handle FORK/VFORK, don't break on race ptrace wait loop: signal forwarding, FORK/VFORK events, group-stop suppression May 3, 2026
@widgetii widgetii merged commit 6084896 into master May 3, 2026
3 checks passed