ipctool trace produces empty output on Hi3518EV200 + libsns_jxf22.so despite write() being used #154

@widgetii

Summary

ipctool trace produces a near-empty trace when run against Majestic on a Hi3518EV200 camera with libsns_jxf22.so. On the same camera, with the same binary and the same ptrace primitive, strace captures the full sensor I/O sequence (~79 register writes). Several ipctool fixes have been merged (most recently #153), but the issue persists.

Affected configuration

Symptom

$ tools/capture_sensor.sh majestic --host openipc-hi3518ev200... --secs 30
$ wc -l tools/dumps/captured.log
10 tools/dumps/captured.log
$ cat tools/dumps/captured.log
[1283] child 1284 created
parent 1284 created child 1285
parent 1284 created child 1286
parent 1284 created child 1287
========================== i2c-19 ==========================
sensor_i2c_change_addr(0x100);
$

The trace stops after the first I2C_SLAVE_FORCE ioctl. No sensor_write_register lines appear, even though Majestic logs ===soi_f22 sensor DVP 1080P30fps linear mode init success!===== to its own stderr.

Reference: what strace sees

Same camera, same Majestic invocation, with the static strace at /mnt/noc/sdk/strace:

$ strace -f -o trace.log -e trace=open,close,write,writev,ioctl /usr/bin/majestic
$ awk '{print $1}' trace.log | sort -u
1671
1675
1676
$ grep ' write(' trace.log | awk '{print $1}' | sort | uniq -c
     79 1671
$ grep -c 'i2c-0' trace.log
2
$ grep 'i2c-0' trace.log
1671  open("/dev/i2c-0", O_RDWR|O_LARGEFILE) = 18    # probe fd
1671  open("/dev/i2c-0", O_RDWR|O_LARGEFILE) = 25    # libsns's actual handle
  • Two distinct /dev/i2c-0 opens (fd 18 for Majestic's probe, fd 25 for libsns_jxf22.so's own handle).
  • 79 write(25, ...) calls of 2 bytes each (1-byte reg + 1-byte value, jxf22 is 8-bit/8-bit).
  • Three TIDs total visible (1671, 1675, 1676), all writes attributed to the main TID 1671.
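
To make the 2-byte payload layout concrete, here is a minimal sketch of decoding one write() payload into a register/value pair. It is not ipctool's i2c_write_exit_cb; the names decode_i2c_write, read_field and struct i2c_reg_write are hypothetical, and choosing the layout purely from the payload length (2 = 1+1, 3 = 2+1, 4 = 2+2) is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded form of one write() payload sent to /dev/i2c-N. */
struct i2c_reg_write {
    uint16_t reg;
    uint16_t val;
};

/* Read a 1- or 2-byte field; 2-byte fields honour the requested byte order. */
uint16_t read_field(const uint8_t *p, size_t width, int little_endian)
{
    if (width == 1)
        return p[0];
    return little_endian ? (uint16_t)(p[0] | (p[1] << 8))
                         : (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Assumption: the layout follows from the length alone
 * (2 bytes = 1+1, 3 bytes = 2+1, 4 bytes = 2+2).
 * Returns 0 on success, -1 for any other length.
 */
int decode_i2c_write(const uint8_t *buf, size_t len, int little_endian,
                     struct i2c_reg_write *out)
{
    size_t reg_w, val_w;

    switch (len) {
    case 2: reg_w = 1; val_w = 1; break;
    case 3: reg_w = 2; val_w = 1; break;
    case 4: reg_w = 2; val_w = 2; break;
    default: return -1;
    }
    out->reg = read_field(buf, reg_w, little_endian);
    out->val = read_field(buf + reg_w, val_w, little_endian);
    return 0;
}
```

For the jxf22 case, a payload of {0x12, 0x40} decodes to reg 0x12, val 0x40.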

Critical observation: strace without -f also shows zero writes, while strace -f shows 79. This points at clone-following being the determining factor, even though the writes are attributed to the original TID. Either strace -f and plain strace use materially different ptrace setups (likely: strace attaches with PTRACE_SEIZE and adds PTRACE_O_TRACECLONE only with -f), or thread-vs-process semantics are confounding the analysis.
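
For reference, a minimal sketch of what a clone-following setup looks like at the ptrace level. This is neither strace's nor ipctool's actual code; follow_options and seize_and_follow are hypothetical names, and the exact option set is an assumption based on ptrace(2).

```c
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>

/* Options that make the kernel auto-attach new threads and processes. */
long follow_options(void)
{
    return PTRACE_O_TRACESYSGOOD   /* report syscall-stops as SIGTRAP|0x80 */
         | PTRACE_O_TRACECLONE     /* follow clone(), which includes threads */
         | PTRACE_O_TRACEFORK
         | PTRACE_O_TRACEVFORK;
}

/*
 * Attach with PTRACE_SEIZE and set the options in the same call.
 * Unlike PTRACE_ATTACH, this does not stop the tracee; a later
 * PTRACE_INTERRUPT is needed if an initial stop is wanted.
 * Returns 0 on success, -1 with errno set on failure.
 */
int seize_and_follow(pid_t pid)
{
    if (ptrace(PTRACE_SEIZE, pid, NULL, (void *)follow_options()) == -1)
        return -1;
    return 0;
}
```

Without PTRACE_O_TRACECLONE, threads created by the tracee are not attached automatically, which would be consistent with the -f vs no -f difference seen above.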

What's already been tried

All merged via #145 to #153:

| Change | Effect on jxf22 trace | Merged in |
| --- | --- | --- |
| --output=PATH flag (decouple trace from streamer stdout) | no change | #145 |
| Sony IMX init pattern in segmenter | n/a (we never get to segmentation) | #152 |
| i2c_write_exit_cb handles 1+1 / 2+1 / 2+2 byte writes with chip-family endianness | no change (decoder path never invoked) | #152 |
| hisi_gen2_ioctl_exit_cb decodes V2 I2C_SLAVE_FORCE | gave us the one sensor_i2c_change_addr(0x100) line we do see | #152 |
| syscall_writev_exit for uClibc write→writev wrapping | no change | #152 |
| PTRACE_O_TRACEFORK / TRACEVFORK | no change | |
| broadcast_fd_open / broadcast_fd_close for CLONE_FILES siblings | no change | #152 |
| Forward signals on PTRACE_SYSCALL resume | no change (correct fix in general) | #153 |
| Suppress SIGSTOP/SIGTSTP/SIGTTIN/SIGTTOU to avoid re-stopping cloned children | no change | #153 |
| Handle PTRACE_EVENT_FORK / PTRACE_EVENT_VFORK events | no change | #153 |
| Don't break on missing-PID lookup race | no change | #153 |

Per-syscall instrumentation (temporarily) showed:

  • ipctool processes ~2400 syscall events in 30 s of capture, vs ~7000 that strace sees in 18 s: about 3x fewer events in total, and roughly 5x fewer per second (~80/s vs ~390/s)
  • Of those 2400, exactly zero are __NR_write (4) or __NR_writev (146) to a /dev/i2c-0 fd
  • The single writev that fires is to fd=2 (libc stdio to stderr)
  • The trace caught only the first /dev/i2c-0 open at fd 19 (Majestic's probe). The second open at fd 25 (libsns_jxf22.so's own handle) is the one we're missing.
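
The counts above can be gathered with something like the following sketch. It is not the instrumentation that was actually used; struct write_stats and record_syscall are hypothetical names, and the caller is assumed to maintain its own list of fds known to refer to /dev/i2c-N.

```c
#include <stddef.h>

/* ARM EABI syscall numbers, as quoted in the list above. */
#define NR_WRITE  4
#define NR_WRITEV 146

struct write_stats {
    unsigned long total;       /* every syscall exit seen */
    unsigned long writes;      /* write/writev on any fd */
    unsigned long i2c_writes;  /* write/writev on a tracked i2c fd */
};

/* Record one syscall exit against the caller's list of tracked i2c fds. */
void record_syscall(struct write_stats *st, long nr, int fd,
                    const int *i2c_fds, size_t n_i2c_fds)
{
    st->total++;
    if (nr != NR_WRITE && nr != NR_WRITEV)
        return;
    st->writes++;
    for (size_t i = 0; i < n_i2c_fds; i++) {
        if (i2c_fds[i] == fd) {
            st->i2c_writes++;
            break;
        }
    }
}
```

Keeping writes and i2c_writes separate helps tell two failure modes apart: many writes with zero i2c_writes would suggest the fd table missed the second open, while near-zero writes overall would suggest the events themselves are being lost.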

What's likely the bug

Best hypothesis after the investigation:

  1. After the first /dev/i2c-0 open + ioctl burst, Majestic does ~50 unrelated syscalls before the second /dev/i2c-0 open.
  2. Somewhere in those ~50 syscalls, ipctool's wait loop loses track of the tracee in a way strace -f does not.
  3. From that point on, our trace stream goes silent even though the kernel ptrace keeps generating events.

Areas to look at:

  • PTRACE_ATTACH vs PTRACE_SEIZE. ipctool uses PTRACE_ATTACH. strace uses PTRACE_SEIZE in modern builds. SEIZE has cleaner event-stop semantics (PTRACE_EVENT_STOP / PTRACE_LISTEN) and might be why strace -f succeeds where ipctool fails.
  • Group-stop handling under PTRACE_ATTACH. Even with the SIGSTOP suppression from #153, real group-stops (SIGSTOP/SIGTSTP) on a child are subtle under ATTACH; see the NOTES section of ptrace(2).
  • Whether __WALL is sufficient on this kernel. ipctool uses waitpid(-1, &status, __WALL); some old kernels need __WCLONE separately.
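
To make the stop types in these bullets concrete, here is a minimal sketch of classifying a waitpid status. It assumes PTRACE_O_TRACESYSGOOD is set, so that syscall-stops arrive as SIGTRAP|0x80; enum stop_kind and classify_wait_status are hypothetical names, not ipctool's.

```c
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/wait.h>

#ifndef PTRACE_EVENT_STOP
#define PTRACE_EVENT_STOP 128
#endif

enum stop_kind {
    STOP_EXITED,        /* not a stop: tracee exited or was killed */
    STOP_SYSCALL,       /* syscall-enter or syscall-exit stop */
    STOP_PTRACE_EVENT,  /* PTRACE_EVENT_CLONE / FORK / VFORK / EXEC ... */
    STOP_EVENT_STOP,    /* PTRACE_EVENT_STOP, only reported under PTRACE_SEIZE */
    STOP_SIGNAL         /* signal-delivery-stop; under ATTACH maybe a group-stop */
};

enum stop_kind classify_wait_status(int status)
{
    if (!WIFSTOPPED(status))
        return STOP_EXITED;

    int sig = WSTOPSIG(status);
    int event = (int)((unsigned)status >> 16);

    if (event == PTRACE_EVENT_STOP)
        return STOP_EVENT_STOP;
    if (event != 0)
        return STOP_PTRACE_EVENT;
    if (sig == (SIGTRAP | 0x80))
        return STOP_SYSCALL;
    return STOP_SIGNAL;
}
```

Under PTRACE_ATTACH the STOP_SIGNAL case is ambiguous: per ptrace(2), a stop with SIGSTOP/SIGTSTP/SIGTTIN/SIGTTOU is definitely a group-stop if PTRACE_GETSIGINFO fails with EINVAL. Under PTRACE_SEIZE the kernel reports group-stops as PTRACE_EVENT_STOP instead, which removes the guesswork.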

Reproduction

The OpenIPC lab has the camera at <your-hi3518ev200-camera>, NFS server 10.216.128.227:/srv/nfsroot (mountable as /mnt/noc on the host and /utils on the camera).

Build a fresh ipctool

# Toolchain (tar -C needs the target directory to exist)
mkdir -p /tmp/openipc-tc
wget -qO- https://github.com/OpenIPC/firmware/releases/download/toolchain/toolchain.hisilicon-hi3516cv100.tgz \
    | tar xzf - -C /tmp/openipc-tc

# Build + UPX-pack (UPX is mandatory for the V2 kernel)
cd ipctool
PATH=/tmp/openipc-tc/arm-openipc-linux-musleabi_sdk-buildroot/bin:$PATH \
    cmake -H. -Bbuild -DCMAKE_C_COMPILER=arm-openipc-linux-musleabi-gcc -DCMAKE_BUILD_TYPE=Release
PATH=/tmp/openipc-tc/arm-openipc-linux-musleabi_sdk-buildroot/bin:$PATH cmake --build build
upx --best build/ipctool -o /mnt/noc/ipctool-upx

Capture under ipctool

ssh root@<your-hi3518ev200-camera> '
    mount -o nolock,vers=3 10.216.128.227:/srv/nfsroot /utils 2>/dev/null
    mkdir -p /utils/dumps
    killall -q majestic; sleep 1
    /utils/ipctool-upx trace --output=/utils/dumps/jxf22.log /usr/bin/majestic >/dev/null 2>&1 &
    TP=$!
    sleep 25
    kill $TP; killall -q ipctool-upx majestic; sleep 1
    /etc/init.d/S95majestic start
'
cp /mnt/noc/dumps/jxf22.log .
wc -l jxf22.log    # expect ~10

Capture under strace for comparison

ssh root@<your-hi3518ev200-camera> '
    cp /utils/sdk/strace /utils/strace-static    # already exists in lab; the NFS root is /utils on the camera
    killall -q majestic; sleep 1
    /utils/strace-static -f -o /utils/dumps/jxf22-strace.log /usr/bin/majestic >/dev/null 2>&1 &
    SP=$!
    sleep 18
    kill $SP; killall -q -9 strace-static majestic
    /etc/init.d/S95majestic start
'
cp /mnt/noc/dumps/jxf22-strace.log .
grep -c ' write(' jxf22-strace.log    # expect 79

Useful debug knobs (in order of cost)

  1. Add a counter to exit_syscall in src/ptrace.c to confirm syscall events keep flowing post-i2c-banner.
  2. Print every waitpid return + status in the wait loop: does the loop block in waitpid(-1, ...) indefinitely at some point?
  3. Also print every ptrace(PTRACE_SYSCALL, ...) return value — does it fail with ESRCH at some point?
  4. Try replacing PTRACE_ATTACH with PTRACE_SEIZE (changes the initial-stop and event semantics; closer to what strace does).
  5. Use tools/sns_init_probe.c (added in #153) to load libsns_jxf22.so directly without Majestic; this needs a uClibc-targeted toolchain matching the camera's ld-uClibc.so.0 exactly.
  6. Last resort: rebuild libsns_jxf22.so from OpenIPC/glutinium with extra printf instrumentation around each write() call, deploy, see whether the prints come through under ipctool but the writes don't.
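
Knob 3 could look roughly like the sketch below: a thin wrapper around the resume call that logs failures instead of ignoring them. resume_logged is a hypothetical name, and where it would hook into src/ptrace.c is left open.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/ptrace.h>
#include <sys/types.h>

/*
 * Resume a tracee until its next syscall stop, forwarding sig (0 for none).
 * Returns 0 on success; on failure logs the error and returns -errno.
 */
int resume_logged(pid_t pid, int sig)
{
    if (ptrace(PTRACE_SYSCALL, pid, NULL, (void *)(long)sig) == -1) {
        int err = errno;
        fprintf(stderr, "PTRACE_SYSCALL pid=%d sig=%d failed: %s\n",
                (int)pid, sig, strerror(err));
        return -err;
    }
    return 0;
}
```

An ESRCH here means the target is gone, is not traced by us, or is not currently in a ptrace-stop; the last case in particular would point at the wait loop and the tracee getting out of sync.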

Definition of done

The fix is complete when all of the following hold:

  • tools/capture_sensor.sh majestic --host openipc-hi3518ev200... --secs 25 produces a trace with at least 50 sensor_write_register lines (matching strace's 79-write count, modulo init incompleteness due to capture window).
  • The same trace, when fed through tools/trace_segment.py, produces a non-empty init phase with init_pattern set to a recognised family — or, if jxf22 uses a third pattern, a new entry in INIT_PATTERNS covers it.
  • tools/trace_to_driver.py emits a jxf22_linear_init (or similar) function that passes gcc -Wall -Wextra -fsyntax-only.
  • tools/trace_diff.py against OpenIPC/glutinium hi35xx_sensor_jxf22 (function sensor_linear_1080p30_init after collapsing the sensor_prog ROM table) reports ≥90% address match and ≥80% value match.
  • SC2315E regression diff stays at 100/100/100% against widgetii/smart_sc2315e.
  • CI test-extraction-pipeline job still passes; new test fixture or assertion added if appropriate.
  • Whatever ptrace-handling change made the difference is documented in docs/sensor-driver-extraction.md next to the existing "When the trace is empty anyway" section, with a one-paragraph explanation of why the previous code failed (so future researchers don't reintroduce the regression).
