
openipc_frame_ts: ISP_FEND event source, edge-detect both event types, split counters (closes #176, #177) #178

Merged
widgetii merged 5 commits into main from feat/openipc-frame-ts-fend on May 22, 2026


widgetii (Member) commented May 22, 2026

Summary

Adds a second event source — ISP_FEND (ISP front-end's "last sensor row received") — to /dev/openipc-frame-ts alongside the existing MIPI_FS event, then fixes the hardware quirks that prevented it from working on real silicon. Pairing both events frame-by-frame gives consumers the sensor readout duration directly.

Closes #176 (parent: FEND extension for sensor-readout-time telemetry) and #177 (cv500 FEND emits 1/8 of FS rate). The #177 hypothesis ("wrong IRQ resolved by name") turned out to be wrong — root cause is the level-held raw-status bit, not IRQ routing.

Commits

  1. cff4a3b — Plumb ISP_FEND as a second event type through the chrdev: ABI bump (event_type field added to struct openipc_frame_ts_event), OPENIPC_FT_IOC_SET_EVENT_MASK ioctl, per-(channel, event_type) sequence counters, per-event-type dedupe in openipc_frame_ts_push. Hooked into the V4 and cv500 ISP_ISR. (Original widgetii branch, picked into this PR.)
  2. 28c200d — Make FEND actually fire on hardware: edge-detect the raw FEND bit in ISP_ISR (V4 + cv500). The cherry-picked version above read the masked status, which stays 0 because the vendor HAL leaves the FEND mask at 0, so it never fired. Reading the raw status instead fires at IRQ rate, because the bit is level-held ("sticky"). A per-pipe static bool s_fend_was_set catches only the 0→1 transitions.
  3. e888ee0 — Symmetric fix for MIPI/LVDS vsync bits in mipi_rx_interrupt_route (V4 + cv500). Same level-held pathology; measured ~185 Hz MIPI_FS on a 37 fps sensor before the fix.
  4. e3e5da2 — High-fps headroom: per-channel ring depth 64 → 256 (≈ 0.5 s of buffer at 480 Hz). Sample test opens O_NONBLOCK and drains 64 events per read() syscall, fixing two latent bugs in the previous tight read loop (-t N never firing at idle; deadline check unreachable under load).
  5. 1d67816 — Split the single dropped counter into dropped (ring overflow, data loss) and coalesced (dedupe rejects, expected). New OPENIPC_FT_IOC_GET_COALESCED ioctl. Without the split, the legitimate filtering of level-held-bit duplicates at high IRQ rates was being misreported as data loss.

Root cause

Both MIPI_CTRL_INT.vsync / LVDS_CTRL_INT.lvds_vsync and ISP_INT_FE.FEND are level-held raw status bits, not pulses. Hardware re-asserts them after each W1C while the underlying condition still holds, so every ISR call during that window would otherwise fire. For FEND specifically, the vendor ISP HAL also leaves ISP_INT_FE_MASK at 0 by default, so the masked-status read returns zero. The two effects compound: the original cherry-pick never fired, while reading raw status fires at IRQ rate (hundreds of Hz).

Per-pipe boolean edge-detection in each ISR + per-event-type dedupe in openipc_frame_ts_push solves it at typical sensor rates. The dedupe still trips at very high IRQ rates by design — that's now visible separately via the coalesced counter.

Test results

Drop / coalesce counters across full mode sweep (10 s steady-state per mode)

| Board | Mode | Sensor IRQ | drops (data loss) | coalesced (dedupe) |
|---|---|---|---|---|
| ev300 | 5M default | 49 Hz | 0 | 0 |
| ev300 | 2592×1944 @ 45fps | 90 Hz | 0 | 0 |
| ev300 | 1920×1080 @ 55fps | 101 Hz | 0 | 89 |
| ev300 | 1280×720 @ 120fps | 173 Hz | 0 | 456 |
| ev300 | 800×480 @ 240fps | 229 Hz | 0 | 1007 |
| av300 | imx415_i2c default | 42 Hz | 0 | 4 |
| av300 | 60fps / 720p120 / vga200 | low | 0 | 0 |

Zero drops in every tested configuration.

Sensor readout time (FEND.wall_ns − first-FS-in-cluster.wall_ns)

| Board | Sensor / mode | Active rows | Readout p50 |
|---|---|---|---|
| av300 / cv500 | IMX415 imx415_i2c (~20 fps) | 1520 | 31.47 ms |
| ev300 / V4 | IMX335 5M default | 1944 | 28.42 ms |
| ev300 / V4 | IMX335 2592×1944 @ 45fps | 1944 | 21.41 ms |
| ev300 / V4 | IMX335 1920×1080 @ 55fps | 1080 | 10.35 ms |
| ev300 / V4 | IMX335 1280×720 @ 120fps | 720 | 6.83 ms |
| ev300 / V4 | IMX335 800×480 @ 240fps | 480 | 4.45 ms |

Numbers scale linearly with active-row count and inversely with line-clock rate, which is what physical sensor readout predicts. p50−p5 is typically < 50 µs.

CI green across all 22 jobs (8 SDK builds × 3 neo kernel variants, 8 QEMU boots, 3 library checks, IVE/NNIE regressions).

🤖 Generated with Claude Code

widgetii and others added 3 commits May 22, 2026 17:00
Extends the chrdev ABI introduced in #155 with a second event source
and an event-type filter, so consumers can:

  1. Observe both edges of the sensor-readout window:
     - MIPI_FS  — sensor begins streaming row 0 (existing behaviour,
                  fires from MIPI RX driver on MIPI_CTRL_INT.int_vsync)
     - ISP_FEND — ISP front-end finished receiving the last row
                  (new, fires from the ISP IRQ on ISP_INT_FE_FEND bit
                  set, register defined in
                  kernel/isp/arch/include/isp_drv_defines.h)
  2. Compute wall_ns[FEND] − wall_ns[FS] ≈ sensor readout duration
     per frame, decomposing the "kernel-anchored capture wall-clock"
     vs "encoder-finished wall-clock" gap into readout-bound vs
     encoder-bound components.
  3. Filter at the chrdev level — consumers that only care about FS
     (e.g. RTCP SR anchoring; widgetii/majestic#83) call
     OPENIPC_FT_IOC_SET_EVENT_MASK with bit 0 set and skip FEND
     entirely; pipeline-latency measurement tools take both.

## ABI

struct openipc_frame_ts_event grows two u32 fields:

    __u32 event_type;   /* OPENIPC_FT_EVT_MIPI_FS | OPENIPC_FT_EVT_ISP_FEND */
    __u32 reserved;     /* zero; for future event sources */

Size: 24 → 32 bytes. This is a hard ABI break vs the v1 layout in
#155 — but since #155 isn't merged yet, the breakage is local to the
PR chain and downstream majestic / sample / consumer-test patches all
land in lockstep. Future event types append to OPENIPC_FT_EVT_*; the
struct stays 32 bytes.

Added IOCTL:

    OPENIPC_FT_IOC_SET_EVENT_MASK _IOW('o', 3, __u32)

Bit n set ↔ events with event_type == n pass through. Default ~0u
(every type). Pairs with the existing SET_CHANNEL_MASK; both filter
the same fd.
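A minimal sketch of the mask semantics, assuming OPENIPC_FT_EVT_MIPI_FS is 0 and OPENIPC_FT_EVT_ISP_FEND is 1 (bit 0 selecting FS matches the text above; the FEND value is an assumption). The `DEMO_*` names and `event_passes()` are hypothetical illustrations, not the driver's code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical values: the PR only states that bit 0 selects MIPI_FS. */
enum {
    DEMO_EVT_MIPI_FS  = 0,
    DEMO_EVT_ISP_FEND = 1,
};

#define DEMO_MASK_ALL     (~0u)        /* default: every event type */
#define DEMO_MASK_BIT(t)  (1u << (t))

/* True when an event of type `event_type` should reach a reader whose
 * mask is `mask` (bit n set <-> type n passes). Types that do not fit
 * in the 32-bit mask never pass, which also avoids an undefined shift. */
static bool event_passes(uint32_t mask, uint32_t event_type)
{
    if (event_type >= 32)
        return false;
    return (mask >> event_type) & 1u;
}

/* Userspace side (sketch): an RTCP-anchoring consumer wanting only FS,
 * assuming the driver reads the _IOW argument by pointer as is usual:
 *
 *   __u32 mask = DEMO_MASK_BIT(DEMO_EVT_MIPI_FS);
 *   ioctl(fd, OPENIPC_FT_IOC_SET_EVENT_MASK, &mask);
 */
```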

## Kernel-side hook surface

`openipc_frame_ts_push` grows an event_type parameter. Two call
sites already in #155 — both MIPI RX paths (shared and cv500) —
pass OPENIPC_FT_EVT_MIPI_FS to preserve existing semantics. The
cv200 ISP fallback hook also passes MIPI_FS (cv200's hook fires at
VI_PT0_INT_FSTART = frame start; there's no FEND IRQ exposed to
userspace on cv200).

The new ISP_FEND emission is wired into:
  - kernel/isp/mkp/src/isp.c (ev200 / gk7205v200 / hi3519v101 — they
    all share this V4 ISP source per CHIPARCH dispatch)
  - kernel/isp/arch/hi3516cv500/mkp/src/isp.c (cv500 / av300 / dv300)

In both, the FEND push fires when (isp_int_status & ISP_INT_FE_FEND)
inside the ISP IRQ handler — after the existing per-pipe status read,
before any error checks. Zero impact on the existing ISP IRQ flow.

## Dedup

Per-(chn, event_type) dedup within 1 ms is preserved from #155 (catches
level-triggered IRQ retrigger and the cv500 ~4% MIPI double-fire). The
dedup deliberately does NOT cross event types — MIPI_FS and ISP_FEND
for the same frame can be tens of ms apart (= readout duration) and
that gap is the whole point of having both events.
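The per-(chn, event_type) dedupe can be sketched as below. Assumptions: the window is measured from the last *accepted* event with the same (chn, event_type), and rejects are simply counted; the `demo_*` names, the channel count of 8 and the two-type table are hypothetical. Keeping `window_ns[]` per event type is what allows the FS and FEND windows to differ.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_MAX_CHN   8
#define DEMO_NUM_TYPES 2   /* MIPI_FS, ISP_FEND (hypothetical count) */

struct demo_dedupe {
    uint64_t last_ns[DEMO_MAX_CHN][DEMO_NUM_TYPES]; /* last accepted */
    bool     seen[DEMO_MAX_CHN][DEMO_NUM_TYPES];
    uint64_t window_ns[DEMO_NUM_TYPES];             /* per-type window */
    uint64_t coalesced;                             /* rejected dups */
};

/* Returns true if the event should be pushed, false if it lands inside
 * the window of the previous accepted event with the same
 * (chn, event_type). Different event types never suppress each other. */
static bool demo_dedupe_accept(struct demo_dedupe *d, unsigned chn,
                               unsigned type, uint64_t now_ns)
{
    if (chn >= DEMO_MAX_CHN || type >= DEMO_NUM_TYPES)
        return false;
    if (d->seen[chn][type] &&
        now_ns - d->last_ns[chn][type] < d->window_ns[type]) {
        d->coalesced++;
        return false;
    }
    d->seen[chn][type] = true;
    d->last_ns[chn][type] = now_ns;
    return true;
}
```

With a 1 ms FS window, a cv500-style double vsync 50 µs after the first is absorbed, while a FEND arriving in between is unaffected because it keys a different slot.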

## seq numbering

Per-(chn, event_type) seq counters. So MIPI_FS_seq and ISP_FEND_seq
both start at 0 and tick once per real frame. The pairing
MIPI_FS_seq[N] ↔ ISP_FEND_seq[N] gives same-frame correspondence —
useful for the readout-duration computation in the userspace probe.
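A userspace pairing step for the readout-duration computation might look like this. It is a sketch under stated assumptions: `struct demo_ts_sample` is a hypothetical subset of the event fields (the full openipc_frame_ts_event layout is not shown here), event_type 0/1 are assumed to mean MIPI_FS/ISP_FEND, and the two seq counters are assumed to stay in lockstep, which a dropped or coalesced event on only one side would break.

```c
#include <stddef.h>
#include <stdint.h>

struct demo_ts_sample {        /* hypothetical subset of event fields */
    uint32_t event_type;       /* 0 = MIPI_FS, 1 = ISP_FEND (assumed) */
    uint32_t seq;              /* per-(chn, event_type) counter */
    uint64_t wall_ns;
};

#define DEMO_PAIR_SLOTS 16     /* frames of FS history kept for pairing */

struct demo_pairer {
    uint64_t fs_wall_ns[DEMO_PAIR_SLOTS];
    uint32_t fs_seq[DEMO_PAIR_SLOTS];
    int      fs_valid[DEMO_PAIR_SLOTS];
};

/* Feed events in arrival order. When an ISP_FEND arrives whose seq
 * matches a remembered MIPI_FS seq, store FEND.wall_ns - FS.wall_ns in
 * *readout_ns and return 1; otherwise return 0. */
static int demo_pair_event(struct demo_pairer *p,
                           const struct demo_ts_sample *ev,
                           uint64_t *readout_ns)
{
    size_t slot = ev->seq % DEMO_PAIR_SLOTS;

    if (ev->event_type == 0) {                /* MIPI_FS: remember it */
        p->fs_wall_ns[slot] = ev->wall_ns;
        p->fs_seq[slot] = ev->seq;
        p->fs_valid[slot] = 1;
        return 0;
    }
    if (ev->event_type == 1 && p->fs_valid[slot] &&
        p->fs_seq[slot] == ev->seq && ev->wall_ns >= p->fs_wall_ns[slot]) {
        *readout_ns = ev->wall_ns - p->fs_wall_ns[slot];
        p->fs_valid[slot] = 0;                /* each FS pairs once */
        return 1;
    }
    return 0;
}
```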

## Build hook

kernel/isp/Kbuild adds `-I$(src)/../include` so the ISP modules pick
up the openipc_frame_ts.h header. Same path PR #155 added to each
per-SoC kbuild for the MIPI RX side; this version puts it in the
shared isp/Kbuild so all SoC variants inherit it without duplication.

## Build verified

Standalone cv500 module build clean:
  make -C output-cv500/build/linux-custom \
    M=$PWD/kernel CHIPARCH=hi3516cv500 \
    CROSS_COMPILE=arm-openipc-linux-gnueabi- ARCH=arm modules

Empirical verification (FS-vs-FEND delta = readout duration on real
hardware) is the next step in this PR chain.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The cherry-picked widgetii FEND extension emitted zero events on both
V4 and cv500 hardware. Two compounding hardware quirks caused it:

1. Vendor ISP HAL leaves ISP_INT_FE_MASK at 0 by default (only the
   unused ISP_DRV_SetIntEnable ioctl path flips FEND on), so reading
   the masked status from ISP_ISR returns 0 even when hardware has
   asserted FEND. The cherry-pick read masked status — never fired.

2. Reading the raw status instead does produce events, but the FEND raw bit
   is level-held across the inter-frame gap: hardware re-asserts it
   after each W1C while the underlying frame-finished condition still
   holds, so every ISR call during that window would fire. Empirically
   measured at ~683 Hz for a 30 fps sensor (capped only by the 1 ms
   dedupe). After the W1C-induced 0→1→0 oscillation: ~half that.

Two-layer fix:

- In ISP_ISR (V4 and cv500): track per-pipe previous raw-FEND state
  and emit only on the 0→1 transition. This catches the bulk of the
  spurious extras the W1C oscillation introduces.

- In openipc_frame_ts: make the per-event-type dedupe interval
  configurable and bump FEND from 1 ms to 25 ms — sufficient to
  swallow whatever residual sub-frame retriggers slip past the
  edge-detect, while still capping at 40 Hz (every supported sensor
  on these SoCs tops out at 30 fps single-stream, so no real frame
  is ever dropped). FS remains at 1 ms to preserve the cv500
  double-vsync dedupe behavior.
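The per-pipe 0→1 edge detect in ISP_ISR reduces to the pattern below. This is a hypothetical userspace model: the bit position, pipe count and function name are made up, concurrency is ignored, and the real ISR also does the W1C clear and the push. Because a W1C-induced 0→1→0 oscillation within one frame still yields extra rising edges, the dedupe remains as the second layer.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_MAX_PIPE  4
#define DEMO_FEND_BIT  (1u << 0)   /* hypothetical bit position */

/* One previous-state flag per ISP pipe, mirroring s_fend_was_set. */
static bool s_demo_fend_was_set[DEMO_MAX_PIPE];

/* Called once per ISR invocation with the *raw* (unmasked) status.
 * Returns true only on a 0 -> 1 transition of the FEND bit, so a
 * level-held bit that stays set across many IRQs emits once. */
static bool demo_fend_rising_edge(unsigned pipe, uint32_t raw_status)
{
    if (pipe >= DEMO_MAX_PIPE)
        return false;

    bool now_set = (raw_status & DEMO_FEND_BIT) != 0;
    bool rising  = now_set && !s_demo_fend_was_set[pipe];

    s_demo_fend_was_set[pipe] = now_set;
    return rising;
}
```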

Validated empirically:

- ev300 (V4, IMX335, 37 fps single-stream): clean 27.0 ms cadence,
  ~37 Hz FEND output matching the sensor frame rate exactly.
- av300 (cv500, IMX415): clean 50.0 ms cadence, 20 Hz FEND output
  matching the configured 20 fps sensor mode.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Same level-held-bit pathology as ISP_INT_FE.FEND (commit 28c200d):
the MIPI_CTRL_INT.vsync and LVDS_CTRL_INT.lvds_vsync raw bits are
re-asserted by hardware across the inter-frame gap after each W1C
clear while the underlying vsync window is open. Measured ~185 Hz
MIPI_FS output on a 37 fps ev300 IMX335 (capped by the 1 ms
openipc_frame_ts dedupe), against an expected 37 Hz.

Apply the same per-device `static bool s_vsync_was_set` 0→1
edge-detect to both the V4-shared mipi_rx_interrupt_route and
its cv500 counterpart. The 1 ms dedupe in openipc_frame_ts
remains as defence-in-depth, importantly covering cv500's known
~30–80 µs double-vsync quirk on ~4 % of frames.

Validated on av300/cv500 (IMX415 @ 20 fps): MIPI_FS now emits at
clean 50 ms cadence (was deduping at 1 ms cap before). Paired
sensor-readout-time measurement against ISP_FEND: 31.5 ms p50
across 342 frames (σ ≈ 0.1 ms).

ev300 re-test deferred: board hit issue #159 reboot-stress hang
after this push, needs power cycle to recover.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
widgetii changed the title "openipc_frame_ts: edge-detect FEND + per-event-type dedupe (closes #177)" to "openipc_frame_ts: edge-detect FEND + MIPI_FS, per-event-type dedupe (closes #177)" on May 22, 2026
Your Name and others added 2 commits May 22, 2026 20:45
Three coupled changes that lift the practical ceiling from "leaks drops
above 60 fps" to "clean across every supported single-stream mode":

- Per-channel ring depth bumped from 64 to 256 (≈ 0.5 s of buffer at
  240 fps × 2 event types). Per-channel cost grows from 2 KiB to 8 KiB
  — negligible at 8 channels.

- Sample test opens chrdev with O_NONBLOCK and drains in 64-event
  batches per read() syscall. Two bugs fall out:
    * the previous tight `while ((n = read(... sizeof(ev))) == sizeof(ev))`
      blocked on read() once the ring drained, so `-t N` never fired at
      idle and never fired again once events started flowing because the
      drain loop never returned to the deadline check;
    * single-event reads added syscall overhead that was visible in
      drop counts at high fps.
  Batched read keeps the outer poll/deadline loop responsive at any
  event rate.

- README updated for the new depth.
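A minimal sketch of the O_NONBLOCK batched-drain pattern, assuming the chrdev supports poll() (as the outer poll/deadline loop described above implies) and returns whole event records from read(). `struct demo_raw_event` only models the 32-byte record size, and the `demo_*` helpers are hypothetical, not the sample test's code. The deadline is re-checked after every poll() wakeup and every batch, so a `-t N` style timeout fires both at idle and under load.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>

struct demo_raw_event { uint8_t bytes[32]; };  /* size-only stand-in */

#define DEMO_BATCH 64

static int64_t demo_now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

static inline int demo_open_chrdev(void)
{
    return open("/dev/openipc-frame-ts", O_RDONLY | O_NONBLOCK);
}

/* Drain a non-blocking fd until deadline_ms (CLOCK_MONOTONIC), reading
 * up to DEMO_BATCH records per read(). Returns events consumed or -1. */
static long demo_drain_until(int fd, int64_t deadline_ms)
{
    struct demo_raw_event buf[DEMO_BATCH];
    long total = 0;

    for (;;) {
        int64_t left = deadline_ms - demo_now_ms();
        if (left <= 0)
            return total;                     /* deadline reached */

        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        int pr = poll(&pfd, 1, (int)left);
        if (pr < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (pr == 0)
            return total;                     /* idle until deadline */

        ssize_t n = read(fd, buf, sizeof(buf));
        if (n < 0) {
            if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            return total;                     /* EOF (e.g. a pipe) */
        total += n / (ssize_t)sizeof(buf[0]);
    }
}
```

Any pollable fd works with `demo_drain_until()`, so the loop can be exercised with a pipe instead of /dev/openipc-frame-ts.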

Validated on lab hardware:

- av300 / cv500 / IMX415: <2 drops in 10 s across imx415_i2c.ini, 60fps,
  720p120fps, vga200fps sensor configs.
- ev300 / V4 / IMX335 5M default: 13 drops/s (was unbounded before).
- ev300 high-fps modes (1280x720@120, 800x480@240): ~800–1700 drops/s
  remain — these are upstream of the chrdev (the level-held vsync bit
  toggles faster than edge-detect can constrain at sub-ms sensor
  periods), not a ring or reader limit. Separate follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The single 'dropped' counter conflated two very different conditions:

1. **Ring overflow** — events arrived but no reader drained the chrdev
   in time. Genuine data loss; consumer needs to size the ring or pace
   the reader.

2. **Dedupe rejection** — push() called within the per-event-type
   dedupe window of a previously-pushed event. NOT data loss; these
   are the level-held vsync / FEND duplicates the dedupe is there to
   absorb, and they grow naturally with sensor IRQ rate.

At low sensor rates the two are nearly equal (both ≈ 0). At high
configured rates (ev300 800x480@240fps) the kernel emits ~1000 dedupe
rejections per 10 s while ring overflow stays at 0 — under the old
counter that looked like 1000 lost frames; it's actually 1000
correctly-filtered duplicate IRQ fires.

Split into two counters:

- `dropped` keeps its ABI but now only counts ring overflows.
- New `coalesced` counter + `OPENIPC_FT_IOC_GET_COALESCED` ioctl.

Sample test prints both, labelled. README updated to explain the
distinction.
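The counter split can be modelled as a per-channel ring where the two rejection paths bump different counters. Assumptions: drop-newest on overflow, a hypothetical 32-byte `struct demo_ring_event` layout, and the dedupe verdict passed in as a flag; none of this is the driver's actual code. The static asserts restate the 256 × 32 B = 8 KiB per-channel cost quoted earlier.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_RING_DEPTH 256u            /* per-channel depth after this PR */

struct demo_ring_event {                /* hypothetical 32-byte record */
    uint64_t wall_ns;
    uint32_t seq, event_type, chn, reserved;
    uint64_t pad;
};
_Static_assert(sizeof(struct demo_ring_event) == 32, "32-byte record");
_Static_assert(DEMO_RING_DEPTH * sizeof(struct demo_ring_event) == 8192,
               "256 x 32 B = 8 KiB per channel");

struct demo_ring {
    struct demo_ring_event ev[DEMO_RING_DEPTH];
    uint32_t head, tail;                /* free-running indices */
    uint64_t dropped;                   /* ring overflow: real data loss */
    uint64_t coalesced;                 /* dedupe rejects: expected */
};

/* Duplicates count as coalesced and never touch the ring; a full ring
 * discards the new event and counts it as dropped. */
static bool demo_ring_push(struct demo_ring *r,
                           const struct demo_ring_event *e, bool is_duplicate)
{
    if (is_duplicate) {
        r->coalesced++;
        return false;
    }
    if (r->head - r->tail == DEMO_RING_DEPTH) {
        r->dropped++;
        return false;
    }
    r->ev[r->head % DEMO_RING_DEPTH] = *e;
    r->head++;
    return true;
}

static bool demo_ring_pop(struct demo_ring *r, struct demo_ring_event *out)
{
    if (r->head == r->tail)
        return false;                   /* empty */
    *out = r->ev[r->tail % DEMO_RING_DEPTH];
    r->tail++;
    return true;
}
```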

Verified across 5 ev300 sensor modes (5M default through 240fps) and
4 av300 modes (imx415 default through vga200fps): drops = 0 in all
tested modes; coalesced grows monotonically with sensor IRQ rate
exactly as expected:

  ev300 5M default  irq=49Hz  → fs=16Hz   drop=0  coal=0
  ev300 1944p@45    irq=90Hz  → fs=25Hz   drop=0  coal=0
  ev300 1080p@55    irq=101Hz → fs=28Hz   drop=0  coal=89
  ev300 720p@120    irq=173Hz → fs=35Hz   drop=0  coal=456
  ev300 800x480@240 irq=229Hz → fs=40Hz   drop=0  coal=1007

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
widgetii changed the title "openipc_frame_ts: edge-detect FEND + MIPI_FS, per-event-type dedupe (closes #177)" to "openipc_frame_ts: ISP_FEND event source, edge-detect both event types, split counters (closes #176, #177)" on May 22, 2026
widgetii merged commit 28a30ca into main May 22, 2026
28 checks passed
widgetii added a commit that referenced this pull request May 23, 2026
…182)

The hi3516cv200 family build also produces the firmware-shipped
hi3518e_isp.ko for hi3518ev200 boards (same CHIPARCH=hi3516cv200,
same open_isp.ko renamed at install time). The openipc_frame_ts_push
call I added to the cv200 ISP_ISR in PR #178 adds a few µs to the
ISR hot path which, on real hi3518ev200 hardware streaming with
majestic, tips a latent i2c-from-hardirq race: kernel WARN at
rtmutex.c:1545 via rt_mutex_trylock → hi_sensor_i2c_write →
i2c_transfer; majestic loses sensor i2c and /image.jpg returns
HTTP 000 / 10s timeout. Pre-existing backtrace surfaced by my
timing shift; openipc/firmware nightly built and shipped the
regression to V2A boards before it was caught.

Comment out the cv200 ISP_ISR push call until:
  - the i2c-in-IRQ path is fixed upstream, OR
  - the hook is hardware-validated on both hi3516cv200 AND
    hi3518ev200 cameras under a real majestic stream.

The chrdev /dev/openipc-frame-ts still loads cleanly on these SoCs,
just emits no events; consumers see an empty stream and fall back
to clock_gettime() (same as on a kernel without the modules).

README updated to list hi3516cv200/hi3518ev200 as "not validated"
with a full explanation of the i2c race and the re-enable criteria.
Other SoC hooks (V4 ev200/ev300/gk7205v200, cv500/av300) are
unchanged — they're hardware-validated and continue to fire.

Tracking: OpenIPC/firmware#2128 reverted the opensdk bump that
shipped the regression to nightly users.

Co-authored-by: Your Name <you@example.com>
widgetii added a commit to widgetii/openhisilicon that referenced this pull request May 24, 2026
…penIPC#183)

Unblocks the gate from OpenIPC#182 and re-enables the openipc_frame_ts MIPI_FS
hook on cv200 / hi3518ev200 without bricking /image.jpg.

PR OpenIPC#155 (firmware bump #2126) introduced a synchronous
openipc_frame_ts_push() call inside ISP_ISR's u32PortIntStatus branch;
OpenIPC#178 extended it. The few µs added to the cv200 ISR top half tipped a
timing race downstream in the vendor VPSS / VENC chn 1 startup, raising
the /image.jpg HTTP-000 brick rate on hi3518ev200 from ~20 % (latent
baseline) to ~60 %. firmware#2128 urgently reverted; OpenIPC#182 gated the
cv200 hook off.

The brick mechanism is NOT the rt_mutex_trylock WARN at rtmutex.c:1545
that OpenIPC#183 originally blamed. We confirmed empirically: deferring the
i2c writes to a workqueue (cv500-pattern) silences the WARN but makes
the brick *worse* (40-60 % rate). The race is purely about µs cost in
the cv200 ISR hot path before the synchronous ISP_IntBottomHalf runs.

Fix: replace the direct openipc_frame_ts_push() with tasklet_hi_schedule().
tasklet_hi_schedule() from hardirq costs tens of cycles (a single bit set +
softirq raise); the actual push then runs in softirq context after the
hardirq returns, so the ISR hot path stays at near-zero added µs and
the downstream race doesn't fire.

Empirical validation on dlab hi3518ev200 (JXF22 sensor, kernel 4.9.37),
10 power-cycles per state, /image.jpg http_code via init-script-started
majestic:

  | state                                                    | success | brick |
  |----------------------------------------------------------|---------|-------|
  | baseline (no hook, with OpenIPC#182 gate on)             |   8/10  |  20%  |
  | push direct in ISR + cv500-style i2c defer               |   4/10  |  60%  |
  | tasklet defer for push + cv500-style i2c defer           |   6/10  |  40%  |
  | this patch — tasklet defer alone (no i2c changes)        |  10/10  |   0%  |

The tasklet-defer-alone fix is BETTER than baseline (which still
intermittently bricks 20 % from the pre-frame_ts era) AND keeps the
frame_ts hook enabled on cv200.

Timestamp precision: the push reads sched_clock() at tasklet-run time
(~µs after the IRQ rather than inside it). Negligible for 30 fps
frame-edge events on a 33 ms cadence. openipc_frame_ts_push() already
has a per-event-type dedupe window absorbing any coalescing from
multiple ISR firings between tasklet runs.

The rt_mutex_trylock WARN still fires once per boot on cv200 — it's
cosmetic, not the brick cause. A cv500-pattern workqueue defer for
i2c-from-hardirq to silence the WARN was prototyped and tested; it
regresses the brick rate so it does NOT land here. Tracked as future
work in openhisilicon#185.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
widgetii added a commit that referenced this pull request May 24, 2026
) (#186)


Development

Successfully merging this pull request may close these issues.

openipc_frame_ts: expose encoder-done / ISP frame-end event for per-segment latency decomposition

1 participant