
feat(lan): expand identification — v1.7.0 #119

Merged: chenchaoyi merged 11 commits into main from feature/expand-lan-identification on May 23, 2026.

@chenchaoyi (Owner)

Summary

Ships the v1.7.0 expand-lan-identification change: a layered identification stack over the existing ARP + ICMP + OUI + Bonjour LAN view, plus a scene-gated active-discovery layer (NBNS / SSDP / mDNS-meta), a TTL fingerprint, a device classifier, a public-scene consent flow, and Bonjour ↔ LAN cross-references.

11 commits across four phases + audit follow-ups:

| Commit | Concern |
|---|---|
| `265385a` | P1: Multi-tier IEEE OUI registry (MA-L 24-bit / MA-M 28-bit / MA-S 36-bit, longest prefix wins) + vendor normalization (`NEW H3C TECHNOLOGIES CO., LTD` → `New H3C`) + `LANHost.vendor_raw` preservation |
| `c2a9436` | P1 data: populate MA-M / MA-S via the Wireshark `manuf` fallback (IEEE CDN unreachable from CN); 57,211 vendor mappings bundled |
| `9dcf57f` | Design: adopt Fing UX patterns (class-first column, camera / smart-home split, Model row) |
| `c2d0495` | P2: active discovery layer (NBNS Status Query / SSDP M-SEARCH / mDNS browse query), scene-gated (home/office/audit on, public off); `DITING_LAN_PROBE` + `DITING_LAN_UPNP_FETCH` env vars; new `lan_probes.py` |
| `947b3cb` | P3: TTL fingerprint (50-64 → unix, 100-128 → windows, 200-255 → router) + device-class classifier (`lan_classify.py`) with ~30 vendor + Bonjour-category rules |
| `93ebf7d` | P4: LAN row layout (Fing-inspired class column + `[new]` chip with poller grace) + `LANProbeConsentScreen` modal (2-second cooldown) + `LANActiveProbeConsentedEvent` JSONL event + Active discovery section in the detail modal |
| `97656cd` | Audit fixes (7 findings): OUI tokenizer handles macOS stripped-zero MACs, events panel renders local time, `[new]` chip grace, multicast destination MACs filtered from ARP, classifier rule ordering, gateway TTL display |
| `e8c0f3a` | Audit fix: Bonjour needles must match category strings (`AirPlay audio`, `Apple Companion`), not raw service-type names |
| `965dd0a` | Audit fix: PS5 / PlayStation routed to gaming (previously matched `sony` in the tv-vendor needle) |
| `eee5f60` | Cross-reference (Tier A + B): Bonjour rows borrow the OUI vendor from the LAN side; Bonjour detail modal gains a LAN host section; Apple model code (`Mac14,2`) extracted from Bonjour TXT |
| `d6cc2ea` | Tablet + safety: add tablet class (12 classes total); iPads route to tablet; pull the model code from the `_companion-link` `rpMd` and `_raop` `am` TXT keys (NOT from the user-renameable `bonjour_name`) |

Headline outcomes on real CN home network

  • Previously (unknown) LAN rows now resolve: Hikvision cameras, Tuya / Aqara / Imilab smart-home, multiple Apple devices with stripped-zero MACs.
  • HomePods → speaker, MacBooks → laptop, iPads → tablet, PS5 → gaming (all previously mis-classified at different points).
  • Detail modal Model row shows MacBook Air 13-inch (M2, 2022) (Mac14,2) for known Apple model codes (Fing-level identification, via the existing _APPLE_MODELS table now wired to LAN side).
  • Events panel timestamps now use the local clock (previously shown in UTC, 8 h behind local time on this user's machine).
  • [new] chip suppressed on initial-sweep hosts via a 5-min grace (previously fired on every row for 24 h after first entering the LAN view).
  • Multicast destination MACs (01:00:5e:*, 33:33:*) filtered out of the panel.

Audit-tool stance

The classifier uses only authoritative signals: vendor OUI, Bonjour TXT model codes written by the device's mDNS daemon, scene-gated probe results, and ICMP TTL. It never uses bonjour_name or the reverse-DNS hostname; both are user-controllable and would be spoofing surfaces. Three adversarial tests cover this:

  • test_renamed_homepod_to_macbook_still_classifies_correctly — HomeKit-bearing host renamed to "MacBook" stays speaker
  • test_bonjour_name_ipad_pattern_does_NOT_signal_tablet — name says iPad, no model code → falls to phone (not guessed)
  • test_apple_model_code_still_wins_over_misleading_name — authoritative > misleading

Public-scene safety

Active probing is scene-gated: the public scene defaults to passive. Pressing uppercase P overrides this by opening LANProbeConsentScreen, which enumerates the packets to be sent, enforces a 2-second cooldown, and writes a JSONL audit event. The override is one-shot: it must be re-confirmed every time, with no sticky state.

Spec deltas

lan-inventory, scenes, events, event-log, tui-shell, i18n, cli.

Deferred

project-shared-host-registry — promote pairwise LAN↔Bonjour enrichment to a shared registry. Recorded in design.md + memory for the next change in this area.

Test plan

  • uv run pytest — 1047/1047 pass
  • uv run python scripts/tui_snapshot.py --mode regression — all scenarios pass
  • openspec validate expand-lan-identification --strict — valid
  • openspec validate --specs --strict — 22/22 canonical specs valid
  • Real-environment audit on author's home network — verified across 30+ hosts, 11 device classes
  • CI green
  • Post-merge: archive OpenSpec change (apply deltas to canonical specs)
  • Post-archive: tag v1.7.0

🤖 Generated with Claude Code

chenchaoyi and others added 11 commits May 23, 2026 11:50
…-lan-identification)

Phase 1 of the expand-lan-identification OpenSpec change. Adds the
passive-only enrichments that strengthen LAN host identification
without any new wire-protocol behaviour.

- Multi-tier IEEE OUI lookup (MA-L 24-bit → MA-M 28-bit → MA-S
  36-bit), longest prefix wins. `load_ouis_layered()` returns the
  three dicts; `lookup_oui_vendor` accepts both the legacy single-
  tier signature (back-compat) and the new layered kwargs form.
- `scripts/refresh_ouis.py` extended to fetch all three IEEE
  registries and partition by the Registry CSV column. MA-M / MA-S
  output paths added.
- `_normalize_vendor()` in lan.py: strips trailing corporate-form
  noise (CO., LTD, CORPORATION, INC, TECHNOLOGIES …), strips
  leading Chinese-city prefixes (SHENZHEN, HANGZHOU, BEIJING …),
  titlecases while preserving `_ACRONYM_OVERRIDES` (HP, IBM, H3C,
  TP-Link, ASUS, …), truncates to 16-cell column width.
- `LANHost.vendor` is now the normalized display form; new
  `LANHost.vendor_raw` preserves the raw IEEE registry string.
- `LANDetailScreen` surfaces the raw IEEE string on a dim
  continuation line when normalization changed the name, so the
  user can reconcile odd cases.
- 35 new tests across `test_oui_multitier.py`, `test_vendor_normalize.py`,
  `test_lan.py` (vendor_raw integration), `test_tui_helpers.py`
  (continuation-line behaviour). Full suite 895/895 passes;
  regression snapshot passes; openspec validate --strict passes.
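The longest-prefix-wins rule can be pictured with a minimal sketch. It assumes each tier is a dict keyed by uppercase hex digits (6 for MA-L, 7 for MA-M, 9 for MA-S); `lookup_layered` is a hypothetical helper, not the project's `lookup_oui_vendor` signature, and the vendor strings are placeholders.

```python
from typing import Optional


def lookup_layered(
    mac_hex: str,
    ma_l: dict[str, str],
    ma_m: dict[str, str],
    ma_s: dict[str, str],
) -> Optional[str]:
    """Resolve a 12-hex-digit MAC to a vendor, longest prefix first.

    Hypothetical helper. Assumed key format: uppercase hex digits,
    6 for MA-L (24-bit), 7 for MA-M (28-bit), 9 for MA-S (36-bit).
    """
    key = mac_hex.upper()
    # Most specific tier first, so an MA-S carve-out beats the MA-L
    # block it was allocated from.
    for width, table in ((9, ma_s), (7, ma_m), (6, ma_l)):
        vendor = table.get(key[:width])
        if vendor is not None:
            return vendor
    return None


# Placeholder data: one MA-L block with one MA-S assignment inside it.
MA_L = {"70B3D5": "Example Block Owner"}
MA_S = {"70B3D5123": "Example Sensor Co."}
print(lookup_layered("70b3d5123456", MA_L, {}, MA_S))  # Example Sensor Co.
print(lookup_layered("70b3d5fff000", MA_L, {}, MA_S))  # Example Block Owner
```

Checking the narrowest tier first means a carve-out never needs to be removed from its parent block's table.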

Bundled OUI data state: MA-L stays at the existing 2026-05-19
freshness (the IEEE CDN was unreachable from the build host at
implementation time); MA-M / MA-S ship as `_meta`-only stubs so
the graceful-degradation path is exercised at runtime. A `uv run
python scripts/refresh_ouis.py` from a network with IEEE access
will populate all three.

Phases 2–4 (active discovery, heuristics, UX) land in follow-up
commits on the same branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Resolves the Phase-1 caveat about empty MA-M / MA-S stubs.

The IEEE Registration Authority CDN (standards-oui.ieee.org) is
consistently unreachable from CN networks — every TLS handshake
ends mid-flight with `SSL_ERROR_SYSCALL` / `UNEXPECTED_EOF_WHILE_READING`,
on both Python urllib and macOS curl, on 8+ retries. Not a
transient. The Wireshark project's `manuf` file at
`https://www.wireshark.org/download/automated/data/manuf` is a
community-maintained mirror of the same IEEE OUI data, regenerated
regularly, exposes all three tiers in one file via `/28` / `/36`
prefix-bit annotation, and reaches CN networks reliably.

- `scripts/refresh_ouis.py` now supports `--source ieee|wireshark|auto`
  (default `auto`: IEEE direct first, Wireshark fallback on failure).
  Also `--manuf-file <path>` for offline re-ingest.
- New `parse_wireshark_manuf()` partitions the single `manuf` file
  back into the three-tier shape `_key_for_assignment` already
  emits. Wireshark column 3 carries the IEEE vendor string verbatim.
- `write_ouis()` gains `source_override` / `source_url_override`
  kwargs so the resulting `_meta.source` line records which
  upstream was actually used.
- Bundled data now populated:
    MA-L: 39,223 entries (was 39,445 — minor IEEE-vs-Wireshark
          dedup differences; Apple / Cisco / etc. all present)
    MA-M:  6,404 entries (was 0 — stub)
    MA-S: 11,584 entries (was 0 — stub)
- `test_network.py::test_load_wifi_ouis_ships_full_ieee_registry`
  loosened to case-insensitive substring assertions on vendor
  strings: Wireshark title-cases vendor names that the IEEE feed
  ships in all caps, and both forms normalize to the same display
  string via `_normalize_vendor`.
- New tests `test_oui_refresh_script_parses_wireshark_manuf_all_three_tiers`
  and `test_oui_refresh_script_wireshark_manuf_skips_unknown_widths`.
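Partitioning one `manuf` file back into three tiers could look roughly like the sketch below. It assumes tab-separated rows of the form `PREFIX[/BITS] <TAB> short name <TAB> long name`, with no `/BITS` suffix meaning a 24-bit block; `partition_manuf` is a hypothetical stand-in for `parse_wireshark_manuf`, and the sample rows are placeholders, not registry data.

```python
def partition_manuf(text: str) -> tuple[dict[str, str], dict[str, str], dict[str, str]]:
    """Split a manuf-style file into (MA-L, MA-M, MA-S) dicts (hypothetical).

    Assumed row shape: PREFIX[/BITS] <TAB> short name <TAB> long name.
    A missing /BITS suffix is treated as a 24-bit MA-L block.
    """
    ma_l: dict[str, str] = {}
    ma_m: dict[str, str] = {}
    ma_s: dict[str, str] = {}
    # Prefix length in bits -> (target dict, hex digits kept in the key)
    tiers = {24: (ma_l, 6), 28: (ma_m, 7), 36: (ma_s, 9)}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 3:
            continue  # no long-form vendor column
        prefix, _, bits_text = cols[0].partition("/")
        if not bits_text:
            bits = 24
        elif bits_text.isdigit():
            bits = int(bits_text)
        else:
            continue  # malformed width annotation
        if bits not in tiers:
            continue  # unknown widths are skipped, not guessed
        table, digits = tiers[bits]
        hex_prefix = prefix.replace(":", "").replace("-", "").upper()
        table[hex_prefix[:digits]] = cols[2].strip()  # column 3: full vendor string
    return ma_l, ma_m, ma_s


SAMPLE = (
    "# placeholder rows, not real registry data\n"
    "00:00:0C\tVendA\tVendor A Inc\n"
    "00:55:DA:00:00:00/28\tVendB\tVendor B Ltd\n"
    "00:1B:C5:00:00:00/36\tVendC\tVendor C GmbH\n"
    "00:11:22:33:00:00/32\tVendD\tVendor D Odd Width\n"
)
ma_l, ma_m, ma_s = partition_manuf(SAMPLE)
print(ma_l, ma_m, ma_s)
```

The `/32` row is dropped because it fits none of the three IEEE tiers, which mirrors the intent of the `skips_unknown_widths` test.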

895/895 tests pass. Graceful-degradation path (`empty MA-M / MA-S`)
still covered by `test_oui_multitier.py::test_lookup_falls_back_to_*`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…t-home split, Model row

Reviewed Fing Desktop 4.0 as a UX benchmark against the existing
expand-lan-identification design. Adjustments before Phase 2/3/4
implementation begins, so the design and spec deltas record the
final intent.

design.md changes:
- D10 device-class vocabulary: drop `iot` (too coarse) in favour
  of `camera` (Hikvision / Dahua / Axis / Tapo / Imou) and
  `smart-home` (Tuya / Xiaomi / Aqara / Mijia). Fing's `IP Camera`
  vs `Smart Device` taxonomy makes "how many cameras are silently
  on my Wi-Fi" answerable, which the broad `iot` did not.
- D13 column ordering: class moves to the leftmost data column
  (before vendor). Fing's leftmost column is Type — same insight,
  applied to our row layout. Final layout: `[new] class vendor
  name IP MAC last_seen`.
- New D14 Fing UX reference section: records the benchmark
  patterns adopted (type-first, multi-protocol identification,
  class granularity, Model in detail view) and the ones rejected
  (icons, sidebar nav, People view, active TCP probing, filter
  dropdowns, status pill).

lan-inventory spec:
- device_class vocabulary updated everywhere it appears.
- New scenarios: Hikvision/Dahua/Axis/Tapo/Imou vendor signals
  `camera`; Tuya/Xiaomi/Aqara/Mijia vendor signals `smart-home`.

tui-shell spec:
- Detail modal Identity section gains a `Model:` row sourced from
  `upnp_model` with `upnp_friendly_name` fallback.
- Class column position moved to leftmost data column; row layout
  table added to the requirement text.
- New scenarios: camera row, smart-home row.

i18n spec:
- Drop the `iot` row, add `camera` (`摄像头`) and `smart-home`
  (`智能家居`) entries.
- Add `Model:` modal label (`型号:`).

openspec validate --strict passes for both the change and the 22
canonical specs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ted (P2)

Phase 2 of expand-lan-identification. Adds the on-the-wire
enrichment layer that the passive ICMP+ARP poller depends on for
identifying smart-home devices (Hikvision / Tuya / Hisense / TP-Link
/ NAS / Windows hosts) that publish no Bonjour and no reverse DNS.

New module `src/diting/lan_probes.py`:
- `encode_nbns_status_query(txn_id)` — RFC 1002 §4.2.18 wildcard
  NBSTAT query, 50 bytes flat. Pure function.
- `parse_nbns_status_response(data)` → `[NBNSNameEntry]` — parses
  the name table; tolerates compressed pointers (0xC00C) and
  length-prefixed answer names; truncated / malformed data yields
  `[]` rather than raising. `workstation_name` picks the unique
  `0x00`-suffix entry.
- `probe_nbns(ips)` — bounded 30-way concurrency, 100ms per host;
  returns `{ip: name_or_None}`.
- `SSDP_MSEARCH_PACKET` byte template + `parse_ssdp_response(data, ip)`
  → `SSDPResponse | None` (rejects non-200; tolerates malformed).
- `probe_ssdp()` — single multicast to 239.255.255.250:1900, 3 s
  listen window, dedups by source IP.
- `parse_upnp_location_xml(xml)` → `(friendly_name, model_name)`
  via stdlib ElementTree with no external-entity resolution.
- `fetch_upnp_location(url)` async wrapper around urllib GET capped
  at 500ms / 4KB; swallows all URLError / OSError / TimeoutError.
- `resolve_lan_active_probe(env, scene_default)` / `resolve_upnp_fetch_enabled(env)`
  — env var → bool resolution; invalid values fall through to the
  default.
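The 50-byte figure for the wildcard status query breaks down as a 12-byte header, a 34-byte encoded name (length byte, 32 encoded characters, terminator), and 4 bytes of type and class. The sketch below builds such a packet using RFC 1001's first-level name encoding and the NBSTAT question type 0x0021. The helper names are hypothetical and are not the module's `encode_nbns_status_query`.

```python
import struct

NBSTAT = 0x0021  # question type for a NetBIOS node status request
IN_CLASS = 0x0001


def first_level_encode(name: bytes) -> bytes:
    """First-level encoding: pad to 16 bytes, map each nibble to 'A' + nibble."""
    padded = name.ljust(16, b"\x00")[:16]
    out = bytearray()
    for byte in padded:
        out.append(0x41 + (byte >> 4))
        out.append(0x41 + (byte & 0x0F))
    return bytes(out)


def build_nbstat_query(txn_id: int) -> bytes:
    """Hypothetical builder for a wildcard ('*') node status request."""
    # Header: txn id, flags 0, one question, no answer/authority/additional.
    header = struct.pack(">HHHHHH", txn_id & 0xFFFF, 0x0000, 1, 0, 0, 0)
    qname = bytes([32]) + first_level_encode(b"*") + b"\x00"
    return header + qname + struct.pack(">HH", NBSTAT, IN_CLASS)


packet = build_nbstat_query(0x1234)
print(len(packet))  # 50
```

The result would go out as a unicast UDP datagram to port 137 on each host. As the commit notes, the builder is a pure function, which keeps it easy to pin down in a test.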

`scene.py`:
- `scene_defaults()` gains `lan_active_probe` — True for home /
  office / audit, False for public. Documented in the docstring.

`lan.py`:
- `LANInventoryPoller.__init__` gains `active_probe_enabled` and
  `upnp_fetch_enabled` kwargs; new `_one_shot_probe_armed` flag.
- `LANHost` gains `nbns_name`, `upnp_server`, `upnp_friendly_name`,
  `upnp_model` (all default None).
- `_do_sweep_and_emit` calls `_run_active_probes` when enabled or
  one-shot armed; clears the one-shot flag after the sweep.
- `_run_active_probes` runs NBNS + SSDP + mDNS-meta concurrently
  via `asyncio.gather`; each phase fail-soft on exception.
- `_apply_probe_results` merges enrichments into `_state` keyed
  by IP, preserving prior values when the new value is None
  (silent host doesn't clobber a previously-captured name).
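The "preserve prior values when the new value is None" merge can be sketched with `dataclasses.replace`. `ProbeFields` and `merge_probe_result` are hypothetical stand-ins for the probe-populated subset of `LANHost` and for `_apply_probe_results`, and the host values are made up.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ProbeFields:
    """Hypothetical stand-in for the probe-populated subset of LANHost."""

    ip: str
    nbns_name: Optional[str] = None
    upnp_server: Optional[str] = None
    upnp_friendly_name: Optional[str] = None
    upnp_model: Optional[str] = None


def merge_probe_result(prior: ProbeFields, fresh: dict[str, Optional[str]]) -> ProbeFields:
    """Overlay a new probe result, keeping prior values where fresh is None."""
    # A host that stayed silent this round reports None; dropping those
    # keys keeps the name captured on an earlier sweep.
    updates = {field: value for field, value in fresh.items() if value is not None}
    return replace(prior, **updates)


prior = ProbeFields("192.168.1.20", nbns_name="OFFICE-PC")
merged = merge_probe_result(prior, {"nbns_name": None, "upnp_model": "Model X"})
print(merged.nbns_name, merged.upnp_model)  # OFFICE-PC Model X
```

Because the dataclass is frozen, each merge returns a new object. A stale value is therefore only ever overwritten, never cleared.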

`mdns.py`:
- `BonjourPoller.send_meta_query()` sends one PTR for
  `_services._dns-sd._meta._tcp.local.`; returns True/False;
  swallows zeroconf internals exceptions.
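`send_meta_query()` goes through zeroconf. As a library-free illustration of what a single PTR question looks like on the wire, the sketch below builds one by hand. The service name is a parameter; RFC 6763 §9 defines `_services._dns-sd._udp.local.` as the service-type enumeration name, and that is what the usage line passes. `build_ptr_query` and `encode_dns_name` are hypothetical helpers.

```python
import struct

PTR = 12
IN_CLASS = 1


def encode_dns_name(name: str) -> bytes:
    """Encode a dotted name as length-prefixed labels plus a zero byte."""
    out = bytearray()
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) < 64:
            raise ValueError(f"bad DNS label: {label!r}")
        out.append(len(raw))
        out += raw
    out.append(0)
    return bytes(out)


def build_ptr_query(name: str) -> bytes:
    """Hypothetical one-question mDNS PTR query (ID 0, standard-query flags)."""
    header = struct.pack(">HHHHHH", 0, 0, 1, 0, 0, 0)
    return header + encode_dns_name(name) + struct.pack(">HH", PTR, IN_CLASS)


query = build_ptr_query("_services._dns-sd._udp.local.")
# Would be sent as a UDP datagram to the mDNS group 224.0.0.251:5353.
print(query.hex())
```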

`cli.py` + `tui.py`:
- DitingApp `__init__` accepts `lan_active_probe` + `lan_upnp_fetch`
  kwargs; threads them through to `LANInventoryPoller`.
- CLI resolves both env vars at startup; `_resolve_lan_active_probe_with_warning`
  prints a stderr warning when the env value is non-empty and
  outside `0`/`1`, then falls through to the scene default.
- `--help` documents both env vars under global options.
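The env-var rule ("`0`/`1` force off/on, any other non-empty value warns and falls back to the scene default") fits in a few lines. `resolve_flag` is a hypothetical helper, and the exact warning text is an assumption.

```python
import sys
from typing import Mapping


def resolve_flag(env: Mapping[str, str], name: str, scene_default: bool) -> bool:
    """'1' forces on, '0' forces off; anything else keeps the scene default."""
    value = env.get(name, "")
    if value == "1":
        return True
    if value == "0":
        return False
    if value:
        # Non-empty but outside 0/1: warn on stderr, then fall through.
        print(f"warning: {name}={value!r} is not 0 or 1; using scene default",
              file=sys.stderr)
    return scene_default


# Public scene (default off), but the user exported DITING_LAN_PROBE=1.
print(resolve_flag({"DITING_LAN_PROBE": "1"}, "DITING_LAN_PROBE", False))  # True
```

Passing the environment as a `Mapping` instead of reading `os.environ` directly keeps the helper testable with plain dicts.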

Tests (45 new, 942/942 pass):
- `test_lan_probes.py` — 30 tests covering NBNS encode/parse,
  SSDP packet shape, SSDP response parse, UPnP XML parse
  (including external-entity DOCTYPE), async fetch wrapper
  fail-soft, env var resolution.
- `test_scene.py` — 3 new tests for `lan_active_probe` per scene.
- `test_lan.py` — 8 new tests for `_apply_probe_results` /
  `_run_active_probes` exception swallow / `_one_shot_probe_armed`
  consumption.
- `test_mdns.py` — 3 new tests for `send_meta_query`.

TESTING.md (EN + ZH) updated with 12 new coverage rows.

Validates: openspec validate expand-lan-identification --strict ✓
           openspec validate --specs --strict (22/22) ✓
           regression snapshot ✓
           pytest 942/942 ✓

Phase 3 (TTL fingerprint + classifier) and Phase 4 (UX: chip,
class column, consent modal) follow on the same branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 3 of expand-lan-identification. Adds two pure read-side
heuristics over the fields populated by Phases 1+2:

1. **TTL fingerprint** — `_ping_one` now also parses `ttl=N` from
   ping stdout; `LANHost.ttl` carries the raw value and
   `LANHost.ttl_class` carries the coarse OS-family bucket
   (`unix` = 50-64, `windows` = 100-128, `router` = 200-255, None
   otherwise). Same packet, zero additional traffic.
2. **Device-class inference** — new module `src/diting/lan_classify.py`
   with a documented rules table consuming (vendor_raw,
   bonjour_services, nbns_name, upnp_server, upnp_friendly_name,
   ttl_class, is_gateway). Returns one of the documented class
   strings (`phone | laptop | desktop | tv | camera | smart-home
   | printer | nas | gaming | speaker | router`) or None.
   Pure function; total over input; never raises.
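The buckets track common initial TTLs (64 for most Unix-like stacks, 128 for Windows, 255 for many network devices) minus a few hops. The sketch below extracts `ttl=N` from ping output and applies those ranges. `parse_ttl` and `ttl_bucket` are hypothetical names; the module's own helper is `ttl_class_for`.

```python
import re
from typing import Optional

_TTL_RE = re.compile(r"\bttl=(\d+)", re.IGNORECASE)


def parse_ttl(ping_stdout: str) -> Optional[int]:
    """Pull the reply TTL out of ping output ('ttl=64' or Windows 'TTL=128')."""
    match = _TTL_RE.search(ping_stdout)
    return int(match.group(1)) if match else None


def ttl_bucket(ttl: Optional[int]) -> Optional[str]:
    """Coarse OS-family bucket using the ranges listed above."""
    if ttl is None:
        return None
    if 50 <= ttl <= 64:
        return "unix"
    if 100 <= ttl <= 128:
        return "windows"
    if 200 <= ttl <= 255:
        return "router"
    return None  # e.g. 70 or 150: too ambiguous to bucket


line = "64 bytes from 192.168.1.10: icmp_seq=0 ttl=64 time=2.1 ms"
print(ttl_bucket(parse_ttl(line)))  # unix
```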

`lan.py`:
- `_ping_one` return shape `(reachable, rtt_ms)` → `(reachable,
  rtt_ms, ttl)`. `_sweep` updated accordingly. New
  `_unpack_sweep_entry` helper tolerates both 2-tuple (legacy
  test fixtures) and 3-tuple shapes so the migration is
  transparent.
- `ttl_class_for(ttl)` helper exposed module-level.
- `LANHost` gains `ttl`, `ttl_class`, `device_class` fields (all
  default None).
- `_merge_arp_into_state` populates the TTL fields from the
  sweep result; preserves TTL across silent ticks; runs
  `classify()` on every constructed LANHost.
- `_apply_probe_results` re-runs `classify()` after merging the
  active-discovery enrichments, so the `tv` / `camera` rules
  that depend on `upnp_server` / `upnp_friendly_name` fire after
  the SSDP phase lands.
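The shape-tolerant unpacking that keeps legacy 2-tuple fixtures working can be sketched as follows. `unpack_sweep_entry` here is a hypothetical illustration, not the module's `_unpack_sweep_entry`.

```python
from typing import Optional


def unpack_sweep_entry(entry: tuple) -> tuple[bool, Optional[float], Optional[int]]:
    """Accept legacy (reachable, rtt_ms) or new (reachable, rtt_ms, ttl)."""
    if len(entry) == 3:
        reachable, rtt_ms, ttl = entry
        return reachable, rtt_ms, ttl
    if len(entry) == 2:
        reachable, rtt_ms = entry
        return reachable, rtt_ms, None  # old fixtures carry no TTL
    raise ValueError(f"unexpected sweep entry: {entry!r}")


print(unpack_sweep_entry((True, 1.5)))      # (True, 1.5, None)
print(unpack_sweep_entry((True, 1.5, 64)))  # (True, 1.5, 64)
```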

`tui.py` + `i18n.py`:
- `LANDetailScreen._render_body` renders a `Class:` row in the
  Identity section when `device_class` is non-None, and a `TTL:`
  row in the Network section (formatted as `<value> (<class>)`
  when ttl_class is known, raw value otherwise) when ttl is
  non-None.
- i18n catalog gains EN keys + ZH values for `Class`, `TTL`, the
  11 class strings (`phone` / `laptop` / `desktop` / `tv` /
  `camera` / `smart-home` / `printer` / `nas` / `gaming` /
  `speaker` / `router`), and the 2 TTL-class strings (`unix` /
  `windows`). Class values pass through `t()` at render time so
  the JSONL stream carries the EN tokens.

Classifier rule highlights:
- Gateway always wins `router` regardless of vendor.
- AirPrint / IPP / LPD Bonjour → printer; printer-vendor → printer.
- UPnP SmartTV / Hisense / Samsung / WebOS / Tizen server header →
  tv; AirPlay + GoogleCast Bonjour → tv; Hisense / LG / Sony /
  TCL / Skyworth / Konka / Vizio / Roku vendor → tv.
- Hikvision / Dahua / Axis / Tapo / Imou / Reolink / EZVIZ /
  Amcrest / Uniview vendor → camera; "Hikvision-Webs" server
  header → camera.
- SMB / AFP / NFS / `_adisk` Bonjour → nas; Synology / QNAP / WD /
  Drobo / Asustor / TerraMaster vendor → nas.
- `_companion-link` / `_apple-mobdev2` Bonjour → phone.
- Sonos / Bose / Harman / JBL / Anker vendor + `_spotify-connect`
  Bonjour → speaker.
- Nintendo / Sony Interactive vendor → gaming.
- TP-Link / Asus / Netgear / Linksys / Ubiquiti / Mikrotik / H3C /
  Huawei / Ruijie / OpenWrt vendor → router.
- Tuya / Xiaomi / Aqara / Mijia / Lumi / Espressif / Imilab
  vendor → smart-home.
- Windows TTL fallback → desktop (weakest, last).
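The rules table behaves as an ordered first-match list: the gateway rule comes first, and the Windows TTL fallback comes last. The sketch below shows that shape with a handful of the vendor needles listed above. It also shows the "rogue predicate skip" that keeps the classifier from raising. `Signals`, `RULES` and `classify` are hypothetical simplifications of `lan_classify.py`.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Signals:
    """Hypothetical subset of the classifier inputs."""

    vendor_raw: str = ""
    ttl_class: Optional[str] = None
    is_gateway: bool = False


def vendor_has(*needles: str) -> Callable[[Signals], bool]:
    """Predicate: any needle is a substring of the lowercased raw vendor."""
    return lambda s: any(n in s.vendor_raw.lower() for n in needles)


# Ordered rules: the first predicate that returns True decides the class.
RULES: list[tuple[str, Callable[[Signals], bool]]] = [
    ("router", lambda s: s.is_gateway),  # gateway always wins
    ("camera", vendor_has("hikvision", "dahua")),
    ("smart-home", vendor_has("tuya", "espressif")),
    ("desktop", lambda s: s.ttl_class == "windows"),  # weakest, last
]


def classify(signals: Signals) -> Optional[str]:
    for device_class, predicate in RULES:
        try:
            if predicate(signals):
                return device_class
        except Exception:
            continue  # a broken rule is skipped; classify() never raises
    return None


print(classify(Signals("Hangzhou Hikvision Digital Technology Co.,Ltd.")))  # camera
```

Ordering carries the meaning here: moving a broad rule earlier silently shadows every narrower rule after it. The later audit fixes turn on exactly this point.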

Tests (48 new, 990/990 pass total):
- `test_device_class.py` — 29 tests covering every class branch
  + None fallback + pure-function safety (rogue predicate skip).
- `test_lan.py` — 14 new tests: `_unpack_sweep_entry` shape
  tolerance, `ttl_class_for` buckets, LANHost TTL population +
  silent-tick preservation, classifier wired into merge + probe
  re-classify path.
- `test_tui_helpers.py` — 5 new tests for Class / TTL row
  rendering (present + omitted variants).

TESTING.md (EN + ZH) updated with 8 new coverage rows.

Validates: openspec validate expand-lan-identification --strict ✓
           openspec validate --specs --strict (22/22) ✓
           regression snapshot ✓
           pytest 990/990 ✓

Phase 4 (UX: [new] chip, class column on LAN row, public-scene
consent modal, README + CHANGELOG) follows on the same branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…t modal (P4 v1.7.0)

Phase 4 of expand-lan-identification — final layer. Ties Phases
1-3 together into the user-facing surface.

Event + logger:
- `LANActiveProbeConsentedEvent` dataclass in events.py with
  `timestamp` / `scene` / `ssid` / `nbns_packets` / `ssdp_packets`
  / `mdns_packets`. Audit-only — never emitted for scene-default
  or env-forced probing.
- `EventLogger.emit_lan_active_probe_consented` serializes one
  JSONL line with stable type `lan_active_probe_consented`;
  omits `ssid` when None; no-op when sink is None.
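A rough sketch of the emit contract: one JSON object per line with a stable `type`, the `ssid` key omitted rather than written as null, and a silent no-op when there is no sink. `ConsentEvent` and `emit_consent` are hypothetical mirrors of the real dataclass and `EventLogger` method, and the packet counts are made up.

```python
import io
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Optional, TextIO


@dataclass(frozen=True)
class ConsentEvent:
    """Hypothetical mirror of the consent event's fields."""

    timestamp: datetime
    scene: str
    ssid: Optional[str]
    nbns_packets: int
    ssdp_packets: int
    mdns_packets: int


def emit_consent(sink: Optional[TextIO], event: ConsentEvent) -> None:
    """Append one JSONL line; do nothing when logging is disabled."""
    if sink is None:
        return
    record: dict[str, Any] = {
        "type": "lan_active_probe_consented",
        "timestamp": event.timestamp.isoformat(),
        "scene": event.scene,
        "nbns_packets": event.nbns_packets,
        "ssdp_packets": event.ssdp_packets,
        "mdns_packets": event.mdns_packets,
    }
    if event.ssid is not None:  # omit the key entirely rather than write null
        record["ssid"] = event.ssid
    sink.write(json.dumps(record, ensure_ascii=False) + "\n")


buf = io.StringIO()
emit_consent(buf, ConsentEvent(datetime(2026, 5, 23, 8, 0, tzinfo=timezone.utc),
                               "public", None, 12, 1, 1))
print(buf.getvalue(), end="")
```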

LAN row layout (Phase 4 / Fing UX benchmark):
- New `_COL_LAN_CLASS = 8` slot for the device-class column.
  Layout: `[new]  ★  class  vendor  name  IP  MAC  last_seen`
  — class placed LEFTMOST of the data columns per Fing's Type-
  first convention (it disambiguates faster than vendor —
  H3C OUI can be router / AP / switch / IoT bridge).
- `[new]` chip in dim cyan when `(now - first_seen) < 24 h`;
  self / gateway never carry the chip.
- `_lan_header_line` updated with new `class` column header
  before `vendor`.

LANProbeConsentScreen modal:
- Modal centered with heavy-bordered $warning box, ~78 cells wide.
- Body: scene + SSID header (`(disassociated)` when SSID is None),
  packet enumeration (NBNS 137 unicast / SSDP 1900 multicast /
  mDNS 5353 multicast), three-line consequences statement,
  one-shot disclaimer.
- Footer: `[esc cancel]   [wait 2s]` during 2-second cooldown,
  flips to `[y probe now]` after — uses Textual's `set_timer`
  to refresh.
- `action_confirm` is a silent no-op during cooldown. After
  cooldown: hands off to `App._consent_one_shot_lan_probe`
  which logs the JSONL event, arms `_one_shot_probe_armed`,
  calls `force_now()`, refreshes subtitle so the `[probing]`
  chip lights up.
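Without Textual, the cooldown gate reduces to "ignore confirm until N seconds have passed since the modal opened". The sketch injects the clock so the behaviour is testable. `ConsentCooldown` is hypothetical and only mirrors the footer labels quoted above; the real screen refreshes via `set_timer`.

```python
import time
from typing import Callable


class ConsentCooldown:
    """Hypothetical cooldown gate: confirm is ignored until cooldown_s elapses."""

    def __init__(self, cooldown_s: float = 2.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._cooldown_s = cooldown_s
        self._opened_at = clock()

    def ready(self) -> bool:
        return self._clock() - self._opened_at >= self._cooldown_s

    def footer(self) -> str:
        return "[y probe now]" if self.ready() else f"[wait {self._cooldown_s:g}s]"

    def confirm(self) -> bool:
        """True only if this keypress should start the one-shot probe."""
        return self.ready()  # silent no-op (False) during the cooldown


now = [100.0]  # fake clock, advanced by hand
gate = ConsentCooldown(clock=lambda: now[0])
print(gate.footer(), gate.confirm())  # [wait 2s] False
now[0] = 102.0
print(gate.footer(), gate.confirm())  # [y probe now] True
```

A monotonic clock is the usual choice for intervals like this because wall-clock adjustments cannot shorten the cooldown.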

`P` keybinding (uppercase, hidden from footer) — three gates:
must be on the LAN view, scene must be `public`, and
`DITING_LAN_PROBE` must not have forced probing on. Outside any
of those, the key is a silent no-op (no point opening the modal
where it can't change anything).

`[probing]` subtitle chip:
- Added to `_build_subtitle` when `_one_shot_probe_armed=True`.
- Cleared automatically when the consumer task receives the
  resulting `LANInventoryUpdate` — the poller clears the flag
  inside `_do_sweep_and_emit` before yielding, so by the time
  the task refreshes the subtitle, the chip is gone.

i18n catalog:
- `[new]` / `[probing]` chip strings (EN + ZH).
- Full consent modal copy (EN + ZH): `Active LAN probing`,
  `Scene:`, `Network:`, `(disassociated)`, packet enumeration
  preamble, three consequence bullets, one-shot disclaimer,
  footer button labels.
- `class` column header.

Help text (`?` modal):
- New `P` binding entry under Bindings.
- New "LAN view" section describing the multi-tier OUI,
  enrichment stack, scene-gated probing, `DITING_LAN_*` env
  vars, and the uppercase-P consent flow.

Docs:
- `README.md` + `docs/zh/README.md` gain a `## LAN
  identification` section: multi-tier OUI, enrichment stack,
  scene-gating matrix, ASCII mock of the consent modal.
- `CHANGELOG.md` + `docs/zh/CHANGELOG.md` get a v1.7.0 entry
  summarising all four phases.
- `tests/TESTING.md` + `docs/zh/TESTING.md` gain four new
  coverage rows for Phase 4.

Version: `pyproject.toml` 1.6.0 → 1.7.0 (minor — new CLI env
vars, new keybinding, new JSONL event type, new bundled data
files; no breaking changes).

Tests (15 new, 1005/1005 pass total):
- `test_events.py` — 4 new tests for the consent event
  dataclass + EventLogger emit method + ssid omission +
  None-sink no-op.
- `test_tui_helpers.py` — 11 new tests for LAN row class
  column position, `[new]` chip presence/absence, header
  ordering, consent modal body contents, footer cooldown
  state, cooldown press-through no-op.

Regression snapshot: passes.
openspec validate expand-lan-identification --strict: passes.
openspec validate --specs --strict: 22/22 passes.

End of `expand-lan-identification` change. Ready for archive +
PR review.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
….7.0

Seven fixes from the 2026-05-23 audit against the developer's real
home network. Each is independently validated; they are bundled into
one commit because they all sit in the change area and ship together
in the v1.7.0 PR.

1. OUI lookup mis-keyed MACs with stripped leading zeros
   (`src/diting/ble.py`)

   macOS `arp -an` strips leading zeros per octet
   (`24:f:9b:29:c:56`, `a0:92:8:f6:4b:e2`). The old tokenizer
   concatenated-and-sliced `cleaned[0:6]`, mis-aligning whenever
   any of the first three octets was one hex char. On the
   developer's live ARP cache this affected 10 of ~50 hosts; 5
   of those 10 (a Hikvision camera, 3 Apple devices, an HP
   printer) silently rendered `(unknown)` for a vendor that IS
   in the bundled `wifi_ouis.json`.

   New `_split_mac_octets()` splits on colons / dashes, pads
   each octet with `.zfill(2)`, then composes the lookup keys
   from the padded form. Handles colon-separated, dash-
   separated, and no-separator forms. Pre-existing bug — the
   single-tier lookup shipped with it; Phase 1 inherited it.
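The fix pads each octet before joining. Naively joining `24:f:9b:29:c:56` gives `24f9b2...`, which shifts every later digit. The sketch below illustrates the approach for all three input forms; `split_mac_octets` and `oui_keys` here are hypothetical, not the code in `ble.py`.

```python
import re


def split_mac_octets(mac: str) -> list[str]:
    """Split a MAC into six zero-padded, uppercase octets (hypothetical)."""
    text = mac.strip()
    if ":" in text or "-" in text:
        parts = re.split(r"[:-]", text)
    elif len(text) == 12:
        parts = [text[i:i + 2] for i in range(0, 12, 2)]
    else:
        raise ValueError(f"not a MAC address: {mac!r}")
    if len(parts) != 6 or not all(re.fullmatch(r"[0-9A-Fa-f]{1,2}", p) for p in parts):
        raise ValueError(f"not a MAC address: {mac!r}")
    # Pad per octet *before* joining, so '24:f:9b' becomes 24 0F 9B.
    return [p.zfill(2).upper() for p in parts]


def oui_keys(mac: str) -> tuple[str, str, str]:
    """MA-L (24-bit), MA-M (28-bit) and MA-S (36-bit) lookup keys."""
    hex12 = "".join(split_mac_octets(mac))
    return hex12[:6], hex12[:7], hex12[:9]


print(oui_keys("a0:92:8:f6:4b:e2"))  # ('A09208', 'A09208F', 'A09208F64')
```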

2. Multicast destination MACs leaked into the LAN panel
   (`src/diting/lan.py`)

   The kernel ARP cache picks up `01:00:5e:*` (IPv4 multicast)
   and `33:33:*` (IPv6 multicast) destination MACs as a side
   effect of any UDP send to a multicast group — diting's own
   SSDP M-SEARCH triggers `01:00:5e:7f:ff:fa` (239.255.255.250)
   and mDNS triggers `01:00:5e:00:00:fb` (224.0.0.251). They
   showed up as ghost rows with vendor=None, class=None,
   never-reachable.

   `_is_multicast_dest_mac()` checks both ranges (with
   zero-padding so the stripped-zero arp form matches);
   `_read_arp_cache()` filters them out. Two were visible in
   the audit capture and disappear after the fix.
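An IPv4 multicast MAC is `01:00:5e` followed by the low 23 bits of the group address. That is why the SSDP and mDNS groups produce exactly the two ghost MACs named above. The sketch shows both the padded prefix check and that mapping; the function names are hypothetical.

```python
import ipaddress


def is_multicast_dest_mac(mac: str) -> bool:
    """True for IPv4 (01:00:5e:*) or IPv6 (33:33:*) multicast MACs."""
    # Zero-pad each octet so macOS's stripped form '1:0:5e:...' matches too.
    octets = [p.zfill(2).lower() for p in mac.replace("-", ":").split(":")]
    if len(octets) != 6:
        return False
    return octets[:3] == ["01", "00", "5e"] or octets[:2] == ["33", "33"]


def ipv4_multicast_mac(group: str) -> str:
    """Map an IPv4 group to its MAC: 01:00:5e + low 23 bits of the address."""
    low23 = int(ipaddress.IPv4Address(group)) & 0x7FFFFF
    return "01:00:5e:%02x:%02x:%02x" % (low23 >> 16, (low23 >> 8) & 0xFF, low23 & 0xFF)


print(ipv4_multicast_mac("239.255.255.250"))  # 01:00:5e:7f:ff:fa (SSDP)
print(ipv4_multicast_mac("224.0.0.251"))      # 01:00:5e:00:00:fb (mDNS)
print(is_multicast_dest_mac("1:0:5e:7f:ff:fa"))  # True
```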

3. Events panel rendered UTC timestamps instead of local time
   (`src/diting/tui.py:_ev_ts`)

   Event constructors use `datetime.now(timezone.utc)`. The
   `_ev_ts` helper called `.strftime("%H:%M:%S")` without
   `.astimezone()`, so a 16:19 Beijing-local event showed as
   `08:19` in the events modal — exactly the 8 h CN offset
   from UTC. The JSONL `_iso` helper in `event_log.py` already
   does the right thing; only the TUI helper missed.

   Added `.astimezone()`, made `_ev_ts` the single point of
   truth, replaced the 5 inline `event.timestamp.strftime(...)`
   call sites with `_ev_ts(event)`. Pre-existing bug —
   user-flagged live during the audit.
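The bug and the fix fit in two lines. `strftime` on an aware UTC datetime prints the UTC wall clock, while `astimezone()` with no argument converts to the system's local zone first. The sketch below is a hypothetical `ev_ts` that takes a timestamp rather than an event. It also accepts an optional zone so the 8 h example is reproducible anywhere.

```python
from datetime import datetime, timedelta, timezone, tzinfo
from typing import Optional


def ev_ts(timestamp: datetime, tz: Optional[tzinfo] = None) -> str:
    """Format an aware timestamp as HH:MM:SS in tz (default: local zone)."""
    # astimezone(None) converts to the system's local time zone.
    return timestamp.astimezone(tz).strftime("%H:%M:%S")


event_time = datetime(2026, 5, 23, 8, 19, tzinfo=timezone.utc)
beijing = timezone(timedelta(hours=8))
print(event_time.strftime("%H:%M:%S"))  # 08:19:00 (the old behaviour)
print(ev_ts(event_time, beijing))       # 16:19:00
```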

4. Classifier mis-classified HomePod + iPad + iPhone as `tv`
   (`src/diting/lan_classify.py`)

   AirPlay alone is too weak a signal — HomePods publish
   AirPlay + `_raop`, iPads publish AirPlay + `_companion-link`,
   Apple TVs publish AirPlay (sometimes + `_companion-link` for
   pairing). The rules table had `airplay → tv` first, so
   everything-with-airplay landed in tv.

   Reordered:
   - Speaker rule (`_raop`) moves BEFORE tv → HomePod ✓
   - Strong-TV signals (`googlecast`, `_androidtvremote2`) keep
     their direct tv match
   - Standalone `airplay → tv` now requires absence of phone
     companion signal — Apple TV (airplay only, sometimes
     companion-link) still tv; iPad / iPhone (airplay +
     companion-link) falls through to the phone rule ✓
   - User flagged HomePod live during the audit; iPad serial
     `L19L6JC6Q2` was also in the captured frame.

5. LAN detail modal was missing the spec-mandated Active
   discovery section + Model row
   (`src/diting/tui.py:LANDetailScreen._render_body`)

   Phase 4 spec required the modal to surface NBNS / UPnP
   server / friendly name / model. Implementation gap — the
   spec, TESTING.md, and tests covered it, but the actual
   render code jumped from `Bonjour services` → `Activity`
   with no Active discovery section in between.

   Added:
   - `Model:` row in Identity (prefers `upnp_model`, falls
     back to `upnp_friendly_name`).
   - `Active discovery` section header + rows for NBNS / UPnP
     server / friendly name / model. `(not probed)` placeholder
     when none of the four fields is set.
   - i18n entries (EN + ZH) for the new labels.

6. `[new]` chip fired on every LAN row for 24 h after first
   LAN-view entry (`src/diting/lan.py`, `src/diting/tui.py`)

   LAN poller is lazy-constructed on first `n`-cycle to the
   LAN view; at that moment `first_seen=now` for every host
   in the kernel ARP cache. The chip predicate
   `(now - first_seen) < 24 h` was then unconditionally true
   for every host, making `[new]` universal noise on first
   launch.

   Added a 5-minute grace anchored to the poller's
   `_constructed_at`. Hosts whose first_seen lands within the
   grace are session baseline — chip is suppressed. Hosts that
   join later (truly new devices) still trip the chip.
   `_lan_row_line` gained an optional `chip_anchor` kwarg;
   `LANPanel.update_hosts` threads it through; the App reads
   `_lan_inventory_poller._constructed_at`. Back-compat
   preserved: calls without `chip_anchor` retain the 24-h-only
   behavior.
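Put together, the chip predicate works like this. No chip for self or gateway. No chip once 24 h have passed. No chip for hosts first seen within 5 minutes of the anchor. Without an anchor, it falls back to the 24-h-only rule. `show_new_chip` is a hypothetical helper, not `_lan_row_line`.

```python
from datetime import datetime, timedelta
from typing import Optional

NEW_WINDOW = timedelta(hours=24)
BASELINE_GRACE = timedelta(minutes=5)


def show_new_chip(
    first_seen: datetime,
    now: datetime,
    chip_anchor: Optional[datetime] = None,
    is_self: bool = False,
    is_gateway: bool = False,
) -> bool:
    """Decide whether a LAN row gets the [new] chip (hypothetical helper)."""
    if is_self or is_gateway:
        return False
    if now - first_seen >= NEW_WINDOW:
        return False
    if chip_anchor is not None and first_seen - chip_anchor < BASELINE_GRACE:
        return False  # seen in the initial sweep: session baseline, not new
    return True


start = datetime(2026, 5, 23, 12, 0)
print(show_new_chip(start + timedelta(minutes=1), start + timedelta(minutes=10), start))   # False
print(show_new_chip(start + timedelta(minutes=30), start + timedelta(minutes=40), start))  # True
```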

7. TTL row showed `(windows)` class for gateways
   (`src/diting/tui.py:LANDetailScreen._render_body`)

   CN consumer routers (H3C / Huawei / some TP-Link firmwares)
   ship with TTL=128. The class heuristic correctly maps it to
   "windows", but rendering `TTL 128 (windows)` on a router is
   misleading. The classifier already gives those rows
   `class=router` via the `is_gateway` rule, so the parenthesised
   TTL class label adds confusion without signal.

   Suppress the class label for `is_gateway=True` rows only;
   non-gateway rows still show it as a useful OS-family hint.
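The resulting TTL row formatting is small enough to state as code. `format_ttl` is hypothetical, and returning `None` to mean "omit the row" is an assumption of this sketch.

```python
from typing import Optional


def format_ttl(ttl: Optional[int], ttl_class: Optional[str], is_gateway: bool) -> Optional[str]:
    """Render the TTL row value; None means the row is omitted."""
    if ttl is None:
        return None
    if ttl_class is None or is_gateway:
        return str(ttl)  # gateways: raw value only, no misleading '(windows)'
    return f"{ttl} ({ttl_class})"


print(format_ttl(128, "windows", is_gateway=True))   # 128
print(format_ttl(128, "windows", is_gateway=False))  # 128 (windows)
```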

Tests: 27 new tests across `test_oui_multitier.py`,
`test_lan.py`, `test_device_class.py`, `test_tui_helpers.py`.
Full suite 1027/1027 passes; regression snapshot passes;
`openspec validate --strict` passes for the change and all 22
canonical specs.

TESTING.md (EN + ZH) extended with 7 new coverage rows.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…aw service types

Follow-up to 97656cd. The 2026-05-23 re-audit showed Apple HomePods
(Blue-Pod / Red-Pod / Yellow-Pod in the user's home network) were
still being classified as `phone` instead of `speaker` even though
my prior fix moved the speaker rule above tv and added the
`_raop` needle.

Root cause: the classifier's Bonjour rules used **raw service-type
strings** (`_raop`, `_companion-link`, `_spotify-connect`, `googlecast`,
`smb`, `_adisk`, `airprint`, `ipp`) as substring needles. But
`LANHost.bonjour_services` actually stores the **human-readable
category names** the mdns module derives from
`src/diting/data/bonjour_services.json`:

  _raop._tcp.local.            → "AirPlay audio"
  _companion-link._tcp.local.  → "Apple Companion"
  _ipp._tcp.local.             → "Printer"
  _smb._tcp.local.             → "File share"
  _googlecast._tcp.local.      → "Chromecast"
  …

So every Bonjour-based classifier rule was silently dead code.
HomePods, iPads, printers, NAS units — anything whose class
depended on a Bonjour signal — fell through to whatever rule
landed later (often vendor-based; for HomePods that meant the
"Apple Companion" → phone fallback).

The tests passed because they used the same wrong-format needles
(`("_raop",)`, `("smb", "_adisk")`) — self-consistent but
inconsistent with real Bonjour data flowing through the live
poller. The audit caught it because the actual category strings
came through in real-environment captures and didn't match.

Real-data HomePod signature observed in the user's home network:
`AirPlay + AirPlay audio + Apple Companion + HomeKit`. The
"AirPlay audio" category (from _raop._tcp) is the speaker-
specific signal that distinguishes a HomePod from an iPad
(both publish AirPlay + Apple Companion).

Changes (`src/diting/lan_classify.py`):

- Rewrote all Bonjour needle tuples as named module-level
  constants for clarity:
    _BONJOUR_SPEAKER_NEEDLES = ("airplay audio", "sonos")
    _BONJOUR_PHONE_NEEDLES   = ("apple companion",)
    _BONJOUR_PRINTER_NEEDLES = ("printer",)
    _BONJOUR_NAS_NEEDLES     = ("file share",)
    _BONJOUR_TV_NEEDLES      = ("chromecast",)
- Added a long header comment over `_RULES` documenting the
  service-type → category mapping for future maintainers.
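The corrected needle convention can be sketched with category strings as input. Speaker is checked before the Apple Companion → phone rule, so the full HomePod signature resolves to speaker. Raw service-type names no longer match anything. `classify_bonjour` is a hypothetical reduction of the Bonjour rules at this commit's stage, before the later tablet class.

```python
from typing import Iterable, Optional

SPEAKER_NEEDLES = ("airplay audio", "sonos")
TV_NEEDLES = ("chromecast",)
PHONE_NEEDLES = ("apple companion",)


def _has(categories: Iterable[str], needles: tuple[str, ...]) -> bool:
    lowered = [c.lower() for c in categories]
    return any(needle in c for c in lowered for needle in needles)


def classify_bonjour(categories: tuple[str, ...]) -> Optional[str]:
    """Classify from mdns *category* strings, never raw service types."""
    if _has(categories, SPEAKER_NEEDLES):
        return "speaker"  # HomePod: checked before the Apple Companion rule
    if _has(categories, TV_NEEDLES):
        return "tv"
    has_companion = _has(categories, PHONE_NEEDLES)
    if "airplay" in (c.lower() for c in categories) and not has_companion:
        return "tv"  # bare AirPlay with no companion signal
    if has_companion:
        return "phone"
    return None


homepod = ("AirPlay", "AirPlay audio", "Apple Companion", "HomeKit")
print(classify_bonjour(homepod))  # speaker
```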

Tests (`tests/test_device_class.py`):

- Updated 5 existing tests that used raw service-type needles:
    test_airprint_bonjour_signals_printer    (AirPrint/IPP → Printer)
    test_smb_bonjour_signals_nas             (smb/_adisk → File share)
    test_sonos_bonjour_signals_speaker       (_spotify-connect → Sonos)
    test_apple_companion_signals_phone       (_companion-link → Apple Companion)
    test_ipad_airplay_plus_companion_signals_phone_not_tv  (real categories)
- Replaced test_homepod_airplay_plus_raop_signals_speaker_not_tv
  with test_homepod_airplay_audio_signals_speaker_not_tv using
  the actual category strings.
- Added test_homepod_full_apple_signature_signals_speaker_not_phone
  using the live-data signature observed in the audit:
  `("AirPlay", "AirPlay audio", "Apple Companion", "HomeKit")`
  → speaker.

TESTING.md (EN + ZH) updated to explicitly call out the
needle-convention contract — needles must match the category
strings produced by mdns, never the raw service-type names.

Full suite 1028/1028 passes. openspec validate --strict ✓ for
the change and 22/22 canonical specs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sony Interactive Entertainment Inc. (the PlayStation vendor)
and Sony Corporation (the Bravia / TV vendor) are separate IEEE
registrants with separate OUIs. `_TV_VENDOR_NEEDLES` used to be
`"sony"`, which is a substring of both — so a PS5's vendor
`"Sony Interactive Entertainment Inc."` matched the tv rule
first and never reached the gaming rule with the matching
`"sony interactive entertainment"` needle.

User flagged on 2026-05-23 with a screenshot of their PS5 Pro
at 192.168.124.210, classified as `电视` (tv) in the
ZH LAN detail modal.

Narrowed the needle from `"sony"` to `"sony corporation"`:
- Sony Bravia TVs (registrant "Sony Corporation") still match
  the tv rule.
- PS5 / PS4 (registrant "Sony Interactive Entertainment Inc.")
  fall through to the gaming rule via the existing
  `"sony interactive entertainment"` needle.
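The failure mode is easiest to see with a first-match-wins walk over an ordered rules table. A minimal sketch, assuming a trimmed, hypothetical two-rule table keyed only on the OUI vendor string; the real `_RULES` table carries many more rules and signal types.

```python
# Hypothetical, trimmed rules tables: (device_class, vendor_needles),
# walked top to bottom; the first rule with a matching needle wins.
OLD_RULES = (
    ("tv", ("sony",)),
    ("gaming", ("sony interactive entertainment",)),
)
NEW_RULES = (
    ("tv", ("sony corporation",)),
    ("gaming", ("sony interactive entertainment",)),
)


def classify_vendor(vendor, rules):
    """Return the class of the first rule whose needle is a
    case-insensitive substring of the vendor string, else None."""
    v = vendor.lower()
    for device_class, needles in rules:
        if any(n in v for n in needles):
            return device_class
    return None


ps5 = "Sony Interactive Entertainment Inc."
bravia = "Sony Corporation"
print(classify_vendor(ps5, OLD_RULES))     # tv (the reported bug)
print(classify_vendor(ps5, NEW_RULES))     # gaming
print(classify_vendor(bravia, NEW_RULES))  # tv
```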

Two new tests in `test_device_class.py`:
- `test_sony_interactive_entertainment_signals_gaming_not_tv`
- `test_sony_corporation_still_signals_tv`

Same root cause as the earlier `airplay → tv` mis-class (too
broad a needle wins over a more specific later rule). General
lesson noted in the rules table comments — vendor needles
should be the IEEE registrant's full name fragment, not a
brand-family abbreviation that collides.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ification

Three concerns landed together because they share infrastructure:

1. **Tier A** — Bonjour rows fall back to LAN OUI vendor when their
   own name-pattern + service-hint resolver returned None.
2. **Tier B** — Bonjour detail modal gains a `LAN host` section
   surfacing MAC / OUI vendor / device class / TTL / NBNS / UPnP
   for the LAN row at the same IP.
3. **Apple model code identification** — Bonjour TXT records
   carrying `model=Mac14,2` etc. flow through to `LANHost.bonjour_model`,
   drive a high-priority classifier rule, and render in the LAN
   detail modal's Identity Model row as
   `MacBook Air 13-inch (M2, 2022) (Mac14,2)` via the existing
   `_APPLE_MODELS` table in `mdns_txt_decoders.py`.

User flagged on 2026-05-23 PM: their M2 MacBook Air was classified
as `音箱` (speaker) under the prior fixes (97656cd / e8c0f3a). Same
root cause as the earlier AirPlay-as-tv mis-class: a Bonjour
category was diagnostic of TWO different device classes. A Mac
running with AirPlay receiver enabled publishes `_raop._tcp` →
"AirPlay audio" — the same category my previous "speaker" rule
keyed on. HomePods publish AirPlay audio TOO, but ALSO publish
`HomeKit` (via the HomePodSensor service). HomeKit is the
discriminator.

This is the same trap as `"sony"` matching both Bravia and
PlayStation: a too-broad needle in an earlier rule matched first,
so the more-specific later rule never got to fire.
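A minimal sketch of the HomeKit discriminator, assuming exact (case-insensitive) comparison of whole category strings for brevity. The function name and the Mac category set are illustrative, not taken from `lan_classify.py`.

```python
def airplay_audio_means_speaker(categories):
    """Treat "AirPlay audio" as a speaker signal only when the same
    host also publishes the "HomeKit" category (HomePod discriminator)."""
    cats = {c.lower() for c in categories}
    return "airplay audio" in cats and "homekit" in cats


# Illustrative category sets: a Mac with AirPlay receiver enabled also
# publishes "AirPlay audio" (via _raop._tcp), but no HomeKit.
mac = ("AirPlay", "AirPlay audio", "Apple Companion")
homepod = ("AirPlay", "AirPlay audio", "Apple Companion", "HomeKit")
print(airplay_audio_means_speaker(mac))      # False
print(airplay_audio_means_speaker(homepod))  # True
```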

Classifier changes (`src/diting/lan_classify.py`):

- New `_apple_model_class(host)` maps Apple's hardware product
  code (`Mac14,2`, `AudioAccessory6,1`, `iPhone16,1`, `AppleTV14,1`)
  to laptop / desktop / speaker / phone / tv via the
  `_APPLE_MODEL_PREFIXES` table. Apple's own product code is the
  highest-fidelity signal — it can't disagree with itself.
- `classify()` now applies the Apple-model-code rule BEFORE the
  rules-table walk. Resolves Mac-vs-HomePod ambiguity directly.
- Speaker rule tightened: `AirPlay audio` alone no longer fires;
  must be paired with `HomeKit` (HomePod), or the host must
  match an explicit `_SPEAKER_VENDOR_NEEDLES` brand (Sonos /
  Bose / JBL / Harman / Anker).
- New laptop rule fires on Mac-specific Bonjour categories
  (`Mac` from `_workstation._tcp`, `Screen sharing` from `_rfb._tcp`).
- New Apple-vendor + AirPlay-audio fallback routes Macs without
  Mac / Screen-sharing services to laptop instead of falling
  through to phone via Apple Companion.
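A minimal sketch of an Apple model-code lookup that runs before the rules table. The table contents are an illustrative subset, not the real `_APPLE_MODEL_PREFIXES`: exact codes are checked first because newer `MacNN,N` identifiers (e.g. `Mac14,2`) do not encode laptop vs. desktop in the prefix, then an ordered prefix walk where the first match wins.

```python
# Illustrative subset only; the real table in lan_classify.py differs.
_EXACT_MODEL_CLASSES = {"Mac14,2": "laptop"}  # MacBook Air M2
_MODEL_PREFIX_CLASSES = (
    ("AudioAccessory", "speaker"),  # HomePod family
    ("AppleTV", "tv"),
    ("MacBook", "laptop"),
    ("iMac", "desktop"),
    ("iPhone", "phone"),
)


def apple_model_class(model):
    """Map an Apple hardware product code to a device class, or None
    so the caller falls through to the rules-table walk."""
    if not model:
        return None
    if model in _EXACT_MODEL_CLASSES:
        return _EXACT_MODEL_CLASSES[model]
    for prefix, device_class in _MODEL_PREFIX_CLASSES:
        if model.startswith(prefix):
            return device_class
    return None  # unknown prefix: let the rules table decide


for code in ("Mac14,2", "AudioAccessory6,1", "iPhone16,1", "AppleTV14,1", "Watch7,1"):
    print(code, apple_model_class(code))
```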

State plumbing (`src/diting/lan.py`):

- `_build_bonjour_index` return tuple grows from `(host, services)`
  to `(host, services, apple_model)`. Pulls `dev.txt.get("model")`
  from each BonjourDevice on each IP; first-wins.
- `LANHost` gains `bonjour_model: str | None`.
- `_merge_arp_into_state` consumes the new tuple shape, populates
  `bonjour_model`, then calls `classify` — the Apple model code
  is in scope when the classifier runs.
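A minimal sketch of the 3-tuple index, assuming a hypothetical `BonjourDevice` stand-in with one IP, one category, and a TXT dict per record. Host name and model code are first-wins per IP; categories accumulate.

```python
from dataclasses import dataclass, field


@dataclass
class BonjourDevice:  # hypothetical stand-in for the real record type
    ip: str
    host: str
    category: str
    txt: dict = field(default_factory=dict)


def build_bonjour_index(devices):
    """Map IP -> (host, services, apple_model). The first host name and
    the first non-empty `model` TXT value seen for an IP are kept."""
    index = {}
    for dev in devices:
        host, services, model = index.get(dev.ip, (dev.host, [], None))
        if dev.category not in services:
            services.append(dev.category)
        if model is None:
            model = dev.txt.get("model")
        index[dev.ip] = (host, services, model)
    return index


devs = [
    BonjourDevice("192.168.1.5", "mba.local", "AirPlay", {"model": "Mac14,2"}),
    BonjourDevice("192.168.1.5", "mba.local", "Apple Companion"),
]
print(build_bonjour_index(devs)["192.168.1.5"])
```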

Bonjour → LAN cross-reference (`src/diting/tui.py`):

- `DitingApp._lan_host_at_ip(ip)` / `_lan_index_by_ip()` —
  symmetric helpers to the existing Bonjour-into-LAN enrichment.
- `_bonjour_borrow_vendor(d, lan_lookup)` — Bonjour rows whose
  vendor is None lift the LAN-side OUI vendor for the same
  IPv4. Rendered in dim cyan to mark "borrowed from LAN".
- `BonjourPanel.update_devices` accepts `lan_lookup` and threads
  it through both `_bonjour_row_line` and `_bonjour_by_host_rows`.
- `_refresh_mdns_panel` passes the App's per-render LAN index.
- `BonjourDetailScreen` accepts `lan_host` kwarg and renders a
  new `LAN host` section (Tier B) with MAC, OUI vendor, device
  class, TTL (with gateway-suppression), NBNS name, UPnP
  server / model. `sync_to_app_selection` re-resolves on
  cursor-move.
- `DitingApp._bonjour_lan_host_for(device)` matches by IPv4
  address.
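A minimal sketch of the vendor borrow, assuming `lan_lookup` is a dict keyed by IPv4 string. `LANHostStub` and `BonjourRow` are hypothetical stand-ins; the returned `borrowed` flag is what a renderer would use to pick the dim-cyan style.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LANHostStub:  # hypothetical: only the field this sketch needs
    vendor: Optional[str]


@dataclass
class BonjourRow:  # hypothetical stand-in for a Bonjour device row
    ipv4: Optional[str]
    vendor: Optional[str]


def borrow_vendor(row, lan_lookup):
    """Return (vendor, borrowed). The LAN-side OUI vendor is used only
    when the Bonjour resolver produced no vendor for this row."""
    if row.vendor is not None:
        return row.vendor, False
    host = lan_lookup.get(row.ipv4) if row.ipv4 else None
    if host is not None and host.vendor:
        return host.vendor, True
    return None, False


lan = {"192.168.1.20": LANHostStub(vendor="Espressif")}
print(borrow_vendor(BonjourRow("192.168.1.20", None), lan))  # ('Espressif', True)
```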

LAN detail modal (`src/diting/tui.py`):

- Identity Model row source priority: bonjour_model (via
  `_APPLE_MODELS` friendly-name lookup) → upnp_model → upnp_friendly_name.
  Mac14,2 → `MacBook Air 13-inch (M2, 2022) (Mac14,2)`.
- Unknown model codes still render the raw string so users can
  match Apple's published identifier tables externally.
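A minimal sketch of the Identity Model row priority, assuming a one-entry stand-in for `_APPLE_MODELS` (the real table in `mdns_txt_decoders.py` is much larger). `identity_model` is a hypothetical name.

```python
# One-entry stand-in for the real _APPLE_MODELS table.
_APPLE_MODELS = {"Mac14,2": "MacBook Air 13-inch (M2, 2022)"}


def identity_model(bonjour_model, upnp_model, upnp_friendly_name):
    """bonjour_model (friendly name + raw code) > upnp_model >
    upnp_friendly_name. Unknown Apple codes render raw."""
    if bonjour_model:
        friendly = _APPLE_MODELS.get(bonjour_model)
        return f"{friendly} ({bonjour_model})" if friendly else bonjour_model
    return upnp_model or upnp_friendly_name


print(identity_model("Mac14,2", None, None))
# MacBook Air 13-inch (M2, 2022) (Mac14,2)
```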

i18n (`src/diting/i18n.py`): one new entry `"vendor (OUI)"` for
the Bonjour modal's LAN cross-reference section. `LAN host`
and `class` reused from the LAN modal's existing catalog.

Tests (+22 new, 1041/1041 total):

- `test_device_class.py` (+7):
    - `test_mac_with_airplay_receiver_enabled_signals_laptop_not_speaker`
      — direct regression for the 2026-05-23 PM user-flagged case
    - `test_homepod_airplay_audio_plus_homekit_signals_speaker_not_tv`
      (renamed; HomeKit now required)
    - 6 Apple-model-code tests: laptop / speaker / phone / tv /
      unknown-prefix fall-through / prefix-ordering
- `test_lan.py` (+2): bonjour_index 3-tuple shape verification;
  apple model code extraction from TXT.
- `test_tui_helpers.py` (+2): LAN modal Identity Model row
  prefers bonjour_model + friendly-name resolution; unknown
  codes fall back to raw.

TESTING.md (EN + ZH) extended with 6 new coverage rows for the
classifier changes, model-code path, and Bonjour ↔ LAN
cross-reference.

Tier C — promote pairwise enrichment to a shared host registry —
deferred. Recorded in `project-shared-host-registry` memory note
+ a "Deferred" section in the change's design.md so the next
maintainer can pick it up when a third source (BLE-RPA
correlation, `lan.yaml`, edge-hardware sidecar) needs to join.

Validates: openspec validate expand-lan-identification --strict ✓
           openspec validate --specs --strict (22/22) ✓
           pytest 1041/1041 ✓

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…m rpMd / am TXT keys (no name-based classification)

User flagged 2026-05-23 PM: their `Situs-iPad-Pro-M4` (random-MAC
iPad) was classified as `phone` because the only Bonjour signal
was the catch-all "Apple Companion" category.

Two pieces, one principle.

**Principle (user-reinforced):** device names (`bonjour_name`,
reverse-DNS `hostname`) are user-controllable. A renamed device
must NOT change its class — anything else is a spoofing surface
in an audit tool. So no name-pattern matching.

**Authoritative signal:** Apple Continuity protocols carry the
hardware model identifier in different TXT keys for different
services:

- `_airplay._tcp.local.`        → `model=` (e.g. `Mac14,2`)
- `_companion-link._tcp.local.` → `rpMd=` (e.g. `iPad14,3`)
- `_raop._tcp.local.`           → `am=`   (e.g. `AudioAccessory6,1`)

The random-MAC iPad in the user's network publishes only
`_companion-link`, so the previously-extracted `model` key
missed it. `_bonjour_extract_apple_model` now walks
`("model", "rpMd", "am")` in order, first-wins. Random-MAC
iPads get classified via `rpMd=iPad14,3` in companion-link TXT
without any user-controllable string entering the decision.
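A minimal sketch of the key walk, assuming each service's TXT record is available as a plain `dict`. The helper name is hypothetical; only TXT values, never device names, enter the result.

```python
# Continuity TXT keys carrying Apple hardware model codes, checked in
# order: _airplay (model=), _companion-link (rpMd=), _raop (am=).
_CONTINUITY_MODEL_KEYS = ("model", "rpMd", "am")


def extract_apple_model(txt):
    """Return the first non-empty model code found, else None."""
    for key in _CONTINUITY_MODEL_KEYS:
        value = txt.get(key)
        if value:
            return value
    return None


# A random-MAC iPad that publishes only _companion-link:
print(extract_apple_model({"rpMd": "iPad14,3"}))  # iPad14,3
```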

Changes:

- New `tablet` class in the taxonomy (12 classes total). iPads
  are tablets, not phones — distinct form factor.
- `_APPLE_MODEL_PREFIXES`: `iPad` → `tablet` (was `phone`).
- `src/diting/lan.py`: `_build_bonjour_index` walks Continuity
  TXT keys via `_bonjour_extract_apple_model` helper. Documents
  the per-service-type key conventions in a header comment.
- `src/diting/lan_classify.py`: Apple model-code path runs
  BEFORE the rules table. Deliberate non-rule: name patterns
  removed; replaced with a long header comment explaining the
  audit-tool reasoning.
- i18n: `tablet` → `平板` (EN + ZH).
- README + CHANGELOG (EN + ZH) + spec deltas + design.md
  vocabulary lists updated 11 → 12 classes.

Tests (+10 net new, 1047/1047 total):

- `test_apple_model_ipad_signals_tablet_not_phone` — direct
  regression for the user-flagged case.
- `test_bonjour_name_ipad_pattern_does_NOT_signal_tablet` —
  proves the spoofing surface is closed.
- `test_renamed_homepod_to_macbook_still_classifies_correctly`
  — adversarial: HomeKit-bearing host renamed to "MacBook" stays
  speaker.
- `test_apple_model_code_still_wins_over_misleading_name` —
  authoritative > misleading.
- `test_bonjour_cross_ref_pulls_apple_model_code_from_rpmd_txt`
  + `_from_am_txt` — verify the new TXT-key extraction for
  companion-link and raop services.
- `_VALID_CLASSES` set updated to include `tablet`.

Validates: openspec validate expand-lan-identification --strict ✓
           openspec validate --specs --strict (22/22) ✓
           pytest 1047/1047 ✓

For random-MAC iPads on networks where the user has firewalled
mDNS or the iPad has Continuity disabled, the `_companion-link`
TXT model code won't be visible and we genuinely have no
authoritative signal. The honest answer is the host falls
through to `phone` and the user can see the name in the modal
to decide for themselves — better than a name-based guess that
can be spoofed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@chenchaoyi chenchaoyi merged commit 5a76d00 into main May 23, 2026
5 checks passed
@chenchaoyi chenchaoyi deleted the feature/expand-lan-identification branch May 23, 2026 11:27
chenchaoyi added a commit that referenced this pull request May 23, 2026
Archive of `expand-lan-identification` change after PR #119 merged
to main as commit 5a76d00.

Delta specs applied to canonical specs at `openspec/specs/`:

- **cli** — added `DITING_LAN_PROBE` + `DITING_LAN_UPNP_FETCH` env vars
- **events** — added `LANActiveProbeConsentedEvent` dataclass
- **event-log** — added serialiser + `emit_lan_active_probe_consented` method
- **i18n** — added ~30 EN→ZH strings for the new LAN UX
- **scenes** — `scene_defaults` gained `lan_active_probe` knob
- **tui-shell** — modal-stack list grows with `LANProbeConsentScreen`;
  `LANDetailScreen` expands from 4 to 5 sections (Active discovery
  added, Class / Model / TTL rows added); new LAN row Class column
  + `[new]` chip; new `P` keybinding for public-scene consent
- **lan-inventory** — `_ping_one` / `_sweep` return 3-tuple
  `(reachable, rtt_ms, ttl)`; LANHost gains 8 new fields
  (vendor_raw, nbns_name, upnp_*, ttl, ttl_class, device_class,
  bonjour_model); 7 new requirements (multi-tier OUI, vendor
  normalization, scene-gated active discovery, public-scene
  one-shot override, TTL fingerprint, device classifier,
  consent JSONL event)
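A minimal sketch of the TTL fingerprint, using the ranges from the P3 commit summary (50-64 unix, 100-128 windows, 200-255 router). The ranges sit just below common default initial TTLs (64, 128, 255), since each routed hop decrements the value; the function name is hypothetical.

```python
def ttl_class(ttl):
    """Map an observed ICMP reply TTL to a coarse OS family, or None
    when the TTL is missing or outside every fingerprint range."""
    if ttl is None:
        return None
    if 50 <= ttl <= 64:
        return "unix"
    if 100 <= ttl <= 128:
        return "windows"
    if 200 <= ttl <= 255:
        return "router"
    return None


print([ttl_class(t) for t in (64, 127, 255, 30)])
# ['unix', 'windows', 'router', None]
```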

Change moved to `openspec/changes/archive/2026-05-23-expand-lan-identification/`.

Validates: openspec validate --specs --strict → 22/22 passed.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>