feat: location/entity name-resolution accuracy (fix system-token shadowing + fuzzy/noise matching) by ntatschner · Pull Request #147 · TheCodeSaiyan/StarStats

ntatschner · 2026-06-01T02:09:34Z

Summary

Overhauls entity name-resolution accuracy after auditing the real LIVE Game.log + tray DB against the live wiki catalogue. The headline is a pre-existing bug: ~91% of location events were collapsing to their bare system name.

🔴 Location system-token shadowing (the big one)

The exact per-token catalog loop in classify() matched the leading bare system token (Stanton) against the catalogued system row before it reached the specific body, so OOC_Stanton_2b_Daymar resolved to "Stanton" instead of "Daymar". Fix: skip bare-system tokens in the exact pass and defer them to a second pass that runs after fuzzy matching, so a bare system identifier still resolves with full taxonomy. Measured on a real 43k-event tray DB: specific-place resolution 8.7% → 92.7%.
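A minimal sketch of the two-pass ordering, assuming a simplified catalog that maps lowercase tokens to display names plus a separate set of bare-system tokens. `Catalog`, this `classify` signature, and the `fuzzy` closure parameter are hypothetical stand-ins; the real classify() is not shown in this PR.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical, simplified location catalog used only for this sketch.
pub struct Catalog {
    /// Lowercase token -> display name (systems and specific places alike).
    pub rows: HashMap<String, String>,
    /// Lowercase tokens that name a bare star system, e.g. "stanton".
    pub systems: HashSet<String>,
}

impl Catalog {
    /// Two-pass resolution. `fuzzy` stands in for the fuzzy fallback and
    /// receives the lowercase tokens of the engine id.
    pub fn classify(
        &self,
        raw: &str,
        fuzzy: impl Fn(&[String]) -> Option<String>,
    ) -> Option<String> {
        let tokens: Vec<String> = raw
            .split('_')
            .filter(|t| !t.is_empty())
            .map(|t| t.to_ascii_lowercase())
            .collect();

        // Pass 1: exact per-token lookup that skips bare system tokens, so
        // "stanton" cannot shadow the specific body that follows it.
        for tok in &tokens {
            if self.systems.contains(tok) {
                continue;
            }
            if let Some(name) = self.rows.get(tok) {
                return Some(name.clone());
            }
        }

        // Fuzzy fallback for places the engine names differently.
        if let Some(name) = fuzzy(&tokens[..]) {
            return Some(name);
        }

        // Pass 2: a bare system identifier still resolves to its system row.
        tokens
            .iter()
            .find(|t| self.systems.contains(*t))
            .and_then(|t| self.rows.get(t))
            .cloned()
    }
}
```

With "stanton" and "daymar" both catalogued, OOC_Stanton_2b_Daymar now resolves to "Daymar", while a bare "Stanton" misses pass 1 and the fuzzy step and still lands on the system row in pass 2.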

Distinctive-token fuzzy matcher (`LocationCatalog::fuzzy_match`)

An idf-weighted token-overlap fallback, gated by a rarity anchor (a token with df ≤ 4 that is non-digit and not an affiliation word) plus a system-consistency guard. It recovers real wiki rows that the engine names differently (Stanton4a_RayariHydro_Kaltag → "Rayari Kaltag Research Outpost") while rejecting uncatalogued places that share only an operator/affiliation word (RayariHydro_McGarth). Measured: +18 distinct locations recovered, at the cost of 1 low-volume false positive (Pyro4…→Pyro3, 3 events).
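A sketch of an idf-weighted, anchor-gated fuzzy fallback under stated assumptions: the caller has already split the engine id into lowercase tokens and detected its system. `Row`, `FuzzyIndex`, the smoothed idf formula, and reading "non-digit" as "contains no ASCII digits" are all assumptions; this is not the actual LocationCatalog::fuzzy_match.

```rust
use std::collections::{HashMap, HashSet};

/// Max document frequency for a token to count as a rarity anchor.
const MAX_ANCHOR_DF: usize = 4;

/// Hypothetical catalogue row: display name, system, lowercase name tokens.
pub struct Row {
    pub name: String,
    pub system: String,
    pub tokens: HashSet<String>,
}

impl Row {
    pub fn new(name: &str, system: &str) -> Self {
        Row {
            name: name.to_string(),
            system: system.to_ascii_lowercase(),
            tokens: name.split_whitespace().map(|t| t.to_ascii_lowercase()).collect(),
        }
    }
}

pub struct FuzzyIndex {
    rows: Vec<Row>,
    doc_freq: HashMap<String, usize>, // token -> number of rows containing it
    affiliations: HashSet<String>,    // operator words that may never anchor
}

impl FuzzyIndex {
    pub fn new(rows: Vec<Row>, affiliations: &[&str]) -> Self {
        let mut doc_freq: HashMap<String, usize> = HashMap::new();
        for row in &rows {
            for tok in &row.tokens {
                *doc_freq.entry(tok.clone()).or_insert(0) += 1;
            }
        }
        let affiliations: HashSet<String> =
            affiliations.iter().map(|a| a.to_ascii_lowercase()).collect();
        FuzzyIndex { rows, doc_freq, affiliations }
    }

    fn df(&self, tok: &str) -> usize {
        self.doc_freq.get(tok).copied().unwrap_or(0)
    }

    /// Smoothed idf: rarer tokens contribute more to the overlap score.
    fn idf(&self, tok: &str) -> f64 {
        let n = self.rows.len() as f64;
        (n / (1.0 + self.df(tok) as f64)).ln() + 1.0
    }

    /// Rarity anchor: rare, digit-free, and not an affiliation word.
    fn is_anchor(&self, tok: &str) -> bool {
        let df = self.df(tok);
        df >= 1
            && df <= MAX_ANCHOR_DF
            && !tok.chars().any(|c| c.is_ascii_digit())
            && !self.affiliations.contains(tok)
    }

    /// Best-scoring row that shares an anchor token and (if known) the system.
    pub fn fuzzy_match(&self, query: &[&str], system: Option<&str>) -> Option<&str> {
        let mut best: Option<(f64, &Row)> = None;
        for row in &self.rows {
            // System-consistency guard: never cross into another system.
            if let Some(sys) = system {
                if row.system != sys {
                    continue;
                }
            }
            // Query tokens shared with this row, deduplicated, in query order.
            let mut seen = HashSet::new();
            let shared: Vec<&str> = query
                .iter()
                .copied()
                .filter(|t| row.tokens.contains(*t) && seen.insert(*t))
                .collect();
            // Require at least one distinctive anchor before scoring.
            if !shared.iter().any(|t| self.is_anchor(t)) {
                continue;
            }
            let score: f64 = shared.iter().map(|t| self.idf(t)).sum();
            if best.map_or(true, |(s, _)| score > s) {
                best = Some((score, row));
            }
        }
        best.map(|(_, row)| row.name.as_str())
    }
}
```

In a toy catalogue of four Rayari outposts, "rayari" has df = 4 and would pass the df ≤ 4 test on its own; only the affiliation exclusion stops a RayariHydro_McGarth-style query from latching onto an arbitrary Rayari row, while "kaltag" (df = 1) still anchors the correct match.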

Noise classification

match_dynamic_marker + match_procedural_node give honest generic labels to procedural/dynamic engine ids (ab_mine_*, *.socpak clusters, mission/nav markers) instead of title-casing them into fake proper-noun places.
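A sketch of the noise-classification idea: recognise procedural or dynamic engine ids before any catalog lookup and return a generic label instead of title-casing them. `noise_label`, `procedural_node_label`, `dynamic_marker_label`, the `NoiseLabel` variants and their display strings, and the mission/nav marker prefixes are hypothetical; only the ab_mine/ab_collector and *.socpak shapes come from this PR.

```rust
/// Generic labels for engine ids that are not real places.
/// Variants and display strings are hypothetical.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NoiseLabel {
    MiningArea,
    Collector,
    ProceduralCluster,
    DynamicMarker,
}

impl NoiseLabel {
    pub fn label(self) -> &'static str {
        match self {
            NoiseLabel::MiningArea => "Mining area",
            NoiseLabel::Collector => "Collector",
            NoiseLabel::ProceduralCluster => "Procedural cluster",
            NoiseLabel::DynamicMarker => "Mission/nav marker",
        }
    }
}

/// Procedurally generated nodes: ab_mine*, ab_collector*, *.socpak.
pub fn procedural_node_label(raw: &str) -> Option<NoiseLabel> {
    let id = raw.to_ascii_lowercase();
    if id.starts_with("ab_mine") {
        Some(NoiseLabel::MiningArea)
    } else if id.starts_with("ab_collector") {
        Some(NoiseLabel::Collector)
    } else if id.ends_with(".socpak") {
        Some(NoiseLabel::ProceduralCluster)
    } else {
        None
    }
}

/// Mission/nav markers. These prefixes are assumptions, not the engine's.
pub fn dynamic_marker_label(raw: &str) -> Option<NoiseLabel> {
    const MARKER_PREFIXES: [&str; 2] = ["mission_marker", "nav_marker"];
    let id = raw.to_ascii_lowercase();
    if MARKER_PREFIXES.iter().any(|p| id.starts_with(*p)) {
        Some(NoiseLabel::DynamicMarker)
    } else {
        None
    }
}

/// Run the noise matchers first; `None` means the id may be a real place
/// and should continue on to catalog lookup.
pub fn noise_label(raw: &str) -> Option<NoiseLabel> {
    dynamic_marker_label(raw).or_else(|| procedural_node_label(raw))
}
```

Running these matchers ahead of the title-casing fallback is what keeps an id like ab_mine_0042 from surfacing as a fake proper noun such as "Ab Mine 0042".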

Vehicle/item variant-suffix strip + item noise filter (web + tray)

resolveReferenceEntry() does an exact case-insensitive lookup, then strips loaner suffixes (_Teach) so ARGO_MOLE_Teach → ARGO_MOLE (vehicle resolution ~93% → ~100%). For items, it skips avatar/structural classes (Default, Head_*, body_*, …) so the attachment_received firehose stops rendering body parts as linkable items. The same logic is mirrored into the tray (findEntityInBundles + TrayEntityLink).
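The resolution order can be sketched in Rust for consistency with the other examples, although the real resolveReferenceEntry() is TypeScript in apps/web and the tray. The sketch assumes a catalog keyed by lowercased class name; `resolve_reference_entry`, `Kind`, and matching the structural-class patterns case-insensitively are illustrative assumptions.

```rust
use std::collections::HashMap;

/// Which kind of reference is being resolved (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Kind {
    Vehicle,
    Item,
}

/// Loaner/variant suffixes stripped on a miss (lowercase).
const VARIANT_SUFFIXES: [&str; 2] = ["_teach", "_loaner"];

/// Avatar/structural item classes: Default, Head_*, body_*, Shared_Scalp_*,
/// *LensDisplay*. Matched case-insensitively here (an assumption).
fn is_structural_item(class: &str) -> bool {
    let c = class.to_ascii_lowercase();
    c == "default"
        || ["head_", "body_", "shared_scalp_"].iter().any(|p| c.starts_with(*p))
        || c.contains("lensdisplay")
}

/// Sketch of the resolution order; `catalog` keys are lowercased class names.
pub fn resolve_reference_entry<'a>(
    catalog: &'a HashMap<String, String>,
    kind: Kind,
    class: &str,
) -> Option<&'a str> {
    // Items: suppress body parts and other structural attachments outright.
    if kind == Kind::Item && is_structural_item(class) {
        return None;
    }
    // 1. Exact, case-insensitive lookup.
    let key = class.to_ascii_lowercase();
    if let Some(name) = catalog.get(&key) {
        return Some(name.as_str());
    }
    // 2. Strip a loaner suffix and retry against the base class.
    let base = VARIANT_SUFFIXES.iter().find_map(|s| key.strip_suffix(*s))?;
    catalog.get(base).map(|name| name.as_str())
}
```

With only argo_mole catalogued, ARGO_MOLE_Teach misses the exact lookup and resolves through the stripped base key, while an item class such as Head_Male_01 is suppressed before any lookup. Keeping the suppression item-only means a vehicle class is never hidden by the avatar filter.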

Test plan

starstats-core: 289 tests (new fuzzy/noise/shadow tests), cargo fmt --check + clippy -D warnings clean
apps/web: 39 tests + tsc --noEmit clean
apps/tray-ui: 170 tests + tsc --noEmit clean
workspace cargo check (server + client) green
validated fuzzy precision/recall against the live wiki API + real tray DB

Out of scope

Combat events (actor_death/vehicle_destruction) are absent from Game.log at default verbosity (confirmed zero across the live log + 40 archives). This is a matter of the game's logging CVars, not a code issue.

No migrations — pure query-time resolution, honoring the "derive classification at query time" invariant.

Roadmap-Item: location-entity-name-resolution-accuracy

…assification

The exact per-token catalog loop matched the bare system token ("Stanton") against the catalogued system row before reaching the specific body that follows it, collapsing ~91% of real location events to their system name (OOC_Stanton_2b_Daymar -> "Stanton"). Defer bare-system tokens to a second pass so the specific body/place resolves first; a bare system identifier still hits the system row.

Add a distinctive-token fuzzy fallback (LocationCatalog::fuzzy_match): idf-weighted token overlap, gated by a rarity anchor (df <= 4, non-digit, not an affiliation word) plus a system-consistency guard. Recovers real wiki rows the engine names differently (Stanton4a_RayariHydro_Kaltag -> "Rayari Kaltag Research Outpost") while rejecting uncatalogued places that only share an operator word (RayariHydro_McGarth).

Add noise matchers (match_dynamic_marker, match_procedural_node) for procedural/dynamic engine identifiers (ab_mine/ab_collector, *.socpak clusters, mission/nav markers) so they get honest generic labels instead of being title-cased into fake proper-noun places.

Measured on a real 43k-event tray DB: specific-place resolution 8.7% -> 92.7%; +18 distinct locations recovered by fuzzy.

…solution

Add resolveReferenceEntry(): exact case-insensitive lookup, then strip loaner variant suffixes (_Teach/_loaner) so ARGO_MOLE_Teach resolves to the catalogued ARGO_MOLE (vehicles ~93% -> ~100%). For items, skip avatar/structural classes (Default, Head_*, body_*, Shared_Scalp_*, *LensDisplay*, ...) so the attachment_received firehose stops rendering body parts as linkable items.

Add isCosmeticItemPort() for port-based suppression. Wire EntityLink (web) + TrayEntityLink and findEntityInBundles (tray) through the shared resolver; keep the two mirrors in sync.

Nigel Tatschner added 2 commits June 1, 2026 02:30

ntatschner added the roadmap/location-entity-name-resolution-accuracy label (PR ships work on roadmap item location-entity-name-resolution-accuracy) Jun 1, 2026

ntatschner merged commit 3d64c64 into next Jun 1, 2026
11 checks passed

ntatschner deleted the feat/advanced-entity-matching branch June 1, 2026 02:33
