
feat: location/entity name-resolution accuracy (fix system-token shadowing + fuzzy/noise matching)#147

Merged: ntatschner merged 2 commits into next from feat/advanced-entity-matching on Jun 1, 2026

@ntatschner
Collaborator

Summary

Overhauls entity name-resolution accuracy after auditing the real LIVE Game.log + tray DB against the live wiki catalogue. The headline is a pre-existing bug: ~91% of location events were collapsing to their bare system name.

🔴 Location system-token shadowing (the big one)

The exact per-token catalog loop in classify() matched the leading bare system token (Stanton) against the catalogued system row before reaching the specific body, so OOC_Stanton_2b_Daymar resolved to "Stanton", not "Daymar". Measured on a real 43k-event tray DB: specific-place resolution 8.7% → 92.7%. Fix: skip bare-system tokens in the exact pass, defer them to a second pass after fuzzy so a bare system identifier still resolves with full taxonomy.
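The deferral described above can be sketched as follows. This is a minimal illustration, not the PR's actual `classify()`: the `Kind` and `Row` types, the `demo_catalog` helper, the token-keyed `HashMap`, and the underscore-only tokenizer are all assumptions made for the example, and the fuzzy pass that sits between the two exact passes is only marked by a comment.

```rust
use std::collections::HashMap;

/// Hypothetical row kinds; the real catalogue schema is not shown in this PR.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Kind {
    System,
    Place,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Row {
    pub name: &'static str,
    pub kind: Kind,
}

/// Tiny demo catalogue keyed by lowercase token (illustrative data only).
pub fn demo_catalog() -> HashMap<String, Row> {
    let mut catalog = HashMap::new();
    catalog.insert("stanton".to_string(), Row { name: "Stanton", kind: Kind::System });
    catalog.insert("daymar".to_string(), Row { name: "Daymar", kind: Kind::Place });
    catalog
}

/// Resolve an engine id by exact token lookup, preferring specific places.
pub fn classify(raw: &str, catalog: &HashMap<String, Row>) -> Option<Row> {
    let tokens: Vec<String> = raw
        .split('_')
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect();

    // Exact lookup restricted to one row kind at a time.
    let exact = |kind: Kind| {
        tokens
            .iter()
            .filter_map(|t| catalog.get(t))
            .find(|row| row.kind == kind)
            .cloned()
    };

    // Pass 1: specific places only; bare system tokens are skipped here.
    // (In the PR, the fuzzy fallback runs between the two passes.)
    // Pass 2: a bare system token still resolves to its system row.
    exact(Kind::Place).or_else(|| exact(Kind::System))
}
```

With this ordering, `classify("OOC_Stanton_2b_Daymar", &demo_catalog())` yields the Daymar row, while `classify("OOC_Stanton", &demo_catalog())` still falls through to the Stanton system row.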

Distinctive-token fuzzy matcher (LocationCatalog::fuzzy_match)

idf-weighted token-overlap fallback, gated by a rarity anchor (df ≤ 4, non-digit, non-affiliation) + a system-consistency guard. Recovers real wiki rows the engine names differently (Stanton4a_RayariHydro_Kaltag → "Rayari Kaltag Research Outpost") while rejecting uncatalogued places that only share an operator/affiliation word (RayariHydro_McGarth). Measured: +18 distinct locations recovered, 1 low-volume FP (Pyro4…→Pyro3, 3 events).
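A rough sketch of the gating logic, under stated assumptions: the `Row` and `Catalog` types, the tokenizer rules, the smoothed IDF formula, and the reading of "non-digit" as "contains no ASCII digit" are all guesses made for illustration. Only the `df ≤ 4` threshold, the affiliation exclusion, and the system-consistency guard come from the description above, and the real `LocationCatalog::fuzzy_match` may differ in every other detail.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical catalogue row: display name plus owning system.
pub struct Row {
    pub name: &'static str,
    pub system: &'static str,
}

pub struct Catalog {
    rows: Vec<(Row, HashSet<String>)>,
    df: HashMap<String, usize>,
    affiliations: HashSet<String>,
}

/// Lowercase tokens, split on non-alphanumerics, camelCase humps and letter-to-digit edges.
pub fn tokenize(s: &str) -> Vec<String> {
    let mut out = Vec::new();
    let mut cur = String::new();
    let mut prev: Option<char> = None;
    for c in s.chars() {
        let boundary = prev.is_some_and(|p| {
            !c.is_alphanumeric()
                || (p.is_lowercase() && c.is_uppercase())
                || (p.is_alphabetic() && c.is_ascii_digit())
        });
        if boundary && !cur.is_empty() {
            out.push(std::mem::take(&mut cur));
        }
        if c.is_alphanumeric() {
            cur.extend(c.to_lowercase());
        }
        prev = Some(c);
    }
    if !cur.is_empty() {
        out.push(cur);
    }
    out
}

impl Catalog {
    pub fn new(rows: Vec<Row>, affiliations: &[&str]) -> Self {
        let mut df: HashMap<String, usize> = HashMap::new();
        let mut indexed = Vec::new();
        for row in rows {
            let toks: HashSet<String> = tokenize(row.name).into_iter().collect();
            for t in &toks {
                *df.entry(t.clone()).or_insert(0) += 1;
            }
            indexed.push((row, toks));
        }
        let affiliations = affiliations.iter().map(|a| a.to_lowercase()).collect();
        Catalog { rows: indexed, df, affiliations }
    }

    /// Smoothed inverse document frequency (an assumed formula).
    fn idf(&self, tok: &str) -> f64 {
        let n = self.rows.len() as f64;
        let df = *self.df.get(tok).unwrap_or(&0) as f64;
        ((n + 1.0) / (df + 1.0)).ln() + 1.0
    }

    /// Rarity anchor: catalogued, df <= 4, no digits, not an affiliation word.
    fn is_anchor(&self, tok: &str) -> bool {
        self.df.get(tok).is_some_and(|&d| d <= 4)
            && !tok.chars().any(|c| c.is_ascii_digit())
            && !self.affiliations.contains(tok)
    }

    pub fn fuzzy_match(&self, raw: &str) -> Option<&'static str> {
        let query: HashSet<String> = tokenize(raw).into_iter().collect();
        // System-consistency guard: systems the raw id names explicitly.
        let named: HashSet<String> = self
            .rows
            .iter()
            .map(|(row, _)| row.system.to_lowercase())
            .filter(|sys| query.contains(sys))
            .collect();
        let mut best: Option<(f64, &'static str)> = None;
        for (row, toks) in &self.rows {
            if !named.is_empty() && !named.contains(&row.system.to_lowercase()) {
                continue;
            }
            let shared: Vec<&String> = toks.intersection(&query).collect();
            if !shared.iter().any(|t| self.is_anchor(t)) {
                continue;
            }
            let score: f64 = shared.iter().map(|t| self.idf(t)).sum();
            if best.map_or(true, |(top, _)| score > top) {
                best = Some((score, row.name));
            }
        }
        best.map(|(_, name)| name)
    }
}
```

Given a catalogue containing "Rayari Kaltag Research Outpost" in Stanton and `rayari` registered as an affiliation word, `Stanton4a_RayariHydro_Kaltag` matches through the `kaltag` anchor, whereas `RayariHydro_McGarth` shares only the affiliation word and is rejected.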

Noise classification

match_dynamic_marker + match_procedural_node give honest generic labels to procedural/dynamic engine ids (ab_mine_*, *.socpak clusters, mission/nav markers) instead of title-casing them into fake proper-noun places.
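The idea can be illustrated with a small sketch. The function names mirror the PR, but the bodies are assumptions: the label strings are invented, the `mission_marker` and `nav_marker` prefixes are placeholders (the real marker patterns are not shown here), and `title_case` and `label` are hypothetical helpers standing in for the previous fallback. The `ab_collector` prefix is taken from the commit message further down.

```rust
/// Generic label for procedurally generated nodes, or None for anything else.
pub fn match_procedural_node(raw: &str) -> Option<&'static str> {
    let id = raw.to_ascii_lowercase();
    if id.starts_with("ab_mine") {
        Some("Mining site")
    } else if id.starts_with("ab_collector") {
        Some("Collector site")
    } else if id.ends_with(".socpak") {
        Some("Procedural cluster")
    } else {
        None
    }
}

/// Generic label for dynamic mission/nav markers (placeholder prefixes).
pub fn match_dynamic_marker(raw: &str) -> Option<&'static str> {
    let id = raw.to_ascii_lowercase();
    if id.starts_with("mission_marker") {
        Some("Mission marker")
    } else if id.starts_with("nav_marker") {
        Some("Navigation marker")
    } else {
        None
    }
}

/// Naive title-casing fallback that the noise matchers are meant to pre-empt.
pub fn title_case(raw: &str) -> String {
    raw.split('_')
        .filter(|word| !word.is_empty())
        .map(|word| {
            let mut chars = word.chars();
            match chars.next() {
                Some(first) => first.to_uppercase().collect::<String>() + chars.as_str(),
                None => String::new(),
            }
        })
        .collect::<Vec<_>>()
        .join(" ")
}

/// Try the noise matchers first; only title-case ids that match neither.
pub fn label(raw: &str) -> String {
    match_dynamic_marker(raw)
        .or_else(|| match_procedural_node(raw))
        .map(str::to_string)
        .unwrap_or_else(|| title_case(raw))
}
```

Without the matchers, `title_case("ab_mine_03")` produces "Ab Mine 03", which reads like a named place; with them, `label("ab_mine_03")` returns the generic "Mining site".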

Vehicle/item variant-suffix strip + item noise filter (web + tray)

resolveReferenceEntry(): exact lookup, then strip loaner suffixes (_Teach) so ARGO_MOLE_Teach → ARGO_MOLE (vehicles ~93% → ~100%); skip avatar/structural item classes (Default, Head_*, body_*, …) so the attachment_received firehose stops rendering body parts as linkable items. Mirrored into the tray (findEntityInBundles + TrayEntityLink).
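The lookup order can be sketched as below. The PR's `resolveReferenceEntry()` lives in the web and tray front ends; this version is written in Rust only to stay consistent with the other sketches in this description, and it is an assumption-laden illustration rather than the shipped code. The `resolve_reference_entry` and `is_noise_item` names, the lowercase-keyed `HashMap`, and the shortened class list are hypothetical; the `_loaner` suffix is taken from the commit message further down.

```rust
use std::collections::HashMap;

/// Variant suffixes stripped after an exact miss, compared in lowercase.
const VARIANT_SUFFIXES: [&str; 2] = ["_teach", "_loaner"];

/// True for avatar/structural item classes that should not render as links.
/// Only a few of the classes listed in the PR are covered here.
pub fn is_noise_item(class: &str) -> bool {
    let c = class.to_ascii_lowercase();
    c == "default" || c.starts_with("head_") || c.starts_with("body_")
}

/// Exact case-insensitive lookup first, then retry with a variant suffix removed.
/// `catalog` maps a lowercase class name to its catalogued entry (assumed shape).
pub fn resolve_reference_entry<'a>(
    class: &str,
    catalog: &'a HashMap<String, String>,
) -> Option<&'a str> {
    let key = class.to_ascii_lowercase();
    if let Some(hit) = catalog.get(&key) {
        return Some(hit.as_str());
    }
    for suffix in VARIANT_SUFFIXES {
        if let Some(base) = key.strip_suffix(suffix) {
            if let Some(hit) = catalog.get(base) {
                return Some(hit.as_str());
            }
        }
    }
    None
}
```

Trying the exact match before stripping means a class that is itself catalogued with a `_Teach` suffix would keep its own entry rather than collapsing onto the base vehicle.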

Test plan

  • starstats-core: 289 tests (new fuzzy/noise/shadow tests), cargo fmt --check + clippy -D warnings clean
  • apps/web: 39 tests + tsc --noEmit clean
  • apps/tray-ui: 170 tests + tsc --noEmit clean
  • workspace cargo check (server + client) green
  • validated fuzzy precision/recall against the live wiki API + real tray DB

Out of scope

Combat events (actor_death/vehicle_destruction) are absent from Game.log at default verbosity (confirmed zero across the live log + 40 archives) — a game log-CVar matter, not a code issue.

No migrations — pure query-time resolution, honoring the "derive classification at query time" invariant.

Roadmap-Item: location-entity-name-resolution-accuracy

Nigel Tatschner added 2 commits June 1, 2026 02:30
…assification

The exact per-token catalog loop matched the bare system token
("Stanton") against the catalogued system row before reaching the
specific body that follows it, collapsing ~91% of real location
events to their system name (OOC_Stanton_2b_Daymar -> "Stanton").
Defer bare-system tokens to a second pass so the specific
body/place resolves first; a bare system identifier still hits the
system row.

Add a distinctive-token fuzzy fallback (LocationCatalog::fuzzy_match):
idf-weighted token overlap, gated by a rarity anchor (df <= 4,
non-digit, not an affiliation word) plus a system-consistency guard.
Recovers real wiki rows the engine names differently
(Stanton4a_RayariHydro_Kaltag -> "Rayari Kaltag Research Outpost")
while rejecting uncatalogued places that only share an operator word
(RayariHydro_McGarth).

Add noise matchers (match_dynamic_marker, match_procedural_node) for
procedural/dynamic engine identifiers (ab_mine/ab_collector, *.socpak
clusters, mission/nav markers) so they get honest generic labels
instead of being title-cased into fake proper-noun places.

Measured on a real 43k-event tray DB: specific-place resolution
8.7% -> 92.7%; +18 distinct locations recovered by fuzzy.
…solution

Add resolveReferenceEntry(): exact case-insensitive lookup, then
strip loaner variant suffixes (_Teach/_loaner) so ARGO_MOLE_Teach
resolves to the catalogued ARGO_MOLE (vehicles ~93% -> ~100%).
For items, skip avatar/structural classes (Default, Head_*, body_*,
Shared_Scalp_*, *LensDisplay*, ...) so the attachment_received
firehose stops rendering body parts as linkable items. Add
isCosmeticItemPort() for port-based suppression.

Wire EntityLink (web) + TrayEntityLink and findEntityInBundles
(tray) through the shared resolver; keep the two mirrors in sync.
@ntatschner added the roadmap/location-entity-name-resolution-accuracy label (PR ships work on roadmap item location-entity-name-resolution-accuracy) Jun 1, 2026
@ntatschner merged commit 3d64c64 into next Jun 1, 2026
11 checks passed
@ntatschner deleted the feat/advanced-entity-matching branch June 1, 2026 02:33