
Generate the explicit MEOS object model (class lattice, methods, error contract)#10

Open
estebanzimanyi wants to merge 1 commit into master from feat/object-model

estebanzimanyi commented May 18, 2026

What

Makes the implicit MEOS object model explicit as a first-class
pipeline stage, so every binding/engine (PyMEOS, JMEOS, MEOS.NET,
MobilityDuck, MobilitySpark, …) derives the identical class hierarchy
and methods from one mapping instead of re-curating the C prefix
convention by hand.

MEOS is written in C and has no classes. The object model is encoded by
convention in (1) the Temporal/TInstant/TSequence/TSequenceSet struct
family (the template axis), (2) the temptype discriminator, whose base
type is the missing template parameter (the type-family axis), and (3) the
function-name prefixes that bind a function to the class it is a method of.

How (mirrors the PR #8 portable-aliases pattern)

  • meta/object-model.json — curated source of truth: the
    single-inheritance lattice (Temporal → TAlpha/TNumber/TSpatial →
    leaves), the geometry/geodetic trait axis, the Box and
    Collection companion hierarchies required to type the closed
    algebra, the errorCode contract, and the irregularity worklist.
  • Reconciled with the authoritative MobilityDB manual (Ch. 7,
    Figure 7.1 "Hierarchy of spatiotemporal types", doc/images/tspatial.svg):
    TSpatial → TGeo → {TGeometry, TGeography, TPoint → {TGeomPoint, TGeogPoint}}, with TCbuffer/TNpoint/TPose/TRGeometry under
    TSpatial. TGeo is the broad parent of all PostGIS-derived
    types (equivalent to the C tgeo_type_all predicate); TPoint is added as an
    API-level intermediate under TGeo (the figure omits it, but the
    tpoint_* family must bind to a class). Class names use the manual's
    spelling (TGeo, TNpoint, TCbuffer, TPose, TRGeometry); C
    prefixes are unchanged. A test gates the model against the figure so the
    reconciliation cannot regress.
  • parser/object_model.py + run.py stage 3: derive the class
    tree and assign every catalog function to the class it is a method of
    by longest-prefix match (the function is its own backing symbol, so
    equivalence holds by construction). idl["objectModel"] is additive;
    functions[] is untouched. Per-function raised errorCodes are derived by a
    static scan of the MobilityDB sources (direct meos_error calls plus one
    level of ensure_* indirection), with an explicit source-unavailable
    marker when the sources are absent.
  • object_model_parity.py — audits the derived lattice against the
    PyMEOS factory oracle (parsed, never hard-coded).
  • tests: a drift gate re-derives every membership set from the
    MobilityDB predicates / MEOS_TEMPTYPE_CATALOG / enums (TGeo is gated
    against tgeo_type_all); a manual-figure gate; and a parity gate
    (every divergence has a stated correction). Run with
    python3 tests/test_*.py; neither pytest nor libclang is needed.
  • setup.py also fetches meos/src; documentation in docs/object-model.md and the README.
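A minimal sketch of the longest-prefix assignment described above. The prefix table is a small illustrative subset built from the prefixes and class names quoted in this PR (the real mapping lives in meta/object-model.json), and assign_class / group_methods are hypothetical helpers, not the parser/object_model.py API.

```python
from typing import Optional

# Illustrative subset of C-prefix -> class bindings, using prefixes and
# class names quoted in this PR; the real table is meta/object-model.json.
PREFIX_TO_CLASS: dict[str, str] = {
    "temporal_": "Temporal",
    "tnumber_": "TNumber",
    "tgeo_": "TGeo",
    "tpoint_": "TPoint",
    "tgeometry_": "TGeometry",
    "tcbuffer_": "TCbuffer",
    "tnpoint_": "TNpoint",
    "trgeo_": "TRGeometry",
}


def assign_class(func_name: str,
                 prefixes: dict[str, str] = PREFIX_TO_CLASS) -> Optional[str]:
    """Return the class whose prefix is the longest match for func_name.

    Names with no matching prefix (cross-type operators, datum_ helpers,
    runtime) return None: they are not methods of any class.
    """
    best: Optional[str] = None
    best_len = 0
    for prefix, cls in prefixes.items():
        if func_name.startswith(prefix) and len(prefix) > best_len:
            best, best_len = cls, len(prefix)
    return best


def group_methods(func_names: list[str]) -> dict[str, list[str]]:
    """Group function names by owning class; unmatched names are skipped."""
    groups: dict[str, list[str]] = {}
    for name in func_names:
        cls = assign_class(name)
        if cls is not None:
            groups.setdefault(cls, []).append(name)
    return groups
```

Longest-prefix (rather than first-match) makes the result independent of table order when one prefix extends another.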

Live result

2672 functions → 72 classes; every class-method prefix is 100%
classified (temporal_ 137/137, tnumber_ 32/32, tgeo_ 28/28,
tpoint_ 25/25, tgeometry_ 5/5, tcbuffer_ 14/14, tnpoint_ 20/20,
trgeo_ 37/37, and likewise for all leaves/subtypes). The remaining
functions are cross-type operators (the portable-aliases / RFC-920
domain), base PostGIS/datum_ helpers, and runtime functions. The
object-model and portable-aliases models therefore cleanly partition the
API into methods and operators.

Parity vs PyMEOS: 18 concrete classes aligned, 41 divergences, 0
needing correction (all are explained by curated corrections). Error
scan: 359 functions have a derived raises set. All 16 + 5 tests are
green (including the drift, manual-figure and parity gates).

Irregularities surfaced (corrections worklist, report-only)

MEOS:
  • OM-M1: the class TGeo is broad per the manual, but the C tgeo_type() / tgeo_* API is narrower and rejects points.
  • OM-M2: tgeometry_type() is a misnomer.
  • OM-M3: TRGeometry's base type carries the T_POSE name.
  • OM-M4: talpha_type has no class name.
  • OM-M5: tpose_to_tpoint rename strain.
  • OM-M6: manual Figure 7.1 is partial (spatial-only, no TPoint); the model is the superset.
  • OM-M7: tpcpoint/tpcpatch are planned but absent from master MEOS and from the figure; they stay out of the drift-gated source of truth until MEOS adds them, and are derived automatically once present.
PyMEOS:
  • OM-P1/P3/P4/P5/P6/P7: missing 6 leaves, the full Collection hierarchy and the TSpatial/TGeo abstracts; dummy constructors; shapely-base inconsistency; ad-hoc bearing().

Fixes will land as separate PRs in those repositories, each handled by its own session.

Notes

  • Extends objectModel.dispatch to the complete editorial-member set, transcribed verbatim from PyMEOS RFC #94 §7 (AST-extracted 1:1 from the hand-written oracle; do not re-derive, per §6). geo += minus, nearest_approach_distance (single-block).
  • temporal is restructured to the adopted per-concrete contract dispatch.temporal.{tfloat,tint,tbool,ttext}.<member>, with always_/ever_(not_)equal, temporal_(not_)equal, at and minus. tbool has only temporal_(not_)equal/at/minus, and there is no temporal distance because none exists in the oracle.
  • Schema additions: a scalarType field, argTransform += textsetMake, py:"list[str]"; the attach_object_model passthrough is unchanged.
  • Verified: JSON valid, non-dispatch facets byte-identical, the generator emits the new dispatch, and zero #10 test regressions vs the prior tip (the same 20 tests pass; the lone parity failure is the pre-existing, independent cross-file test-ordering issue).
estebanzimanyi added a commit that referenced this pull request May 19, 2026
Makes ecosystem-wide 100% parity provable at a single point instead of
asserting it from each PR being green in isolation.
meta/integration-train.json holds the PR dependency DAG, the per-wave
gates and the merge order; verify-train.sh composes the train and runs
each wave's gate (PASS only when the gate was just run green here;
otherwise BLOCKED, naming the exact gate it needs). This operationalizes
MobilityDB discussion #895 (wave-based merge plan). Stacked on
feat/object-model (the catalog anchor): Wave 0 verifies here (2699 fns,
PR #10 21/21, from_mfjson + constructors uniform); Waves 1-3 are
explicitly gated on the MEOS-1.4 bump (the single universal unblock).
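A minimal sketch of the train logic this commit describes, assuming a hypothetical shape for meta/integration-train.json (a map from each PR to the PRs it is stacked on, plus a list of gate names per wave). merge_order and wave_status are illustrative helpers, not the contents of verify-train.sh; the sketch uses Python's standard graphlib.TopologicalSorter (3.9+), which raises CycleError if the dependencies do not form a DAG.

```python
from graphlib import TopologicalSorter


def merge_order(deps: dict[str, set[str]]) -> list[str]:
    """Order PRs so each one comes after every PR it is stacked on.

    deps maps a PR to its predecessors; graphlib raises CycleError
    if the dependency graph is not a DAG.
    """
    return list(TopologicalSorter(deps).static_order())


def wave_status(gates: list[str], just_run_green: set[str]) -> tuple[str, list[str]]:
    """PASS only if every gate of the wave was just run green here;
    otherwise BLOCKED, listing the gates that are still needed."""
    missing = [gate for gate in gates if gate not in just_run_green]
    return ("BLOCKED", missing) if missing else ("PASS", [])


# Hypothetical train: two stacked PRs on top of the catalog anchor.
DEPS: dict[str, set[str]] = {
    "feat/object-model": set(),
    "pr-wave1": {"feat/object-model"},
    "pr-wave2": {"pr-wave1"},
}
```

Reporting the missing gates, rather than a bare failure, is what lets a BLOCKED wave name the exact unblock it is waiting on.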
estebanzimanyi added a commit to estebanzimanyi/MEOS-API that referenced this pull request May 21, 2026
@estebanzimanyi

Coordination note from a sibling session: a proposal for a third per-function facet in this PR's object model.

objectModel.dispatch (D1, on this PR's feat/object-model branch) and shape.nullable (D2, on PR #2) already classify each public MEOS function by its dispatch routing and by argument/return nullability, respectively. This proposes a sibling facet, objectModel.streamingSemantics, that classifies each function by its streaming-execution semantics, so that per-engine bindings (MobilityFlink, MobilityKafka, MobilityNebula, and future ones) can generate tier-appropriate wrappers mechanically.

Grounding: a v4 mechanical classifier (~250 LOC, deterministic) has already been applied to the catalog this PR emits. It classifies 2,240 public-API functions into 6 streaming tiers with 97.4% mechanical coverage (the remaining 2.6%, 59 functions, are ambiguous). Its output already drives a working codegen wedge on MobilityFlink and MobilityKafka (one open PR each; 57 generated Java classes with 2,097 methods compile cleanly against JMEOS PR #19).

This RFC formalizes the classifier as a catalog facet so every binding consumes the same tier assignments. It also resolves, per function, the 59 ambiguous corner cases and 2 streaming-semantic nuances, so the facet ships complete with no ambiguity remaining.


Motivation (one paragraph)

The MEOS public API has 2,240 functions. The first per-engine streaming binding (MobilityFlink) currently hand-wires 10 distinct operators to support the 9 BerlinMOD benchmark queries, i.e. 0.4% API coverage by operator. A mechanical classification (the v4 streaming-relevance baseline: deterministic name, signature and role rules) lifts that ceiling to 97.4% coverage (87% streamable, 9.7% I/O-meta, 0.6% explicitly marked non-streamable), leaving 2.6% (59 functions) ambiguous under the mechanical rules. This RFC adopts the v4 baseline as the catalog's objectModel.streamingSemantics facet and closes the 59 ambiguous cases by per-function override.

Vocabulary

Closed 7-value set. Every catalog function has exactly one tier; public functions use the first six, and internal marks functions outside the public binding surface.

| Tier | Meaning | Default per-engine wiring |
|---|---|---|
| stateless | Pure per-event, no state | Scalar UDF |
| bounded-state | Per-event with bounded per-key state (MEOS handle) | Scalar UDF (state lives in MEOS handle) |
| windowed | Output cardinality changes; needs window | Windowed AggregateFunction |
| cross-stream | Pairwise across two streams; needs interval-overlap join | Join + windowed UDAF |
| sequence-only | Requires the full sequence offline; not streamable | Explicit "not-streamable" marker per engine |
| io-meta | I/O, type catalog, MEOS lifecycle | Per-engine helper / format clause |
| internal | Not in the public binding surface | Excluded |
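A minimal sketch of how a binding might pin this closed vocabulary in code, so an unknown tier string fails loudly instead of falling through to a default wrapper. StreamingTier, DEFAULT_WIRING and parse_tier are hypothetical names; the wiring strings are copied from the table.

```python
from enum import Enum


class StreamingTier(str, Enum):
    """Closed vocabulary for objectModel.streamingSemantics.tier."""
    STATELESS = "stateless"
    BOUNDED_STATE = "bounded-state"
    WINDOWED = "windowed"
    CROSS_STREAM = "cross-stream"
    SEQUENCE_ONLY = "sequence-only"
    IO_META = "io-meta"
    INTERNAL = "internal"


# Default per-engine wiring, transcribed from the vocabulary table.
DEFAULT_WIRING: dict[StreamingTier, str] = {
    StreamingTier.STATELESS: "Scalar UDF",
    StreamingTier.BOUNDED_STATE: "Scalar UDF (state lives in MEOS handle)",
    StreamingTier.WINDOWED: "Windowed AggregateFunction",
    StreamingTier.CROSS_STREAM: "Join + windowed UDAF",
    StreamingTier.SEQUENCE_ONLY: 'Explicit "not-streamable" marker per engine',
    StreamingTier.IO_META: "Per-engine helper / format clause",
    StreamingTier.INTERNAL: "Excluded",
}


def parse_tier(value: str) -> StreamingTier:
    """Map a catalog string to a tier; raises ValueError outside the closed set."""
    return StreamingTier(value)
```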

Classification rules (deterministic, 18 rules applied in order; first match wins)

Each rule fires on objectModel.classes[*].methods[*].role, a function-name regex, or the signature shape (returnType / params arity / temporal-arg count). The rules come from classify_v4.py (the reference classifier; ~250 LOC; 2.6% ambiguous rate on the 2,240-function public-API surface). Summary:

| # | Rule | Tier |
|---|---|---|
| 1 | Header not public | internal |
| 2-4 | role=output / I/O-name pattern / MEOS infra | io-meta |
| 5 | role=conversion / `_to_<type>` | stateless |
| 6 | role=accessor | bounded-state |
| 7 | role=restriction / `^(minus\|at\|delete)_` | bounded-state |
| 8 | role=aggregate | windowed |
| 9 | `^ever_` / `^always_` | 2+ temporals → cross-stream, else windowed |
| 10 | `^[ea]<rel>_` spatial-rels | 2+ temporals → cross-stream, else bounded-state |
| 11 | role=predicate | 2+ temporals → cross-stream, else bounded-state |
| 12 | role=constructor | returns sequence/seqset → sequence-only, else stateless |
| 13 | Position/topology rels (`adjacent_*`, `contained_*`, `left_*`, …) | box/span → stateless; 1 temporal → bounded-state; 2 temporals → cross-stream |
| 14 | Temporal numeric/text/bool ops (`tfloat_add`, `tbool_not`, …) | stateless |
| 15 | Temporal comparison (`teq_`, `tne_`, …) | stateless |
| 16 | Scalar comparison / hash | stateless |
| 17 | Sequence-derived metrics (`*_length`, `*_speed`, `*_cumulative_*`, `*_twavg`, …) | windowed |
| 18 | Distance ops (`nad_`, `nai_`, `tdistance_`, …) | 2+ temporals → cross-stream, else bounded-state |

The full ordered rule list is reproducible from the reference classifier; the catalog ships it as meta/streaming-semantics-rules.md for audit.
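A minimal sketch of the ordered, first-match-wins evaluation, covering only a subset of the 18 rules (1, 5-9, 12, 17, 18) with simplified regexes. The Fn record and its fields are hypothetical stand-ins for the catalog's signature data, not the classify_v4.py data model; because rules 2-4, 10, 11 and 13-16 are omitted, its results can differ from the reference classifier.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Fn:
    """Hypothetical, simplified view of one catalog entry."""
    name: str
    role: Optional[str] = None      # objectModel role, e.g. "accessor"
    public: bool = True             # declared in a public header?
    n_temporals: int = 0            # count of temporal-typed parameters
    returns_sequence: bool = False  # returns a TSequence / TSequenceSet?


def by_arity(fn: Fn, single: str) -> str:
    """Shared shape of rules 9 and 18: two or more temporals is cross-stream."""
    return "cross-stream" if fn.n_temporals >= 2 else single


# (rule id, matcher, tier) triples, evaluated in order; first match wins.
RULES: list[tuple[str, Callable[[Fn], bool], Callable[[Fn], str]]] = [
    ("rule-1", lambda f: not f.public, lambda f: "internal"),
    ("rule-5", lambda f: f.role == "conversion"
     or re.search(r"_to_[a-z]", f.name) is not None,
     lambda f: "stateless"),
    ("rule-6", lambda f: f.role == "accessor", lambda f: "bounded-state"),
    ("rule-7", lambda f: f.role == "restriction"
     or re.match(r"(minus|at|delete)_", f.name) is not None,
     lambda f: "bounded-state"),
    ("rule-8", lambda f: f.role == "aggregate", lambda f: "windowed"),
    ("rule-9", lambda f: re.match(r"(ever|always)_", f.name) is not None,
     lambda f: by_arity(f, "windowed")),
    ("rule-12", lambda f: f.role == "constructor",
     lambda f: "sequence-only" if f.returns_sequence else "stateless"),
    ("rule-17", lambda f: re.search(r"_(length|speed|twavg)$|_cumulative_",
                                    f.name) is not None,
     lambda f: "windowed"),
    ("rule-18", lambda f: re.match(r"(nad|nai|tdistance)_", f.name) is not None,
     lambda f: by_arity(f, "bounded-state")),
]


def classify(fn: Fn) -> tuple[str, Optional[str]]:
    """Return (tier, rule id), or ("ambiguous", None) if no rule fires."""
    for rule_id, matches, tier in RULES:
        if matches(fn):
            return tier(fn), rule_id
    return "ambiguous", None
```

Recording the rule id alongside the tier is what makes the emitted classificationRule field auditable by regeneration.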

Per-function overrides

Two override sources:

A. The 59 ambiguous cases (post-hoc deterministic resolution)

| # | Function pattern (count) | Resolution | Reasoning |
|---|---|---|---|
| 1 | `tand_*` / `tor_*` / `tnot_*` (5) | stateless | Per-instant boolean lift, zero state |
| 2 | `mult_*` (5) | stateless | Alternate spelling of `*_mul_*` |
| 3 | `tfloatbox_*_tiles` / `tintbox_*_tiles` / `*_time_tiles` (6) | stateless | Pure tile generator |
| 4 | `spatialset_transform[_pipeline]` / `spatialset_set_srid` (3) | stateless | Set in, set out |
| 5 | `tgeoseq_from_base_*` / `tpointseq_from_base_*` / `tgeoseqset_from_base_*` / `tpointseqset_from_base_*` / `tpointseq_make_coords` (7) | sequence-only | Builds full TSequence/TSeqSet from time set/span/spanset |
| 6 | `geomeas_to_tpoint` (1) | sequence-only | Measure column carries time |
| 7 | `bearing_tpoint_point` (1) | bounded-state | 1 temporal + scalar |
| 8 | `bearing_tpoint_tpoint` (1) | cross-stream | 2 temporals |
| 9 | `geo[m]point_make2d` / `geo[m]point_make3dz` (4) | stateless | Pure scalar → geometry |
| 10 | `line_point_n` / `line_interpolate_point` / `line_locate_point` / `line_substring` (4) | stateless | Static-geometry ops |
| 11 | `geompoint_to_npoint` (1) | stateless | Type cast |
| 12 | `nsegment_*_position` / `route_geom` (3) | bounded-state | Per-value accessor |
| 13 | `tgeoarr_tgeoarr_mindist` / `mindistance_tgeo_tgeo` (2) | cross-stream | Pairwise across temporals |
| 14 | `intersection_*_set` / `union_*_set` for cbuffer/npoint/pose (6) | stateless | Set algebra |
| 15 | `meosoper_from_string` / `interptype_from_string` / `settype_basetype` / `spantype_basetype` / `*_spansettype` / `basetype_*` (8) | io-meta | Type catalog / enum |

Total: 57 entries covering all 59 ambiguous functions. Post-override ambiguous count: 0. Net additions (sum of the per-row counts): +34 stateless · +4 bounded-state · +3 cross-stream · +8 sequence-only · +8 io-meta.
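The per-tier net additions follow directly from the per-row counts of table A, so they can be re-checked mechanically. A small sketch, with the rows transcribed from the table (pattern strings abbreviated); OVERRIDES_A and tally are hypothetical names.

```python
from collections import Counter

# (abbreviated pattern, function count, resolved tier), transcribed from table A.
OVERRIDES_A: list[tuple[str, int, str]] = [
    ("tand_* / tor_* / tnot_*", 5, "stateless"),
    ("mult_*", 5, "stateless"),
    ("*_tiles", 6, "stateless"),
    ("spatialset_transform[_pipeline] / spatialset_set_srid", 3, "stateless"),
    ("t*seq[set]_from_base_* / tpointseq_make_coords", 7, "sequence-only"),
    ("geomeas_to_tpoint", 1, "sequence-only"),
    ("bearing_tpoint_point", 1, "bounded-state"),
    ("bearing_tpoint_tpoint", 1, "cross-stream"),
    ("geo[m]point_make2d / geo[m]point_make3dz", 4, "stateless"),
    ("line_*", 4, "stateless"),
    ("geompoint_to_npoint", 1, "stateless"),
    ("nsegment_*_position / route_geom", 3, "bounded-state"),
    ("tgeoarr_tgeoarr_mindist / mindistance_tgeo_tgeo", 2, "cross-stream"),
    ("intersection_*_set / union_*_set", 6, "stateless"),
    ("type-catalog / enum helpers", 8, "io-meta"),
]


def tally(rows: list[tuple[str, int, str]]) -> Counter:
    """Sum the function counts per resolved tier."""
    counts: Counter = Counter()
    for _pattern, n, tier in rows:
        counts[tier] += n
    return counts


print(dict(tally(OVERRIDES_A)))
```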

B. The 2 streaming-semantic nuances (per-function semantic overrides)

These are functions whose mechanical tier is technically correct, but where a streaming engineer needs a different operational reading. The override exposes both readings so codegen can pick one by mode.

| Function pattern | Mechanical tier | Streaming-semantic tier | Mode key |
|---|---|---|---|
| `e<rel>_tgeo_geo` (9 fns: eintersects, edwithin, edisjoint, etouches, econtains, ecrosses, eoverlaps, ewithin, ecovers) | bounded-state (per-event evaluable, carry MEOS handle) | windowed (OR-fold over trajectory lifetime) | streamingMode.eRelOverWindow |
| `<class>_trajectory` / `<class>_time` / `<class>_timespan` / `<class>_periods` / `<class>_periodset` and family accessors that read the full sequence | bounded-state (role=accessor) | windowed (closure-of-stream) | streamingMode.seqAccessorAtClose |

Both modes are valid; the catalog ships them as a dual shape; codegen picks per consumer engine.

JSON schema (proposed)

```jsonc
{
  "functions": {
    "eintersects_tgeo_geo": {
      "objectModel": {
        "role": "predicate",
        "classes": ["TGeo"],
        "dispatch": { /* D1 facet, unchanged */ },
        "nullable": { /* D2 facet, unchanged */ },
        "streamingSemantics": {
          "tier": "bounded-state",
          "tierConfidence": "high",
          "classificationRule": "rule-10: ever-rel on 1 temporal",
          "alternateMode": {
            "tier": "windowed",
            "modeKey": "streamingMode.eRelOverWindow",
            "reasoning": "OR-fold over trajectory lifetime"
          }
        }
      }
    }
  }
}
```

For functions with no alternateMode (the 1,890 majority), the streamingSemantics block has only tier, tierConfidence, and classificationRule.
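Because the catalog ships both readings, each engine can choose one mechanically. A minimal sketch, assuming a consumer holds the set of modeKey values its engine policy opts into; effective_tier is a hypothetical helper, and EXAMPLE copies the streamingSemantics block from the proposed schema.

```python
from typing import Any

# The streamingSemantics block of eintersects_tgeo_geo from the proposed schema.
EXAMPLE: dict[str, Any] = {
    "tier": "bounded-state",
    "tierConfidence": "high",
    "classificationRule": "rule-10: ever-rel on 1 temporal",
    "alternateMode": {
        "tier": "windowed",
        "modeKey": "streamingMode.eRelOverWindow",
        "reasoning": "OR-fold over trajectory lifetime",
    },
}


def effective_tier(semantics: dict[str, Any], enabled_modes: set[str]) -> str:
    """Pick the tier an engine should generate code for.

    Uses alternateMode.tier only when the engine has opted into that
    modeKey; otherwise falls back to the mechanical tier.
    """
    alt = semantics.get("alternateMode")
    if alt is not None and alt.get("modeKey") in enabled_modes:
        return alt["tier"]
    return semantics["tier"]
```

Functions without alternateMode (the majority) always resolve to their mechanical tier, whatever modes the engine enables.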

Generator integration

Add to MEOS-API:

```
meta/
├── object-model.json                       # existing (D1 dispatch)
├── meos-meta.json                          # existing (manual annotations)
└── streaming-semantics.json                # NEW: per-function overrides (59 + 2)

parser/
└── streaming_semantics.py                  # NEW: 18-rule classifier + overrides

run.py
└── 4th step: attach_streaming_semantics(idl, STREAMING_PATH)   # NEW
```

streaming_semantics.py adapts the reference v4 classifier to write into idl['functions'][name]['objectModel']['streamingSemantics'] instead of a separate file. The override file ships the table above as JSON.
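A minimal sketch of how such a stage could merge mechanical tiers with per-function overrides, assuming overrides are keyed by function name and hold the fields to set (tier, classificationRule, or an alternateMode block). The signature here is hypothetical: the proposed run.py call passes idl and STREAMING_PATH, whereas this version takes the already-loaded overrides and a classifier callable to stay self-contained. tierConfidence is omitted, and "ambiguous" is only an internal sentinel, not a tier.

```python
from typing import Any, Callable, Optional

# A classifier maps (function name, catalog entry) to (tier, rule id);
# it returns ("ambiguous", None) when no mechanical rule fires.
Classifier = Callable[[str, dict[str, Any]], tuple[str, Optional[str]]]


def attach_streaming_semantics(
    idl: dict[str, Any],
    overrides: dict[str, dict[str, Any]],
    classify: Classifier,
) -> list[str]:
    """Write objectModel.streamingSemantics into every function entry.

    Override fields replace the mechanical ones; an override holding only
    an alternateMode block keeps the mechanical tier (the dual shape).
    Returns the names still ambiguous, expected to be [] once every
    ambiguous case has an override.
    """
    unresolved: list[str] = []
    for name, entry in idl["functions"].items():
        tier, rule = classify(name, entry)
        semantics: dict[str, Any] = {"tier": tier, "classificationRule": rule}
        semantics.update(overrides.get(name, {}))
        if semantics["tier"] == "ambiguous":
            unresolved.append(name)
        # Additive: other objectModel facets (role, dispatch, ...) are kept.
        entry.setdefault("objectModel", {})["streamingSemantics"] = semantics
    return unresolved
```

Returning the unresolved names gives the "post-override ambiguous count: 0" claim a direct test hook.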

Adoption path

| Wave | Step | Owner |
|---|---|---|
| 1 (this RFC) | Land streamingSemantics facet + 18-rule classifier + override JSON on this PR's branch | MEOS-API session |
| 2 | Per-binding codegen consumers switch from name-pattern heuristics to reading objectModel.streamingSemantics.tier directly (MobilityFlink #5 + MobilityKafka #3 already run on the v4 baseline; switching to the catalog facet is a one-line input change) | each binding session |
| 3 | alternateMode consumers (engines preferring the windowed reading of `e<rel>_tgeo_geo`) opt in via modeKey | per-engine policy |

The Wave-1 PR is small: a ~250 LOC classifier, ~150 LOC of override JSON, and a one-line run.py addition. Review is audit-by-regeneration: the reviewer evaluates the classifier rules and the override table, and the facet emitted on every function is derived mechanically.

Reference artifacts

  • streaming-relevance-baseline.json (full per-function classification, 2,240 public rows)
  • streaming-relevance-ambiguous.json (the 59-fn corner-case set)
  • classify_v4.py (reference classifier, ~250 LOC, deterministic)
  • streaming-semantics-facet-rfc.md (the long-form draft this comment summarizes)

Happy to provide the JSON files / classifier source as a follow-up PR off feat/object-model if useful for review.
