Skip to content

Fix physical monotonicity analysis for MFP and FlatMap#36778

Merged
antiguru merged 3 commits into
mainfrom
claude/inspiring-davinci-VZmDT
May 29, 2026
Merged

Fix physical monotonicity analysis for MFP and FlatMap#36778
antiguru merged 3 commits into
mainfrom
claude/inspiring-davinci-VZmDT

Conversation

@antiguru
Copy link
Copy Markdown
Member

@antiguru antiguru commented May 29, 2026

Motivation

CLU-68: The SingleTimeMonotonic physical-monotonicity interpreter propagated
the input's monotonicity through mfp and flat_map without consulting the
MFP's predicates or the table function's properties. The MIR monotonicity
analysis (src/transform/src/analysis/monotonic.rs) already accounts for both,
so the LIR interpreter was inconsistent with it.

Severity

The practical correctness impact is low, and this is best understood as
defense-in-depth / consistency rather than a live bug fix:

  • This analysis only ever relaxes the safe default. Single-time peeks set
    must_consolidate = true on hierarchical reductions / TopK by default; a
    positive physical-monotonicity result is what lets that be relaxed to
    false. A spuriously monotonic result is therefore the only way to cause
    trouble.
  • The only LIR-reachable operator that can actually emit negative diffs is
    repeat_row with a negative count, which requires the unsafe
    enable_repeat_row flag and is not meaningfully representable in a
    single-time peek (a consolidated -1 row has no SQL meaning).
  • preserves_monotonicity() also returns false for AclExplode,
    MzAclExplode, and GuardSubquerySize, but those classifications are
    conservative — they do not emit negative diffs.

So no known query produces wrong results today; the change keeps the LIR
interpreter honest and aligned with the MIR analysis.

Description

Corrects mfp and flat_map in SingleTimeMonotonic:

  1. flat_map — now requires func.preserves_monotonicity() (the
    substantive check; e.g. repeat_row does not preserve it) in addition to
    propagating the input status. Mirrors MirRelationExpr::FlatMap.

  2. Temporal-predicate checks (mfp, and the post-MFP of flat_map) — now
    reject MFPs containing temporal predicates (mz_now()). Note this is
    conservative: an MFP cannot itself negate diffs, and temporal predicates
    only introduce retractions across timestamps, while a one-shot dataflow
    runs at a single time. We keep the check as defense-in-depth and to mirror
    MirRelationExpr::Filter.

The code comments document this severity nuance inline.

Verification

Added unit tests covering:

  • flat_map with a monotonicity-preserving function (GenerateSeriesInt64)
  • flat_map with a non-preserving function (RepeatRow)
  • flat_map with a temporal predicate in the post-MFP
  • mfp with and without temporal predicates

https://claude.ai/code/session_013EEMmczcmK98tcSa9B9vvQ

claude added 3 commits May 29, 2026 02:28
The single-time physical monotonicity interpreter
(`SingleTimeMonotonic`) blindly propagated its input's monotonicity
through `FlatMap` and `Mfp` nodes, ignoring two cases the MIR
monotonicity analysis already handles:

- A `FlatMap`'s table function may not preserve the append-only
  property of its input (e.g. `repeat_row` can emit retractions). The
  interpreter now ANDs in `TableFunc::preserves_monotonicity()`.
- An `Mfp` (and the after-MFP fused into `FlatMap`) may contain
  temporal predicates (`mz_now()`), which can result in the future
  removal of records. The interpreter now requires the absence of
  temporal predicates via `MapFilterProject::has_temporal_predicates()`.

Previously this caused `RelaxMustConsolidate` to incorrectly strip
downstream consolidation, leading to errors like "tried to build a
monotonic reduction on non-monotonic input" for queries such as
`SELECT MAX(body) FROM mono_src, repeat_row(-1)`.

Adds unit tests covering both the monotonicity-preserving and
-breaking cases for `flat_map` and `mfp`.
The single-time physical monotonicity interpreter only ever relaxes the
safe must_consolidate=true default. An MFP cannot itself emit negative
diffs, and temporal predicates only introduce retractions across
timestamps, so the temporal-predicate check is defense-in-depth rather
than a correctness requirement for one-shot dataflows. Document this on
the mfp() interpreter method.

https://claude.ai/code/session_013EEMmczcmK98tcSa9B9vvQ
Distinguish the substantive func.preserves_monotonicity() check from the
conservative post-MFP temporal-predicate check in flat_map, matching the
framing already used in mfp().

https://claude.ai/code/session_013EEMmczcmK98tcSa9B9vvQ
@antiguru antiguru marked this pull request as ready for review May 29, 2026 15:41
@antiguru antiguru requested a review from a team as a code owner May 29, 2026 15:41
@antiguru antiguru requested review from DAlperin and def- May 29, 2026 15:41
Comment on lines +131 to +139
// In a single-time context, we propagate the monotonicity status of the
// input, conservatively treating an MFP with temporal predicates as
// breaking physical monotonicity. Note that an MFP cannot itself produce
// negative diffs (map/filter/project never negate), so this is stricter
// than strictly necessary for a single-time dataflow: temporal predicates
// (`mz_now()`) only introduce retractions *across* timestamps, and a
// one-shot dataflow runs at a single time. We keep the check anyway as
// defense-in-depth and to mirror the judgment made for
// `MirRelationExpr::Filter` in the MIR monotonicity analysis.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This strikes me as correct, but it is possible to imagine a case where this causes noticably worse performance for things like COPY TO where the query uses a temporal predicate. Or even just peeks with temporal predicates… Since what we were doing before wasn't strictly wrong either.

This is a good change, just slightly unfortunate it can introduce a regression

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, once we come here, we should've replaced mz_now with a concrete timestamp so the temporal filter essentially disappears. Let me check, but I think there shouldn't be a performance regression (and otherwise, it's the right thing as it'd be a correctness issue otherwise).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirmed — your reasoning holds, so there's no performance regression from the temporal-predicate checks. Tracing it through:

  • SingleTimeMonotonic runs only via RelaxMustConsolidate, which finalize_dataflow invokes solely under if dataflow.is_single_time() (compute-types/src/plan.rs:564-593) — i.e. one-shot peeks and COPY TO.
  • In those pipelines, ExprPrepOneShot::prep_scalar_expr (adapter/src/optimize/dataflows.rs:216) const-folds every CallUnmaterializable, including MzNow, to a literal. This runs before finalize_dataflow in both paths (peek.rs: prep at 305, finalize at 417; copy_to.rs: prep at 306, finalize at 354).
  • So mz_now() < col has become <literal> < col by the time we get here — no temporal predicates survive into LIR, and has_temporal_predicates() is always false. The new mfp/flat_map temporal checks are provable no-ops in this context (pure defense-in-depth).

This is the same invariant already pinned in compute/src/render/top_k.rs:71-81 ("ExprPrepOneShot constant-folds mz_now() to the dataflow as_of before lowering, so no temporal predicates survive into LIR"), backed by a soft_assert_or_log!.

The only behavioral change is the flat_map func.preserves_monotonicity() check, which forces consolidation for non-monotone table functions (repeat_row et al.) feeding a monotonic reduce/TopK — that's the correctness fix, and it doesn't touch the temporal-predicate cases @DAlperin was concerned about.


Generated by Claude Code

@antiguru antiguru merged commit 0d8eb02 into main May 29, 2026
123 checks passed
@antiguru antiguru deleted the claude/inspiring-davinci-VZmDT branch May 29, 2026 18:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants