This repository was archived by the owner on Apr 30, 2026. It is now read-only.
Follow-ups to #8: variadic and/or in WHERE, sym_atom_cmp invariant, INT_MIN→null DAG, glob escape docs#9
Merged
singaraiona merged 6 commits intoRayforceDB:masterfrom Apr 29, 2026
Conversation
Previous implementation used unsigned-wrap negation, so:
(abs -32768h) -> -32768h ❌ negative result from abs!
(abs (- (neg 32767h) 1h)) -> -32768h ❌ same path
(neg INT_MIN) -> INT_MIN ❌ wrap, silent
Math: -INT_MIN doesn't fit in the same signed width. Per k/q
convention, surface the overflow as a typed null of the same width:
(neg INT16_MIN) -> 0Nh
(neg INT32_MIN) -> 0Ni
(neg INT64_MIN) -> 0Nl
(abs INT_MIN) -> typed null (same per width)
Properties:
✓ Type preserved (no widening to i64)
✓ No UB (overflow detected before negation)
✓ Detectable via `nil?`
✓ Consistent with existing null propagation: (neg 0Ni) → 0Ni
The check is wrapped in RAY_UNLIKELY since INT_MIN is extremely rare
in real data; modern branch predictors will treat it as a
near-zero-cost test on the hot path.
DAG inline loop in expr.c:811-812 still uses unsigned wrap (separate
hot path for OP_NEG/OP_ABS over i64 vectors in select-with-derived-
columns). Inconsistency between atom and DAG paths to be addressed
in a follow-up.
Tests in test/rfl/arith/{neg,abs}.rfl now pin:
- (neg/abs INT16_MIN/INT32_MIN/INT64_MIN) → typed null of same type
- adjacent values (INT_MAX, -INT_MAX) still negate normally
Addresses Anton's PR RayforceDB#8 review blocker 2.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The fused expression path (`(select x: (neg col) from: t)` and the fallback `exec_elementwise_unary`) used unsigned-wrap negation and let INT64_MIN survive in the result vector with no null bit set — same silent-overflow shape as the atom path that a9e2223 fixed, but in the hot vector path. Add a single-threaded post-pass `mark_i64_overflow_as_null` that, for last-instruction OP_NEG/OP_ABS over i64, scans the result, zeroes any INT64_MIN slot (preserving the "data at null position is zero" invariant) and sets the null bit through `ray_vec_set_null` (which also drops any attached accelerator index, per existing semantics). Wired into `expr_eval_full`, `expr_eval_full_parted`, and the `exec_elementwise_unary` fallback. No-op for f64 / non-NEG-ABS / inputs without INT_MIN. RAY_UNLIKELY hint lets the compiler keep the scan branch-predicted as cold. Tests added in test/rfl/arith/{neg,abs}.rfl exercise both the DAG-fused select-from path and the eval-path vec via concat-built INT_MIN columns. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous fallback (82fae9d) "compare raw i64 ids when ray_sym_str returns NULL" was logically inconsistent: with no backing string we have no defensible answer for *lexicographic* ordering — returning ±1 by raw id fabricates a non-lex total order, the same shape of lie as the original `return 0` that Anton flagged, just lying about "unequal" instead of "equal". NULL from ray_sym_str only fires under corruption (uninitialised intern table, out-of-range id, evicted slot) — none of these are reachable via the normal RFL API. v1 (core/sort.c::compare_symbols) trusts the same invariant and would SIGSEGV on strcmp(NULL,...) if it broke. Replace the fake fallback with a documented assert. Honours Anton's "never silently say 'equal'" directive without inventing a fake ordering: in debug we abort with a clear diagnostic; in release the assert is elided and we fall through to ray_str_cmp(NULL,...) which matches v1's UB-crash semantics — both modes loud, neither silent. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The lenient-parsing note in glob.h covered "what happens with a broken pattern" but not the closely related question of how to *intentionally* match a metacharacter. There is no backslash escape — the only way to match a literal `*`/`?`/`[`/`]`/`-` is to wrap it in a one-element character class (`[*]`, `[?]`, `[[]`, `[]]`, `[-]`). This is the standard fnmatch idiom but not obvious to a Rayforce user coming from SQL `LIKE` (which has no metachars at all) or from a regex background (which assumes `\` works). Document the idiom in glob.h alongside the supported metacharacters, and pin the behaviour in test/rfl/strop/like.rfl with five concrete patterns. No code change to the matcher itself. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR RayforceDB#8 dropped `register_binary_op("and", ..., OP_AND)` and switched `and`/`or` to eval-time variadic via `register_vary`. `compile_expr_dag` still recognised the 2-arg form via `resolve_binary_dag` (binary shortcut path) but bailed out with NULL on n>=4 — and the WHERE-clause caller upgraded that to "WHERE predicate not supported by DAG compiler". Anton flagged this as a silent perf cliff; in practice it is a correctness regression (queries that worked in v1 / pre-PR fail outright). Fold variadic AND/OR into a balanced binary tree of OP_AND/OP_OR nodes before lowering: `(and a b c d)` → `(and (and a b) (and c d))`, depth log2(N). The fused-expr executor evaluates this as a flat sequence of binary AND/OR instructions sharing scratch registers, so no extra column allocations vs the hand-nested form. Iterative pairwise reduction inside `compile_expr_dag`; bail at >64 inputs (avoids unbounded stack array, no realistic query needs more). Balanced shape (rather than left-fold) keeps the tree symmetric and minimises dependency-chain depth, leaving room for a future predicate-parallel executor without rewriting this pass. Tests in dag_binary_ops.rfl pin: 3/4-arg AND in WHERE matches the nested-binary form; 3/5-arg OR in WHERE; mixed `(and (or...) (or...))`; fold semantics agrees with composing two filtered selects. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR RayforceDB#8 changed `and`/`or` from `register_binary_op` (with OP_AND/OP_OR opcodes) to `register_vary` — but registered them with `RAY_FN_NONE`, losing the `RAY_FN_SPECIAL_FORM` flag that v1 carried. Result: every arg is eagerly evaluated even when the result is already determined by a prior scalar — `(and false (heavy-fn))` runs `heavy-fn`, `(or true (slow-query))` runs `slow-query`. v1 short-circuited; v2 silently regressed. Promote both to special form so the kernel receives unevaluated AST nodes and calls `ray_eval` itself — same dispatch shape as `if`, `do`, `let`, `select` etc. Short-circuit triggers only when the running accumulator is a *scalar* with the determining truth value: scalar falsy for AND, scalar truthy for OR. When the accumulator is a vector we fall through to `ray_and_fn` / `ray_or_fn` for proper element-wise broadcast (preserves `(or [t f t] true) → [t t t]` shape semantics that the existing test asserts). Tests in cmp/and.rfl and cmp/or.rfl pin: short-circuit prevents evaluation of subsequent args (probe via `undefined-name`), and short-circuit does *not* trigger when the result is undetermined. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Six follow-up commits on top of #8 (now merged) — picked up issues that surfaced after that PR landed plus the two non-blockers from @singaraiona's review I'd flagged for follow-up.
Commits
399b3cc fix(arith): neg/abs INT_MIN → typed null (k/q convention)— the original PR fix(8 bugs) + test(coverage +1.3pp): probes, type-preservation, broad workout #8 fix used unsigned-wrap negation, but(abs INT16_MIN)returning a negative value is mathematically wrong. Restored k/q convention: INT_MIN → typed null of the same width, detectable vianil?. Atom path only.aaa4e61 fix(expr): DAG path neg/abs INT_MIN → typed null (i64)— extends the same contract to the DAG-fused(select x: (neg col) from: t)path via a single-threaded post-pass that runs afterray_pool_dispatchjoins.mark_i64_overflow_as_nullzeroes the data slot (preserving the "null position is 0" invariant) and sets the null bit throughray_vec_set_null(which drops any attached accelerator index inline).103132a fix(cmp): sym_atom_cmp asserts intern invariant instead of fake fallback— the raw-id fallback I added in fix(8 bugs) + test(coverage +1.3pp): probes, type-preservation, broad workout #8 to address the NULLray_sym_strcase was logically inconsistent: with no backing string we have no defensible lexicographic answer, and returning ±1 by interned id fabricates a non-lex total order — same shape of lie as thereturn 0it replaced. Replaced withassertmatching v1's "trust the invariant, crash if it breaks" stance.8f5e04f docs(glob): document literal-metachar escape via one-element class—glob.hdocumented the lenient unterminated-[policy but not how to intentionally match*/?/[/]/-. There is no backslash escape; the idiom is one-element class wrap ([*],[?],[[],[]],[-]). Documented inglob.hand pinned inlike.rfl.f0afb20 fix(query): variadic and/or in WHERE — balanced auto-fold to DAG binary— turned out to be a correctness regression, not the silent perf cliff I initially thought:compile_expr_dagerrored with "WHERE predicate not supported by DAG compiler" rather than falling back to eval. Fixed by folding variadic AND/OR into a balanced binary tree before lowering — same fused-expr path as hand-nested binary, no extra column allocations.868a1ff fix(cmp): restore v1 short-circuit semantics for and/or— discovered this was a v1 regression, not just absent feature: v1 registered both withFN_SPECIAL_FORM, PR fix(8 bugs) + test(coverage +1.3pp): probes, type-preservation, broad workout #8 dropped it. Restored. Short-circuit triggers only when the accumulator is a scalar with the determining truth value, so vector broadcast semantics ((or [t f t] true) → [t t t]) are preserved.Tests
974 of 975 passing under ASan + UBSan (1 skipped, 0 failed).