Merged
27 commits
96ff41d
Phase 1: NULL_* sentinel constants + BOOL/U8 lockdown
hetoku May 15, 2026
0f59981
Phase 2a: ray_typed_null writes NULL_F64 into F64 atom slot
hetoku May 17, 2026
f45a24f
Phase 2bc: CSV F64 parallel + serial paths dual-encode NULL_F64
hetoku May 17, 2026
eb9c8e5
Phase 2d: I64->F64 UPDATE cast writes NaN to null slots
hetoku May 17, 2026
302342e
Phase 2 regression: F64 dual-encoding RFL test
hetoku May 17, 2026
ec868e7
docs: mark Phase 2 (F64 NaN sentinel) complete in NULL_* comment
hetoku May 17, 2026
fc200e7
Phase 2e: close F64 dual-encoding gaps across kernels
hetoku May 17, 2026
a9ab778
Phase 2f: da_accum_row skips NaN-encoded null F64 rows
hetoku May 17, 2026
b1fde85
Phase 2g: close remaining F64 producer gaps + flag Phase 3 work
hetoku May 17, 2026
18cad9d
Phase 2: F64 NaN sentinel migration
hetoku May 17, 2026
0bbc880
Phase 3a-1: ray_typed_null writes integer sentinels into atom slots
hetoku May 17, 2026
c19de16
Phase 3a-2: CSV parser dual-encodes integer/temporal nulls
hetoku May 17, 2026
eb93750
Phase 3a-3: cast_vec_copy_nulls fills integer sentinels
hetoku May 17, 2026
b8615d7
Phase 3a-4: set_all_null + store_typed_elem write integer sentinels
hetoku May 17, 2026
4d0e450
Phase 3a-5: UPDATE-WHERE numeric promo fills dest-width sentinel
hetoku May 17, 2026
cb7858b
Phase 3a-6: UPDATE atom broadcast fills integer sentinels
hetoku May 17, 2026
2c351cf
Phase 3a-7: group-by null integer key scatter fills sentinels
hetoku May 17, 2026
bf856a6
Phase 3a-8: pivot null integer key fills sentinels
hetoku May 17, 2026
5ce5a67
Phase 3a-9: linkop deref fills integer null sentinels
hetoku May 17, 2026
bf591e4
Phase 3a-10b: TOP_N/BOT_N null-key emit fills width-correct sentinel
hetoku May 17, 2026
1d601f9
Phase 3a-12: integer dual-encoding RFL regression
hetoku May 17, 2026
7524d5f
docs: mark Phase 3a (integer/temporal NULL_* sentinels) complete in N…
hetoku May 17, 2026
1a70e0c
Phase 3a-13: close remaining producer-side dual-encoding gaps
hetoku May 17, 2026
d31a375
Phase 3a: integer / temporal NULL_* sentinel migration
hetoku May 17, 2026
1769c25
Merge origin/master (perf: row-form group operators) into Phase 2+3a …
hetoku May 17, 2026
6ba5e0b
Phase 3b: per-(group, agg) non-null counts + all-null finalization
hetoku May 17, 2026
717feba
Phase 3 follow-up: per-(group, agg) non-null counts + all-null finali…
hetoku May 17, 2026
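
Note on the Phase 3b commits: the header comment in include/rayforce.h describes per-(group, agg) non-null counts that drive the AVG divisor and turn all-null groups into typed nulls. A minimal sketch of that idea follows, assuming a single aggregate (so the index `gid * n_aggs + a` collapses to `gid`). The function name `demo_group_avg` and its parameters are hypothetical and are not taken from the PR.

```c
#include <math.h>
#include <stddef.h>
#include <stdint.h>

#define NULL_I64 ((int64_t)INT64_MIN)

/* Hypothetical grouped AVG over one I64 column.
 * sum[] and nn_count[] must hold n_groups zero-initialised entries. */
static void demo_group_avg(const int64_t *vals, const int32_t *gids,
                           size_t n_rows, size_t n_groups,
                           double *sum, int64_t *nn_count, double *out)
{
    for (size_t r = 0; r < n_rows; r++) {
        if (vals[r] == NULL_I64) continue;  /* sentinel compare, no bitmap read */
        sum[gids[r]] += (double)vals[r];
        nn_count[gids[r]]++;                /* counts non-null rows only */
    }
    for (size_t g = 0; g < n_groups; g++) {
        /* All-null group: emit NaN (the F64 null) instead of 0/0 or a seed. */
        out[g] = nn_count[g] ? sum[g] / (double)nn_count[g] : (double)NAN;
    }
}
```

Dividing by the non-null count rather than the group row count is what keeps null rows from dragging the average toward zero.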
45 changes: 45 additions & 0 deletions include/rayforce.h
@@ -304,6 +304,51 @@ ray_t* ray_timestamp(int64_t val);
 ray_t* ray_guid(const uint8_t* bytes);
 ray_t* ray_typed_null(int8_t type);
 
+/* ===== Null Sentinel Values =====
+ *
+ * Per-type null encoding for nullable scalar types. Callers compare values
+ * directly (e.g. `x == NULL_I64`, `x != x` for NaN); there are no predicate
+ * macros or aliases. Temporal types (DATE/TIME/TIMESTAMP) reuse NULL_I32 or
+ * NULL_I64 based on their storage width. SYM null = sym ID 0; STR null =
+ * empty string (length 0); BOOL and U8 are non-nullable.
+ *
+ * Phase 1 added the constants and locked BOOL/U8 down as non-nullable.
+ * Phase 2 wired NULL_F64 into the CSV parser, ray_typed_null, and the
+ * I64→F64 UPDATE cast — null F64 slots now hold NaN alongside the
+ * nullmap bit.
+ * Phase 3a generalized this to integer / temporal types (I16, I32, I64,
+ * DATE, TIME, TIMESTAMP). Producer surface mirrors Phase 2 — CSV
+ * parser, ray_typed_null, cast_vec_copy_nulls, set_all_null,
+ * store_typed_elem (lang/internal.h), UPDATE atom broadcast (3 sites),
+ * UPDATE WHERE numeric-promo cast, group-by key scatter (serial +
+ * parallel + grpt TOP_N), pivot key scatter, linkop deref. The
+ * grouped-aggregation consumer (da_accum_row + scalar_accum_row) gained
+ * per-agg integer-null guards in the SUM/AVG/STDDEV/VAR/PROD/MIN/MAX/
+ * FIRST/LAST arms — sentinel-compare (`v != precomputed_sentinel`)
+ * rather than nullmap consultation for cache-line efficiency; the
+ * tradeoff (a user-stored INT_MIN in a HAS_NULLS column is dropped)
+ * is bounded by dual encoding keeping the bitmap as source of truth.
+ * Phase 3b closed the documented finalization gaps in the
+ * scalar and direct-array (DA) grouped accumulators: per-(group, agg)
+ * non-null counts (`nn_count[gid * n_aggs + a]`) drive AVG / VAR /
+ * STDDEV divisors and gate MIN / MAX / PROD / FIRST / LAST result
+ * emission — all-null groups now produce a typed null (NULL_F64 /
+ * NULL_I64 plus the nullmap bit) instead of leaking the accumulator
+ * seed (DBL_MAX / -DBL_MAX / 0 / product identity). FIRST/LAST also
+ * gained "skip null rows" semantics: a null prefix no longer advances
+ * acc->first_row[gid]. The multi-key radix HT (accum_from_entry,
+ * ~line 2155) still inherits the pre-existing nullable-agg gap noted
+ * at the sparse-path fallback (~line 5728).
+ * Through Phase 7 (full cutover) the bitmap bit `nullmap[0] & 1` is
+ * kept in sync with the sentinel value for atoms ("dual encoding"), so
+ * legacy bitmap-aware readers and new sentinel-aware readers agree.
+ * After Phase 7 the bitmap arm is reclaimed for inline stats and the
+ * bit becomes a pure optimization hint. */
+#define NULL_I16 ((int16_t)INT16_MIN)
+#define NULL_I32 ((int32_t)INT32_MIN)
+#define NULL_I64 ((int64_t)INT64_MIN)
+#define NULL_F64 (__builtin_nan(""))
+
 /* Null bitmap check for atoms — bit 0 of nullmap[0] marks typed nulls.
  * Also matches RAY_NULL_OBJ (the untyped null singleton). */
 #define RAY_ATOM_IS_NULL(x) (RAY_IS_NULL(x) || ((x)->type < 0 && ((x)->nullmap[0] & 1)))
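
Note on the sentinel constants: the header comment says callers compare values directly (`x == NULL_I64`, `x != x` for NaN). The sketch below shows what those comparisons look like in isolation. `NAN` from `<math.h>` stands in for `__builtin_nan("")` so the example does not depend on a compiler builtin, and the `demo_*` helpers are hypothetical, since the header deliberately ships no predicate macros.

```c
#include <math.h>
#include <stdbool.h>
#include <stdint.h>

/* Same values as the hunk above; NAN replaces the GCC/Clang builtin. */
#define NULL_I16 ((int16_t)INT16_MIN)
#define NULL_I32 ((int32_t)INT32_MIN)
#define NULL_I64 ((int64_t)INT64_MIN)
#define NULL_F64 ((double)NAN)

/* Hypothetical helpers, for illustration only. */
static inline bool demo_is_null_i64(int64_t x) { return x == NULL_I64; }

/* NaN is the only double that compares unequal to itself, so `x == NULL_F64`
 * would always be false; the self-comparison is the working test. */
static inline bool demo_is_null_f64(double x) { return x != x; }
```

One caveat that is general C knowledge rather than something stated in the PR: compiler modes that assume no NaNs (for example GCC's `-ffinite-math-only`, implied by `-ffast-math`) may fold `x != x` to false.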
118 changes: 70 additions & 48 deletions src/io/csv.c

Large diffs are not rendered by default.

19 changes: 17 additions & 2 deletions src/lang/internal.h
@@ -277,8 +277,23 @@ static inline int64_t elem_as_i64(ray_t* elem) {
  * Returns 0 on success, -1 if the element type doesn't match. */
 static inline int store_typed_elem(ray_t* vec, int64_t i, ray_t* elem) {
     if (RAY_ATOM_IS_NULL(elem)) {
-        int esz = ray_elem_size(vec->type);
-        memset((char*)ray_data(vec) + i * esz, 0, esz);
+        /* Phase 2/3a dual-encoding: payload must carry the width-correct
+         * sentinel alongside the nullmap bit. */
+        switch (vec->type) {
+        case RAY_F64:
+            ((double*)ray_data(vec))[i] = NULL_F64; break;
+        case RAY_I64: case RAY_TIMESTAMP:
+            ((int64_t*)ray_data(vec))[i] = NULL_I64; break;
+        case RAY_I32: case RAY_DATE: case RAY_TIME:
+            ((int32_t*)ray_data(vec))[i] = NULL_I32; break;
+        case RAY_I16:
+            ((int16_t*)ray_data(vec))[i] = NULL_I16; break;
+        default: {
+            int esz = ray_elem_size(vec->type);
+            memset((char*)ray_data(vec) + i * esz, 0, esz);
+            break;
+        }
+        }
         ray_vec_set_null(vec, i, true);
         return 0;
     }
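
Note on dual encoding: the change above writes the sentinel into the payload and still sets the nullmap bit, so both kinds of reader agree. The sketch below shows that contract on a toy I32 vector. The `demo_i32_vec` layout and helper names are hypothetical and much simpler than the real `ray_t`.

```c
#include <stdbool.h>
#include <stdint.h>

#define NULL_I32 ((int32_t)INT32_MIN)

/* Hypothetical toy vector: 8 payload slots plus one null bit per slot. */
typedef struct {
    int32_t data[8];
    uint8_t nullmap[1];
} demo_i32_vec;

/* Producer side: write both encodings for slot i. */
static void demo_store_null(demo_i32_vec *v, int i) {
    v->data[i] = NULL_I32;                           /* payload sentinel */
    v->nullmap[i >> 3] |= (uint8_t)(1u << (i & 7));  /* bitmap bit */
}

/* Legacy reader: consults the bitmap only. */
static bool demo_bitmap_null(const demo_i32_vec *v, int i) {
    return ((v->nullmap[i >> 3] >> (i & 7)) & 1u) != 0;
}

/* Sentinel-aware reader: looks at the payload only. */
static bool demo_sentinel_null(const demo_i32_vec *v, int i) {
    return v->data[i] == NULL_I32;
}
```

With the old `memset(..., 0, esz)` behavior the second reader would have seen a plain 0 in a null slot, which is the disagreement this hunk removes.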
8 changes: 7 additions & 1 deletion src/lang/parse.c
@@ -634,8 +634,14 @@ static ray_t* parse_vector(ray_parser_t *p) {
         }
         vec->len = count;
         for (int32_t i = 0; i < count; i++) {
-            if (RAY_ATOM_IS_NULL(elems[i]))
+            if (RAY_ATOM_IS_NULL(elems[i])) {
                 ray_vec_set_null(vec, i, true);
+                /* Phase 2 dual-encoding: a non-F64 typed null (0Nl/0Ni/0Nh)
+                 * carries i64 = 0, so the cast above wrote 0.0 to the slot.
+                 * Overwrite with NULL_F64 so raw-payload consumers see NaN.
+                 * Null F64 atoms already carry NULL_F64 from ray_typed_null. */
+                d[i] = NULL_F64;
+            }
             ray_release(elems[i]);
         }
         return vec;
62 changes: 60 additions & 2 deletions src/ops/builtins.c
@@ -755,6 +755,41 @@ static ray_t* cast_vec_copy_nulls(ray_t* vec, ray_t* val) {
         if (le[j] && RAY_ATOM_IS_NULL(le[j]))
             ray_vec_set_null(vec, j, true);
     }
+    /* Phase 2/3a dual encoding: when the destination has nulls, fill each
+     * null payload slot with the correct-width sentinel so consumers that
+     * read the raw payload (without consulting the bitmap) honor the null
+     * contract. Narrowing casts (Hazard 3) require writing the dest-width
+     * sentinel directly — propagating through the cast macro produces
+     * (int16_t)NULL_I32 = 0 etc., which collides with a legitimate value. */
+    if (vec->attrs & RAY_ATTR_HAS_NULLS) {
+        switch (vec->type) {
+        case RAY_F64: {
+            double* d = (double*)ray_data(vec);
+            for (int64_t j = 0; j < vec->len; j++)
+                if (ray_vec_is_null(vec, j)) d[j] = NULL_F64;
+            break;
+        }
+        case RAY_I64: case RAY_TIMESTAMP: {
+            int64_t* d = (int64_t*)ray_data(vec);
+            for (int64_t j = 0; j < vec->len; j++)
+                if (ray_vec_is_null(vec, j)) d[j] = NULL_I64;
+            break;
+        }
+        case RAY_I32: case RAY_DATE: case RAY_TIME: {
+            int32_t* d = (int32_t*)ray_data(vec);
+            for (int64_t j = 0; j < vec->len; j++)
+                if (ray_vec_is_null(vec, j)) d[j] = NULL_I32;
+            break;
+        }
+        case RAY_I16: {
+            int16_t* d = (int16_t*)ray_data(vec);
+            for (int64_t j = 0; j < vec->len; j++)
+                if (ray_vec_is_null(vec, j)) d[j] = NULL_I16;
+            break;
+        }
+        default: break;
+        }
+    }
     return vec;
 }
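
Note on the narrowing hazard: the comment above states that `(int16_t)NULL_I32` comes out as 0. The sketch below reproduces that collision and shows the explicit remap. Both helper names are hypothetical. Converting an out-of-range value to a signed type is implementation-defined in C; the truncating result shown here is what GCC and Clang produce, and the sketch assumes that behavior.

```c
#include <stdint.h>

#define NULL_I16 ((int16_t)INT16_MIN)
#define NULL_I32 ((int32_t)INT32_MIN)

/* Naive narrowing: keeps only the low 16 bits on GCC/Clang, so the I32
 * sentinel 0x80000000 becomes 0x0000, a perfectly legitimate value. */
static int16_t demo_naive_narrow(int32_t v) {
    return (int16_t)v;
}

/* Hypothetical null-preserving narrowing: map the source sentinel to the
 * destination-width sentinel before casting ordinary values. */
static int16_t demo_narrow_i32_to_i16(int32_t v) {
    return (v == NULL_I32) ? NULL_I16 : (int16_t)v;
}
```

The PR takes the equivalent route of casting first and then overwriting every slot whose nullmap bit is set, which reaches the same payload without touching the cast macro.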

@@ -1802,8 +1837,13 @@ ray_t* ray_enlist_fn(ray_t** args, int64_t n) {
             d[i] = (args[i]->type == -RAY_F64) ? args[i]->f64 : (double)args[i]->i64;
         vec->len = n;
         for (int64_t i = 0; i < n; i++) {
-            if (RAY_ATOM_IS_NULL(args[i]))
+            if (RAY_ATOM_IS_NULL(args[i])) {
+                /* Dual-encoding contract: the (double)NULL_I64 cast above
+                 * produces a large finite value (~-9.22e18), not NaN. Stamp
+                 * the F64 sentinel so the payload matches the bitmap bit. */
+                d[i] = NULL_F64;
                 ray_vec_set_null(vec, i, true);
+            }
         }
         return vec;
     }
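
Note on widening integer nulls to F64: this hunk and the parse.c hunk both fix the same problem, namely that an integer payload cast to double never yields NaN. The sketch below shows the finite result of the plain cast and a null-preserving alternative. The helper names are hypothetical, and the sketch detects nulls by sentinel compare, whereas the PR code checks the atom's null bit.

```c
#include <math.h>
#include <stdint.h>

#define NULL_I64 ((int64_t)INT64_MIN)
#define NULL_F64 ((double)NAN)   /* stand-in for __builtin_nan("") */

/* Plain cast: INT64_MIN is -2^63, which is exactly representable as a
 * double, so the result is a large finite number rather than NaN. */
static double demo_naive_widen(int64_t v) {
    return (double)v;
}

/* Hypothetical null-preserving widen: re-stamp the F64 sentinel. */
static double demo_widen_i64(int64_t v) {
    return (v == NULL_I64) ? NULL_F64 : (double)v;
}
```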
@@ -2602,8 +2642,26 @@ ray_t* ray_group_fn(ray_t* x) {
          * the first row index suffices. */
         if (idx_vecs[g] && idx_vecs[g]->len > 0) {
             int64_t first_row = ((int64_t*)ray_data(idx_vecs[g]))[0];
-            if (ray_vec_is_null(x, first_row))
+            if (ray_vec_is_null(x, first_row)) {
                 ray_vec_set_null(keys_vec, g, true);
+                /* Dual-encoding contract: the payload at slot g must hold
+                 * the width-correct null sentinel so sentinel-aware readers
+                 * agree with the bitmap. */
+                void* base = ray_data(keys_vec);
+                switch (key_type) {
+                case RAY_I64: case RAY_TIMESTAMP:
+                    ((int64_t*)base)[g] = NULL_I64; break;
+                case RAY_I32: case RAY_DATE: case RAY_TIME:
+                    ((int32_t*)base)[g] = NULL_I32; break;
+                case RAY_I16:
+                    ((int16_t*)base)[g] = NULL_I16; break;
+                case RAY_F64:
+                    ((double*)base)[g] = NULL_F64; break;
+                case RAY_F32:
+                    ((float*)base)[g] = (float)NULL_F64; break;
+                default: break; /* SYM/BOOL/U8: no sentinel slot */
+                }
+            }
         }
     }

42 changes: 37 additions & 5 deletions src/ops/expr.c
@@ -295,6 +295,11 @@ bool try_linear_sumavg_input_i64(ray_graph_t* g, ray_t* tbl, ray_op_t* input_op,
     for (uint8_t i = 0; i < lin.n_terms; i++) {
         ray_t* col = ray_table_get_col(tbl, lin.syms[i]);
         if (!col || !type_is_linear_i64_col(col->type)) return false;
+        /* Phase 3a: scalar_sum_linear_i64_fn reads slots raw via
+         * scalar_i64_at; any nullable term would poison the sum with
+         * NULL_I{16,32,64} sentinels. Refuse the fast plan and let
+         * the caller fall back to the generic masked path. */
+        if (col->attrs & RAY_ATTR_HAS_NULLS) return false;
         out_plan->term_ptrs[i] = ray_data(col);
         out_plan->term_types[i] = col->type;
         out_plan->coeff_i64[i] = lin.coeff_i64[i];
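
Note on refusing the fast plan: a raw summation loop has no way to tell a sentinel from data, so the guard above bails out whenever the column may contain nulls. A minimal sketch of that planner decision follows. The `demo_col` struct and both functions are hypothetical, and the fallback here skips nulls by sentinel compare, whereas the PR describes its fallback as a generic masked path.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NULL_I64 ((int64_t)INT64_MIN)

/* Hypothetical column descriptor. */
typedef struct {
    const int64_t *data;
    size_t         len;
    bool           has_nulls;   /* plays the role of RAY_ATTR_HAS_NULLS */
} demo_col;

/* Fast path: reads every slot raw. Returns false, leaving *out untouched,
 * when the column may hold sentinels that would enter the sum. */
static bool demo_fast_sum(const demo_col *c, int64_t *out) {
    if (c->has_nulls) return false;
    int64_t s = 0;
    for (size_t i = 0; i < c->len; i++) s += c->data[i];
    *out = s;
    return true;
}

/* Fallback: null-aware summation that skips sentinel slots. */
static int64_t demo_null_aware_sum(const demo_col *c) {
    int64_t s = 0;
    for (size_t i = 0; i < c->len; i++)
        if (c->data[i] != NULL_I64) s += c->data[i];
    return s;
}
```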
@@ -936,14 +941,15 @@ static void expr_full_fn(void* ctx, uint32_t worker_id, int64_t start, int64_t e
 /* Post-pass for the fused unary path: |INT64_MIN| and -INT64_MIN don't fit in
  * i64 (signed-overflow; k/q convention surfaces this as typed null). The
  * element-wise loop uses unsigned wrap, so any overflow position lands as
- * INT64_MIN in data. Convert each such position to typed-null: zero data[i]
- * (preserve "null position is 0" invariant) and set the null bit. Caller
- * must invoke single-threaded — after pool dispatch joins. */
+ * INT64_MIN in data. Post Phase 3a-1, INT64_MIN IS the canonical NULL_I64
+ * sentinel — the dual-encoding contract requires the payload to *remain*
+ * INT64_MIN while the null bit is set. So we only need to flip the bitmap
+ * bit; the payload is already correct. Caller must invoke single-threaded
+ * — after pool dispatch joins. */
 static void mark_i64_overflow_as_null(ray_t* result, int64_t off, int64_t len) {
     int64_t* d = (int64_t*)ray_data(result) + off;
     for (int64_t i = 0; i < len; i++) {
-        if (RAY_UNLIKELY(d[i] == INT64_MIN)) {
-            d[i] = 0;
+        if (RAY_UNLIKELY(d[i] == NULL_I64)) {
             ray_vec_set_null(result, off + i, true);
         }
     }
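
Note on overflow landing on the sentinel: the comment above relies on unsigned-wrap negation of INT64_MIN producing INT64_MIN again, which now coincides with NULL_I64. The sketch below demonstrates that wrap and a bitmap-only post-pass. The function names and the byte-array bitmap are hypothetical; `memcpy` is used for the unsigned-to-signed step so the result does not depend on implementation-defined conversion.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NULL_I64 ((int64_t)INT64_MIN)

/* Negate using unsigned arithmetic, which wraps instead of overflowing. */
static int64_t demo_wrapping_neg(int64_t x) {
    uint64_t u = (uint64_t)0 - (uint64_t)x;
    int64_t r;
    memcpy(&r, &u, sizeof r);   /* int64_t is two's complement by definition */
    return r;
}

/* Post-pass: the payload already holds the sentinel, so only the null bit
 * needs to be set for each overflow position. */
static void demo_mark_overflow(const int64_t *d, uint8_t *nullmap, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (d[i] == NULL_I64)
            nullmap[i >> 3] |= (uint8_t)(1u << (i & 7));
}
```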
@@ -1228,6 +1234,32 @@ static void set_all_null(ray_t* result, int64_t len) {
     } else {
         for (int64_t i = 0; i < len; i++) ray_vec_set_null(result, i, true);
     }
+    /* Phase 2/3a dual-encoding: results must also carry the matching
+     * width sentinel in every payload slot so raw-payload consumers see
+     * the null marker without consulting the bitmap. */
+    switch (result->type) {
+    case RAY_F64: {
+        double* d = (double*)ray_data(result);
+        for (int64_t i = 0; i < len; i++) d[i] = NULL_F64;
+        break;
+    }
+    case RAY_I64: case RAY_TIMESTAMP: {
+        int64_t* d = (int64_t*)ray_data(result);
+        for (int64_t i = 0; i < len; i++) d[i] = NULL_I64;
+        break;
+    }
+    case RAY_I32: case RAY_DATE: case RAY_TIME: {
+        int32_t* d = (int32_t*)ray_data(result);
+        for (int64_t i = 0; i < len; i++) d[i] = NULL_I32;
+        break;
+    }
+    case RAY_I16: {
+        int16_t* d = (int16_t*)ray_data(result);
+        for (int64_t i = 0; i < len; i++) d[i] = NULL_I16;
+        break;
+    }
+    default: break;
+    }
 }
 
 /* Propagate null bitmaps for binary ops: null in either operand → null in result. */