Data declaration extensions by comnik · Pull Request #216 · RelationalAI/logical-query-protocol

comnik · 2026-02-25T01:04:21Z

RelEDB → EDB, dropping the "Rel" prefix, which we've been using to mark constructs that will go away once we drop Rel support. That is not the case for EDB references.

CSVColumn → GNFColumn, generalizing to make column declarations reusable between CSV and Iceberg data. GNFColumn can represent nested paths like /:append/:METADATA$KEY.

Generalize Snapshot to support multiple mappings. The snapshot implementation on the engine is more awkward than I had hoped for, and this allows us to at least do it more efficiently.

comnik · 2026-02-25T21:36:22Z

meta/src/meta/yacc_action_parser.py

Claude says: the grammar's gnf_column_path rule indexes into a Sequence[String] with $$[0], which requires the sequence/list indexing support added in that file.

Right, GetElement was just for tuple types. We never had a need for indexing into lists, until now.
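For readers following along, here is a minimal Python sketch of the distinction being described. It is not the code in yacc_action_parser.py; the class names and the `get_element_type` helper are hypothetical. Indexing a tuple type yields the type at that position, while indexing a sequence type always yields the single element type.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class BaseType:
    name: str


@dataclass(frozen=True)
class TupleType:
    # Heterogeneous: one type per position.
    elements: Tuple["Type", ...]


@dataclass(frozen=True)
class SequenceType:
    # Homogeneous: every position has the same type.
    element: "Type"


Type = Union[BaseType, TupleType, SequenceType]


def get_element_type(container: Type, index: int) -> Type:
    """Return the static type of container[index] (hypothetical helper)."""
    if isinstance(container, TupleType):
        if not 0 <= index < len(container.elements):
            raise IndexError(f"tuple index {index} out of range")
        return container.elements[index]
    if isinstance(container, SequenceType):
        # The length is not known statically, so only reject negative indices.
        if index < 0:
            raise IndexError(f"negative sequence index {index}")
        return container.element
    raise TypeError(f"cannot index into {container!r}")


STRING = BaseType("String")
INT = BaseType("Int64")

# $$[0] on a Sequence[String] has type String.
print(get_element_type(SequenceType(STRING), 0))
# Indexing a tuple depends on the position.
print(get_element_type(TupleType((STRING, INT)), 1))
```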

tests/lqp/snapshot.lqp

meta/src/meta/grammar.y

minsungc · 2026-02-25T23:33:44Z

meta/src/meta/grammar.y

+    | "[" STRING* "]"
+      construct: $$ = $2
+      deconstruct if builtin.length($$) != 1:
+        $2: Sequence[String] = $$


Are there semantics for when gnf_column_path is a sequence of length 1? Is it something we can explicitly disallow?

I think the common case is a path of length 1 (i.e. just the column name). We do support it, we just don't require wrapping it in [].
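A rough Python sketch of the round trip this implies, assuming the bare form is a single quoted string and the bracketed form is used for every other length (which is what the `builtin.length($$) != 1` guard suggests). The function names are hypothetical and the tokenization is simplified; this is not the generated parser or printer.

```python
import json
from typing import List, Sequence


def print_gnf_column_path(path: Sequence[str]) -> str:
    """Print a bare string for length 1, a bracketed list otherwise."""
    quoted = [json.dumps(part) for part in path]
    if len(quoted) == 1:
        return quoted[0]
    return "[" + " ".join(quoted) + "]"


def _parse_strings(text: str) -> List[str]:
    """Parse whitespace-separated quoted strings."""
    decoder = json.JSONDecoder()
    parts: List[str] = []
    pos = 0
    while True:
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos == len(text):
            return parts
        value, pos = decoder.raw_decode(text, pos)
        if not isinstance(value, str):
            raise ValueError(f"expected a string, got {value!r}")
        parts.append(value)


def parse_gnf_column_path(text: str) -> List[str]:
    """Accept either the bare form or the bracketed form."""
    text = text.strip()
    if text.startswith("[") and text.endswith("]"):
        return _parse_strings(text[1:-1])
    parts = _parse_strings(text)
    if len(parts) != 1:
        raise ValueError("an unbracketed path must be exactly one string")
    return parts


for path in (["name"], ["database", "table"], []):
    printed = print_gnf_column_path(path)
    assert parse_gnf_column_path(printed) == path
    print(printed)
```

In this sketch the parser also accepts a bracketed path of length 1; the printer just never emits one, so printing stays canonical.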

minsungc

Quick clarifying question, otherwise LGTM!

davidwzhao

Nice, thanks!

davidwzhao · 2026-02-26T05:13:10Z

proto/relationalai/lqp/v1/logic.proto

  CSVLocator locator = 1;
  CSVConfig config = 2;
-  repeated CSVColumn columns = 3;
+  repeated GNFColumn columns = 3;


I wonder if at some point we can unify or better distinguish between columns used for data ingestion vs data export? We currently also have ExportCSVColumn which is actually pretty similar to GNFColumn, just without the types field.

My work on NOT NULL constraints (inclusion dependencies) may have adjacencies to this
consideration. Please keep me in the loop.

Huh, yeah good point. Maybe even lower than that, it seems that we have the need for a mapping from RelationId to a String[] path (possibly of length 1) in several places, most recently the SnapshotMapping.

davidwzhao · 2026-02-26T05:32:57Z

tests/lqp/cdc.lqp

🎉 cool preview of the future ahead

Dress for the job you want... ;)

nystrom

Looks good, thanks!

nystrom · 2026-02-26T10:49:22Z

Looks like you need to rebuild the printers

staworko

Nothing on my side, just a small question to check whether I'm missing something.

staworko · 2026-02-26T11:27:50Z

meta/src/meta/grammar.y

-rel_edb
-    : "(" "rel_edb" relation_id rel_edb_path rel_edb_types ")"
-      construct: $$ = logic.RelEDB(target_id=$3, path=$4, types=$5)
+edb


As much as I love Rel, I like the removal of references to rel_ from the raicode.

staworko · 2026-02-26T11:31:09Z

meta/src/meta/grammar.y

-snapshot
-    : "(" "snapshot" rel_edb_path relation_id ")"
-      construct: $$ = transactions.Snapshot(destination_path=$3, source_relation=$4)
+snapshot_mapping


I don't see what has changed here except for the ability to do a snapshot of multiple
relations at once. Is that the whole change? Reading on.

That's the whole change, yeah, multiple snapshots in one action 👍 It's a little awkward to have to do this rather than just having multiple snapshot actions in the same epoch, but it simplifies the implementation on the engine side.

staworko · 2026-02-26T11:33:39Z

proto/relationalai/lqp/v1/logic.proto

  CSVLocator locator = 1;
  CSVConfig config = 2;
-  repeated CSVColumn columns = 3;
+  repeated GNFColumn columns = 3;


My work on NOT NULL constraints (inclusion dependencies) may have adjacencies to this
consideration. Please keep me in the loop.

staworko · 2026-02-26T11:37:10Z

tests/lqp/snapshot.lqp

-      (snapshot ["my_edb"] :my_rel)
-      (snapshot ["database" "table"] :computed)
-      (snapshot ["schema" "namespace" "relation"] :big_signed))
+      (snapshot


Yeah, I still don't see any big change other than snapshotting multiple relations in one
action rather than one relation per snapshot. Am I missing something?

Naturally, there are conceivable reasons why this is smarter, e.g., triggering snapshot
creation comes with an overhead, but this doesn't need any additional explanation.

See my comment above. It has to do with some awkwardness in how snapshot is implemented on the engine: it is effectively both a read and a write, which doesn't fit well into Arroyo's epoch model. There are nicer ways to implement it, where we could just support many single snapshot actions in one epoch, but not in the short term.
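To make the shape of the change concrete, here is a small Python sketch under stated assumptions. `destination_path` and `source_relation` come from the old grammar action shown above; the `SnapshotMapping` dataclass layout, the `mappings` field name, and the `batch_snapshots` helper are hypothetical and are not taken from the proto or the SDKs.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence, Tuple


@dataclass(frozen=True)
class SnapshotMapping:
    # One (destination path, source relation) pair, as in the old Snapshot.
    destination_path: Tuple[str, ...]
    source_relation: str  # stand-in for a RelationId


@dataclass(frozen=True)
class Snapshot:
    # Assumed field name: a single action now carries many mappings.
    mappings: Tuple[SnapshotMapping, ...]


def batch_snapshots(pairs: Iterable[Tuple[Sequence[str], str]]) -> Snapshot:
    """Fold what used to be several snapshot actions into one action."""
    return Snapshot(
        mappings=tuple(
            SnapshotMapping(tuple(path), relation) for path, relation in pairs
        )
    )


# The three single snapshots removed from tests/lqp/snapshot.lqp, as one action.
action = batch_snapshots(
    [
        (["my_edb"], ":my_rel"),
        (["database", "table"], ":computed"),
        (["schema", "namespace", "relation"], ":big_signed"),
    ]
)
print(len(action.mappings))
```

The engine then sees one action per epoch and can handle the read and write sides for all mappings together.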

staworko

LGTM

comnik added 6 commits February 25, 2026 01:32

Extend column

1e4dae5

CSVColumn -> GNFColumn

74d4e49

IcebergRelation -> IcebergData

96a7958

Simplify grammar and rename csv_column -> gnf_column

2de9f39

RelEDB -> EDB

8c4b474

Update SDKS

142e4a1

comnik self-assigned this Feb 25, 2026

Merge remote-tracking branch 'origin/main' into ncg-load-cdc

6e16b86

comnik commented Feb 25, 2026

Generalize Snapshot action

500175d

comnik marked this pull request as ready for review February 25, 2026 21:41

comnik requested review from davidwzhao, nystrom and staworko February 25, 2026 21:42

minsungc reviewed Feb 25, 2026

tests/lqp/snapshot.lqp

Remove stray binary

a93518a

minsungc reviewed Feb 25, 2026

meta/src/meta/grammar.y (outdated)

minsungc reviewed Feb 25, 2026

minsungc approved these changes Feb 25, 2026

davidwzhao approved these changes Feb 26, 2026

nystrom approved these changes Feb 26, 2026

Merge remote-tracking branch 'origin/main' into ncg-load-cdc

d3097fb

staworko reviewed Feb 26, 2026

staworko approved these changes Feb 26, 2026

comnik merged commit 5e27319 into main Feb 26, 2026
6 checks passed

comnik deleted the ncg-load-cdc branch February 26, 2026 11:49
