Add icebug mem impl by aheev · Pull Request #496 · LadybugDB/ladybug

aheev · 2026-05-18T13:40:29Z

ArrowNodeTable is reused for icebug-memory node tables
ArrowRelTable is extended to support icebug-memory rel tables
new API createArrowCsrRelTable is added to arrow_table_support to support icebug-memory rel table creation. It takes indices, indptr arrow data, bundles them in ArrowCsrRelData and adds it in csrRegistry

Closes: #329

adsharma · 2026-05-18T23:42:51Z

I realize this is not ready for review. But thanks for sharing it early for feedback!

Two high level concerns and details:

Adds a lot of new code. One of my concerns as a maintainer is to minimize the amount of code to maintain
Ice-mem branding. These things change and I would keep them on the docs/communication and use more technical names such as arrow-csr in the code.

Recommendations:

Keep ArrowNodeTable; do not add IceMemNodeTable.
Extend ArrowRelTable with a layout enum, e.g. TRIPLES vs CSR.
Put the layout-specific edge cursor behind a small helper, not a new table class.
For TRIPLES, keep current behavior: read from/to, lookup node PKs, emit matches.
For CSR, store indices and indptr Arrow arrays in the Arrow rel table data. For FWD scans, use the bound node offset to directly slice CSR. For BWD scans, either require/provide reverse CSR or explicitly fall back to a full scan.
Extend ArrowTableSupport registration with a tagged rel payload, rather than inventing icebug-memory catalog format.

aheev · 2026-05-19T04:41:53Z

Ice-mem branding. These things change and I would keep them on the docs/communication and use more technical names such as arrow-csr in the code.

Brand names tend to change far less frequently than technical details, wouldn’t you agree? Perhaps only on rare occasions. Also, if we were to name it arrow-csr, it might give the impression that we are confining the ice-mem format within both arrow and CSR

Recommendations:

Keep ArrowNodeTable; do not add IceMemNodeTable.

Extend ArrowRelTable with a layout enum, e.g. TRIPLES vs CSR.

Put the layout-specific edge cursor behind a small helper, not a new table class.

For TRIPLES, keep current behavior: read from/to, lookup node PKs, emit matches.

For CSR, store indices and indptr Arrow arrays in the Arrow rel table data. For FWD scans, use the bound node offset to directly slice CSR. For BWD scans, either require/provide reverse CSR or explicitly fall back to a full scan.

There is already a significant amount of redundant state and data in both the table and scan state classes due to inheritance from the native classes. For instance, if you look at setToTable, much of the code is not strictly necessary but is retained to remain compliant with the native class structure. Consequently, any changes to the native classes can propagate and affect the external storage classes as well. I have faced these issues while working on icebug-disk implementation

void ArrowRelTableScanState::setToTable(const transaction::Transaction* transaction, Table* table_,
    std::vector<column_id_t> columnIDs_, std::vector<ColumnPredicateSet> columnPredicateSets_,
    RelDataDirection direction_) {
    // Same behavior as IceDiskRelTable: no local table for external data sources.
    TableScanState::setToTable(transaction, table_, std::move(columnIDs_),
        std::move(columnPredicateSets_));
    columns.resize(columnIDs.size());
    direction = direction_;
    for (size_t i = 0; i < columnIDs.size(); ++i) {
        auto columnID = columnIDs[i];
        if (columnID == INVALID_COLUMN_ID || columnID == ROW_IDX_COLUMN_ID) {
            columns[i] = nullptr;
        } else {
            columns[i] = table->cast<RelTable>().getColumn(columnID, direction);
        }
    }
    csrOffsetColumn = table->cast<RelTable>().getCSROffsetColumn(direction);
    csrLengthColumn = table->cast<RelTable>().getCSRLengthColumn(direction);
    nodeGroupIdx = INVALID_NODE_GROUP_IDX;
}

Even with the ColumnarTableBase abstraction and the majority of common utilities in ArrowUtils, the Arrow classes remain large and complex. Introducing new data and states from icebug-memory only adds to this complexity, increasing redundant compliance code, more conditional checks, and potential risk.

Extend ArrowTableSupport registration with a tagged rel payload, rather than inventing icebug-memory catalog format.

didn't get this part

adsharma · 2026-05-19T05:36:44Z

Extend ArrowTableSupport registration with a tagged rel payload, rather than inventing icebug-memory catalog format.

The Tag in this context is TRIPLES or CSR (feel free to rename). The status right now:

	Parquet	Arrow
Triples	❌	✅
CSR	✅	✅

In the future there may be a legit use case for running cypher over a triple table stored in parquet (we support this for duckdb/sqlite/postgres).

I'm thinking about how to implement this matrix with the most code reuse and maintainability. I agree that sometimes its necessary to duplicate code for clarity. Inheritance is not the best way to share code etc.

But I'm not convinced that we should have 4 cases above * {node, rel} = 8 classes.

adsharma · 2026-05-19T18:03:01Z

Also think about it from the perspective of someone reading the code. They already know what "Arrow" or "CSR" means. If not, it's probably 2 secs away in the browser. But searching for icemem?

aheev · 2026-05-20T01:42:25Z

@adsharma could you PTAL? Couple of clarifications

arrow C data interface doesn't support reading metadata out of the box. How do we go about version checks?
are we planning to have an icebug-format converter similar to icebug-disk?

aheev · 2026-05-20T03:13:20Z

python PR: LadybugDB/ladybug-python#17

aheev · 2026-05-20T03:14:05Z

@adsharma should we add support for optional bwd indices and indptr in icebug-disk too?

adsharma · 2026-05-20T05:26:52Z

should we add support for optional bwd indices and indptr in icebug-disk too?

The problem with providing both fwd and bwd direction in user facing APIs is that users can screw up and provide logically inconsistent data.

This is why storing data in one direction (fwd) and then using --directed is preferable for logical consistency. icebug-format cli.py does this.

However, as a lower level API (when processing directed and computing bwd edges is expensive), there could be a case for an internal API.

For now, let's do only the logically consistent version and punt on the other option.

Not clear we need to version these things, since everything is super well defined and has been around for many years in the industry. Arrow for 10 years and CSR for even longer.

Example: https://github.com/tensorflow/community/blob/master/rfcs/20200519-csr-sparse-matrix.md

aheev · 2026-05-20T06:53:19Z

should we add support for optional bwd indices and indptr in icebug-disk too?

The problem with providing both fwd and bwd direction in user facing APIs is that users can screw up and provide logically inconsistent data.

This is why storing data in one direction (fwd) and then using --directed is preferable for logical consistency. icebug-format cli.py does this.

However, as a lower level API (when processing directed and computing bwd edges is expensive), there could be a case for an internal API.

For now, let's do only the logically consistent version and punt on the other option.

I don't follow. Can you put these together in a issue/disc and discuss there?

aheev force-pushed the add-icebug-mem-impl branch from 69ff7fc to 42ecaa6 Compare May 19, 2026 16:50

aheev changed the title ~~[WIP] Add icebug mem impl~~ Add icebug mem impl May 20, 2026

aheev mentioned this pull request May 20, 2026

implement icebug-memory LadybugDB/ladybug-python#17

Open

aheev added 7 commits May 20, 2026 10:33

add ice-mem node tables

1db2a5d

add ice-mem rel tables

8790dd4

merge ice-mem classes into arrow classes

c83baf1

fix arrowCsrRelTable init

e859bf6

add icebug-memory spec

3f649e7

fix unused parameter transaction

dfb1b80

block alter for external tables

87b5945

aheev force-pushed the add-icebug-mem-impl branch from 9803e33 to 87b5945 Compare May 20, 2026 05:03

fix icebug_disk test

81d8e5f

expand test suite

2da243c

aheev mentioned this pull request May 21, 2026

Add Arrow CSR and ice-disk FLAT relationship layouts #506

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add icebug mem impl#496

Add icebug mem impl#496
aheev wants to merge 9 commits into
LadybugDB:mainfrom
aheev:add-icebug-mem-impl

aheev commented May 18, 2026 •

edited

Loading

Uh oh!

adsharma commented May 18, 2026

Uh oh!

aheev commented May 19, 2026

Uh oh!

adsharma commented May 19, 2026

Uh oh!

adsharma commented May 19, 2026

Uh oh!

aheev commented May 20, 2026

Uh oh!

aheev commented May 20, 2026

Uh oh!

aheev commented May 20, 2026

Uh oh!

adsharma commented May 20, 2026

Uh oh!

aheev commented May 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

aheev commented May 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

adsharma commented May 18, 2026

Uh oh!

aheev commented May 19, 2026

Uh oh!

adsharma commented May 19, 2026

Uh oh!

adsharma commented May 19, 2026

Uh oh!

aheev commented May 20, 2026

Uh oh!

aheev commented May 20, 2026

Uh oh!

aheev commented May 20, 2026

Uh oh!

adsharma commented May 20, 2026

Uh oh!

aheev commented May 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

aheev commented May 18, 2026 •

edited

Loading