PACKED logical and physical operators#584

Merged
adsharma merged 9 commits into main from ffx on Jun 18, 2026

@adsharma adsharma commented Jun 12, 2026

Contributor

Context: https://amine.io/papers/2026-sigmod-ffx.pdf

Based on the cited work, we implement:

  • PACKED_EXTEND (logical and physical)
  • PACKED_FILTER_COUNT (physical)

operators. More physical operators can be implemented in the future based on the desired optimization.

On this query:

  CALL enable_packed_path_extend=true;
  MATCH
    (:Leaf)-[:R1]->(c:Center)<-[:R2]-(l2:Leaf),
    (c)<-[:R3]-(l3:Leaf)
  WITH c.grp AS grp, l2.id AS l2id, l3.id AS l3id
  WHERE (l2id + l3id) % 10 = 0
  RETURN grp, count(*) AS cnt
  ORDER BY cnt DESC;

using the following synthetic graph:

from pathlib import Path

out = Path("/tmp/ffx_synth_csv")
out.mkdir(parents=True, exist_ok=True)  # create the output directory if it is missing
centers = 1000
fanout = 200

with (out / "center.csv").open("w") as f:
    for c in range(centers):
        f.write(f"{c},{c % 10}\n")

with (out / "leaf.csv").open("w") as f:
    for i in range(centers * fanout):
        f.write(f"{i}\n")

for rel in ["r1", "r2", "r3"]:
    with (out / f"{rel}.csv").open("w") as f:
        for c in range(centers):
            base = c * fanout
            for j in range(fanout):
                f.write(f"{base + j},{c}\n")

we got:

baseline 0.282s ±0.1306,
packed 0.072s ±0.0056
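
To illustrate why counting over a packed (factorized) representation can be cheaper on this query, here is a minimal sketch. It is not the engine's implementation. It assumes that `Leaf.id` is the single column of leaf.csv and that `Center.grp` is the second column of center.csv (the schema is not shown above); `flat_counts` and `packed_counts` are hypothetical names used only for this illustration.

```python
from collections import Counter


def flat_counts(centers, fanout):
    """Enumerate every (l1, l2, l3) combination per center, as a flat plan would."""
    counts = Counter()
    for c in range(centers):
        base = c * fanout
        leaves = range(base, base + fanout)  # the same leaves feed R1, R2 and R3
        for _l1 in leaves:
            for l2 in leaves:
                for l3 in leaves:
                    if (l2 + l3) % 10 == 0:
                        counts[c % 10] += 1
    return counts


def packed_counts(centers, fanout):
    """Count per center as |R1 leaves| * |(l2, l3) pairs passing the filter|."""
    counts = Counter()
    for c in range(centers):
        base = c * fanout
        leaves = range(base, base + fanout)
        # Histogram of leaf ids by residue mod 10; l2 and l3 share it here.
        residues = Counter(i % 10 for i in leaves)
        pairs = sum(residues[r] * residues[(-r) % 10] for r in range(10))
        # The R1 leaf is never referenced by the filter, so it is a pure multiplicity.
        counts[c % 10] += fanout * pairs
    return counts
```

Under these assumptions, `packed_counts(1000, 200)` yields 80,000,000 rows for each `grp` while touching only 200 leaf ids per center, whereas `flat_counts` would enumerate all of those rows one by one.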

@queryproc queryproc left a comment

Contributor

I skimmed through the code and had a few comments. The first point reflects my understanding, so please correct me if I am mistaken; the second and third are comments on what to address and what to keep in mind, respectively.

  1. My understanding is that you are reusing the adjacency-list scan for a physical packed extend. In a scan -> extend pipeline, the extend operator performs lookups over the input, processes multiple input nodes in a single processNextChunk call, and writes the offsets for each list of matches. Once the output vector is full, it sends the results to the next operator.

Furthermore, you do not need to maintain explicit start and end positions in the state. For example, suppose you are processing 2048 input elements and each expands to roughly 10 outputs. What you need is to maintain an active slice over the input. Since you already track the parent positions and write the offsets in a packed fashion, the active slice is implicitly available.

  2. I assume you are building this progressively, which is why there is not yet a notion of cascade updates. Otherwise, cascade updates are necessary for correctness. For example, consider expanding from a set of a nodes to b nodes, and then from those b nodes to c nodes. If some b nodes do not have matching c nodes, this needs to be propagated upward so that the corresponding a entries are also invalidated when all of their b matches fail to produce c matches. This is discussed in Section 5.2 of the paper.

  3. Eventually, one major difficulty beyond expand operators is making expression evaluation aware of the indirection introduced by offsets and optimizing for it. In particular, expression evaluation needs to correctly align the operand vectors before applying binary operations. We have some work on this over the summer, and I will keep you in the loop.
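
The packed extend and the cascade update described in points 1 and 2 can be sketched as follows. This is a minimal illustration, not the code in this PR: the function names are hypothetical, and the layout (the children of `parent_positions[i]` occupy `children[offsets[i]:offsets[i + 1]]`) is an assumption that may differ from both `DataChunkState::PackedChildSlices` and the design in Section 5.2 of the paper.

```python
def packed_extend(adjacency, parent_ids):
    """Extend every parent in one call, packing all children into one vector.

    Returns (children, parent_positions, offsets), where the children of
    parent_positions[i] occupy children[offsets[i]:offsets[i + 1]].
    Parents without matches get no slice at all.
    """
    children, parent_positions, offsets = [], [], [0]
    for pos, node in enumerate(parent_ids):
        matches = adjacency.get(node, [])
        if not matches:
            continue
        children.extend(matches)
        parent_positions.append(pos)
        offsets.append(len(children))
    return children, parent_positions, offsets


def cascade_valid_parents(parent_positions, offsets, child_valid):
    """Keep a parent only if at least one child in its slice is still valid."""
    return [
        pos
        for i, pos in enumerate(parent_positions)
        if any(child_valid[offsets[i]:offsets[i + 1]])
    ]


# a -> b -> c example: b2 has no c neighbours, so a1 (whose only b is b2) must go.
a_nodes = ["a0", "a1", "a2"]
a_to_b = {"a0": ["b0", "b1"], "a1": ["b2"], "a2": ["b3"]}
b_to_c = {"b0": ["c0"], "b1": ["c1", "c2"], "b3": ["c3"]}

b_nodes, a_positions, a_offsets = packed_extend(a_to_b, a_nodes)
_, b_positions, _ = packed_extend(b_to_c, b_nodes)

# A b position is valid only if the second extend produced a slice for it.
surviving_b = set(b_positions)
b_valid = [pos in surviving_b for pos in range(len(b_nodes))]
valid_a = cascade_valid_parents(a_positions, a_offsets, b_valid)
print(valid_a)  # [0, 2]
```

In this sketch the active slice over the input is implicit in `parent_positions` and `offsets`, so no explicit start and end positions are stored; without the cascade step, `a1` would remain valid even though none of its b matches reach a c node.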

class LBUG_API DataChunkState {
public:
struct PackedChildSlices {
std::vector<sel_t> parentPositions;

Contributor

Nitpick: the max sizes are known at initialization time, so I would use an array instead of a vector and allocate them.

Contributor Author

Going with vector::reserve() instead to optimize allocations. Arrays could overflow the stack, and it's not clear we know the sizes at compile time.

DASSERT(packedChildSlices.has_value());
return *packedChildSlices;
}
void setPackedChildSlices(std::vector<sel_t> parentPositions, std::vector<sel_t> offsets) {

Contributor

Unsure if this move is what updates the DataChunk slice. To get back to.

Comment thread src/include/common/data_chunk/data_chunk_state.h
Comment thread src/include/main/settings.h
adsharma added 2 commits June 17, 2026 12:24
Use nodeIDVector selVector[0], not currBoundNodeIdx

ScanRelTable::updatePackedChildSlices previously read the parent position
from cachedBoundNodeSelVector[currBoundNodeIdx], but currBoundNodeIdx can
advance past the current parent within a single scan() call (it is
incremented when a parent's CSR list is fully consumed). This produced a
wrong parentPos, breaking the factorized correlation between extends and
yielding 0 results for multi-hop packed-extend queries.

Restore the original, correct logic: the CSR scan sets nodeIDVector to
flat with selVector[0] pointing at the actual parent whose children are in
the output (via setNodeIDVectorToFlat). Use that as parentPos and set a
single-parent slice (overwrite), matching the one-parent-per-batch scan
architecture. Also remove the clear-at-start added in getNextTuplesInternal
since setSingleParentPackedChildSlice already overwrites.

The appendPackedChildSlice API on DataChunkState is retained for future
multi-parent packing. Fix the drops-parents test to iterate via hasNext().
@adsharma

Contributor Author

@queryproc Thanks for the review!

  1. Commit 5139223 addresses it
  2. Added a unittest to verify
  3. Looking forward to seeing your work this summer and learning from it

docs/multi_parent_lifetime.md contains some of what we could be doing, but deferred for now to keep this PR manageable.

@adsharma adsharma merged commit 34b3b3f into main Jun 18, 2026
4 checks passed
@adsharma adsharma deleted the ffx branch June 18, 2026 01:27