Wire Cypher IR -> DataFusion plan execution last mile (read-only) #332

@AdaWorldAPI

Description

Why

The Cypher pipeline in crates/lance-graph parses Cypher into an AST (parser::parse_cypher_query -> ast::CypherQuery), lowers it to a graph IR (logical_plan::LogicalOperator via LogicalPlanner), and translates that IR to a DataFusion LogicalPlan via datafusion_planner::DataFusionPlanner::plan(). The "last mile" -- registering tables on a SessionContext, executing the produced LogicalPlan, and streaming Arrow RecordBatch results back to callers -- is not yet wired end-to-end behind a single supported entry point. Today, callers get either an unparsed SQL string (via the dialect Unparser), graph-shaped JSON, or a partially executed path that does not cover the IR features the planner already supports.

This blocks q2's cockpit from running ad-hoc Cypher: the cockpit currently renders pre-computed graphs. Once the last mile is in place, cockpit-issued Cypher queries flow parse -> IR -> DataFusionPlanner -> SessionContext::execute_logical_plan -> Arrow stream and the cockpit renders live results.

What

Wire a single, supported execution entry point on CypherQuery (e.g. execute_with_context(&self, ctx: &SessionContext) -> Result<SendableRecordBatchStream>) that:

  1. Parses Cypher (already done by parse_cypher_query).
  2. Substitutes parameters (parameter_substitution).
  3. Runs semantic analysis (semantic).
  4. Lowers to graph IR via LogicalPlanner (logical_plan::LogicalOperator).
  5. Calls DataFusionPlanner::plan(&ir) -> DataFusion LogicalPlan.
  6. Registers required tables on the provided SessionContext using the existing GraphSourceCatalog / TableReader plumbing (table_readers, sql_catalog, lance_graph_catalog).
  7. Optionally registers UDFs (datafusion_planner::udf, plus cam_distance / cam_heel_distance if with_cam_codebook was used).
  8. Calls SessionContext::execute_logical_plan(plan) and returns the resulting SendableRecordBatchStream (and a convenience collect() variant returning Vec<RecordBatch>).
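
Taken together, the eight steps have roughly the following shape. This is Rust-flavored pseudocode, not a compiling implementation: substituted_ast, register_required_tables, and register_udfs are assumed helper names standing in for the existing plumbing, and error conversions are elided.

```
// Rust-flavored pseudocode -- helper names below are assumptions.
impl CypherQuery {
    pub async fn execute_with_context(
        &self,
        ctx: &SessionContext,
    ) -> Result<SendableRecordBatchStream> {
        let ast = self.substituted_ast()?;                           // steps 1-2
        semantic::analyze(&ast)?;                                    // step 3
        let ir = LogicalPlanner::new(&self.config).plan(&ast)?;      // step 4
        let plan = DataFusionPlanner::new(&self.config).plan(&ir)?;  // step 5
        register_required_tables(ctx, &ir).await?;                   // step 6
        register_udfs(ctx, self.cam_codebook.as_ref())?;             // step 7
        let df = ctx.execute_logical_plan(plan).await?;              // step 8
        Ok(df.execute_stream().await?)                               // stream back
    }
}
```

A collect_with_context variant would call DataFrame::collect() instead of execute_stream() and return Vec<RecordBatch>.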

Separately, expose this through the Python binding (crates/lance-graph-python) so the cockpit can drive it directly.

Architecture

Code locations (all under crates/lance-graph/src/):

  • Parser: parser.rs -- parse_cypher_query produces ast::CypherQuery.
  • AST: ast.rs -- CypherQuery, ReadingClause, BooleanExpression, RelationshipDirection, etc.
  • Semantic: semantic.rs.
  • Graph IR: logical_plan.rs -- LogicalOperator (Scan / Filter / Expand / Project / Aggregate / variable-length Expand).
  • IR -> DataFusion: datafusion_planner/mod.rs -- DataFusionPlanner::plan(&LogicalOperator) -> LogicalPlan (two-phase: analysis then builder). Subordinates: scan_ops.rs, join_ops.rs, expression.rs, predicate_pushdown.rs, vector_ops.rs, udf.rs, analysis.rs, cost_estimation.rs, config_helpers.rs.
  • Catalog / readers: sql_catalog.rs, table_readers.rs, lance_graph_catalog::GraphSourceCatalog.
  • High-level entry: query.rs -- CypherQuery already holds query_text, ast, config, parameters, cam_codebook and exposes with_* builders. The execute_with_context entry point referenced in inline comments is the seam to wire.
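
The two-phase shape of DataFusionPlanner (analysis pass, then builder pass) can be sketched in miniature. All names here are illustrative stand-ins, not the crate's actual types; the point is only that phase 1 collects requirements (e.g. which datasets a plan touches) that phase 2 consumes.

```rust
/// Illustrative sketch of an analyze-then-build planner; the real
/// DataFusionPlanner analysis lives in analysis.rs and is richer.
#[derive(Debug, PartialEq)]
struct Analysis {
    required_datasets: Vec<String>,
}

struct Planner;

impl Planner {
    // Phase 1: walk the IR (here, just scan names) and record what
    // execution will need before any plan nodes are built.
    fn analyze(&self, scans: &[&str]) -> Analysis {
        Analysis {
            required_datasets: scans.iter().map(|s| s.to_string()).collect(),
        }
    }

    // Phase 2: build the output plan from the IR plus the analysis.
    fn build(&self, analysis: &Analysis) -> String {
        format!("plan over [{}]", analysis.required_datasets.join(", "))
    }
}

fn main() {
    let planner = Planner;
    let analysis = planner.analyze(&["nodes", "edges"]);
    println!("{}", planner.build(&analysis)); // prints "plan over [nodes, edges]"
}
```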

What is missing on the IR -> LogicalPlan -> execution path:

  • A single CypherQuery::execute_with_context (and a collect/stream pair) that runs the whole pipeline on a caller-provided SessionContext instead of returning SQL, JSON, or a partially executed result.
  • Catalog -> SessionContext table registration glue: walk analysis::QueryAnalysis::required_datasets (or equivalent) and call ctx.register_table for each, using default_table_readers() / ParquetTableReader / DeltaTableReader from table_readers.rs and the GraphSourceCatalog resolver from sql_catalog.rs.
  • UDF registration helper that registers everything in datafusion_planner::udf plus optional CAM UDFs from cam_pq when cam_codebook is set.
  • Variable-length pattern (*1..N) end-to-end smoke: the planner already emits unrolled UNIONs (mod.rs Phase 2 doc), but execution has not been smoke-tested for the cockpit path.
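
The registration glue in the second bullet is a simple walk-and-register loop. The sketch below mocks the seams with stand-in types (TableReader returning a String instead of a TableProvider, Ctx instead of SessionContext) so the control flow is visible without the DataFusion dependency; none of these names are the crate's actual API.

```rust
use std::collections::HashMap;

/// Stand-in for table_readers.rs readers; the real trait would
/// produce an Arc<dyn TableProvider>, not a String.
trait TableReader {
    fn read(&self, path: &str) -> Result<String, String>;
}

struct ParquetReader;

impl TableReader for ParquetReader {
    fn read(&self, path: &str) -> Result<String, String> {
        Ok(format!("parquet:{path}"))
    }
}

/// Stand-in for SessionContext: just records name -> provider.
struct Ctx {
    tables: HashMap<String, String>,
}

impl Ctx {
    fn register_table(&mut self, name: &str, provider: String) {
        self.tables.insert(name.to_string(), provider);
    }
}

/// Walk the (name, path) pairs a query analysis requires and register
/// each on the context, mirroring the proposed helper's shape.
fn register_required(
    ctx: &mut Ctx,
    required: &[(&str, &str)],
    reader: &dyn TableReader,
) -> Result<(), String> {
    for (name, path) in required {
        let provider = reader.read(path)?;
        ctx.register_table(name, provider);
    }
    Ok(())
}

fn main() {
    let mut ctx = Ctx { tables: HashMap::new() };
    let required = [("nodes", "/data/nodes"), ("edges", "/data/edges")];
    register_required(&mut ctx, &required, &ParquetReader).unwrap();
    println!("{}", ctx.tables["nodes"]); // prints "parquet:/data/nodes"
}
```

In the real helper, Ctx becomes the caller's SessionContext, the reader comes from default_table_readers(), and the (name, path) pairs come from QueryAnalysis (or its equivalent) resolved through GraphSourceCatalog.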

Acceptance criteria

Scope: read-only Cypher only.

  • CypherQuery::execute_with_context(&self, ctx: &SessionContext) -> Result<SendableRecordBatchStream> exists, plus a collect_with_context returning Vec<RecordBatch>.
  • Helper that takes a GraphSourceCatalog + SessionContext and registers all tables required by QueryAnalysis.
  • Helper that registers all datafusion_planner::udf UDFs (and CAM UDFs when cam_codebook is set).
  • Python binding in crates/lance-graph-python exposes the new entry point so q2 cockpit can call it directly.
  • Integration tests under crates/lance-graph/tests/ execute against an in-memory / temp-dir Lance dataset (or MemTable) and assert returned RecordBatch schema + row counts:
    • MATCH (n)-[r]->(m) RETURN n, r, m -- single-hop pattern returning node + rel + node columns.
    • MATCH (n) WHERE n.label = 'foo' RETURN n -- filter pushdown via predicate_pushdown.rs.
    • MATCH (n)-[*1..3]-(m) RETURN m -- variable-length expansion via UNION unroll.
    • MATCH (n) RETURN COUNT(n) -- aggregation through DataFusion.
  • Errors from each stage (parse / semantic / lower / plan / execute) surface as GraphError with stage context.
  • Streaming path verified: caller can pull RecordBatch batches without materializing the full result.
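
For the stage-context criterion, one plausible shape is an error type that carries the failing pipeline phase. This is a self-contained sketch only; the actual GraphError in the crate may already exist with a different layout, and the Stage names are assumptions drawn from the stages listed above.

```rust
use std::fmt;

/// Pipeline stages that can fail; mirrors parse / semantic / lower /
/// plan / execute from the acceptance criteria (names assumed).
#[derive(Debug)]
pub enum Stage {
    Parse,
    Semantic,
    Lower,
    Plan,
    Execute,
}

/// Sketch of a stage-tagged error so callers see which phase failed.
#[derive(Debug)]
pub struct GraphError {
    pub stage: Stage,
    pub message: String,
}

impl GraphError {
    pub fn at(stage: Stage, message: impl Into<String>) -> Self {
        GraphError { stage, message: message.into() }
    }
}

impl fmt::Display for GraphError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "[{:?}] {}", self.stage, self.message)
    }
}

fn main() {
    let err = GraphError::at(Stage::Plan, "unsupported operator");
    println!("{err}"); // prints "[Plan] unsupported operator"
}
```

Each pipeline step would then map its native error into GraphError::at(stage, ...) before propagating, so a cockpit user can tell a parse failure from an execution failure.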

Out of scope

  • Write Cypher: CREATE, DELETE, SET, REMOVE, MERGE.
  • Cypher-specific aggregations beyond COUNT, SUM, AVG (no COLLECT, PERCENTILE_*, etc. in this issue).
  • New optimizer rules; we only consume what DataFusionPlanner already produces.
  • Lance-native executor path (ExecutionStrategy::LanceNative) and BlasGraph path (ExecutionStrategy::BlasGraph) -- this issue is ExecutionStrategy::DataFusion only.
  • Cockpit UI changes -- consumed via Python binding in a follow-up.

Dependencies

  • Independent. Can land in parallel with A1 and A3 -- this is the ontology / Cypher spine, not a downstream consumer.
  • Touches only crates/lance-graph and crates/lance-graph-python (binding shim). Does not require changes in lance-graph-planner or other sibling crates.
