#4 generates Java classes for the datafusion-proto schema. This issue is the end-to-end wiring that makes those classes useful: Java code constructs a `LogicalPlanNode`, hands it to `SessionContext`, and gets back a `DataFrame` that streams Arrow batches the same way `ctx.sql(...)` does today.
## Sketch
Three pieces sit between Java-built protobuf and execution:
**Java surface.** A new method on `SessionContext`:

```java
public DataFrame fromProto(byte[] planBytes);
```

No other Java work is needed; the protobuf builders generated by #4 (Add datafusion-proto build) are the public construction API.
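For orientation, a minimal usage sketch, assuming the builder classes generated by #4; the package names and the empty builder body are placeholders, not final API:

```java
import org.apache.arrow.datafusion.DataFrame;      // existing binding types
import org.apache.arrow.datafusion.SessionContext;
import datafusion.protobuf.LogicalPlanNode;        // generated by #4; package name is an assumption

// Assemble a plan with the generated protobuf builders (contents elided),
// serialize it, and hand the bytes to the session context.
LogicalPlanNode node = LogicalPlanNode.newBuilder()
        // ... populate a scan / projection via the generated oneof builders ...
        .build();

DataFrame frame = ctx.fromProto(node.toByteArray());
// `frame` streams Arrow batches exactly like the result of ctx.sql(...).
```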
**JNI bridge.** A single new native method that takes the byte array and returns a `DataFrame` pointer, modeled on the existing `sql()` JNI shim. Arrow FFI stays on the result path; the plan crosses JNI as a primitive `byte[]`.
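On the Java side the shim could be as small as the declaration below; the name and parameter list are assumptions, mirroring the pointer-passing shape of the existing `sql()` shim:

```java
// Hypothetical native declaration: the plan travels as raw bytes, and the
// returned long is the pointer to the native DataFrame, as with sql().
static native long fromProto(long runtimePointer, long contextPointer, byte[] planBytes);
```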
**Rust deserialization + execution.** Add `datafusion-proto = "53"` to `native/Cargo.toml`, then in the JNI implementation:

```rust
use datafusion_proto::logical_plan::{AsLogicalPlan, DefaultLogicalExtensionCodec};
use prost::Message; // LogicalPlanNode::decode comes from prost

let node = datafusion_proto::protobuf::LogicalPlanNode::decode(&bytes[..])?;
let plan = node.try_into_logical_plan(&ctx.state(), &DefaultLogicalExtensionCodec {})?;
let df = runtime.block_on(ctx.execute_logical_plan(plan))?;
```

Wrap the resulting `DataFrame` the same way `sql()` does and return its pointer.
## Open design questions for this issue
**Logical vs. physical plan first.** Recommend logical so DataFusion's own optimizer runs; physical is a follow-up.
**Schema discovery.** A JVM caller building a `TableScanNode` needs the schema of registered tables. Likely a new JNI shim like `SessionContext.tableSchema(name) -> org.apache.arrow.vector.types.pojo.Schema`. Without this, the API is not usable for anything but trivial plans.
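One possible shape for that shim, assuming the native side hands back the schema as Arrow IPC bytes; `tableSchemaNative` and the pointer fields are hypothetical:

```java
import java.nio.ByteBuffer;
import org.apache.arrow.vector.types.pojo.Schema;

// Hypothetical addition to SessionContext: look up a registered table on the
// native side, serialize its Arrow schema as an IPC message, and decode it here.
public Schema tableSchema(String name) {
    byte[] ipcBytes = tableSchemaNative(runtimePointer, contextPointer, name);
    return Schema.deserializeMessage(ByteBuffer.wrap(ipcBytes));
}
```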
**Extension codec.** `DefaultLogicalExtensionCodec` is fine for now; a real codec arrives with Java-defined UDFs (the fourth roadmap item).
**Round-trip tests.** Build a plan in Java, execute it, and compare results against the same query run via `ctx.sql(...)` over the TPC-H integration data.
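A sketch of that test shape, assuming the `fromProto` surface above; `buildPlanForQuery` and `collectToString` are placeholder helpers, and the `.join()` assumes `sql()` keeps returning a `CompletableFuture` as it does today:

```java
import static org.junit.jupiter.api.Assertions.assertEquals;
import org.junit.jupiter.api.Test;

// Hypothetical JUnit round-trip check over the TPC-H integration data.
@Test
void protoPlanMatchesSql() throws Exception {
    LogicalPlanNode node = buildPlanForQuery();            // same query as the SQL below
    DataFrame viaProto = ctx.fromProto(node.toByteArray());
    DataFrame viaSql = ctx.sql("select l_returnflag, count(*) from lineitem group by l_returnflag").join();

    assertEquals(collectToString(viaSql), collectToString(viaProto));
}
```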
## Out of scope here
- Java-defined UDFs and their custom extension codec.
- Physical plan submission.
- Any builder helpers / fluent API on top of the raw protobuf builders.
Follow-up to #4.