feat: add support for generating JSON formatted substrait plan #1376
Which issue does this PR close?
Closes #508
Rationale for this change
Other Substrait producers (Isthmus, DuckDB) support generating JSON-formatted Substrait plans, which makes it much easier to inspect and compare plans across engines. The substrait crate already provides serde support via the pbjson feature, so DataFusion Python can leverage this to expose JSON plan generation and parsing with minimal effort.
What changes are included in this PR?
- `python/datafusion/substrait.py`
  - Added the `to_json()` instance method and the `from_json()` static method on the `Plan` class for converting plans to/from JSON strings.
  - Added the `serialize_json()` and `deserialize_json()` static methods on the `Serde` class for writing/reading JSON plan files.
- `src/substrait.rs`
  - `fn to_json(&self) -> PyDataFusionResult<String>`
  - `#[staticmethod] fn from_json(json: &str) -> PyDataFusionResult<PyPlan>`
  - `#[staticmethod] pub fn serialize_json(sql: &str, ctx: PySessionContext, path: &str, py: Python) -> PyDataFusionResult<()>`
  - `#[staticmethod] pub fn deserialize_json(path: &str) -> PyDataFusionResult<PyPlan>`
- `src/errors.rs`
  - Added the `SerdeJsonError` and `DecodeError` error variants to `PyDataFusionError` for proper error handling.
- `Cargo.toml`
  - Added `serde_json` as a dependency.
- `python/tests/test_substrait.py`

Are there any user-facing changes?
Yes, four new public API methods:
- `Plan.to_json() -> str`: convert a Substrait plan to a JSON string
- `Plan.from_json(json: str) -> Plan`: create a Substrait plan from a JSON string
- `Serde.serialize_json(sql, ctx, path)`: generate a JSON-formatted Substrait plan file from a SQL query
- `Serde.deserialize_json(path) -> Plan`: read a Substrait plan from a JSON file
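
A minimal sketch of how these four methods might be used together, assuming a datafusion build that includes this PR. The helper names `plan_to_pretty_json`, `json_round_trip`, and `demo` are hypothetical and exist only for illustration; the table `t` and the query are made up. `Serde.serialize_to_plan`, `Consumer.from_substrait_plan`, and `register_record_batches` are existing datafusion-python APIs; the JSON methods are used with the signatures listed above.

```python
import json
import os
import tempfile


def plan_to_pretty_json(plan) -> str:
    """Return the plan's JSON with stable key order, for inspection or diffing.

    Assumes `plan` exposes to_json() -> str as described in this PR.
    """
    return json.dumps(json.loads(plan.to_json()), indent=2, sort_keys=True)


def json_round_trip(plan, plan_cls):
    """Serialize `plan` to a JSON string and parse it back.

    Assumes `plan_cls` exposes a static from_json(json: str), as described
    in this PR (for example, datafusion.substrait.Plan).
    """
    return plan_cls.from_json(plan.to_json())


def demo() -> None:
    # Imports are local so the helpers above stay usable without datafusion.
    import pyarrow as pa
    from datafusion import SessionContext
    from datafusion import substrait as ss

    ctx = SessionContext()
    batch = pa.RecordBatch.from_arrays(
        [pa.array([1, 2, 3]), pa.array([4, 5, 6])],
        names=["a", "b"],
    )
    ctx.register_record_batches("t", [[batch]])  # illustrative table

    sql = "SELECT a, b FROM t"

    # In-memory path: plan -> JSON string -> plan.
    plan = ss.Serde.serialize_to_plan(sql, ctx)
    print(plan_to_pretty_json(plan))
    restored = json_round_trip(plan, ss.Plan)

    # File path: SQL -> JSON plan file -> plan.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "plan.json")
        ss.Serde.serialize_json(sql, ctx, path)
        from_file = ss.Serde.deserialize_json(path)

    # Either plan can be handed to the existing consumer API.
    print(ss.Consumer.from_substrait_plan(ctx, restored))
    print(ss.Consumer.from_substrait_plan(ctx, from_file))
```

Calling `demo()` requires a datafusion build with these methods. Sorting keys in `plan_to_pretty_json` is a choice made here so that plans from different producers can be compared with an ordinary text diff; the PR itself does not promise any particular key order.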