
feat(dataframe): add writeJson with JsonWriteOptions #61

Merged
andygrove merged 2 commits into apache:main from LantaoJin:feat/dataframe-write-json
May 19, 2026

Conversation

@LantaoJin
Contributor

@LantaoJin LantaoJin commented May 18, 2026

Which issue does this PR close?

Rationale for this change

DataFrame.writeParquet shipped in #27. JSON is the third writer that DataFusion's DataFrame API exposes natively (DataFrame::write_json), and it is the easiest format to consume from non-Arrow downstream tooling. The implementation follows the same proto-over-JNI pattern as the merged readers and mirrors the writer-side shape we'd land for CSV (#38). It has zero binary-size impact: DataFusion's JSON support is in the default feature set, so no Cargo flag changes are required.

What changes are included in this PR?

  • proto/json_write_options.proto -- new JsonWriteOptionsProto message
  • JsonWriteOptions Java builder
  • Java_org_apache_datafusion_DataFrame_writeJsonWithOptions JNI handler in native/src/json.rs

Are these changes tested?

Yes -- 9 new tests across JsonWriteOptionsTest and DataFrameWriteJsonTest.

Are there any user-facing changes?

Yes -- purely additive. New public API:

  • org.apache.datafusion.JsonWriteOptions
  • DataFrame.writeJson(String)
  • DataFrame.writeJson(String, JsonWriteOptions)

The new org.apache.datafusion.protobuf.JsonWriteOptionsProto generated class is also exposed via the protobuf-Java output, consistent with how CsvReadOptionsProto, NdJsonReadOptionsProto, etc. are exposed. No API removals, no deprecations, no behavior change for existing callers. No Cargo feature changes; binary size is unchanged.
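
To make the shape of the options object concrete, here is a minimal, self-contained sketch of a builder carrying the three knobs named in this PR (singleFileOutput, partitionCols, fileCompressionType). It is a stand-in written for illustration: the class name JsonWriteOptionsSketch, the fluent setter signatures, the getters, and the use of a plain String for the compression type are all assumptions, not the actual org.apache.datafusion.JsonWriteOptions API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for illustration; NOT the real JsonWriteOptions class. */
final class JsonWriteOptionsSketch {
    // null means "the caller did not set this knob".
    private Boolean singleFileOutput;
    private String fileCompressionType;
    private final List<String> partitionCols = new ArrayList<>();

    // Fluent setters (assumed shape): each returns this so calls can be chained.
    JsonWriteOptionsSketch singleFileOutput(boolean value) {
        this.singleFileOutput = value;
        return this;
    }

    JsonWriteOptionsSketch partitionCols(String... cols) {
        partitionCols.clear();
        Collections.addAll(partitionCols, cols);
        return this;
    }

    JsonWriteOptionsSketch fileCompressionType(String codec) {
        this.fileCompressionType = codec;
        return this;
    }

    Boolean getSingleFileOutput() {
        return singleFileOutput;
    }

    List<String> getPartitionCols() {
        return Collections.unmodifiableList(partitionCols);
    }

    String getFileCompressionType() {
        return fileCompressionType;
    }
}
```

Keeping unset knobs as null (rather than a default value) lets the layer that serializes the options tell "not set" apart from "explicitly set to false".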

Mirror writeParquet's surface for newline-delimited JSON.
JsonWriteOptions exposes singleFileOutput, partitionCols, and
fileCompressionType; the DataFusion-side JsonOptions only carries
compression in writer mode (the read-side toggles like
newline_delimited and schema_infer_max_rec do not apply here).

JsonOptions has no fluent setters, so the native handler builds it
via struct-update syntax (same idiom as ArrowReadOptions /
AvroReadOptions). Option<JsonOptions> stays None when no
writer-side knob is set, so DataFusion's runtime defaults are
preserved when callers pass new JsonWriteOptions().
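
The "stays None unless a knob is set" rule can be modelled in a few lines. The real handler is Rust and works with Option<JsonOptions>; the Java below is only an analogy using java.util.Optional, and the names NativeOptionsSketch, JsonOptionsModel, and toNativeOptions are hypothetical.

```java
import java.util.Optional;

/** Java model of the native-side rule; not the PR's Rust code. */
final class NativeOptionsSketch {
    /** Stand-in for the writer-relevant part of JsonOptions (compression only). */
    static final class JsonOptionsModel {
        final String compression;

        JsonOptionsModel(String compression) {
            this.compression = compression;
        }
    }

    /**
     * Returns an empty Optional when no writer-side knob was set, so the
     * engine's own defaults apply; otherwise wraps the requested compression.
     */
    static Optional<JsonOptionsModel> toNativeOptions(String fileCompressionType) {
        return Optional.ofNullable(fileCompressionType).map(JsonOptionsModel::new);
    }
}
```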

When the caller leaves singleFileOutput unset, default to directory
output (with_single_file_output(false)) rather than DataFusion's
Automatic mode. Automatic treats extension-bearing paths like
"out.json" as single-file targets, which would silently contradict
the documented "directory unless overridden" default.
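
A small model of the difference, assuming a simplified extension check in place of DataFusion's actual Automatic logic (which is not reproduced here). The class SingleFileOutputSketch and its methods are hypothetical names used only for this illustration.

```java
/** Illustrative model only; these names are not part of datafusion-java or DataFusion. */
final class SingleFileOutputSketch {
    enum Mode { SINGLE_FILE, DIRECTORY }

    /**
     * Simplified stand-in for an extension-based "automatic" choice:
     * a dot in the last path segment is treated as a file extension.
     */
    static Mode automatic(String path) {
        String lastSegment = path.substring(path.lastIndexOf('/') + 1);
        return lastSegment.contains(".") ? Mode.SINGLE_FILE : Mode.DIRECTORY;
    }

    /**
     * Rule described above: an unset flag (null) resolves to directory
     * output regardless of the path; only an explicit true selects a single file.
     */
    static Mode resolve(Boolean singleFileOutput) {
        return Boolean.TRUE.equals(singleFileOutput) ? Mode.SINGLE_FILE : Mode.DIRECTORY;
    }
}
```

Under this model, automatic("out.json") picks a single file while resolve(null) picks a directory for the same path, which is the mismatch the explicit default avoids.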
@andygrove
Member

@LantaoJin could you fix conflict? Thanks

…e-json

# Conflicts:
#	core/src/main/java/org/apache/datafusion/DataFrame.java
@LantaoJin
Contributor Author

@LantaoJin could you fix conflict? Thanks

Done

@andygrove andygrove merged commit 2ecf3f1 into apache:main May 19, 2026
1 check passed
@LantaoJin LantaoJin deleted the feat/dataframe-write-json branch May 20, 2026 00:18

Development

Successfully merging this pull request may close these issues.

feat: add DataFrame.writeJson with JsonWriteOptions

2 participants