The current `SessionContext.registerParquet(name, path)` is hardwired to `ParquetReadOptions::default()`, and there's no Java entry point for DataFusion's `read_parquet(path, options) -> DataFrame`.
Expose:
- A `ParquetReadOptions` Java class with fluent setters for `fileExtension`, `parquetPruning`, `skipMetadata`, `metadataSizeHint`, and an explicit `schema` (Arrow Java `Schema`).
- An overload `SessionContext.registerParquet(name, path, options)`.
- A new method `SessionContext.readParquet(path[, options])` that returns a `DataFrame` without registering a table.
The options are marshalled across JNI as flat primitive arguments plus optional Arrow IPC bytes for the schema (reuses the IPC mechanism from #8 / #13).
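As a rough illustration of the shape this could take, here is a minimal sketch of the options class together with one possible flattening for the JNI call. Everything beyond the five setter names listed above is an assumption made for illustration: the `-1` "unset" sentinels, the accessor names (`parquetPruningFlag`, `metadataSizeHintOrUnset`, `schemaIpcOrNull`), and the `schemaIpc(byte[])` setter, which stands in for a `schema(Schema)` setter so the sketch has no Arrow dependency. The `.parquet` default is assumed to match DataFusion's default extension.

```java
import java.util.Objects;

/** Hypothetical sketch of the proposed options class; not the final API. */
final class ParquetReadOptions {
    // Assumed to mirror DataFusion's default file extension.
    private String fileExtension = ".parquet";
    // null means "not set": the native side would fall back to its own default.
    private Boolean parquetPruning;
    private Boolean skipMetadata;
    private Long metadataSizeHint;
    // Stand-in for an Arrow Java Schema already serialized to IPC bytes.
    private byte[] schemaIpc;

    ParquetReadOptions fileExtension(String ext) {
        this.fileExtension = Objects.requireNonNull(ext, "ext");
        return this;
    }

    ParquetReadOptions parquetPruning(boolean enabled) {
        this.parquetPruning = enabled;
        return this;
    }

    ParquetReadOptions skipMetadata(boolean skip) {
        this.skipMetadata = skip;
        return this;
    }

    ParquetReadOptions metadataSizeHint(long bytes) {
        if (bytes < 0) {
            throw new IllegalArgumentException("metadataSizeHint must be >= 0");
        }
        this.metadataSizeHint = bytes;
        return this;
    }

    ParquetReadOptions schemaIpc(byte[] ipcBytes) {
        // Defensive copy so later changes to the caller's array are not observed.
        this.schemaIpc = Objects.requireNonNull(ipcBytes, "ipcBytes").clone();
        return this;
    }

    // Flat views that a native method could take as plain arguments.

    String fileExtension() {
        return fileExtension;
    }

    /** -1 = unset, 0 = false, 1 = true (assumed encoding). */
    int parquetPruningFlag() {
        return tri(parquetPruning);
    }

    /** -1 = unset, 0 = false, 1 = true (assumed encoding). */
    int skipMetadataFlag() {
        return tri(skipMetadata);
    }

    /** -1 = unset, otherwise the hint in bytes (assumed encoding). */
    long metadataSizeHintOrUnset() {
        return metadataSizeHint == null ? -1L : metadataSizeHint;
    }

    /** null = no explicit schema was supplied. */
    byte[] schemaIpcOrNull() {
        return schemaIpc == null ? null : schemaIpc.clone();
    }

    private static int tri(Boolean b) {
        return b == null ? -1 : (b ? 1 : 0);
    }
}
```

With something like this in place, a call such as `ctx.registerParquet("t", path, new ParquetReadOptions().fileExtension(".pq").skipMetadata(true))` would pass one string, two ints, one long, and a nullable byte array to the native side. Tri-state ints are only one way to carry the optional booleans; separate "is set" flags would work equally well.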
Out of scope for this issue: `table_partition_cols`, `file_sort_order`, `file_decryption_properties`, and analogous option classes for CSV/JSON/Avro.