Expose ParquetReadOptions via registerParquet and readParquet #17

@andygrove

The current SessionContext.registerParquet(name, path) is hardwired to ParquetReadOptions::default(), and there's no Java entry point for DataFusion's read_parquet(path, options) -> DataFrame.

Expose:

  • A ParquetReadOptions Java class with fluent setters for fileExtension, parquetPruning, skipMetadata, metadataSizeHint, and an explicit schema (Arrow Java Schema).
  • An overload SessionContext.registerParquet(name, path, options).
  • A new method SessionContext.readParquet(path[, options]) that returns a DataFrame without registering.
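A minimal sketch of what the fluent options class could look like, for discussion only. The getter names, the use of `null` for "not set", and the `.parquet` default are assumptions rather than settled API. The explicit-schema setter is left out so the sketch compiles without the Arrow Java dependency.

```java
import java.util.Objects;
import java.util.OptionalLong;

/** Hypothetical sketch of the proposed options class; not the final API. */
final class ParquetReadOptions {
    // Assumed default, intended to mirror ParquetReadOptions::default() on the Rust side.
    private String fileExtension = ".parquet";
    // null means "not set", so the native side can fall back to session config.
    private Boolean parquetPruning;
    private Boolean skipMetadata;
    private Long metadataSizeHint;

    ParquetReadOptions fileExtension(String fileExtension) {
        this.fileExtension = Objects.requireNonNull(fileExtension, "fileExtension");
        return this;
    }

    ParquetReadOptions parquetPruning(boolean enabled) {
        this.parquetPruning = enabled;
        return this;
    }

    ParquetReadOptions skipMetadata(boolean skip) {
        this.skipMetadata = skip;
        return this;
    }

    ParquetReadOptions metadataSizeHint(long bytes) {
        if (bytes < 0) {
            throw new IllegalArgumentException("metadataSizeHint must be >= 0");
        }
        this.metadataSizeHint = bytes;
        return this;
    }

    String getFileExtension() { return fileExtension; }
    Boolean getParquetPruning() { return parquetPruning; }
    Boolean getSkipMetadata() { return skipMetadata; }

    OptionalLong getMetadataSizeHint() {
        return metadataSizeHint == null
                ? OptionalLong.empty()
                : OptionalLong.of(metadataSizeHint);
    }
}

class ParquetReadOptionsDemo {
    public static void main(String[] args) {
        // Each setter returns the same instance, so calls chain.
        ParquetReadOptions options = new ParquetReadOptions()
                .fileExtension(".parq")
                .parquetPruning(true)
                .metadataSizeHint(64 * 1024);
        System.out.println(options.getFileExtension());
        System.out.println(options.getMetadataSizeHint());
    }
}
```

With something like this in place, the proposed overloads would read as `ctx.registerParquet("t", path, options)` and `ctx.readParquet(path, options)`.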

The options are marshalled across JNI as flat primitive arguments, plus optional Arrow IPC bytes for the schema (reusing the IPC mechanism from #8 / #13).
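One way the flattening could work, sketched under assumptions: unset booleans are encoded as a tri-state `int` (-1 unset, 0 false, 1 true), an unset size hint as -1, and an absent schema as a `null` byte array. The class name, the sentinel values, and the native method signature are all hypothetical and only illustrate the shape of the call.

```java
/** Hypothetical helper showing how nullable options could be flattened for JNI. */
final class ParquetJniArgs {
    /** Sentinel for "option not set" (an assumption, not a settled convention). */
    static final int UNSET = -1;

    private ParquetJniArgs() {}

    /** Encodes a nullable Boolean as -1 (unset), 0 (false), or 1 (true). */
    static int triState(Boolean value) {
        if (value == null) {
            return UNSET;
        }
        return value ? 1 : 0;
    }

    /** Encodes a nullable size hint as -1 when unset, else the value itself. */
    static long sizeHint(Long value) {
        return value == null ? -1L : value;
    }

    // Hypothetical native entry point; declared only to show the flat
    // argument list and never invoked in this sketch. schemaIpc would carry
    // the Arrow IPC-encoded schema, or null when no explicit schema is given.
    static native long readParquet(
            long contextPointer,
            String path,
            String fileExtension,
            int parquetPruning,
            int skipMetadata,
            long metadataSizeHint,
            byte[] schemaIpc);
}

class ParquetJniArgsDemo {
    public static void main(String[] args) {
        System.out.println(ParquetJniArgs.triState(null));
        System.out.println(ParquetJniArgs.triState(Boolean.TRUE));
        System.out.println(ParquetJniArgs.sizeHint(4096L));
    }
}
```

Sentinels keep the JNI signature free of boxed types, at the cost of the Rust side having to map -1 back to `None`.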

Out of scope for this issue: table_partition_cols, file_sort_order, file_decryption_properties, and the analogous option classes for CSV/JSON/Avro.
