Description
Add native scan support for Apache Hudi Copy-On-Write (COW) tables using Auron's vectorized execution engine. This lets Auron accelerate queries on Hudi tables by converting their FileSourceScanExec plan nodes into native Parquet/ORC scans.
Scope
Supported Features
- COW (Copy-On-Write) tables: Native scan for Parquet and ORC base files
- Configuration switch: spark.auron.enable.hudi.scan (default: true)
- Timestamp handling: Falls back to Spark automatically when native timestamp scanning is disabled
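As a minimal sketch of how the configuration switch listed above could be turned off for a single job, the helper below renders a spark-submit command line with standard `--conf key=value` flags. The property name and its default come from this description; the helper `spark_submit_command` and the application name `my_job.py` are hypothetical and used only for illustration.

```python
import shlex

# Property name taken from the feature list above (default: true).
HUDI_SCAN_SWITCH = "spark.auron.enable.hudi.scan"


def spark_submit_command(app, conf):
    """Render a spark-submit command with one --conf flag per property.

    Hypothetical helper for illustration; not part of Auron or Spark.
    """
    parts = ["spark-submit"]
    for key, value in conf.items():
        parts += ["--conf", f"{key}={value}"]
    parts.append(app)
    return shlex.join(parts)


# Disable the native Hudi scan for one job; my_job.py is a placeholder name.
command = spark_submit_command("my_job.py", {HUDI_SCAN_SWITCH: "false"})
print(command)
```

Setting the property per job rather than cluster-wide keeps the default (native scan enabled) in place for everything else, which can help when isolating a suspected regression to the native path.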
Limitations (Initial Version)
- MOR (Merge-On-Read) tables: Not supported; queries automatically fall back to Spark
- Time travel queries: Falls back to Spark to preserve metadata semantics
- Spark version: Only Spark 3.0–3.5 (Spark 4.x not supported)
- Hudi version: Only Hudi 0.15.0
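The supported features and limitations above amount to an eligibility check. The sketch below models that check in plain Python to make the rules easy to read; it is not Auron's actual implementation. The names `ScanContext` and `fallback_reason`, the order of the checks, and the assumption that the timestamp fallback applies only to scans that read timestamp columns are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ScanContext:
    """Hypothetical summary of the facts the rules above depend on."""
    table_type: str                      # "COW" or "MOR"
    is_time_travel: bool
    spark_version: Tuple[int, int]       # (major, minor), e.g. (3, 5)
    hudi_version: str                    # e.g. "0.15.0"
    hudi_scan_enabled: bool = True       # spark.auron.enable.hudi.scan
    # Assumption: the timestamp fallback only matters when the scan
    # actually reads timestamp columns.
    reads_timestamp_columns: bool = False
    native_timestamp_scan_enabled: bool = True


def fallback_reason(ctx: ScanContext) -> Optional[str]:
    """Return why the scan stays on Spark, or None if native scan applies.

    The check order is an assumption; the description does not specify it.
    """
    if not ctx.hudi_scan_enabled:
        return "spark.auron.enable.hudi.scan is false"
    if ctx.table_type != "COW":
        return "only COW tables are supported"
    if ctx.is_time_travel:
        return "time travel queries fall back to Spark"
    if not ((3, 0) <= ctx.spark_version <= (3, 5)):
        return "Spark version is outside 3.0 to 3.5"
    if ctx.hudi_version != "0.15.0":
        return "only Hudi 0.15.0 is supported"
    if ctx.reads_timestamp_columns and not ctx.native_timestamp_scan_enabled:
        return "native timestamp scanning is disabled"
    return None


cow = ScanContext("COW", False, (3, 5), "0.15.0")
mor = ScanContext("MOR", False, (3, 5), "0.15.0")
print(fallback_reason(cow))
print(fallback_reason(mor))
```

Returning a reason string rather than a boolean makes each fallback explainable in logs, which is useful while the supported surface is still narrow.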