Time to define 2026 features. Feel free to add more or comment.
For the achievements of the 2025 roadmap, see #8226.
For the achievements of the 2024 roadmap, see #4709.
Apache Gluten graduated to an ASF Top-Level Project in February 2026. This roadmap tracks the community's goals for 2026.
Spark Compatibility
- Spark 4.0/4.1 GA support — fix all disabled test suites (Spark 4.x: Tracking disabled test suites #11550, [VL] Track on Spark-4.1.x failed unit tests #11400)
- TIMESTAMP_NTZ type support ([VL] Support TIMESTAMP_NTZ Type #11622, [GLUTEN-11622][VL] Add basic TIMESTAMP_NTZ type support #11626)
- Variant shredding support for Parquet reader and writer (Spark 4.0) ([VL][Spark-4.0] Add Variant shredding support for Parquet reader and writer #11371)
- Parquet type widening support (SPARK-40876) ([VL] Support type widening in Parquet reader (SPARK-40876) #11683, [GLUTEN-11683][VL] Add Parquet type widening support #11719)
- Other Spark Type Support: DayTimeIntervalType, CalendarIntervalType, TimeType, GeographyType, GeometryType
- ANSI mode support ([VL] Add ANSI mode support #10134)
- Complete remaining unsupported Spark functions ([VL] Unsupported spark function list [please leave a comment if you plan to pick some] #4039)
- Deprecate Spark 3.2/3.3 and JDK 8
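For contributors new to the ANSI mode item: when spark.sql.ansi.enabled is set, Spark raises an error on integer overflow and on invalid casts instead of wrapping or returning NULL, so a native backend has to reproduce both behaviours. Below is a minimal Python sketch of that difference for 32-bit addition and string-to-int casts. The function names are hypothetical, the parsing rules are simplified, and none of this is Gluten or Velox code.

```python
INT_MIN, INT_MAX = -2**31, 2**31 - 1


def add_int32(a: int, b: int, ansi: bool) -> int:
    """Add two 32-bit integers (hypothetical helper, illustration only)."""
    result = a + b
    if INT_MIN <= result <= INT_MAX:
        return result
    if ansi:
        # ANSI mode: overflow is an error rather than a silent wrap.
        raise ArithmeticError("integer overflow")
    # Non-ANSI mode: two's-complement wraparound.
    return (result - INT_MIN) % 2**32 + INT_MIN


def cast_to_int(text: str, ansi: bool):
    """Cast a string to a 32-bit integer (simplified parsing rules)."""
    try:
        value = int(text.strip())
    except ValueError:
        if ansi:
            raise ValueError(f"invalid input for INT: {text!r}")
        return None  # non-ANSI mode: invalid input becomes NULL
    if INT_MIN <= value <= INT_MAX:
        return value
    if ansi:
        raise ArithmeticError("integer overflow in cast")
    return None  # non-ANSI mode: out-of-range input becomes NULL
```

For example, add_int32(2**31 - 1, 1, ansi=False) wraps to the minimum 32-bit value, while the same call with ansi=True raises.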
Performance Optimization
- Use runtime stats to choose hash build side (HashBuild OOM caused by incorrect build side #11774, [GLUTEN-11774][VL] Use runtime stats to choose hash build side #11775)
- Bloom filter optimization — translate might_contain as subfield filter (Bloom filter optimization #11771, [VL] Translate might_contain as a subfield filter #11708, [GLUTEN-11708][VL] Translate might_contain as a subfield filter for scan-level bloom filter pushdown #11711)
- Partial project UDF optimization (Partial Project UDF optimization #11783)
- Execution-aware dynamic join strategy selection after filter execution (Proposal: Add execution‑aware dynamic join strategy selection after filter execution #11808)
- Parquet metadata check limit optimization (Parquet metadata check limit optimization #11782)
- Push dynamic filters to shuffle reader with per-block column statistics ([VL] Push Velox dynamic filters to shuffle reader #11605, [GLUTEN-11605][VL] Write per-block column statistics in shuffle writer #11769)
- Multi-core per task ([VL] Use multiple threads in the same executor #7810)
- Spill enhancement — streaming window functions ([VL] Spill related issues tracker #3030)
- Pick split with most data prefetched ([VL] pick split with most data prefetched #11821)
- Complex type Row-to-Columnar optimization
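As a rough illustration of the "runtime stats to choose hash build side" item above: the idea, as the issue titles suggest, is to prefer sizes measured at runtime over planner estimates when picking which join input to build the hash table from. The sketch below assumes hypothetical names (SideStats, choose_build_side) and ignores join-type restrictions on which side may be built. It is not the implementation in #11775.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SideStats:
    """Size information for one join input (hypothetical structure)."""
    name: str
    estimated_bytes: int                  # planner estimate before execution
    runtime_bytes: Optional[int] = None   # measured size, if already known


def effective_bytes(side: SideStats) -> int:
    # Prefer the measured size; fall back to the estimate when it is missing.
    if side.runtime_bytes is not None:
        return side.runtime_bytes
    return side.estimated_bytes


def choose_build_side(left: SideStats, right: SideStats) -> SideStats:
    # Build the hash table on the smaller input to limit memory pressure.
    if effective_bytes(left) <= effective_bytes(right):
        return left
    return right


# The planner underestimated the left side; runtime stats correct the choice.
left = SideStats("left", estimated_bytes=10_000_000, runtime_bytes=5_000_000_000)
right = SideStats("right", estimated_bytes=50_000_000, runtime_bytes=40_000_000)
chosen = choose_build_side(left, right)
```

With estimates alone the left side would have been built, which is the kind of wrong choice that can lead to a HashBuild OOM.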
Native Engine Integration
- Switch to upstream Velox official release ([VL] upstream OAP/Velox commits to upstream #8782)
- Upstream useful Velox PRs not merged from Gluten community ([VL] useful Velox PRs not merged into upstream #11585)
- Bolt backend integration — ByteDance native engine with LLVM JIT (WIP: add bolt backend in gluten #11261, Add a new backend: Bolt #10929)
- ClickHouse backend upgrade ([CH]Update Clickhouse Version (Branch_25.12_20260310) #11734)
- Kafka read support for Velox backend ([VL] Adding kafka read support for Velox backend #11801)
Data Lake & File Formats
- Full Iceberg support — map write configs with Velox (Map iceberg configuration with Velox configuration #11703, Added iceberg write configs #11776)
- Native Parquet write for complex types (Struct/Array/Map) ([VL] Enable native Parquet write for complex types (Struct/Array/Map) #11788)
- Iceberg equality delete MOR table support ([WIP][GLUTEN-8055][VL] Support read Iceberg equality delete file MOR table #8056)
- Hudi MOR table support
- Delta Lake feature parity
- JSON file format support
- ORC writer support
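For readers unfamiliar with the Iceberg equality delete item above: in a merge-on-read table, an equality delete file lists values for a set of columns, and a reader must drop data rows that match those values when the data file's sequence number is strictly lower than the delete file's. The following is a minimal in-memory sketch of that rule. The class and function names are hypothetical, and real readers also deal with partition scoping, position deletes and null handling, which are left out here.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Set, Tuple


@dataclass
class EqualityDelete:
    """One equality delete file, reduced to what the sketch needs."""
    sequence_number: int
    columns: Tuple[str, ...]      # columns compared for equality
    keys: Set[Tuple[Any, ...]]    # deleted value combinations


def apply_equality_deletes(
    rows: List[Dict[str, Any]],
    data_sequence_number: int,
    deletes: List[EqualityDelete],
) -> List[Dict[str, Any]]:
    # Only deletes written after the data file apply to it.
    applicable = [d for d in deletes if d.sequence_number > data_sequence_number]
    live = []
    for row in rows:
        deleted = any(
            tuple(row[c] for c in d.columns) in d.keys for d in applicable
        )
        if not deleted:
            live.append(row)
    return live


rows = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}, {"id": 3, "v": "c"}]
newer = EqualityDelete(sequence_number=2, columns=("id",), keys={(2,)})
same_age = EqualityDelete(sequence_number=1, columns=("id",), keys={(3,)})
remaining = apply_equality_deletes(rows, data_sequence_number=1,
                                   deletes=[newer, same_age])
```

Here only the row with id 2 is dropped, because the second delete file does not have a higher sequence number than the data file.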
GPU & Hardware Acceleration
- GPU BHJ bug fix ([VL] GPU failed on BHJ #11794)
- Multi-threaded decompression in GPU shuffle reader (Support multi-threaded decompression in the GPU shuffle reader #11779, [GLUTEN-11779][VL] Support multi-threaded decompression in the GPU shuffle reader #11780)
- GPU code cleanup and stabilization ([VL] Clean up GPU code: remove dead/redundant code #11824)
- ARM SVE optimization
- FPGA accelerator exploration
Flink Integration
- Fix Flink memory leak with RocksDB state backend ([FLINK] Memory leak when trying to run nexmark with rocksdb state backend #11791)
- Fix Flink CI build failures ([FLINK] CI build failure for flink-test #11793)
- Stabilize Flink + Velox from experimental to beta
- Nexmark benchmark support ([FLINK] Nexmark Q3 submission error #11790)
PySpark & Python Ecosystem
- PySpark Python UDF support
- Arrow UDF support
- Fix Python UDF/UDTF test suites on Spark 4.x (Spark 4.x: Tracking disabled test suites #11550)
Stability & Quality
- OOM prevention and memory stability (OOM but memory is enough #11747, [VL] Distinct aggregation OOM when getOutput #8025)
- Full fuzzer support and result mismatch resolution ([VL] Result mismatch issues tracker #4652)
- Timezone edge case fixes ([VL]Velox rejects America/Coyhaique timezone (CentOS 9) results Gluten job failure #11597)
- Complex type validation in native engine ([VL] [BUG] Complex types already supported in Velox are considered not supported by Gluten #11746, [VL] CrossRelNode's expression is not validated in native validation #11678)
- Support getting C++ stack traces via GDB from Spark UI (Support get cpp stack using gdb from Spark UI #11677)
- collect_set ignoreNulls support ([VL] Enable collect_set ignoreNulls #11826)
Build, CI & Developer Experience
- VCPKG for macOS build ([GLUTEN-9577][VL] Enable VCPKG for MacOS build #11563)
- IWYU tool for C++ code format checking ([GLUTEN-11287][VL] Use IWYU tool to check code format #11287)
- Cache Maven dependencies in CI ([VL][CI] cache maven deps m2 repo #11655)
- Fix ARM64/aarch64 build issues (Build failures when building Apache Gluten with Velox on Azure Arm64 (Ubuntu 24.04) #11633, Bundle build failure on Azure Cobalt aarch64 #11639)
- Fix glog macOS build breakage ([Build] branch-1.5 build fail on macOS due to breaking change introduced by glog #11763)
- Docker testing improvements ([VL] Caching java dependencies in testing docker #11501)
- Remove Arrow-CSV dependency ([VL] Remove Arrow-CSV dependency #11591)
Community & Governance
- Complete TLP graduation remaining tasks ([Umbrella] Apache Gluten Graduation Tasks #11713)
- Quarterly releases: 1.7 / 1.8 / 1.9 / 2.0
- GlutenCon 2026
- Expand ecosystem documentation (Kyuubi, Celeborn, Velox)
- openEuler OS support