Open
Labels: enhancement (New feature or request)
Description
This issue lists the Gluten roadmap for 2024. We will use this list to give an overview of Gluten, to track the feature support plan, and to avoid duplicate work before implementation starts.
- Migrate to Apache Gluten: https://issues.apache.org/jira/browse/INFRA-25506
- Partially Done: Installation Method: pip (Pending)/conda/docker
- Spark version support: Spark 3.4 UT ([VL] Track all the failed unit test in Spark 3.4. #3559) (done), Spark 3.5 API/UT ([GLUTEN-4424] Explore adding Spark 35 w/ Scala 2.12 only (WIP) #4425)
- Partially Done: Data Type Support: complex, decimal32, timestamp, timestampNTZ, Calendar, Interval, UDT
- Partially Done: Fuzz tests supporting decimal/agg, Spark query runner in Velox (Enable decimal fuzzer test in experimental workflow facebookincubator/velox#10948; misc: Enable Spark query runner as reference in aggregation fuzzer test facebookincubator/velox#9559)
- OS support: move to static linking via vcpkg to avoid OS compatibility issues
- Partially Done: File Format Support: Parquet (gap on some data types), ORC, Text, CSV, JSON (Pending)
- Data Source Support: ABFS, S3, GCS, Alluxio, HDFS viewfs
- Spark Functions Support: see ([VL] Unsupported spark function list [please leave a comment if you plan to pick some] #4039)
- Partially Done: Spark Operator Support: SMJ, PyArrow
- Spill Support, including Partial Agg with single agg, Join Probe side Spill, Window + Spill, Sort + Spill, Spill performance improvement, and issue fixing ([VL] Spill related issues tracker #3030)
- Shuffle Support including Sort based Shuffle, Merge before Compress, Reducer Optimization
- Data Lake Support ([VL] Unified design for data lake read support in Gluten + Velox #3378): DeltaLake, Iceberg, Hudi, Paimon
- RSS Support: Apache Uniffle ([VL] Add uniffle integration #3767)
- PySpark Support, including Python UDF/Arrow UDF (Pending)
- Stage level resource management (Stage level resource management to handle offheap/onheap memory conflict #4392) (Pending)
- Other Big Data Framework Support (Pending)
- Partially Done: Performance Optimizations: Sort, Join, Agg, Parquet Write, Scan
- Other Accelerator Support, such as GPU and FPGA (Pending)
- Document and Website Update
- Planner: optimization for multiple fallback cases (RAS framework)