[FEA] Fully support nested types in Spark SQL functions #8550

ttnghia · 2023-06-12T02:28:29Z

Since libcudf now fully supports nested types in both equality and lexicographic comparisons, we should adopt that support in the various Spark SQL functions in the plugin.

Several existing FEA issues related to nested types can now be addressed:

| Feature | Issue(s) | PR(s) | Status | Notes |
| --- | --- | --- | --- | --- |
| Fully support nested types for HashPartitioning | #8676 | | ❌ | Top dependency for all other issues. |
| Verify that array nulls are empty | #5430 | #8517 | ✔️ | Top dependency for all other issues. |
| Order-by on arbitrary level nested arrays | #5509, #7230 | #7233 | ❌ | Depends on #5430. |
| Use merge sort for nested types | #2252 | rapidsai/cudf#13347 | ❌ | Depends on #5430 and cudf issues rapidsai/cudf#8050 and rapidsai/cudf#13514. |
| Sort arrays containing nested types | #3715 | | ❌ | Depends on #5430. |
| Support hash aggregate on single-level arrays | #6680 | #7465 | ✔️ | |
| Support hash aggregate on nested arrays | N/A | | ❌ | Depends on #8676. |
| Support `min`/`max` in groupby/reduction for nested structs | #3153, rapidsai/cudf#8974, rapidsai/cudf#8964 | #8638 | ✔️ | There is no support for windowing ops. |
| Support `min`/`max` in groupby/reduction for nested arrays | #4929, #4900, #8668 | rapidsai/cudf#13069, rapidsai/cudf#13676, #8689 | ❌ | There is no support for windowing ops. |
| Support Delta Lake optimized write on Databricks | #7799 | | ❌ | Requires implementing GPU partitioning for nested types (#4887). Also depends on #5430 and #6680. |
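
For readers unfamiliar with what lexicographic comparison means for nested arrays (the basis of the order-by and `min`/`max` items above), here is a minimal sketch in plain Python. It is only an illustration of the assumed semantics (element-by-element comparison, a shorter array sorting before a longer one that it prefixes, and nulls sorting first), not libcudf or plugin code. The name `compare_nested` is hypothetical.

```python
from functools import cmp_to_key


def compare_nested(a, b):
    """Three-way lexicographic comparison of possibly nested values.

    Assumption for illustration: None (null) sorts before any non-null value.
    Returns a negative number, zero, or a positive number.
    """
    if a is None or b is None:
        # Both null -> 0; only a null -> -1; only b null -> 1.
        return (a is not None) - (b is not None)
    if isinstance(a, (list, tuple)) and isinstance(b, (list, tuple)):
        # Compare element by element, recursing into nested levels.
        for x, y in zip(a, b):
            c = compare_nested(x, y)
            if c != 0:
                return c
        # All shared elements are equal: the shorter sequence sorts first.
        return (len(a) > len(b)) - (len(a) < len(b))
    # Leaf values: ordinary three-way comparison.
    return (a > b) - (a < b)


rows = [[[2], [1]], None, [[1, 2]], [[1], [3]], [[1, None]], []]
sorted_rows = sorted(rows, key=cmp_to_key(compare_nested))
print(sorted_rows)
# [None, [], [[1], [3]], [[1, None]], [[1, 2]], [[2], [1]]]
```

Under the same assumed ordering, `min` and `max` over a group of nested arrays are simply the first and last elements of this sorted sequence.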


sameerz removed the "? - Needs Triage" label on Jun 13, 2023

ttnghia mentioned this issue Jul 11, 2023

Support nested arrays for min/max aggregations in groupby and reduction #8689

Merged

GregoryKimball mentioned this issue Jul 30, 2023

[FEA] Implement full support for nested types rapidsai/cudf#11844

Closed

ttnghia mentioned this issue Nov 9, 2023

[FEA] Audit nested type support #9580

Open
