[FEA] Fully support nested types in Spark SQL functions #8550

Open
ttnghia opened this issue Jun 12, 2023 · 0 comments
Labels:
- epic: Issue that encompasses a significant feature or body of work
- feature request: New feature or request
- SQL: Part of the SQL/Dataframe plugin
- task: Work required that improves the product but is not user facing

Comments

ttnghia (Collaborator) commented Jun 12, 2023

Since libcudf now fully supports nested types in both equality and lexicographic comparisons, we should adopt these features in the various Spark SQL functions of the plugin.
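To make "lexicographic comparison of nested types" concrete, here is a minimal Python sketch of an ascending ordering modeled on Spark's CPU ordering for arrays, as we understand it: elements are compared pairwise, a null element sorts before any non-null element, and when one array is a prefix of the other the shorter one sorts first. Arrays of arrays are handled by recursion. The function names `compare_values` and `compare_arrays` are hypothetical; this is not the libcudf or plugin implementation.

```python
from functools import cmp_to_key
from typing import Any, List, Optional


def compare_values(a: Any, b: Any) -> int:
    """Three-way comparison of two non-null values; nested lists recurse."""
    if isinstance(a, list) and isinstance(b, list):
        return compare_arrays(a, b)
    return (a > b) - (a < b)


def compare_arrays(left: List[Optional[Any]], right: List[Optional[Any]]) -> int:
    """Return -1, 0 or 1 for an ascending, Spark-like array ordering.

    Assumed semantics: a null element sorts before any non-null element, and
    when one array is a prefix of the other, the shorter array sorts first.
    """
    for lhs, rhs in zip(left, right):
        if lhs is None and rhs is None:
            continue  # two null elements compare as equal
        if lhs is None:
            return -1
        if rhs is None:
            return 1
        result = compare_values(lhs, rhs)
        if result != 0:
            return result
    # Every shared position is equal, so the shorter array comes first.
    return (len(left) > len(right)) - (len(left) < len(right))


# Usage: sort non-null array rows, including rows with null elements.
rows = [[1, 2], [1], [None, 5], [1, None], []]
ordered = sorted(rows, key=cmp_to_key(compare_arrays))
print(ordered)  # [[], [None, 5], [1], [1, None], [1, 2]]

# Arrays of arrays are compared recursively, element by element.
print(compare_arrays([[1, 2]], [[1, 3]]))  # -1
```

The rows themselves are assumed non-null here; where entirely null rows go is a separate choice controlled by the sort's NULLS FIRST / NULLS LAST setting.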

Currently, there are several existing FEA issues related to nested types that can be addressed:

| Feature | Issue(s) | PR(s) | Status | Notes |
|---|---|---|---|---|
| Fully support nested types for HashPartitioning | #8676 | | | Top dependency for all other issues. |
| Verify that array nulls are empty | #5430 | #8517 | ✔️ | Top dependency for all other issues. |
| Order-by on arbitrary-level nested arrays | #5509, #7230 | #7233 | | Depends on #5430. |
| Use merge sort for nested types | #2252 | rapidsai/cudf#13347 | | Depends on #5430 and on cudf issues rapidsai/cudf#8050 and rapidsai/cudf#13514. |
| Sort arrays containing nested types | #3715 | | | Depends on #5430. |
| Support hash aggregate on single-level arrays | #6680 | #7465 | ✔️ | |
| Support hash aggregate on nested arrays | N/A | | | Depends on #8676. |
| Support min/max in groupby/reduction for nested structs | #3153, rapidsai/cudf#8974, rapidsai/cudf#8964 | #8638 | ✔️ | There is no support for windowing ops. |
| Support min/max in groupby/reduction for nested arrays | #4929, #4900, #8668 | rapidsai/cudf#13069, rapidsai/cudf#13676, #8689 | | There is no support for windowing ops. |
| Support Delta Lake optimized write on Databricks | #7799 | | | Requires implementing GPU partitioning for nested types (#4887). Also depends on #5430 and #6680. |
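The "Verify that array nulls are empty" item concerns the list-column layout libcudf uses: an offsets buffer (row i spans child elements offsets[i] up to, but not including, offsets[i + 1]) plus a validity mask. Apache Arrow, which uses the same layout, allows a null row to point at a non-empty child segment, while libcudf generally expects null rows to be empty; otherwise, operations that work directly on the child column can see stale elements belonging to a null row. The sketch below checks and repairs that invariant on plain Python lists. The function names `null_lists_are_empty` and `purge_null_lists`, and the list-based representation, are hypothetical stand-ins for real column buffers.

```python
from typing import List, Tuple


def null_lists_are_empty(offsets: List[int], validity: List[bool]) -> bool:
    """Check that every null row of a list column spans zero child elements.

    Row i covers child elements offsets[i] .. offsets[i + 1] - 1, so offsets
    must have exactly one more entry than there are rows.
    """
    if len(offsets) != len(validity) + 1:
        raise ValueError("offsets must have len(validity) + 1 entries")
    return all(
        valid or offsets[i] == offsets[i + 1]
        for i, valid in enumerate(validity)
    )


def purge_null_lists(
    offsets: List[int], validity: List[bool], child: List[int]
) -> Tuple[List[int], List[int]]:
    """Rebuild offsets and child so that null rows own no child elements."""
    new_offsets = [0]
    new_child: List[int] = []
    for i, valid in enumerate(validity):
        if valid:
            # Copy only the elements that belong to non-null rows.
            new_child.extend(child[offsets[i]:offsets[i + 1]])
        new_offsets.append(len(new_child))
    return new_offsets, new_child


# Rows: [1, 2], null, [4]; the null row still points at a stale element 3.
offsets, validity, child = [0, 2, 3, 4], [True, False, True], [1, 2, 3, 4]
print(null_lists_are_empty(offsets, validity))  # False

offsets, child = purge_null_lists(offsets, validity, child)
print(offsets, child)  # [0, 2, 2, 3] [1, 2, 4]
print(null_lists_are_empty(offsets, validity))  # True
```

Rebuilding the offsets this way is only one way to sanitize a column; a real implementation would operate on device buffers rather than Python lists.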
ttnghia added the "feature request", "? - Needs Triage", "SQL", "task", and "epic" labels and removed the "? - Needs Triage" label on Jun 12, 2023.
sameerz removed the "? - Needs Triage" label on Jun 13, 2023.