Skip to content

[WIP][SPARK-46349] Prevent nested SortOrder instances in SortOrder expressions#44283

Closed
hhdri wants to merge 1 commit intoapache:masterfrom
hhdri:fix-sort-order
Closed

[WIP][SPARK-46349] Prevent nested SortOrder instances in SortOrder expressions#44283
hhdri wants to merge 1 commit intoapache:masterfrom
hhdri:fix-sort-order

Conversation

@hhdri
Copy link

@hhdri hhdri commented Dec 10, 2023

Hello everyone,

This is my first contribution to the project. I welcome any feedback and edits to improve this pull request.

Issue Addressed:
Currently, it's possible to create redundant sort expressions in both Scala and Python APIs, leading to potentially incorrect and confusing SQL statements. For example:

Scala:

spark.range(10).orderBy($"id".desc.asc).show()

Python:

spark.range(10).orderBy(f.desc('id'), ascending=False).show()

Such usage generates SQL like order by id DESC NULLS LAST DESC NULLS LAST, causing non-descriptive error messages.

Solution:
This pull request introduces a constraint in the SortOrder class, ensuring that its child cannot be another instance of SortOrder. This change prevents the creation of nested, redundant sort expressions.

Additionally, in PySpark's DataFrame.sort, there's an ascending keyword argument that could conflict with already sorted expressions. I've added an exception handler to generate more descriptive error messages in such cases.

Tests:
A test case has been added to verify that no double ordering occurs after this fix.

I look forward to your feedback and thank you for considering this contribution.

@hhdri hhdri changed the title [WIP] Prevent nested SortOrder instances in SortOrder expressions [WIP][SPARK-46349] Prevent nested SortOrder instances in SortOrder expressions Dec 10, 2023
@hhdri hhdri force-pushed the fix-sort-order branch 3 times, most recently from 243c7e2 to 1ca3f3e Compare December 10, 2023 03:45
@github-actions github-actions bot removed the PYTHON label Dec 10, 2023
@hhdri hhdri marked this pull request as draft December 10, 2023 14:48
@github-actions
Copy link

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Mar 20, 2024
@github-actions github-actions bot closed this Mar 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant

Comments