[WIP][SPARK-46349] Prevent nested SortOrder instances in SortOrder expressions#44283
Closed
hhdri wants to merge 1 commit into apache:master
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
Hello everyone,
This is my first contribution to the project. I welcome any feedback and edits to improve this pull request.
Issue Addressed:
Currently, it's possible to create redundant sort expressions in both the Scala and Python APIs, which leads to incorrect and confusing SQL statements. For example:
Scala:
Python:
Such usage generates SQL like ORDER BY id DESC NULLS LAST DESC NULLS LAST, which in turn produces non-descriptive error messages.
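To make the failure mode concrete, here is a minimal pure-Python sketch of how a doubled suffix can arise. Col and NaiveSortOrder are hypothetical stand-ins, not Spark's classes; the assumption is that a sort expression renders its child first and then appends its own direction and null ordering, so wrapping one sort expression in another repeats the suffix.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Col:
    """Hypothetical stand-in for a plain column reference."""
    name: str

    def sql(self) -> str:
        return self.name


@dataclass(frozen=True)
class NaiveSortOrder:
    """Hypothetical sort expression with no check on its child."""
    child: Union[Col, "NaiveSortOrder"]
    direction: str = "ASC"
    null_ordering: str = "NULLS FIRST"

    def sql(self) -> str:
        # Render the child, then append this node's own ordering suffix.
        return f"{self.child.sql()} {self.direction} {self.null_ordering}"


# A sort expression wrapped in a second sort expression.
inner = NaiveSortOrder(Col("id"), "DESC", "NULLS LAST")
nested = NaiveSortOrder(inner, "DESC", "NULLS LAST")

print(nested.sql())  # id DESC NULLS LAST DESC NULLS LAST
```

Nothing in this toy model stops the nesting, which is the gap the rest of this description addresses.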
Solution:
This pull request introduces a constraint in the SortOrder class, ensuring that its child cannot be another instance of SortOrder. This change prevents the creation of nested, redundant sort expressions.
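The idea behind the constraint can be sketched in a few lines of Python. This is an illustration only, not the code in this pull request: ColumnRef and CheckedSortOrder are hypothetical names, and the real change would live in Spark's SortOrder class rather than in a Python dataclass.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnRef:
    """Hypothetical stand-in for a plain column reference."""
    name: str


@dataclass(frozen=True)
class CheckedSortOrder:
    """Hypothetical sort expression that refuses a sort expression as its child."""
    child: object
    direction: str = "ASC"

    def __post_init__(self) -> None:
        # Reject nesting at construction time, so the redundant
        # expression never reaches SQL generation.
        if isinstance(self.child, CheckedSortOrder):
            raise ValueError(
                "A sort expression cannot wrap another sort expression; "
                f"the child is already ordered {self.child.direction}."
            )


ok = CheckedSortOrder(ColumnRef("id"), "DESC")

try:
    CheckedSortOrder(ok, "DESC")
    rejected = False
except ValueError as err:
    rejected = True
    print(err)
```

Failing at construction keeps the error close to the call that caused it, instead of surfacing later as a confusing SQL problem.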
Additionally, PySpark's DataFrame.sort accepts an ascending keyword argument that can conflict with expressions that already carry a sort order. I've added an exception handler to generate more descriptive error messages in such cases.
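A rough sketch of the kind of check this implies is shown below. It is pure Python and does not use PySpark: PlainColumn, SortedColumn, and resolve_sort_columns are hypothetical names, and the error wording is an assumption rather than the message added in this pull request.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Union


@dataclass(frozen=True)
class PlainColumn:
    """Hypothetical column with no sort direction attached."""
    name: str


@dataclass(frozen=True)
class SortedColumn:
    """Hypothetical column that already carries a sort direction."""
    column: PlainColumn
    ascending: bool


def resolve_sort_columns(
    cols: Sequence[Union[PlainColumn, SortedColumn]],
    ascending: Optional[bool] = None,
) -> List[SortedColumn]:
    """Apply `ascending` to plain columns; reject it for already sorted ones."""
    resolved: List[SortedColumn] = []
    for c in cols:
        if isinstance(c, SortedColumn):
            if ascending is not None:
                # Two competing directions: fail early with a clear message.
                raise ValueError(
                    f"Column '{c.column.name}' already specifies a sort "
                    "direction; drop the 'ascending' argument or pass the "
                    "column without a sort order."
                )
            resolved.append(c)
        else:
            # Default to ascending when no direction is given.
            resolved.append(SortedColumn(c, True if ascending is None else ascending))
    return resolved
```

With this shape, resolve_sort_columns([PlainColumn("id")], ascending=False) applies the keyword as usual, while passing a SortedColumn together with ascending raises a ValueError that names the offending column.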
Tests:
A test case has been added to verify that no double ordering occurs after this fix.
I look forward to your feedback and thank you for considering this contribution.