[SPARK-36210][SQL] Preserve column insertion order in Dataset.withColumns #33423
Kubernetes integration test starting

Kubernetes integration test status success

Test build #141265 has finished for PR 33423 at commit

cc @viirya
}

test("SPARK-36210: withColumns preserve insertion ordering") {
  val df = Seq(1, 2, 3).toDS()
Can we add a test with duplicate column names?
I think withColumns doesn't allow duplicate column names.
There is one test in DataFrameSuite.

@HyukjinKwon Any more comments? If not, I will merge this tomorrow. Thanks.

nope lgtm too!

Seems OK to me. I'm very slightly concerned about the behavior change - column order is different - but seems worth it as a 'fix'.

Thanks. Merging to master/3.2/3.1/3.0

…umns

### What changes were proposed in this pull request?
Preserve the insertion order of columns in Dataset.withColumns

### Why are the changes needed?
It is the expected behavior. We preserve insertion order in all other places.

### Does this PR introduce _any_ user-facing change?
No. Currently Dataset.withColumns is not actually used anywhere to insert more than one column. This change is to make sure it behaves as expected when it is used for that purpose in future.

### How was this patch tested?
Added test in DatasetSuite

Closes #33423 from koertkuipers/feat-withcolumns-preserve-order.

Authored-by: Koert Kuipers <koert@tresata.com>
Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>
(cherry picked from commit bf680bf)
Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>

I'm not super against back-porting, but it is more questionable given the behavior change.

Hmm,

Oh I see, OK

### What changes were proposed in this pull request?
Preserve the insertion order of columns in Dataset.withColumns

### Why are the changes needed?
It is the expected behavior. We preserve insertion order in all other places.

### Does this PR introduce _any_ user-facing change?
No. Currently Dataset.withColumns is not actually used anywhere to insert more than one column. This change is to make sure it behaves as expected when it is used for that purpose in the future.

### How was this patch tested?
Added test in DatasetSuite
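
For readers unfamiliar with how insertion order can get lost, here is a minimal sketch in plain Scala (no Spark dependency). It is an illustration only, not the code changed in this PR: the object name `InsertionOrderSketch`, both helper functions, and the string placeholders standing in for column expressions are all hypothetical. It assumes the typical cause of this kind of issue, namely that name/column pairs are passed through an immutable `Map`, which is hash-based once it holds more than four entries and so does not guarantee iteration order.

```scala
// Illustration only: hypothetical helpers, not Spark's implementation.
object InsertionOrderSketch {

  // Append columns that do not exist yet, keeping the order in which
  // the caller listed them (pairs stay in a Seq the whole time).
  def appendOrdered(existing: Seq[String], added: Seq[(String, String)]): Seq[String] = {
    val fresh = added.collect { case (name, _) if !existing.contains(name) => name }
    existing ++ fresh
  }

  // Same selection, but routed through an immutable Map first.
  // For more than four entries the Map is hash-based, so the order of
  // `keys` is not guaranteed to match the caller's order.
  def appendViaMap(existing: Seq[String], added: Seq[(String, String)]): Seq[String] = {
    val asMap = added.toMap
    val fresh = asMap.keys.filterNot(k => existing.contains(k)).toSeq
    existing ++ fresh
  }

  def main(args: Array[String]): Unit = {
    val existing = Seq("value")
    // Placeholder expressions; in Spark these would be Column objects.
    val added = (1 to 10).map(i => (s"col$i", s"expr$i"))

    // Always lists col1 through col10 in the order given.
    println(appendOrdered(existing, added).mkString(", "))
    // Contains the same names, but possibly in a different order.
    println(appendViaMap(existing, added).mkString(", "))
  }
}
```

Both helpers return the same set of column names; only the Seq-based one guarantees the order in which they appear, which is the property the added test is described as checking.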