Skip to content

Conversation

@koertkuipers
Copy link
Contributor

What changes were proposed in this pull request?

Preserve the insertion order of columns in Dataset.withColumns

Why are the changes needed?

It is the expected behavior. We preserve insertion order in all other places.

Does this PR introduce any user-facing change?

No. Currently Dataset.withColumns is not actually used anywhere to insert more than one column. This change is to make sure it behaves as expected when it is used for that purpose in future.

How was this patch tested?

Added test in DatasetSuite

@github-actions github-actions bot added the SQL label Jul 19, 2021
@SparkQA
Copy link

SparkQA commented Jul 19, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45779/

@SparkQA
Copy link

SparkQA commented Jul 19, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45779/

@SparkQA
Copy link

SparkQA commented Jul 19, 2021

Test build #141265 has finished for PR 33423 at commit 3b69021.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

cc @viirya

}

test("SPARK-36210: withColumns preserve insertion ordering") {
val df = Seq(1, 2, 3).toDS()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add a test with a duplicate column names?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think withColumns doesn't allow duplicate column names.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is one test in DataFrameSuite.

@viirya
Copy link
Member

viirya commented Jul 20, 2021

@HyukjinKwon Any more comments? If no, I will merge this tomorrow. Thanks.

@HyukjinKwon
Copy link
Member

nope lgtm too!

@srowen
Copy link
Member

srowen commented Jul 20, 2021

Seems OK to me. I'm very slightly concerned about the behavior change - column order is different - but seems worth it as a 'fix'.

@viirya
Copy link
Member

viirya commented Jul 20, 2021

Thanks. Merging to master/3.2/3.1/3.0

@viirya viirya closed this in bf680bf Jul 20, 2021
viirya pushed a commit that referenced this pull request Jul 20, 2021
…umns

### What changes were proposed in this pull request?
Preserve the insertion order of columns in Dataset.withColumns

### Why are the changes needed?
It is the expected behavior. We preserve insertion order in all other places.

### Does this PR introduce _any_ user-facing change?
No. Currently Dataset.withColumns is not actually used anywhere to insert more than one column. This change is to make sure it behaves as expected when it is used for that purpose in future.

### How was this patch tested?
Added test in DatasetSuite

Closes #33423 from koertkuipers/feat-withcolumns-preserve-order.

Authored-by: Koert Kuipers <koert@tresata.com>
Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>
(cherry picked from commit bf680bf)
Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>
viirya pushed a commit that referenced this pull request Jul 20, 2021
…umns

### What changes were proposed in this pull request?
Preserve the insertion order of columns in Dataset.withColumns

### Why are the changes needed?
It is the expected behavior. We preserve insertion order in all other places.

### Does this PR introduce _any_ user-facing change?
No. Currently Dataset.withColumns is not actually used anywhere to insert more than one column. This change is to make sure it behaves as expected when it is used for that purpose in future.

### How was this patch tested?
Added test in DatasetSuite

Closes #33423 from koertkuipers/feat-withcolumns-preserve-order.

Authored-by: Koert Kuipers <koert@tresata.com>
Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>
(cherry picked from commit bf680bf)
Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>
viirya pushed a commit that referenced this pull request Jul 20, 2021
…umns

### What changes were proposed in this pull request?
Preserve the insertion order of columns in Dataset.withColumns

### Why are the changes needed?
It is the expected behavior. We preserve insertion order in all other places.

### Does this PR introduce _any_ user-facing change?
No. Currently Dataset.withColumns is not actually used anywhere to insert more than one column. This change is to make sure it behaves as expected when it is used for that purpose in future.

### How was this patch tested?
Added test in DatasetSuite

Closes #33423 from koertkuipers/feat-withcolumns-preserve-order.

Authored-by: Koert Kuipers <koert@tresata.com>
Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>
(cherry picked from commit bf680bf)
Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>
@srowen
Copy link
Member

srowen commented Jul 20, 2021

I'm not super against back-porting, but is more questionable given the behavior change

@viirya
Copy link
Member

viirya commented Jul 20, 2021

Hmm, withColumns is not a public API yet. It is used by a few ML classes to add columns. Although column order is different, I guess that change is less severe? I'm totally okay to revert backported patches.

@srowen
Copy link
Member

srowen commented Jul 20, 2021

Oh I see, OK

flyrain pushed a commit to flyrain/spark that referenced this pull request Sep 21, 2021
…umns

### What changes were proposed in this pull request?
Preserve the insertion order of columns in Dataset.withColumns

### Why are the changes needed?
It is the expected behavior. We preserve insertion order in all other places.

### Does this PR introduce _any_ user-facing change?
No. Currently Dataset.withColumns is not actually used anywhere to insert more than one column. This change is to make sure it behaves as expected when it is used for that purpose in future.

### How was this patch tested?
Added test in DatasetSuite

Closes apache#33423 from koertkuipers/feat-withcolumns-preserve-order.

Authored-by: Koert Kuipers <koert@tresata.com>
Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>
(cherry picked from commit bf680bf)
Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>
fishcus pushed a commit to fishcus/spark that referenced this pull request Jan 12, 2022
…umns

### What changes were proposed in this pull request?
Preserve the insertion order of columns in Dataset.withColumns

### Why are the changes needed?
It is the expected behavior. We preserve insertion order in all other places.

### Does this PR introduce _any_ user-facing change?
No. Currently Dataset.withColumns is not actually used anywhere to insert more than one column. This change is to make sure it behaves as expected when it is used for that purpose in future.

### How was this patch tested?
Added test in DatasetSuite

Closes apache#33423 from koertkuipers/feat-withcolumns-preserve-order.

Authored-by: Koert Kuipers <koert@tresata.com>
Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>
(cherry picked from commit bf680bf)
Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants