Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-45920][SQL][3.4] group by ordinal should be idempotent #43838

Closed
wants to merge 1 commit into from

Conversation

cloud-fan
Copy link
Contributor

backport #43797

What changes were proposed in this pull request?

GROUP BY ordinal is not idempotent today. If the ordinal points to another integer literal and the plan get analyzed again, we will re-do the ordinal resolution which can lead to wrong result or index out-of-bound error. This PR fixes it by using a hack: if the ordinal points to another integer literal, don't replace the ordinal.

Why are the changes needed?

For advanced users or Spark plugins, they may manipulate the logical plans directly. We need to make the framework more reliable.

Does this PR introduce any user-facing change?

No

How was this patch tested?

new test

Was this patch authored or co-authored using generative AI tooling?

no

GROUP BY ordinal is not idempotent today. If the ordinal points to another integer literal and the plan get analyzed again, we will re-do the ordinal resolution which can lead to wrong result or index out-of-bound error. This PR fixes it by using a hack: if the ordinal points to another integer literal, don't replace the ordinal.

For advanced users or Spark plugins, they may manipulate the logical plans directly. We need to make the framework more reliable.

No

new test

no

Closes apache#43797 from cloud-fan/group.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
@github-actions github-actions bot added the SQL label Nov 16, 2023
@dongjoon-hyun dongjoon-hyun changed the title [3.4][SPARK-45920][SQL] group by ordinal should be idempotent [SPARK-45920][SQL][3.4] group by ordinal should be idempotent Nov 16, 2023
Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM. Merged to branch-3.4. Thank you, @cloud-fan and @HyukjinKwon .

dongjoon-hyun pushed a commit that referenced this pull request Nov 16, 2023
backport #43797

### What changes were proposed in this pull request?

GROUP BY ordinal is not idempotent today. If the ordinal points to another integer literal and the plan get analyzed again, we will re-do the ordinal resolution which can lead to wrong result or index out-of-bound error. This PR fixes it by using a hack: if the ordinal points to another integer literal, don't replace the ordinal.

### Why are the changes needed?

For advanced users or Spark plugins, they may manipulate the logical plans directly. We need to make the framework more reliable.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

new test

### Was this patch authored or co-authored using generative AI tooling?

no

Closes #43838 from cloud-fan/3.4-port.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
szehon-ho pushed a commit to szehon-ho/spark that referenced this pull request Feb 7, 2024
backport apache#43797

### What changes were proposed in this pull request?

GROUP BY ordinal is not idempotent today. If the ordinal points to another integer literal and the plan get analyzed again, we will re-do the ordinal resolution which can lead to wrong result or index out-of-bound error. This PR fixes it by using a hack: if the ordinal points to another integer literal, don't replace the ordinal.

### Why are the changes needed?

For advanced users or Spark plugins, they may manipulate the logical plans directly. We need to make the framework more reliable.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

new test

### Was this patch authored or co-authored using generative AI tooling?

no

Closes apache#43838 from cloud-fan/3.4-port.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
3 participants