[SPARK-41326] [CONNECT] Fix deduplicate is missing input #38842

Closed
grundprinzip wants to merge 1 commit into apache:master from grundprinzip:SPARK-41326

@grundprinzip
Contributor

What changes were proposed in this pull request?

The transformation of the Spark Connect plan for Deduplicate did not copy the input relation into the plan. This caused an exception on the server and made the query fail.

This patch fixes that bug.
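
To illustrate the failure mode, here is a minimal, self-contained sketch. It does not use the real `proto.Relation` or the real server planner: `Relation`, `build_deduplicate`, `transform`, and the error message are hypothetical stand-ins, assumed only for this example. It shows how a deduplicate node built without its input fails as soon as a consumer tries to resolve the child.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Relation:
    """Hypothetical stand-in for a plan node; not the real proto.Relation."""

    kind: str
    input: Optional["Relation"] = None
    column_names: List[str] = field(default_factory=list)


def build_deduplicate(child: Relation, columns: List[str], copy_input: bool) -> Relation:
    """Build a deduplicate node. copy_input=False reproduces the bug."""
    plan = Relation(kind="deduplicate", column_names=list(columns))
    if copy_input:
        # The step that was missing: attach the child relation as the input.
        plan.input = child
    return plan


def transform(plan: Relation) -> str:
    """Toy consumer (assumed behavior) that requires deduplicate to have an input."""
    if plan.kind == "deduplicate":
        if plan.input is None:
            raise ValueError("deduplicate node has no input relation")
        return f"Deduplicate({plan.column_names}, {transform(plan.input)})"
    return plan.kind
```

With `copy_input=True` the toy `transform` resolves the whole tree; with `copy_input=False` it raises, which mirrors the kind of server-side exception described above.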

Why are the changes needed?

Bugfix

Does this PR introduce any user-facing change?

No

How was this patch tested?

UT

def plan(self, session: "SparkConnectClient") -> proto.Relation:
    assert self._child is not None
    plan = proto.Relation()
    plan.deduplicate.input.CopyFrom(self._child.plan(session))

We should probably have a test in test_connect_basic that guards against this case. Maybe add a test case there, by the way?

Contributor Author

The existing tests are exhaustive; they just missed that the input was never copied.
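
As a rough sketch of the kind of check being discussed, a plan-level test could assert on the input explicitly rather than only on the deduplicate options. Everything below is hypothetical: `FakeDeduplicate` and the nested dict are stand-ins for the client plan class and the protobuf message, not the actual test code in this PR.

```python
import unittest


class FakeDeduplicate:
    """Hypothetical plan node standing in for the client's Deduplicate plan class."""

    def __init__(self, child, column_names):
        self._child = child
        self._column_names = column_names

    def plan(self):
        assert self._child is not None
        # A nested dict stands in for the protobuf message.
        return {
            "deduplicate": {
                "input": self._child,
                "column_names": list(self._column_names),
            }
        }


class DeduplicatePlanTest(unittest.TestCase):
    def test_input_is_copied(self):
        child = {"read": {"table": "t"}}
        plan = FakeDeduplicate(child, ["id"]).plan()
        # Check the input explicitly, not only the deduplicate options.
        self.assertEqual(plan["deduplicate"]["input"], child)
        self.assertEqual(plan["deduplicate"]["column_names"], ["id"])
```

An assertion on the input field is what separates a test that only inspects the deduplicate options from one that would fail when the child relation is dropped.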

I won't block this PR by my comment.

Member

Let me just merge this for now and go forward.

@AmplabJenkins

Can one of the admins verify this patch?

@HyukjinKwon
Member

Merged to master.

beliefer pushed a commit to beliefer/spark that referenced this pull request Dec 18, 2022
### What changes were proposed in this pull request?
The transformation of the Spark Connect plan for `Deduplicate` did not copy the input relation into the plan. This caused an exception on the server and made the query fail.

This patch fixes that bug.

### Why are the changes needed?
Bugfix

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
UT

Closes apache#38842 from grundprinzip/SPARK-41326.

Authored-by: Martin Grund <martin.grund@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>