[SPARK-21165][SQL] FileFormatWriter should handle mismatched attribute ids between logical and physical plan #19483

Closed

cloud-fan
Contributor

@cloud-fan cloud-fan commented Oct 12, 2017

What changes were proposed in this pull request?

Because the optimizer removes some unnecessary aliases, the logical and physical plans may have different output attribute ids. FileFormatWriter should handle this mismatch when it creates the physical sort node.
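
To illustrate the mismatch, here is a minimal toy model in Python. It is not Spark code and not necessarily how this patch works: `Attribute`, `fresh`, `bind_by_id`, and `sort_rows` are hypothetical names, and the sketch assumes both plans keep their columns in the same order. It shows that resolving a sort column by attribute id against the physical output can fail once an alias has been dropped, while resolving it to a column position against the analyzed output still works.

```python
from dataclasses import dataclass
from itertools import count

_next_id = count()


@dataclass(frozen=True)
class Attribute:
    """Toy stand-in for a plan output attribute: a name plus a unique id."""
    name: str
    expr_id: int


def fresh(name: str) -> Attribute:
    """Create an attribute with a new unique id, as an alias would."""
    return Attribute(name, next(_next_id))


def bind_by_id(attr: Attribute, output: list) -> int:
    """Return the position of `attr` in `output`, matching on id only."""
    for ordinal, candidate in enumerate(output):
        if candidate.expr_id == attr.expr_id:
            return ordinal
    raise LookupError(f"could not find {attr.name}#{attr.expr_id} in {output}")


def sort_rows(rows: list, ordinal: int) -> list:
    """Sort rows by the column at a fixed position."""
    return sorted(rows, key=lambda row: row[ordinal])


value = fresh("value")        # value#0, from the source relation
part_source = fresh("part")   # part#1, from the source relation
part_alias = fresh("part")    # part#2, created by a redundant alias

# Assumption: the analyzed plan keeps the alias, the optimized/physical plan drops it.
analyzed_output = [value, part_alias]
physical_output = [value, part_source]

# Binding the analyzed attribute against the physical output fails: ids differ.
try:
    bind_by_id(part_alias, physical_output)
    late_binding_failed = False
except LookupError:
    late_binding_failed = True

# Binding ahead of time against the analyzed output yields a position instead,
# which stays valid as long as the column order is unchanged.
part_ordinal = bind_by_id(part_alias, analyzed_output)
rows = [("x", 2), ("y", 1), ("z", 3)]
sorted_rows = sort_rows(rows, part_ordinal)
```

In this sketch the sort depends only on a column position, so it no longer matters which id the physical plan assigns to `part`.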

How was this patch tested?

A new regression test.

@cloud-fan
Contributor Author

cc @gatorsmile

Member

@gatorsmile gatorsmile left a comment

LGTM

@SparkQA

SparkQA commented Oct 12, 2017

Test build #82683 has finished for PR 19483 at commit d90a0e4.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 12, 2017

Test build #82685 has finished for PR 19483 at commit 3bd5b11.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan cloud-fan changed the title from "[SPARK-21165][SQL] FileFormatWriter should only rely on attributes from analyzed plan" to "[SPARK-21165][SQL] FileFormatWriter should handle mismatched attribute ids between logical and physical plan" Oct 13, 2017

@gatorsmile
Member

It sounds like we are facing various issues because we are using the analyzed plan. Would it be possible to just add an extra Project, using the analyzed plan's output, at the end of the optimizer?

@cloud-fan
Contributor Author

I'll refactor it later, to use requiredChildOrdering to do the sort. I just wanna make this bug fix as simple as possible.

@tejasapatil
Contributor

> I'll refactor it later, to use requiredChildOrdering to do the sort.

The Hive bucketing PR (#19001) does that. I can isolate that piece and put out a PR.

@cloud-fan
Contributor Author

That would be great, thanks @tejasapatil!

@SparkQA

SparkQA commented Oct 13, 2017

Test build #82712 has finished for PR 19483 at commit f4a7337.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

asfgit pushed a commit that referenced this pull request Oct 13, 2017
…ry schema

## What changes were proposed in this pull request?

#18386 fixes SPARK-21165 but breaks SPARK-22252. This PR reverts #18386 and picks the patch from #19483 to fix SPARK-21165.

## How was this patch tested?

new regression test

Author: Wenchen Fan <wenchen@databricks.com>

Closes #19484 from cloud-fan/bug.
@cloud-fan
Contributor Author

Thanks for the review, merging to master!

@asfgit asfgit closed this in ec12220 Oct 13, 2017
MatthewRBruce pushed a commit to Shopify/spark that referenced this pull request Jul 31, 2018