[SPARK-12976][SQL] Add LazilyGenerateOrdering and use it for RangePartitioner of Exchange. #10894

Closed
wants to merge 8 commits into from


ueshin
Member

@ueshin commented Jan 25, 2016

Add LazilyGenerateOrdering so that the RangePartitioner used by Exchange can use a generated ordering instead of InterpretedOrdering.
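The idea behind the new class can be illustrated with a small, Spark-free sketch. Only the sort specification is serialized. The comparator is marked `@transient` and built lazily, so it is generated on first use in whichever JVM deserializes the object. All names here (`OrderingFactory`, `LazyOrderingSketch`, the `(columnIndex, ascending)` key pairs) are hypothetical, and a plain Scala closure stands in for Spark's code-generated comparator.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical stand-in for code generation. `keys` lists (columnIndex, ascending)
// pairs and plays the role of Seq[SortOrder]; rows are plain IndexedSeq[Int].
object OrderingFactory {
  // Counts how many times "generation" ran, purely for illustration.
  val generations = new AtomicInteger(0)

  def generate(keys: Seq[(Int, Boolean)]): Ordering[IndexedSeq[Int]] = {
    generations.incrementAndGet()
    new Ordering[IndexedSeq[Int]] {
      // Compare key by key; the first non-zero result decides.
      def compare(a: IndexedSeq[Int], b: IndexedSeq[Int]): Int =
        keys.iterator.map { case (i, asc) =>
          val c = Integer.compare(a(i), b(i))
          if (asc) c else -c
        }.find(_ != 0).getOrElse(0)
    }
  }
}

// Only `keys` is serialized. The comparator is @transient and lazy, so it is
// built on the first compare() of each instance, including deserialized copies.
class LazyOrderingSketch(val keys: Seq[(Int, Boolean)]) extends Ordering[IndexedSeq[Int]] {
  @transient private lazy val generated = OrderingFactory.generate(keys)

  override def compare(a: IndexedSeq[Int], b: IndexedSeq[Int]): Int =
    generated.compare(a, b)
}
```

Constructing the wrapper is cheap. The generation cost is paid once, on the first comparison.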

@SparkQA

SparkQA commented Jan 25, 2016

Test build #49975 has finished for PR 10894 at commit 624927a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class LazilyGenerateOrdering(val ordering: Seq[SortOrder]) extends Ordering[InternalRow]

@JoshRosen
Contributor

Cool! Are there any remaining usages of InterpretedOrdering after this patch?

@ueshin
Member Author

ueshin commented Jan 26, 2016

Yes, a few places still use it: interpreted evaluation (eval()) of expressions such as MaxOf, MinOf, and LessThan, and as the fallback for GenerateOrdering in SparkPlan.
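The fallback pattern mentioned above works roughly like this: try the generated ordering first, and drop back to an interpreted one if generation throws. This is a hedged illustration, not SparkPlan's actual code. `OrderingWithFallback`, `generate`, and `interpret` are hypothetical names standing in for GenerateOrdering and InterpretedOrdering.

```scala
import scala.util.control.NonFatal

object OrderingWithFallback {
  // Prefer the generated ordering; if generation fails (for example, a code
  // generation or compilation error), fall back to the slower interpreted one.
  def newOrdering[T](
      generate: () => Ordering[T],
      interpret: () => Ordering[T]): Ordering[T] =
    try generate()
    catch {
      case NonFatal(e) =>
        System.err.println(s"Failed to generate ordering, falling back to interpreted: $e")
        interpret()
    }
}
```

This design keeps a query running when code generation fails, at the cost of slower per-row comparisons.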

@SparkQA

SparkQA commented Feb 9, 2016

Test build #50956 has finished for PR 10894 at commit 99597b8.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 9, 2016

Test build #50957 has finished for PR 10894 at commit 644c861.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

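```scala
// From an auxiliary constructor: bind each sort expression to the input schema.
this(ordering.map(BindReferences.bindReference(_, inputSchema)))

// Generated on first use; @transient so it is not serialized with the object.
@transient
lazy val generatedOrdering = GenerateOrdering.generate(ordering)
```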
Contributor

Will this use of lazy val incur a performance penalty, since each call to compare() has to check whether it has been initialized? If so, I wonder whether we could eagerly initialize the ordering in the constructor and then re-initialize it on executors as part of readObject(), e.g. @transient private[this] var generatedOrdering = ...
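The suggestion above can be sketched roughly as follows. The comparator is built eagerly in the constructor and kept in a `@transient private[this] var`, so Java serialization skips it. `readObject()` then rebuilds it when the object is deserialized, for example on an executor. Each `compare()` reads a plain field instead of going through a lazy-initialization check. `EagerOrderingSketch`, `generate`, and the `(columnIndex, ascending)` key pairs are hypothetical, and a plain Scala `Ordering` stands in for Spark's generated code.

```scala
import java.io.ObjectInputStream

// Hypothetical sketch: `keys` holds (columnIndex, ascending) pairs standing in
// for Seq[SortOrder]; only `keys` is written during serialization.
class EagerOrderingSketch(val keys: Seq[(Int, Boolean)]) extends Ordering[IndexedSeq[Int]] {

  // Built eagerly at construction time (on the driver); skipped by serialization.
  @transient private[this] var generated: Ordering[IndexedSeq[Int]] =
    EagerOrderingSketch.generate(keys)

  // Plain field read: no per-call "already initialized?" check.
  override def compare(a: IndexedSeq[Int], b: IndexedSeq[Int]): Int =
    generated.compare(a, b)

  // Called by Java serialization on deserialization: restore `keys` first,
  // then rebuild the transient comparator.
  private def readObject(in: ObjectInputStream): Unit = {
    in.defaultReadObject()
    generated = EagerOrderingSketch.generate(keys)
  }
}

object EagerOrderingSketch {
  // Stand-in for code generation: build a comparator from the sort keys.
  def generate(keys: Seq[(Int, Boolean)]): Ordering[IndexedSeq[Int]] =
    new Ordering[IndexedSeq[Int]] {
      def compare(a: IndexedSeq[Int], b: IndexedSeq[Int]): Int =
        keys.iterator.map { case (i, asc) =>
          val c = Integer.compare(a(i), b(i))
          if (asc) c else -c
        }.find(_ != 0).getOrElse(0)
    }
}
```

Deserialization does not run the class's constructor, so `generated` would be null without the `readObject()` hook.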

Member Author

Ah, yes, it might cause a performance penalty.
I'll try rewriting it as you suggested.

@SparkQA

SparkQA commented Feb 12, 2016

Test build #51167 has finished for PR 10894 at commit 7151a73.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@JoshRosen
Contributor

I spotted one other place where we might be able to use this: TakeOrderedAndProject.execute(). There's also a usage in TakeOrderedAndProjectNode, but I'm not sure whether we want to update that one.

Also, one minor naming suggestion: I'd probably call this LazilyGeneratedOrdering, with a 'd', since it is an ordering rather than a factory for making them.

Aside from those two minor changes, this looks good to me. Sorry for missing those suggestions on the first review pass.

@ueshin
Member Author

ueshin commented Feb 13, 2016

I renamed LazilyGenerateOrdering to LazilyGeneratedOrdering and modified TakeOrderedAndProject to use it.

I also found that TakeOrderedAndProjectNode could use GenerateOrdering.generate() (not lazy) and UnsafeProjection.create() (instead of an interpreted projection).
Other LocalNodes seem to use generated projections or predicates already.
Should we use them for TakeOrderedAndProjectNode too?
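The take-ordered-and-project pattern discussed here reduces to two steps: keep the `limit` smallest rows under the ordering, then apply the projection. The ordering argument is where a generated rather than interpreted comparator would plug in. This hedged sketch uses a bounded max-heap built on Scala's `mutable.PriorityQueue`, a common way to implement take-ordered. It is not Spark's code, and `takeOrderedAndProject` is a hypothetical name.

```scala
import scala.collection.mutable

object TakeOrderedSketch {
  // Returns the `limit` smallest rows under `ord`, in ascending order,
  // with `project` applied to each one.
  def takeOrderedAndProject[T, U](rows: Iterator[T], limit: Int, ord: Ordering[T])(
      project: T => U): List[U] = {
    // Max-heap under `ord`: its head is the largest row kept so far,
    // i.e. the one to evict when a smaller row arrives.
    val heap = mutable.PriorityQueue.empty[T](ord)
    if (limit > 0) {
      rows.foreach { row =>
        if (heap.size < limit) heap.enqueue(row)
        else if (ord.lt(row, heap.head)) {
          heap.dequeue()
          heap.enqueue(row)
        }
      }
    }
    // Heap iteration order is unspecified, so sort the survivors before projecting.
    heap.toList.sorted(ord).map(project)
  }
}
```

For example, `takeOrderedAndProject(Iterator(5, 1, 4, 2, 3), 3, Ordering.Int)(_ * 10)` keeps 1, 2, and 3, then projects them.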

@SparkQA

SparkQA commented Feb 13, 2016

Test build #51225 has finished for PR 10894 at commit 3684b70.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 13, 2016

Test build #51232 has finished for PR 10894 at commit bd1a53c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@JoshRosen
Contributor

This looks good to me, so I'm going to merge it into master. Thanks @ueshin!

Regarding TakeOrderedAndProjectNode, I think we should handle that change in a separate patch since it doesn't seem directly related to LazilyGeneratedOrdering. Feel free to open a new JIRA and PR for that change.

@asfgit closed this in 19dc69d on Feb 16, 2016
@ueshin
Member Author

ueshin commented Feb 17, 2016

@JoshRosen Thank you for merging this.
I opened a new issue for TakeOrderedAndProjectNode and submitted a PR for it: #11230.

3 participants