Skip to content

Conversation

@viirya
Copy link
Member

@viirya viirya commented Aug 5, 2015

JIRA: https://issues.apache.org/jira/browse/SPARK-9563

We should remove repartition operators when they are the child of Exchange and shuffle=True.

@SparkQA
Copy link

SparkQA commented Aug 5, 2015

Test build #39858 has finished for PR 7959 at commit 5dc7be8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For completeness, we also need a test case to ensure that this pruning does not apply if shuffle=False.

@JoshRosen
Copy link
Contributor

One other thought: a related optimization might be to collapse adjacent repartition calls. E.g. if I call repartition(10).repartition(100) then we'll actually repartition twice, whereas I think we may be able to safely drop the smaller repartition.

Similar optimizations may apply to coalesce, although I'd have to think a bit more about whether that risks introducing regressions in any corner-cases.

@viirya
Copy link
Member Author

viirya commented Aug 6, 2015

To optimise repartition(10).repartition(100) to repartition(100) seems reasonable. Is it also doable for repartition(100).repartition(10) to repartition(10)?

@SparkQA
Copy link

SparkQA commented Aug 6, 2015

Test build #40000 has finished for PR 7959 at commit 4ba7a38.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Member Author

viirya commented Aug 6, 2015

retest this please.

@SparkQA
Copy link

SparkQA commented Aug 6, 2015

Test build #248 has finished for PR 7959 at commit 4ba7a38.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 6, 2015

Test build #40022 has finished for PR 7959 at commit 4ba7a38.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Conflicts:
	sql/core/src/main/scala/org/apache/spark/sql/execution/Exchange.scala
@SparkQA
Copy link

SparkQA commented Aug 10, 2015

Test build #40285 has finished for PR 7959 at commit e1efa4c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can this case ever occur, given that Exchange operators are currently added all at once in a single pass? Are you planning ahead in case we implement Repartition using Exchange, as in #8030?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, because logical.RepartitionByExpression will be transformed to execution.Exchange as well. I am wondering if it is possible to have an Exchange here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using Exchange for a Repartition is another possible case too.

@JoshRosen
Copy link
Contributor

/cc @yhuai, do you have any thoughts on this PR? The basic change here seems fine to me.

@viirya
Copy link
Member Author

viirya commented Aug 11, 2015

test this please.

@viirya
Copy link
Member Author

viirya commented Aug 11, 2015

retest this please.

@SparkQA
Copy link

SparkQA commented Aug 11, 2015

Test build #40429 has finished for PR 7959 at commit 0b334b7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Member Author

viirya commented Aug 26, 2015

@JoshRosen Hi, I think this patch stays here too long. Is it ok to merge it?

@andrewor14
Copy link
Contributor

@yhuai any thoughts? I think the high level approach makes sense.

@viirya viirya closed this Nov 30, 2015
@viirya viirya deleted the remove_repartition branch December 27, 2023 18:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants