Skip to content

[SPARK-27413][SS] keep the same epoch pace between driver and executor.#24323

Closed
uncleGen wants to merge 6 commits intoapache:masterfrom
uncleGen:SPARK-27413
Closed

[SPARK-27413][SS] keep the same epoch pace between driver and executor.#24323
uncleGen wants to merge 6 commits intoapache:masterfrom
uncleGen:SPARK-27413

Conversation

@uncleGen
Copy link
Contributor

@uncleGen uncleGen commented Apr 9, 2019

What changes were proposed in this pull request?

The pace of epoch generation in driver and epoch pulling in executor is different. It will result in many empty epochs for partition if the epoch pulling interval is larger than epoch generation.

How was this patch tested?

update existing unit tests.

@uncleGen uncleGen changed the title keep the same epoch pace between driver and executor. [SPARK-27413][SS] keep the same epoch pace between driver and executor. Apr 9, 2019
@uncleGen
Copy link
Contributor Author

uncleGen commented Apr 9, 2019

cc @jose-torres

@SparkQA
Copy link

SparkQA commented Apr 9, 2019

Test build #104427 has finished for PR 24323 at commit d624637.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gaborgsomogyi
Copy link
Contributor

I agree it would reduce unused computation. On the other hand I think a better approach would be instead of pull mechanism sending push notifications to executors. It would end-up in less resource waste.

@shaneknapp
Copy link
Contributor

test this please

@SparkQA
Copy link

SparkQA commented Apr 9, 2019

Test build #104441 has started for PR 24323 at commit d624637.

@uncleGen
Copy link
Contributor Author

@gaborgsomogyi Agree with you, and let us think about code refactor later, ok?

@uncleGen
Copy link
Contributor Author

retest this please.

@SparkQA
Copy link

SparkQA commented Apr 10, 2019

Test build #104463 has finished for PR 24323 at commit d624637.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gaborgsomogyi
Copy link
Contributor

Spark can be configured such a way to reach this without code modification. Refactoring is changing the code without behavior modification. I mean here and now a different approach can be used.

@uncleGen
Copy link
Contributor Author

uncleGen commented Apr 10, 2019

@gaborgsomogyi Hmm, I misunderstood what you meant. From my point of view, using push mechanism doesn't really help much but makes this code more complex. The main concern of PR is to avoid producing unexpected empty epochs. Both pulling and pushing are able to achieve it. There is not any strong reason to change actual implementation. So IMHO, keep the actual implementation.

@jose-torres
Copy link
Contributor

I'm not sure I understand the motivation here. It's true that setting the poll interval larger than the generation interval will generate a bunch of empty epochs, but why does that imply that we shouldn't allow it to be configured at all? (And even if we shouldn't, why is "equal to the epoch duration" the right value?)

@uncleGen
Copy link
Contributor Author

uncleGen commented Apr 11, 2019

@jose-torres Thanks for you reply

I'm not sure I understand the motivation here. It's true that setting the poll interval larger than the generation interval will generate a bunch of empty epochs,

You have got the motivation. As Mentioned above, the main concern of PR is to avoid produce empty epoch.

but why does that imply that we shouldn't allow it to be configured at all?

We can indeed allow it to be configured. But firstly, IMO, it is not a good idea to expose this config to users and let them to set it carefully. Secondly, the epoch pulling interval is a internal config, and we may use a better approach to optimize this issue but not add config.

And even if we shouldn't, why is "equal to the epoch duration" the right value?

Hmm... After think again and deeply, "equal to the epoch duration" is not a very good choice. In some corner cases, executor will pull epoch from driver later than "epoch duration". So as @gaborgsomogyi mentioned, "push notifications to executors" may be a better approach.

To reiterate, the main concern of PR is to avoid produce empty epoch and late epoch. Do you have any doubt about this motivation?

Any suggestion is appreciated.

@uncleGen
Copy link
Contributor Author

uncleGen commented Apr 23, 2019

Send epoch update to executors instead of pull mechanism.

@SparkQA
Copy link

SparkQA commented Apr 23, 2019

Test build #104822 has finished for PR 24323 at commit ef4238a.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 23, 2019

Test build #104821 has finished for PR 24323 at commit cf37736.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@uncleGen
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented Apr 24, 2019

Test build #104848 has finished for PR 24323 at commit ef4238a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 24, 2019

Test build #104859 has finished for PR 24323 at commit 02b2b2b.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@uncleGen
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented Apr 24, 2019

Test build #104866 has finished for PR 24323 at commit 02b2b2b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@uncleGen
Copy link
Contributor Author

uncleGen commented May 6, 2019

cc @tdas

@SparkQA
Copy link

SparkQA commented May 6, 2019

Test build #105154 has finished for PR 24323 at commit 4676c64.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 7, 2019

Test build #105195 has finished for PR 24323 at commit 4676c64.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 7, 2019

Test build #105196 has finished for PR 24323 at commit ad14cd2.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 7, 2019

Test build #105200 has finished for PR 24323 at commit ad14cd2.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@uncleGen
Copy link
Contributor Author

uncleGen commented May 8, 2019

retest this please

@SparkQA
Copy link

SparkQA commented May 8, 2019

Test build #105242 has finished for PR 24323 at commit ad14cd2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@github-actions
Copy link

We're closing this PR because it hasn't been updated in a while.
This isn't a judgement on the merit of the PR in any way. It's just
a way of keeping the PR queue manageable.

If you'd like to revive this PR, please reopen it!

@github-actions github-actions bot added the Stale label Dec 31, 2019
@github-actions github-actions bot closed this Jan 1, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants