Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-33933][SQL] Remove UT before we completely fix SPARK-33933 #31099

Closed
wants to merge 1 commit into from

Conversation

zhongyu09
Copy link
Contributor

What changes were proposed in this pull request?

Remove the UT "SPARK-33933: AQE broadcast should not timeout with slow map tasks" in AdaptiveQueryExecSuite

Why are the changes needed?

In #30998, we give a simple solution try to resolve SPARK-33933. However, the solution is not complete and the UT could not pass in some situations. For the stable of regression test, I suggest to remove the UT before we completely fix SPARK-33933.

Does this PR introduce any user-facing change?

No

How was this patch tested?

No

@zhongyu09
Copy link
Contributor Author

@HyukjinKwon
Copy link
Member

ok to test

@HyukjinKwon
Copy link
Member

some situations

Mind elabourating what are the "some situations"?

@SparkQA
Copy link

SparkQA commented Jan 10, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38471/

@SparkQA
Copy link

SparkQA commented Jan 10, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38471/

@HyukjinKwon
Copy link
Member

Looks like we need some investigations. Also it blocks RC preparation. I reverted this for now at 105ba6e. Let's make a PR again. I think that's the safe way.

cc @cloud-fan, @LuciferYang, @dongjoon-hyun FYI

@SparkQA
Copy link

SparkQA commented Jan 10, 2021

Test build #133882 has finished for PR 31099 at commit e00e542.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zhongyu09
Copy link
Contributor Author

some situations

Mind elabourating what are the "some situations"?

Because the submit of broadcast job and shuffle map job are in different thread,

  1. for broadcast job, call doPrepare() in main thread, and then start the real materialization in "broadcast-exchange" thread pool: calling getByteArrayRdd().collect() to submit collect job
  2. for shuffle map job, call ShuffleExchangeExec.mapOutputStatisticsFuture() which call sparkContext.submitMapStage() directly in main thread to submit map stage
    Removed reference to incubation in README.md. #1 is trigger in Removed reference to incubation in Spark user docs. #2, so in normal cases, the broadcast job will be submit first.
    However, we can not control how fast the two thread runs, so the "broadcast-exchange" thread could run a little bit slower than main thread, result in map stage submit first

@zhongyu09 zhongyu09 deleted the aqe-broadcast-remove-UT branch January 24, 2021 13:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
3 participants