@choojoyq

What changes were proposed in this pull request?

During graceful shutdown of StreamingContext, graph.stop() is invoked immediately after stopping the timer that generates new jobs. As a result, the most recent jobs triggered by the timer may still be in the middle of generation when graph.stop() closes objects that job generation requires (for example, the Kafka consumer), and generation fails. The failure also forces shutdown to wait for the full spark.streaming.gracefulStopTimeout, which defaults to 10 batch intervals. The graph should instead be stopped later, after the wait on haveAllBatchesBeenProcessed has completed.
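
The ordering problem described above can be illustrated with a small, self-contained Scala model. This is only a sketch under simplifying assumptions, not Spark's actual scheduler code: `Graph`, `generateJob`, `gracefulStop`, and the `stopGraphFirst` flag are hypothetical names introduced for illustration, and the threads, timer, and polling loop of the real shutdown path are left out.

```scala
// Simplified model of the shutdown ordering issue. All names here are
// hypothetical stand-ins, not Spark's real classes.
object GracefulStopSketch {

  // Stand-in for the DStream graph: once stopped, it can no longer generate jobs
  // (think of a Kafka consumer that has already been closed).
  final class Graph {
    private var open = true

    def stop(): Unit = { open = false }

    def generateJob(batch: Int): Either[String, String] =
      if (open) Right(s"job-$batch")
      else Left(s"batch $batch failed: graph already stopped")
  }

  // `inFlight` models batches the timer triggered before it was stopped
  // but whose jobs have not been generated yet.
  def gracefulStop(inFlight: Seq[Int], stopGraphFirst: Boolean): Seq[Either[String, String]] = {
    val graph = new Graph
    // The timer is assumed to be stopped already, so no new batches arrive.
    if (stopGraphFirst) graph.stop()               // ordering before the patch
    val results = inFlight.map(graph.generateJob)  // pending generation finishes here
    if (!stopGraphFirst) graph.stop()              // ordering proposed by the patch
    results
  }

  def main(args: Array[String]): Unit = {
    println(gracefulStop(Seq(1, 2), stopGraphFirst = true))
    println(gracefulStop(Seq(1, 2), stopGraphFirst = false))
  }
}
```

In this model, stopping the graph first makes every in-flight batch fail, while stopping it after pending generation has finished lets all of them succeed.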

How was this patch tested?

Added a test to the existing test suite.

@choojoyq choojoyq force-pushed the SPARK-22955-job-generation-error-on-graceful-stop branch from 2ba6586 to c8937b1 August 20, 2019 10:36
@choojoyq
Author

Hi @srowen, could you please take a look?

@srowen
Member

srowen commented Aug 23, 2019

I tend to agree, but let's see what tests say. CC @tdas

@SparkQA

SparkQA commented Aug 23, 2019

Test build #4840 has finished for PR 25511 at commit c8937b1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen srowen closed this in 13b1eb6 Aug 27, 2019
@srowen
Member

srowen commented Aug 27, 2019

Merged to master

@zzcclp
Contributor

zzcclp commented Aug 27, 2019

@choojoyq @srowen Will this PR be merged into branch-2.4 too?

@srowen
Member

srowen commented Aug 27, 2019

Yes I think that's OK. We're in a 'code freeze' for the 2.4.4 release at the moment, so I hesitate to merge anything but critical fixes until it's finalized. But it could go in for 2.4.5.

@zzcclp
Contributor

zzcclp commented Aug 27, 2019

OK, thanks.

@zzcclp
Contributor

zzcclp commented Sep 3, 2019

@choojoyq @srowen 2.4.4 has been released; do you plan to merge this PR into branch-2.4?

@srowen
Member

srowen commented Sep 3, 2019

@dongjoon-hyun did you say you're having problems with the merge script not being able to backport? I'm seeing the same. Are we just manually cherry-picking and pushing?

srowen pushed a commit that referenced this pull request Sep 3, 2019
### What changes were proposed in this pull request?
During graceful shutdown of ``StreamingContext``, ``graph.stop()`` is invoked immediately after stopping the ``timer`` that generates new jobs. As a result, the most recent jobs triggered by the timer may still be in the middle of generation when ``graph.stop()`` closes objects that job generation requires (for example, the Kafka consumer), and generation fails. The failure also forces shutdown to wait for the full ``spark.streaming.gracefulStopTimeout``, which defaults to 10 batch intervals. The graph should instead be stopped later, after the wait on ``haveAllBatchesBeenProcessed`` has completed.

### How was this patch tested?
Added a test to the existing test suite.

Closes #25511 from choojoyq/SPARK-22955-job-generation-error-on-graceful-stop.

Authored-by: Nikita Gorbachevsky <nikitag@playtika.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
@srowen
Member

srowen commented Sep 3, 2019

Seemed to work fine. This is backported to 2.4

@zzcclp
Contributor

zzcclp commented Sep 4, 2019

Thanks, @srowen.

@dongjoon-hyun
Member

Yes, @srowen. The current merge script has two issues.

  1. It doesn't work on the merged PR.
  2. It doesn't work on the resolved JIRA issue.

BTW, sorry for the late response. I've been on vacation since last Saturday and am only checking in here from time to time.

rluta pushed a commit to rluta/spark that referenced this pull request Sep 17, 2019