[SPARK-10311][Streaming]Reload appId and attemptId when app starts with checkpoint file in cluster mode #8477

Closed
wants to merge 1 commit into from

Conversation

XuTingjun
Contributor

No description provided.

@XuTingjun XuTingjun changed the title [SPARK-10311]Reload appId and attemptId when a new ApplicationMaster registers [SPARK-10311]Reload appId and attemptId when a new ApplicationMaster registers of streaming app in cluster mode Aug 27, 2015
@SparkQA

SparkQA commented Aug 27, 2015

Test build #41670 has finished for PR 8477 at commit 3211a68.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@XuTingjun XuTingjun changed the title [SPARK-10311]Reload appId and attemptId when a new ApplicationMaster registers of streaming app in cluster mode [SPARK-10311][Streaming]Reload appId and attemptId when a new ApplicationMaster registers in cluster mode Aug 27, 2015
@tdas
Contributor

tdas commented Aug 27, 2015

@harishreedharan @vanzin Could you guys take a look at this?

@@ -49,6 +49,8 @@ class Checkpoint(@transient ssc: StreamingContext, val checkpointTime: Time)
// Reload properties for the checkpoint application since user wants to set a reload property
// or spark had changed its value and user wants to set it back.
val propertiesToReload = List(
"spark.yarn.app.id",
"spark.yarn.app.attemptId",
Contributor

This doesn't look right. If the AM restarted, the attempt ID will be different.

@vanzin
Contributor

vanzin commented Aug 27, 2015

I'm not familiar with how this data is used by the streaming backend, but it looks odd to be manually reloading it from a checkpoint.

In client mode, AM restarts do not affect the driver. In cluster mode, the app id is the same, but the attempt id changes; and those will be re-populated anyway when the SparkContext for the app is initialized by the new AM process.

@XuTingjun
Copy link
Contributor Author

Sorry, I didn't describe the problem clearly.

When an app restarts from a checkpoint file using the getOrCreate method, the new AM process creates a new SparkContext object, but with the old SparkConf restored from the checkpoint. So the new attemptId set by the new AM process has no effect:

 val newSparkConf = new SparkConf(loadDefaults = false).setAll(sparkConfPairs)

Also the appId is the same.
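To make the failure mode concrete, here is a minimal, Spark-free sketch that models a SparkConf as an immutable `Map[String, String]`. The object name `StaleConfDemo`, the helper `restoreFromCheckpoint`, and the sample values are hypothetical; only the property keys come from the patch. It illustrates that rebuilding the configuration purely from the checkpointed pairs, as the `setAll(sparkConfPairs)` line does, discards the attempt ID assigned to the new AM.

```scala
// Hypothetical model: a SparkConf is represented as an immutable Map[String, String].
object StaleConfDemo {
  type Conf = Map[String, String]

  // Mirrors the idea of `new SparkConf(loadDefaults = false).setAll(sparkConfPairs)`:
  // only the pairs saved in the checkpoint are used; nothing from the
  // current process is consulted.
  def restoreFromCheckpoint(sparkConfPairs: Seq[(String, String)]): Conf =
    sparkConfPairs.toMap

  def main(args: Array[String]): Unit = {
    // Pairs serialized when the first attempt wrote its checkpoint (sample values).
    val checkpointed = Seq(
      "spark.yarn.app.id" -> "application_0001",
      "spark.yarn.app.attemptId" -> "1")

    // Properties the restarted AM set for the second attempt (sample values).
    val currentProcess: Conf = Map(
      "spark.yarn.app.id" -> "application_0001",
      "spark.yarn.app.attemptId" -> "2")

    val restored = restoreFromCheckpoint(checkpointed)
    // The restored conf still reports attempt 1, although this is attempt 2.
    println(s"restored attemptId = ${restored("spark.yarn.app.attemptId")}")
    println(s"current  attemptId = ${currentProcess("spark.yarn.app.attemptId")}")
  }
}
```

The app ID happens to match in both maps, which is why only the attempt ID visibly goes stale in cluster mode.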

@XuTingjun XuTingjun changed the title [SPARK-10311][Streaming]Reload appId and attemptId when a new ApplicationMaster registes in cluster mode [SPARK-10311][Streaming]Reload appId and attemptId when app starts with checkpoint file in cluster mode Aug 28, 2015
@harishreedharan
Contributor

+1. This looks good.

@vanzin - I assume your concern is with the reloading. Reloading here discards the old values from the serialized Spark conf so that we pick up the values in the new one, which was created by the current attempt.
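The reload step described above can be sketched the same way. This is a simplified model under stated assumptions, not Spark's actual `Checkpoint` code: the conf is again a `Map`, `currentProps` stands in for the properties set by the new AM process, and `ReloadDemo` and `reloadProperties` are hypothetical names. Only the two keys added by this patch are modeled; the real `propertiesToReload` list may contain other entries.

```scala
// Simplified, hypothetical model of reloading selected properties after
// restoring a configuration from a checkpoint. Not Spark's actual code.
object ReloadDemo {
  type Conf = Map[String, String]

  // Keys whose values must come from the current attempt, not the checkpoint.
  // Only the two keys added by this patch are modeled here.
  val propertiesToReload: List[String] = List(
    "spark.yarn.app.id",
    "spark.yarn.app.attemptId")

  // Start from the checkpointed pairs, then overwrite every reloadable key
  // for which the current process has a value. Keys the current process
  // does not define keep their checkpointed value.
  def reloadProperties(checkpointed: Conf, currentProps: Conf): Conf =
    propertiesToReload.foldLeft(checkpointed) { (conf, key) =>
      currentProps.get(key) match {
        case Some(value) => conf.updated(key, value)
        case None        => conf
      }
    }

  def main(args: Array[String]): Unit = {
    val checkpointed: Conf = Map(
      "spark.app.name" -> "streaming-app",
      "spark.yarn.app.id" -> "application_0001",
      "spark.yarn.app.attemptId" -> "1")
    val currentProps: Conf = Map(
      "spark.yarn.app.id" -> "application_0001",
      "spark.yarn.app.attemptId" -> "2")

    val conf = reloadProperties(checkpointed, currentProps)
    println(conf("spark.yarn.app.attemptId")) // prints 2
    println(conf("spark.app.name"))           // prints streaming-app
  }
}
```

Overwriting only when the current process defines a key means an absent value falls back to the checkpointed one instead of being dropped, while every non-reloadable setting is still taken from the checkpoint.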

@vanzin
Contributor

vanzin commented Sep 4, 2015

@tdas Hari knows a lot more about this area than me so feel free to ignore my comments.

@tdas
Contributor

tdas commented Sep 4, 2015

@harishreedharan Please take a look.
@XuTingjun Could you update the description of this PR (the area below the title) with the explanation you gave to @vanzin?

@harishreedharan
Contributor

I already +1-ed it. LGTM

@tdas
Contributor

tdas commented Sep 4, 2015

@harishreedharan Oops, yeah, missed that. Merging this to master and 1.5

@asfgit asfgit closed this in eafe372 Sep 4, 2015
asfgit pushed a commit that referenced this pull request Sep 4, 2015
…with checkpoint file in cluster mode

Author: xutingjun <xutingjun@huawei.com>

Closes #8477 from XuTingjun/streaming-attempt.
ashangit pushed a commit to ashangit/spark that referenced this pull request Oct 19, 2016
…with checkpoint file in cluster mode

Author: xutingjun <xutingjun@huawei.com>

Closes apache#8477 from XuTingjun/streaming-attempt.

(cherry picked from commit dc39658)
5 participants