[SPARK-37060][CORE][3.1] Handle driver status response from backup masters #34911

testsgmr · 2021-12-15T13:33:58Z

What changes were proposed in this pull request?

After the improvement in SPARK-31486, the code uses the 'asyncSendToMasterAndForwardReply' method instead of 'activeMasterEndpoint.askSync' to get the driver's status. Because the driver's status is only available on the active master, and 'asyncSendToMasterAndForwardReply' iterates over all of the masters, the client has to handle the responses from the backup masters, which the SPARK-31486 change did not consider. As a result, drivers running in cluster mode on a cluster with multiple masters are affected by this bug.
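The behavior described above can be modeled with a small simulation. This is only a sketch in Python, not Spark's Scala code: `Master`, `DriverStatusReply`, the `from_standby` flag, and `poll_driver_status` are hypothetical names chosen for illustration, and the actual patch may detect a backup master's reply in a different way.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class DriverStatusReply:
    found: bool
    state: Optional[str] = None
    # Hypothetical marker: True when the reply comes from a backup master.
    from_standby: bool = False


class Master:
    """Toy stand-in for a standalone master (not Spark's Master class)."""

    def __init__(self, name: str, active: bool,
                 drivers: Optional[Dict[str, str]] = None) -> None:
        self.name = name
        self.active = active
        self.drivers = drivers or {}

    def request_driver_status(self, driver_id: str) -> DriverStatusReply:
        # A backup master has no driver state, so it can only say "not found".
        if not self.active:
            return DriverStatusReply(found=False, from_standby=True)
        state = self.drivers.get(driver_id)
        return DriverStatusReply(found=state is not None, state=state)


def poll_driver_status(masters: List[Master], driver_id: str) -> Optional[str]:
    """Ask every master and ignore replies that come from backup masters."""
    result = None
    for master in masters:
        reply = master.request_driver_status(driver_id)
        if reply.from_standby:
            # Without this check, a backup master's "not found" reply
            # would be treated as a fatal error by the client.
            continue
        if not reply.found:
            raise LookupError(f"master {master.name} does not know {driver_id}")
        result = reply.state
    return result


masters = [
    Master("master-1", active=False),
    Master("master-2", active=True, drivers={"driver-1": "RUNNING"}),
]
print(poll_driver_status(masters, "driver-1"))
```

In this model, removing the `from_standby` check makes the first reply (from the backup master) raise an error even though the active master knows the driver, which mirrors the failure mode the description attributes to multi-master clusters.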

Why are the changes needed?

The client needs to detect when a response comes from a backup master and ignore it.

Does this PR introduce any user-facing change?

No. It only fixes a bug and restores the ability to deploy in cluster mode on multi-master clusters.

How was this patch tested?

AmplabJenkins · 2021-12-15T13:35:21Z

Can one of the admins verify this patch?

testsgmr · 2021-12-15T13:36:54Z

@Ngone51
Based on our conversation in #34331 (comment), here is the PR for branch-3.1.

…sters

### What changes were proposed in this pull request?

After the improvement in SPARK-31486, the code uses the 'asyncSendToMasterAndForwardReply' method instead of 'activeMasterEndpoint.askSync' to get the driver's status. Because the driver's status is only available on the active master, and 'asyncSendToMasterAndForwardReply' iterates over all of the masters, the client has to handle the responses from the backup masters, which the SPARK-31486 change did not consider. As a result, drivers running in cluster mode on a cluster with multiple masters are affected by this bug.

### Why are the changes needed?

The client needs to detect when a response comes from a backup master and ignore it.

### Does this PR introduce _any_ user-facing change?

No. It only fixes a bug and restores the ability to deploy in cluster mode on multi-master clusters.

### How was this patch tested?

Closes #34911 from mohamadrezarostami/fix-a-bug-in-report-driver-status.

Authored-by: Mohamadreza Rostami <mohamadrezarostami2@gmail.com>
Signed-off-by: yi.wu <yi.wu@databricks.com>

Ngone51 · 2021-12-16T07:25:02Z

Thanks, merged to branch-3.1.


[SPARK-37060][CORE] Handle driver status response from backup masters

d315087

github-actions bot added the CORE label Dec 15, 2021

dongjoon-hyun changed the title ~~[SPARK-37060][CORE] Handle driver status response from backup masters~~ [SPARK-37060][CORE][3.1] Handle driver status response from backup masters Dec 15, 2021

Ngone51 approved these changes Dec 16, 2021


Ngone51 closed this Dec 16, 2021

