[SPARK-37060][CORE][3.1] Handle driver status response from backup masters #34911

Closed
wants to merge 1 commit into from

testsgmr
Contributor

What changes were proposed in this pull request?

After an improvement in SPARK-31486, the client uses the 'asyncSendToMasterAndForwardReply' method instead of 'activeMasterEndpoint.askSync' to get the driver's status. The driver's status is available only on the active master, but 'asyncSendToMasterAndForwardReply' sends the request to every master. The client therefore also receives responses from the backup masters and has to handle them. The SPARK-31486 change did not account for this, so drivers running in cluster mode on a cluster with multiple masters are affected by this bug.
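The following minimal Python sketch models this failure under stated assumptions. It is not Spark's code, because Spark's client is written in Scala. `DriverStatusResponse`, `Master`, `ask_all_masters`, and `naive_report` are hypothetical names, and the response fields are simplified. The order of the returned list stands in for the order in which replies happen to arrive. The sketch shows that when every master is asked, the client can act on a backup master's "not found" reply before the active master's real answer arrives.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DriverStatusResponse:
    """Hypothetical Python mirror of a driver-status reply (fields assumed)."""
    found: bool
    state: Optional[str] = None
    error: Optional[str] = None


@dataclass
class Master:
    """Toy master: only the alive (active) one tracks submitted drivers."""
    name: str
    alive: bool
    drivers: dict


def ask_all_masters(masters: List[Master], driver_id: str) -> List[DriverStatusResponse]:
    """Send the status request to every master; list order models arrival order."""
    replies = []
    for m in masters:
        if not m.alive:
            # A backup (standby) master does not know the driver; it answers "not found".
            replies.append(DriverStatusResponse(found=False, error=f"{m.name} is not alive"))
        elif driver_id in m.drivers:
            replies.append(DriverStatusResponse(found=True, state=m.drivers[driver_id]))
        else:
            replies.append(DriverStatusResponse(found=False))
    return replies


def naive_report(replies: List[DriverStatusResponse]) -> str:
    """Buggy client (hypothetical): acts on the first reply, whichever master sent it."""
    first = replies[0]
    return first.state if first.found else "DRIVER NOT FOUND"
```

When the standby master's reply arrives first, the naive client reports `DRIVER NOT FOUND` for a driver that the active master knows is running. When the arrival order is reversed, the client reports `RUNNING`. The outcome therefore depends on reply ordering rather than on the driver's actual state.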

Why are the changes needed?

The client needs to detect whether a response came from a backup master and, if so, ignore it.
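The Python sketch below shows one way to express that filter. It is a hypothetical model rather than the patch itself. It assumes that a backup master marks its reply with a recognizable error-message prefix. `BACKUP_MARKER`, `is_from_backup`, and `resolve_status` are illustrative names, and the marker string is invented for the example rather than taken from Spark's actual message.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Hypothetical marker a backup master puts at the start of its error message.
# Spark's real message text is not reproduced here; this string is an assumption.
BACKUP_MARKER = "BACKUP:"


@dataclass
class DriverStatusResponse:
    """Hypothetical Python mirror of a driver-status reply (fields assumed)."""
    found: bool
    state: Optional[str] = None
    error: Optional[str] = None


def is_from_backup(resp: DriverStatusResponse) -> bool:
    """True when the reply is a 'not found' carrying the backup marker."""
    return (not resp.found) and resp.error is not None and resp.error.startswith(BACKUP_MARKER)


def resolve_status(replies: Iterable[DriverStatusResponse]) -> str:
    """Report the first reply that did not come from a backup master."""
    for resp in replies:
        if is_from_backup(resp):
            continue  # ignore: only the active master knows the driver's status
        if resp.found:
            return resp.state or "UNKNOWN"
        return "DRIVER NOT FOUND"
    return "NO ANSWER FROM ACTIVE MASTER"
```

The filter skips backup replies instead of treating them as final, so the client's answer no longer depends on which master replies first. Keying on a message prefix keeps the sketch small. A dedicated field on the response would be a sturdier design, but that is an alternative, not something this PR is described as doing.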

Does this PR introduce any user-facing change?

No. It only fixes a bug, restoring the ability to deploy in cluster mode on multi-master clusters.

How was this patch tested?

github-actions bot added the CORE label Dec 15, 2021
@AmplabJenkins

Can one of the admins verify this patch?

@testsgmr
Contributor Author

@Ngone51
Based on our conversation in #34331 (comment), here is the PR for branch-3.1.

dongjoon-hyun changed the title from [SPARK-37060][CORE] Handle driver status response from backup masters to [SPARK-37060][CORE][3.1] Handle driver status response from backup masters Dec 15, 2021
Ngone51 pushed a commit that referenced this pull request Dec 16, 2021

Closes #34911 from mohamadrezarostami/fix-a-bug-in-report-driver-status.

Authored-by: Mohamadreza Rostami <mohamadrezarostami2@gmail.com>
Signed-off-by: yi.wu <yi.wu@databricks.com>
@Ngone51
Member

Ngone51 commented Dec 16, 2021

Thanks, merged to branch-3.1.

Ngone51 closed this Dec 16, 2021
fishcus pushed a commit to fishcus/spark that referenced this pull request Jan 12, 2022