Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-32124][CORE][SHS] Fix taskEndReasonFromJson to handle event logs from old Spark versions #28941

Closed
wants to merge 3 commits into from

Conversation

warrenzhu25
Copy link
Contributor

What changes were proposed in this pull request?

Fix bug of exception when parse event log of fetch failed task end reason without Map Index

Why are the changes needed?

When Spark history server read event log produced by older version of spark 2.4 (which don't have Map Index field), parsing of TaskEndReason will fail. This will cause TaskEnd event being ignored.

Does this PR introduce any user-facing change?

No

How was this patch tested?

JsonProtocolSuite.test("FetchFailed Map Index backwards compatibility")

@viirya
Copy link
Member

viirya commented Jun 28, 2020

ok to test

@SparkQA
Copy link

SparkQA commented Jun 28, 2020

Test build #124609 has finished for PR 28941 at commit b66fa2b.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member

Thank you for your first contribution, @warrenzhu25 .

  • Could you file a new JIRA issue for this?
  • Also, it would be great if you can use a proper title for your PR content.

@dongjoon-hyun
Copy link
Member

cc @xuanyuanking , @cloud-fan , @gatorsmile

@warrenzhu25 warrenzhu25 changed the title [SPARK-25341][CORE][FOLLOWUP] Support rolling back a shuffle map stage and re-generate the shuffle files [SHS] Failed to parse FetchFailed TaskEndReason from event log produce by Spark 2.4 Jun 28, 2020
@warrenzhu25 warrenzhu25 changed the title [SHS] Failed to parse FetchFailed TaskEndReason from event log produce by Spark 2.4 [SPARK-32124][SHS] Failed to parse FetchFailed TaskEndReason from event log produce by Spark 2.4 Jun 28, 2020
@viirya
Copy link
Member

viirya commented Jun 28, 2020

retest this please

@SparkQA
Copy link

SparkQA commented Jun 29, 2020

Test build #124612 has finished for PR 28941 at commit b66fa2b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 29, 2020

Test build #124613 has finished for PR 28941 at commit 910e285.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-32124][SHS] Failed to parse FetchFailed TaskEndReason from event log produce by Spark 2.4 [SPARK-32124][CORE][SHS] Failed to parse FetchFailed TaskEndReason from event log produce by Spark 2.4 Jun 29, 2020
@dongjoon-hyun dongjoon-hyun changed the title [SPARK-32124][CORE][SHS] Failed to parse FetchFailed TaskEndReason from event log produce by Spark 2.4 [SPARK-32124][CORE][SHS] Failed to parse FetchFailed TaskEndReason from event log produce by old Spark versions Jun 29, 2020
Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM. Thank you so much, @warrenzhu25 and @viirya .
Merged to master/3.0.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-32124][CORE][SHS] Failed to parse FetchFailed TaskEndReason from event log produce by old Spark versions [SPARK-32124][CORE][SHS] Fix taskEndReasonFromJson to handle event logs from old Spark versions Jun 29, 2020
dongjoon-hyun pushed a commit that referenced this pull request Jun 29, 2020
…gs from old Spark versions

### What changes were proposed in this pull request?
Fix bug of exception when parse event log of fetch failed task end reason without `Map Index`

### Why are the changes needed?
When Spark history server read event log produced by older version of spark 2.4 (which don't have `Map Index` field), parsing of TaskEndReason will fail. This will cause TaskEnd event being ignored.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
JsonProtocolSuite.test("FetchFailed Map Index backwards compatibility")

Closes #28941 from warrenzhu25/shs-task.

Authored-by: Warren Zhu <zhonzh@microsoft.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 197ac3b)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
@dongjoon-hyun
Copy link
Member

SPARK-32124 is assigned to you, @warrenzhu25 .

@HeartSaVioR
Copy link
Contributor

Late LGTM. Thanks for taking care of!

Copy link
Member

@xuanyuanking xuanyuanking left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fixing! Late LGTM.

Sorry for the late. Because of the fix has been merged, I think it's unnecessary to submit a follow-up PR for the small comment. If this file has change next time, maybe we can consider addressing this.

@@ -1078,7 +1078,10 @@ private[spark] object JsonProtocol {
val blockManagerAddress = blockManagerIdFromJson(json \ "Block Manager Address")
val shuffleId = (json \ "Shuffle ID").extract[Int]
val mapId = (json \ "Map ID").extract[Long]
val mapIndex = (json \ "Map Index").extract[Int]
val mapIndex = (json \ "Map Index") match {
case JNothing => 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: 0 is a legal value for the map index here. Since it's for backward compatibility, an illegal value here should make more sense. Also, it's better to comment here for emphasizing it's for backward compatibility, otherwise the fetch failed event will be dropped when SHS loading old version logs.

Copy link
Contributor

@HeartSaVioR HeartSaVioR Jul 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that's valid concern, and caller may need to be prepared to handle the invalid value. Why not we fix it for FOLLOW-UP PR? We have been doing it even for nits, and this doesn't even look to be a nit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense. It deserves a follow-up, let me submit one soon.

HeartSaVioR pushed a commit that referenced this pull request Jul 8, 2020
…fill the map index when the event logs from the old Spark version

### What changes were proposed in this pull request?
Use the invalid value Int.MinValue to fill the map index when the event logs from the old Spark version.

### Why are the changes needed?
Follow up PR for #28941.

### Does this PR introduce _any_ user-facing change?
When we use the Spark version 3.0 history server reading the event log written by the old Spark version, we use the invalid value -2 to fill the map index.

### How was this patch tested?
Existing UT.

Closes #28965 from xuanyuanking/follow-up.

Authored-by: Yuanjian Li <xyliyuanjian@gmail.com>
Signed-off-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>
HeartSaVioR pushed a commit that referenced this pull request Jul 8, 2020
…fill the map index when the event logs from the old Spark version

### What changes were proposed in this pull request?
Use the invalid value Int.MinValue to fill the map index when the event logs from the old Spark version.

### Why are the changes needed?
Follow up PR for #28941.

### Does this PR introduce _any_ user-facing change?
When we use the Spark version 3.0 history server reading the event log written by the old Spark version, we use the invalid value -2 to fill the map index.

### How was this patch tested?
Existing UT.

Closes #28965 from xuanyuanking/follow-up.

Authored-by: Yuanjian Li <xyliyuanjian@gmail.com>
Signed-off-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>
(cherry picked from commit 3659611)
Signed-off-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>
@warrenzhu25 warrenzhu25 deleted the shs-task branch June 11, 2022 17:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants