[SPARK-31923][Core]Ignore internal accumulators that use unrecognized types rather than crashing #28744

zsxwing · 2020-06-06T23:20:08Z

What changes were proposed in this pull request?

Ignore internal accumulators that use unrecognized types rather than crashing so that an event log containing such accumulators can still be converted to JSON and logged.

Why are the changes needed?

A user may create an internal accumulator by adding the internal.metrics. prefix to the accumulator name, in order to hide sensitive information from the UI (all accumulators except internal ones are shown in the Spark UI).

However, org.apache.spark.util.JsonProtocol.accumValueToJson assumes that an internal accumulator has only three possible types: int, long, and java.util.List[(BlockId, BlockStatus)]. When an internal accumulator uses any other type, accumValueToJson crashes.
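The failure mode described above can be illustrated with a minimal, self-contained sketch. It is not the actual JsonProtocol code: BlockId and BlockStatus are hypothetical stand-ins for Spark's classes, the JSON is rendered as a plain string rather than with Spark's JSON library, and the object name FragileAccumJson is made up for illustration.

```scala
// Hypothetical stand-ins for Spark's BlockId and BlockStatus (not the real classes).
final case class BlockId(name: String)
final case class BlockStatus(memSize: Long, diskSize: Long)

object FragileAccumJson {
  // Prefix that marks an accumulator as internal, as named in the description.
  val MetricsPrefix = "internal.metrics."

  // Simplified model: renders a JSON-like string.
  def accumValueToJson(name: Option[String], value: Any): String =
    if (name.exists(_.startsWith(MetricsPrefix))) {
      value match {
        case v: Int  => v.toString
        case v: Long => v.toString
        case v =>
          // Unchecked assumption: any other internal value is the block list.
          // For a value of another type this cast throws ClassCastException.
          val blocks = v.asInstanceOf[java.util.List[(BlockId, BlockStatus)]]
          blocks.toArray().toList.map { entry =>
            val (id, status) = entry.asInstanceOf[(BlockId, BlockStatus)]
            s"""{"Block ID":"${id.name}","Memory Size":${status.memSize}}"""
          }.mkString("[", ",", "]")
      }
    } else {
      // Non-internal accumulators are rendered via toString.
      "\"" + value.toString + "\""
    }
}
```

In this sketch, calling FragileAccumJson.accumValueToJson(Some("internal.metrics.secret"), "token") throws, while the same value under a name without the prefix is rendered as a string.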

An event log entry that contains such an accumulator is dropped because it cannot be converted to JSON, which causes confusing UI behavior when the log is rendered in the Spark History Server. For example, if SparkListenerTaskEnd is dropped because of this issue, the user sees the task as still running even though it has finished.
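To make the UI symptom concrete, here is a small hypothetical model of replaying an event log. The event types and the ReplaySketch object are illustrative assumptions, not the History Server's implementation; the point is only that a missing end event leaves the task in the running set.

```scala
// Hypothetical, simplified events; the real listener events carry far more data.
sealed trait Event
final case class TaskStart(taskId: Long) extends Event
final case class TaskEnd(taskId: Long) extends Event

object ReplaySketch {
  // Replays the log and returns the tasks a UI would still show as running.
  def runningTasks(events: Seq[Event]): Set[Long] =
    events.foldLeft(Set.empty[Long]) {
      case (running, TaskStart(id)) => running + id
      case (running, TaskEnd(id))   => running - id
    }
}
```

With a complete log, Seq(TaskStart(1L), TaskEnd(1L)), no task is left running. If the end event was dropped while writing the log, the replay of Seq(TaskStart(1L)) still reports task 1 as running.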

It's better to make accumValueToJson more robust, because users are free to choose the accumulator name.
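A more robust conversion might look roughly like the sketch below. This is an illustration of the idea under stated assumptions, not the patch itself: BlockId and BlockStatus are hypothetical stand-ins, the JSON is a plain string, the field names are simplified, and an Option return value models "leave this value out of the JSON".

```scala
// Hypothetical stand-ins for Spark's BlockId and BlockStatus (not the real classes).
final case class BlockId(name: String)
final case class BlockStatus(memSize: Long, diskSize: Long)

object RobustAccumJson {
  // Prefix that marks an accumulator as internal, as named in the description.
  val MetricsPrefix = "internal.metrics."

  // Returns None when the value should be ignored instead of serialized.
  def accumValueToJson(name: Option[String], value: Any): Option[String] =
    if (name.exists(_.startsWith(MetricsPrefix))) {
      value match {
        case v: Int  => Some(v.toString)
        case v: Long => Some(v.toString)
        case v: java.util.List[_] =>
          // Keep only well-formed (BlockId, BlockStatus) entries; skip anything else.
          val entries = v.toArray().toList.collect {
            case (id: BlockId, status: BlockStatus) =>
              s"""{"Block ID":"${id.name}","Memory Size":${status.memSize},"Disk Size":${status.diskSize}}"""
          }
          Some(entries.mkString("[", ",", "]"))
        case _ =>
          // Unrecognized type under the internal prefix: ignore rather than throw.
          None
      }
    } else {
      // Non-internal accumulators are rendered via toString.
      Some("\"" + value.toString + "\"")
    }
}
```

The design choice in this sketch is to check types with pattern matching before using a value, so that an unexpected value is skipped and the rest of the event can still be written.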

Does this PR introduce any user-facing change?

No

How was this patch tested?

The new unit tests.

SparkQA · 2020-06-07T01:58:21Z

Test build #123592 has finished for PR 28744 at commit 7716ab6.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

zsxwing · 2020-06-08T15:03:57Z

cc @tdas @cloud-fan

tdas · 2020-06-08T16:50:07Z

LGTM. We should backport this to branch-3.0 as well, since this is a good, narrow bug fix with low risk.

zsxwing · 2020-06-08T19:05:30Z

Thanks! I will also merge this to branch-2.4 for the same reason.

zsxwing · 2020-06-08T19:13:31Z

Merged to master and branch-3.0. There are some minor conflicts with branch-2.4, so I will submit a backport PR.

…d types rather than crashing (branch-2.4)

### What changes were proposed in this pull request?

Backport #28744 to branch-2.4.

### Why are the changes needed?

Low-risk fix for branch-2.4.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

New unit tests.

Closes #28758 from zsxwing/SPARK-31923-2.4.

Authored-by: Shixiong Zhu <zsxwing@gmail.com>
Signed-off-by: Shixiong Zhu <zsxwing@gmail.com>

dongjoon-hyun · 2020-06-14T06:04:26Z

Late LGTM. Thank you for making this available at master/3.0/2.4, @zsxwing and @tdas.

Commit 7716ab6: Ignore internal accumulators that use unrecognized types rather than crashing

probot-autolabeler bot added the CORE label Jun 6, 2020

asfgit closed this in b333ed0 Jun 8, 2020

zsxwing deleted the fix-internal-accum branch June 8, 2020 19:13

zsxwing mentioned this pull request Jun 8, 2020

[SPARK-31923][Core]Ignore internal accumulators that use unrecognized types rather than crashing (branch-2.4) #28758

Closed
