Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-31923][Core]Ignore internal accumulators that use unrecognized types rather than crashing #28744

Closed
wants to merge 1 commit into from

Conversation

zsxwing
Copy link
Member

@zsxwing zsxwing commented Jun 6, 2020

What changes were proposed in this pull request?

Ignore internal accumulators that use unrecognized types rather than crashing so that an event log containing such accumulators can still be converted to JSON and logged.

Why are the changes needed?

A user may use internal accumulators by adding the internal.metrics. prefix to the accumulator name to hide sensitive information from UI (Accumulators except internal ones will be shown in Spark UI).

However, org.apache.spark.util.JsonProtocol.accumValueToJson assumes an internal accumulator has only 3 possible types: int, long, and java.util.List[(BlockId, BlockStatus)]. When an internal accumulator uses an unexpected type, it will crash.

An event log that contains such accumulator will be dropped because it cannot be converted to JSON, and it will cause weird UI issue when rendering in Spark History Server. For example, if SparkListenerTaskEnd is dropped because of this issue, the user will see the task is still running even if it was finished.

It's better to make accumValueToJson more robust because it's up to the user to pick up the accumulator name.

Does this PR introduce any user-facing change?

No

How was this patch tested?

The new unit tests.

@SparkQA
Copy link

SparkQA commented Jun 7, 2020

Test build #123592 has finished for PR 28744 at commit 7716ab6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zsxwing
Copy link
Member Author

zsxwing commented Jun 8, 2020

cc @tdas @cloud-fan

@tdas
Copy link
Contributor

tdas commented Jun 8, 2020

LGTM. We should backport this branch-3.0 as well as this is a good narrow bug fix with low risk.

@zsxwing
Copy link
Member Author

zsxwing commented Jun 8, 2020

Thanks! I will also merge this to branch-2.4 for the same reason.

@asfgit asfgit closed this in b333ed0 Jun 8, 2020
asfgit pushed a commit that referenced this pull request Jun 8, 2020
…d types rather than crashing

### What changes were proposed in this pull request?

Ignore internal accumulators that use unrecognized types rather than crashing so that an event log containing such accumulators can still be converted to JSON and logged.

### Why are the changes needed?

A user may use internal accumulators by adding the `internal.metrics.` prefix to the accumulator name to hide sensitive information from UI (Accumulators except internal ones will be shown in Spark UI).

However, `org.apache.spark.util.JsonProtocol.accumValueToJson` assumes an internal accumulator has only 3 possible types: `int`, `long`, and `java.util.List[(BlockId, BlockStatus)]`. When an internal accumulator uses an unexpected type, it will crash.

An event log that contains such accumulator will be dropped because it cannot be converted to JSON, and it will cause weird UI issue when rendering in Spark History Server. For example, if `SparkListenerTaskEnd` is dropped because of this issue, the user will see the task is still running even if it was finished.

It's better to make `accumValueToJson` more robust because it's up to the user to pick up the accumulator name.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

The new unit tests.

Closes #28744 from zsxwing/fix-internal-accum.

Authored-by: Shixiong Zhu <zsxwing@gmail.com>
Signed-off-by: Shixiong Zhu <zsxwing@gmail.com>
(cherry picked from commit b333ed0)
Signed-off-by: Shixiong Zhu <zsxwing@gmail.com>
@zsxwing
Copy link
Member Author

zsxwing commented Jun 8, 2020

Merged to master and branch-3.0. There is some minor conflict with branch-2.4. I will submit a backport PR.

@zsxwing zsxwing deleted the fix-internal-accum branch June 8, 2020 19:13
zsxwing added a commit to zsxwing/spark that referenced this pull request Jun 8, 2020
…d types rather than crashing

Ignore internal accumulators that use unrecognized types rather than crashing so that an event log containing such accumulators can still be converted to JSON and logged.

A user may use internal accumulators by adding the `internal.metrics.` prefix to the accumulator name to hide sensitive information from UI (Accumulators except internal ones will be shown in Spark UI).

However, `org.apache.spark.util.JsonProtocol.accumValueToJson` assumes an internal accumulator has only 3 possible types: `int`, `long`, and `java.util.List[(BlockId, BlockStatus)]`. When an internal accumulator uses an unexpected type, it will crash.

An event log that contains such accumulator will be dropped because it cannot be converted to JSON, and it will cause weird UI issue when rendering in Spark History Server. For example, if `SparkListenerTaskEnd` is dropped because of this issue, the user will see the task is still running even if it was finished.

It's better to make `accumValueToJson` more robust because it's up to the user to pick up the accumulator name.

No

The new unit tests.

Closes apache#28744 from zsxwing/fix-internal-accum.

Authored-by: Shixiong Zhu <zsxwing@gmail.com>
Signed-off-by: Shixiong Zhu <zsxwing@gmail.com>
asfgit pushed a commit that referenced this pull request Jun 8, 2020
…d types rather than crashing (branch-2.4)

### What changes were proposed in this pull request?

Backport #28744 to branch-2.4.

### Why are the changes needed?

Low risky fix for branch-2.4.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

New unit tests.

Closes #28758 from zsxwing/SPARK-31923-2.4.

Authored-by: Shixiong Zhu <zsxwing@gmail.com>
Signed-off-by: Shixiong Zhu <zsxwing@gmail.com>
@dongjoon-hyun
Copy link
Member

Late LGTM. Thank you for making this available at master/3.0/2.4, @zsxwing and @tdas .

wankunde pushed a commit to wankunde/spark that referenced this pull request Mar 1, 2021
…d types rather than crashing

### What changes were proposed in this pull request?

Ignore internal accumulators that use unrecognized types rather than crashing so that an event log containing such accumulators can still be converted to JSON and logged.

### Why are the changes needed?

A user may use internal accumulators by adding the `internal.metrics.` prefix to the accumulator name to hide sensitive information from UI (Accumulators except internal ones will be shown in Spark UI).

However, `org.apache.spark.util.JsonProtocol.accumValueToJson` assumes an internal accumulator has only 3 possible types: `int`, `long`, and `java.util.List[(BlockId, BlockStatus)]`. When an internal accumulator uses an unexpected type, it will crash.

An event log that contains such accumulator will be dropped because it cannot be converted to JSON, and it will cause weird UI issue when rendering in Spark History Server. For example, if `SparkListenerTaskEnd` is dropped because of this issue, the user will see the task is still running even if it was finished.

It's better to make `accumValueToJson` more robust because it's up to the user to pick up the accumulator name.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

The new unit tests.

Closes apache#28744 from zsxwing/fix-internal-accum.

Authored-by: Shixiong Zhu <zsxwing@gmail.com>
Signed-off-by: Shixiong Zhu <zsxwing@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
4 participants