Skip to content

Conversation

JoshRosen
Copy link
Contributor

What changes were proposed in this pull request?

This PR updates JsonProtocol so that its accumulableExcludeList is considered when logging accumulable in TaskEndReasons and ExecutorMetricUpdate events.

This exclusion list was originally added in #17412 and originally only applied to StageInfo and TaskInfo logging.

Why are the changes needed?

Reduce event log size and improve event logger throughput by avoiding the need to process possibly large / expensive accumulators.

This gap for ExecutorMetricsUpdate was noted in a comment (#17412 (comment)); I recently rediscovered it in my own Spark usage.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing unit tests.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the CORE label Mar 4, 2025
@JoshRosen JoshRosen changed the title [SPARK-51378] [SPARK-51378] Apply JsonProtocol's accumulableExcludeList to ExecutorMetricsUpdate and TaskEndReason Mar 4, 2025
@HyukjinKwon HyukjinKwon changed the title [SPARK-51378] Apply JsonProtocol's accumulableExcludeList to ExecutorMetricsUpdate and TaskEndReason [SPARK-51378][CORE] Apply JsonProtocol's accumulableExcludeList to ExecutorMetricsUpdate and TaskEndReason Mar 4, 2025
@HyukjinKwon
Copy link
Member

Merged to master and branch-4.0.

HyukjinKwon pushed a commit that referenced this pull request Mar 4, 2025
…ecutorMetricsUpdate and TaskEndReason

### What changes were proposed in this pull request?

This PR updates `JsonProtocol` so that its `accumulableExcludeList` is considered when logging accumulable in TaskEndReasons and ExecutorMetricUpdate events.

This exclusion list was originally added in #17412 and originally only applied to StageInfo and TaskInfo logging.

### Why are the changes needed?

Reduce event log size and improve event logger throughput by avoiding the need to process possibly large / expensive accumulators.

This gap for ExecutorMetricsUpdate was noted in a comment (#17412 (comment)); I recently rediscovered it in my own Spark usage.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing unit tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #50141 from JoshRosen/apply-accumulableExcludeList-to-other-event-log-fields.

Authored-by: Josh Rosen <joshrosen@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit 62f7f14)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
Pajaraja pushed a commit to Pajaraja/spark that referenced this pull request Mar 6, 2025
…ecutorMetricsUpdate and TaskEndReason

### What changes were proposed in this pull request?

This PR updates `JsonProtocol` so that its `accumulableExcludeList` is considered when logging accumulable in TaskEndReasons and ExecutorMetricUpdate events.

This exclusion list was originally added in apache#17412 and originally only applied to StageInfo and TaskInfo logging.

### Why are the changes needed?

Reduce event log size and improve event logger throughput by avoiding the need to process possibly large / expensive accumulators.

This gap for ExecutorMetricsUpdate was noted in a comment (apache#17412 (comment)); I recently rediscovered it in my own Spark usage.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing unit tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#50141 from JoshRosen/apply-accumulableExcludeList-to-other-event-log-fields.

Authored-by: Josh Rosen <joshrosen@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
anoopj pushed a commit to anoopj/spark that referenced this pull request Mar 15, 2025
…ecutorMetricsUpdate and TaskEndReason

### What changes were proposed in this pull request?

This PR updates `JsonProtocol` so that its `accumulableExcludeList` is considered when logging accumulable in TaskEndReasons and ExecutorMetricUpdate events.

This exclusion list was originally added in apache#17412 and originally only applied to StageInfo and TaskInfo logging.

### Why are the changes needed?

Reduce event log size and improve event logger throughput by avoiding the need to process possibly large / expensive accumulators.

This gap for ExecutorMetricsUpdate was noted in a comment (apache#17412 (comment)); I recently rediscovered it in my own Spark usage.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing unit tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#50141 from JoshRosen/apply-accumulableExcludeList-to-other-event-log-fields.

Authored-by: Josh Rosen <joshrosen@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants