New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BEAM-11213] Display Beam Metrics in Spark History Server #13743
Conversation
R: @ibzib |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @tszerszen! A couple questions:
- Does this PR affect the regular Spark UI at all, or does it only show metrics in the history server?
- These changes only apply to the portable Spark runner. What about the "classic"/non-portable version?
runners/spark/src/main/java/org/apache/beam/runners/spark/SparkPipelineRunner.java
Outdated
Show resolved
Hide resolved
@@ -34,6 +34,12 @@ | |||
*/ | |||
public interface SparkPipelineOptions extends SparkCommonPipelineOptions { | |||
|
|||
@Description("The directory to save Spark History Server logs") | |||
@Default.String("/tmp/spark-events/") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we set spark.eventLog.dir
in the Spark conf? Or does that not matter?
beam/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/SparkContextFactory.java
Line 88 in d1c8c24
SparkConf conf = new SparkConf(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't matter, however for consistency I think it would be good to configure it in such a way.
runners/spark/src/main/java/org/apache/beam/runners/spark/SparkPipelineRunner.java
Outdated
Show resolved
Hide resolved
runners/spark/src/main/java/org/apache/beam/runners/spark/SparkPipelineRunner.java
Outdated
Show resolved
Hide resolved
runners/spark/src/main/java/org/apache/beam/runners/spark/SparkPipelineRunner.java
Outdated
Show resolved
Hide resolved
runners/spark/src/main/java/org/apache/beam/runners/spark/SparkPipelineRunner.java
Outdated
Show resolved
Hide resolved
runners/spark/src/main/java/org/apache/beam/runners/spark/SparkPipelineOptions.java
Show resolved
Hide resolved
runners/spark/src/main/java/org/apache/beam/runners/spark/SparkPipelineRunner.java
Outdated
Show resolved
Hide resolved
runners/spark/src/main/java/org/apache/beam/runners/spark/SparkPipelineRunner.java
Outdated
Show resolved
Hide resolved
Run Java PreCommit |
R: @ibzib |
runners/spark/src/main/java/org/apache/beam/runners/spark/SparkPipelineOptions.java
Outdated
Show resolved
Hide resolved
runners/spark/src/main/java/org/apache/beam/runners/spark/SparkPipelineRunner.java
Outdated
Show resolved
Hide resolved
runners/spark/src/main/java/org/apache/beam/runners/spark/SparkPipelineRunner.java
Outdated
Show resolved
Hide resolved
runners/spark/src/main/java/org/apache/beam/runners/spark/SparkPipelineRunner.java
Outdated
Show resolved
Hide resolved
runners/spark/src/main/java/org/apache/beam/runners/spark/SparkPipelineRunner.java
Outdated
Show resolved
Hide resolved
runners/spark/src/main/java/org/apache/beam/runners/spark/metrics/SparkBeamMetric.java
Outdated
Show resolved
Hide resolved
runners/spark/src/main/java/org/apache/beam/runners/spark/SparkPipelineRunner.java
Outdated
Show resolved
Hide resolved
runners/spark/src/main/java/org/apache/beam/runners/spark/SparkPipelineRunner.java
Outdated
Show resolved
Hide resolved
} | ||
})); | ||
eventLoggingListener.onApplicationEnd( | ||
new SparkListenerApplicationEnd(Instant.now().getMillis())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure pipeline end time (and also start time for that matter) is itself a metric. To keep things consistent, it'd be better to use that metric here instead of Instant.now()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I printed out the results of renderAll method I didn't found such metrics for whole pipeline only for it's parts. Maybe not all metrics appear in renderAll method or should I filter for them specifically?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure, it's not a blocker for this PR though. Thanks for checking.
R: @ibzib |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @tszerszen, this is looking a lot better. I have a few more comments.
} | ||
})); | ||
eventLoggingListener.onApplicationEnd( | ||
new SparkListenerApplicationEnd(Instant.now().getMillis())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure, it's not a blocker for this PR though. Thanks for checking.
runners/spark/src/main/java/org/apache/beam/runners/spark/SparkPipelineRunner.java
Outdated
Show resolved
Hide resolved
runners/spark/src/main/java/org/apache/beam/runners/spark/SparkPipelineRunner.java
Outdated
Show resolved
Hide resolved
runners/spark/src/test/java/org/apache/beam/runners/spark/metrics/SparkBeamMetricTest.java
Outdated
Show resolved
Hide resolved
runners/spark/src/main/java/org/apache/beam/runners/spark/metrics/SparkBeamMetric.java
Outdated
Show resolved
Hide resolved
runners/spark/src/main/java/org/apache/beam/runners/spark/SparkPipelineRunner.java
Outdated
Show resolved
Hide resolved
I have not checked what is the current status on the classic runners, are the Beam metrics shown correctly? Do you think we can reuse this work (if they are not shown correctly) there too @tszerszen ? |
@iemejia I think this work can be reused there, since it's calling native Spark EventLoggingListener. |
Great to know @tszerszen, thanks . Sounds like a nice follow up issue to be created/fixed in case you feel motivated after this one is merged. |
R: @ibzib |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Tomasz, I think this will be good to go after addressing a couple things:
- Remove the new options from the Spark job server configuration, and all related code. (
getSparkHistoryDir
andgetEventLogEnabled
should only be pipeline options.) - Fix the SparkListenerExecutorAdded logic.
runners/spark/src/main/java/org/apache/beam/runners/spark/SparkPipelineRunner.java
Outdated
Show resolved
Hide resolved
…stener executor added logic
…stener executor added logic
…stener executor added logic
runners/spark/src/main/java/org/apache/beam/runners/spark/SparkCommonPipelineOptions.java
Outdated
Show resolved
Hide resolved
runners/spark/src/main/java/org/apache/beam/runners/spark/SparkPipelineRunner.java
Outdated
Show resolved
Hide resolved
runners/spark/src/main/java/org/apache/beam/runners/spark/SparkPipelineRunner.java
Outdated
Show resolved
Hide resolved
…stener executor added logic
…stener executor added logic
…stener executor added logic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor cleanup and then we can merge this.
runners/spark/src/main/java/org/apache/beam/runners/spark/SparkPipelineRunner.java
Outdated
Show resolved
Hide resolved
runners/spark/src/main/java/org/apache/beam/runners/spark/SparkPipelineRunner.java
Outdated
Show resolved
Hide resolved
runners/spark/src/main/java/org/apache/beam/runners/spark/SparkJobServerDriver.java
Outdated
Show resolved
Hide resolved
Run Java PreCommit |
Run Java PreCommit |
R: @ibzib |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thank you!
Run Java PreCommit |
1 similar comment
Run Java PreCommit |
Java test failure seems to be a flake: BEAM-11746 |
Run Java PreCommit |
Create event log following Spark History Server format in default /tmp/spark-events log directory, so it could be read by Spark History Server.
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
R: @username
).[BEAM-XXX] Fixes bug in ApproximateQuantiles
, where you replaceBEAM-XXX
with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.CHANGES.md
with noteworthy changes.See the Contributor Guide for more tips on how to make review process smoother.
Post-Commit Tests Status (on master branch)
Pre-Commit Tests Status (on master branch)
See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI.