[BEAM-1048] Added a per-batch read duration metric to SparkUnboundedSource. #2073
staslev wants to merge 2 commits into apache:master from staslev:BEAM-1048-reporting-batch-read-duration-metrics
|
Refer to this link for build results (access rights to CI server needed): |
|
Don't review yet, got more stuff to do here. This is broken at the moment since the metrics in this PR are not measured (at least not all of them) per-batch, which is the whole point. |
|
OK |
|
Sorry about that, got ahead of myself. |
|
np |
|
@staslev is there an update here? |
|
Not yet, but hopefully I'll get around to it soon. |
|
R: @aviemzur |
|
Refer to this link for build results (access rights to CI server needed):
Failed Tests: 2
beam_PreCommit_Java_MavenInstall/org.apache.beam:beam-runners-spark: 2
--none-- |
|
aviemzur left a comment
Great feature! A few comments.
Use the Metrics API: Metrics.gauge(...)
Not related to your change, but I think the initialization of stopwatch should be moved to before the call to microbatchSource.createReader, since that is a time-intensive operation as well.
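To illustrate the stopwatch placement suggested above, here is a minimal sketch. It does not use the runner's actual classes: `ReaderTiming`, the injected clock, and the two `Runnable`s standing in for `createReader` and the batch read are all hypothetical, and the durations are made up so the difference is easy to see.

```java
import java.util.function.LongSupplier;

// Hypothetical sketch; createReader and readBatch stand in for the real calls.
final class ReaderTiming {
    // Stopwatch started after reader creation: creation time is not measured.
    static long timeReadOnly(LongSupplier clock, Runnable createReader, Runnable readBatch) {
        createReader.run();
        long start = clock.getAsLong();
        readBatch.run();
        return clock.getAsLong() - start;
    }

    // Stopwatch started before reader creation: creation time is included.
    static long timeCreateAndRead(LongSupplier clock, Runnable createReader, Runnable readBatch) {
        long start = clock.getAsLong();
        createReader.run();
        readBatch.run();
        return clock.getAsLong() - start;
    }

    public static void main(String[] args) {
        // Fake clock (assumption for the demo): creation costs 40 ticks, reading costs 10.
        long[] now = {0L};
        LongSupplier clock = () -> now[0];
        Runnable createReader = () -> now[0] += 40;
        Runnable readBatch = () -> now[0] += 10;
        System.out.println(timeReadOnly(clock, createReader, readBatch));      // 10
        System.out.println(timeCreateAndRead(clock, createReader, readBatch)); // 50
    }
}
```

With the fake clock, the first variant reports only the read, while the second also accounts for the time spent creating the reader.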
Setting the value here will have splits overwriting each other. The real number we're looking for here is the max read duration among all splits per microbatch. Consider calculating this number in ReadReportDStream.
I agree. Moving this to ReadReportDStream#compute will not entirely solve the problem, since the metrics reporter still samples the value once per reporting period, but it does open the door to lowering the reporting period so that the reporter actually reports each value (instead of sampling).
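As a rough illustration of the point raised in this thread, the sketch below contrasts a gauge that every split sets in turn with the max read duration across all splits of a microbatch. `SplitReadDurations` and the sample durations are hypothetical; this is not the code in ReadReportDStream, only the arithmetic being discussed.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; not the Spark runner's actual code.
final class SplitReadDurations {
    // What a shared gauge holds if each split sets it: the last value written.
    static long lastWriteWins(List<Long> durationsMillis) {
        long gauge = 0L;
        for (long d : durationsMillis) {
            gauge = d;
        }
        return gauge;
    }

    // The number the review asks for: the slowest split in the microbatch (0 if empty).
    static long maxAcrossSplits(List<Long> durationsMillis) {
        long max = 0L;
        for (long d : durationsMillis) {
            max = Math.max(max, d);
        }
        return max;
    }

    public static void main(String[] args) {
        // Made-up per-split read durations for one microbatch, in milliseconds.
        List<Long> batch = Arrays.asList(120L, 950L, 300L);
        System.out.println(lastWriteWins(batch));   // 300
        System.out.println(maxAcrossSplits(batch)); // 950
    }
}
```

The overwritten value depends on which split happens to report last, whereas the max reflects the split that actually bounds the batch's read time.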
|
Run Spark ValidatesRunner |
|
Since this is a Spark-runner-specific metric, perhaps we should put it in a namespace that indicates this, something like spark_metrics.
Ok, changed io to spark-runner.io.
|
LGTM |
|
R: @amitsela