Skip to content

[BEAM-1048] Added a per-batch read duration metric to SparkUnboundedSource.#2073

Closed
staslev wants to merge 2 commits intoapache:masterfrom
staslev:BEAM-1048-reporting-batch-read-duration-metrics
Closed

[BEAM-1048] Added a per-batch read duration metric to SparkUnboundedSource.#2073
staslev wants to merge 2 commits intoapache:masterfrom
staslev:BEAM-1048-reporting-batch-read-duration-metrics

Conversation

@staslev
Copy link
Contributor

@staslev staslev commented Feb 22, 2017

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

  • Make sure the PR title is formatted like:
    [BEAM-<Jira issue #>] Description of pull request
  • Make sure tests pass via mvn clean verify. (Even better, enable
    Travis-CI on your fork and ensure the whole test matrix passes).
  • Replace <Jira issue #> in the title with the actual Jira issue
    number, if there is one.
  • If this contribution is large, please file an Apache
    Individual Contributor License Agreement.

R: @amitsela

@coveralls
Copy link

Coverage Status

Coverage remained the same at 69.297% when pulling 1e02ab5 on staslev:BEAM-1048-reporting-batch-read-duration-metrics into 8b2e8f2 on apache:master.

@asfbot
Copy link

asfbot commented Feb 22, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PreCommit_Java_MavenInstall/7701/
--none--

@staslev
Copy link
Contributor Author

staslev commented Feb 22, 2017

Don't review yet, got more stuff to do here.

This is broken at the moment since the metrics in this PR are not measured (at least not all of them) per-batch, which is the whole point.

@amitsela
Copy link
Member

OK

@staslev
Copy link
Contributor Author

staslev commented Feb 22, 2017

Sorry about that, got ahead of myself.

@amitsela
Copy link
Member

np

@aaltay
Copy link
Member

aaltay commented Mar 16, 2017

@staslev is there an update here?

@staslev
Copy link
Contributor Author

staslev commented Mar 16, 2017

Not yet, but hopefully I'll get around to it soon.

@staslev
Copy link
Contributor Author

staslev commented Apr 2, 2017

R: @aviemzur

@coveralls
Copy link

Coverage Status

Coverage increased (+0.01%) to 70.145% when pulling 67750ce on staslev:BEAM-1048-reporting-batch-read-duration-metrics into ea33e33 on apache:master.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.006%) to 70.142% when pulling 67750ce on staslev:BEAM-1048-reporting-batch-read-duration-metrics into ea33e33 on apache:master.

@asfbot
Copy link

asfbot commented Apr 2, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PreCommit_Java_MavenInstall/9055/
--none--

Copy link
Member

@aviemzur aviemzur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great feature! A few comments.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use Metrics API Metrics.gauge(...)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double space here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not related to your change, but I think the initialization of stopwatch should be moved to before the call to microbatchSource.createReader since this is a time intensive operation as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting the value here will have splits overwriting each other. The real number we're looking for here is the max read duration among all splits per microbatch. Consider calculating this number in ReadReportDStream.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, while moving this to ReadReportDStream#compute will not entirely solve this, as the metrics reporter till samples the value once in a reporting-period, it does open the door to lowering the reporting-period so that it actually reports the value (instead of sampling).

@coveralls
Copy link

Coverage Status

Coverage increased (+0.01%) to 70.148% when pulling 6acdd54 on staslev:BEAM-1048-reporting-batch-read-duration-metrics into ea33e33 on apache:master.

@asfbot
Copy link

asfbot commented Apr 2, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PreCommit_Java_MavenInstall/9056/
--none--

@staslev
Copy link
Contributor Author

staslev commented Apr 2, 2017

Run Spark ValidatesRunner

@asfbot
Copy link

asfbot commented Apr 2, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/1474/
--none--

Copy link
Member

@aviemzur aviemzur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one more comment regarding the metric namespace.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is a Spark-runner specific metric, perhaps we should put it in a namespace that indicates this, something like spark_metrics

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, changed io to spark-runner.io.

@aviemzur
Copy link
Member

aviemzur commented Apr 2, 2017

LGTM

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.003%) to 70.132% when pulling 003cfcd on staslev:BEAM-1048-reporting-batch-read-duration-metrics into ea33e33 on apache:master.

@asfbot
Copy link

asfbot commented Apr 2, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PreCommit_Java_MavenInstall/9057/
--none--

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.003%) to 70.132% when pulling 003cfcd on staslev:BEAM-1048-reporting-batch-read-duration-metrics into ea33e33 on apache:master.

@asfbot
Copy link

asfbot commented Apr 2, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PreCommit_Java_MavenInstall/9058/
--none--

@asfgit asfgit closed this in fe1d412 Apr 2, 2017
@staslev staslev deleted the BEAM-1048-reporting-batch-read-duration-metrics branch April 2, 2017 10:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants

Comments