[SPARK-4537][Streaming] Expand StreamingSource to add more metrics #3466

Closed
wants to merge 5 commits into from

Conversation

jerryshao
Contributor

Add processingDelay, schedulingDelay and totalDelay for the last completed batch. Add lastReceivedBatchRecords and totalReceivedBatchRecords to the received records counting.
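
For context, the three delay metrics can be derived from a batch's timestamps. The sketch below is a simplified stand-in, not the code in this patch: `BatchTimes` is a hypothetical name, and it assumes that scheduling delay is processing start minus submission, processing delay is processing end minus processing start, and total delay is the sum of the two.

```scala
// Hypothetical, simplified stand-in for a batch's timing information.
// Start and end times are optional because a batch may not have started or finished yet.
final case class BatchTimes(
    submissionTime: Long,
    processingStartTime: Option[Long],
    processingEndTime: Option[Long]) {

  // Time the batch waited before its processing started.
  def schedulingDelay: Option[Long] =
    processingStartTime.map(_ - submissionTime)

  // Time spent processing the batch.
  def processingDelay: Option[Long] =
    for (start <- processingStartTime; end <- processingEndTime) yield end - start

  // End-to-end delay: waiting time plus processing time.
  def totalDelay: Option[Long] =
    for (s <- schedulingDelay; p <- processingDelay) yield s + p
}
```

A batch that has not completed yields `None` for the affected delays, which is why a gauge reporting them needs a default value.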

@SparkQA

SparkQA commented Nov 26, 2014

Test build #23868 has started for PR 3466 at commit c7a9376.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Nov 26, 2014

Test build #23868 has finished for PR 3466 at commit c7a9376.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23868/
Test PASSed.

// Gauge for last received batch records and total received batch records.
private var totalReceivedBatchRecords: Long = 0L
def getTotalReceivedBatchRecords(listener: StreamingJobProgressListener): Long = {
  totalReceivedBatchRecords += listener.lastReceivedBatchRecords.values.sum

Will this counter work? I think gauges are evaluated only when the metric is requested, so if nobody consumes the metric, or if it is consumed too often, we will get a wrong count.

Contributor Author

Aha, you're right, so it is hard to collect the total batch records without modifying the StreamingJobProgressListener code.

Contributor Author

I will fix this, thanks a lot.
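
To illustrate the concern in this thread, here is a minimal sketch of why accumulating inside a gauge's getter gives a polling-dependent result. All names (`PullGaugeDemo`, `RecordListener`, `AccumulatingGauge`, `ReadOnlyGauge`) are hypothetical, and the small `Gauge` trait only stands in for a pull-based metrics gauge; this is not the code that was eventually merged.

```scala
object PullGaugeDemo {
  // Stand-in for a pull-based gauge: getValue runs only when something polls it.
  trait Gauge[T] { def getValue: T }

  // Hypothetical listener; onBatchReceived is invoked exactly once per batch.
  class RecordListener {
    private var last: Long = 0L
    private var total: Long = 0L

    def onBatchReceived(records: Long): Unit = synchronized {
      last = records
      total += records
    }

    def lastReceivedBatchRecords: Long = synchronized(last)
    def totalReceivedRecords: Long = synchronized(total)
  }

  // Problematic: the running total changes on every poll, so the value
  // depends on how often the gauge is read rather than on the batches seen.
  class AccumulatingGauge(listener: RecordListener) extends Gauge[Long] {
    private var total: Long = 0L
    override def getValue: Long = {
      total += listener.lastReceivedBatchRecords
      total
    }
  }

  // Safer: the listener owns the running total and the gauge only reads it.
  class ReadOnlyGauge(listener: RecordListener) extends Gauge[Long] {
    override def getValue: Long = listener.totalReceivedRecords
  }
}
```

If two batches arrive between polls, the accumulating gauge misses the first one; if it is polled twice after one batch, it counts that batch twice. Keeping the total in the listener avoids both cases.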

@SparkQA

SparkQA commented Nov 27, 2014

Test build #23913 has started for PR 3466 at commit 02dd44f.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Nov 27, 2014

Test build #23913 has finished for PR 3466 at commit 02dd44f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23913/
Test PASSed.

@@ -62,6 +62,14 @@ private[streaming] class StreamingSource(ssc: StreamingContext) extends Source {
registerGauge("lastCompletedBatch_processEndTime",
  _.lastCompletedBatch.flatMap(_.processingEndTime).getOrElse(-1L), -1L)

// Gauge for last completed batch's delay information.
registerGauge("lastCompletedBatch_processTime",
Contributor

It's better to name this "processingTime".

@jerryshao
Contributor Author

Thanks a lot for your comments, TD. I will refactor the old code and fix the issues you mentioned above.

@SparkQA
Copy link

SparkQA commented Dec 24, 2014

Test build #24751 has started for PR 3466 at commit c097ddc.

  • This patch merges cleanly.

@jerryshao
Contributor Author

Hey TD, I've addressed the problem you mentioned in this way. I'm not sure whether this is what you want; would you mind taking a look at it? Thanks a lot.

    defaultValue: T) {
  metricRegistry.register(MetricRegistry.name("streaming", name), new Gauge[T] {
    override def getValue: T = Option(f(streamingListener)).getOrElse(defaultValue)
Contributor

I think it's better to keep the Option here (and document that defaultValue is used when f returns null). And other places should not have to use Option. This is safer for anyone to use and also minimizes the changes.

Contributor Author

OK, I will revert it back and try a better way.

Contributor Author

Hi TD, what do you mean by "And other places should not have to use Option"? For example, if the code here is changed to

registerGauge("lastCompletedBatch_submissionTime",
    _.lastCompletedBatch.map(_.submissionTime).get, -1L)

get will throw an exception when there is no completed batch. I'm not sure what you actually mean; sorry if I have misunderstood anything.
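
A small sketch of the behavior described above, using hypothetical names (`OptionGetDemo`, `Batch`) rather than the real listener types: calling `get` on an empty `Option` throws `NoSuchElementException`, while `getOrElse` falls back to the default.

```scala
object OptionGetDemo {
  // Hypothetical stand-in for a completed batch; only the field used here is modeled.
  final case class Batch(submissionTime: Long)

  // Throws NoSuchElementException when there is no completed batch.
  def unsafeSubmissionTime(lastCompletedBatch: Option[Batch]): Long =
    lastCompletedBatch.map(_.submissionTime).get

  // Falls back to -1L when there is no completed batch.
  def safeSubmissionTime(lastCompletedBatch: Option[Batch]): Long =
    lastCompletedBatch.map(_.submissionTime).getOrElse(-1L)
}
```

A default value passed separately to a registration helper cannot rescue the first form, because the exception is thrown inside the function before the default is ever consulted.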

Contributor Author

Hi TD, sorry to bother you again. I'm not sure whether there is a better way to address this problem. Would you mind giving me some hints? Thanks a lot.

Contributor

I understand the problem. Good catch, I did not realize that. How about this: let's make two versions of registerGauge, one that takes f: StreamingJobProgressListener => T without any default value, and another that takes f: StreamingJobProgressListener => Option[T] and the default value. Each version will be used accordingly.

Contributor Author

OK, got it, I will change the code as you suggested.
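
The two-variant idea suggested above could look roughly like the following. This is a sketch under assumptions, not the merged code: `RegisterGaugeSketch`, `Listener`, `Source`, `Batch` and the method name `registerGaugeWithOption` are hypothetical, and the small `Gauge` trait and the map stand in for the metrics library's gauge and registry.

```scala
import scala.collection.mutable

object RegisterGaugeSketch {
  // Stand-in for a metrics gauge.
  trait Gauge[T] { def getValue: T }

  // Hypothetical, simplified listener state.
  final case class Batch(submissionTime: Long)
  class Listener {
    var numReceivers: Int = 0
    var lastCompletedBatch: Option[Batch] = None
  }

  class Source(listener: Listener) {
    // Stand-in for the metric registry.
    val gauges = mutable.Map.empty[String, Gauge[_]]

    // Variant for values that are always available: no default is needed.
    def registerGauge[T](name: String, f: Listener => T): Unit =
      gauges(name) = new Gauge[T] {
        override def getValue: T = f(listener)
      }

    // Variant for values that may be absent: defaultValue is reported on None.
    def registerGaugeWithOption[T](
        name: String,
        f: Listener => Option[T],
        defaultValue: T): Unit =
      gauges(name) = new Gauge[T] {
        override def getValue: T = f(listener).getOrElse(defaultValue)
      }
  }
}
```

With this split, a call such as `registerGaugeWithOption("lastCompletedBatch_submissionTime", _.lastCompletedBatch.map(_.submissionTime), -1L)` never has to call `get`, and callers with always-present values do not need to wrap anything in an Option.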

@SparkQA

SparkQA commented Dec 24, 2014

Test build #24751 has finished for PR 3466 at commit c097ddc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24751/
Test PASSed.

@SparkQA

SparkQA commented Dec 25, 2014

Test build #24808 has started for PR 3466 at commit 44721a6.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Dec 25, 2014

Test build #24808 has finished for PR 3466 at commit 44721a6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24808/
Test PASSed.

@@ -35,6 +35,15 @@ private[streaming] class StreamingSource(ssc: StreamingContext) extends Source {
})
Contributor

nit: The existing version of registerGauge could have used the new version. Not a big deal, very small amount of duplicate code.

@tdas
Contributor

tdas commented Dec 25, 2014

Just a couple more comments on making the names more consistent with the existing ones. Otherwise I approve of how registerGauge works now.

@SparkQA

SparkQA commented Dec 26, 2014

Test build #24827 has started for PR 3466 at commit 00f5f7f.

  • This patch merges cleanly.

@jerryshao
Contributor Author

Hi TD, thanks a lot for your comments. I just changed the code style as you suggested and also added one more metric, totalProcessedRecords. Would you mind reviewing this again? Thanks a lot, I appreciate your time.

@tdas
Contributor

tdas commented Dec 26, 2014

LGTM. Merging this. Thanks @jerryshao

asfgit pushed a commit that referenced this pull request Dec 26, 2014
Add `processingDelay`, `schedulingDelay` and `totalDelay` for the last completed batch. Add `lastReceivedBatchRecords` and `totalReceivedBatchRecords` to the received records counting.

Author: jerryshao <saisai.shao@intel.com>

Closes #3466 from jerryshao/SPARK-4537 and squashes the following commits:

00f5f7f [jerryshao] Change the code style and add totalProcessedRecords
44721a6 [jerryshao] Further address the comments
c097ddc [jerryshao] Address the comments
02dd44f [jerryshao] Fix the addressed comments
c7a9376 [jerryshao] Expand StreamingSource to add more metrics

(cherry picked from commit f205fe4)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
@asfgit asfgit closed this in f205fe4 Dec 26, 2014
@SparkQA

SparkQA commented Dec 26, 2014

Test build #24827 has finished for PR 3466 at commit 00f5f7f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24827/
Test PASSed.
