[BEAM-3926] Add new metrics protos based on "Defining and adding SDK Metrics" htt… by ajamato · Pull Request #5437 · apache/beam

ajamato · 2018-05-21T18:34:45Z

Add new metrics protos based on "Defining and adding SDK Metrics" https://s.apache.org/beam-fn-api-metrics

Follow this checklist to help us incorporate your contribution quickly and easily:

Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
[] If this contribution is large, please file an Apache Individual Contributor License Agreement.

It will help us expedite review of your Pull Request if you tag someone (e.g. @username) to look at it.

ajamato · 2018-05-21T18:37:40Z

@robertwb @echauchot Would you please review? This is based on this design:
https://s.apache.org/beam-fn-api-metrics

Apologies in advance if I have made any obvious mistakes, as I have not sent many PRs. Would love some pointers too, in order to keep things smooth in the future.

ajamato

Perhaps I have failed to keep my branch up to date with master; in this change I only meant to modify beam_fn_api.proto.

ajamato · 2018-05-21T23:32:35Z

With some help from @tgroh the other files have been removed now.

robertwb

A compact representation of the proto data is

monitoring_status
    monitored_table_data
    metric
        data
            counter_data  # also used for gauge
                int64_value
                string_value
                double_value
            distribution_data
                int_double_distribution
                double_distribution_data
            extrema_data
                int_values
                double_values

where each option is in a oneof (except where not possible). It always feels a bit odd to me to say "there are 11 possible values, and here they are", but this much structure seems to imply such confidence. Do you feel that if this grows over time it will be natural? If so, it'd be good to not block on this further.

robertwb · 2018-05-22T23:27:20Z

model/fn-execution/src/main/proto/beam_fn_api.proto

  // ProcessBundleProgressResponse was sent.
  BundleSplit split = 2;
+
+  // (Optional) If metrics or monitored sttate reporting is supported by


s/sttate/state

robertwb · 2018-05-22T23:29:24Z

model/fn-execution/src/main/proto/beam_fn_api.proto

+
+  // The Metric or monitored state.
+  oneof monitoring_status {
+    MonitoringTableData monitored_table_data = 3;


Weren't we going to merge these two, branching on type?

No, we agreed to distinguish Metrics (the type of monitoring information compatible with metrics collection systems such as Dropwizard and Stackdriver) from MonitoredState (other relevant information to collect for debugging a pipeline, such as a table of file states which indicates why a pipeline is stuck).

The Metrics are therefore all in the same group, so it's clear that they can be used with systems such as Stackdriver and Dropwizard.

robertwb · 2018-05-22T23:37:52Z

model/fn-execution/src/main/proto/beam_fn_api.proto

+message CounterData {
+   oneof value {
+     int64 int64_value = 1;
+     string string_value = 2;


For consistency, I would order them int, double, string.

robertwb · 2018-05-22T23:38:47Z

model/fn-execution/src/main/proto/beam_fn_api.proto

+   }
+}
+
+// Extrema messages are used for calculating


Order these messages the same as above.

robertwb · 2018-05-22T23:39:45Z

model/fn-execution/src/main/proto/beam_fn_api.proto

+// style of distribution metric.
+message DistributionData {
+  oneof distribution {
+    IntDistributionData int_double_distribution = 1;


int_double_distribution vs. double_distribution_data?

robertwb · 2018-05-22T23:48:47Z

model/fn-execution/src/main/proto/beam_fn_api.proto

+     int64 int64_value = 1;
+     string string_value = 2;
+     double double_value = 3;
+   }


No LatestString?

There is a string_value. Is that sufficient? Or were you thinking of something else?

robertwb · 2018-05-22T23:52:47Z

model/fn-execution/src/main/proto/beam_fn_api.proto

+  // Only one of the two should be specified.
+  // Note: oneof is not allowed on repeated fields.
+  repeated int64 int_values = 1;
+  repeated double double_values = 2;


Top and bottom strings make sense as well. (Actually, one of the most useful extrema is MostFrequent.)

Can we table that one for now?

The more I think about that, the less sure I am where it belongs. It seems a lot like a Histogram: buckets of different strings with counts, with some cutoff for being large enough.

It's also not clear to me how you would calculate that, because I think you need to provide a list of strings with a counter for each, and you must send them all because you don't know whether they will be in the Top-N until future updates come in.

Whereas the Top-N/Bottom-N int/double metrics just require sending the max N or min N values on every update, since you can always use those updates to aggregate into the Top-N for the whole pipeline over time.
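
The difference described above can be sketched in Python. This is only an illustration, not code from the PR: the helper names `merge_top_n` and `top_frequent` and all the sample values are assumptions. It shows that per-bundle Top-N lists of numbers merge exactly, while per-bundle "most frequent string" results do not, because a string that is never the local winner can still be the global winner.

```python
import heapq
from collections import Counter


def merge_top_n(current, update, n):
    """Merge a reported top-N list into the running top-N (hypothetical helper)."""
    return heapq.nlargest(n, list(current) + list(update))


def top_frequent(counts, n):
    """Return the n most frequent strings from a Counter (hypothetical helper)."""
    return [s for s, _ in counts.most_common(n)]


# Top-N over ints: each bundle reports only its own top 2; the merge is still exact.
bundle_a = [9, 7]  # top 2 of [9, 7, 3, 1]
bundle_b = [8, 6]  # top 2 of [8, 6, 5]
top2 = merge_top_n(merge_top_n([], bundle_a, 2), bundle_b, 2)

# MostFrequent over strings: truncating per bundle can drop the global winner.
counts_a = Counter({"x": 5, "y": 4})
counts_b = Counter({"z": 5, "y": 4})
truncated = (Counter(dict(counts_a.most_common(1)))
             + Counter(dict(counts_b.most_common(1))))
complete = counts_a + counts_b  # "y" totals 8 and wins only with full counts
```

Here `top2` matches the top 2 of all seven values, whereas `truncated` no longer contains "y" at all, which is why the full string counts would have to be sent on every update.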

echauchot · 2018-05-23T09:13:48Z

@ajamato I don't know protobuf, so I'm afraid I am not well placed to review this. I think @robertwb will be a much better reviewer than me. But as a general comment from a protobuf newbie, I have the impression that the structure is a bit complex and therefore hard to understand and maintain. I'm sure you went through keep-it-simple iterations, but please ensure that this is not over-designed to support improbable future use cases.
The design also prompted some comments, which I'd rather make in the doc. Sorry if they come a bit late.

lukecwik · 2018-05-23T18:58:21Z

model/fn-execution/src/main/proto/beam_fn_api.proto

+message Extrema {
+  // Only one of the two should be specified.
+  // Note: oneof is not allowed on repeated fields.
+  repeated int64 int_values = 1;


Could we follow the same pattern as DistributionData and use a oneof with IntExtremaData and a DoubleExtremaData?

lukecwik · 2018-05-23T19:45:25Z

model/fn-execution/src/main/proto/beam_fn_api.proto

+
+// Extrema messages are used for calculating
+// Top-N/Bottom-N metrics.
+message Extrema {


Extrema -> ExtremaData to be consistent with the others.

lukecwik · 2018-05-23T19:46:43Z

model/fn-execution/src/main/proto/beam_fn_api.proto

+// metric format. For example, a table of important files
+// and metadata which an I/O source is reading.
+// Note: Since MonitoredState is designed to be
+// customizable, and allow engines to aggregate these


Engines don't customize them, since below you mention that the aggregation is always just the latest from the runner's perspective.

Latest across all shards/bundles? Or Union?

Perhaps I did not describe this well. Updating... What I mean to say is that a custom URN can do a custom aggregation, if an engine chooses to support it in its aggregation system.

Consider the I/O source reading files, emitting the table of file statuses:
The workers always do the same thing: they just emit their 'current state'.
I.e. every time, I emit my three oldest files which have been waiting for data for over X hours.
Then, if the engine supports the particular URN and has custom handling, it can just maintain the Top-N oldest files.

Otherwise, it can just union all the information together and create a large MonitoringTableData (agnostic of what is inside it).
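
A rough Python sketch of the two behaviours described above, under stated assumptions: tables are modelled as plain dicts rather than the proto messages, and the URN string, the `hours_waiting` column, and every function name are hypothetical. An engine with a handler registered for the URN keeps only the longest-waiting files; any other engine falls back to a plain union.

```python
def union_tables(tables):
    """Default handling: concatenate rows, agnostic of what is inside them."""
    if not tables:
        return {"column_names": [], "rows": []}
    column_names = tables[0]["column_names"]
    rows = []
    for table in tables:
        if table["column_names"] != column_names:
            raise ValueError("tables with different columns cannot be unioned")
        rows.extend(table["rows"])
    return {"column_names": column_names, "rows": rows}


def oldest_files(tables, n):
    """Custom handling for a hypothetical URN: keep the n longest-waiting files."""
    merged = union_tables(tables)
    wait_index = merged["column_names"].index("hours_waiting")
    rows = sorted(merged["rows"], key=lambda row: row[wait_index], reverse=True)
    return {"column_names": merged["column_names"], "rows": rows[:n]}


# Hypothetical URN; an engine without a handler for it falls back to the union.
CUSTOM_HANDLERS = {
    "example:urn:oldest_files:v1": lambda tables: oldest_files(tables, 3),
}


def aggregate(urn, tables):
    """Pick the custom handler for urn if one exists, else union everything."""
    handler = CUSTOM_HANDLERS.get(urn, union_tables)
    return handler(tables)


worker_1 = {"column_names": ["file", "hours_waiting"], "rows": [["a", 5], ["b", 9]]}
worker_2 = {"column_names": ["file", "hours_waiting"], "rows": [["c", 7], ["d", 2]]}
```

In both cases the workers emit the same 'current state' tables; only the engine-side aggregation differs.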

lukecwik · 2018-05-23T19:47:56Z

model/fn-execution/src/main/proto/beam_fn_api.proto

+    repeated MonitoringColumnValue values = 1;
+  }
+
+ repeated string column_names = 1;


Mention that the number of column_names must match the number of row_data.

Done, but it's actually:
The number of column names must match the number of values in each MonitoringRow.
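
As an illustration of that constraint, here is a minimal Python check. It assumes rows are plain lists of values rather than MonitoringRow messages, and `validate_table` is a hypothetical helper, not something defined in the PR.

```python
def validate_table(column_names, rows):
    """Raise ValueError if any row's value count differs from the column count."""
    for index, values in enumerate(rows):
        if len(values) != len(column_names):
            raise ValueError(
                f"row {index} has {len(values)} values, "
                f"expected {len(column_names)}"
            )
```

A table with columns ["file", "state"] would accept the row ["a", "open"] and reject the row ["a"].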

ajamato · 2018-05-29T22:58:36Z

@tgroh helped me fix up this PR into a good state again. It should be ready to review again.

ajamato · 2018-06-04T16:24:39Z

Hey @lukecwik I addressed your comments.

lukecwik

Minor nits. Just ping me again when you want me to merge, whether you address them now or in a follow-up PR.

lukecwik · 2018-06-04T16:31:48Z

model/fn-execution/src/main/proto/beam_fn_api.proto

+}
+
+// Data associated with a distribution metric.
+// This is based off of the current DistributionData metric


nit: Add . at the end after metric.

lukecwik · 2018-06-04T16:36:38Z

model/fn-execution/src/main/proto/beam_fn_api.proto

+  string type = 2;
+
+  // The Metric or monitored state.
+  oneof monitoring_status {


nit: This doesn't look like a status type, should we just call this data like everywhere else?

lukecwik · 2018-06-04T16:37:30Z

model/fn-execution/src/main/proto/beam_fn_api.proto

+
+  // A set of key+value labels which define the scope of the metric.
+  // Either a well defined entity id for the keys:
+  // “transform”, “pcollection”, “windowing_strategy”,


Want to add an enum defining these "well" known strings?

This will allow developers across languages to have a consistent spelling.
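
One way to picture the suggestion, sketched in Python rather than as a proto enum: the class name `LabelKey` and the helper `is_well_known` are hypothetical, and only the five key strings come from the proto comment under review.

```python
from enum import Enum


class LabelKey(str, Enum):
    """Well-known label keys from the proto comment (hypothetical enum name)."""
    TRANSFORM = "transform"
    PCOLLECTION = "pcollection"
    WINDOWING_STRATEGY = "windowing_strategy"
    CODER = "coder"
    ENVIRONMENT = "environment"


def is_well_known(key):
    """Return True if key is spelled exactly like one of the well-known keys."""
    return key in {member.value for member in LabelKey}


# Well-known keys come from the enum; arbitrary custom labels stay free-form.
labels = {LabelKey.TRANSFORM.value: "MyTransform", "my_custom_label": "abc"}
```

Referring to the keys through one shared definition means a misspelling such as "PCollection" is caught rather than silently treated as a custom label.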

lukecwik · 2018-06-04T16:39:01Z

model/fn-execution/src/main/proto/beam_fn_api.proto

+  // “transform”, “pcollection”, “windowing_strategy”,
+  // “coder”, “environment” or any arbitrary label
+  // set by a custom metric or user metric.
+  // A monitoring system is expected to be able to aggregate the metric together


nit: metric -> metrics

lukecwik · 2018-06-04T16:39:59Z

model/fn-execution/src/main/proto/beam_fn_api.proto

+
+  // The Metric or monitored state.
+  oneof monitoring_status {
+    MonitoringTableData monitored_table_data = 3;


monitored_table_data -> monitoring_table_data

ajamato · 2018-06-04T20:04:24Z

test this please

ajamato · 2018-06-04T22:43:46Z

test this please

ajamato · 2018-06-05T00:00:58Z

test this please

echauchot · 2018-06-05T10:47:42Z

@ajamato I think there is no "test please" Jenkins phrase. I think you meant "Run Java PreCommit" (it was "retest this please" in the past).

ajamato commented May 21, 2018

ajamato force-pushed the metrics_protos branch from a02551a to 06f4ec9 Compare May 21, 2018 23:31

robertwb reviewed May 22, 2018

lukecwik requested changes May 23, 2018

ajamato force-pushed the metrics_protos branch 2 times, most recently from 9b864e3 to d75ae48 Compare May 29, 2018 22:54

lukecwik approved these changes Jun 4, 2018

ajamato force-pushed the metrics_protos branch 2 times, most recently from 5700961 to 03d4167 Compare June 4, 2018 19:58

Add new metrics protos based on s.apache.org/beam-fn-api-metrics

4d384eb

ajamato force-pushed the metrics_protos branch from 035604d to 4d384eb Compare June 4, 2018 21:18

lukecwik merged commit c1743cc into apache:master Jun 5, 2018
