Skip to content

[BEAM-3926] Add new metrics protos based on "Defining and adding SDK Metrics" htt…#5437

Merged
lukecwik merged 1 commit intoapache:masterfrom
ajamato:metrics_protos
Jun 5, 2018
Merged

[BEAM-3926] Add new metrics protos based on "Defining and adding SDK Metrics" htt…#5437
lukecwik merged 1 commit intoapache:masterfrom
ajamato:metrics_protos

Conversation

@ajamato
Copy link

@ajamato ajamato commented May 21, 2018

…ps://s.apache.org/beam-fn-api-metrics

Add new metrics protos based on "Defining and adding SDK Metrics" https://s.apache.org/beam-fn-api-metrics


Follow this checklist to help us incorporate your contribution quickly and easily:

  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • [] If this contribution is large, please file an Apache Individual Contributor License Agreement.

It will help us expedite review of your Pull Request if you tag someone (e.g. @username) to look at it.

@ajamato
Copy link
Author

ajamato commented May 21, 2018

@robertwb @echauchot Would you please review. This is based off of this design.
https://s.apache.org/beam-fn-api-metrics

Apologies in advance if I have made any obvious mistakes, as I have not sent many PRs. Would love some pointers too, in order to keep things smooth in the future.

Copy link
Author

@ajamato ajamato left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ajamato wrote:
@robertwb @echauchot Would you please review. This is based off of this design.
https://s.apache.org/beam-fn-api-metrics

Apologies in advance if I have made any obvious mistakes, as I have not sent many PRs. Would love some pointers too, in order to keep things smooth in the future.

Perhaps I have failed to properly keep my branch up to date with master, in this change I only meant to modify beam_fn_api_proto

@ajamato
Copy link
Author

ajamato commented May 21, 2018

With some help from @tgroh the other files have been removed now.

Copy link
Contributor

@robertwb robertwb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A compact representation of the proto data is

monitoring_status
    monitored_table_data
    metric
        data
            counter_data  # also used for gague
                int64_value
                string_value
                double_value
            distribution_data
                int_double_distribution
                double_distribution_data
            extrema_data
                int_values
                double_values

where each option is in a oneof (except where not possible). It always feels a bit odd to me to say "there are 11 possible values, and here they are" but this much structure seems to implies such confidence. Do you feel that if this grows over time it will be natural? If so, it'd be good to not block on this further.

// ProcessBundleProgressResponse was sent.
BundleSplit split = 2;

// (Optional) If metrics or monitored sttate reporting is supported by
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/sttate/state

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


// The Metric or monitored state.
oneof monitoring_status {
MonitoringTableData monitored_table_data = 3;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Weren't we going to merge these two, branching on type?

Copy link
Author

@ajamato ajamato May 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we agreed to distinguish Metrics (The type of monitoring information compatible with metrics collection systems such as Drop Wizard and Stackdriver) from MonitoredState (Other relevant information to collect for debugging a pipeline, such as a Table of File states which indicate why a pipeline is stuck).

So the Metrics are all in the same group, so its clear that these can be used with systems such as Stackdriver and DropWizard.

message CounterData {
oneof value {
int64 int64_value = 1;
string string_value = 2;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For consistency, I would order them int, double, string.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

}
}

// Extrema messages are used for calculating
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Order these messages the same as above.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

// style of distribution metric.
message DistributionData {
oneof distribution {
IntDistributionData int_double_distribution = 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

int_double_distribution vs. double_distribution_data?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

int64 int64_value = 1;
string string_value = 2;
double double_value = 3;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No LatestString?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a string_value. Is that sufficient? Or were you thinking of something else?

// Only one of the two should be specified.
// Note: oneof is not allowed on repeated fields.
repeated int64 int_values = 1;
repeated double double_values = 2;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Top and bottom strings makes sense as well. (Actually, one of the most useful extrema is MostFrequent).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we table that one for now?

The more I think about that, I am not so sure where it belongs. It seems a lot like a Histogram, buckets of different strings with counts, with some cutoff that they are large enough.

It's not clear how you calculate that to me either. Because I think that you need to provide a list of strings and coutner for them, and you must send them all because you don't know if they will be in the Top-N until the future updates come in.

Where the Top-N/Bottom-N int/double just require sending the max N or min N values on every update, since you cal always use those updates to aggregate into the top-N for the whole pipeline over time.

@echauchot
Copy link
Contributor

@ajamato I don't know protobuf so I won't be pertinent enough to review this I'm afraid. I think @robertwb will be much more pertinent than me. But as a general comment from a protobuf newbie, I have the impression that the structure is a bit complex so hard to understand/maintain. I'm sure you went through keep-it-simple iterations but please ensure that this is not over-design to support un-probable future use cases.
Regarding the design it made me think of some comments, I'd rather do them in the doc. Sorry if they come a bit late.

message Extrema {
// Only one of the two should be specified.
// Note: oneof is not allowed on repeated fields.
repeated int64 int_values = 1;
Copy link
Member

@lukecwik lukecwik May 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we follow the same pattern as DistributionData and use a oneof with IntExtremaData and a DoubleExtremaData?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


// Extrema messages are used for calculating
// Top-N/Bottom-N metrics.
message Extrema {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extrema -> ExtremaData to be consistent with the others.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

// metric format. For example, a table of important files
// and metadata which an I/O source is reading.
// Note: Since MonitoredState is designed to be
// customizable, and allow engines to aggregate these
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

engines don't customize them as below you mention that the aggregation is always just latest from the runners perspective.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Latest across all shards/bundles? Or Union?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps I did not describe this well. Updating... What I mean to say is that a custom URN can do a custom aggregation, if an engine choose to support it in its aggregation system.

Consider the I/O source reading files, emitting the table of file statuses:
The workers, will do the same thing always, just emit their 'current state'
I.e. every time I emit my oldest three file which have been waiting for data for over X hours.
Then if the engine supports the particular URN and has custom handling it can just maintain the Top-N oldest files.

Otherwise, it can just union all the information together and create a large MonitoringTableData (agnostic of what is inside it).

repeated MonitoringColumnValue values = 1;
}

repeated string column_names = 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mention that the number of column_names must match the number of row_data.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, but its actually:
The number of column names must match the number of values in each MonitoringRow.

@ajamato ajamato force-pushed the metrics_protos branch 2 times, most recently from 9b864e3 to d75ae48 Compare May 29, 2018 22:54
@ajamato
Copy link
Author

ajamato commented May 29, 2018

@tgroh helped me fix up this PR into a good state again. Should be ready to review again

@ajamato
Copy link
Author

ajamato commented Jun 4, 2018

Hey @lukecwik I addressed your comments.

Copy link
Member

@lukecwik lukecwik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nits, just ping me again when you want me to merge if you feel you would want to address them now or in a follow up PR.

}

// Data associated with a distribution metric.
// This is based off of the current DistributionData metric
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Add . at the end after metric.

string type = 2;

// The Metric or monitored state.
oneof monitoring_status {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: This doesn't look like a status type, should we just call this data like everywhere else?


// A set of key+value labels which define the scope of the metric.
// Either a well defined entity id for the keys:
// “transform”, “pcollection”, “windowing_strategy”,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Want to add an enum defining these "well" known strings?

This will allow developers across languages to have a consistent spelling.

// “transform”, “pcollection”, “windowing_strategy”,
// “coder”, “environment” or any arbitrary label
// set by a custom metric or user metric.
// A monitoring system is expected to be able to aggregate the metric together
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: metric -> metrics


// The Metric or monitored state.
oneof monitoring_status {
MonitoringTableData monitored_table_data = 3;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

monitored_table_data -> monitoring_table_data

@ajamato ajamato force-pushed the metrics_protos branch 2 times, most recently from 5700961 to 03d4167 Compare June 4, 2018 19:58
@ajamato
Copy link
Author

ajamato commented Jun 4, 2018

test this please

@ajamato
Copy link
Author

ajamato commented Jun 4, 2018

test this please

1 similar comment
@ajamato
Copy link
Author

ajamato commented Jun 5, 2018

test this please

@echauchot
Copy link
Contributor

@ajamato I think there is no "test please" jenkins phrase, I think you meant "Run Java PreCommit" (was "retest this please" in the past)

@lukecwik lukecwik merged commit c1743cc into apache:master Jun 5, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants