[BEAM-3926] Add new metrics protos based on "Defining and adding SDK Metrics" htt…#5437
Conversation
|
@robertwb @echauchot Would you please review. This is based off of this design. Apologies in advance if I have made any obvious mistakes, as I have not sent many PRs. Would love some pointers too, in order to keep things smooth in the future. |
ajamato
left a comment
There was a problem hiding this comment.
ajamato wrote:
@robertwb @echauchot Would you please review. This is based off of this design.
https://s.apache.org/beam-fn-api-metricsApologies in advance if I have made any obvious mistakes, as I have not sent many PRs. Would love some pointers too, in order to keep things smooth in the future.
Perhaps I have failed to properly keep my branch up to date with master, in this change I only meant to modify beam_fn_api_proto
|
With some help from @tgroh the other files have been removed now. |
There was a problem hiding this comment.
A compact representation of the proto data is
monitoring_status
monitored_table_data
metric
data
counter_data # also used for gague
int64_value
string_value
double_value
distribution_data
int_double_distribution
double_distribution_data
extrema_data
int_values
double_values
where each option is in a oneof (except where not possible). It always feels a bit odd to me to say "there are 11 possible values, and here they are" but this much structure seems to implies such confidence. Do you feel that if this grows over time it will be natural? If so, it'd be good to not block on this further.
| // ProcessBundleProgressResponse was sent. | ||
| BundleSplit split = 2; | ||
|
|
||
| // (Optional) If metrics or monitored sttate reporting is supported by |
|
|
||
| // The Metric or monitored state. | ||
| oneof monitoring_status { | ||
| MonitoringTableData monitored_table_data = 3; |
There was a problem hiding this comment.
Weren't we going to merge these two, branching on type?
There was a problem hiding this comment.
No, we agreed to distinguish Metrics (The type of monitoring information compatible with metrics collection systems such as Drop Wizard and Stackdriver) from MonitoredState (Other relevant information to collect for debugging a pipeline, such as a Table of File states which indicate why a pipeline is stuck).
So the Metrics are all in the same group, so its clear that these can be used with systems such as Stackdriver and DropWizard.
| message CounterData { | ||
| oneof value { | ||
| int64 int64_value = 1; | ||
| string string_value = 2; |
There was a problem hiding this comment.
For consistency, I would order them int, double, string.
| } | ||
| } | ||
|
|
||
| // Extrema messages are used for calculating |
There was a problem hiding this comment.
Order these messages the same as above.
| // style of distribution metric. | ||
| message DistributionData { | ||
| oneof distribution { | ||
| IntDistributionData int_double_distribution = 1; |
There was a problem hiding this comment.
int_double_distribution vs. double_distribution_data?
| int64 int64_value = 1; | ||
| string string_value = 2; | ||
| double double_value = 3; | ||
| } |
There was a problem hiding this comment.
There is a string_value. Is that sufficient? Or were you thinking of something else?
| // Only one of the two should be specified. | ||
| // Note: oneof is not allowed on repeated fields. | ||
| repeated int64 int_values = 1; | ||
| repeated double double_values = 2; |
There was a problem hiding this comment.
Top and bottom strings makes sense as well. (Actually, one of the most useful extrema is MostFrequent).
There was a problem hiding this comment.
Can we table that one for now?
The more I think about that, I am not so sure where it belongs. It seems a lot like a Histogram, buckets of different strings with counts, with some cutoff that they are large enough.
It's not clear how you calculate that to me either. Because I think that you need to provide a list of strings and coutner for them, and you must send them all because you don't know if they will be in the Top-N until the future updates come in.
Where the Top-N/Bottom-N int/double just require sending the max N or min N values on every update, since you cal always use those updates to aggregate into the top-N for the whole pipeline over time.
|
@ajamato I don't know protobuf so I won't be pertinent enough to review this I'm afraid. I think @robertwb will be much more pertinent than me. But as a general comment from a protobuf newbie, I have the impression that the structure is a bit complex so hard to understand/maintain. I'm sure you went through keep-it-simple iterations but please ensure that this is not over-design to support un-probable future use cases. |
| message Extrema { | ||
| // Only one of the two should be specified. | ||
| // Note: oneof is not allowed on repeated fields. | ||
| repeated int64 int_values = 1; |
There was a problem hiding this comment.
Could we follow the same pattern as DistributionData and use a oneof with IntExtremaData and a DoubleExtremaData?
|
|
||
| // Extrema messages are used for calculating | ||
| // Top-N/Bottom-N metrics. | ||
| message Extrema { |
There was a problem hiding this comment.
Extrema -> ExtremaData to be consistent with the others.
| // metric format. For example, a table of important files | ||
| // and metadata which an I/O source is reading. | ||
| // Note: Since MonitoredState is designed to be | ||
| // customizable, and allow engines to aggregate these |
There was a problem hiding this comment.
engines don't customize them as below you mention that the aggregation is always just latest from the runners perspective.
There was a problem hiding this comment.
Latest across all shards/bundles? Or Union?
There was a problem hiding this comment.
Perhaps I did not describe this well. Updating... What I mean to say is that a custom URN can do a custom aggregation, if an engine choose to support it in its aggregation system.
Consider the I/O source reading files, emitting the table of file statuses:
The workers, will do the same thing always, just emit their 'current state'
I.e. every time I emit my oldest three file which have been waiting for data for over X hours.
Then if the engine supports the particular URN and has custom handling it can just maintain the Top-N oldest files.
Otherwise, it can just union all the information together and create a large MonitoringTableData (agnostic of what is inside it).
| repeated MonitoringColumnValue values = 1; | ||
| } | ||
|
|
||
| repeated string column_names = 1; |
There was a problem hiding this comment.
Mention that the number of column_names must match the number of row_data.
There was a problem hiding this comment.
Done, but its actually:
The number of column names must match the number of values in each MonitoringRow.
9b864e3 to
d75ae48
Compare
|
@tgroh helped me fix up this PR into a good state again. Should be ready to review again |
|
Hey @lukecwik I addressed your comments. |
lukecwik
left a comment
There was a problem hiding this comment.
Minor nits, just ping me again when you want me to merge if you feel you would want to address them now or in a follow up PR.
| } | ||
|
|
||
| // Data associated with a distribution metric. | ||
| // This is based off of the current DistributionData metric |
There was a problem hiding this comment.
nit: Add . at the end after metric.
| string type = 2; | ||
|
|
||
| // The Metric or monitored state. | ||
| oneof monitoring_status { |
There was a problem hiding this comment.
nit: This doesn't look like a status type, should we just call this data like everywhere else?
|
|
||
| // A set of key+value labels which define the scope of the metric. | ||
| // Either a well defined entity id for the keys: | ||
| // “transform”, “pcollection”, “windowing_strategy”, |
There was a problem hiding this comment.
Want to add an enum defining these "well" known strings?
This will allow developers across languages to have a consistent spelling.
| // “transform”, “pcollection”, “windowing_strategy”, | ||
| // “coder”, “environment” or any arbitrary label | ||
| // set by a custom metric or user metric. | ||
| // A monitoring system is expected to be able to aggregate the metric together |
|
|
||
| // The Metric or monitored state. | ||
| oneof monitoring_status { | ||
| MonitoringTableData monitored_table_data = 3; |
There was a problem hiding this comment.
monitored_table_data -> monitoring_table_data
5700961 to
03d4167
Compare
|
test this please |
|
test this please |
1 similar comment
|
test this please |
|
@ajamato I think there is no "test please" jenkins phrase, I think you meant "Run Java PreCommit" (was "retest this please" in the past) |
…ps://s.apache.org/beam-fn-api-metrics
Add new metrics protos based on "Defining and adding SDK Metrics" https://s.apache.org/beam-fn-api-metrics
Follow this checklist to help us incorporate your contribution quickly and easily:
[BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replaceBEAM-XXXwith the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.It will help us expedite review of your Pull Request if you tag someone (e.g.
@username) to look at it.