Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

data model proto #126

Merged
merged 6 commits into from
Nov 5, 2020
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
270 changes: 270 additions & 0 deletions proto/openmetrics_data_model.proto
Original file line number Diff line number Diff line change
@@ -0,0 +1,270 @@
syntax = "proto3";

package openmetrics;

// This is a description of the OpenMetrics data model in a proto
// form. Additionally, it serves as the proto wire format.
//
// Interpret "required" as semantically required. Some fields are
// semantically required depending on the values of the other fields
// (and indicated by comments).

// The top-level message sent on the wire.
message MetricSet {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably a better name is MetricList. Set semantics are a bit unclear here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a set, you aren't allowed have duplicates.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then probably document that, to make it clear why is a set and what is the "hash" function.

// Each MetricPoints has one or more points from a single metric.
repeated MetricPoints metric_points = 1;
}

// One or more timeseries for a single metric, where each timeseries has
// one or more points.
message MetricPoints {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just Metric?

// Required.
MetricDescriptor metric_descriptor = 1;

repeated Timeseries timeseries = 2;
}

// A single timeseries.
message Timeseries {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see anywhere something like MonitoredResource in Stackdriver or __meta__ labels (in Prometheus, hope I used the correct name). Is this intentional? How are these reported?

Couple of use-cases where these are necessary and cannot be determine even if this data model is focused on a pull mechanism which would allow the backend to annotate these meta labels in general:

  1. Task (metric producer) -> Proxy -> Metric Backend. In this case the meta labels should be included in the metric because the backend does not know about the metric producer and cannot associate the right meta labels.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what Info metrics are for.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, but I have some problems understanding some of the things here, maybe worth some clarification:

  1. I don't understand how does the correlation between Info metrics and the other metrics happen. Should we have one info metric per MetricSet? Can I have in the same MetricSet metrics from multiple producers (in case of a proxy if I monitor multiple tasks)?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we are allowing multiple info metrics, e.g. build_info. It is up to the receiver to decide what to do with them. But they will all apply to the whole MetricSet, so one shouldn't mix multiple producers in the same MetricSet.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, there's no limit on how many info metrics you can have.

// Labels applying to all points.
repeated Label label = 1;

// If there is more than one point, all must have a timestamp field
// and they must be in increasing order of timestamp.
// If there is only one point and has no timestamp field, the receiver
// will assign a timestamp.
repeated Point point = 2;
}

// The descriptor of a metric.
message MetricDescriptor {
// Required.
// Names must match the regex [a-zA-Z_:][a-zA-Z0-9_:]* however colons are
// reserved for monitoring system use, and names beginning with
// underscores are reserved for monitoring-system internal use.
string name = 1;

// TODO: Format?
// STATE_SETs and INFO types must not have a unit.
string unit = 2;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this perhaps instead be an enum?

Copy link
Collaborator

@brian-brazil brian-brazil Jan 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's too many potential values, it's a string to allow for that.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want any restriction on this string, or just UTF-8?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd have to be the same limits as metric names.


// Required.
Type type = 3;

string description = 4;

enum Type {
UNKNOWN = 0; // double or int valued point
GAUGE = 1; // double or int valued point

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to avoid confusion and have backends to support both double/int for the same metric what about making the type include the type of the valued point. Something like GAUGE_DOUBLE?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed this at length, this is the way we're going.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean the current state or what I proposed?

COUNTER = 2; // double or int valued point.
STATE_SET = 3;
INFO = 4; // provides information about the entity being monitored
CUMULATIVE_HISTOGRAM = 5;
GAUGE_HISTOGRAM = 6;
// Summary quantiles are not recommended, since they cannot be
// aggregated.
SUMMARY = 7;
}
// Note: No bucketing is defined for a HISTOGRAM in the descriptor.
// It is an implementation detail of the backend whether it can handle
// changes in bucketing for a metric. Same for SUMMARY (quantiles are
// not defined here), and STATE_SET (the set of possible values is
// defined in each point).
}

// A name-value pair. These are used in multiple places: identifying
// timeseries, value of INFO metrics, and exemplars in Histograms.
message Label {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because label is key, value you can use Labels as map<String,String>.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment should be ignored and marked as resolved.

// Required.
string name = 1;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to support a label description? Human readable description of this metric?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed this previously, and no.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

roger that.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment should be marked as resolved.


// Required.
string value = 2;
}

message Timestamp {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not using the default timestamp from proto3? "google/protobuf/timestamp.proto"

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'd be better to keep self-contained

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least we can copy the definition from there and avoid any arguments about nanoseconds being int vs uint.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you know why it is int32 in google/protobuf/timestamp.proto even though it says "Non-negative fractions of a second at nanosecond resolution."?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two reasons that I know:

  1. This needs to be compatible with languages that do not have unsigned (like Java).
  2. It makes diff calculation easier (you do the diff initially then normalize the nanos).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think either is a problem in practice, it's not going to have values where that'll be an issue.

// Required.
int64 seconds = 1; // Seconds since the epoch. Negative values are permitted

uint32 nanoseconds = 2;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the semantics of this when seconds is negative?

Copy link
Collaborator

@robskillington robskillington Jan 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nanoseconds_offset perhaps is a better concept? And if its signed instead, perhaps that's more permissive and allows you to simply always just blindly add these two together:

unix_nanos = (timestamp.seconds * SECOND_NANOSECONDS) + timestamp.nanoseconds_offset

Thoughts?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't quite understand. We can add even when nanoseconds is unsigned

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed we'll just use the google protobuf timestamp type here.

}

// A point in a timeseries.
message Point {
oneof value {
double double_value = 1;
int64 int_value = 2;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, did we agree uint64 here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be a little strange that double value can be negative but int value cannot be?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason to restrict it to unsigned? The proto doc has int64

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason this cannot be negative for a gauge?

HistogramValue histogram_value = 3;
StateSetValue state_set_value = 4;
InfoValue info_value = 5;
SummaryValue summary_value = 6;
}
// Required for COUNTER, SUMMARY or CUMULATIVE_HISTOGRAM type.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We had agreed this is required from a model perspective.
Are you saying optional just for backward compatibility with the old text format? The proto model does not deal with that compatibility. It defines the baseline OpenMetrics model and a client that does not send start_timestamp is not OpenMetrics compliant.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we had not agreed this. It's optional, as many many systems out there do not support this.

//
// The cumulative value is over
// the time interval (start_timestamp, timestamp].
// If start_timestamp is present and timestamp is absent,
// the backend assigned timestamp may be used to construct the time
// interval.
Timestamp start_timestamp = 7;

// If not specified, the timestamp will be decided by the backend.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make clear this is optional, and discouraged

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to resolve the pull/push debate before we say "discouraged". IMO the same data model suffices for push in which case timestamp should be encouraged.
For now I am fine with saying "discouraged for pull and encouraged for push".

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't say encouraged for push, it depends on what's on the other side. The other side can always add its own.

Timestamp timestamp = 8;
}

// Value for CUMULATIVE_HISTOGRAM or GAUGE_HISTOGRAM point.
message HistogramValue {
// Required.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neither of these should be required

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some systems don't have them (MySQL)

double sum = 1;

// Required.
int64 count = 2;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uint64?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


// Required.
BucketOptions bucket_options = 3;

// `BucketOptions` describes the bucket boundaries used to create a
// histogram. The buckets can be in a linear sequence, an
// exponential sequence, or each bucket can be specified explicitly.
// `BucketOptions` does not include the number of values in each bucket.
//
// A bucket has a 0 lower bound and an inclusive upper bound for the
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should say "inclusive 0 lower bound" for clarity

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Buckets for values <0 don't seem possible with this data model. Is that intended?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per our discussions it is, noone could come up with a realistic use case.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My usual tech use-case is Spam Assassin scores.

However, OpenMetrics is supposed to be generally applicable. Observing negative values is a very normal thing for histograms in general. It is almost zero cost to support it. Kicking the support out would be a serious blow to the applicability of the format.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Current handling of negative bounds by Prometheus is defined here: https://prometheus.io/docs/prometheus/latest/querying/functions/#histogram_quantile

// values that are counted for that bucket. That is, buckets are
// cumulative. The upper bound of successive buckets must be increasing.
// There is an implicit overflow bucket that extends up to +infinity.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and must match the value of _count

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are just the bucket boundaries. I've added a comment later about the sum of bucket counts.

message BucketOptions {
// Exactly one of these three fields must be set.
oneof options {
// The linear bucket.
Linear linear_buckets = 1;

// The exponential buckets.
Exponential exponential_buckets = 2;

// The explicit buckets.
Explicit explicit_buckets = 3;
}

// Specifies a linear sequence of buckets that add the same constant width
// to the previous bucket (except the overflow bucket). This implies
// a constant absolute uncertainty on the specific value of a sample.
//
// There are `num_explicit_buckets` (N) buckets plus the overflow
// bucket. Bucket `i` (0 <= i < N) has upper bound : offset + (width * i).
// Note that when offset = 0, bucket 0 represents the interval [0, 0].
message Linear {
// Required.
// Must be greater than 0.
int32 num_explicit_buckets = 1;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uint32 perhaps since there's no such thing as negative buckets?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


// Required.
// Must be greater than 0.
double width = 2;

// Required.
// Must be >= 0.
double offset = 3;
}

// Specifies an exponential sequence of buckets that have an upper bound
// that is proportional to the value of the upper bound of the previous
// bucket. This implies a constant relative uncertainty on the specific
// value of a sample.
//
// There are `num_exponential_buckets` + 1 (N) buckets plus the overflow
// bucket.
// Bucket 0 represents the interval [0, 0].
// Bucket `i` (1 <= i < N) has upper bound :
// scale * (growth_factor ^ (i - 1)).
message Exponential {
// Required.
// Must be greater than 0.
int32 num_exponential_buckets = 1;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uint32 perhaps here too since there's no such thing as negative buckets?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


// Required.
// Must be greater than 1.
double growth_factor = 2;

// Required.
// Must be greater than 0.
double scale = 3;
}

// Specifies a set of buckets with arbitrary upper-bounds
//
// There are `size(bounds)` (N) explicit buckets plus the overflow bucket.
// Bucket `i` (0 <= i < N) has upper bound: bounds[i].
// The `bounds` field must contain at least one element.
message Explicit {
// The values must be monotonically increasing and >= 0.
repeated double bounds = 1;
}

}

// The number of values in each bucket of the histogram, as described in
// `bucket_options`.
//
// `bucket_counts` will typically contain N + 1 values. If you supply fewer
// values, the remaining values are assumed to be 0.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't make sense, as these are cumulative. I'd suggest requiring it to be explicit

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

//
// The order of the values in `bucket_counts` follows the bucket numbering
// schemes described for the three bucket types.
repeated BucketCount bucket_counts = 4;
message BucketCount {
// Required.
// Count is the number of values for a bucket of the histogram.
int64 count = 1;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uint64?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


// An example of a value that fell into this bucket.
Exemplar exemplar = 2;
message Exemplar {
// Required.
// The value of the example.
double value = 1;

// The timestamp of the example. Optional but SHOULD set it.
Timestamp timestamp = 2;

// Additional information about the example value // (e.g. trace id).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Length limit should be mentioned

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making it consistent with "The maximum number of characters in the labels, including the curly braces, is 64 UTF-8 characters." seems messy. I'd rather define a length limit on the strings in the Label and compute the resulting limit for the text format (accounting for the quirks in the text format in the text format specification)

repeated Label label = 3;
}
}
}

// Value for STATE_SET point.
message StateSetValue {
// Each state must have a unique name. Any number of states may be enabled.
repeated State states = 1;

message State {
// Required.
bool enabled = 1;

// Required.
string name = 2;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this not a regular label? Recommended key can be state_name the label value is the state then this becomes a single boolean.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then it'd not be a first-class type.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am fine not having this as a first class. Probably on the gauge supporting boolean is enough

}
}

// Value for INFO point.
message InfoValue {
repeated Label info = 1;
}

// Value for SUMMARY point.
message SummaryValue {
// Sum and count are optional since some systems (e.g. DropWizard) do not
// expose them. The start_timestamp only applies to the sum and count.
// The quantiles can be reset at arbitrary unknown times.

double sum = 1;
int64 count = 2;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uint64 since there can't be negative sampled values?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


repeated Quantile quantile = 3;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be clear that this is optional

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you saying the quantile's are optional?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

message Quantile {
// Required.
// Must be in the interval (0.0, 1.0].
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be fully closed, why allow 1 but not 0?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how do you calculate the 0th percentile from a sample of values? i don't think subtracting delta from the lowest value in the sample is mathematically sound.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question for 100th percentile. It's either both or neither.

double quantile = 1;
double value = 2;
}
}