Require consumers to normalize floats represented as strings #129
Comments
For context: This issue became more relevant as I'm laying out the roadmap for the Prometheus Go client and thinking about when and how to introduce potentially necessary breaking changes for OpenMetrics adoption. The fallout of the change in float formatting has already proven to be far more serious than anticipated. See prometheus/common#168 for illustration. By now, I have received personal communication from another large-scale Prometheus user who was affected by the same issue, with a similar amount of confusion and lost productivity due to troubleshooting. |
Any thoughts on this? I would really appreciate any idea how the OpenMetrics folks think about this so that I can act accordingly for the next steps in the Prometheus Go exposition library. |
Sorry for taking so long. We talked about this in our bi-weekly call and this is the relevant part:
|
I don't quite get the frame of reference:
Is this "consumers MUST normalize floats" or "producers MUST use a canonical text format for floats"? |
It's "producers MUST use a canonical text format for floats" for le/quantile label. |
Oops. I guess you have discussed all of that at length, and this issue is probably not the right place to shout from the sidelines. Still I want to provide for the record that a good part of the success story of the Prometheus text format has been that it's easy to produce. There has been a worrying tendency in OpenMetrics to favor ease of consumption over ease of production. But now things get really tough, as you are essentially requiring each implementation of a producer to include a float formatter that adheres exactly to a (still to be formulated) standard. |
We are still discussing how lenient a parser can be in relation to that. |
The underlying problem is that as we gain traction outside of cloud native, aka modern tech, implementors will become less good and more confused. Long-term, I am more worried about fragmentation and people doing things wrong over immediate adoption. We can always relax requirements later, but never tighten them. |
If I'm not mistaken, OpenMetrics is about production and consumption. In this particular case, tightening the requirements for producers means relaxing the requirements for consumers. I filed this issue because I think it's the most reasonable way here to tighten the requirements for consumers (normalize float values even if they happen to be represented as strings) rather than for producers (implement a custom float formatter to create standardized string representation of floats). Note that Prometheus 1 was fulfilling the requirement for consumers, and then Prometheus 2 stopped doing so, which wasn't really an informed decision but more or less an accident. On the other hand, no existing text format producer has ever included a custom float formatter. (It is as of now impossible anyway to produce a conforming formatter as OpenMetrics hasn't yet provided a complete spec about how to format floats. Which will be quite an effort.) |
Or in other words: There is already collective experience with normalizing floats, so tightening the requirements for consumers is a proven concept and also defuses a problematic design decision (representing floats as strings in the exposition format). On the other hand, tightening the requirements for producers has no precedent and makes a problematic design decision (representing floats as strings in the exposition format) even worse (representing floats as strings using a delicately (to be) defined canonical format). |
To reiterate about “Go's Besides the general badness of taking the behavior of a particular implementation as the spec (rather than actually specifying the behavior), I don't think we can rely on that being stable for our purposes. Both the documentation for the
In my understanding, any implementation of the Note that in (1) the switching point between %e and %f is not precisely defined (“large” is apparently “greater than 5 or smaller than -4” in practice, which is hardcoded in strconv/ftoa.go with no hint in the code that this has been standardized anywhere). Item (2) is much harder to tackle, though. Example (see code):
Looking at the code in strconv/ftoa.go to see how the exact representation is created, you can see that it tries the relatively recent Grisu3 algorithm first and falls back to a more expensive algorithm if Grisu3 is unsure. Grisu3 is younger than Go itself (although Go1.0 was released after the Grisu3 paper), so I'm fairly sure that the exact algorithm used is not part of the Go stability guarantees.

Even ignoring Go stability guarantees, and coming back to the actual issue at hand here: If OpenMetrics included an exact specification of how a float64 has to be represented unambiguously by a unique string, you would need to include the exact algorithm for doing that, and you would need to guarantee somehow that it yields the same results on all hardware platforms. I assume that was never a concern of those coming up with efficient algorithms for formatting floating point numbers, as the goal was to create any one of the shortest strings that would parse back into the exact same float, not a particular one selected reproducibly among several possible shortest strings.

Finally, even assuming that you can, in fact, specify an algorithm that would guarantee a unique string representation on all hardware platforms, it is unlikely that this algorithm will be very efficient. As the generation of the text format involves a lot of float formatting, this would be a significant added cost to generating the format (not to speak of the significant cost of implementing said algorithm in each language supporting OpenMetrics).

My apologies for the long comment. I wish my previous comments had been enough to make my point, but apparently they weren't. I hope it is now overwhelmingly obvious that requiring a unique text representation for floats is opening a can of worms and that it is much saner to require consumers to normalize float values so that different string representations of the very same float64 will be treated the same. |
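As a small illustration of the %e/%f switching point mentioned in this comment, the following sketch prints a few values with Go's shortest 'g' formatting. It is not the example from the original comment; the helper name shortest is made up for illustration, and the thresholds shown are what strconv does in practice, not something a standard guarantees.

```go
package main

import (
	"fmt"
	"strconv"
)

// shortest is a hypothetical helper. It formats f with the 'g' verb and
// precision -1, i.e. the shortest string that parses back to the same float64.
func shortest(f float64) string {
	return strconv.FormatFloat(f, 'g', -1, 64)
}

func main() {
	// The decimal exponent decides between %f-style and %e-style output.
	fmt.Println(shortest(100000))  // 100000 (exponent 5, still %f style)
	fmt.Println(shortest(1000000)) // 1e+06  (exponent 6, switches to %e style)
	fmt.Println(shortest(0.0001))  // 0.0001 (exponent -4, still %f style)
	fmt.Println(shortest(0.00001)) // 1e-05  (exponent -5, switches to %e style)
}
```

A producer written in another language would have to reproduce exactly these thresholds (and the "e+06" exponent style) to emit byte-identical label values.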
Grisu3 (with fallback) is that algorithm, and what everything bar Python uses from my research thus far. Python produces the same result for your example input. |
Grisu3 advertises to be “correct: any printed number will evaluate to the same number, when read again”. That's not equivalent to “any given float64 will always result in the same printed number”. I don't know where you gain confidence that that's the case in general and on different hardware in particular. But even if Grisu3 had these properties (which needs to be shown), the fallback algorithm also needs to have the same properties. And even if that's all the case, most OpenMetrics implementations would need to come with their own Grisu3 plus fallback implementation, as there will hardly be any language where the standard library guarantees to always use Grisu3 and that exact fallback algorithm. |
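The difference between the two properties can be sketched in a few lines of Go. The helper sameFloat is hypothetical and only for illustration: it shows that several distinct strings satisfy the round-trip ("correct") property for one and the same float64, so round-tripping alone does not single out one canonical string.

```go
package main

import (
	"fmt"
	"strconv"
)

// sameFloat is a hypothetical helper. It reports whether two strings parse
// to the identical float64 value. (NaN never compares equal to itself, so
// NaN inputs would need special handling in real code.)
func sameFloat(a, b string) (bool, error) {
	x, err := strconv.ParseFloat(a, 64)
	if err != nil {
		return false, err
	}
	y, err := strconv.ParseFloat(b, 64)
	if err != nil {
		return false, err
	}
	return x == y, nil
}

func main() {
	pairs := [][2]string{
		{"1e+06", "1000000"},
		{"0.1", "0.10"},
		{"5e-1", ".5"},
	}
	for _, p := range pairs {
		same, err := sameFloat(p[0], p[1])
		if err != nil {
			panic(err)
		}
		// Each pair differs as a string but denotes the same float64.
		fmt.Printf("%q vs %q: same float64 = %v\n", p[0], p[1], same)
	}
}
```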
Per https://github.com/OpenObservability/OpenMetrics/blob/master/specification/OpenMetrics.md#considerations-canonical-numbers producers SHOULD produce canonical numbers for a small set of common values, and it is encouraged for other values. The format does not specify what consumers should do with these values, beyond not rejecting non-canonical numbers other than +Inf for Histogram buckets. |
If I understood correctly, the text representation of OpenMetrics will inherit Prometheus's “type dissonance” of representing the floating values for bucket boundaries of a histogram and for φ values of pre-calculated quantiles in summaries as strings. Concretely, the former is represented as an le label, the latter as a quantile label, and as per #56, all label values are strings.

Historical note: When the Prometheus text format was designed, this “type dissonance” was considered a pragmatic solution in cases where you could not use the “proper” solution of using the protobuf format (where neither bucket boundaries nor φ values are labels in the first place). As we know, the roles of text vs protobuf changed over time. But back then, there was certainly no intention to promote the somewhat dirty pragmatic solution to an industry standard.
Not having taken part in the discussion so far (for reasons…), it is not my intention to doubt the decisions made. However, I would like to suggest a means to limit the damage: Please consider the requirement for consumers to normalize float values represented as strings.
For example, the wide-spread Prometheus Go library creates the following summary for the Go garbage collector:
Depending on preferences how to format floats as strings, {quantile="1"} could be rendered as:
{quantile="+1"}
{quantile="1."}
{quantile="1.0"}
{quantile="1e0"}
{quantile="1E0"}
{quantile="0.5"} could be rendered as:
{quantile="+0.5"}
{quantile=".5"}
{quantile="0.50"}
{quantile="5e-1"}
{quantile="5E-1"}
Similar considerations apply to the le label of histogram buckets. The practical impact is perhaps even more severe, as the numerical value is actually needed as such to calculate quantiles from a histogram (see recent Prometheus code changes to work around the resulting problems). For example, {le="10000"} could reasonably be written as:
{le="10000"}
{le="10000."}
{le="10000.0"}
{le="1e4"}
{le="1E4"}
If the label is naively ingested in the same way as other labels, each of the above would create a different time series. My suggestion implies that consumers interpret those special labels as the string-rendering of a float and evaluate the actual float value during ingestion, which they then save in their preferred form (which could be a string again, like in the case of Prometheus, but it would be normalized to the preferred string formatting of the ingester).
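A minimal sketch of the suggested consumer-side normalization, in Go. The function name normalizeFloatLabel and the choice of strconv's shortest 'g' formatting as the ingester's preferred form are assumptions made for illustration; this is not the actual Prometheus 1.x code.

```go
package main

import (
	"fmt"
	"strconv"
)

// normalizeFloatLabel is a hypothetical helper. It interprets a label value
// as the string rendering of a float64 and re-renders it in the consumer's
// own preferred form (assumed here: strconv's shortest 'g' formatting).
func normalizeFloatLabel(value string) (string, error) {
	f, err := strconv.ParseFloat(value, 64)
	if err != nil {
		return "", fmt.Errorf("label value %q is not a float: %w", value, err)
	}
	return strconv.FormatFloat(f, 'g', -1, 64), nil
}

func main() {
	variants := []string{"10000", "10000.", "10000.0", "1e4", "1E4"}

	raw := map[string]struct{}{}        // naive ingestion: key is the raw string
	normalized := map[string]struct{}{} // ingestion with normalization

	for _, v := range variants {
		raw[v] = struct{}{}
		n, err := normalizeFloatLabel(v)
		if err != nil {
			panic(err)
		}
		normalized[n] = struct{}{}
	}

	// Five distinct le values without normalization, one with it.
	fmt.Println(len(raw), len(normalized)) // 5 1
}
```

The consumer only needs its platform's standard float parser and formatter; no producer has to agree on a particular rendering.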
The alternative would be to precisely specify how to format floats as strings as part of the OpenMetrics standard. That specification would have to cover a whole bunch of corner cases, is hard to get complete and right, and would put a huge burden on any implementation generating the format, as it could not just use the preferred formatter of its platform.
Another historical note: Prometheus 1.x was doing exactly the normalization suggested here. Prometheus 2.x paradoxically dropped both the normalization and our excuse to represent floats as strings, namely the option to use protobuf and thus avoid the problematic representation altogether. The decision to do so was not discussed widely, pushed through by very few developers in times where no formalized decision making process was established yet. Prometheus is not a good role model here.