Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

normalize metric labels and structure #3600

Merged
merged 1 commit into from Aug 14, 2023

Conversation

ddanielr
Copy link
Contributor

  • Switched metric names to use dot notation.
  • Refactored Scans metric to use a prefix.
  • Fixed incorrect metric type in documentation

* Switched metric names to use dot notation.
* Refactored Scans metric to use a prefix.
* Fixed incorrect metric type in documentation
@ddanielr
Copy link
Contributor Author

String METRICS_FATE_TYPE_IN_PROGRESS = METRICS_FATE_PREFIX + "ops.in_progress_by_type";
String METRICS_FATE_TOTAL_IN_PROGRESS = METRICS_FATE_PREFIX + "ops.in_progress";

The _ separator is used here because most of our metric testing is done using Graphite which has a hierarchical metric structure.
By adding metrics names using a _ separator, we can define a multi-word line as a single metric in the Graphite hierarchy.

accumulo.fate
accumulo.fate.ops
accumulo.fate.ops.in_progress
accumulo.fate.ops.in_progress_by_type

However, micrometer uses a flat metrics structure and requires dot notation for all metric names due to conversion on registry types. Metric Naming

Because of this, the metric name ends up being multiple items in the Graphite hierarchy.

accumulo.fate
accumulo.fate.ops
accumulo.fate.ops.in
accumulo.fate.ops.in.progress
accumulo.fate.ops.in.progress.by
accumulo.fate.ops.in.progress.by.type

Vs the metric creation in a flat metrics systems like Prometheus.

accumulo_fate_ops_in_progress
accumulo_fate_ops_in_progress_by_type

If we want to be fully compliant with micrometer's conversion ability, then we have to only use dot notation in our metric names.

We could add MeterFilters to our statsd sink to rewrite any metrics we want to have as a single metric.
This seems like the "quick and easy" solution, but not really the "simple" solution long term as any other metric sink would need to understand and implement these MeterFilters if they wanted their metrics to look the same.

It looks like micrometer supports some interesting conversions with labels for hierarchical systems (See micrometer's HierarchicalNameMapper) which fully rewrites the metric name.

@keith-turner
Copy link
Contributor

What are the implication of this change in a bug fix release? What would the release notes for this change look like?

@EdColeman
Copy link
Contributor

Keith asked: What are the implication of this change in a bug fix release

I feel that the "bug" is that incorrect naming / naming hierarchy could complicate or prevent metrics from being used correctly down-stream. Getting the metric correct in a 2.x release should be a priority because it is a LTM release and we expect that it will be used for quite a while.

And the approach may hinge on if metric names are considered part of the public API. Providing a mapping of prev -> new name in the release notes may be sufficient if the metrics are not a "hard" API requirement.

If there is a more strict interpretation of metrics names must consistent across bug-fix versions, the we may want to consider 2.1.2 is released as 2.2 But not sure what that implies as far as LTM designations go. We'd probably not what to maintain 2.1 and 2.2, 3.x.,,

I can see both viewpoints. Having metric names change between versions is irritating and could cause issues as monitors / alarms break or change with the version, and could make trending across versions problematic. On the other hand, having a broken metric so that I cannot monitor my system as intended and I can't get it fixed until the next major release does not seem ideal either,

@keith-turner
Copy link
Contributor

keith-turner commented Jul 12, 2023

I can see both viewpoints. Having metric names change between versions is irritating and could cause issues as monitors / alarms break or change with the version, and could make trending across versions problematic. On the other hand, having a broken metric so that I cannot monitor my system as intended and I can't get it fixed until the next major release does not seem ideal either,

I don't have an opinion about making this change in 2.1 because I don't know enough. I am curious though, want to try to understand the implications. Reading the above point and earlier points it seems like the current metric name is buggy. Is a potential problem with fixing it in 2.1 that someone may have adapted to the buggy name and still have been able to use it?

@ddanielr
Copy link
Contributor Author

I can see both viewpoints. Having metric names change between versions is irritating and could cause issues as monitors / alarms break or change with the version, and could make trending across versions problematic. On the other hand, having a broken metric so that I cannot monitor my system as intended and I can't get it fixed until the next major release does not seem ideal either,

I don't have an opinion about making this change in 2.1 because I don't know enough. I am curious though, want to try to understand the implications. Reading the above point and earlier points it seems like the current metric name is buggy. Is a potential problem with fixing it in 2.1 that someone may have adapted to the buggy name and still have been able to use it?

The problem is two-fold.

  1. If a user implements a RegistryFactory with a MeterRegistry type that does not support a _ separator, then there is a possibility that the metric registration will fail. I do not know if this is a graceful failure and would require more testing.

  2. Having a non-compliant separator means that micrometer conversions for metric names could be unpredictable.
    If we provide conversion examples in our documentation but those aren't followed for a couple of metric names then
    that could be problematic.

I think targeting 2.1 to fix the possible meterRegistry failure is in-scope of a bug fix change.

Overall I think we may need a better structure for the metric generation to help simplify management and generation of any corresponding documentation. Similarly to how the Property docs are generated.

@ctubbsii ctubbsii added this to In progress in 2.1.2 via automation Jul 19, 2023
Copy link
Member

@ctubbsii ctubbsii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Metrics aren't public API, so I don't mind tweaking them, especially if they are being changed to make them stable and more capable of actually being used long-term. However, these should be called out in the release notes, in case anybody happened to be using the old names in an earlier 2.1 release.

@ctubbsii ctubbsii merged commit aaec35f into apache:2.1 Aug 14, 2023
8 checks passed
2.1.2 automation moved this from In progress to Done Aug 14, 2023
@ddanielr ddanielr deleted the standardize-label-names branch September 26, 2023 14:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
2.1.2
Done
Development

Successfully merging this pull request may close these issues.

None yet

5 participants