From 88ea2083fe3680fc7376f23da733b17086cd7564 Mon Sep 17 00:00:00 2001 From: Brian Harrington Date: Wed, 24 Sep 2025 16:24:57 +0000 Subject: [PATCH] update naming context A bit of cleanup and clarify details about cardinality. --- docs/concepts/naming.md | 137 ++++++++++++++++++++++++++++------------ 1 file changed, 95 insertions(+), 42 deletions(-) diff --git a/docs/concepts/naming.md b/docs/concepts/naming.md index d17dd32f..a652c98c 100644 --- a/docs/concepts/naming.md +++ b/docs/concepts/naming.md @@ -2,64 +2,115 @@ 1. Names * Describe the measurement being collected + * Use short prefixes for categorization (max 2 levels) * Use camelCase - * Static - * Succinct + * Static - no dynamic content + * Succinct - avoid long names 2. Tags * Should be used for dimensional filtering - * Be careful about combinatorial explosion + * Be careful about combinatorial explosion and cardinality + * Tag combinations should be stable over time * Tag keys should be static * Use `id` to distinguish between instances -3. Use Base Units +3. Query Design + * Avoid the need for regex and expensive pattern matching + * Design for simple queries with incremental drill-down + * Support exact matches and simple filters +4. Use Base Units ## Names ### Describe the Measurement -### Use camelCase +Names should clearly describe what is being measured. A good name allows someone to understand the +metric without needing additional context. + +### Use Short Prefixes for Categorization + +Common names should use short prefixes to broadly categorize metrics, for example `ipc.server.call` +or `jvm.gc.pause`. The prefix should generally have no more than 2 levels to keep names succinct. +This is not a package hierarchy like in Java - it's simply a way to group related metrics. + +Examples of good prefixes: +* `ipc.*` for inter-process communication metrics +* `jvm.*` for Java Virtual Machine metrics +* `db.*` for database metrics -The main goal here is to promote consistency, which makes it easier for users. The choice of -style is somewhat arbitrary, but camelCase was chosen because: +The prefix provides just enough context to understand the broad category and perhaps a sub-category, +while the rest of the name specifies the actual measurement. Remember that metrics will already be +scoped by other dimensions like application name, instance, etc., so the name itself should focus +on describing the measurement rather than providing extensive context. Avoid unnecessary boiler +plate like `com.netflix.*`. -* Used by SNMP -* Used by Java -* It was commonly used at Netflix when the guideline was written +### Use camelCase + +For segments within a name, use camel case to distinguish words if needed. For example +`jvm.gc.concurrentPhaseTime`. -The exception to this rule is where there is an established common case. For example, with -Amazon regions, it is preferred to use `us-east-1` rather than `usEast1` as it is the more -common form. +The exception to this rule is where there is an established common case. For example, with Amazon +regions, it is preferred to use `us-east-1` rather than `usEast1` as it is the more common form. ### Static -There should not be any dynamic content in a metric name, such as `requests.$APP_NAME`. Metric -names and tag keys are how users interact with the data, and dynamic values make them difficult -to use. Dynamic information is better suited for tag values, such as `nf.app` or `status`. +There should not be any dynamic content in a metric name, such as `requests.$APP_NAME`. Metric names +and tag keys are how users interact with the data, and dynamic values make them difficult to use. +Dynamic information is better suited for tag values. ### Succinct -Long names should be avoided. In many cases, long names are the result of combining many pieces -of information together into a single string. In this case, consider either discarding information -that is not useful or encoding the information in tag values. +Long names should be avoided. In many cases, long names are the result of combining many pieces of +information together into a single string. In this case, consider either discarding information +that is not useful or encoding the information in tag values. Shorter names are easier to read, +type, and view when working with the data. ## Tags -Historically, tags have been used to play one of two roles: - -* **Dimensions.** This is the primary use of tags and this feature allows the data to be filtered -into subsets by values of interest. -* **Namespace.** Similar to packages in Java, this allows grouping related data. This type of usage -is discouraged. +Tags should be used for dimensional filtering - they allow data to be filtered into subsets by +values of interest. Using tags as a namespace mechanism is discouraged. As a general rule, it should be possible to use the name as a pivot. If only the name is selected, then the user should be able to use other dimensions to filter the data and successfully reason -about the value being shown. +about the aggregate value being shown. + +### Cardinality Considerations + +**Keep combinatorial complexity in mind.** The full combination of tags creates unique time series, +and each combination consumes storage and processing resources. Tag combinations should be stable +over time to avoid constantly creating new time series. + +Consider the cardinality impact: +* A metric with 3 tag keys, each with 10 possible values = 1,000 potential time series +* A metric with 5 tag keys, each with 10 possible values = 100,000 potential time series + +Guidelines for managing cardinality: +* **Limit high-cardinality dimensions.** Avoid tags with unbounded or very large value sets +* **Use stable identifiers.** Tag values should remain consistent over time + +### Design for Simple Queries + +**Avoid regex and expensive pattern matching.** Design metric names and tag structures so they can +be queried simply and allow users to incrementally drill into the data. This improves both query +performance and user experience. + +Good query patterns: +* `name,threadpool.size,:eq` - exact match on name +* `name,threadpool.size,:eq,id,server-requests,:eq,:and` - add exact tag filter +* `name,threadpool.*,:re` - simple prefix pattern (use sparingly) + +Avoid patterns that require expensive operations: +* Complex regex patterns that must scan many metric names +* Queries that require examining all tag combinations to find matches +* Dynamic name construction that makes direct queries impossible + +Design principle: Users should be able to start with a broad query and progressively add filters +to narrow down to the specific data they need. As a concrete example, suppose we have two metrics: 1. The number of threads currently in a thread pool. 2. The number of rows in a database table. -### Discouraged Approach +#### Discouraged Approach ```java Id poolSize = registry.createId("size") @@ -68,30 +119,32 @@ Id poolSize = registry.createId("size") Id poolSize = registry.createId("size") .withTag("class", "Database") - .withTag("table", "users"); + .withTag("table", "users"); ``` In this approach, if you select the name `size`, then it will match both the `ThreadPool` and -`Database` classes. This results in a value that is the an aggregate of the number of threads -and the number of items in a database, which has no meaning. +`Database` classes. This results in a value that is an aggregate of the number of threads and the +number of items in a database, which has no meaning. -### Recommended Approach +#### Recommended Approach ```java Id poolSize = registry.createId("threadpool.size") .withTag("id", "server-requests"); Id poolSize = registry.createId("db.size") - .withTag("table", "users"); + .withTag("table", "users"); ``` -This variation provides enough context, so that if just the name is selected, the value can be -reasoned about and is at least potentially meaningful. +This variation provides enough context in the name so that the meaning is more apparent and you can +successfully reason about the values. For example, if you select `threadpool.size`, then you can +see the total number of threads in all pools. You can then group by or select an `id` to further +filter the data to a subset in which you have an interest. -This variation provides enough context in the name so that the meaning is more apparent and you -can successfully reason about the values. For example, if you select `threadpool.size`, then you -can see the total number of threads in all pools. You can then group by or select an `id` to -further filter the data to a subset in which you have an interest. +This approach also supports simple queries without regex patterns: +* `name,threadpool.size,:eq` gives you all thread pool sizes +* `name,db.size,:eq` gives you all database sizes +* `name,threadpool.size,:eq,id,server-requests,:eq,:and` drills down to a specific pool ## Use Base Units @@ -105,11 +158,11 @@ have an obvious meaning, such as: * `1 k` meaning `1 kilobyte`, as opposed to `1 kilo-megabyte`, for disk sizes. * `1 M` meaning `1 megabyte/second`, as opposed to `1 mega-kilobyte`, for network rates. -Atlas automatically applies tick labels to the Y-axis of the graph, in order to accurately report -the magnitude of values, while keeping them within the view window. +Atlas automatically applies tick labels to the Y-axis of the graph, in order to accurately report the +magnitude of values, while keeping them within the view window. -Some meters in some clients, such as [Java Timers], will automatically constrain values to base -units in their implementations. +Some meters in some clients, such as [Java Timers], will automatically constrain values to base units +in their implementations. [tick labels]: ../api/graph/tick.md [Java Timers]: ../spectator/lang/java/meters/timer.md#units