Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
137 changes: 95 additions & 42 deletions docs/concepts/naming.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,64 +2,115 @@

1. Names
* Describe the measurement being collected
* Use short prefixes for categorization (max 2 levels)
* Use camelCase
* Static
* Succinct
* Static - no dynamic content
* Succinct - avoid long names
2. Tags
* Should be used for dimensional filtering
* Be careful about combinatorial explosion
* Be careful about combinatorial explosion and cardinality
* Tag combinations should be stable over time
* Tag keys should be static
* Use `id` to distinguish between instances
3. Use Base Units
3. Query Design
* Avoid the need for regex and expensive pattern matching
* Design for simple queries with incremental drill-down
* Support exact matches and simple filters
4. Use Base Units

## Names

### Describe the Measurement

### Use camelCase
Names should clearly describe what is being measured. A good name allows someone to understand the
metric without needing additional context.

### Use Short Prefixes for Categorization

Common names should use short prefixes to broadly categorize metrics, for example `ipc.server.call`
or `jvm.gc.pause`. The prefix should generally have no more than 2 levels to keep names succinct.
This is not a package hierarchy like in Java - it's simply a way to group related metrics.

Examples of good prefixes:
* `ipc.*` for inter-process communication metrics
* `jvm.*` for Java Virtual Machine metrics
* `db.*` for database metrics

The main goal here is to promote consistency, which makes it easier for users. The choice of
style is somewhat arbitrary, but camelCase was chosen because:
The prefix provides just enough context to understand the broad category and perhaps a sub-category,
while the rest of the name specifies the actual measurement. Remember that metrics will already be
scoped by other dimensions like application name, instance, etc., so the name itself should focus
on describing the measurement rather than providing extensive context. Avoid unnecessary boiler
plate like `com.netflix.*`.

* Used by SNMP
* Used by Java
* It was commonly used at Netflix when the guideline was written
### Use camelCase

For segments within a name, use camel case to distinguish words if needed. For example
`jvm.gc.concurrentPhaseTime`.

The exception to this rule is where there is an established common case. For example, with
Amazon regions, it is preferred to use `us-east-1` rather than `usEast1` as it is the more
common form.
The exception to this rule is where there is an established common case. For example, with Amazon
regions, it is preferred to use `us-east-1` rather than `usEast1` as it is the more common form.

### Static

There should not be any dynamic content in a metric name, such as `requests.$APP_NAME`. Metric
names and tag keys are how users interact with the data, and dynamic values make them difficult
to use. Dynamic information is better suited for tag values, such as `nf.app` or `status`.
There should not be any dynamic content in a metric name, such as `requests.$APP_NAME`. Metric names
and tag keys are how users interact with the data, and dynamic values make them difficult to use.
Dynamic information is better suited for tag values.

### Succinct

Long names should be avoided. In many cases, long names are the result of combining many pieces
of information together into a single string. In this case, consider either discarding information
that is not useful or encoding the information in tag values.
Long names should be avoided. In many cases, long names are the result of combining many pieces of
information together into a single string. In this case, consider either discarding information
that is not useful or encoding the information in tag values. Shorter names are easier to read,
type, and view when working with the data.

## Tags

Historically, tags have been used to play one of two roles:

* **Dimensions.** This is the primary use of tags and this feature allows the data to be filtered
into subsets by values of interest.
* **Namespace.** Similar to packages in Java, this allows grouping related data. This type of usage
is discouraged.
Tags should be used for dimensional filtering - they allow data to be filtered into subsets by
values of interest. Using tags as a namespace mechanism is discouraged.

As a general rule, it should be possible to use the name as a pivot. If only the name is selected,
then the user should be able to use other dimensions to filter the data and successfully reason
about the value being shown.
about the aggregate value being shown.

### Cardinality Considerations

**Keep combinatorial complexity in mind.** The full combination of tags creates unique time series,
and each combination consumes storage and processing resources. Tag combinations should be stable
over time to avoid constantly creating new time series.

Consider the cardinality impact:
* A metric with 3 tag keys, each with 10 possible values = 1,000 potential time series
* A metric with 5 tag keys, each with 10 possible values = 100,000 potential time series

Guidelines for managing cardinality:
* **Limit high-cardinality dimensions.** Avoid tags with unbounded or very large value sets
* **Use stable identifiers.** Tag values should remain consistent over time

### Design for Simple Queries

**Avoid regex and expensive pattern matching.** Design metric names and tag structures so they can
be queried simply and allow users to incrementally drill into the data. This improves both query
performance and user experience.

Good query patterns:
* `name,threadpool.size,:eq` - exact match on name
* `name,threadpool.size,:eq,id,server-requests,:eq,:and` - add exact tag filter
* `name,threadpool.*,:re` - simple prefix pattern (use sparingly)

Avoid patterns that require expensive operations:
* Complex regex patterns that must scan many metric names
* Queries that require examining all tag combinations to find matches
* Dynamic name construction that makes direct queries impossible

Design principle: Users should be able to start with a broad query and progressively add filters
to narrow down to the specific data they need.

As a concrete example, suppose we have two metrics:

1. The number of threads currently in a thread pool.
2. The number of rows in a database table.

### Discouraged Approach
#### Discouraged Approach

```java
Id poolSize = registry.createId("size")
Expand All @@ -68,30 +119,32 @@ Id poolSize = registry.createId("size")

Id poolSize = registry.createId("size")
.withTag("class", "Database")
.withTag("table", "users");
.withTag("table", "users");
```

In this approach, if you select the name `size`, then it will match both the `ThreadPool` and
`Database` classes. This results in a value that is the an aggregate of the number of threads
and the number of items in a database, which has no meaning.
`Database` classes. This results in a value that is an aggregate of the number of threads and the
number of items in a database, which has no meaning.

### Recommended Approach
#### Recommended Approach

```java
Id poolSize = registry.createId("threadpool.size")
.withTag("id", "server-requests");

Id poolSize = registry.createId("db.size")
.withTag("table", "users");
.withTag("table", "users");
```

This variation provides enough context, so that if just the name is selected, the value can be
reasoned about and is at least potentially meaningful.
This variation provides enough context in the name so that the meaning is more apparent and you can
successfully reason about the values. For example, if you select `threadpool.size`, then you can
see the total number of threads in all pools. You can then group by or select an `id` to further
filter the data to a subset in which you have an interest.

This variation provides enough context in the name so that the meaning is more apparent and you
can successfully reason about the values. For example, if you select `threadpool.size`, then you
can see the total number of threads in all pools. You can then group by or select an `id` to
further filter the data to a subset in which you have an interest.
This approach also supports simple queries without regex patterns:
* `name,threadpool.size,:eq` gives you all thread pool sizes
* `name,db.size,:eq` gives you all database sizes
* `name,threadpool.size,:eq,id,server-requests,:eq,:and` drills down to a specific pool

## Use Base Units

Expand All @@ -105,11 +158,11 @@ have an obvious meaning, such as:
* `1 k` meaning `1 kilobyte`, as opposed to `1 kilo-megabyte`, for disk sizes.
* `1 M` meaning `1 megabyte/second`, as opposed to `1 mega-kilobyte`, for network rates.

Atlas automatically applies tick labels to the Y-axis of the graph, in order to accurately report
the magnitude of values, while keeping them within the view window.
Atlas automatically applies tick labels to the Y-axis of the graph, in order to accurately report the
magnitude of values, while keeping them within the view window.

Some meters in some clients, such as [Java Timers], will automatically constrain values to base
units in their implementations.
Some meters in some clients, such as [Java Timers], will automatically constrain values to base units
in their implementations.

[tick labels]: ../api/graph/tick.md
[Java Timers]: ../spectator/lang/java/meters/timer.md#units