From 88ea2083fe3680fc7376f23da733b17086cd7564 Mon Sep 17 00:00:00 2001
From: Brian Harrington <brharrington@netflix.com>
Date: Wed, 24 Sep 2025 16:24:57 +0000
Subject: [PATCH] update naming context

Clean up the naming guide and clarify details about cardinality.
---
 docs/concepts/naming.md | 137 ++++++++++++++++++++++++++++------------
 1 file changed, 95 insertions(+), 42 deletions(-)

diff --git a/docs/concepts/naming.md b/docs/concepts/naming.md
index d17dd32f..a652c98c 100644
--- a/docs/concepts/naming.md
+++ b/docs/concepts/naming.md
@@ -2,64 +2,115 @@
 
 1. Names
     * Describe the measurement being collected
+    * Use short prefixes for categorization (max 2 levels)
     * Use camelCase
-    * Static
-    * Succinct
+    * Static - no dynamic content
+    * Succinct - avoid long names
 2. Tags
     * Should be used for dimensional filtering
-    * Be careful about combinatorial explosion
+    * Be careful about combinatorial explosion and cardinality
+    * Tag combinations should be stable over time
     * Tag keys should be static
     * Use `id` to distinguish between instances
-3. Use Base Units
+3. Query Design
+    * Avoid the need for regex and expensive pattern matching
+    * Design for simple queries with incremental drill-down
+    * Support exact matches and simple filters
+4. Use Base Units
 
 ## Names
 
 ### Describe the Measurement
 
-### Use camelCase
+Names should clearly describe what is being measured. A good name allows someone to understand the
+metric without needing additional context.
+
+### Use Short Prefixes for Categorization
+
+Common names should use short prefixes to broadly categorize metrics, for example `ipc.server.call`
+or `jvm.gc.pause`. The prefix should generally have no more than 2 levels to keep names succinct.
+This is not a package hierarchy like in Java - it's simply a way to group related metrics.
+
+Examples of good prefixes:
+* `ipc.*` for inter-process communication metrics
+* `jvm.*` for Java Virtual Machine metrics
+* `db.*` for database metrics
 
-The main goal here is to promote consistency, which makes it easier for users. The choice of
-style is somewhat arbitrary, but camelCase was chosen because:
+The prefix provides just enough context to understand the broad category and perhaps a sub-category,
+while the rest of the name specifies the actual measurement. Remember that metrics will already be
+scoped by other dimensions like application name, instance, etc., so the name itself should focus
+on describing the measurement rather than providing extensive context. Avoid unnecessary
+boilerplate like `com.netflix.*`.
 
-* Used by SNMP
-* Used by Java
-* It was commonly used at Netflix when the guideline was written
+### Use camelCase
+
+For segments within a name, use camel case to distinguish words if needed. For example
+`jvm.gc.concurrentPhaseTime`.
 
-The exception to this rule is where there is an established common case. For example, with
-Amazon regions, it is preferred to use `us-east-1` rather than `usEast1` as it is the more
-common form.
+The exception to this rule is where there is an established convention. For example, with Amazon
+regions, it is preferred to use `us-east-1` rather than `usEast1` as it is the more common form.
 
 ### Static
 
-There should not be any dynamic content in a metric name, such as `requests.$APP_NAME`. Metric
-names and tag keys are how users interact with the data, and dynamic values make them difficult
-to use. Dynamic information is better suited for tag values, such as `nf.app` or `status`. 
+There should not be any dynamic content in a metric name, such as `requests.$APP_NAME`. Metric names
+and tag keys are how users interact with the data, and dynamic values make them difficult to use.
+Dynamic information is better suited for tag values.
 
 ### Succinct
 
-Long names should be avoided. In many cases, long names are the result of combining many pieces
-of information together into a single string. In this case, consider either discarding information
-that is not useful or encoding the information in tag values.  
+Long names should be avoided. In many cases, long names are the result of combining many pieces of
+information together into a single string. In this case, consider either discarding information
+that is not useful or encoding the information in tag values. Shorter names are easier to read,
+type, and view when working with the data.
 
 ## Tags
 
-Historically, tags have been used to play one of two roles:
-
-* **Dimensions.** This is the primary use of tags and this feature allows the data to be filtered
-into subsets by values of interest.
-* **Namespace.** Similar to packages in Java, this allows grouping related data. This type of usage
-is discouraged.
+Tags should be used for dimensional filtering - they allow data to be filtered into subsets by
+values of interest. Using tags as a namespace mechanism is discouraged.
 
 As a general rule, it should be possible to use the name as a pivot. If only the name is selected,
 then the user should be able to use other dimensions to filter the data and successfully reason
-about the value being shown. 
+about the aggregate value being shown.
+
+### Cardinality Considerations
+
+**Keep combinatorial complexity in mind.** Each distinct combination of tag values creates a unique
+time series, and each series consumes storage and processing resources. Tag combinations should be
+stable over time to avoid constantly creating new time series.
+
+Consider the cardinality impact:
+* A metric with 3 tag keys, each with 10 possible values = 1,000 potential time series
+* A metric with 5 tag keys, each with 10 possible values = 100,000 potential time series
+
+Guidelines for managing cardinality:
+* **Limit high-cardinality dimensions.** Avoid tags with unbounded or very large value sets.
+* **Use stable identifiers.** Tag values should remain consistent over time.
+
+### Design for Simple Queries
+
+**Avoid regex and expensive pattern matching.** Design metric names and tag structures so they can
+be queried simply and allow users to incrementally drill into the data. This improves both query
+performance and user experience.
+
+Good query patterns:
+* `name,threadpool.size,:eq` - exact match on name
+* `name,threadpool.size,:eq,id,server-requests,:eq,:and` - add exact tag filter
+* `name,threadpool.*,:re` - simple prefix pattern (use sparingly)
+
+Avoid patterns that require expensive operations:
+* Complex regex patterns that must scan many metric names
+* Queries that require examining all tag combinations to find matches
+* Dynamic name construction that makes direct queries impossible
+
+Design principle: Users should be able to start with a broad query and progressively add filters
+to narrow down to the specific data they need.
 
 As a concrete example, suppose we have two metrics:
 
 1. The number of threads currently in a thread pool.
 2. The number of rows in a database table.
 
-### Discouraged Approach
+#### Discouraged Approach
 
 ```java
 Id poolSize = registry.createId("size")
@@ -68,30 +119,32 @@ Id poolSize = registry.createId("size")
   
 Id poolSize = registry.createId("size")
   .withTag("class", "Database")
-  .withTag("table", "users");  
+  .withTag("table", "users");
 ```
 
 In this approach, if you select the name `size`, then it will match both the `ThreadPool` and
-`Database` classes. This results in a value that is the an aggregate of the number of threads
-and the number of items in a database, which has no meaning. 
+`Database` classes. This results in a value that is an aggregate of the number of threads and the
+number of items in a database, which has no meaning.
 
-### Recommended Approach
+#### Recommended Approach
 
 ```java
 Id poolSize = registry.createId("threadpool.size")
   .withTag("id", "server-requests");
   
 Id poolSize = registry.createId("db.size")
-  .withTag("table", "users");  
+  .withTag("table", "users");
 ```
 
-This variation provides enough context, so that if just the name is selected, the value can be
-reasoned about and is at least potentially meaningful.
+This variation provides enough context in the name so that the meaning is more apparent and you can
+successfully reason about the values. For example, if you select `threadpool.size`, then you can
+see the total number of threads in all pools. You can then group by or select an `id` to further
+filter the data to a subset in which you have an interest.
 
-This variation provides enough context in the name so that the meaning is more apparent and you
-can successfully reason about the values. For example, if you select `threadpool.size`, then you
-can see the total number of threads in all pools. You can then group by or select an `id` to
-further filter the data to a subset in which you have an interest.
+This approach also supports simple queries without regex patterns:
+* `name,threadpool.size,:eq` gives you all thread pool sizes
+* `name,db.size,:eq` gives you all database sizes
+* `name,threadpool.size,:eq,id,server-requests,:eq,:and` drills down to a specific pool
 
 ## Use Base Units
 
@@ -105,11 +158,11 @@ have an obvious meaning, such as:
 * `1 k` meaning `1 kilobyte`, as opposed to `1 kilo-megabyte`, for disk sizes.
 * `1 M` meaning `1 megabyte/second`, as opposed to `1 mega-kilobyte`, for network rates.
 
-Atlas automatically applies tick labels to the Y-axis of the graph, in order to accurately report
-the magnitude of values, while keeping them within the view window.
+Atlas automatically applies tick labels to the Y-axis of the graph, in order to accurately report the
+magnitude of values, while keeping them within the view window.
 
-Some meters in some clients, such as [Java Timers], will automatically constrain values to base
-units in their implementations.
+Some meters in some clients, such as [Java Timers], will automatically constrain values to base units
+in their implementations.
 
 [tick labels]: ../api/graph/tick.md
 [Java Timers]: ../spectator/lang/java/meters/timer.md#units