Problem
DJ's unit metadata has two structural issues that have been workable so far but increasingly limit what users can express and what DJ can reason about.
- Units live in the wrong place (today they're on
metricmetadata, joined to a noderevision via metric_metadata_id). This means that only metric nodes have unit metadata, so a transform output col like clicks_per_hour or a dimension output column like bytes_received cannot declare its unit, even though "this value is denominated in USD / hours / bytes" is a property of the column and useful to have.
- The unit model can't express common shapes and doesn't generalize well:
- Currencies don't generalize. Today the only currency value in production is
DOLLAR, but adding additional currencies will each require a new enum member, a new migration, and a new release.
- Compound units have no representation. Rate-shaped quantities like CTR (clicks/impressions), QPS (queries/second), throughput etc cannot be represented cleanly. Compound units would either need their own enum members per combination (combinatorial explosion) or string-encoded structure (
clicks_per_second), neither of which is ideal.
- Sentinels conflate "no unit set" with "unitless."
Goal
Move unit metadata to the right level (column) and the right shape (structured, supports compound units), without breaking any existing YAML or API consumers.
- Any column on any node type can declare a unit.
- A metric's unit is just the unit of its output column.
- Atomic units carry kind + optional code; the code validation rules are kind-specific (ISO 4217 regex for currency, closed sets for time / data size, free-form for count).
- Compound units (numerator / denominator) are first-class.
- Existing YAML using
metric_metadata.unit: <flat string> keeps working forever; no user needs to migrate their files.
- API clients reading
metric.metric_metadata.unit keep working; values are derived from the canonical column.unit and returned in the legacy flat-enum shape when expressible. Values that the legacy shape can't represent (non-USD currencies, compound units, custom count labels) come back as null on the legacy field.
Proposed model
class UnitKind(str, Enum):
CURRENCY = "currency" # code: ISO 4217 (^[A-Z]{3}$)
TIME = "time" # code in {ms, s, min, h, d, wk, mo, yr}
DATA_SIZE = "data_size" # code in {B, KB, MB, GB, TB, PB, KiB, MiB,
GiB, TiB}
PERCENTAGE = "percentage" # no code (dimensionless, displayed 0–100)
PROPORTION = "proportion" # no code (dimensionless, displayed 0–1)
COUNT = "count" # free-form code (clicks, impressions, ...)
UNITLESS = "unitless" # no code; "explicitly no unit"
class AtomicUnit(BaseModel):
kind: UnitKind
code: str | None = None # validated per kind
class CompoundUnit(BaseModel):
numerator: AtomicUnit
denominator: AtomicUnit
Unit = AtomicUnit | CompoundUnit
# Discriminator at parse time: presence of `numerator` → CompoundUnit
This is stored as JSONB on column.unit.
Examples
unit: {kind: currency, code: USD} # atomic
unit: {kind: percentage} # atomic, no code
unit: {kind: count, code: clicks} # atomic, free-form code
unit: {kind: data_size, code: GB} # atomic
unit: {numerator: {kind: count, code: clicks},
denominator: {kind: count, code: impressions}} # CTR
unit: {numerator: {kind: count}, denominator: {kind: time, code: s}} # QPS
Problem
DJ's unit metadata has two structural issues that have been workable so far but increasingly limit what users can express and what DJ can reason about.
metricmetadata, joined to anoderevisionviametric_metadata_id). This means that only metric nodes have unit metadata, so a transform output col likeclicks_per_houror a dimension output column likebytes_receivedcannot declare its unit, even though "this value is denominated in USD / hours / bytes" is a property of the column and useful to have.DOLLAR, but adding additional currencies will each require a new enum member, a new migration, and a new release.clicks_per_second), neither of which is ideal.Goal
Move unit metadata to the right level (column) and the right shape (structured, supports compound units), without breaking any existing YAML or API consumers.
metric_metadata.unit: <flat string>keeps working forever; no user needs to migrate their files.metric.metric_metadata.unitkeep working; values are derived from the canonicalcolumn.unitand returned in the legacy flat-enum shape when expressible. Values that the legacy shape can't represent (non-USD currencies, compound units, custom count labels) come back as null on the legacy field.Proposed model
This is stored as
JSONBoncolumn.unit.Examples