Migrate unit metadata from metric to column level, with a richer unit model #2150

@shangyian

Problem

DJ's unit metadata has two structural issues that have been workable so far but increasingly limit what users can express and what DJ can reason about.

  1. Units live in the wrong place. Today they're on metricmetadata, joined to a noderevision via metric_metadata_id, so only metric nodes can carry unit metadata. A transform output column like clicks_per_hour or a dimension output column like bytes_received cannot declare its unit, even though "this value is denominated in USD / hours / bytes" is a property of the column and is useful to have.
  2. The unit model can't express common shapes and doesn't generalize well:
    • Currencies don't generalize. Today the only currency value in production is DOLLAR, and each additional currency would require a new enum member, a new migration, and a new release.
    • Compound units have no representation. Rate-shaped quantities like CTR (clicks/impressions), QPS (queries/second), throughput, etc. cannot be represented cleanly. They would need either an enum member per combination (combinatorial explosion) or string-encoded structure (clicks_per_second), neither of which is ideal.
    • Sentinels conflate "no unit set" with "unitless."

Goal

Move unit metadata to the right level (column) and the right shape (structured, supports compound units), without breaking any existing YAML or API consumers.

  • Any column on any node type can declare a unit.
  • A metric's unit is just the unit of its output column.
  • Atomic units carry kind + optional code; the code validation rules are kind-specific (ISO 4217 regex for currency, closed sets for time / data size, free-form for count).
  • Compound units (numerator / denominator) are first-class.
  • Existing YAML using metric_metadata.unit: <flat string> keeps working forever; no user needs to migrate their files.
  • API clients reading metric.metric_metadata.unit keep working; values are derived from the canonical column.unit and returned in the legacy flat-enum shape when expressible. Values that the legacy shape can't represent (non-USD currencies, compound units, custom count labels) come back as null on the legacy field.
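The two backward-compatibility bullets above can be sketched as a pair of conversion helpers. This is only an illustration under stated assumptions: the function names from_legacy and to_legacy are hypothetical, and of the legacy enum names in the table only DOLLAR is confirmed by this issue; the others are placeholders for whatever the flat enum actually contains.

```python
from __future__ import annotations

# Legacy flat value -> canonical structured unit.
# ASSUMPTION: only DOLLAR is named in this issue; the other keys are
# illustrative placeholders for the existing flat enum members.
LEGACY_TO_UNIT: dict[str, dict] = {
    "DOLLAR": {"kind": "currency", "code": "USD"},
    "PERCENTAGE": {"kind": "percentage"},
    "PROPORTION": {"kind": "proportion"},
    "SECOND": {"kind": "time", "code": "s"},
    "UNITLESS": {"kind": "unitless"},
}


def from_legacy(flat: str | None) -> dict | None:
    """Upgrade a flat metric_metadata.unit string to the structured shape."""
    if flat is None:
        return None
    key = flat.strip().upper()
    if key not in LEGACY_TO_UNIT:
        raise ValueError(f"unknown legacy unit: {flat!r}")
    return dict(LEGACY_TO_UNIT[key])


def to_legacy(unit: dict | None) -> str | None:
    """Derive the legacy flat value from column.unit, or None if inexpressible."""
    if unit is None or "numerator" in unit:
        return None  # unset, or a compound unit the flat enum cannot hold
    # Ignore explicit nulls so {"kind": "percentage", "code": None} still matches.
    normalized = {k: v for k, v in unit.items() if v is not None}
    for name, atomic in LEGACY_TO_UNIT.items():
        if atomic == normalized:
            return name
    return None  # e.g. non-USD currency or a custom count label
```

With this shape, existing YAML is upgraded on read via from_legacy, and the legacy API field is filled via to_legacy, which returns None for a EUR currency, a compound unit, or {kind: count, code: clicks}.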

Proposed model

  from enum import Enum

  from pydantic import BaseModel

  class UnitKind(str, Enum):
      CURRENCY = "currency"      # code: ISO 4217 (^[A-Z]{3}$)
      TIME = "time"              # code in {ms, s, min, h, d, wk, mo, yr}
      DATA_SIZE = "data_size"    # code in {B, KB, MB, GB, TB, PB, KiB, MiB, GiB, TiB}
      PERCENTAGE = "percentage"  # no code (dimensionless, displayed 0–100)
      PROPORTION = "proportion"  # no code (dimensionless, displayed 0–1)
      COUNT = "count"            # free-form code (clicks, impressions, ...)
      UNITLESS = "unitless"      # no code; "explicitly no unit"

  class AtomicUnit(BaseModel):
      kind: UnitKind
      code: str | None = None  # validated per kind

  class CompoundUnit(BaseModel):
      numerator: AtomicUnit
      denominator: AtomicUnit

  Unit = AtomicUnit | CompoundUnit
  # Discriminator at parse time: presence of `numerator` → CompoundUnit

This is stored as JSONB on column.unit.
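The per-kind code rules and the numerator-based discriminator could look roughly like the sketch below. It uses stdlib dataclasses rather than the pydantic models above so that it runs without dependencies. parse_unit is a hypothetical helper name, and the choice to require a code for currency, time, and data_size while leaving it optional for count is an assumption inferred from the examples in this issue, not a settled rule.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

TIME_CODES = {"ms", "s", "min", "h", "d", "wk", "mo", "yr"}
DATA_SIZE_CODES = {"B", "KB", "MB", "GB", "TB", "PB", "KiB", "MiB", "GiB", "TiB"}
NO_CODE_KINDS = {"percentage", "proportion", "unitless"}
ISO_4217 = re.compile(r"[A-Z]{3}")


@dataclass(frozen=True)
class AtomicUnit:
    kind: str
    code: str | None = None

    def __post_init__(self) -> None:
        kind, code = self.kind, self.code
        if kind == "currency":
            # ASSUMPTION: a currency must carry a code.
            ok = code is not None and ISO_4217.fullmatch(code) is not None
        elif kind == "time":
            ok = code in TIME_CODES
        elif kind == "data_size":
            ok = code in DATA_SIZE_CODES
        elif kind == "count":
            # Free-form and optional: the QPS example uses {kind: count}.
            ok = code is None or bool(code.strip())
        elif kind in NO_CODE_KINDS:
            ok = code is None
        else:
            raise ValueError(f"unknown unit kind: {kind!r}")
        if not ok:
            raise ValueError(f"invalid code {code!r} for kind {kind!r}")


@dataclass(frozen=True)
class CompoundUnit:
    numerator: AtomicUnit
    denominator: AtomicUnit


def parse_unit(raw: dict) -> AtomicUnit | CompoundUnit:
    """Parse a decoded column.unit JSON object; `numerator` marks a compound."""
    if "numerator" in raw:
        return CompoundUnit(
            numerator=AtomicUnit(**raw["numerator"]),
            denominator=AtomicUnit(**raw["denominator"]),
        )
    return AtomicUnit(**raw)
```

For example, parse_unit({"kind": "currency", "code": "usd"}) raises ValueError because the code is not three uppercase letters, while the QPS shape parses into a CompoundUnit whose denominator has code "s".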

Examples

unit: {kind: currency, code: USD}                                       # atomic
unit: {kind: percentage}                                                # atomic, no code
unit: {kind: count, code: clicks}                                       # atomic, free-form code
unit: {kind: data_size, code: GB}                                       # atomic
unit: {numerator: {kind: count, code: clicks},
       denominator: {kind: count, code: impressions}}                   # CTR
unit: {numerator: {kind: count}, denominator: {kind: time, code: s}}    # QPS
