The current /preaggs/plan endpoint assumes DJ owns the materialization (it generates SQL, runs it, and tracks availability). There's no path for systems that already have pre-aggregated tables built by external ETL pipelines to register those tables as pre-aggs in DJ and benefit from DJ's grain resolution.
This also surfaces a gap in DJ's metric model: a metric today can be either an atomic aggregation (SUM(view_secs)) or a derived expression (SUM(view_secs) / COUNT(sessions)), but DJ treats both identically and only decomposes the derived expression later, when it manages materialization. External pre-agg registration requires distinguishing the two more clearly for the client, because only atomic aggregations can be mapped 1:1 to a column in an externally built table.
Proposal
- Add an is_measure flag on metric nodes. A metric is a measure if its expression is a single aggregation call with no cross-measure arithmetic (e.g., SUM(x), COUNT(x), or AVG(x)). The flag is computed on the fly from the metric's expression. It is the primitive that makes external registration safe: only is_measure=true metrics can be mapped directly to pre-agg columns. Derived metrics are automatically satisfiable by an external pre-agg if all their component measures are covered.
- Add an optional source_column field to PreAggMeasure, which captures the column name in the external table.
- Add a new POST /preaggs/register endpoint for adopting externally built tables:
POST /preaggs/register
{
  "metrics": ["${prefix}view_rate"],
  "dimensions": ["${prefix}page_d.page_id", "${prefix}country_d.country_id"],
  "table": {
    "catalog": "catalog",
    "schema": "schema",
    "table": "views_agg",
    "valid_through_ts": 1234567890
  },
  "measure_columns": {
    "events.view_secs_sum": "view_secs_sum",
    "events.session_cnt": "session_cnt"
  }
}
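The is_measure rule from the first proposal item could be approximated as in the sketch below. It is string-based purely for illustration: the function name, the AGGREGATIONS set, and the parenthesis-matching approach are assumptions, and a real implementation would more likely inspect the parsed SQL expression that DJ already has. String literals containing parentheses are not handled.

```python
import re

# Assumed set of aggregation functions; the real list would come from DJ itself.
AGGREGATIONS = {"SUM", "COUNT", "AVG", "MIN", "MAX"}

# Leading function call, e.g. "SUM(".
_CALL = re.compile(r"\s*([A-Za-z_]\w*)\s*\(")
# Any aggregation call appearing inside another expression.
_AGG_CALL = re.compile(
    r"\b(" + "|".join(sorted(AGGREGATIONS)) + r")\s*\(", re.IGNORECASE
)


def is_measure(expression: str) -> bool:
    """Return True if the expression is exactly one aggregation call."""
    match = _CALL.match(expression)
    if match is None or match.group(1).upper() not in AGGREGATIONS:
        return False
    start = match.end() - 1  # index of the opening parenthesis
    depth = 0
    for i in range(start, len(expression)):
        ch = expression[i]
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                # Anything after the matching ")" means extra arithmetic.
                if expression[i + 1:].strip():
                    return False
                # A nested aggregation is not a single aggregation call.
                inner = expression[start + 1:i]
                return _AGG_CALL.search(inner) is None
    return False  # unbalanced parentheses
```

Under these assumptions, is_measure("SUM(view_secs)") is true, while is_measure("SUM(view_secs) / COUNT(sessions)") is false because text follows the first call's closing parenthesis.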
DJ validates:
- Every key in measure_columns has is_measure=true
- All component measures of any derived metric in metrics are covered by measure_columns
- Declared columns exist in the table (via catalog schema query)
On success:
- Creates the PreAggregation record with grain + measures (with source_column set)
- Auto-sets availability pointing at the provided table
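The three validation checks listed under "DJ validates" could be sketched roughly as follows. Everything here is hypothetical: the function name, its parameters, and the idea of passing pre-resolved lookups (is_measure_by_name, components_by_metric, table_columns) as plain collections stand in for whatever DJ would actually fetch from its metadata store and the catalog.

```python
from typing import Dict, List, Set


def validate_registration(
    metrics: List[str],
    measure_columns: Dict[str, str],
    is_measure_by_name: Dict[str, bool],       # hypothetical: metric name -> is_measure
    components_by_metric: Dict[str, Set[str]],  # hypothetical: metric -> component measures
    table_columns: Set[str],                    # hypothetical: columns found via the catalog
) -> List[str]:
    """Return a list of validation errors; an empty list means the request is valid."""
    errors: List[str] = []

    # Check 1: every key in measure_columns must be a measure.
    for name in measure_columns:
        if not is_measure_by_name.get(name, False):
            errors.append(f"{name} is not a measure (is_measure=false)")

    # Check 2: component measures of each requested metric must be covered.
    covered = set(measure_columns)
    for metric in metrics:
        missing = sorted(components_by_metric.get(metric, set()) - covered)
        if missing:
            errors.append(f"{metric} is missing measures: {', '.join(missing)}")

    # Check 3: every declared column must exist in the external table.
    for name, column in measure_columns.items():
        if column not in table_columns:
            errors.append(f"column {column} (for {name}) not found in table")

    return errors
```

Collecting all errors instead of failing on the first one is a design choice of this sketch; it lets a client fix an invalid registration request in one round trip.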