fix(mixed-timeseries): preserve all-NaN metric columns after pivot when Jinja evaluates to NULL#40004
fix(mixed-timeseries): preserve all-NaN metric columns after pivot when Jinja evaluates to NULL#40004aminghadersohi wants to merge 1 commit into
Conversation
…en Jinja evaluates to NULL
When a Jinja conditional metric expression (e.g. using filter_values()) evaluates
to null/NULL because no dashboard filter is selected, the SQL query returns an
all-NULL column. pandas pivot_table with dropna=True (controlled by
drop_missing_columns, default True) then silently drops that column.
Downstream post-processing steps like rename and rolling use validate_column_args
to assert referenced columns exist before executing. Because the metric column was
dropped, they raise InvalidPostProcessingError("Referenced columns not available
in DataFrame"), which surfaces as an error on mixed-timeseries charts.
Fix: introduce _restore_dropped_metric_columns() which re-adds any expected metric
columns that pivot_table dropped due to all-NaN values. This keeps the DataFrame
schema consistent with what the frontend and subsequent post-processing expect,
while still allowing sparse category combinations to be dropped.
Fixes SC-100398
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Code Review Agent Run #fd63b5Actionable Suggestions - 0Filtered by Review RulesBito filtered these suggestions based on rules created automatically for your feedback. Manage rules.
Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #40004 +/- ##
==========================================
- Coverage 63.83% 63.83% -0.01%
==========================================
Files 2589 2589
Lines 137821 137837 +16
Branches 31928 31935 +7
==========================================
+ Hits 87978 87984 +6
- Misses 48327 48334 +7
- Partials 1516 1519 +3
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Closing in favour of PR from correct fork. |
| category_values = ( | ||
| df.columns.get_level_values(-1).unique() | ||
| if len(df.columns) > 0 | ||
| else [None] | ||
| ) | ||
| for metric in missing: | ||
| for cat in category_values: | ||
| df[(metric, cat)] = float("nan") |
There was a problem hiding this comment.
Suggestion: The restoration logic for MultiIndex pivots only uses the last column level (get_level_values(-1)) and then writes 2-tuples ((metric, cat)), which is incorrect when the pivot has more than one columns dimension. For pivots like columns=["a","b"], this creates malformed/incomplete column keys (or can raise when tuple depth does not match), so all-NaN metrics are not restored correctly and downstream operations can still fail. Build restored keys using the full non-metric tuple for every existing column combination (all levels after metric), not just the last level. [logic error]
Severity Level: Major ⚠️
- ❌ Multi-column pivot post-processing can crash during pivot().
- ❌ All-NaN metrics not restored for multi-level pivots.
- ⚠️ Mixed-timeseries charts using multi-column pivots may fail.
- ⚠️ Rolling/rename post-processing fails for affected metrics.Steps of Reproduction ✅
1. Configure a query with post-processing in `superset/common/query_object.py:500-534` so
that `exec_post_processing()` includes a pivot operation with `operation: "pivot"` and
`options` containing `columns=["a", "b"]`, `drop_missing_columns=True` (default), and
multiple metrics (e.g. `"metric": {"operator": "mean"}` and `"metric2": {"operator":
"mean"}`).
2. Ensure the database result feeding this query contains multiple grouping columns `a`
and `b` and that, for at least one metric (e.g. `"metric"`), all values are NULL/NaN for
every row; this matches the all-NaN metric scenario already covered for single-column
pivots in
`tests/unit_tests/pandas_postprocessing/test_pivot.py:test_pivot_preserves_all_nan_metric_flat`
(lines 70-92), but extended here to `columns=["a", "b"]` as in
`test_pivot_eliminate_cartesian_product_columns` (lines 16-66).
3. When `exec_post_processing()` calls `pandas_postprocessing.pivot()` (implemented in
`superset/utils/pandas_postprocessing/pivot.py:60-142`) with `columns=["a", "b"]`,
`df.pivot_table(...)` (lines 122-131) produces a MultiIndex on `df.columns` where level 0
is the metric name and subsequent levels correspond to each `columns` entry (`a`, then
`b`). Because one metric is entirely NaN and `dropna=drop_missing_columns=True`, pandas
drops that metric's columns completely.
4. `_restore_dropped_metric_columns()` (lines 31-57 in `pivot.py`) then runs for this
MultiIndex DataFrame. In the MultiIndex branch (lines 41-52), it collects only the last
column level via `df.columns.get_level_values(-1).unique()` (line 45) and, for each
missing metric, assigns new columns using 2‑tuples `df[(metric, cat)] = float("nan")`
(line 52). For a MultiIndex with more than two levels (metric + `a` + `b`), this key has
the wrong depth: on current pandas versions this either (a) raises a pandas error when
attempting to insert a column with a tuple shorter than `df.columns.nlevels`, or (b)
creates malformed/incomplete column labels that don't match the original `(metric, a, b)`
shape. In either case, the all-NaN metric is not properly restored for every `(a, b)`
combination, so subsequent post-processing such as `rename` and `rolling` (both decorated
with `validate_column_args` in `superset/utils/pandas_postprocessing/utils.py:114-131`)
can still fail with `InvalidPostProcessingError("Referenced columns not available in
DataFrame.")` when they reference that metric.Fix in Cursor | Fix in VSCode Claude
(Use Cmd/Ctrl + Click for best experience)
Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** superset/utils/pandas_postprocessing/pivot.py
**Line:** 45:52
**Comment:**
*Logic Error: The restoration logic for MultiIndex pivots only uses the last column level (`get_level_values(-1)`) and then writes 2-tuples (`(metric, cat)`), which is incorrect when the pivot has more than one `columns` dimension. For pivots like `columns=["a","b"]`, this creates malformed/incomplete column keys (or can raise when tuple depth does not match), so all-NaN metrics are not restored correctly and downstream operations can still fail. Build restored keys using the full non-metric tuple for every existing column combination (all levels after metric), not just the last level.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix|
The flagged issue is correct: the restoration logic incorrectly uses only the last column level for MultiIndex pivots, creating tuples with wrong depth for multi-column pivots. To resolve, modify the code to collect all unique combinations of non-metric levels and build full tuples. No other comments found in the PR. superset/utils/pandas_postprocessing/pivot.py |
SUMMARY
When a Jinja conditional metric expression (e.g.
{% if filter_values('x') %}SUM(col){% else %}null{% endif %}) evaluates tonullbecause no dashboard filter is selected, the SQL query returns an all-NULL column for that metric. pandaspivot_tablewithdropna=True(the default, controlled bydrop_missing_columns: !show_empty_columnsinpivotOperator.ts) then silently drops that column from the pivot result.Downstream post-processing steps —
renameandrolling— usevalidate_column_argsto assert that all referenced columns exist before executing. Because the metric column was dropped, they raise:This surfaces as an error on mixed-timeseries charts whenever
rename(triggered by groupby +truncate_metric) orrollingis configured.Root cause chain:
filter_values()evaluates tonullwhen no filter selectedpivot_table(dropna=True)drops the all-NaN columnrename/rollingpost-processing fails because column is missingFix: Introduce
_restore_dropped_metric_columns()inpivot.pywhich re-adds any expected metric columns thatpivot_tabledropped due to all-NaN values. This preserves the expected DataFrame schema for downstream post-processing, while still dropping sparse category combinations (the intended behavior ofdrop_missing_columns).BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
Before: Mixed-timeseries chart with Jinja SQL metric and no active dashboard filter shows error "Referenced columns not available in DataFrame."
After: Chart renders correctly (empty series since metric evaluates to NULL, but no error).
TESTING INSTRUCTIONS
{% if filter_values('my_column') %} SUM(CASE WHEN my_column IN {{ filter_values('my_column') | where_in }} THEN value ELSE 0 END) {% else %} null {% endif %}ADDITIONAL INFORMATION
Regression tests added in
tests/unit_tests/pandas_postprocessing/test_pivot.py:test_pivot_preserves_all_nan_metric_flat: flat pivot (no groupby) — metric column restored as all-NaNtest_pivot_preserves_all_nan_metric_with_columns: MultiIndex pivot (with groupby) — metric level preserved at level 0