
feat(explore): dataset macro: dttm filter context #25950

Merged
merged 1 commit into apache:master on Nov 14, 2023

Conversation

@giftig (Contributor) commented Nov 10, 2023

SUMMARY

Pass from_dttm and to_dttm as additional context to the dataset macro, allowing any jinja templating in the underlying dataset query text to refer to this context.

This is useful in situations where virtual dataset A incorporates from_dttm and/or to_dttm into the query, in order to make the underlying query aware of chart filter contexts, and then virtual dataset B is built on top of dataset A and used in charts.
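
Conceptually, the change works something like the following (a simplified sketch only; the actual change is in superset/jinja_context.py, and the names here other than from_dttm/to_dttm are illustrative, not the real code):

    from datetime import datetime
    from typing import Optional

    from jinja2 import Template

    def render_dataset_sql(
        dataset_sql: str,
        from_dttm: Optional[datetime] = None,
        to_dttm: Optional[datetime] = None,
    ) -> str:
        # Any {{ from_dttm }} / {{ to_dttm }} in the underlying virtual
        # dataset's query text now resolves against the chart's filter
        # context instead of rendering as empty.
        return Template(dataset_sql).render(from_dttm=from_dttm, to_dttm=to_dttm)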

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

Before

[screenshot: before-no-dttm-ref]

After

[screenshot: after-dttm-ref]

TESTING INSTRUCTIONS

Make sure ENABLE_TEMPLATE_PROCESSING is turned on, and use a standard Superset dev environment with the samples database available. You can also adapt these steps to other data or backends, as long as you apply a time filter in your dashboard.
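
If the flag isn't enabled yet, it can be turned on in superset_config.py:

    # superset_config.py: Jinja templating must be enabled for these steps
    FEATURE_FLAGS = {
        "ENABLE_TEMPLATE_PROCESSING": True,
    }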

  1. Using the samples database, run the following query in SQL Lab:
SELECT ds, gender, name, num_boys, num_girls, '{{ from_dttm or "empty" }}' from_dttm FROM public.birth_names

and save it as a virtual dataset called from_dttm. For now the from_dttm field will show "empty", because it won't be populated until a time filter is applied on a chart in a later step.

  2. Create a table chart from this dataset, select all fields, and add a time filter on ds. Add this chart to a new dashboard, and configure a time filter for ds on that dashboard. Set that filter to the past 30 years so we can see some data. Notice that from_dttm now displays a datetime value.

  3. In SQL Lab again, run the following query:

SELECT * FROM {{ dataset(25) }};

You'll need to make sure 25 here is the ID of your virtual dataset from_dttm from step 1. Create a chart exactly like the first one, and add it to the same dashboard so you can see the two charts side by side.

  4. Compare the two tables and you will see from_dttm is correctly populated in both charts. Prior to this change, the second chart, which uses the dataset macro, showed "empty" instead, because it did not propagate sufficient context from the chart through the macro.

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags: ENABLE_TEMPLATE_PROCESSING
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@giftig (Contributor, Author) commented Nov 10, 2023

Note that I tried to capture this scenario in a test case, but since it's quite complex, with one dataset referring to another and jinja needing to render in both, I couldn't see a sensible way of doing it in a unit test: other unit tests for the dataset macro mock the very method which would render the jinja in the underlying dataset. I also found that the integration tests for jinja were much simpler and didn't cover the dataset macro.

The existing test coverage for dataset_macro is sufficient to show that there isn't a regression in that macro, though; all I've done is add some extra context which may be used. I think that's OK in terms of coverage, but I'm happy for feedback on that.

Edit: I also initially broke a whole slew of integration tests by getting the time format wrong, so general coverage seems quite good here.

@giftig force-pushed the dataset-macro-context branch 2 times, most recently from 1e5f843 to f0b719b on November 10, 2023 at 15:32
@pull-request-size bot added the size/M label and removed the size/S label Nov 10, 2023
@giftig (Contributor, Author) commented Nov 10, 2023

There's a slight wrinkle here: the datetime format actually varies depending on the engine, which the integration tests highlighted. Unfortunately, to pass the information through to the underlying dataset we need to parse it back into a datetime regardless, since at this point we only have a stringified version in the context, whereas when going direct we have a datetime.

That means I've had to be a little flexible in how the datetime is parsed: I've handled the most common formats and done it in a safe way, falling back to omitting the value if it can't be parsed.

I'm not sure we can preserve this information better; I think this is the best we can do without some major changes to the way this works.
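
For illustration, the fallback parsing works roughly like this (a sketch only; the helper name and the exact format list are illustrative, not the real code):

    from datetime import datetime
    from typing import Optional

    # Datetime formats vary across engines; try the common ones in order.
    _DTTM_FORMATS = ("%Y-%m-%dT%H:%M:%S", "%Y-%m-%d %H:%M:%S", "%Y-%m-%d")

    def parse_dttm(value: Optional[str]) -> Optional[datetime]:
        if not value:
            return None  # no filter applied: omit the bound entirely
        for fmt in _DTTM_FORMATS:
            try:
                return datetime.strptime(value, fmt)
            except ValueError:
                continue
        return None  # unrecognised engine format: safer to omit than to guess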

@john-bodley (Member) left a comment

Thanks @giftig for the PR. One question I had is why restrict this to only from_dttm and to_dttm, i.e., should the entire context be exposed?

Additionally would you mind sharing in your PR description a screenshot of how your dataset is configured with Jinja2 templating? I sense this would help to provide more context for people to review.

(Two inline review comments on superset/jinja_context.py; both outdated and resolved.)
@giftig (Contributor, Author) commented Nov 13, 2023

@john-bodley

Thanks for taking a look at this so quickly.

Re exposing the entire context: I'll take a look at what else is available that would make sense to pass in. I'm not sure how easy it would be to expose and test the whole context, since for these two parameters I had to parse the dates back into their original form so they worked seamlessly, but if there are a couple of extra obvious parameters to include, I'll add those too. If I pass the whole context through without knowing what's in those fields, we might see the wrong types for other fields as well, which could be more confusing than not having the context at all.

Re dataset configuration: the testing instructions give a (fairly contrived) example dataset that shows when and why this causes issues; hopefully that provides enough context. Our actual use case is a custom macro for querying a table that applies a sensible default filter to prevent accidentally querying too much data, using the dashboard filter instead of a short default when one is available. I think the simpler example illustrates what's currently missing more clearly, though, and also makes the change easy to reproduce and test.
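
For a rough idea of that use case (hypothetical code, purely to illustrate; default_time_bound is not a real Superset helper):

    from datetime import datetime, timedelta
    from typing import Optional

    def default_time_bound(from_dttm: Optional[datetime]) -> datetime:
        # Prefer the dashboard's time filter when present; otherwise keep
        # the scanned window short to avoid accidentally querying too much.
        return from_dttm or (datetime.utcnow() - timedelta(days=7))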

@giftig (Contributor, Author) commented Nov 14, 2023

@john-bodley I tried passing the rest of the context alongside these parameters, but this data is actually added to a query_obj which is then used to call this method:

    def get_sqla_query(  # pylint: disable=too-many-arguments,too-many-locals,too-many-branches,too-many-statements
        self,
        apply_fetch_values_predicate: bool = False,
        columns: Optional[list[Column]] = None,
        extras: Optional[dict[str, Any]] = None,
        filter: Optional[  # pylint: disable=redefined-builtin
            list[utils.QueryObjectFilterClause]
        ] = None,
        from_dttm: Optional[datetime] = None,
        granularity: Optional[str] = None,
        groupby: Optional[list[Column]] = None,
        inner_from_dttm: Optional[datetime] = None,
        inner_to_dttm: Optional[datetime] = None,
        is_rowcount: bool = False,
        is_timeseries: bool = True,
        metrics: Optional[list[Metric]] = None,
        orderby: Optional[list[OrderBy]] = None,
        order_desc: bool = True,
        to_dttm: Optional[datetime] = None,
        series_columns: Optional[list[Column]] = None,
        series_limit: Optional[int] = None,
        series_limit_metric: Optional[Metric] = None,
        row_limit: Optional[int] = None,
        row_offset: Optional[int] = None,
        timeseries_limit: Optional[int] = None,
        timeseries_limit_metric: Optional[Metric] = None,
        time_shift: Optional[str] = None,
    ) -> SqlaQuery:

so we can't pass everything; in my case the context contained time_column, for example, which isn't a valid argument to this method. What we really need is to intelligently pass through only the arguments of this method which we have in the context, and that might also involve fixing some types, as it did in my case. I captured some context from my chart and saw:

{'columns': ['ds', 'gender', 'name', 'num_boys', 'num_girls', 'from_dttm'],
 'filter': [{'col': 'ds',
             'op': 'TEMPORAL_RANGE',
             'val': 'DATEADD(DATETIME("now"), -30, year) : now'}],
 'from_dttm': '1993-11-14T09:16:20',
 'groupby': None,
 'metrics': None,
 'row_limit': 1000,
 'row_offset': 0,
 'table_columns': ['ds',
                   'gender',
                   'name',
                   'num_boys',
                   'num_girls',
                   'from_dttm'],
 'time_column': None,
 'time_grain': 'P1D',
 'to_dttm': '2023-11-14T09:16:20'}

So at a glance, basic properties of the dataset are present, e.g. columns, filter, groupby, but in a serialised form, so we'd need to recover them somehow. I don't want to creep too far beyond my original scope here: I don't have good test cases for the expected behaviour if I pass in some of these fields, and this is a huge method, several hundred lines long. I'd suggest we raise this as a possible future improvement.
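
If someone picks this up later, one option would be to filter the serialised context down to the parameters the method actually accepts before forwarding it, along these lines (a sketch of the idea only; type recovery would still be needed on top):

    import inspect
    from typing import Any, Callable

    def kwargs_accepted_by(
        func: Callable[..., Any], context: dict[str, Any]
    ) -> dict[str, Any]:
        # Keep only the keys that the target method accepts; e.g. this drops
        # time_column, which appears in the context but isn't a valid
        # argument to get_sqla_query.
        accepted = set(inspect.signature(func).parameters)
        return {k: v for k, v in context.items() if k in accepted}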

@john-bodley john-bodley merged commit f6ba75a into apache:master Nov 14, 2023
29 checks passed
josedev-union pushed a commit to Ortege-xyz/studio that referenced this pull request Jan 22, 2024
cccs-rc pushed a commit to CybercentreCanada/superset that referenced this pull request Mar 6, 2024
@mistercrunch added the 🏷️ bot label (used by `supersetbot` to keep track of which PRs were auto-tagged with release labels) and the 🚢 3.1.0 label Mar 8, 2024
sfirke pushed a commit to sfirke/superset that referenced this pull request Mar 22, 2024