[cache warm_up] warm_up slice with dashboard default_filters #9311

graceguo-supercat · 2020-03-16T18:43:51Z

SUMMARY

Superset offers a couple of ways to warm_up cache:

Superset generates a query like form_data={slice_id: _slice_id_} so that we can warm up the query saved for a given slice_id.

Many dashboards now have default filters (and filters can have scopes), so warming up a single slice without its dashboard context lowers our cache hit rate. Currently at Airbnb, 10% of dashboards have default filters, but 20% of dashboard visits land on a dashboard with default_filters.

This PR adds dashboard context to the cache warm_up call. You can pass an extra dashboard_id parameter to the warm_up API, indicating that the slice is loaded within the given dashboard. It should then generate a query like:

form_data = {"slice_id":926, "extra_filters": [
    {
      "col": "region",
      "op": "in",
      "val": [
        "East Asia & Pacific",
        "Latin America & Caribbean"
      ]
    }
  ]}
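
As a rough illustration of the form_data above, here is a minimal sketch of how a dashboard's default filters could be folded into `extra_filters`. The helper name `build_warm_up_form_data` and the assumed shape of `default_filters` (filter chart id, then column, then values) are hypothetical and not taken from the merged code; filter scopes are ignored here.

```python
from typing import Any, Dict, List


def build_warm_up_form_data(
    slice_id: int, default_filters: Dict[str, Dict[str, List[Any]]]
) -> Dict[str, Any]:
    """Hypothetical helper: merge dashboard default filters into form_data."""
    extra_filters = []
    for columns in default_filters.values():
        for col, val in columns.items():
            if val:  # skip columns that have no default values
                extra_filters.append({"col": col, "op": "in", "val": val})

    form_data: Dict[str, Any] = {"slice_id": slice_id}
    if extra_filters:
        form_data["extra_filters"] = extra_filters
    return form_data
```

Without any default filters the sketch falls back to the plain `{"slice_id": ...}` query described earlier.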

TEST PLAN

added new unit tests

REVIEWERS

@john-bodley @serenajiang @etr2460 @dpgaspar @mistercrunch

john-bodley · 2020-03-16T18:50:34Z

superset/views/utils.py

+CONTAINER_TYPES = ["COLUMN", "GRID", "TABS", "TAB", "ROW"]
+
+
+def get_dashboard_extra_filters(


@graceguo-supercat I thought this logic already existed in the backend and the idea we discussed was possibly refactoring the logic to ensure that this as well as the Celery warmup logic was using the same dashboard filter logic.

Celery task is here: https://github.com/apache/incubator-superset/blob/000a038af193f2770dfe23a053bd1ff5e9e72bd4/superset/tasks/cache.py#L39

It is missing the logic to check a default filter's scope when the scope is not global, but it is pretty close. So my plan is to add the logic for the warm_up endpoint, then update the Celery task to reuse this function in another PR.

Oh, never mind, the Celery warmup change is very simple, so I just committed the Celery warmup changes here. And all the unit tests can be reused without any change!

codecov-io · 2020-03-16T19:28:42Z

Codecov Report

Merging #9311 into master will increase coverage by 0.00%.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master    #9311   +/-   ##
=======================================
  Coverage   59.08%   59.08%           
=======================================
  Files         374      374           
  Lines       12202    12205    +3     
  Branches     2986     2989    +3     
=======================================
+ Hits         7209     7211    +2     
- Misses       4814     4815    +1     
  Partials      179      179

Impacted Files	Coverage Δ
...plore/components/controls/FixedOrMetricControl.jsx	`51.78% <0.00%> (+0.84%)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update b1916a1...9752ff4.

etr2460

looking good so far!

after looking through this and not knowing potential values for the filter object, i'm wondering if most of the .get("value", {}) and .get("value", []) statements should be .get("value") or {} and .get("value") or []

etr2460 · 2020-03-17T16:21:50Z

superset/views/utils.py

+    slice_id: int, dashboard_id: int
+) -> List[Dict[str, Any]]:
+    session = db.session()
+    slc = session.query(Slice).filter_by(id=slice_id).one_or_none()


seems like you don't actually need slc here, instead you could look for a slice in dashboard.slices based on the slice_id passed into the function. That would save a db call

etr2460 · 2020-03-17T16:31:50Z

superset/views/utils.py

+    if (
+        node_type == "CHART"
+        and node.get("meta")
+        and node.get("meta").get("chartId", 0) == slice_id


you don't need the default value of 0 for get here because None will never be equal to slice_id

etr2460 · 2020-03-17T16:33:33Z

superset/views/utils.py

+
+    if node_type in CONTAINER_TYPES:
+        children = node.get("children", [])
+        if children:


this should be unnecessary, if you're concerned that children could be None or 0, then i'd recommend making the line above children = node.get("children") or []

john-bodley

@betodealmeida would you mind taking a look as this PR refactors some of the logic related to your Celery task work.

john-bodley · 2020-03-17T16:54:26Z

superset/views/utils.py

+    if (
+        node_type == "CHART"
+        and node.get("meta")
+        and node.get("meta").get("chartId", 0) == slice_id


You could concat lines 336 and 337 to be,

and node.get("meta", {}).get("chartId") == slice_id
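
A small sketch of the simplified condition, assuming a layout node is a plain dict. The helper name `is_chart_node` is hypothetical. A missing `chartId` yields None, which never equals an integer slice_id, so no default is needed; using `or {}` additionally covers a "meta" key that is present but set to None.

```python
from typing import Any, Dict


def is_chart_node(node: Dict[str, Any], slice_id: int) -> bool:
    """Hypothetical helper mirroring the condition under review."""
    # `or {}` handles both a missing "meta" key and "meta": None.
    meta = node.get("meta") or {}
    return node.get("type") == "CHART" and meta.get("chartId") == slice_id
```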

serenajiang · 2020-03-17T17:49:57Z

superset/views/utils.py

+        children = node.get("children", [])
+        if children:
+            # for child_id in children_ids:
+            if any(


Since this is within the last if-statement, I think you can just write return any(...) instead of if any(...): return True.
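
Putting the suggestions on this hunk together, a recursive container check might look like the sketch below. It is pieced together from the fragments quoted in this review and is not the merged implementation; the function name `is_slice_in_container` and the layout shape (node id mapped to a dict with "type", "children" and "meta") are assumptions.

```python
from typing import Any, Dict

CONTAINER_TYPES = ["COLUMN", "GRID", "TABS", "TAB", "ROW"]


def is_slice_in_container(
    layout: Dict[str, Any], container_id: str, slice_id: int
) -> bool:
    """Sketch: does the subtree rooted at container_id contain the chart?"""
    node = layout.get(container_id) or {}
    node_type = node.get("type")
    meta = node.get("meta") or {}

    if node_type == "CHART" and meta.get("chartId") == slice_id:
        return True

    if node_type in CONTAINER_TYPES:
        # `or []` guards against a "children" key explicitly set to None.
        children = node.get("children") or []
        # Returning any(...) directly replaces `if any(...): return True`.
        return any(
            is_slice_in_container(layout, child_id, slice_id)
            for child_id in children
        )

    return False
```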

graceguo-supercat · 2020-03-17T21:45:28Z

> looking good so far!
>
> after looking through this and not knowing potential values for the filter object, i'm wondering if most of the .get("value", {}) and .get("value", []) statements should be .get("value") or {} and .get("value") or []

@etr2460 Thanks for the code review! I followed the documentation here:
https://docs.python.org/2/library/stdtypes.html#dict.get
and
https://stackoverflow.com/questions/33263623/dict-getkey-default-vs-dict-getkey-or-default

I would say most of my usage is for dicts that may not have the key at all. Why do you think they should be .get("value") or {} and .get("value") or []?

graceguo-supercat · 2020-03-17T21:48:51Z

superset/views/utils.py

+        scopes_by_filter_field = filter_scopes.get(filter_id, {})
+        for col, val in columns.items():
+            current_field_scopes = scopes_by_filter_field.get(col, {})
+            scoped_container_ids = current_field_scopes.get("scope", ["ROOT_ID"])


Here, the value returned by current_field_scopes.get("scope") could be [], which means the filter currently does not apply to any tab. If I use current_field_scopes.get("scope") or ["ROOT_ID"], the returned value becomes ["ROOT_ID"] (which means the filter applies globally).
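
To make the distinction concrete, here is a small sketch comparing the two spellings for the scope lookup. Both function names are hypothetical; only the `.get(...)` expressions come from the discussion.

```python
from typing import Any, Dict, List


def scope_with_default(field_scopes: Dict[str, Any]) -> List[str]:
    # A missing "scope" key falls back to the dashboard root (global),
    # but an explicit empty list is preserved: the filter applies nowhere.
    return field_scopes.get("scope", ["ROOT_ID"])


def scope_with_or(field_scopes: Dict[str, Any]) -> List[str]:
    # `or` treats [] as falsy, so an empty scope is widened to global.
    return field_scopes.get("scope") or ["ROOT_ID"]
```

The two agree when the key is missing or holds a non-empty list; they differ only for an explicit empty scope, which is the case described here.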

etr2460 · 2020-03-17T21:50:03Z

The main difference here applies if it's possible for a key to be set to a falsey value. If you have the dict:

d = {
  "key": None,
  "foo": False,
}

then:

d.get("key", []) # equals None
d.get("foo", {}) # equals False
d.get("key") or [] # equals []
d.get("foo") or {} # equals {}

This is really important to get right if you try to iterate through an array after "getting" it like this, as iterating over False or None will throw a TypeError.
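
The failure mode can be reproduced in a few lines. This is only an illustration; the helper `count_items` is made up for the example.

```python
from typing import Any


def count_items(values: Any) -> int:
    """Iterate over the values, as a for loop in the real code would."""
    return sum(1 for _ in values)


d = {"key": None}

try:
    # The key exists, so the [] default is never used and None is returned.
    count_items(d.get("key", []))
    raised = False
except TypeError:
    raised = True

# With `or`, the falsy None is replaced by [] and the loop is simply skipped.
safe_count = count_items(d.get("key") or [])
```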

graceguo-supercat · 2020-03-17T21:55:23Z

> The main difference here applies if it's possible for a key to be set to a falsey value. If you have the dict:
>
> d = {
>   "key": None,
>   "foo": False,
> }
>
> then:
>
> d.get("key", []) # equals None
> d.get("foo", {}) # equals False
> d.get("key") or [] # equals []
> d.get("foo") or {} # equals {}
>
> This is really important to get right if you try to iterate through an array after "getting" it like this, as iterating over False or None will throw

Correct. If the key exists, I should use its falsy value instead of assigning a default value. See my example above.

etr2460 · 2020-03-17T22:04:50Z

What i'm saying is if you default it to [] with or then you don't even need to check if it's falsey. The empty array will simply bypass the for loop

graceguo-supercat · 2020-03-17T22:17:30Z

> What i'm saying is if you default it to [] with or then you don't even need to check if it's falsey. The empty array will simply bypass the for loop

My question is about this comment:

> after looking through this and not knowing potential values for the filter object, i'm wondering if most of the .get("value", {}) and .get("value", []) statements should be .get("value") or {} and .get("value") or []

Why should I change most of dict.get(key, default_value) to dict.get(key) or default_value?

etr2460 · 2020-03-17T22:25:09Z

Here's a specific example:

You have filter_scopes = json_metadata.get("filter_scopes", {}) in the code, and then you later assume filter_scopes is a Dict. However, because the json_metadata can be edited by the user, it could be None or False. This would break with your current code, but not with json_metadata.get("filter_scopes") or {}

In general, I think we need to be a lot more defensive with parsing the json_metadata blob here, because an invalid metadata field isn't validated on save and it would break the code here
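
A minimal sketch of the defensive parsing being asked for, assuming json_metadata is a JSON string that users can edit freely. The helper name `load_filter_scopes` is hypothetical.

```python
import json
from typing import Any, Dict


def load_filter_scopes(json_metadata: str) -> Dict[str, Any]:
    """Hypothetical helper: read filter_scopes from a user-editable blob."""
    try:
        metadata = json.loads(json_metadata or "{}")
    except json.JSONDecodeError:
        # Invalid JSON is treated the same as no metadata at all.
        return {}
    if not isinstance(metadata, dict):
        return {}
    # `or {}` also covers "filter_scopes": null and "filter_scopes": false.
    return metadata.get("filter_scopes") or {}
```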

graceguo-supercat · 2020-03-18T00:36:53Z

superset/views/utils.py

+    # is chart in this dashboard?
+    if (
+        dashboard is None
+        or not dashboard.json_metadata


Here I check that json_metadata exists and is not None.

graceguo-supercat · 2020-03-18T01:17:16Z

superset/views/utils.py

+    ):
+        return []
+
+    try:


Added a try/except around parsing the dashboard metadata.

graceguo-supercat · 2020-03-18T01:19:39Z

superset/views/utils.py

+    filter_scopes: Dict,
+    default_filters: Dict[str, Dict[str, List]],
+    slice_id: int,
+) -> List[Dict[str, Any]]:


Added type checks for layout, filter_scopes and default_filters to make sure they are dictionaries.

etr2460

lgtm with one other comment. you might want to get a stamp from John or Beto too since i'm not super familiar with this code

etr2460 · 2020-03-18T01:25:09Z

superset/views/utils.py

+    filter_scopes: Dict,
+    default_filters: Dict[str, Dict[str, List]],
+    slice_id: int,
+) -> List[Dict[str, Any]]:


The typecheck here isn't quite right because you've already assumed the type is correct in the function statement. I'd recommend moving the type check into the parent function so that the types are correct here
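
One way to read this suggestion is sketched below: the parent validates the untrusted values, so the inner function can rely on its annotations. Both function names are hypothetical, the scope and layout handling is left out, and the sketch only checks the top-level types.

```python
from typing import Any, Dict, List


def _filters_from_checked_inputs(
    layout: Dict[str, Any],
    filter_scopes: Dict[str, Any],
    default_filters: Dict[str, Dict[str, List[Any]]],
) -> List[Dict[str, Any]]:
    # Inner function: no isinstance checks, the caller already did them.
    return [
        {"col": col, "op": "in", "val": val}
        for columns in default_filters.values()
        for col, val in columns.items()
    ]


def get_filters_checked(
    layout: Any, filter_scopes: Any, default_filters: Any
) -> List[Dict[str, Any]]:
    # Parent function: validate parsed JSON before passing it on.
    values = (layout, filter_scopes, default_filters)
    if not all(isinstance(value, dict) for value in values):
        return []
    return _filters_from_checked_inputs(layout, filter_scopes, default_filters)
```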

john-bodley · 2020-03-18T04:29:39Z

superset/tasks/cache.py

-            if col not in immune_fields:
-                extra_filters.append({"col": col, "op": "in", "val": val})
+    layout = json.loads(dashboard.position_json or "{}")
+    # do not apply filters if chart is immune to them


Nit. I would nix the comment as the following function has no reference to immunity.

john-bodley · 2020-03-18T04:36:25Z

superset/views/utils.py

+        dashboard is None
+        or not dashboard.json_metadata
+        or not dashboard.slices
+        or slice_id not in [slc.id for slc in dashboard.slices]


This is probably preferred as it should short circuit,

or not any(slc.id == slice_id for slc in dashboard.slices)
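
On short-circuiting: `any()` stops at the first truthy item only when it is fed lazily, for example by a generator expression. A list comprehension inside `any([...])` builds the whole list first. The sketch below uses a made-up logging generator, `ids_with_log`, to show how many ids each form visits.

```python
from typing import Iterable, Iterator, List


def ids_with_log(ids: Iterable[int], seen: List[int]) -> Iterator[int]:
    """Yield ids one by one, recording each id that is actually visited."""
    for slice_id in ids:
        seen.append(slice_id)
        yield slice_id


# Generator expression: evaluation stops as soon as a match is found.
seen_lazy: List[int] = []
found_lazy = any(sid == 2 for sid in ids_with_log([1, 2, 3, 4], seen_lazy))

# List comprehension: every id is visited before any() runs.
seen_eager: List[int] = []
found_eager = any([sid == 2 for sid in ids_with_log([1, 2, 3, 4], seen_eager)])
```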

pull-request-size bot added the size/L label Mar 16, 2020

john-bodley reviewed Mar 16, 2020

Grace added 2 commits March 16, 2020 23:25

[cache warm_up] warm_up slice with dashboard default_filters

f679e09

update Celery warmup tasks

7de2aed

graceguo-supercat force-pushed the gg-WarmupSliceInWithDash branch from 8f8f517 to 7de2aed March 17, 2020 06:27

etr2460 reviewed Mar 17, 2020

john-bodley reviewed Mar 17, 2020

graceguo-supercat requested a review from betodealmeida March 17, 2020 17:05

serenajiang reviewed Mar 17, 2020

fix code review comments

9752ff4

graceguo-supercat force-pushed the gg-WarmupSliceInWithDash branch from 2db3885 to 9752ff4 March 17, 2020 21:33

graceguo-supercat commented Mar 17, 2020

graceguo-supercat commented Mar 18, 2020

graceguo-supercat force-pushed the gg-WarmupSliceInWithDash branch from aa779f0 to a439997 March 18, 2020 01:15

graceguo-supercat commented Mar 18, 2020

graceguo-supercat commented Mar 18, 2020

etr2460 approved these changes Mar 18, 2020

add try catch and type checking for parsed dash metadata

2c2b020

graceguo-supercat force-pushed the gg-WarmupSliceInWithDash branch from a439997 to 2c2b020 March 18, 2020 01:38

john-bodley reviewed Mar 18, 2020

extra code review fix

ee963e5

john-bodley approved these changes Mar 18, 2020

graceguo-supercat merged commit adebd40 into apache:master Mar 18, 2020

graceguo-supercat deleted the gg-WarmupSliceInWithDash branch March 22, 2020 05:06

mistercrunch added the 🏷️ bot and 🚢 0.36.0 labels Feb 28, 2024
