[cache warm_up] warm_up slice with dashboard default_filters #9311
CONTAINER_TYPES = ["COLUMN", "GRID", "TABS", "TAB", "ROW"]


def get_dashboard_extra_filters(
@graceguo-supercat I thought this logic already existed in the backend and the idea we discussed was possibly refactoring the logic to ensure that this as well as the Celery warmup logic was using the same dashboard filter logic.
Celery task is here: https://github.com/apache/incubator-superset/blob/000a038af193f2770dfe23a053bd1ff5e9e72bd4/superset/tasks/cache.py#L39
It is missing the logic to check a default filter's scope when the scope is not global, but it is otherwise pretty close. So my plan is to add that logic for the warm_up endpoint, then update the Celery task to reuse this function in another PR.
Oh, never mind: the Celery warmup change is very simple, so I just committed the Celery warmup changes here. And all unit tests can be reused without any change!
Codecov Report
@@           Coverage Diff           @@
##           master    #9311   +/-   ##
=======================================
  Coverage   59.08%   59.08%
=======================================
  Files         374      374
  Lines       12202    12205    +3
  Branches     2986     2989    +3
=======================================
+ Hits         7209     7211    +2
- Misses       4814     4815    +1
  Partials      179      179
8f8f517 to 7de2aed
looking good so far!
After looking through this and not knowing the potential values for the filter object, I'm wondering if most of the .get("value", {}) and .get("value", []) statements should be .get("value") or {} and .get("value") or [].
superset/views/utils.py
Outdated
    slice_id: int, dashboard_id: int
) -> List[Dict[str, Any]]:
    session = db.session()
    slc = session.query(Slice).filter_by(id=slice_id).one_or_none()
Seems like you don't actually need slc here; instead you could look for a slice in dashboard.slices based on the slice_id passed into the function. That would save a db call.
fixed.
superset/views/utils.py
Outdated
if (
    node_type == "CHART"
    and node.get("meta")
    and node.get("meta").get("chartId", 0) == slice_id
You don't need the default value of 0 for get here because None will never be equal to slice_id.
You could concat lines 336 and 337 to be,
and node.get("meta", {}).get("chartId") == slice_id
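A minimal sketch of the combined check, using a hypothetical `is_chart_node` helper and made-up layout nodes (neither is taken from the PR). One caveat worth keeping in mind: `node.get("meta", {})` only covers a missing key, so the sketch uses `or {}` as an assumption to also tolerate an explicit `None`.

```python
from typing import Any, Dict


def is_chart_node(node: Dict[str, Any], slice_id: int) -> bool:
    """Hypothetical helper: True if `node` is the CHART node for `slice_id`."""
    # A missing "meta" or "chartId" yields None, which never equals an int id,
    # so no default of 0 is needed on the inner get.
    meta = node.get("meta") or {}
    return node.get("type") == "CHART" and meta.get("chartId") == slice_id


chart = {"type": "CHART", "meta": {"chartId": 5}}
no_meta = {"type": "CHART"}
null_meta = {"type": "CHART", "meta": None}

matches = [is_chart_node(n, 5) for n in (chart, no_meta, null_meta)]
```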
superset/views/utils.py
Outdated
if node_type in CONTAINER_TYPES:
    children = node.get("children", [])
    if children:
This should be unnecessary. If you're concerned that children could be None or 0, then I'd recommend making the line above children = node.get("children") or []
fixed.
@betodealmeida would you mind taking a look as this PR refactors some of the logic related to your Celery task work.
superset/views/utils.py
Outdated
children = node.get("children", [])
if children:
    # for child_id in children_ids:
    if any(
Since this is within the last if-statement, I think you can just write return any(...) instead of if any(...): return True.
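To illustrate the `return any(...)` shape, here is a rough sketch of a recursive layout search. The function name `is_slice_in_container`, its signature, and the toy layout are assumptions made for illustration; only `CONTAINER_TYPES` and the `type`/`children`/`meta.chartId` keys come from the snippets in this thread.

```python
from typing import Any, Dict

CONTAINER_TYPES = ["COLUMN", "GRID", "TABS", "TAB", "ROW"]


def is_slice_in_container(
    layout: Dict[str, Any], container_id: str, slice_id: int
) -> bool:
    """Illustrative: does the subtree rooted at container_id hold slice_id?"""
    node = layout.get(container_id) or {}
    node_type = node.get("type")

    if node_type == "CHART":
        return (node.get("meta") or {}).get("chartId") == slice_id

    if node_type in CONTAINER_TYPES:
        children = node.get("children") or []
        # Returning any(...) directly replaces `if any(...): return True`.
        return any(
            is_slice_in_container(layout, child_id, slice_id)
            for child_id in children
        )

    return False


# Toy layout: one grid holding two tabs, each with a single chart.
layout = {
    "GRID_ID": {"type": "GRID", "children": ["TABS-1"]},
    "TABS-1": {"type": "TABS", "children": ["TAB-a", "TAB-b"]},
    "TAB-a": {"type": "TAB", "children": ["CHART-1"]},
    "TAB-b": {"type": "TAB", "children": ["CHART-2"]},
    "CHART-1": {"type": "CHART", "meta": {"chartId": 1}},
    "CHART-2": {"type": "CHART", "meta": {"chartId": 2}},
}
```

With this layout, chart 1 is found under `TAB-a` but chart 2 is not, while both are found under `GRID_ID`.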
fixed.
2db3885 to 9752ff4
@etr2460 Thanks for the code review! I followed the documentation here: I would say most of my usage is for dicts that don't have the key. Why do you think they should be
scopes_by_filter_field = filter_scopes.get(filter_id, {})
for col, val in columns.items():
    current_field_scopes = scopes_by_filter_field.get(col, {})
    scoped_container_ids = current_field_scopes.get("scope", ["ROOT_ID"])
At this place, the value returned by current_field_scopes.get("scope") could be [], which means this filter currently doesn't apply to any tab. If I used current_field_scopes.get("scope") or ["ROOT_ID"], the returned value would become ["ROOT_ID"] (which means the filter applies globally).
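The distinction can be checked with a small sketch. The two helper names below are hypothetical; only the `"scope"` key and the `["ROOT_ID"]` default come from the snippet above.

```python
from typing import Any, Dict, List


def scope_with_default(field_scopes: Dict[str, Any]) -> List[str]:
    # Missing key -> global scope; an explicit [] is preserved.
    return field_scopes.get("scope", ["ROOT_ID"])


def scope_with_or(field_scopes: Dict[str, Any]) -> List[str]:
    # Any falsy value, including an explicit [], collapses to global scope.
    return field_scopes.get("scope") or ["ROOT_ID"]


unscoped = {}                       # key absent: filter applies globally
applies_nowhere = {"scope": []}     # key present but empty
one_tab = {"scope": ["TAB-a"]}      # made-up container id
```

Only the first helper keeps "applies to no tab" distinct from "applies everywhere".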
The main difference here applies if it's possible for a key to be set to a falsey value. If you have the dict:
d = {
    "key": None,
    "foo": False,
}
then:
d.get("key", []) # equals None
d.get("foo", {}) # equals False
d.get("key") or [] # equals []
d.get("foo") or {} # equals {}
This is really important to get right if you try to iterate through an array after "getting" it like this, as iterating over False or None will throw
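As a sketch of the failure mode: in Python, iterating over `None` raises `TypeError`, and a default passed to `dict.get` does not help when the key is present with a `None` value. The `get_list` helper below is a hypothetical compromise (not something proposed in the thread) that keeps a stored list, including an empty one, but replaces any non-list value.

```python
from typing import Any, Dict, List

d = {"key": None, "foo": False}

# The default is ignored because "key" exists, so the loop iterates over None.
try:
    for _ in d.get("key", []):
        pass
    raised = False
except TypeError:
    raised = True


def get_list(data: Dict[str, Any], key: str) -> List[Any]:
    """Hypothetical helper: return the stored list, or [] for anything else."""
    value = data.get(key)
    return value if isinstance(value, list) else []
```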
Correct. If the key exists, I should use its falsy value instead of assigning a default value. See my example above. |
What i'm saying is if you default it to |
My question is this comment: Why should change most of |
Here's a specific example: You have In general, I think we need to be a lot more defensive with parsing the json_metadata blob here, because an invalid metadata field isn't validated on save and it would break the code here |
# is chart in this dashboard?
if (
    dashboard is None
    or not dashboard.json_metadata
Here I check that json_metadata exists and is not None.
aa779f0 to a439997
):
    return []

try:
Added a try/except around parsing the dashboard metadata.
    filter_scopes: Dict,
    default_filters: Dict[str, Dict[str, List]],
    slice_id: int,
) -> List[Dict[str, Any]]:
Added type checks for layout, filter_scopes and default_filters to make sure they are dictionaries.
The typecheck here isn't quite right because you've already assumed the type is correct in the function statement. I'd recommend moving the type check into the parent function so that the types are correct here
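One way to read this suggestion, as a sketch only: the caller parses and validates the JSON blobs, so the typed helper can trust its annotations. The function name `load_filter_context` and the assumption that `filter_scopes` and `default_filters` are plain nested objects inside `json_metadata` are illustrative; the real storage format may differ.

```python
import json
from typing import Any, Dict, Optional, Tuple

FilterContext = Tuple[Dict[str, Any], Dict[str, Any], Dict[str, Any]]


def load_filter_context(
    json_metadata: Optional[str], position_json: Optional[str]
) -> Optional[FilterContext]:
    """Hypothetical parent-side validation.

    Returns (layout, filter_scopes, default_filters), or None when the
    stored JSON is malformed or has unexpected types.
    """
    try:
        metadata = json.loads(json_metadata or "{}")
        layout = json.loads(position_json or "{}")
    except (TypeError, ValueError):  # JSONDecodeError subclasses ValueError
        return None

    if not isinstance(metadata, dict):
        return None

    filter_scopes = metadata.get("filter_scopes", {})
    default_filters = metadata.get("default_filters", {})

    # Validate once here so downstream functions can rely on Dict types.
    if not all(
        isinstance(value, dict)
        for value in (layout, filter_scopes, default_filters)
    ):
        return None

    return layout, filter_scopes, default_filters


good = load_filter_context(
    '{"filter_scopes": {}, "default_filters": {"1": {"region": ["US"]}}}', "{}"
)
bad = load_filter_context("not json", "{}")
```

A caller would return `[]` when the result is `None` and otherwise pass the three dictionaries on to the typed helper.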
lgtm with one other comment. you might want to get a stamp from John or Beto too since i'm not super familiar with this code
a439997 to 2c2b020
superset/tasks/cache.py
Outdated
if col not in immune_fields:
    extra_filters.append({"col": col, "op": "in", "val": val})
layout = json.loads(dashboard.position_json or "{}")
# do not apply filters if chart is immune to them
Nit: I would nix the comment, as the following function has no reference to immunity.
superset/views/utils.py
Outdated
dashboard is None
or not dashboard.json_metadata
or not dashboard.slices
or slice_id not in [slc.id for slc in dashboard.slices]
This is probably preferred, as a generator expression lets any() short-circuit:
or not any(slc.id == slice_id for slc in dashboard.slices)
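A quick sketch of the short-circuit behaviour. `any()` stops at the first truthy item only when it is fed a lazy iterable such as a generator expression; a list comprehension inside `any()` is built in full before `any()` sees it. `FakeSlice`, `dashboard_has_slice` and `tracked` are stand-ins invented for this example.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class FakeSlice:
    """Stand-in for a slice object; only the id attribute matters here."""
    id: int


def dashboard_has_slice(slices: Iterable[FakeSlice], slice_id: int) -> bool:
    # Generator expression: evaluation stops at the first matching slice.
    return any(slc.id == slice_id for slc in slices)


visited: List[int] = []


def tracked(slices: Iterable[FakeSlice]) -> Iterator[FakeSlice]:
    """Record which slices are actually examined."""
    for slc in slices:
        visited.append(slc.id)
        yield slc


slices = [FakeSlice(1), FakeSlice(2), FakeSlice(3)]
found = dashboard_has_slice(tracked(slices), 1)
```

After the call, `visited` holds only the first slice id, because the search ended as soon as the match was found.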
CATEGORY
Choose one
SUMMARY
Superset offers a couple of ways to warm up the cache:
https://github.com/apache/incubator-superset/blob/8764ae385206c8bdccba1e7aa42def1929f6fba7/superset/views/core.py#L1645
https://github.com/apache/incubator-superset/blob/000a038af193f2770dfe23a053bd1ff5e9e72bd4/superset/tasks/cache.py#L257
Superset will generate a query like
form_data={slice_id: _slice_id_}
so that we can warm up the query saved with a given slice_id.
Many dashboards now have default filters (and filters can have scopes), so warming up a single slice without dashboard context lowers our cache hit rate. Currently at Airbnb, 10% of dashboards have default filters, but 20% of dashboard landings have default_filters applied.
This PR adds dashboard context to the cache warm_up call. You can pass an extra dashboard_id parameter to the
warm_up
API, indicating that the slice is called within a given dashboard. It should generate a query like
TEST PLAN
added new unit tests
REVIEWERS
@john-bodley @serenajiang @etr2460 @dpgaspar @mistercrunch