Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add time grain blacklist and addons to config.py #5380

Merged
merged 11 commits into from
Jul 31, 2018
Merged

Add time grain blacklist and addons to config.py #5380

merged 11 commits into from
Jul 31, 2018

Conversation

villebro
Copy link
Member

@villebro villebro commented Jul 12, 2018

  • Refactor time grain logic to avoid the use of conflicting names and durations
  • Harmonize discrepancies in time grains, e.g. P3M (Druid) vs P0.25Y (all others)
  • Add TIME_GRAIN_BLACKLIST to config.py to be able to remove unwanted time grains, e.g. 5/10/15 mins
  • Add TIME_GRAIN_ADDONS and TIME_GRAIN_ADDON_FUNCTIONS to config.py to add custom time grains not natively supported by Superset
  • Add tests for blacklist and validity of defined time grains

@codecov-io
Copy link

codecov-io commented Jul 12, 2018

Codecov Report

Merging #5380 into master will increase coverage by 0.03%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #5380      +/-   ##
==========================================
+ Coverage   63.28%   63.31%   +0.03%     
==========================================
  Files         349      349              
  Lines       22121    22141      +20     
  Branches     2457     2457              
==========================================
+ Hits        13999    14019      +20     
  Misses       8108     8108              
  Partials       14       14
Impacted Files Coverage Δ
superset/models/core.py 86.95% <100%> (ø) ⬆️
superset/db_engine_specs.py 55.26% <100%> (+1.2%) ⬆️
superset/config.py 92.48% <100%> (+0.17%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3b6cafc...df61c68. Read the comment docs.

@mistercrunch
Copy link
Member

@betodealmeida

@villebro
Copy link
Member Author

Also added functionality to be able to add new custom time grains in config.py (TIME_GRAIN_ADDONS and TIME_GRAIN_ADDON_FUNCTIONS). This could be used to move some less used time grains out of db_engine_specs, while still allowing for end users to define their own custom time grains where needed.

Question: in config.py, which would be preferable: using a blacklist to remove predefined options (e.g. VIZ_TYPE_BLACKLIST), or effectively defining a whitelist (e.g. LANGUAGES)?

@villebro villebro changed the title Add time grain blacklist and tests for validity Add time grain blacklist and addons to config.py Jul 21, 2018
@villebro
Copy link
Member Author

villebro commented Jul 23, 2018

Comments would be much appreciated @betodealmeida @mistercrunch . This PR would make it easier to manage and add new time grains in production environments. If this PR is not aligned with the future direction of Superset I will be happy make necessary changes or close the PR.

Copy link
Member

@betodealmeida betodealmeida left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great, @villebro! It greatly simplifies the definition of new durations and reduces a lot of the duplication.

I have just a few comments on the code, mostly nits, and there's one place where the function is wrong (probably due to an accidental deletion).

Question: in config.py, which would be preferable: using a blacklist to remove predefined options (e.g. VIZ_TYPE_BLACKLIST), or effectively defining a whitelist (e.g. LANGUAGES)?

I think doing a blacklist works better here.

'P1Y': 'year',
'1969-12-28T00:00:00Z/P1W': 'week_start_sunday',
'1969-12-29T00:00:00Z/P1W': 'week_start_monday',
'P1W/1970-01-03T00:00:00Z': 'week_ending_saturday',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also add 'P1W/1970-01-04T00:00:00Z': 'week_ending_sunday', here, for completeness?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, done.



def _create_time_grains_tuple(time_grains, time_grain_functions, blacklist):
ret_list = list()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: use [] to instantiate an empty list. It's negligibly faster, but the preferred style for Superset.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, will try to remember this in the future.

blacklist = blacklist if blacklist else []
for duration, func in time_grain_functions.items():
if duration not in blacklist:
name = time_grains.get(duration, None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: {}.get already returns None when the key is not present, so this can be written simply as:

name = time_grains.get(duration)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point.

@@ -65,12 +94,22 @@ class BaseEngineSpec(object):
"""Abstract class for database engine specific configurations"""

engine = 'base' # str as defined in sqlalchemy.engine.engine
time_grains = tuple()
time_grain_functions = dict()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Use {} when instantiating an empty dictionary.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

'P1M': "DATE_TRUNC('month', {col}) AT TIME ZONE 'UTC'",
'P0.25Y': "DATE_TRUNC('quarter', {col}) AT TIME ZONE 'UTC'",
'P1Y': "DATE_TRUNC('year', {col}) AT TIME ZONE 'UTC'",
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! This is much cleaner now. Love it!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -52,6 +52,35 @@

Grain = namedtuple('Grain', 'name label function duration')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can get rid of the name attribute in Grain. I checked the code and it seems like it's not being used anywhere, and grains can be uniquely identified by the duration now that they're ISO durations.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I misremembered that core.grains_dict() created a dict that mapped duration -> grain and name -> grain, but the latter was in fact label -> grain. I've now removed name.

grain_functions = cls.time_grain_functions.copy()
grain_addon_functions = config.get('TIME_GRAIN_ADDON_FUNCTIONS', {})
grain_functions.update(grain_addon_functions.get(cls.engine, {}))
return _create_time_grains_tuple(grains, grain_functions, blacklist)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be better to call this get_time_grains, in order to be more explicit and also because it has changed from an attribute to a method.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed.

)
time_grain_functions = {
None: '{col}',
'PT1S': "{col}) AT TIME ZONE 'UTC'",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'PT1S': "DATE_TRUNC('second', {col}) AT TIME ZONE 'UTC'",

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, good catch!

None: '{col}',
'PT1S': "DATE_TRUNC('SECOND', {col})",
'PT1M': "DATE_TRUNC('MINUTE', {col})",
'PT5M': "DATEADD(MINUTE, FLOOR(DATE_PART(MINUTE, {col}) / 5) * 5, \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Were you able to test these against Snowflake?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes; not sure if it's the most efficient way of doing it, but managed to get good performance on a semi-large dataset, so should be good to go.

Copy link
Member Author

@villebro villebro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All proposed changes implemented.

@villebro
Copy link
Member Author

@betodealmeida I tried to remove name from Grain and updating time_column_grains() and data() in connectors/sqla/models.py, but got slightly overwhelmed by the frontend code. For instance, controls.jsx seems to have similar but slightly different grains, and am unsure where these definitions are used:

https://github.com/apache/incubator-superset/blob/bea0a0aa1590d073365ecf74d75a054643dbe3b7/superset/assets/src/explore/controls.jsx#L750-L776

I'm sure I will be able to sort out how the frontend logic works, but it will take time, so I propose merging this and removing name in a separate PR.

@betodealmeida
Copy link
Member

@villebro, no worries, I'll take a stab at removing name from Grain.

Copy link
Member

@betodealmeida betodealmeida left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! Thank you so much for doing this, @villebro!

@villebro
Copy link
Member Author

Great, thanks @betodealmeida !

@villebro
Copy link
Member Author

@betodealmeida FYI rebased to resolve conflict, any chance of getting this merged out of the way?

@betodealmeida
Copy link
Member

@mistercrunch, can you merge this? I still don't have merging privilege.

@mistercrunch mistercrunch merged commit c1e6c68 into apache:master Jul 31, 2018
wenchma pushed a commit to wenchma/incubator-superset that referenced this pull request Nov 16, 2018
* Add interim grains

* Refactor and add blacklist

* Change PT30M to PT0.5H

* Linting

* Linting

* Add time grain addons to config.py and refactor engine spec logic

* Remove redundant import and clean up config.py

* Fix bad rebase

* Implement changes proposed by @betodealmeida

* Revert removal of name from Grain

* Linting
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 0.28.0 labels Feb 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 0.28.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants