style(mypy): enforcing mypy typing for superset.connectors module #9824

Conversation

john-bodley
Member

@john-bodley john-bodley commented May 17, 2020

SUMMARY

Adding mypy type enforcement to the superset.connectors module.

Note this PR was much more involved/complex than I had originally thought, given the dependency on other modules which were untyped. Additionally, the overall code quality is quite poor: large complex functions, numerous variables which were either optional or of mixed types (Union), variable redefinition, etc. meant that the logic was quite hard to grok.

Note this PR may introduce some regressions due to the necessary restructuring; however, I sense the long-term benefits of having the entire repo typed outweigh the short-term concerns regarding possible (minor) regressions.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TEST PLAN

CI.

ADDITIONAL INFORMATION

  • Has associated issue:
  • Changes UI
  • Requires DB Migration.
  • Confirm DB Migration upgrade and downgrade tested.
  • Introduces new feature or API
  • Removes existing feature or API

@john-bodley john-bodley force-pushed the john-bodley--mypy-enforcement-connectors branch from 9939433 to 507086a Compare May 17, 2020 16:41
@villebro
Member

Oh boy, gonna need to take some sedatives before this 😄 Thanks for taking this on, @john-bodley, I'll review shortly.

@codecov-io

codecov-io commented May 17, 2020

Codecov Report

Merging #9824 into master will decrease coverage by 4.54%.
The diff coverage is 84.36%.

@@            Coverage Diff             @@
##           master    #9824      +/-   ##
==========================================
- Coverage   70.88%   66.33%   -4.55%     
==========================================
  Files         585      586       +1     
  Lines       30432    30670     +238     
  Branches     3152     3158       +6     
==========================================
- Hits        21571    20346    -1225     
- Misses       8749    10143    +1394     
- Partials      112      181      +69     
Flag          Coverage Δ
#cypress      ?
#javascript   59.24% <ø> (-0.02%) ⬇️
#python       71.27% <84.36%> (+0.25%) ⬆️

Impacted Files                         Coverage Δ
superset/viz_sip38.py                  0.00% <0.00%> (ø)
superset/jinja_context.py              81.73% <40.00%> (-3.99%) ⬇️
superset/views/base.py                 73.36% <41.66%> (+0.14%) ⬆️
superset/connectors/druid/models.py    82.45% <88.00%> (-0.15%) ⬇️
superset/viz.py                        72.02% <91.66%> (+0.11%) ⬆️
superset/connectors/sqla/models.py     88.83% <95.00%> (+0.27%) ⬆️
superset/connectors/base/models.py     90.11% <100.00%> (-0.05%) ⬇️
superset/connectors/base/views.py      75.00% <100.00%> (+3.57%) ⬆️
superset/connectors/druid/views.py     68.91% <100.00%> (+0.42%) ⬆️
superset/connectors/sqla/views.py      77.86% <100.00%> (+0.87%) ⬆️
... and 181 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 53b58ed...a7e5c32.

@john-bodley
Member Author

@villebro I started out thinking this would be a somewhat mindless/trivial change to kill some time whilst sheltering-in-place. It definitely became more complex/tedious than I initially had hoped.

@john-bodley john-bodley force-pushed the john-bodley--mypy-enforcement-connectors branch 2 times, most recently from 368dc7e to 9cea662 Compare May 17, 2020 18:52
@john-bodley john-bodley changed the title style: enforcing mypy typing for connectors chore(mypy): enforcing mypy typing for connectors May 17, 2020
@john-bodley john-bodley force-pushed the john-bodley--mypy-enforcement-connectors branch from 9cea662 to c9c7a59 Compare May 17, 2020 20:09
@@ -841,6 +858,7 @@ def get_sqla_query( # sqla

  where_clause_and = []
  having_clause_and: List = []
+ assert filter
Member Author

Note this could be None per the function signature, in which case line #862 would throw an exception. However, to the best of my knowledge this hasn't happened, so filter is probably never None.
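
As a minimal sketch of why the assert satisfies mypy (the function and parameter names below are hypothetical, not Superset's): asserting that an Optional argument is truthy narrows its type for the rest of the function, at the cost of an AssertionError at runtime if the assumption ever turns out to be wrong.

```python
from typing import Dict, List, Optional


def collect_columns(filters: Optional[List[Dict[str, str]]]) -> List[str]:
    """Hypothetical stand-in for code that loops over an Optional argument."""
    # Without this assert, mypy reports that an Optional list may not be
    # iterable. Note the assert rejects an empty list as well as None.
    assert filters
    return [flt["col"] for flt in filters]
```

An explicit `if filters is None: raise ValueError(...)` would narrow the type in the same way while surviving `python -O`, which strips assert statements.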

  """
  Delete function logic, override to implement different logic
  deletes the record with primary_key = primary_key

  :param primary_key:
  record primary key to delete
  """
- item = self.datamodel.get(primary_key, self._base_filters)
+ item = self.datamodel.get(primary_key, self._base_filters) # type: ignore
Member Author

Because FAB isn't typed and there are no stubs, it seems simpler to just ignore these.

- self.x_metric = form_data.get("x")
- self.y_metric = form_data.get("y")
- self.z_metric = form_data.get("size")
+ self.x_metric = form_data["x"]
Member Author

I believe these are all required; otherwise utils.get_metric_name(...) etc. would return None and cause issues when trying to index the Pandas DataFrame.
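
The difference between the two access styles can be sketched as follows. The dictionary contents and the `require` helper are hypothetical and only illustrate the typing consequence: `.get()` yields an Optional value, whereas indexing yields a plain value and fails fast on a missing key.

```python
from typing import Dict, Optional

# Hypothetical form data in which the "size" key is absent.
form_data: Dict[str, str] = {"x": "sum__num", "y": "avg__price"}

# .get() returns None for a missing key, so the result is Optional[str]
# and every later use has to account for None.
z_metric: Optional[str] = form_data.get("size")

# Indexing treats the key as required: the result is a plain str, and a
# missing key fails immediately with KeyError.
x_metric: str = form_data["x"]


def require(key: str) -> str:
    """Hypothetical helper that turns the KeyError into a clearer message."""
    try:
        return form_data[key]
    except KeyError:
        raise ValueError(f"missing required form field: {key}") from None
```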

@john-bodley john-bodley force-pushed the john-bodley--mypy-enforcement-connectors branch from c9c7a59 to 8142a6c Compare May 18, 2020 02:16
@john-bodley john-bodley changed the title chore(mypy): enforcing mypy typing for connectors style(mypy): enforcing mypy typing for connectors May 18, 2020
@john-bodley john-bodley force-pushed the john-bodley--mypy-enforcement-connectors branch 3 times, most recently from eef8562 to a7e5c32 Compare May 18, 2020 05:11
Member

@villebro villebro left a comment

First pass

@@ -93,7 +93,7 @@ class BaseDatasource(
  update_from_object_fields: List[str]

  @declared_attr
- def slices(self):
+ def slices(self) -> relationship:
Member

Comment on lines 185 to 186
def select_star(self) -> str:
pass
Member

This probably needs to be Optional[str] or alternatively raise NotImplementedError.

Member Author

@villebro a NotImplementedError causes havoc for Druid, which is why this is pass. I made this an Optional[str].
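
A minimal sketch of the Optional[str] approach agreed on above, using hypothetical simplified classes rather than Superset's real datasource models: a body that falls through returns None, so the annotation has to admit None and callers have to handle it.

```python
from typing import Optional


class BaseDatasourceSketch:
    """Hypothetical, simplified stand-in for a datasource base class."""

    def select_star(self) -> Optional[str]:
        # A bare `pass` body returns None implicitly, so `-> str` would be
        # a lie; Optional[str] matches the runtime behaviour.
        return None


class SqlDatasourceSketch(BaseDatasourceSketch):
    def select_star(self) -> Optional[str]:
        return "SELECT * FROM my_table"


def describe(datasource: BaseDatasourceSketch) -> str:
    # Callers must deal with the None case explicitly.
    sql = datasource.select_star()
    return sql if sql is not None else "(no SELECT * available)"
```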

@@ -449,7 +460,7 @@ def get_perm(self) -> Optional[str]:

  @classmethod
  def import_obj(cls, i_metric: "DruidMetric") -> "DruidMetric":
- def lookup_obj(lookup_metric: DruidMetric) -> Optional[DruidMetric]:
+ def lookup_obj(lookup_metric: "DruidMetric") -> Optional["DruidMetric"]:
Member

Not sure how a nested function referencing its parent method's class is handled by mypy, but if this was working before, we should probably be ok leaving this without quotes?
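
On the runtime side of this question, the sketch below (with a hypothetical `MetricSketch` class standing in for DruidMetric) suggests why the unquoted form worked before: the nested function's annotations are evaluated only when the enclosing method is called, by which point the class already exists. To the best of my understanding mypy accepts both spellings, so the quotes are a matter of consistency rather than necessity.

```python
from typing import Dict, Optional


class MetricSketch:
    """Hypothetical stand-in for DruidMetric."""

    registry: Dict[str, "MetricSketch"] = {}

    def __init__(self, name: str) -> None:
        self.name = name

    @classmethod
    def import_obj(cls, i_metric: "MetricSketch") -> "MetricSketch":
        # The method signature is evaluated while the class body is still
        # executing, so the class name has to be quoted here.

        def lookup_obj(lookup: MetricSketch) -> Optional[MetricSketch]:
            # This def runs only when import_obj is called, by which time
            # the class exists, so the unquoted name resolves at runtime.
            return cls.registry.get(lookup.name)

        found = lookup_obj(i_metric)
        if found is None:
            cls.registry[i_metric.name] = i_metric
            return i_metric
        return found
```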

DbapiDescriptionRow = Tuple[
str, str, Optional[str], Optional[str], Optional[int], Optional[int], bool
]
DbapiDescription = Union[List[DbapiDescriptionRow], Tuple[DbapiDescriptionRow, ...]]
DbapiResult = List[Union[List[Any], Tuple[Any, ...]]]
FilterValue = Union[float, int, str]
FilterValues = Union[FilterValue, List[FilterValue], Tuple[FilterValue]]
Granularity = Union[str, Dict[str, Union[str, float]]]
Metric = Union[Dict[str, str], str]
QueryObject = Dict[str, Any]
Member

We should probably call this QueryObjectDict to distinguish from superset.common.query_object.QueryObject.

@@ -558,8 +569,8 @@ def get_perm(self) -> str:
obj=self
)

def update_from_object(self, obj):
return NotImplementedError()
Member

😄
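
The smiley presumably refers to `return NotImplementedError()`, which hands the exception back as an ordinary value instead of raising it. The hypothetical classes below sketch the difference; a return annotation of `-> None` on the first method would let mypy flag the mistake.

```python
class ReturnsError:
    def update_from_object(self, obj: object) -> Exception:
        # Bug: the exception instance is returned as an ordinary value,
        # so the caller carries on as if nothing had happened.
        return NotImplementedError()


class RaisesError:
    def update_from_object(self, obj: object) -> None:
        # Intended behaviour: signal that subclasses must override this.
        raise NotImplementedError()
```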

@@ -817,7 +829,7 @@ def granularity(
  "year": "P1Y",
  }

- granularity: Dict[str, Union[str, float]] = {"type": "period"}
+ granularity = {"type": "period"}
Member

I would have thought this turns into a Dict[str, str] unless explicitly defined as Granularity, thus causing problems below when numeric values are added?

Member Author

@villebro I guess it differs from the Granularity type which I defined previously (which adds to the confusion). When including the typing here it causes a conflict with the return type as it can also be a string per line 812.

A couple of learnings from having typed a bunch of Python code:

  1. Optional[...] values are difficult to handle and thus should be avoided if possible. It seems they're often used out of laziness.
  2. Mixed types, i.e., those requiring a Union[...], normally indicate poor quality and unnecessarily complex code.

I think the return type of granularity is a clear indication of (2). Also I believe the true return type should be Union[Dict[str, Union[str, float]], str], but mypy complains about this.
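
The inference issue raised in this thread can be sketched in isolation. The function below is a simplified, hypothetical stand-in for the Druid `granularity` helper, and the dictionary keys are illustrative: a literal such as `{"type": "period"}` is inferred as Dict[str, str] unless the variable is annotated, and using a separate local name for the dictionary avoids clashing with the string return path.

```python
from typing import Dict, Union

# Hypothetical alias for the mixed return type discussed above.
GranularitySketch = Union[str, Dict[str, Union[str, float]]]


def granularity(period_name: str, origin_ms: float = 0.0) -> GranularitySketch:
    if period_name == "all":
        return "all"
    # Without the annotation, mypy infers Dict[str, str] from the literal
    # and rejects the float assignment below.
    spec: Dict[str, Union[str, float]] = {"type": "period"}
    spec["period"] = period_name
    spec["origin"] = origin_ms
    return spec
```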

@@ -1240,7 +1268,7 @@ def run_query( # druid
  qry["limit"] = row_limit
  client.scan(**qry)
  elif (IS_SIP_38 and columns) or (
- not IS_SIP_38 and len(groupby) == 0 and not having_filters
+ not IS_SIP_38 and (not groupby or len(groupby) == 0) and not having_filters
Member

I wonder if a simple falsy check here would be more pythonic: not groupby? I can't think of an edge case for which it wouldn't give the correct results.

Member Author

@villebro thanks. I agree and made this change elsewhere.
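
A quick check of the equivalence suggested above, assuming `groupby` is either None or a list (the function names are hypothetical): both None and the empty list are falsy, so the explicit length test adds nothing.

```python
from typing import List, Optional


def no_groupby_verbose(groupby: Optional[List[str]]) -> bool:
    # The form from the diff: a None check plus an explicit length check.
    return not groupby or len(groupby) == 0


def no_groupby(groupby: Optional[List[str]]) -> bool:
    # The suggested form: one falsy check covers both None and [].
    return not groupby
```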


- order_by = None
+ order_by = None # type: ignore
Member

If we define this as order_by: Optional[Metric], we can probably omit the ignore?

Member Author

@villebro the issue here is that order_by has already been defined on line 1291,

order_by = utils.get_metric_name(timeseries_limit_metric)

I declared the type as order_by: Optional[str] = None.
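
The resolution described here can be sketched as follows (the function and parameter names are hypothetical). mypy fixes a variable's type at its first binding, so declaring Optional[str] up front lets later assignments of either a str or None type-check without an ignore comment.

```python
from typing import Optional


def pick_order_by(limit_metric: Optional[str], use_limit: bool) -> Optional[str]:
    # Declaring the type at the first binding covers both later outcomes.
    # Had the first binding been a plain str, a subsequent
    # `order_by = None` would be reported as an incompatible assignment.
    order_by: Optional[str] = None
    if use_limit and limit_metric:
        order_by = limit_metric
    return order_by
```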

@john-bodley john-bodley force-pushed the john-bodley--mypy-enforcement-connectors branch from a7e5c32 to dac1337 Compare May 18, 2020 20:21
@john-bodley
Member Author

@villebro thanks for the feedback. I've addressed your comments. Note #9833 is a prerequisite to fix the docs check which is failing.

@john-bodley john-bodley force-pushed the john-bodley--mypy-enforcement-connectors branch 2 times, most recently from 66c673b to 867d57f Compare May 19, 2020 04:29
@codecov-commenter

codecov-commenter commented May 19, 2020

Codecov Report

Merging #9824 into master will decrease coverage by 0.00%.
The diff coverage is 84.39%.

@@            Coverage Diff             @@
##           master    #9824      +/-   ##
==========================================
- Coverage   71.22%   71.21%   -0.01%     
==========================================
  Files         585      585              
  Lines       30828    30864      +36     
  Branches     3237     3237              
==========================================
+ Hits        21957    21981      +24     
- Misses       8762     8774      +12     
  Partials      109      109              
Flag          Coverage Δ
#cypress      53.77% <ø> (-0.08%) ⬇️
#javascript   59.38% <ø> (ø)
#python       71.40% <84.39%> (+<0.01%) ⬆️

Impacted Files                         Coverage Δ
superset/viz_sip38.py                  0.00% <0.00%> (ø)
superset/jinja_context.py              81.73% <40.00%> (-3.99%) ⬇️
superset/views/base.py                 73.36% <46.15%> (-0.09%) ⬇️
superset/connectors/druid/models.py    82.47% <88.00%> (-0.21%) ⬇️
superset/viz.py                        71.97% <90.90%> (ø)
superset/connectors/sqla/models.py     88.83% <95.00%> (+0.23%) ⬆️
superset/connectors/base/models.py     90.11% <100.00%> (-0.35%) ⬇️
superset/connectors/base/views.py      75.00% <100.00%> (+3.57%) ⬆️
superset/connectors/druid/views.py     68.91% <100.00%> (+0.21%) ⬆️
superset/connectors/sqla/views.py      77.86% <100.00%> (+0.34%) ⬆️
... and 9 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 9edfc8f...eeb2729.

@john-bodley john-bodley force-pushed the john-bodley--mypy-enforcement-connectors branch from 867d57f to 8c4c77a Compare May 19, 2020 05:15
@john-bodley john-bodley force-pushed the john-bodley--mypy-enforcement-connectors branch from 8c4c77a to eef2c96 Compare May 19, 2020 20:57
@@ -985,11 +988,11 @@ def is_adhoc_metric(metric) -> bool:
  )


- def get_metric_name(metric):
- return metric["label"] if is_adhoc_metric(metric) else metric
+ def get_metric_name(metric: Metric) -> str:
Member Author

@villebro the code works if the metric is None (this can happen when a key exists in the form-data but the value is None) even though the Metric type is explicitly Union[Dict[str, str], str]. Generally I think we should strive to ensure only valid inputs are provided to functions (otherwise the logic can get quite complex), i.e., I prefer,

if metric:
    name = get_metric_name(metric)
    ...

rather than,

name = get_metric_name(metric)

if name:
   ...

Member

I agree.
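
A self-contained sketch of the preferred pattern, with a simplified `get_metric_name` (the real helper decides via is_adhoc_metric; here any dict is assumed to be an ad hoc metric with a "label" key) and a hypothetical caller: the None check happens before the call, so the helper can keep the strict Metric signature.

```python
from typing import Dict, Optional, Union

Metric = Union[Dict[str, str], str]


def get_metric_name(metric: Metric) -> str:
    # Simplified assumption: a dict is an ad hoc metric carrying a label.
    return metric["label"] if isinstance(metric, dict) else metric


def size_column(form_data: Dict[str, Optional[Metric]]) -> Optional[str]:
    """Hypothetical caller: the key may exist with a None value."""
    metric = form_data.get("size")
    # Guard first, so only valid inputs ever reach get_metric_name.
    if metric:
        return get_metric_name(metric)
    return None
```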

@john-bodley john-bodley force-pushed the john-bodley--mypy-enforcement-connectors branch from eef2c96 to b66071f Compare May 19, 2020 23:14
@john-bodley john-bodley force-pushed the john-bodley--mypy-enforcement-connectors branch 2 times, most recently from 6c67fde to 3c40dd7 Compare May 20, 2020 17:24
@john-bodley
Member Author

@villebro would you mind taking another pass at this PR? All the CI tasks are now passing.

@john-bodley john-bodley changed the title style(mypy): enforcing mypy typing for connectors style(mypy): enforcing mypy typing for superset.connectors May 22, 2020
@villebro
Member

Sure thing, @john-bodley, I'll review during the weekend.

Member

@villebro villebro left a comment

One non-blocking comment/question; apart from that, looking very good.

Comment on lines 40 to 42
FlaskResponse = Union[
    flask.wrappers.Response,
    werkzeug.wrappers.Response,
    Base,
    Union[Base, Status],
    Union[Base, Status, Headers],
]
Member

As we already have a wrapping Union here, couldn't this be expressed as Union[..., Base, Status, Headers]?

Member Author

@villebro, that's actually a typo as the latter Unions should be Tuples. Also, given that flask.wrappers.Response is derived from werkzeug.wrappers.Response, this could be:

FlaskResponse = Union[
    werkzeug.wrappers.Response,
    Base,
    Tuple[Base, Status],
    Tuple[Base, Status, Headers],
]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, that makes sense. Unrelated comment, it's really strange how practically none of the major libraries have implemented type annotations yet. Yes, stubs exist for many, but that doesn't really help when reading code.
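
Why the Union-versus-Tuple distinction matters can be checked with the standard typing module alone. The definitions of `Base`, `Status` and `Headers` below are assumptions made for illustration (the thread does not show them): a nested Union simply flattens into its members, whereas a Tuple describes the `(body, status)` pair a view actually returns.

```python
from typing import Any, Dict, Tuple, Union

# Assumed definitions, for illustration only.
Base = Union[bytes, str]
Status = Union[int, str]
Headers = Dict[str, Any]

# A nested Union collapses into one flat Union: "a body OR a status".
flattened = Union[Base, Status]

# A Tuple keeps the positions: "a body AND a status".
pair = Tuple[Base, Status]


def not_found() -> Tuple[Base, Status]:
    # A typical Flask-style return value: response body plus status code.
    return "missing", 404
```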

@john-bodley john-bodley force-pushed the john-bodley--mypy-enforcement-connectors branch from 3c40dd7 to fb2bcaa Compare May 25, 2020 18:26
@villebro
Member

LGTM after the Union to Tuple change 👍

@john-bodley john-bodley force-pushed the john-bodley--mypy-enforcement-connectors branch from fb2bcaa to eeb2729 Compare May 25, 2020 19:07
@john-bodley john-bodley merged commit 7f6dbf8 into apache:master May 25, 2020
@john-bodley john-bodley deleted the john-bodley--mypy-enforcement-connectors branch May 25, 2020 19:32
pkdotson pushed a commit to preset-io/superset that referenced this pull request May 26, 2020
Co-authored-by: John Bodley <john.bodley@airbnb.com>
@john-bodley john-bodley changed the title style(mypy): enforcing mypy typing for superset.connectors style(mypy): enforcing mypy typing for superset.connectors module May 29, 2020
@@ -256,8 +259,6 @@ def parse_human_datetime(s: Optional[str]) -> Optional[datetime]:
  >>> year_ago_1 == year_ago_2
  True
  """
- if not s:
Contributor

FOUND IT. @john-bodley

john-bodley added a commit to john-bodley/superset that referenced this pull request Jun 3, 2020
john-bodley added a commit that referenced this pull request Jun 3, 2020
Co-authored-by: John Bodley <john.bodley@airbnb.com>
auxten pushed a commit to auxten/incubator-superset that referenced this pull request Nov 20, 2020
Co-authored-by: John Bodley <john.bodley@airbnb.com>
auxten pushed a commit to auxten/incubator-superset that referenced this pull request Nov 20, 2020
Co-authored-by: John Bodley <john.bodley@airbnb.com>
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 0.37.0 labels Mar 12, 2024
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels size/XL 🚢 0.37.0

6 participants