[Hive Engine Spec] Fix latest partition logic #8098
Left a few comments.
superset/db_engine_specs/hive.py
Outdated
@@ -299,16 +299,19 @@ def get_columns(
@classmethod
def where_latest_partition(cls, table_name, schema, database, qry, columns=None):
This should be identical to what's in BaseEngineSpec: https://github.com/apache/incubator-superset/blob/master/superset/db_engine_specs/base.py#L565-L572
thanks, fixed
superset/db_engine_specs/hive.py
Outdated
if c.get("name") == col_name:
qry = qry.where(Column(col_name) == value)

return qry
return False
return None?
Ah yup, that got changed in the base engine spec but not updated here in your PR, I think.
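To illustrate why the return value matters, here is a minimal sketch. It assumes the caller keeps the original query whenever the partition lookup yields None. The helper apply_partition_filter and the list-of-strings "query" are hypothetical stand-ins for illustration, not Superset's actual code.

```python
from typing import Any, Dict, List, Optional, Sequence

# Hypothetical stand-in: a "query" is just a list of WHERE clauses here.
Query = List[str]


def where_latest_partition(
    query: Query,
    col_names: Sequence[str],
    values: Optional[Sequence[str]],
    columns: Sequence[Dict[str, Any]],
) -> Optional[Query]:
    """Return a filtered copy of the query, or None if no partition is known."""
    if not values:
        return None
    known = {c.get("name") for c in columns}
    filtered = list(query)
    for col_name, value in zip(col_names, values):
        # Only filter on partition columns that the table actually exposes.
        if col_name in known:
            filtered.append(f"{col_name} = '{value}'")
    return filtered


def apply_partition_filter(query: Query, partition_query: Optional[Query]) -> Query:
    """Assumed caller behaviour: fall back to the original query on None."""
    return partition_query if partition_query is not None else query
```

With a caller written this way, a False return value would pass the "is not None" check and replace the query, which is why the override should return None as well.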
superset/db_engine_specs/hive.py
Outdated
@@ -343,7 +346,7 @@ def select_star(
latest_partition: bool = True,
cols: Optional[List[Dict[str, Any]]] = None,
) -> str:
return BaseEngineSpec.select_star(
return super(PrestoEngineSpec, cls).select_star(
As HiveEngineSpec already extends PrestoEngineSpec, this wouldn't really be needed. It is my understanding that the reason for calling BaseEngineSpec here is that we specifically don't want to use the default behaviour of PrestoEngineSpec, but rather what's defined in BaseEngineSpec. Not being familiar with how Hive/Presto works in this case, I think someone with deeper knowledge of their respective intricacies would need to check this.
So, you're right that we want to call the BaseEngineSpec version of select_star, but that function calls where_latest_partition, which we need from the HiveEngineSpec. The Presto version of select_star doesn't do much more than call super().select_star, but I think updating it to super(BaseEngineSpec, cls) is probably the right move.
Actually, this was right before. Using super(PrestoEngineSpec, cls) calls the select_star of HiveEngineSpec's grandparent (BaseEngineSpec). Then, because cls is passed in, we go back to running Hive's versions of the other functions once the base class's select_star gets called.
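The dispatch behaviour described here can be checked with a small, self-contained sketch. The three classes below are simplified stand-ins whose names are made up for illustration; only the super(Parent, cls) mechanics are standard Python.

```python
class BaseSpec:
    @classmethod
    def select_star(cls) -> str:
        # cls is whatever class the call was bound to, so this
        # dispatches to the most derived where_latest_partition.
        return f"base select_star with {cls.where_latest_partition()} filter"

    @classmethod
    def where_latest_partition(cls) -> str:
        return "base"


class PrestoSpec(BaseSpec):
    @classmethod
    def select_star(cls) -> str:
        return "presto wrapper around " + super().select_star()

    @classmethod
    def where_latest_partition(cls) -> str:
        return "presto"


class HiveSpec(PrestoSpec):
    @classmethod
    def select_star(cls) -> str:
        # Start the method lookup *after* PrestoSpec in the MRO, which
        # skips Presto's select_star but keeps cls bound to HiveSpec.
        return super(PrestoSpec, cls).select_star()

    @classmethod
    def where_latest_partition(cls) -> str:
        return "hive"


print(HiveSpec.select_star())  # base select_star with hive filter
print(BaseSpec.select_star())  # base select_star with base filter
```

Calling BaseSpec.select_star() directly binds cls to BaseSpec, so the Hive override of where_latest_partition is never reached. That matches the symptom in the PR summary, where partition filters were not added.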
"""Hive partitions look like ds={partition name}"""
if not df.empty:
return df.ix[:, 0].max().split("=")[1]
return [df.ix[:, 0].max().split("=")[1]]
return None here to make sure there's an explicit return.
done
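As a rough sketch of the shape being asked for (a list on success, an explicit None otherwise), the function below works on a plain list of partition strings instead of a DataFrame so that it runs without pandas. The function name and input type are assumptions for illustration only.

```python
from typing import List, Optional


def latest_partition_from_names(partitions: List[str]) -> Optional[List[str]]:
    """Hive partitions look like ds={partition name}."""
    if partitions:
        # max() picks the lexicographically greatest "ds=..." entry;
        # the value is wrapped in a list to match the multi-column API.
        return [max(partitions).split("=")[1]]
    # Explicit return so the empty case is visible to readers and linters.
    return None
```

For example, latest_partition_from_names(["ds=2019-08-01", "ds=2019-08-02"]) returns ["2019-08-02"], and an empty list returns None.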
986b275 to 5db953c
comments addressed
2b6077b to c2feb0a
c2feb0a to aa6630b
Codecov Report
@@ Coverage Diff @@
## master #8098 +/- ##
==========================================
- Coverage 65.93% 65.92% -0.01%
==========================================
Files 485 485
Lines 22890 22895 +5
Branches 2521 2521
==========================================
+ Hits 15092 15094 +2
- Misses 7667 7670 +3
Partials 131 131
Thanks for this; I'm bewildered by why the linter complains about the super call, but LGTM!
SUMMARY
When the Presto db engine spec was changed to support multiple partition columns, it broke Hive, because the Hive spec extends the Presto spec. Table previews and copy select star broke because partition filters were no longer added. This PR updates the functions that the Hive spec overrides so that they return and accept the same data types as the Presto versions.
TEST PLAN
ADDITIONAL INFORMATION
REVIEWERS
@michellethomas @villebro @betodealmeida @serenajiang