Navigation Menu

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Hive Engine Spec] Fix latest partition logic #8098

Merged

Conversation

etr2460
Copy link
Member

@etr2460 etr2460 commented Aug 23, 2019

CATEGORY

Choose one

  • Bug Fix
  • Enhancement (new features, refinement)
  • Refactor
  • Add tests
  • Build / Development Environment
  • Documentation

SUMMARY

When the Presto db engine spec was changed to support multiple partition columns, it broke Hive because it extends Presto. Table previews and copy select star broke because they wouldn't add partition filters. This PR updates the functions that the Hive spec extends to return and accept the same data types

TEST PLAN

  • Test copy select statement and ensure it has a partition
  • Test table preview
  • Test jinja latest partition function

ADDITIONAL INFORMATION

  • Has associated issue:
  • Changes UI
  • Requires DB Migration.
  • Confirm DB Migration upgrade and downgrade tested.
  • Introduces new feature or API
  • Removes existing feature or API

REVIEWERS

@michellethomas @villebro @betodealmeida @serenajiang

Copy link
Member

@villebro villebro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few comments.

@@ -299,16 +299,19 @@ def get_columns(
@classmethod
def where_latest_partition(cls, table_name, schema, database, qry, columns=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, fixed

if c.get("name") == col_name:
qry = qry.where(Column(col_name) == value)

return qry
return False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return None?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yup, that got changed in the base engine spec but not updated here in your PR i think

@@ -343,7 +346,7 @@ def select_star(
latest_partition: bool = True,
cols: Optional[List[Dict[str, Any]]] = None,
) -> str:
return BaseEngineSpec.select_star(
return super(PrestoEngineSpec, cls).select_star(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As HiveEngineSpec already extends PrestoEngineSpec, this wouldn't really be needed. It is my understanding, that the reason for calling BaseEngineSpec here is because we specifically don't want to use the default behaviour of PrestoEngineSpect, but rather what's defined in BaseEngineSpec. Not being familiar with how Hive/Presto works in this case someone with deeper knowledge of their respective intricacies would probably need to check this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, you're right that we want to call the BaseEngineSpec version of select_star, but that function calls where_latest_partition which we need from the HiveEngineSpec. the Presto version of select_star doesn't do much more than call super().select_star, but I think updating it to super(BaseEngineSpec, cls) is probably the right move.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, this was right before. using super(PrestoEngineSpec, cls) calls HiveEngineSpec's grandparent's (BaseEngineSpec) version of select_star. Then, by passing in cls afterwards, we go back to running Hive's version of functions after the base class's select_star gets called

"""Hive partitions look like ds={partition name}"""
if not df.empty:
return df.ix[:, 0].max().split("=")[1]
return [df.ix[:, 0].max().split("=")[1]]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return None here to make sure there's an explicit return.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@etr2460 etr2460 force-pushed the erik-ritter--fix-hive-latest-partition branch from 986b275 to 5db953c Compare August 23, 2019 17:02
Copy link
Member Author

@etr2460 etr2460 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comments addressed

if c.get("name") == col_name:
qry = qry.where(Column(col_name) == value)

return qry
return False
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yup, that got changed in the base engine spec but not updated here in your PR i think

@@ -299,16 +299,19 @@ def get_columns(
@classmethod
def where_latest_partition(cls, table_name, schema, database, qry, columns=None):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, fixed

"""Hive partitions look like ds={partition name}"""
if not df.empty:
return df.ix[:, 0].max().split("=")[1]
return [df.ix[:, 0].max().split("=")[1]]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -343,7 +346,7 @@ def select_star(
latest_partition: bool = True,
cols: Optional[List[Dict[str, Any]]] = None,
) -> str:
return BaseEngineSpec.select_star(
return super(PrestoEngineSpec, cls).select_star(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, you're right that we want to call the BaseEngineSpec version of select_star, but that function calls where_latest_partition which we need from the HiveEngineSpec. the Presto version of select_star doesn't do much more than call super().select_star, but I think updating it to super(BaseEngineSpec, cls) is probably the right move.

@etr2460 etr2460 force-pushed the erik-ritter--fix-hive-latest-partition branch 2 times, most recently from 2b6077b to c2feb0a Compare August 23, 2019 17:32
@etr2460 etr2460 force-pushed the erik-ritter--fix-hive-latest-partition branch from c2feb0a to aa6630b Compare August 23, 2019 19:41
@codecov-io
Copy link

Codecov Report

Merging #8098 into master will decrease coverage by <.01%.
The diff coverage is 22.22%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #8098      +/-   ##
==========================================
- Coverage   65.93%   65.92%   -0.01%     
==========================================
  Files         485      485              
  Lines       22890    22895       +5     
  Branches     2521     2521              
==========================================
+ Hits        15092    15094       +2     
- Misses       7667     7670       +3     
  Partials      131      131
Impacted Files Coverage Δ
superset/db_engine_specs/hive.py 40.55% <20%> (-0.3%) ⬇️
superset/db_engine_specs/presto.py 77.98% <33.33%> (+0.05%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 610b35a...aa6630b. Read the comment docs.

Copy link
Member

@villebro villebro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this; I'm bewildered by why the linter complains about the super call, but LGTM!

@michellethomas michellethomas merged commit 5d8da6a into apache:master Aug 26, 2019
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 0.35.0 labels Feb 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels size/S 🚢 0.35.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants