Add drop_partition functionality for HiveMetastoreHook #9472
Looks like this is failing static checks. I recommend running the pre-commit checks.
addressing Ace's review comments
@vanka56 Can you rebase and try the static tests again?
@jhtimmins I tried that. It's still failing. Do you know why?
@ashb Why are the checks always failing?
@vanka56 Look at the error logs.
fix build issues
Fixing build issues
Lazy formatting for logging
@Acehaidrey Thank you! I corrected those errors now.
still
Lazy logging
can you fix that
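For context on the "lazy logging" item above: it presumably refers to the common lint rule that logging calls should pass format arguments separately rather than formatting the message up front. Which checker flagged it in this PR is not stated in the thread, so that part is an assumption. A minimal sketch of the difference, using only the standard library (the `Expensive` class and logger name are hypothetical):

```python
import logging


class Expensive:
    """Hypothetical object that counts how often it is rendered to a string."""

    def __init__(self):
        self.str_calls = 0

    def __str__(self):
        self.str_calls += 1
        return "partition-spec"


def demo():
    log = logging.getLogger("lazy_logging_sketch")
    log.setLevel(logging.WARNING)  # INFO records are discarded by this logger

    eager, lazy = Expensive(), Expensive()

    # Eager: the string is built before log.info is even called.
    log.info("Dropping partition %s" % eager)

    # Lazy: the argument is passed through; formatting is skipped because
    # the INFO level is disabled.
    log.info("Dropping partition %s", lazy)

    return eager.str_calls, lazy.str_calls
```

Calling `demo()` returns `(1, 0)`: the eager form renders the argument even though the record is dropped, while the lazy form never does.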
trailing white-space
@jhtimmins @turbaszek All static checks have passed now. Is it good now?
def test_drop_partition(self):
    self.assertTrue(self.hook.drop_partitions(self.table, db=self.database,
                                              part_vals=[DEFAULT_DATE_DS]))
Does this test require an external service? Does it create side effects?
@turbaszek Yes, it uses the Hive metastore Thrift client. It drops the partition from the test table set up for unit testing.
Do you think we can mock the client? That way we reduce the side effects, and the test will not require any external service.
@turbaszek Methods like test_max_partition(self) also do the same. Moreover, it should not be an expensive operation. Do you think we really have to mock the call?
Side effects in tests are not a good practice, and Airflow already has a number of flaky tests. So I would be in favor of using mocking.
Sounds fair. Let me get this changed :)
addressing review comments
mocking thrift drop_partition
flake issues
removing white space
lint issues
corrected typos
assert_called_once
mock object path change
@mock.patch('airflow.providers.apache.hive.hooks.hive.HiveMetastoreHook.drop_partitions')
def test_drop_partition(self, thrift_mock):
    self.hook.drop_partitions(self.table, db=self.database, part_vals=[DEFAULT_DATE_DS])
    thrift_mock.assert_called_once_with(self.table, db=self.database, part_vals=[DEFAULT_DATE_DS])
This test now checks whether drop_partitions was called. And it was, one line above the assertion :)
To use a mock, we should mock the metastore client, because we trust that it has been tested and works as expected. So I would suggest something like this:
@mock.patch('airflow.providers.apache.hive.hooks.hive.HiveMetastoreHook.table_exists')
@mock.patch('airflow.providers.apache.hive.hooks.hive.HiveMetastoreHook.metastore')
def test_drop_partition(self, metastore_mock, table_exist_mock):
    # Here we mock behaviour of `with self.metastore as client`
    client_drop_partition = metastore_mock.__enter__.return_value
    # Here we want to be sure that we enter the right place of if clause
    table_exist_mock.return_value = True
    # Here we call the method
    self.hook.drop_partitions(self.table, db=self.database, part_vals=[DEFAULT_DATE_DS])
    # First lets check if we check if table exists
    table_exist_mock.assert_called_once_with(self.table, self.database)
    # And now we check if the underlying client.drop_partition method was called
    client_drop_partition.assert_called_once_with(self.table, db=self.database, part_vals=[DEFAULT_DATE_DS])
The comments can be skipped. I'm also not sure if the __enter__ mock is 100% right (but it should be something like this). In case of any questions, I'm happy to help.
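On the `__enter__` question: with `unittest.mock.MagicMock`, `__enter__.return_value` is the object bound by `with ... as client`, so the assertion most likely belongs on that object's `drop_partition` attribute rather than on the object itself. Below is a self-contained sketch of the pattern. `SketchHook` is a hypothetical stand-in, not the real `HiveMetastoreHook`, and the argument order passed to `client.drop_partition` is an assumption made only for illustration.

```python
from unittest import mock


class SketchHook:
    """Hypothetical stand-in for a hook; NOT the real HiveMetastoreHook."""

    def __init__(self, metastore):
        # Assumption: `metastore` is a context manager that yields a client.
        self.metastore = metastore

    def table_exists(self, table_name, db="default"):
        raise RuntimeError("would contact an external service")

    def drop_partitions(self, table_name, part_vals, db="default"):
        # Assumed shape of the method under test.
        if not self.table_exists(table_name, db):
            return False
        with self.metastore as client:
            return client.drop_partition(db, table_name, part_vals)


def run_sketch():
    # MagicMock supports the context-manager protocol out of the box.
    metastore_mock = mock.MagicMock()
    # This is the object that `with self.metastore as client` binds.
    client = metastore_mock.__enter__.return_value
    client.drop_partition.return_value = True

    hook = SketchHook(metastore_mock)
    # Patch table_exists so the test takes the "table exists" branch
    # without touching any external service.
    with mock.patch.object(SketchHook, "table_exists", return_value=True) as table_exists_mock:
        result = hook.drop_partitions("my_table", ["2015-01-01"], db="my_db")

    # The hook asked whether the table exists...
    table_exists_mock.assert_called_once_with("my_table", "my_db")
    # ...and then delegated to the client's drop_partition method.
    client.drop_partition.assert_called_once_with("my_db", "my_table", ["2015-01-01"])
    return result
```

Unlike patching `drop_partitions` itself, this leaves the method under test unpatched, so the assertions exercise its real control flow while only the external collaborators are replaced.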
@turbaszek Lol :). Made further changes as per your advice. Let me know what you think.
addressed review comments
lints
Adding drop partition method for HiveMetastoreHook. This comes in handy for operators involving Hive operations. Added a unit test case as well.
Make sure to mark the boxes below before creating PR: [x]
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.
Read the Pull Request Guidelines for more information.