Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add drop_partition functionality for HiveMetastoreHook #9472

Merged
merged 31 commits into from
Jul 20, 2020

Conversation

vanka56
Copy link
Contributor

@vanka56 vanka56 commented Jun 22, 2020

Adding drop partition method for HiveMetastoreHook. This becomes handy for operators involving Hive operations. Added a a unit test case as well.

Make sure to mark the boxes below before creating PR: [x]

  • [ x] Description above provides context of the change
  • [ x] Unit tests coverage for changes (not needed for documentation changes)
  • [ x] Target Github ISSUE in description if exists
  • [ x] Commits follow "How to write a good git commit message"
  • [ x] Relevant documentation is updated including usage instructions.
  • [ x] I will engage committers as explained in Contribution Workflow Example.

In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.
Read the Pull Request Guidelines for more information.

airflow/providers/apache/hive/hooks/hive.py Outdated Show resolved Hide resolved
@jhtimmins
Copy link
Contributor

Looks like this is failing static checks. I recommend running pre-commit checks

addressing Ace's review comments
@Acehaidrey
Copy link
Contributor

@vanka56 can you rebase and try the static tests again

@vanka56
Copy link
Contributor Author

vanka56 commented Jun 26, 2020

@jhtimmins i tried that. its still failing. do you know why

@vanka56
Copy link
Contributor Author

vanka56 commented Jun 26, 2020

@ashb Why are the checks always failing?

@Acehaidrey
Copy link
Contributor

@vanka56 loook at the error logs

There were some pylint errors. Exiting

###########################################################################################
                   EXITING /opt/airflow/scripts/ci/in_container/run_pylint_tests.sh WITH STATUS CODE 1
###########################################################################################
************* Module tests.providers.apache.hive.hooks.test_hive
tests/providers/apache/hive/hooks/test_hive.py:561:0: C0301: Line too long (115/110) (line-too-long)
tests/providers/apache/hive/hooks/test_hive.py:561:8: W1503: Redundant use of assertTrue with constant value True (redundant-unittest-assert)

fix build issues
Fixing build issues
Lazy formatting for logging
@vanka56
Copy link
Contributor Author

vanka56 commented Jun 29, 2020

@Acehaidrey Thank you! I corrected those errors now.

@Acehaidrey
Copy link
Contributor

Acehaidrey commented Jun 29, 2020

@vanka56

airflow/providers/apache/hive/hooks/hive.py:793:26: W1202: Use lazy % formatting in logging functions (logging-format-interpolation)

still

Lazy logging
@Acehaidrey
Copy link
Contributor

@vanka56

Trim Trailing Whitespace..........................................................................................................Failed
- hook id: trailing-whitespace
- exit code: 1
- files were modified by this hook

can you fix that

@vanka56
Copy link
Contributor Author

vanka56 commented Jul 1, 2020

@jhtimmins @turbaszek all static checks have passed now. is it good now?

def test_drop_partition(self):
self.assertTrue(self.hook.drop_partitions(self.table, db=self.database,
part_vals=[DEFAULT_DATE_DS]))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this tests requires some external service? Does it create side effects?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@turbaszek Yes. it uses Hivemetastore Thrift client. it does the partition from the test table set up for unit testing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think we can mock the client? In this way we will reduce the side effects and the test will not require any external service

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@turbaszek Methods like def test_max_partition(self) also does the same. Moreover, it should not be an expensive operation. Do you think we really have to mock the call?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Side effects in tests are not a good practise and Airflow already has number of flaky tests. So I would be in favor of using mocking.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds fair. Let me get this changed :)

Comment on lines 560 to 564
@mock.patch('airflow.providers.apache.hive.hooks.hive.HiveMetastoreHook.drop_partitions')
def test_drop_partition(self, thrift_mock):
self.hook.drop_partitions(self.table, db=self.database, part_vals=[DEFAULT_DATE_DS])
thrift_mock.assert_called_once_with(self.table, db=self.database, part_vals=[DEFAULT_DATE_DS])

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test now check if drop_partitions was called. And it was a line above the assertion :)
To use mock we should mock the metastore client, because we believe it was tested and works as expected.So I would suggest something like this:

 @mock.patch('airflow.providers.apache.hive.hooks.hive.HiveMetastoreHook.table_exists')
 @mock.patch('airflow.providers.apache.hive.hooks.hive.HiveMetastoreHook.metastore')
      def test_drop_partition(self, metastore_mock, table_exist_mock):
          # Here we mock behaviour of `with self.metastore as client`
          client_drop_partition = metastore_mock.__enter__.return_value 

          # Here we want to be sure that we enter the right place of if clause
          table_exist_mock.return_value = True

          # Here we call the method
          self.hook.drop_partitions(self.table, db=self.database, part_vals=[DEFAULT_DATE_DS])

          # First lets check if we check if table exists
          table_exist_mock.assert_called_once_with(self.table, self.database)

          # And now we check if the underlying client.drop_partition method was called
          client_drop_partition.assert_called_once_with(self.table, db=self.database, part_vals=[DEFAULT_DATE_DS])

The comments can be skipped, I'm also not sure if the __enter__ mock is 100% right (something like this for sure). In case of any questions I'm happy to help

Copy link
Contributor Author

@vanka56 vanka56 Jul 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@turbaszek Lol :). Made further changed as per your advice. Let me know what do you think

@turbaszek turbaszek merged commit 5013fda into apache:master Jul 20, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants