Skip to content

[AIRFLOW-4879] add poll_interval and schema to PrestoHook#5515

Closed
eladkal wants to merge 4 commits intoapache:masterfrom
eladkal:4879
Closed

[AIRFLOW-4879] add poll_interval and schema to PrestoHook#5515
eladkal wants to merge 4 commits intoapache:masterfrom
eladkal:4879

Conversation

@eladkal
Copy link
Contributor

@eladkal eladkal commented Jul 2, 2019

Make sure you have checked all steps below.

Jira

Description

Added support for poll_interval and schema
Added docs to the hook

Future work: password is deprecated https://github.com/dropbox/PyHive/blob/master/pyhive/presto.py#L100
Currently Airflow bypass this by passing password to requests_kwargs.
We should stop doing that and align the hook with PyHive.
Since this is a breaking change I'll do that in a separated PR later to include this in Airflow 2.0

Tests

Added tests for connection and improved existing tests (some code was taken from another unmerged PR #2206
This is the first time that I'm writing tests so go easy on me :)

@eladkal eladkal changed the title [AIRFLOW-4879] - add poll_interval and schema to PrestoHook [AIRFLOW-4879] add poll_interval and schema to PrestoHook Jul 14, 2019
@mik-laj
Copy link
Member

mik-laj commented Jul 17, 2019

Travis is sad. Can you look at this?

49) ERROR: test_get_conn_port (tests.hooks.test_presto_hook.TestPrestoHookConn)
----------------------------------------------------------------------
   Traceback (most recent call last):
    /usr/lib/python3.5/unittest/mock.py line 1149 in patched
      arg = patching.__enter__()
    /usr/lib/python3.5/unittest/mock.py line 1205 in __enter__
      self.target = self.getter()
    /usr/lib/python3.5/unittest/mock.py line 1375 in <lambda>
      getter = lambda: _importer(target)
    /usr/lib/python3.5/unittest/mock.py line 1062 in _importer
      thing = _dot_lookup(thing, comp, import_path)
    /usr/lib/python3.5/unittest/mock.py line 1051 in _dot_lookup
      __import__(import_path)
   ImportError: No module named 'airflow.hooks.presto_hook.pyhive'; 'airflow.hooks.presto_hook' is not a package
======================================================================
48) ERROR: test_get_conn (tests.hooks.test_presto_hook.TestPrestoHookConn)
----------------------------------------------------------------------
   Traceback (most recent call last):
    /usr/lib/python3.5/unittest/mock.py line 1149 in patched
      arg = patching.__enter__()
    /usr/lib/python3.5/unittest/mock.py line 1205 in __enter__
      self.target = self.getter()
    /usr/lib/python3.5/unittest/mock.py line 1375 in <lambda>
      getter = lambda: _importer(target)
    /usr/lib/python3.5/unittest/mock.py line 1062 in _importer
      thing = _dot_lookup(thing, comp, import_path)
    /usr/lib/python3.5/unittest/mock.py line 1051 in _dot_lookup
      __import__(import_path)
   ImportError: No module named 'airflow.hooks.presto_hook.pyhive'; 'airflow.hooks.presto_hook' is not a package
======================================================================

@eladkal
Copy link
Contributor Author

eladkal commented Jul 18, 2019

@mik-Iaj I'm aware yet I'm not sure how to fix this.
The test is very similar to the postgres test https://github.com/apache/airflow/blob/master/tests/hooks/test_postgres_hook.py#L46

can you give some pointers to why
@mock.patch('airflow.hooks.postgres_hook.psycopg2.connect')
works but
@mock.patch('airflow.hooks.presto_hook.pyhive.presto.connect')
fails on No module named 'airflow.hooks.presto_hook.pyhive';

Adding poll_interval and schema to PrestoHook.
poll_interval  - int - how often to ask the Presto REST interface for a progress update, defaults to a second
schema -  string - defaults to ``default`
https://github.com/dropbox/PyHive/blob/master/pyhive/presto.py#L75

def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
self.poll_interval = kwargs.pop("poll_interval")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why you need to use *args and **kwargs here? It seems to me that normal parameters will also work here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. This can be set from the Connection, however if you want to overwrite the defaults in the constructor you must pull it from kwargs. This is how it's done in other db hooks:
https://github.com/apache/airflow/blob/master/airflow/hooks/mysql_hook.py#L48
https://github.com/apache/airflow/blob/master/airflow/hooks/mssql_hook.py#L36

Would you be more comfortable with specifying the parameters like:
https://github.com/apache/airflow/blob/master/airflow/contrib/hooks/sqoop_hook.py#L57

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mik-laj let me know what do you think so I'll continue with the approach which will be accepted.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer a list of parameters because it provides better IDE support. args kwargs should only be used if another solution cannot be used. I would be happy if you would add type annotations too.

@eladkal
Copy link
Contributor Author

eladkal commented Nov 6, 2019

Issues with rebasing.
Will open new PR

@eladkal eladkal closed this Nov 6, 2019
@eladkal eladkal deleted the 4879 branch November 6, 2019 18:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants