Description
Apache Airflow version: 1.10.10
(appears to also affect master)
What happened:
When Airflow is installed with Hive support via apache-airflow[hive], running a query with HiveServer2Hook raises the following exception:
...
File "/usr/local/lib/python3.7/site-packages/airflow/hooks/hive_hooks.py", line 828, in get_conn
database=schema or db.schema or 'default')
File "/usr/local/lib/python3.7/site-packages/pyhive/hive.py", line 94, in connect
return Connection(*args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/pyhive/hive.py", line 152, in __init__
import sasl
ModuleNotFoundError: No module named 'sasl'
What you expected to happen:
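The query should run without raising ModuleNotFoundError. Installing apache-airflow[hive] should pull in everything HiveServer2Hook needs.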
How to reproduce it:
Any minimal DAG that uses HiveServer2Hook triggers the error. A working Hive cluster is not required, because the import of sasl fails before any connection is attempted.
Probable Reason:
For pyhive to work with Hive, it should be installed as pyhive[hive]. The hive extra brings in the sasl package.
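To confirm whether an environment is affected without running a DAG, a small check along these lines may help. It is only a sketch: `missing_modules` is a hypothetical helper, not part of Airflow or pyhive, and `sasl` is the only module name taken from the traceback above. Any other names passed in would be assumptions about what the hive extra installs.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    # find_spec returns None when no importable module of that name exists.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # 'sasl' is the module named in the traceback; extend the list as needed.
    missing = missing_modules(["sasl"])
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All checked modules are importable.")
```

If `sasl` is reported as missing, installing pyhive[hive] in the same environment should resolve it, per the reasoning above.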
This is not caught in testing because the tests run with all dependencies installed, and the packages required by pyhive[hive] happen to also be used for Kerberos.
I know this will be a much bigger conversation, but maybe it's worth it to consider testing operators only with the dependencies they're supposed to rely on?
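One lightweight way to approximate this inside an existing test suite, without building a separate environment per extra, is to hide a module for the duration of a test. The sketch below is an illustration, not how Airflow's tests are organised: `hide_modules`, `uses_json` and `fails_without` are hypothetical names, and `json` stands in for an optional dependency such as sasl so the example runs anywhere.

```python
import sys
from contextlib import contextmanager
from unittest import mock


@contextmanager
def hide_modules(*names):
    """Make `import <name>` fail inside the block, as if it were not installed."""
    # A None entry in sys.modules makes the import system raise ImportError.
    # patch.dict restores the original entries when the block exits.
    with mock.patch.dict(sys.modules, {name: None for name in names}):
        yield


def uses_json():
    """Hypothetical stand-in for code that imports a dependency lazily."""
    import json
    return json.dumps({"ok": True})


def fails_without(module_name, func):
    """Return True if calling `func` raises ImportError while the module is hidden."""
    with hide_modules(module_name):
        try:
            func()
        except ImportError:
            return True
    return False
```

A test for a hook could then assert that the hook still works with every module outside its declared extra hidden. This only simulates a missing package, so a truly isolated install per extra would remain the stronger check.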