pyhive is installed without Hive dependencies #8933

@snazzyfox

Apache Airflow version: 1.10.10

(appears to also affect master)

What happened:

When Airflow is installed with Hive support via apache-airflow[hive], running a query with HiveServer2Hook raises the following exception:

...
  File "/usr/local/lib/python3.7/site-packages/airflow/hooks/hive_hooks.py", line 828, in get_conn
    database=schema or db.schema or 'default')
  File "/usr/local/lib/python3.7/site-packages/pyhive/hive.py", line 94, in connect
    return Connection(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/pyhive/hive.py", line 152, in __init__
    import sasl
ModuleNotFoundError: No module named 'sasl'

What you expected to happen:

No error should be raised: installing apache-airflow[hive] should bring in every dependency that HiveServer2Hook needs.

How to reproduce it:

Any minimal DAG that uses HiveServer2Hook triggers the error. A connection to a working Hive cluster is not required, since the failure comes from the missing dependency rather than from the cluster.

Probable Reason:

For pyhive to work with Hive, it should be installed as pyhive[hive]. The hive extra brings in the sasl package.
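One way to confirm this diagnosis in an affected environment is to check which modules are importable. The sketch below is a minimal check and not part of Airflow or PyHive: `missing_modules` is a hypothetical helper. `sasl` comes from the traceback above, while `thrift` and `thrift_sasl` are my assumption about what else the hive extra pulls in.

```python
import importlib.util

# `sasl` is the module named in the traceback. `thrift` and `thrift_sasl`
# are ASSUMED to be the other modules installed by pyhive[hive].
ASSUMED_HIVE_MODULES = ["sasl", "thrift", "thrift_sasl"]


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(ASSUMED_HIVE_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All assumed Hive modules are importable.")
```

If `sasl` is reported missing, explicitly installing pyhive[hive] alongside Airflow should work around the problem until the dependency declaration is fixed.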

This is not caught in testing, since the tests run with all dependencies installed, and the packages required by pyhive[hive] happen to also be used for Kerberos.

I know this will be a much bigger conversation, but maybe it's worth it to consider testing operators only with the dependencies they're supposed to rely on?
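As a rough illustration of that idea, a test could simulate an environment where a given dependency is absent, without needing a separate installation. This is only a sketch under my own assumptions, not how Airflow's test suite works: `without_modules` and `import_fails` are hypothetical helpers, and the approach relies on the standard behaviour that a `None` entry in `sys.modules` makes the import of that name fail.

```python
import importlib
import sys
from contextlib import contextmanager
from unittest import mock


@contextmanager
def without_modules(*names):
    """Make importing the given top-level modules fail inside the block,
    as if the corresponding packages were not installed."""
    blocked = {name: None for name in names}
    # patch.dict restores sys.modules to its previous state on exit.
    with mock.patch.dict(sys.modules, blocked):
        yield


def import_fails(name):
    """Return True if importing `name` raises ImportError."""
    try:
        importlib.import_module(name)
    except ImportError:
        return True
    return False


if __name__ == "__main__":
    # Hypothetical usage: code under test would run inside this block and
    # should either avoid `sasl` or fail with a clear error message.
    with without_modules("sasl"):
        print("sasl import fails:", import_fails("sasl"))
```

A real per-extra test setup would more likely install each extra into its own clean environment; blocking modules in-process is just a cheaper approximation.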

Labels: kind:bug
