Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove thrift as a core dependency #13471

Merged
merged 1 commit into from Jan 7, 2021
Merged

Conversation

kaxil
Copy link
Member

@kaxil kaxil commented Jan 4, 2021

thrift is a dependency for Apache Hive and it is not required by Core Airflow:

airflow/providers/apache/hive/hooks/hive.py:489:        # This is for pickling to work despite the thrift hive client not
airflow/providers/apache/hive/hooks/hive.py:500:        """Returns a Hive thrift client."""
airflow/providers/apache/hive/hooks/hive.py:502:        from thrift.protocol import TBinaryProtocol
airflow/providers/apache/hive/hooks/hive.py:503:        from thrift.transport import TSocket, TTransport
airflow/providers/apache/hive/hooks/hive.py:531:            from thrift_sasl import TSaslClientTransport
airflow/providers/apache/hive/sensors/hive_partition.py:41:    :param metastore_conn_id: reference to the metastore thrift service
airflow/providers/apache/hive/sensors/metastore_partition.py:28:    queries generated by the Metastore thrift service when hitting
airflow/providers/apache/hive/sensors/named_hive_partition.py:36:    :param metastore_conn_id: reference to the metastore thrift service

^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.

@kaxil kaxil requested review from ashb, potiuk and XD-DENG January 4, 2021 23:27
@kaxil
Copy link
Member Author

kaxil commented Jan 4, 2021

Test failure is unrelated -- Master is broken

@github-actions
Copy link

github-actions bot commented Jan 5, 2021

The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.

Copy link
Member

@XD-DENG XD-DENG left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change itself is good to me.

One more point (not sure if we should cover together here): dependency thrift_sasl is under kerberos currently, but actually its only usage (imported) is in Hive Hook. Is a fix needed for that?

@github-actions github-actions bot added the full tests needed We need to run full set of tests for this PR to merge label Jan 5, 2021
@github-actions
Copy link

github-actions bot commented Jan 5, 2021

The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest master at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

Copy link
Member

@potiuk potiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we remove (and only when you remove) anything from the list here, we should increase PIP_DEPENDENCIES_EPOCH_NUMBER approproately. I updated the outdated description in in #13409 and we should update increase the PIP_DEPENDENCIES_EPOCH_NUMBER number to point to both Dockerfile and Dockerfile.ci and to the right name. Let's merge #13409 first and then we can rebase on top of it, change the EPOCH and get it removed.

@ashb
Copy link
Member

ashb commented Jan 7, 2021

When we remove (and only when you remove) anything from the list here, we should increase PIP_DEPENDENCIES_EPOCH_NUMBER approproately. I updated the outdated description in in #13409 and we should update increase the PIP_DEPENDENCIES_EPOCH_NUMBER number to point to both Dockerfile and Dockerfile.ci and to the right name. Let's merge #13409 first and then we can rebase on top of it, change the EPOCH and get it removed.

Except in this case we haven't actually removed the dep, at least not what is installed via the extras ;)

`thrift` is a dependency for Apache Hive and it is not required by Core Airflow:

```
airflow/providers/apache/hive/hooks/hive.py:489:        # This is for pickling to work despite the thrift hive client not
airflow/providers/apache/hive/hooks/hive.py:500:        """Returns a Hive thrift client."""
airflow/providers/apache/hive/hooks/hive.py:502:        from thrift.protocol import TBinaryProtocol
airflow/providers/apache/hive/hooks/hive.py:503:        from thrift.transport import TSocket, TTransport
airflow/providers/apache/hive/hooks/hive.py:531:            from thrift_sasl import TSaslClientTransport
airflow/providers/apache/hive/sensors/hive_partition.py:41:    :param metastore_conn_id: reference to the metastore thrift service
airflow/providers/apache/hive/sensors/metastore_partition.py:28:    queries generated by the Metastore thrift service when hitting
airflow/providers/apache/hive/sensors/named_hive_partition.py:36:    :param metastore_conn_id: reference to the metastore thrift service
```
@potiuk
Copy link
Member

potiuk commented Jan 7, 2021

Except in this case we haven't actually removed the dep, at least not what is installed via the extras ;)

True. I just prefer to be cautious and do it always. The way how PIP treats install requires when doing eager dependency update (so after it gets merged) might work in unexpected ways - depending if thrift is pre-installed before or not.

But yeah, probably in this case there are no scenarios where it would hurt us.

@potiuk
Copy link
Member

potiuk commented Jan 7, 2021

It's OK for me. Just do it on your own risk ;)

@kaxil kaxil merged commit b5d921b into apache:master Jan 7, 2021
@kaxil kaxil deleted the move-thrift-req branch January 7, 2021 20:00
kaxil added a commit that referenced this pull request Jan 21, 2021
`thrift` is a dependency for Apache Hive and it is not required by Core Airflow:

```
airflow/providers/apache/hive/hooks/hive.py:489:        # This is for pickling to work despite the thrift hive client not
airflow/providers/apache/hive/hooks/hive.py:500:        """Returns a Hive thrift client."""
airflow/providers/apache/hive/hooks/hive.py:502:        from thrift.protocol import TBinaryProtocol
airflow/providers/apache/hive/hooks/hive.py:503:        from thrift.transport import TSocket, TTransport
airflow/providers/apache/hive/hooks/hive.py:531:            from thrift_sasl import TSaslClientTransport
airflow/providers/apache/hive/sensors/hive_partition.py:41:    :param metastore_conn_id: reference to the metastore thrift service
airflow/providers/apache/hive/sensors/metastore_partition.py:28:    queries generated by the Metastore thrift service when hitting
airflow/providers/apache/hive/sensors/named_hive_partition.py:36:    :param metastore_conn_id: reference to the metastore thrift service
```

(cherry picked from commit b5d921b)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
full tests needed We need to run full set of tests for this PR to merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants