Update our approach for executor-bound dependencies #22573

Merged
merged 1 commit into from
Mar 29, 2022
2 changes: 2 additions & 0 deletions .github/workflows/ci.yml
@@ -781,6 +781,8 @@ ${{ hashFiles('.pre-commit-config.yaml') }}"
env:
USE_AIRFLOW_VERSION: "wheel"
PACKAGE_FORMAT: "wheel"
- name: "Replace non-compliant providers with their 2.1-compliant versions"
run: ./scripts/ci/provider_packages/ci_make_providers_2_1_compliant.sh
- name: "Install and test provider packages and airflow on Airflow 2.1 files"
run: ./scripts/ci/provider_packages/ci_install_and_test_provider_packages.sh
env:
8 changes: 8 additions & 0 deletions README.md
@@ -379,6 +379,14 @@ The important dependencies are:
are very likely to introduce breaking changes across those so limiting it to MAJOR version makes sense
* `werkzeug`: the library is known to cause problems in new versions. It is tightly coupled with Flask
libraries, and we should update them together
* `celery`: Celery is a crucial component of Airflow as it is used for the CeleryExecutor (and similar). Celery
[follows SemVer](https://docs.celeryq.dev/en/stable/contributing.html?highlight=semver#versions), so
we should upper-bound it to the next MAJOR version. Also, when we bump the upper version of the library,
we should make sure the Celery Provider minimum Airflow version is updated.
* `kubernetes`: Kubernetes is a crucial component of Airflow as it is used for the KubernetesExecutor
(and similar). Kubernetes Python library [follows SemVer](https://github.com/kubernetes-client/python#compatibility),
so we should upper-bound it to the next MAJOR version. Also when we bump the upper version of the library,
we should make sure Kubernetes Provider minimum Airflow version is updated.
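
The upper-bounding rule in the two bullets above can be sketched in a few lines of Python. This is only an illustration, not code from the PR: `bounded_requirement` is a hypothetical helper, and the "newest tested" versions passed to it are assumptions chosen so that the results match the limits used in `setup.py`.

```python
def bounded_requirement(name: str, minimum: str, tested: str) -> str:
    """Build a requirement with a lower bound and a next-MAJOR upper bound.

    `minimum` is the oldest supported release. `tested` is the newest release
    that has been verified (an assumption supplied by the caller); the upper
    bound excludes the MAJOR version that follows it.
    """
    tested_major = int(tested.split(".")[0])
    return f"{name}>={minimum},<{tested_major + 1}"


# Hypothetical "tested" versions, chosen only to reproduce the limits in this PR.
print(bounded_requirement("celery", "5.2.3", "5.2.3"))        # celery>=5.2.3,<6
print(bounded_requirement("kubernetes", "21.7.0", "23.3.0"))  # kubernetes>=21.7.0,<24
```

The upper bound follows the newest tested MAJOR version rather than the minimum, which is why `kubernetes` ends up limited to `<24` even though its lower bound is `21.7.0`.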

### Approach for dependencies in Airflow Providers and extras

3 changes: 1 addition & 2 deletions airflow/providers/celery/provider.yaml
@@ -31,8 +31,7 @@ versions:
- 1.0.0

additional-dependencies:
- apache-airflow>=2.1.0
- celery~=5.1,>=5.1.2
- apache-airflow>=2.2.0

integrations:
- integration-name: Celery
2 changes: 1 addition & 1 deletion airflow/providers/cncf/kubernetes/provider.yaml
@@ -41,7 +41,7 @@ versions:
- 1.0.0

additional-dependencies:
- apache-airflow>=2.1.0
- apache-airflow>=2.3.0

integrations:
- integration-name: Kubernetes
2 changes: 0 additions & 2 deletions scripts/ci/constraints/ci_generate_constraints.sh
@@ -28,6 +28,4 @@ shift

build_images::prepare_ci_build

build_images::rebuild_ci_image_if_needed_with_group
@potiuk (Member Author) commented on Mar 29, 2022:

This is not needed: it just makes constraint generation longer, and we already have the images pulled at this stage.

      - name: >
          Wait for CI images
          ${{ needs.build-info.outputs.pythonVersions }}:${{ env.GITHUB_REGISTRY_PULL_IMAGE_TAG }}
        run: ./scripts/ci/images/ci_wait_for_and_verify_all_ci_images.sh


runs::run_generate_constraints
27 changes: 27 additions & 0 deletions scripts/ci/provider_packages/ci_make_providers_2_1_compliant.sh
@@ -0,0 +1,27 @@
#!/usr/bin/env bash
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
# shellcheck source=scripts/ci/libraries/_script_init.sh
. "$( dirname "${BASH_SOURCE[0]}" )/../libraries/_script_init.sh"

# Some of our provider sources are not Airflow 2.1 compliant any more
# We replace them with 2.1 compliant versions from PyPI to run the checks

cd "${AIRFLOW_SOURCES}" || exit 1
rm -rvf dist/apache_airflow_providers_cncf_kubernetes* dist/apache_airflow_providers_celery*
pip download --no-deps --dest dist apache-airflow-providers-cncf-kubernetes==3.0.0 \
apache-airflow-providers-celery==2.1.3
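
For readers more comfortable with Python than Bash, the two steps of the script above (delete the locally built packages, then fetch pinned releases from PyPI) can be sketched as follows. This is a hedged illustration rather than part of the PR: `replace_with_pinned` and `PINS` are hypothetical names, and the function only returns the `pip download` command instead of running it.

```python
import glob
import os
from typing import Dict, List


def replace_with_pinned(dist_dir: str, pins: Dict[str, str]) -> List[str]:
    """Delete locally built files for the pinned providers and return the
    `pip download` command that would fetch the pinned releases instead."""
    for package in pins:
        # Built files use underscores, e.g. apache_airflow_providers_celery-*.whl
        prefix = package.replace("-", "_")
        for path in glob.glob(os.path.join(dist_dir, prefix + "*")):
            os.remove(path)  # files only; the shell script uses `rm -rvf`
    command = ["pip", "download", "--no-deps", "--dest", dist_dir]
    command += [f"{package}=={version}" for package, version in pins.items()]
    return command


# The same pins as in the shell script above.
PINS = {
    "apache-airflow-providers-cncf-kubernetes": "3.0.0",
    "apache-airflow-providers-celery": "2.1.3",
}
```

Passing the returned list to `subprocess.run(..., check=True)` would perform the download; it is left out here so that the sketch has no network side effects.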
4 changes: 3 additions & 1 deletion scripts/in_container/_in_container_utils.sh
@@ -280,8 +280,10 @@ function install_all_providers_from_pypi_with_eager_upgrade() {
# Installing it with Airflow makes sure that the version of package that matches current
# Airflow requirements will be used.
# shellcheck disable=SC2086
# NOTE! Until we unyank the cncf.kubernetes provider, we explicitly install the yanked 3.1.2 version
# TODO(potiuk): REMOVE IT WHEN the provider is released
pip install -e ".[${NO_PROVIDERS_EXTRAS}]" "${packages_to_install[@]}" ${EAGER_UPGRADE_ADDITIONAL_REQUIREMENTS} \
--upgrade --upgrade-strategy eager
--upgrade --upgrade-strategy eager apache-airflow-providers-cncf-kubernetes==3.1.2

}

43 changes: 36 additions & 7 deletions setup.py
@@ -235,7 +235,16 @@ def write_version(filename: str = os.path.join(*[my_dir, "airflow", "git_version
'cassandra-driver>=3.13.0',
]
celery = [
'celery>=5.2.3',
# Celery is known to introduce problems when upgraded to a MAJOR version. Airflow Core
# uses Celery for the CeleryExecutor, and we also know that Celery follows SemVer
# (https://docs.celeryq.dev/en/stable/contributing.html?highlight=semver#versions).
# This is a crucial component of Airflow, so we should limit it to the next MAJOR version and only
# deliberately bump the version when we have tested it and know it can be bumped.
# Bumping this version should also be connected with
# limiting the minimum Airflow version supported in the celery provider, due to the
# potential breaking changes in Airflow Core as well (celery is added as an extra, so Airflow
# core is not hard-limited via install-requirements, only by the extra).
'celery>=5.2.3,<6',
'flower>=1.0.0',
]
cgroups = [ # type:ignore
@@ -419,7 +428,15 @@ def write_version(filename: str = os.path.join(*[my_dir, "airflow", "git_version
]
kubernetes = [
'cryptography>=2.0.0',
'kubernetes>=21.7.0',
# The Kubernetes API is known to introduce problems when upgraded to a MAJOR version. Airflow Core
# uses Kubernetes for the KubernetesExecutor, and we also know that the Kubernetes Python client follows
# SemVer (https://github.com/kubernetes-client/python#compatibility). This is a crucial component of
# Airflow, so we should limit it to the next MAJOR version and only deliberately bump the version when
# we have tested it and know it can be bumped. Bumping this version should also be connected with
# limiting the minimum Airflow version supported in the cncf.kubernetes provider, due to the
# potential breaking changes in Airflow Core as well (kubernetes is added as an extra, so Airflow
# core is not hard-limited via install-requirements, only by the extra).
'kubernetes>=21.7.0,<24',
]
kylin = ['kylinpy>=2.6']
ldap = [
@@ -745,7 +762,7 @@ def write_version(filename: str = os.path.join(*[my_dir, "airflow", "git_version
# To airflow core. They do not have separate providers because they do not have any operators/hooks etc.
CORE_EXTRAS_REQUIREMENTS: Dict[str, List[str]] = {
'async': async_packages,
'celery': celery, # also has provider, but it extends the core with the Celery executor
'celery': celery, # also has provider, but it extends the core with the CeleryExecutor
'cgroups': cgroups,
'cncf.kubernetes': kubernetes, # also has provider, but it extends the core with the KubernetesExecutor
'dask': dask,
@@ -1033,17 +1050,29 @@ def replace_extra_requirement_with_provider_packages(extra: str, providers: List
['simple-salesforce>=1.0.0', 'tableauserverclient']

So transitively 'salesforce' extra has all the requirements it needs and in case the provider
changes it's dependencies, they will transitively change as well.
changes its dependencies, they will transitively change as well.

In the constraint mechanism we save both the provider versions and the versions of their
dependencies, which means that installation using constraints is repeatable.

For K8s, Celery which are both "Core executors" and "Providers" we have to
Review comment (Member), suggested change:
For K8s, Celery which are both "Core executors" and "Providers" we have to
For K8s and Celery which are both "Core executors" and "Providers" we have to

add the base dependencies to the core as well - in order to mitigate problems where
Review comment (Member), suggested change:
add the base dependencies to the core as well - in order to mitigate problems where
add the base dependencies to core as well, in order to mitigate problems where

a newer version of the provider has less strict limits. This should be done for both
extras and their deprecated aliases. This is not full protection, however: because of the way
extras work, this will not add "hard" limits for Airflow and the user who does not use
constraints
Review comment (Member), suggested change:
constraints
constraints.


:param extra: Name of the extra to add providers to
:param providers: list of provider ids
"""
EXTRAS_REQUIREMENTS[extra] = [
get_provider_package_from_package_id(package_name) for package_name in providers
]
if extra in ['cncf.kubernetes', 'kubernetes', 'celery']:
potiuk marked this conversation as resolved.
EXTRAS_REQUIREMENTS[extra].extend(
[get_provider_package_from_package_id(package_name) for package_name in providers]
)
else:
EXTRAS_REQUIREMENTS[extra] = [
get_provider_package_from_package_id(package_name) for package_name in providers
]
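
The difference between extending and overwriting can be seen in a small standalone sketch. It is an illustration under stated assumptions, not the actual `setup.py`: `provider_package` is a stand-in for `get_provider_package_from_package_id` (its naming scheme is assumed), `replace_extra_with_providers` is a shortened hypothetical name, and the registry is pre-filled with only two extras.

```python
from typing import Dict, List

# A tiny stand-in for the real registry: the `celery` extra already carries
# the core limits, and `salesforce` carries its raw requirements.
EXTRAS_REQUIREMENTS: Dict[str, List[str]] = {
    "celery": ["celery>=5.2.3,<6", "flower>=1.0.0"],
    "salesforce": ["simple-salesforce>=1.0.0", "tableauserverclient"],
}

# Extras that are both core executors and providers.
EXECUTOR_EXTRAS = ["cncf.kubernetes", "kubernetes", "celery"]


def provider_package(package_id: str) -> str:
    """Assumed naming scheme for provider distributions (illustration only)."""
    return "apache-airflow-providers-" + package_id.replace(".", "-")


def replace_extra_with_providers(extra: str, providers: List[str]) -> None:
    packages = [provider_package(package_id) for package_id in providers]
    if extra in EXECUTOR_EXTRAS:
        # Keep the core limits and add the provider packages next to them.
        EXTRAS_REQUIREMENTS[extra].extend(packages)
    else:
        # Requirements arrive transitively through the provider package.
        EXTRAS_REQUIREMENTS[extra] = packages


replace_extra_with_providers("celery", ["celery"])
replace_extra_with_providers("salesforce", ["salesforce"])
print(EXTRAS_REQUIREMENTS["celery"])
print(EXTRAS_REQUIREMENTS["salesforce"])
```

After the two calls, the `celery` extra still contains `celery>=5.2.3,<6` alongside the provider package, while the `salesforce` extra contains only its provider package.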


def add_provider_packages_to_extra_requirements(extra: str, providers: List[str]) -> None: