Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update our approach for executor-bound dependencies #22573

Merged
merged 1 commit into from
Mar 29, 2022

Conversation

potiuk
Copy link
Member

@potiuk potiuk commented Mar 28, 2022

Kubernetes and Celery are both providers and part of the core.
The dependencies for both are added via "extras" which makes them
"soft" limits and in case of serious dependency bumps this might
end up with a mess (as we experienced with bumping min K8S
library version from 11.0.0 to 22.* (resulting in yanking 4
versions of cncf.kubernetes provider.

After this learning, we approach K8S and Celery dependencies a bit
differently than any other dependencies.

  • for Celery and K8S (and Dask but this is rather an afterhought)
    we do not strip-off the dependencies from the extra (so for
    example [cncf.kubernetes] extra will have dependencies on
    both 'apache-airflow-providers-cncf-kubernetes' as well as
    directly on kubernetes library

  • We add upper-bound limits for both Celery and Kubernetes to prevent
    from accidental upgrades. Both Celery and Kubernetes Python library
    follow SemVer, and they are crucial components of Airlfow so they
    both squarely fit our "do not upper-bound" exceptions.

  • We also add a rule that whenever dependency upper-bound limit is
    raised, we should also make sure that additional testing is done
    and appropriate apache-airflow lower-bound limit is added for
    the apache-airflow-providers-cncf-kubernetes and
    apache-airflow-providers-celery providers.

As part of this change we also had to fix two issues:

  • the image was needlesly rebuilt during constraint generation as
    we already have the image and we even warn that it should
    be built before we run constraint generation

  • after this change, the currently released, unyanked cncf.kubernetes
    provider cannot be installed with airflow, because it has
    conflicting requirements for kubernetes library (provider has
    <11 and airflow has > 22.7). Therefore during constraint
    generation with PyPI providers we install providers from PyPI, we
    explicitly install the yanked 3.1.2 version. This should be
    removed after we release the next K8S provider version.

That should protect our users in all scenarios where they might
unknowingly attempt to upgrade Kubernetes or Celery to incompatible
version.

That should protect our users in all scenarios where they might
unknowingly attempt to upgrade Kubernetes or Celery to incompatible
version.

Related to: #22560, #21727


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.

README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
setup.py Outdated Show resolved Hide resolved
setup.py Outdated Show resolved Hide resolved
setup.py Outdated Show resolved Hide resolved
setup.py Outdated Show resolved Hide resolved
setup.py Outdated Show resolved Hide resolved
setup.py Outdated Show resolved Hide resolved
setup.py Show resolved Hide resolved
airflow/providers/celery/provider.yaml Outdated Show resolved Hide resolved
@potiuk potiuk force-pushed the update-dependencies-approach branch from 852f6c9 to cc98ee9 Compare March 28, 2022 20:03
@potiuk
Copy link
Member Author

potiuk commented Mar 28, 2022

Should be all fixed. I also updated the 2.1 compatibility tests to remove celery and K8S

setup.py Outdated Show resolved Hide resolved
@potiuk potiuk force-pushed the update-dependencies-approach branch from cc98ee9 to 2d838c8 Compare March 29, 2022 09:40
@potiuk potiuk requested a review from mik-laj as a code owner March 29, 2022 09:40
@potiuk potiuk force-pushed the update-dependencies-approach branch 2 times, most recently from cafbb72 to e05861f Compare March 29, 2022 11:55
@@ -28,6 +28,4 @@ shift

build_images::prepare_ci_build

build_images::rebuild_ci_image_if_needed_with_group
Copy link
Member Author

@potiuk potiuk Mar 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not needed (it just makes constraint generation longer - we already have the images pulled at this stage)

      - name: >
          Wait for CI images
          ${{ needs.build-info.outputs.pythonVersions }}:${{ env.GITHUB_REGISTRY_PULL_IMAGE_TAG }}
        run: ./scripts/ci/images/ci_wait_for_and_verify_all_ci_images.sh

@potiuk potiuk force-pushed the update-dependencies-approach branch from e05861f to ad6a527 Compare March 29, 2022 12:17
@potiuk
Copy link
Member Author

potiuk commented Mar 29, 2022

All green!

.github/workflows/ci.yml Outdated Show resolved Hide resolved
@potiuk potiuk force-pushed the update-dependencies-approach branch 4 times, most recently from 69b6081 to ec77fc2 Compare March 29, 2022 14:50
Kubernetes and Celery are both providers and part of the core.
The dependencies for both are added via "extras" which makes them
"soft" limits and in case of serious dependency bumps this might
end up with a mess (as we experienced with bumping min K8S
library version from 11.0.0 to 22.* (resulting in yanking 4
versions of `cncf.kubernetes` provider.

After this learning, we approach K8S and Celery dependencies a bit
differently than any other dependencies.

* for Celery and K8S (and Dask but this is rather an afterhought)
  we do not strip-off the dependencies from the extra (so for
  example [cncf.kubernetes] extra will have dependencies on
  both 'apache-airflow-providers-cncf-kubernetes' as well as
  directly on kubernetes library

* We add upper-bound limits for both Celery and Kubernetes to prevent
  from accidental upgrades. Both Celery and Kubernetes Python library
  follow SemVer, and they are crucial components of Airlfow so they
  both squarely fit our "do not upper-bound" exceptions.

* We also add a rule that whenever dependency upper-bound limit is
  raised, we should also make sure that additional testing is done
  and appropriate `apache-airflow` lower-bound limit is added for
  the `apache-airflow-providers-cncf-kubernetes` and
  `apache-airflow-providers-celery` providers.

As part of this change we also had to fix two issues:
* the image was needlesly rebuilt during constraint generation as
  we already have the image and we even warn that it should
  be built before we run constraint generation

* after this change, the currently released, unyanked cncf.kubernetes
  provider cannot be installed with airflow, because it has
  conflicting requirements for kubernetes library (provider has
  <11 and airflow has > 22.7). Therefore during constraint
  generation with PyPI providers we install providers from PyPI, we
  explicitly install the yanked 3.1.2 version. This should be
  removed after we release the next K8S provider version.

That should protect our users in all scenarios where they might
unknowingly attempt to upgrade Kubernetes or Celery to incompatible
version.

Related to: apache#22560, apache#21727
@potiuk potiuk force-pushed the update-dependencies-approach branch from ec77fc2 to 51e0a2d Compare March 29, 2022 14:51
Copy link
Member

@jedcunningham jedcunningham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few nits, otherwise looks good!


In the constraint mechanism we save both - provider versions and it's dependencies
version, which means that installation using constraints is repeatable.

For K8s, Celery which are both "Core executors" and "Providers" we have to
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
For K8s, Celery which are both "Core executors" and "Providers" we have to
For K8s and Celery which are both "Core executors" and "Providers" we have to


In the constraint mechanism we save both - provider versions and it's dependencies
version, which means that installation using constraints is repeatable.

For K8s, Celery which are both "Core executors" and "Providers" we have to
add the base dependencies to the core as well - in order to mitigate problems where
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
add the base dependencies to the core as well - in order to mitigate problems where
add the base dependencies to core as well, in order to mitigate problems where

newer version of provider will have less strict limits. This should be done for both
extras and their deprecated aliases. This is not a full protection however, the way
extras work, this will not add "hard" limits for Airflow and the user who does not use
constraints
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
constraints
constraints.

@github-actions github-actions bot added the full tests needed We need to run full set of tests for this PR to merge label Mar 29, 2022
@github-actions
Copy link

The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

@potiuk potiuk merged commit 6db30f3 into apache:main Mar 29, 2022
@potiuk potiuk deleted the update-dependencies-approach branch March 29, 2022 18:26
@ephraimbuddy ephraimbuddy added the type:misc/internal Changelog: Misc changes that should appear in change log label Apr 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:providers full tests needed We need to run full set of tests for this PR to merge provider:cncf-kubernetes Kubernetes provider related issues type:misc/internal Changelog: Misc changes that should appear in change log
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants