This issue was moved to a discussion.
SQLAlchemy constraint for apache-airflow-snowflake-provider installation #17453
Comments
@wolfier - just a comment - while we can add it in snowflake (and @jedcunningham added it in #17452), for the future there is another way of installing the provider: you can install the provider with the latest constraints from main: https://github.com/apache/airflow/blob/constraints-main/constraints-3.6.txt#L33 This is the most "certain" way of installing the most recently released providers. I will also think about adding some specific tags per-provider, so that you do not have to use the constraints from main directly.
Also, I am a bit surprised that this happened. @wolfier - could you please describe exactly what you did and how you installed the newer version of the snowflake provider? Did you already have Airflow installed? Which version? What command did you use to upgrade the snowflake provider? Which version of pip did you use? (Or was it a different way of installing - with poetry or pip-tools maybe?)
Interestingly, I was unable to replicate this with OSS Airflow. The installation process is a little different for my environment, so I'll look into that. Here is how I tested the OSS installation, starting from a plain Python base image:
- I installed Airflow with the constraints for 2.1.0.
- I then installed the newer snowflake provider.
- The SQLAlchemy version remained the same, though for some reason it installed a different version of snowflake-sqlalchemy.
I think the difference comes down to which version of pip you have: pip 19.3.1 pulls down snowflake-sqlalchemy 1.3.1, while pip 21.1.2 and 21.2.3 pull down 1.2.5.
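The two behaviors can be sketched with a toy example. This is not pip's actual code; the requirement range for snowflake-sqlalchemy 1.2.5 and the exact pinned SQLAlchemy version are illustrative assumptions, while the 1.3.1 range comes from this issue:

```python
# Toy sketch of legacy vs. backtracking dependency resolution (NOT pip internals).
# The 1.2.5 requirement range and the pinned SQLAlchemy version are assumptions.

def ver(s):
    """Parse 'X.Y.Z' into a comparable tuple of ints."""
    return tuple(int(part) for part in s.split("."))

# snowflake-sqlalchemy candidates -> (min, max-exclusive) SQLAlchemy they require
CANDIDATES = {
    "1.3.1": ("1.4.0", "2.0.0"),  # range stated in this issue
    "1.2.5": ("1.1.0", "2.0.0"),  # lower bound assumed for illustration
}
PINNED_SQLALCHEMY = "1.3.24"      # illustrative 1.3.x pin from Airflow's constraints

def compatible(candidate):
    lo, hi = CANDIDATES[candidate]
    return ver(lo) <= ver(PINNED_SQLALCHEMY) < ver(hi)

newest_first = sorted(CANDIDATES, key=ver, reverse=True)

# Legacy resolver (pip 19.x): takes the newest candidate, conflict or not.
legacy_pick = newest_first[0]
# Backtracking resolver (pip >= 20.3): skips candidates that conflict.
backtracking_pick = next(c for c in newest_first if compatible(c))

print(legacy_pick, backtracking_pick)  # 1.3.1 1.2.5
```

In short, upgrading pip changes which snowflake-sqlalchemy version ends up installed, which matches the observation above.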
I think the right answer here is "upgrade pip so the new resolver is used". @potiuk, I'm really curious about the advice to use the constraints from main when installing newer providers, though. That seems like a recipe for trouble, especially as time goes on. I guess the new pip resolver makes that safer, but still. Do we have the "upgrade a provider to a newer version than what was released when your core Airflow version was released" scenario documented anywhere?
Yeah, it's for the more 'adventurous' ones :). It is not really 'documented' yet and it does not have the same 'guarantees' as the Airflow constraints. But it might prevent some obvious problems like this one, when there is a new 'breaking' release of a dependency. And if some main changes happened since the last provider release, things might break.

The way it should be done (and I might actually implement it in our processes) is to tag the 'main' constraints at the moment we release the providers, with a separate tag per provider. That can be easily automated, and I might add it to our process. We can even track down historical constraints and re-tag the history for all provider releases. Should be easy.

It has a few caveats, and I think it is basically an almost unsolvable problem - it will only fully work when you have the same set of provider versions as those at the moment of tagging. It is possible that there is a combination of providers that has conflicting dependencies in different versions, and that provider a in version 1.0.0 will never work with provider b in version 2.0.0. So when you install provider a and you already have provider b, they might conflict.

However, that last problem is somewhat solvable if we tag constraints with provider versions. Our constraints (the default ones - we actually have three sets of constraints: PyPI providers, no-providers, source-providers) already contain all the providers with their versions. So if you have a conflicting provider (and once we bring the tags in), we should be able to do something like:
And it will not only install the Google provider in the right version and all its dependencies in the right version, but it will also update the Amazon provider (and its dependencies) to the version that was released at the same time as Google 5.0.0 (thus - with guaranteed, non-conflicting dependencies).
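The actual command was elided from the page, but the proposed mechanism can be sketched as a mapping from a per-provider tag to a constraints URL. The tag naming scheme and URL layout below are hypothetical - the comment above only proposes the idea:

```python
# Sketch of the proposed per-provider constraint tagging. The tag format
# "constraints-providers-<name>-<version>" is hypothetical, modeled on the
# existing constraints-main URL pattern referenced earlier in this thread.
BASE = "https://raw.githubusercontent.com/apache/airflow/{tag}/constraints-{py}.txt"

def provider_constraints_url(provider: str, version: str, python_version: str = "3.6") -> str:
    """Build the constraints URL for a hypothetical per-provider release tag."""
    tag = f"constraints-providers-{provider}-{version}"
    return BASE.format(tag=tag, py=python_version)

print(provider_constraints_url("google", "5.0.0"))
# https://raw.githubusercontent.com/apache/airflow/constraints-providers-google-5.0.0/constraints-3.6.txt
```

A `pip install --constraint` pointed at such a URL would then pin the whole provider set to the snapshot taken when that provider version was released.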
Also, just one point - conflicting dependencies are not, and will never be, as huge a problem as you might think. I made sure of that when designing the whole system of constraints and automated upgrades. I have thought about and experimented a lot with this over the last few years, and my experience is rather positive. The 'general' pattern among almost all of our actively maintained direct dependencies is that they update their own dependencies pretty fast, and simply updating to the latest versions of dependent packages solves most of the problems. We also advise all our users to update to the latest versions of providers when they can (it is also kind of a given that when we release an image and constraints, we always use the latest released versions of providers and dependencies that work).

We are continuously bumping the constraints with an 'eager' upgrade (they are updated after all tests pass). And we have mechanisms (which I utilize) to handle exceptions. If you look at Dockerfile.ci and Dockerfile, there are a few hard-limited dependencies in them that handle cases that are otherwise not easy to handle. But there are just a few of those, and they help make the automated upgrades work.

That's why it is important to have constraints, CI, tests, Dockerfile, and Dockerfile.ci together in one monorepo. We can only do this because all of those pieces are connected and help each other - Dockerfile and Dockerfile.ci use constraints to build themselves, the tests use the resulting images, we can run those images in eager-upgrade mode later and re-run the tests, the constraints are updated after the tests are successful, they are pushed to the repo, and both Docker images are rebuilt with those new upgraded constraints - it is all nicely connected and works in continuous circles of build-test-upgrade.
Also, the way we approach our dependencies in setup.py makes it quite difficult to get into a conflict situation when you look closer - we rarely update minimum versions (mainly when we handle a CVE or an incompatible change in implementing certain APIs). We also usually do not add upper bounds for our dependencies - unless we know they break something. On one hand this is risky (though our constraints, and the fact that we only upgrade after successful tests, mitigate the risk), but it also handles the situations where even major upgrades of dependencies work without any fixes. Not everyone uses SemVer, so we cannot rely on that, and we will never know in advance whether things are going to break. Constraints - and making them essential when installing Airflow - solve the problem very nicely. This actually makes it quite possible to keep it all working, and conflicts are few and far between (and can usually be solved with proper use of constraints).
Apache Airflow version: 2.1.0, or really any Airflow version
What happened:
When I try to install the snowflake provider, the version of SQLAlchemy also gets upgraded. Due to the dependencies of the packages installed by the snowflake provider - more specifically the requirements of `snowflake-sqlalchemy` - `SQLAlchemy` is forced to be upgraded. Without a version ceiling constraint, the snowflake provider will install the latest `snowflake-sqlalchemy`, at `1.3.1`. `snowflake-sqlalchemy==1.3.1` will then require SQLAlchemy with the constraint `<2.0.0,>=1.4.0`. The current SQLAlchemy version that fits those constraints is `1.4.22`, which is what was installed. This upgrade caused some issues with the webserver startup, which generated this unhelpful error log.
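As a quick stdlib-only check of the version arithmetic above (the 1.3.x value compared against is an illustrative pin, not a number taken from this report):

```python
def ver(s):
    """Parse 'X.Y.Z' into a comparable tuple of ints."""
    return tuple(int(part) for part in s.split("."))

def satisfies(v):
    # SQLAlchemy range required by snowflake-sqlalchemy==1.3.1: >=1.4.0,<2.0.0
    return ver("1.4.0") <= ver(v) < ver("2.0.0")

print(satisfies("1.4.22"))  # True  -> pip is allowed to upgrade to it
print(satisfies("1.3.24"))  # False -> no 1.3.x release can satisfy the range
```

Since no 1.3.x SQLAlchemy can satisfy the range, installing the provider without a constraint file forces the upgrade past whatever 1.3.x version Airflow was pinned to.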
What you expected to happen:
I expect to be able to install a provider package without needing to worry about it breaking Airflow's dependencies. Providers are supposed to be Airflow-version agnostic post 2.0.
How to reproduce it:
Install any version of the snowflake provider package.