SQLAlchemy constraint for apache-airflow-providers-snowflake installation #17453

Closed
wolfier opened this issue Aug 5, 2021 · 7 comments
Labels: kind:bug


wolfier commented Aug 5, 2021

Apache Airflow version: 2.1.0 (or really any Airflow version)

What happened:

When I try to install the snowflake provider, the version of SQLAlchemy also gets upgraded. Because of the dependencies of the packages installed by the snowflake provider, more specifically the requirements of snowflake-sqlalchemy, SQLAlchemy is forced to be upgraded.

Without a version ceiling constraint, the snowflake provider will install the latest snowflake-sqlalchemy, currently 1.3.1.

snowflake-sqlalchemy==1.3.1 in turn requires SQLAlchemy with the constraint <2.0.0, >=1.4.0.

The latest SQLAlchemy version that fits those constraints is 1.4.22, which is what was installed.

astro@galactian-observatory-9366-scheduler-849c6db7b8-dwgq6:/usr/local/airflow$ pip list 
apache-airflow-providers-snowflake       2.0.0 
SQLAlchemy                               1.4.22 
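
One way to confirm the chain yourself (a sketch; pip show lists only the names of direct requirements, and the output depends on what is installed):

# Which packages snowflake-sqlalchemy declares as requirements
pip show snowflake-sqlalchemy | grep -i '^requires'
# Which installed packages depend on SQLAlchemy
pip show SQLAlchemy | grep -i '^required-by'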

This upgrade caused issues with webserver startup, which produced this unhelpful error log.

[2021-08-04 19:37:45,566] {abstract.py:229} ERROR - Failed to add operation for GET /api/v1/connections
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/airflow/_vendor/connexion/apis/abstract.py", line 209, in add_paths
    self.add_operation(path, method)
  File "/usr/local/lib/python3.7/site-packages/airflow/_vendor/connexion/apis/abstract.py", line 173, in add_operation
    pass_context_arg_name=self.pass_context_arg_name
  File "/usr/local/lib/python3.7/site-packages/airflow/_vendor/connexion/operations/__init__.py", line 8, in make_operation
    return spec.operation_cls.from_spec(spec, *args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/airflow/_vendor/connexion/operations/openapi.py", line 138, in from_spec
    **kwargs
  File "/usr/local/lib/python3.7/site-packages/airflow/_vendor/connexion/operations/openapi.py", line 89, in __init__
    pass_context_arg_name=pass_context_arg_name
  File "/usr/local/lib/python3.7/site-packages/airflow/_vendor/connexion/operations/abstract.py", line 96, in __init__
    self._resolution = resolver.resolve(self)
  File "/usr/local/lib/python3.7/site-packages/airflow/_vendor/connexion/resolver.py", line 40, in resolve
    return Resolution(self.resolve_function_from_operation_id(operation_id), operation_id)
  File "/usr/local/lib/python3.7/site-packages/airflow/_vendor/connexion/resolver.py", line 66, in resolve_function_from_operation_id
    raise ResolverError(str(e), sys.exc_info())
airflow._vendor.connexion.exceptions.ResolverError: <ResolverError: columns>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/airflow", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/site-packages/airflow/__main__.py", line 40, in main
    args.func(args)
  File "/usr/local/lib/python3.7/site-packages/airflow/cli/cli_parser.py", line 48, in command
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/airflow/utils/cli.py", line 91, in wrapper
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/airflow/cli/commands/sync_perm_command.py", line 26, in sync_perm
    appbuilder = cached_app().appbuilder  # pylint: disable=no-member
  File "/usr/local/lib/python3.7/site-packages/airflow/www/app.py", line 146, in cached_app
    app = create_app(config=config, testing=testing)
  File "/usr/local/lib/python3.7/site-packages/airflow/www/app.py", line 130, in create_app
    init_api_connexion(flask_app)
  File "/usr/local/lib/python3.7/site-packages/airflow/www/extensions/init_views.py", line 186, in init_api_connexion
    specification='v1.yaml', base_path=base_path, validate_responses=True, strict_validation=True
  File "/usr/local/lib/python3.7/site-packages/airflow/_vendor/connexion/apps/flask_app.py", line 57, in add_api
    api = super(FlaskApp, self).add_api(specification, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/airflow/_vendor/connexion/apps/abstract.py", line 156, in add_api
    options=api_options.as_dict())
  File "/usr/local/lib/python3.7/site-packages/airflow/_vendor/connexion/apis/abstract.py", line 111, in __init__
    self.add_paths()
  File "/usr/local/lib/python3.7/site-packages/airflow/_vendor/connexion/apis/abstract.py", line 216, in add_paths
    self._handle_add_operation_error(path, method, err.exc_info)
  File "/usr/local/lib/python3.7/site-packages/airflow/_vendor/connexion/apis/abstract.py", line 231, in _handle_add_operation_error
    raise value.with_traceback(traceback)
  File "/usr/local/lib/python3.7/site-packages/airflow/_vendor/connexion/resolver.py", line 61, in resolve_function_from_operation_id
    return self.function_resolver(operation_id)
  File "/usr/local/lib/python3.7/site-packages/airflow/_vendor/connexion/utils.py", line 111, in get_function_from_name
    module = importlib.import_module(module_name)
  File "/usr/local/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 728, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/usr/local/lib/python3.7/site-packages/airflow/api_connexion/endpoints/connection_endpoint.py", line 26, in <module>
    from airflow.api_connexion.schemas.connection_schema import (
  File "/usr/local/lib/python3.7/site-packages/airflow/api_connexion/schemas/connection_schema.py", line 42, in <module>
    class ConnectionSchema(ConnectionCollectionItemSchema):  # pylint: disable=too-many-ancestors
  File "/usr/local/lib/python3.7/site-packages/marshmallow/schema.py", line 117, in __new__
    dict_cls=dict_cls,
  File "/usr/local/lib/python3.7/site-packages/marshmallow_sqlalchemy/schema/sqlalchemy_schema.py", line 94, in get_declared_fields
    fields.update(mcs.get_auto_fields(fields, converter, opts, dict_cls))
  File "/usr/local/lib/python3.7/site-packages/marshmallow_sqlalchemy/schema/sqlalchemy_schema.py", line 108, in get_auto_fields
    for field_name, field in fields.items()
  File "/usr/local/lib/python3.7/site-packages/marshmallow_sqlalchemy/schema/sqlalchemy_schema.py", line 110, in <dictcomp>
    and field_name not in opts.exclude
  File "/usr/local/lib/python3.7/site-packages/marshmallow_sqlalchemy/schema/sqlalchemy_schema.py", line 28, in create_field
    return converter.field_for(model, column_name, **self.field_kwargs)
  File "/usr/local/lib/python3.7/site-packages/marshmallow_sqlalchemy/convert.py", line 171, in field_for
    return self.property2field(prop, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/marshmallow_sqlalchemy/convert.py", line 146, in property2field
    field_class = field_class or self._get_field_class_for_property(prop)
  File "/usr/local/lib/python3.7/site-packages/marshmallow_sqlalchemy/convert.py", line 210, in _get_field_class_for_property
    column = prop.columns[0]
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/util/langhelpers.py", line 1240, in __getattr__
    return self._fallback_getattr(key)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/util/langhelpers.py", line 1214, in _fallback_getattr
    raise AttributeError(key)
AttributeError: columns

What you expected to happen:

I expect to be able to install a provider package without worrying about it breaking Airflow's dependencies. Providers are supposed to be Airflow-version agnostic post 2.0.

How to reproduce it:

Install any version of the snowflake provider package.

apache-airflow-providers-snowflake
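
Until a version ceiling lands in the provider itself, a minimal workaround sketch (versions taken from the behaviour described above) is to hold snowflake-sqlalchemy below 1.3, so that SQLAlchemy stays on the 1.3.x line:

# snowflake-sqlalchemy 1.2.x still accepts SQLAlchemy 1.3.x
pip install "apache-airflow-providers-snowflake==2.0.0" "snowflake-sqlalchemy<1.3.0"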
wolfier added the kind:bug label on Aug 5, 2021

potiuk commented Aug 6, 2021

@wolfier - just a comment - while we can add it in snowflake (and @jedcunningham added it in #17452), for the future there is another way of installing the provider: you can install the provider with the latest constraints from main: https://github.com/apache/airflow/blob/constraints-main/constraints-3.6.txt#L33

This is the most "certain" way of installing the most recently released providers. I will also think about adding specific per-provider tags, so that you do not have to use main (which carries the risk that a newer version of the constraints gets used).
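
For illustration, that install would look something like this (a sketch; the raw-file URL is derived from the constraints-main link above - adjust the Python version to your environment):

pip install apache-airflow-providers-snowflake \
  --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-main/constraints-3.6.txt"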


potiuk commented Aug 6, 2021

Also, I am a bit surprised that this happened. By default, when pip install <PACKAGE> is run and the already-installed version of a dependency falls within the limits declared by the requirements, pip should NOT upgrade it to a later version (you have to request the "eager" upgrade strategy via pip flags for that to happen).
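
For example, the two strategies side by side (a sketch using the versions from this issue):

# Default "only-if-needed" strategy: an installed SQLAlchemy that already
# satisfies the requirement is left untouched
pip install apache-airflow-providers-snowflake==2.0.0

# "eager" strategy: dependencies are upgraded to the latest allowed versions
pip install --upgrade --upgrade-strategy eager apache-airflow-providers-snowflake==2.0.0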

@wolfier -> could you please describe exactly what you did and how you installed the newer version of the snowflake provider? Did you already have Airflow installed? Which version? What command did you use to upgrade the snowflake provider? Which version of pip did you use? (Or was it a different way of installing - with poetry or pip-tools maybe?)


wolfier commented Aug 6, 2021

Interestingly, I was unable to replicate this with OSS Airflow. The installation process is a little different in my environment, so I'll look into that.

Here is how I tested the OSS installation.

I have a python base image.

➜  docker run -it python bash

pip version

root@ba82f0d0529c:/# pip --version
pip 21.2.2 from /usr/local/lib/python3.9/site-packages/pip (python 3.9)

Installation with the constraints for 2.1.0.

root@f2aa35e4c573:/#  pip install apache-airflow==2.1.0 -c constraint.txt
...
Successfully installed Babel-2.9.1 Flask-Babel-1.0.0 Flask-JWT-Extended-3.25.1 Flask-OpenID-1.2.5 Flask-SQLAlchemy-2.5.1 Mako-1.1.4 WTForms-2.3.3 alembic-1.6.2 anyio-3.3.0 apache-airflow-2.1.0 apache-airflow-providers-ftp-1.1.0 apache-airflow-providers-imap-1.0.1 apache-airflow-providers-sqlite-1.0.2 apispec-3.3.2 argcomplete-1.12.3 attrs-20.3.0 blinker-1.4 cattrs-1.6.0 certifi-2020.12.5 cffi-1.14.5 click-7.1.2 clickclick-20.10.2 colorama-0.4.4 colorlog-5.0.1 commonmark-0.9.1 croniter-1.0.13 cryptography-3.4.7 defusedxml-0.7.1 dill-0.3.1.1 dnspython-1.16.0 docutils-0.17.1 email-validator-1.1.2 flask-1.1.2 flask-appbuilder-3.3.0 flask-caching-1.10.1 flask-login-0.4.1 flask-wtf-0.14.3 graphviz-0.16 gunicorn-20.1.0 h11-0.12.0 httpcore-0.13.6 httpx-0.18.2 idna-2.10 importlib-resources-1.5.0 inflection-0.5.1 iso8601-0.1.14 isodate-0.6.0 itsdangerous-1.1.0 jinja2-2.11.3 jsonschema-3.2.0 lazy-object-proxy-1.4.3 lockfile-0.12.2 markdown-3.3.4 markupsafe-1.1.1 marshmallow-3.12.1 marshmallow-enum-1.5.1 marshmallow-oneofschema-2.1.0 marshmallow-sqlalchemy-0.23.1 numpy-1.20.3 openapi-schema-validator-0.1.5 openapi-spec-validator-0.3.0 pandas-1.2.4 pendulum-2.1.2 prison-0.1.3 psutil-5.8.0 pycparser-2.20 pygments-2.9.0 pyjwt-1.7.1 pyrsistent-0.17.3 python-daemon-2.3.0 python-dateutil-2.8.1 python-editor-1.0.4 python-nvd3-0.15.0 python-slugify-4.0.1 python3-openid-3.2.0 pytz-2021.1 pytzdata-2020.1 pyyaml-5.4.1 rfc3986-1.5.0 rich-9.2.0 setproctitle-1.2.2 six-1.16.0 sniffio-1.2.0 sqlalchemy-1.3.24 sqlalchemy-jsonfield-1.0.0 sqlalchemy-utils-0.37.2 swagger-ui-bundle-0.0.8 tabulate-0.8.9 tenacity-6.2.0 termcolor-1.1.0 text-unidecode-1.3 typing-extensions-3.7.4.3 unicodecsv-0.14.1 werkzeug-1.0.1

Installed SQLAlchemy version:

root@ba82f0d0529c:/# pip list | grep -i sql
apache-airflow-providers-sqlite 1.0.2
Flask-SQLAlchemy                2.5.1
marshmallow-sqlalchemy          0.23.1
SQLAlchemy                      1.3.24

Now, I will install apache-airflow-providers-snowflake

root@ba82f0d0529c:/# pip install apache-airflow-providers-snowflake==2.0.0
...
Successfully installed apache-airflow-providers-snowflake-2.0.0 asn1crypto-1.4.0 azure-common-1.1.27 azure-core-1.17.0 azure-storage-blob-12.8.1 boto3-1.18.15 botocore-1.21.15 chardet-4.0.0 charset-normalizer-2.0.4 jmespath-0.10.0 msrest-0.6.21 oauthlib-3.1.1 oscrypto-1.2.1 pyOpenSSL-20.0.1 pycryptodomex-3.10.1 requests-2.26.0 requests-oauthlib-1.3.0 s3transfer-0.5.0 snowflake-connector-python-2.5.1 snowflake-sqlalchemy-1.2.5 urllib3-1.26.6

The SQLAlchemy version remained the same:

root@ba82f0d0529c:/# pip list | grep -i snowflake
apache-airflow-providers-snowflake 2.0.0
snowflake-connector-python         2.5.1
snowflake-sqlalchemy               1.2.5

root@ba82f0d0529c:/# pip list | grep -i sql
apache-airflow-providers-sqlite    1.0.2
Flask-SQLAlchemy                   2.5.1
marshmallow-sqlalchemy             0.23.1
snowflake-sqlalchemy               1.2.5
SQLAlchemy                         1.3.24
SQLAlchemy-JSONField               1.0.0
SQLAlchemy-Utils                   0.37.2

Though for some reason it installed snowflake-sqlalchemy==1.2.5 and not 1.3.1 🤔

jedcunningham commented:

I think the difference comes down to what version of pip you have. pip 19.3.1 pulls down 1.3.1, while pip 21.1.2 and 21.2.3 pull down 1.2.5.

jedcunningham commented:

I think the right answer here is "upgrade pip so the new resolver is used".
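
That is (pip 20.3 is where the new dependency resolver became the default):

python -m pip install --upgrade "pip>=20.3"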

@potiuk, I'm really curious about the advice to use the constraints from main when installing newer providers, though. That seems like a recipe for trouble, especially as time goes on. I guess the new pip resolver makes it safer, but still. Do we have the "upgrade a provider to a newer version than what was released alongside your core Airflow version" scenario documented anywhere?


potiuk commented Aug 7, 2021

Yeah, it's for the more "adventurous" ones :). It is not really "documented" yet, and it does not have the same "guarantees" as the Airflow constraints. But it might prevent some obvious problems like this one, where we have a new "breaking" release of a dependency.

And if some changes have happened on main since the last provider release, things might break.

The way it should be done (and I might actually implement it in our processes) is to tag the main constraints at the moment we release the providers (with a separate tag per provider). That can be easily automated, and I might add it to our process. We can even track down historical constraints and retroactively tag the history for all provider releases. Should be easy.
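
A rough sketch of what that automation could look like (the tag naming scheme here is purely illustrative - no such tags exist yet):

# At provider release time, tag the tip of the constraints branch for that provider
git fetch origin constraints-main
git tag "constraints-providers-snowflake-2.0.0" FETCH_HEAD
git push origin "constraints-providers-snowflake-2.0.0"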

It has a few caveats, and I think it is basically an almost unsolvable problem - it will only fully work when you have the same set of provider versions as those at the moment of tagging. It is possible that some combination of providers has conflicting dependencies in certain versions, and that provider A in version 1.0.0 will never work with provider B in version 2.0.0. So when you install provider A and you already have provider B, they might conflict.

However, that last problem is somewhat solvable if we tag constraints with provider versions. Our constraints (the default ones - we actually have three sets of constraints: PyPI providers, no-providers, source-providers) contain all the providers with their versions. So if you have a conflicting provider (once we bring the tags in), we should be able to do something like:

pip install apache-airflow-providers-google==5.0.0 apache-airflow-providers-amazon --constraint http....apache-airflow-providers-google/5.0.0/constraints-3.6.txt

And it will not only install the Google provider in the right version, with all its dependencies in the right versions, but it will also update the Amazon provider (and its dependencies) to the version that was released at the same time as Google 5.0.0 (thus with guaranteed, non-conflicting dependencies).


potiuk commented Aug 7, 2021

Also, just one point - conflicting dependencies are not, and will never be, as huge a problem as you might think. I made sure of that when designing the whole system of constraints and automated upgrades. I thought about and experimented with this a lot, and my experience from the last few years is rather positive here.

The "general" pattern with almost all of the active direct dependencies we use is that they update their own dependencies pretty fast, so simply updating to the latest versions of dependent packages solves most of the problems. We also advise all our users to update to the latest versions of providers when they can (it is also kind of a given that when we release an image and constraints, we always use the latest released versions of providers and dependencies that work).

We continuously bump the constraints with "eager" upgrades (they are updated after all tests pass).

And we have mechanisms (which I utilize) to handle exceptions. If you look at Dockerfile.ci and Dockerfile, there are a few hard-limited dependencies in them that handle the few cases that are otherwise not easy to handle. But there are just a few of those, and they help make the automated upgrades work. That's why it is important to have constraints, CI, tests, Dockerfile, and Dockerfile.ci together in one monorepo. We can only do that because all of those pieces are connected and help each other: Dockerfile and Dockerfile.ci use constraints to build themselves, the tests use the resulting images, we can run those images in eager upgrade mode later and re-run the tests, the constraints are updated after the tests are successful and pushed to the repo, and both Docker images are rebuilt with those new upgraded constraints. It is all nicely connected and works in a continuous circle of build-test-upgrade.

Also, the way we approach our dependencies in setup.py makes it quite difficult to get into a conflict situation when you look closer - we rarely update minimum versions (mainly when we handle a CVE or an incompatible change in implementing certain APIs).

We also usually do not add upper bounds for our dependencies - unless we know they break something. On one hand this is risky (though our constraints, and the fact that we only upgrade after successful tests, mitigate the risk), but it also covers the situations where even major upgrades of dependencies work without any fixes. Not everyone uses SemVer, so we cannot rely on it; we will never know in advance whether things are going to break. Constraints, and making them essential while installing Airflow, solve the problem very nicely.
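
This is also why the recommended install pattern pins everything through a constraints file rather than through upper bounds in setup.py, e.g. (adjust the Airflow and Python versions to your environment):

pip install "apache-airflow==2.1.0" \
  --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.1.0/constraints-3.7.txt"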

All this makes it quite possible to keep everything working; conflicts are few and far between (and can usually be solved with proper use of constraints).

@apache apache locked and limited conversation to collaborators Sep 15, 2021
@potiuk potiuk closed this as completed Sep 15, 2021

This issue was moved to a discussion.
