Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update return types of get_key methods on S3Hook #30923

Merged
merged 1 commit into from May 3, 2023

Conversation

jonshea
Copy link
Contributor

@jonshea jonshea commented Apr 27, 2023

S3Hook has two methods, get_key and get_wildcard_key, which use an AWS ServiceResource to fetch object data from S3. Both methods are correctly documented as returning an instance of S3.Object, but their return types are annotated with S3Transfer. This is incorrect. The actual return type, S3.Object, is not a subtype of S3Transfer, and the two types have many different methods.

This PR uses the mypy-boto3-s3 package to set a correct return type of S3 resource Object for get_key and get_wildcard_key.

@boring-cyborg
Copy link

boring-cyborg bot commented Apr 27, 2023

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
Here are some useful points:

  • Pay attention to the quality of your code (ruff, mypy and type annotations). Our pre-commits will help you with that.
  • In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
    Apache Airflow is a community-driven project and together we are making it better 🚀.
    In case of doubts contact the developers at:
    Mailing List: dev@airflow.apache.org
    Slack: https://s.apache.org/airflow-slack

@jonshea jonshea marked this pull request as ready for review April 28, 2023 17:04
Copy link
Contributor

@o-nikolas o-nikolas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks reasonable to me, triggered CI and re-ran that failed PROD image build.

@potiuk
Copy link
Member

potiuk commented Apr 29, 2023

Ah no ... pre-commit install/run and docs build are recommende/

@jonshea jonshea force-pushed the update-s3hook-get_key-return-type branch 2 times, most recently from 373d3af to 67efa3d Compare May 1, 2023 02:16
@jonshea
Copy link
Contributor Author

jonshea commented May 1, 2023

I believe have fixed the CI failures on static-checks and build-docs.

I am slightly anxious about this change. In some sense, it is a breaking API change (at least at the type checking level), and it may result in type checking failures to airflow users when they upgrade, which is annoying for users to need to manage.

But on the other hand, the types are currently wrong, and users who rely on them may have runtime errors. That is how I found this issue. And in particular, get_wildcard_key was typed as returning just S3Transfer when it very clearly can return None (as well as returning S3.Object rather than S3Transfer).

I am not sure if Airflow has a policy on this kind of change. Personally, I would like the correct types. But if you want something different (or to close the PR without merging), I am happy to oblige.

@o-nikolas
Copy link
Contributor

I believe have fixed the CI failures on static-checks and build-docs.

Triggered CI again (the first time contributor workflow can be frustrating @jonshea, bear with us)

I am slightly anxious about this change. In some sense, it is a breaking API change (at least at the type checking level), and it may result in type checking failures to airflow users when they upgrade, which is annoying for users to need to manage.

But on the other hand, the types are currently wrong, and users who rely on them may have runtime errors. That is how I found this issue. And in particular, get_wildcard_key was typed as returning just S3Transfer when it very clearly can return None (as well as returning S3.Object rather than S3Transfer).

I am not sure if Airflow has a policy on this kind of change. Personally, I would like the correct types. But if you want something different (or to close the PR without merging), I am happy to oblige.

This is interesting, if we do consider it a breaking change, we also don't really have an (easy) way to cleanly deprecate type-related changes like this with a deprecation warning, while keeping it backwards compatible.

However, I think this can pretty clearly be seen as a bug fix, and as such can be shipped without backcompat/warnings. But I'd be interested to hear what others think.

@vincbeck
Copy link
Contributor

vincbeck commented May 1, 2023

However, I think this can pretty clearly be seen as a bug fix, and as such can be shipped without backcompat/warnings. But I'd be interested to hear what others think.

I strongly agree with this. This looks like a bug to me and should be fixed without any deprecation process

@potiuk
Copy link
Member

potiuk commented May 1, 2023

Type change is not really breaking IMHO. We do not have a definition of what Breaking change is and mainly because it is impossible. Hyrum's law succintly describes it and the obligatory XKCD comic shows a good example of it [1]. So we always have to rely on personal judgment on when a change is likely to break many people's "production" workflow.

This one might easily break someone's CI workflow for example. But it is extremely unlikely to break production workflow. There are few cases only when type-hinting is used at runtime with the capacity of breaking things (for example when it is used in Pydantic definition of data objects being serialized). If your code worked before the change and correctly handled S3 objects returned, it will continue to work, if it did not handle some edge cases, and your MyPy might not have flagged it before, but it will continue to NOT work the same way it did NOT work before. So this one is definitely not breaking. your CI might break, but this is not a breaking change for airlfow release.

[1] Obligatory XKCD

image

@potiuk
Copy link
Member

potiuk commented May 1, 2023

BTW. The reason build-docs have been failing for this one are workarounded in #31002 - this change has generated dependencies change which in turn require docs for all providers to be built, and it took all the time doc build job had allocated on Public Runners.

`S3Hook` has two methods, `get_key` and `get_wildcard_key`, which use an
AWS `ServiceResource` to fetch object data from S3. Both methods are
correctly documented as returning an instance of `S3.Object`, but their
return types are annotated with `S3Transfer`. This is incorrect.
The actual return type, `S3.Object`, is not a subtype of `S3Transfer`, and
the two types have many different methods.

This PR uses the `mypy-boto3-s3` package to set a correct return type of
S3 resource `Object` for `get_key` and `get_wildcard_key`.
@jonshea jonshea force-pushed the update-s3hook-get_key-return-type branch from 67efa3d to 75b7edc Compare May 2, 2023 16:42
@jonshea
Copy link
Contributor Author

jonshea commented May 2, 2023

I rebased on main to pick up #31002, and it seems like we are green now. Thank you all for your help and feedback. I think this might be the most attention that I have ever gotten on a first PR for a project, especially given that it is a fairly trivial change. I really appreciate it.

If there is anything else you need from me, or that I can help with, please let me know.

@o-nikolas o-nikolas merged commit cb71d41 into apache:main May 3, 2023
58 checks passed
@boring-cyborg
Copy link

boring-cyborg bot commented May 3, 2023

Awesome work, congrats on your first merged pull request! You are invited to check our Issue Tracker for additional contributions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants