Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed reading from zip package to default to text. #13962

Merged
merged 1 commit into from Jan 29, 2021

Conversation

levahim
Copy link
Contributor

@levahim levahim commented Jan 28, 2021

The open_maybe_zipped function returns different file-like objects depending on whether it's called for a plain file or for a file in a zip archive. The problem is that by default io.open (used for plain files) returns file in text mode and subsequent read on it returns strings. ZipFile on the other hand by default returns a binary file and subsequent read on it returns bytes.
The returned value for open_maybe_zipped should be the same regardless whether it's a zip or a plain file--it should be in text mode. Returning binaries for zip packages causes problems. For example, when saving DAG code is turned on, the DagCode model tries to save DAG source code in the metadata database. SQLAlchemy throws an error for DAGs that come from a zip package, because tries to save binary value in a string column.
This PR fixes the problem.

@boring-cyborg
Copy link

boring-cyborg bot commented Jan 28, 2021

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst)
Here are some useful points:

  • Pay attention to the quality of your code (flake8, pylint and type annotations). Our pre-commits will help you with that.
  • In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it’s a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
    Apache Airflow is a community-driven project and together we are making it better 🚀.
    In case of doubts contact the developers at:
    Mailing List: dev@airflow.apache.org
    Slack: https://s.apache.org/airflow-slack

@mik-laj mik-laj merged commit cc70227 into apache:master Jan 29, 2021
@boring-cyborg
Copy link

boring-cyborg bot commented Jan 29, 2021

Awesome work, congrats on your first merged pull request!

kaxil added a commit to astronomer/airflow that referenced this pull request Jan 29, 2021
kaxil added a commit that referenced this pull request Jan 29, 2021
@kaxil
Copy link
Member

kaxil commented Jan 29, 2021

The tests were failing for this PR -- so we had to revert as it broke the master, can you create a new PR please and investigate why the unit tests were failing @levahim

@levahim
Copy link
Contributor Author

levahim commented Jan 30, 2021

OK, will do. Not sure how a one-liner fix like the one in this PR can break the tests though, unless some tests specifically rely on the airflow.utils.open_maybe_zipped function return binary data in some cases (which would be testing for the incorrect function behavior). Let me check...

@levahim
Copy link
Contributor Author

levahim commented Jan 30, 2021

Found the problem, fixed the unit test. Resubmitted as PR #13984

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants