[AIRFLOW-2932] GoogleCloudStorageHook - allow compression of file #3893

Merged
merged 8 commits into from
Oct 13, 2018

Conversation

neil90
Contributor

@neil90 neil90 commented Sep 13, 2018

Make sure you have checked all steps below.

Jira

Description

  • Here are some details about my PR, including screenshots of any UI changes:

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:

Commits

  • My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added.

Code Quality

  • Passes git diff upstream/master -u -- "*.py" | flake8 --diff

@Fokko Fokko requested a review from kaxil September 15, 2018 19:55
Contributor

@Fokko Fokko left a comment

LGTM

Member

@kaxil kaxil left a comment

Good work in general. Can you add a test for this?

@@ -37,6 +37,8 @@ class FileToGoogleCloudStorageOperator(BaseOperator):
 :type google_cloud_storage_conn_id: str
 :param mime_type: The mime-type string
 :type mime_type: str
+:type gzip: Allows for file to upload as gzip
+:param gzip: boolean
Member

Incorrect docstring: you have mixed up type and param. Replace it with:

    :param gzip: Allows for file to upload as gzip
    :type gzip: boolean

Member

It would also be good to add more detail to this parameter, and to say at Line 28 that you can optionally compress the file before uploading. An example would be good.

https://github.com/apache/incubator-airflow/blob/e455e75107f84d5b47fa5b2cda1c68fb6016f0f4/airflow/contrib/operators/file_to_gcs.py#L28
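
As an illustration of the "compress before uploading" step requested above, here is a minimal sketch using only the Python standard library. The helper name `compress_file` and the `.gz` suffix convention are assumptions made for this example; they are not taken from the PR's diff, and the actual hook code may differ. The docstring uses the `:param:` / `:type:` order recommended in the review.

```python
import gzip
import shutil


def compress_file(filename):
    """
    Write a gzip-compressed copy of a local file next to the original.

    :param filename: Path to the local file to compress
    :type filename: str
    :return: Path to the compressed copy (``filename`` plus ``.gz``)
    :rtype: str
    """
    # Hypothetical naming convention: append ".gz" to the original path.
    filename_gz = filename + '.gz'
    # Stream the bytes through gzip so the whole file never sits in memory.
    with open(filename, 'rb') as f_in, gzip.open(filename_gz, 'wb') as f_out:
        shutil.copyfileobj(f_in, f_out)
    return filename_gz
```

An upload routine could then send the returned path instead of the original file when its `gzip` flag is true.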

@codecov-io

codecov-io commented Sep 25, 2018

Codecov Report

Merging #3893 into master will not change coverage.
The diff coverage is n/a.

@@          Coverage Diff           @@
##           master   #3893   +/-   ##
======================================
  Coverage    75.5%   75.5%           
======================================
  Files         199     199           
  Lines       15949   15949           
======================================
  Hits        12043   12043           
  Misses       3906    3906

Powered by Codecov. Last update 92b54bb...4acabf9.

@xnuinside
Contributor

@neil90, are you planning to update the PR with tests?

@neil90
Contributor Author

neil90 commented Oct 2, 2018

Hi @xnuinside, sorry about that. I have been swamped at work, but I will finish this task by the weekend. My apologies.

@neil90
Contributor Author

neil90 commented Oct 7, 2018

@kaxil and @xnuinside

I have updated the operator and added a test case, using test_file_to_wasb.py as a template.

Contributor

@Fokko Fokko left a comment

@neil90 Two last comments, apart from that it looks good! 👍

@@ -172,7 +175,8 @@ def download(self, bucket, object, filename=None):
 return downloaded_file_bytes

 # pylint:disable=redefined-builtin
-def upload(self, bucket, object, filename, mime_type='application/octet-stream'):
+def upload(self, bucket, object, filename,
+           gzip=False, mime_type='application/octet-stream'):
Contributor

One last comment, can you move the gzip argument to the very end? Otherwise backward compatibility will be broken.
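
To illustrate the concern, here is a minimal sketch of how inserting a new keyword argument before an existing one changes the meaning of positional calls. The function names, the `object_name` parameter, and the sample arguments are hypothetical stand-ins; only the ordering of `gzip` and `mime_type` mirrors the diff above.

```python
def upload_gzip_in_middle(bucket, object_name, filename,
                          gzip=False, mime_type='application/octet-stream'):
    """New argument placed before mime_type (the order under review)."""
    return {'gzip': gzip, 'mime_type': mime_type}


def upload_gzip_at_end(bucket, object_name, filename,
                       mime_type='application/octet-stream', gzip=False):
    """New argument appended last (the order the reviewer asks for)."""
    return {'gzip': gzip, 'mime_type': mime_type}


# An existing caller that passes mime_type positionally as the 4th argument.
legacy_args = ('my-bucket', 'data.csv', '/tmp/data.csv', 'text/csv')

# With gzip in the middle, 'text/csv' is silently bound to gzip instead.
print(upload_gzip_in_middle(*legacy_args))

# With gzip at the end, the legacy call keeps its original meaning.
print(upload_gzip_at_end(*legacy_args))
```

In the first call the MIME type string lands in `gzip` (and is truthy), while `mime_type` falls back to its default, so old call sites would change behavior without raising an error.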

@@ -49,6 +52,7 @@ def __init__(self,
 bucket,
 google_cloud_storage_conn_id='google_cloud_default',
 mime_type='application/octet-stream',
+gzip=False,
Contributor

One last comment, can you move the gzip argument to the very end? Otherwise backward compatibility will be broken.

@kaxil
Member

kaxil commented Oct 12, 2018

I have made the changes. Let's wait for the CI to pass and it should be good to be merged :)

@neil90
Contributor Author

neil90 commented Oct 12, 2018

Oh, thanks @kaxil. Sorry about that, I was going to try to get to it today after work.

@Fokko
Contributor

Fokko commented Oct 13, 2018

Thanks @neil90 @kaxil.

@Fokko Fokko merged commit 3535836 into apache:master Oct 13, 2018
Fokko pushed a commit to Fokko/incubator-airflow that referenced this pull request Oct 13, 2018
…ache#3893)

- Add gzip functionality to GoogleCloudStorageHook.upload
- Resolve docstring mistype, added additional information to
  tell user that there is option to compress
- Add test case for file_to_gcs
@nikhil-mahajan-pcln

nikhil-mahajan-pcln commented Sep 20, 2020

Hi, is there a 2GB limit on the compressed .gz file that can be sent?
I get a "string longer than 2147483647 bytes" error whenever the .gz file size exceeds 2GB.
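
For context, 2147483647 is 2**31 - 1, the largest signed 32-bit integer. That suggests the error may come from handing the whole file over as a single byte string rather than from a storage-side size limit, although nothing in this thread confirms the cause. Under that assumption, the sketch below reads a file in fixed-size pieces so that no single buffer approaches the limit. The name `iter_chunks` and the 8 MiB default are illustrative choices and are not part of the hook.

```python
import io

# The figure quoted in the error message: the largest signed 32-bit integer.
MAX_SINGLE_BUFFER = 2**31 - 1


def iter_chunks(fileobj, chunk_size=8 * 1024 * 1024):
    """Yield successive blocks of at most chunk_size bytes from a binary file."""
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            # An empty read means end of file.
            break
        yield chunk


# Small in-memory demonstration with a 3-byte chunk size.
for piece in iter_chunks(io.BytesIO(b'abcdefg'), chunk_size=3):
    print(piece)
```

Whether the upload path can consume such chunks depends on the client library in use, which this sketch does not cover.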
