Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AIRFLOW-5769] Move the S3_hook to /providers/aws/hooks #6443

Merged
merged 3 commits into from
Oct 29, 2019

Conversation

mingrammer
Copy link
Contributor

@mingrammer mingrammer commented Oct 27, 2019

Make sure you have checked all steps below.

Jira

  • My PR addresses the following Airflow Jira issues and references them in the PR title. For example, "[AIRFLOW-XXX] My Airflow PR"
    • https://issues.apache.org/jira/browse/AIRFLOW-5769
    • In case you are fixing a typo in the documentation you can prepend your commit with [AIRFLOW-XXX], code changes always need a Jira issue.
    • In case you are proposing a fundamental code change, you need to create an Airflow Improvement Proposal (AIP).
    • In case you are adding a dependency, check if the license complies with the ASF 3rd Party License Policy.

Description

  • Here are some details about my PR, including screenshots of any UI changes:

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:

Commits

  • My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and the classes in the PR contain docstrings that explain what it does
    • If you implement backwards incompatible changes, please leave a note in the Updating.md so we can assign it to a appropriate release

Of course, this update will lead to the API breaking change. Therefore, all codes using S3_hook should change the import module names when this PR is merged.

So if you agree with this change, it would be better to include these updates in Airflow 2.0 rather than 1.x because it is kind of specification change.

@mik-laj
Copy link
Member

mik-laj commented Oct 27, 2019

Can you keep backwards compatibility and add missing tests?

@mingrammer
Copy link
Contributor Author

Right now, I have no ideas for keeping backward compatibility with Python's case-sensitive nature.

@mik-laj
Copy link
Member

mik-laj commented Oct 27, 2019

@mingrammer
Copy link
Contributor Author

mingrammer commented Oct 27, 2019

Ah! that is a good idea. I couldn't think about it. Thank you for help.

@mik-laj
Copy link
Member

mik-laj commented Oct 27, 2019

You should also add a note in the UPDATING.MD file

@mingrammer
Copy link
Contributor Author

Ok. I'll do that. Thank you for guiding me 👍

@mingrammer
Copy link
Contributor Author

mingrammer commented Oct 27, 2019

@mik-laj I encountered a problem.

Some file systems (including macOS's default one) are case-insensitive, so we can't have both s3_hook.py and S3_hook.py simultaneously in those fs.

So we should use another filename to separate that both. How do you think about that move the s3_hook.py to contrib/hooks like other AWS hooks (i.e. aws_athena_hook, aws_lambda_hook etc) named as aws_s3_hook.py (and change the class name to class AWSS3Hook)?

And they (aws-*-hooks) look similar to each other.

class AWSAthenaHook(AwsHook):
    """
    Interact with AWS Athena to run, poll queries and return query results

    :param aws_conn_id: aws connection to use.
    :type aws_conn_id: str
    :param sleep_time: Time to wait between two consecutive call to check query status on athena
    :type sleep_time: int
    """

    INTERMEDIATE_STATES = ('QUEUED', 'RUNNING',)
    FAILURE_STATES = ('FAILED', 'CANCELLED',)
    SUCCESS_STATES = ('SUCCEEDED',)
...
class S3Hook(AwsHook):
    """
    Interact with AWS S3, using the boto3 library.
    """

    def get_conn(self):
        return self.get_client_type('s3')

@mik-laj
Copy link
Member

mik-laj commented Oct 27, 2019

We want to move this hook to a completely different package.
More information:
#6439
https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-21%3A+Changes+in+import+paths
https://lists.apache.org/thread.html/d8c1a72113fc01088d77a4f6d33371e465d20f7aa0d16232d8f93559@%3Cdev.airflow.apache.org%3E

Please note that a change of class name should be done in a separate PR.

@mingrammer
Copy link
Contributor Author

Well, I will follow the mentioned conventions. (/providers/hooks/~.py)

Yap. I’ll update this PR and send an additional PR for class name.

@feluelle feluelle added the provider:amazon-aws AWS/Amazon - related issues label Oct 27, 2019
@mingrammer mingrammer changed the title [AIRFLOW-5769] Rename the S3_hook to s3_hook [AIRFLOW-5769] Move the S3_hook to /providers/aws/hooks Oct 27, 2019
@mingrammer
Copy link
Contributor Author

mingrammer commented Oct 27, 2019

I'll update the UPDATING.md and other docs when #6446 is merged to avoid conflict and inconsistency.

(based on https://github.com/apache/airflow/pull/6446/files#diff-bb7ee25332d5fdca3ead2f545d72b359)

@mingrammer
Copy link
Contributor Author

mingrammer commented Oct 29, 2019

I updated the UPDATING.md and other reference file.

@codecov-io
Copy link

codecov-io commented Oct 29, 2019

Codecov Report

Merging #6443 into master will decrease coverage by 0.17%.
The diff coverage is 95.83%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master   #6443      +/-   ##
=========================================
- Coverage   83.88%   83.7%   -0.18%     
=========================================
  Files         631     632       +1     
  Lines       36638   36642       +4     
=========================================
- Hits        30732   30672      -60     
- Misses       5906    5970      +64
Impacted Files Coverage Δ
airflow/hooks/S3_hook.py 100% <100%> (+4.25%) ⬆️
airflow/providers/aws/hooks/s3.py 95.74% <95.74%> (ø)
airflow/kubernetes/volume_mount.py 44.44% <0%> (-55.56%) ⬇️
airflow/kubernetes/volume.py 52.94% <0%> (-47.06%) ⬇️
airflow/kubernetes/pod_launcher.py 45.25% <0%> (-46.72%) ⬇️
airflow/kubernetes/kube_client.py 33.33% <0%> (-41.67%) ⬇️
...rflow/contrib/operators/kubernetes_pod_operator.py 70.14% <0%> (-28.36%) ⬇️
airflow/models/taskinstance.py 93.79% <0%> (+0.5%) ⬆️
airflow/jobs/scheduler_job.py 74.92% <0%> (+1.2%) ⬆️
... and 5 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ee0a557...02244ac. Read the comment docs.

@mingrammer
Copy link
Contributor Author

mingrammer commented Oct 29, 2019

And, Should I change the all import paths with new module path in the other files including test files? (in separate PR)

@mingrammer
Copy link
Contributor Author

@mik-laj Please check it

@mik-laj mik-laj merged commit db29639 into apache:master Oct 29, 2019
@mik-laj
Copy link
Member

mik-laj commented Oct 29, 2019

And, Should I change the all import paths with new module path in the other files including test files? (in separate PR)

Oh. I didn't notice your question. Yes. Other references to this class should be improved.

@mingrammer mingrammer deleted the airflow-5769 branch October 29, 2019 12:38
@mingrammer
Copy link
Contributor Author

Ok. I'll send a PR for that!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
provider:amazon-aws AWS/Amazon - related issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants