Skip to content

[AIRFLOW-4085] FileSensor - adding wildcard option#5358

Merged
ashb merged 1 commit intoapache:masterfrom
zacharya19:z/FileSensorGlob
Sep 4, 2019
Merged

[AIRFLOW-4085] FileSensor - adding wildcard option#5358
ashb merged 1 commit intoapache:masterfrom
zacharya19:z/FileSensorGlob

Conversation

@zacharya19
Copy link
Contributor

@zacharya19 zacharya19 commented Jun 1, 2019

Make sure you have checked all steps below.

Jira

  • My PR addresses the following Airflow Jira issues and references them in the PR title. For example, "[AIRFLOW-XXX] My Airflow PR"
    • https://issues.apache.org/jira/browse/AIRFLOW-4085
    • In case you are fixing a typo in the documentation you can prepend your commit with [AIRFLOW-XXX], code changes always need a Jira issue.
    • In case you are proposing a fundamental code change, you need to create an Airflow Improvement Proposal (AIP).
    • In case you are adding a dependency, check if the license complies with the ASF 3rd Party License Policy.

Description

  • Here are some details about my PR, including screenshots of any UI changes:

Changing FileSensor to use glob in order to allow wildcard and more functionality in this sensor.

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:

Commits

  • My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and the classes in the PR contain docstrings that explain what it does
    • If you implement backwards incompatible changes, please leave a note in the Updating.md so we can assign it to a appropriate release

Code Quality

  • Passes flake8

@codecov-io
Copy link

codecov-io commented Jun 1, 2019

Codecov Report

Merging #5358 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #5358      +/-   ##
==========================================
- Coverage   79.03%   79.03%   -0.01%     
==========================================
  Files         481      481              
  Lines       30212    30210       -2     
==========================================
- Hits        23877    23875       -2     
  Misses       6335     6335
Impacted Files Coverage Δ
airflow/contrib/sensors/file_sensor.py 100% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f1734e5...e034cc1. Read the comment docs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
dir_ = tempfile.mkdtemp()
dir = tempfile.mkdtemp()

The diff is much easier to read and easier to review if you don't make changes like this. Please revert it.

Copy link
Member

@ashb ashb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't an "option" for wildcard but support for them always. This is probably okay (as most people don't put * or ? etc in paths, but at the very least this should be mentioned in UPDATING.md

@zacharya19
Copy link
Contributor Author

@ashb done.

@OmerJog
Copy link
Contributor

OmerJog commented Jul 22, 2019

@zacharya19 can you resolve the conflict?

@zacharya19
Copy link
Contributor Author

@OmerJog Done.

Copy link
Member

@ashb ashb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Largely looking okay, but a few changes to make. please mention/ping me when you rebase as otherwise I might miss the update.

@zacharya19 zacharya19 force-pushed the z/FileSensorGlob branch 2 times, most recently from e8358de to 0d986c9 Compare August 27, 2019 18:26
@zacharya19
Copy link
Contributor Author

@ashb Fixed conflicts and answered your comments.

Copy link
Member

@ashb ashb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good.

Tests were failing, I've restarted the tests, hopefully it was just a transient failure.

@ashb ashb merged commit c9e2d04 into apache:master Sep 4, 2019
@ashb
Copy link
Member

ashb commented Sep 4, 2019

Merged. Now I just need to decide if we merge this in for release with 1.10.6 or not. It is technically a breaking change, but it is just very unlikely to affect anyone. I think.

@zacharya19 zacharya19 deleted the z/FileSensorGlob branch December 27, 2019 17:15
@NBardelot
Copy link
Contributor

NBardelot commented Mar 29, 2021

FYI this behaviour is inconsistent.

If anyone like me comes to this PR after looking at the glob() behaviour, this cannot work for hooks that make use of a remote FS (like SFTPHook for example). As the Python documentation states that glob() uses a mix of os.scandir() and fnmatch.fnmatch() which make the code proposed here only adapted to a local FS.

Thus, trying to use a path with a glob pattern and a hook to a remote FS will have a random effect: either you're lucky and the glob() will not find the equivalent path locally and just return that the path does not exist (the sensor will never trigger); or in a worse case scenario you might trigger the sensor for a file that exists locally but not on the remote FS as expected...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants