Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add recursive flag to glob in filesystem sensor #16894

Merged
merged 7 commits into from
Jul 12, 2021

Conversation

ShraddheyaS
Copy link
Contributor

This PR aims to fix #16725 by adding the recursive flag to glob in the filesystem sensor.

closes: #16725

Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.

@boring-cyborg boring-cyborg bot added the area:core-operators Operators, Sensors and hooks within Core Airflow label Jul 8, 2021
@boring-cyborg
Copy link

boring-cyborg bot commented Jul 8, 2021

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
Here are some useful points:

  • Pay attention to the quality of your code (flake8, mypy and type annotations). Our pre-commits will help you with that.
  • In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it’s a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
    Apache Airflow is a community-driven project and together we are making it better 🚀.
    In case of doubts contact the developers at:
    Mailing List: dev@airflow.apache.org
    Slack: https://s.apache.org/airflow-slack

@ShraddheyaS
Copy link
Contributor Author

@potiuk should these checks be failing? I didn't think this change would affect integrations with other components.

The exit code is 137, is this due to some memory constraint?

@potiuk
Copy link
Member

potiuk commented Jul 9, 2021

Yep. We have some instabilities of GitHub runners and memory - constant struggle but seem to intensify a bit recently

potiuk
potiuk previously requested changes Jul 9, 2021
Copy link
Member

@potiuk potiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, while this change looks initially like it is fully backwards compatible, in fact it is not. If someone had a directory structure in his folder already, this change might detect files which were skipped before.

I think better (and fully backwards-compatible) solution will be to add optional recursive flag with default = False to the sensor constructor. This way you would not even have to add any explanation to UPDATING.md.

@ShraddheyaS
Copy link
Contributor Author

@potiuk Thanks for the feedback! Let me know if this looks OK now

@ShraddheyaS ShraddheyaS requested a review from potiuk July 9, 2021 18:05
@ShraddheyaS ShraddheyaS requested a review from kaxil July 11, 2021 17:26
Comment on lines 41 to 42
:param recursive: when set True, enables recursive directory matching behavior of
`**` in glob filepath parameter. Defaults to False.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An example would be nice too

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean an example of invocation like below? Or an example scenario where this would be used like #16725?

FileSensor(
            task_id='test',
            filepath=temp_dir + "/**",
            fs_conn_id='fs_default',
            dag=self.dag,
            timeout=0,
            poke_interval=1,
            recursive=True,
        )

Also, would this example go to the same place? The docstring for the class?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An example in airflow/example_dags it was missing for filesystem_sensor, but it would be nice to add one since we are changing it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
:param recursive: when set True, enables recursive directory matching behavior of
`**` in glob filepath parameter. Defaults to False.
:param recursive: when set to ``True``, enables recursive directory matching behavior of
``**`` in glob filepath parameter. Defaults to ``False``.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to do that in a follow-up PR to add an example DAG

@kaxil kaxil dismissed potiuk’s stale review July 12, 2021 20:19

Stale Review

@github-actions
Copy link

The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

@github-actions github-actions bot added the full tests needed We need to run full set of tests for this PR to merge label Jul 12, 2021
@kaxil
Copy link
Member

kaxil commented Jul 12, 2021

@potiuk mind taking another look. Since you had requested changes :)

@potiuk
Copy link
Member

potiuk commented Jul 12, 2021

Fine for me :)

@kaxil kaxil merged commit 789e0ea into apache:main Jul 12, 2021
@boring-cyborg
Copy link

boring-cyborg bot commented Jul 12, 2021

Awesome work, congrats on your first merged pull request!

@ShraddheyaS ShraddheyaS deleted the globRecursive branch July 13, 2021 06:17
josh-fell pushed a commit to josh-fell/airflow that referenced this pull request Jul 19, 2021
This PR aims to fix apache#16725 by adding the `recursive` flag to `glob` in the filesystem sensor.

closes: apache#16725
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:core-operators Operators, Sensors and hooks within Core Airflow full tests needed We need to run full set of tests for this PR to merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

filesensor wildcard matching does not recognize directories
3 participants