Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AIRFLOW-1729] [AIRFLOW-2729] Ignore whole directories in .airflowignore #3602

Closed
wants to merge 1 commit into from

Conversation

ashb
Copy link
Member

@ashb ashb commented Jul 13, 2018

Make sure you have checked all steps below.

JIRA

Description

  • Here are some details about my PR, including screenshots of any UI changes:

We can ignore whole directories by removing them from the dirs array
that os.walk() returns. Doing this means that we fewer disk ops if
someone has a set of modules in their dag folder that they want to
ignore.

Also fixes [AIRFLOW-2797] - we weren't honoring .airflowignore from a
parent dir as of #3717 -- that (expected) behaviour is now back again.

De-duplicate the walking code as well - we had two versions that had
gotten out of sync as of #3171. So that doesn't happen again we now only
have one version.

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason: we don't currently have any mechanism in place to mock/test airflow ignore behaviour. I'm not happy about opening a PR that fixes a regression without a unittest to cover it :(

Commits

  • [Lx ] My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • Added note to make it clearer that ignore file is regex, not shell-glob.

Code Quality

  • Passes git diff upstream/master -u -- "*.py" | flake8 --diff

@ashb ashb force-pushed the ignore-whole-dirs-airflowignore branch from 40c5ebd to 9c4d6e2 Compare July 13, 2018 09:11
@bolkedebruin
Copy link
Contributor

@ashb py2 builds seem to fail.

@ashb
Copy link
Member Author

ashb commented Jul 13, 2018

Damn, will look.

@ashb
Copy link
Member Author

ashb commented Jul 13, 2018

Looking (and testing it locally this time. I didn't think I used any py3 specific functionality. Guess I was wrong)

@bolkedebruin
Copy link
Contributor

Yes please test locally as it doesnt have explicit tests as you've mentioned (and it is pretty core)

@ashb
Copy link
Member Author

ashb commented Jul 13, 2018

Sorry - meant "test in Py2 locally" - we use Py3 locally so that's the venv I had configured and tested with.

We can ignore whole directories by removing them from the `dirs` array
that `os.walk()` returns. Doing this means that we fewer disk ops if
someone has a set of modules in their dag folder that they want to
ignore.

Also fixes [AIRFLOW-2797] - we weren't honoring .airflowignore from a
parent dir as of apache#3717 -- that (expected) behaviour is now back again.

De-duplicate the walking code as well - we had two versions that had
gotten out of sync as of apache#3171. So that doesn't happen again we now only
have one version.
@ashb ashb force-pushed the ignore-whole-dirs-airflowignore branch from 9c4d6e2 to 93c95ac Compare July 13, 2018 10:23
@ashb
Copy link
Member Author

ashb commented Jul 13, 2018

Fixed, tested on Py2 and Py3 - both working now.

Py2 doesn't have list.copy() - found a better way of doing it that doesn't copy until it needs to either.

@codecov-io
Copy link

codecov-io commented Jul 13, 2018

Codecov Report

Merging #3602 into master will increase coverage by <.01%.
The diff coverage is 89.47%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3602      +/-   ##
==========================================
+ Coverage   76.86%   76.86%   +<.01%     
==========================================
  Files         204      204              
  Lines       15523    15511      -12     
==========================================
- Hits        11931    11922       -9     
+ Misses       3592     3589       -3
Impacted Files Coverage Δ
airflow/models.py 88.69% <100%> (+0.01%) ⬆️
airflow/utils/dag_processing.py 89.45% <77.77%> (+0.56%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5290688...93c95ac. Read the comment docs.

asfgit pushed a commit that referenced this pull request Jul 13, 2018
…n .airflowignore

We can ignore whole directories by removing them
from the `dirs` array
that `os.walk()` returns. Doing this means that we
fewer disk ops if
someone has a set of modules in their dag folder
that they want to
ignore.

Also fixes [AIRFLOW-2797] - we weren't honoring
.airflowignore from a
parent dir as of #3717 -- that (expected)
behaviour is now back again.

De-duplicate the walking code as well - we had two
versions that had
gotten out of sync as of #3171. So that doesn't
happen again we now only
have one version.

Closes #3602 from ashb/ignore-whole-dirs-
airflowignore

(cherry picked from commit 6b2fdbe)
Signed-off-by: Bolke de Bruin <bolke@xs4all.nl>
@asfgit asfgit closed this in 6b2fdbe Jul 13, 2018
@ashb ashb deleted the ignore-whole-dirs-airflowignore branch July 13, 2018 12:27
ashb added a commit to ashb/airflow that referenced this pull request Jul 13, 2018
I left an extra log call, at info level in apache#3602 that was being used for
debugging.
asfgit pushed a commit that referenced this pull request Jul 15, 2018
I left an extra log call, at info level in #3602
that was being used for
debugging.

Closes #3603 from ashb/remove-extra-log

(cherry picked from commit 7c19ed0)
Signed-off-by: Bolke de Bruin <bolke@xs4all.nl>
asfgit pushed a commit that referenced this pull request Jul 15, 2018
I left an extra log call, at info level in #3602
that was being used for
debugging.

Closes #3603 from ashb/remove-extra-log
lxneng pushed a commit to lxneng/incubator-airflow that referenced this pull request Aug 10, 2018
…n .airflowignore

We can ignore whole directories by removing them
from the `dirs` array
that `os.walk()` returns. Doing this means that we
fewer disk ops if
someone has a set of modules in their dag folder
that they want to
ignore.

Also fixes [AIRFLOW-2797] - we weren't honoring
.airflowignore from a
parent dir as of apache#3717 -- that (expected)
behaviour is now back again.

De-duplicate the walking code as well - we had two
versions that had
gotten out of sync as of apache#3171. So that doesn't
happen again we now only
have one version.

Closes apache#3602 from ashb/ignore-whole-dirs-
airflowignore
lxneng pushed a commit to lxneng/incubator-airflow that referenced this pull request Aug 10, 2018
I left an extra log call, at info level in apache#3602
that was being used for
debugging.

Closes apache#3603 from ashb/remove-extra-log
@jeffkpayne
Copy link
Contributor

@ashb Btw, you referenced AIRFLOW-2797 in the PR, but I think that was a typo, as that has to do with using a custom Dataproc image when creating clusters... Not sure if there is a way to correct this, but it's kind of confusing :)

@ashb
Copy link
Member Author

ashb commented Sep 5, 2018

D'oh. No, no way to correct it anymore! Though I was able to remove the links on the wrong Jira Issue so that makes it less confusing there.

@ashb ashb changed the title [AIRFLOW-1729] [AIRFLOW-2797] Ignore whole directories in .airflowignore [AIRFLOW-1729] [AIRFLOW-2729] Ignore whole directories in .airflowignore Sep 5, 2018
aliceabe pushed a commit to aliceabe/incubator-airflow that referenced this pull request Jan 3, 2019
…n .airflowignore

We can ignore whole directories by removing them
from the `dirs` array
that `os.walk()` returns. Doing this means that we
fewer disk ops if
someone has a set of modules in their dag folder
that they want to
ignore.

Also fixes [AIRFLOW-2797] - we weren't honoring
.airflowignore from a
parent dir as of apache#3717 -- that (expected)
behaviour is now back again.

De-duplicate the walking code as well - we had two
versions that had
gotten out of sync as of apache#3171. So that doesn't
happen again we now only
have one version.

Closes apache#3602 from ashb/ignore-whole-dirs-
airflowignore
aliceabe pushed a commit to aliceabe/incubator-airflow that referenced this pull request Jan 3, 2019
I left an extra log call, at info level in apache#3602
that was being used for
debugging.

Closes apache#3603 from ashb/remove-extra-log
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants