[AIRFLOW-1729] [AIRFLOW-2729] Ignore whole directories in .airflowignore #3602
Force-pushed from 40c5ebd to 9c4d6e2.
@ashb py2 builds seem to fail.
Damn, will look.
Looking (and testing it locally this time). I didn't think I used any py3 specific functionality. Guess I was wrong.
Yes, please test locally, as it doesn't have explicit tests as you've mentioned (and it is pretty core).
Sorry - meant "test in Py2 locally" - we use Py3 locally so that's the venv I had configured and tested with.
We can ignore whole directories by removing them from the `dirs` array that `os.walk()` returns. Doing this means that we do fewer disk ops if someone has a set of modules in their dag folder that they want to ignore. Also fixes [AIRFLOW-2797] - we weren't honoring .airflowignore from a parent dir as of apache#3717 -- that (expected) behaviour is now back again. De-duplicate the walking code as well - we had two versions that had gotten out of sync as of apache#3171. So that doesn't happen again, we now only have one version.
Force-pushed from 9c4d6e2 to 93c95ac.
Fixed, tested on Py2 and Py3 - both working now. Py2 doesn't have [...]
Codecov Report
@@            Coverage Diff            @@
##           master   #3602     +/-   ##
=========================================
+ Coverage   76.86%   76.86%   +<.01%
=========================================
  Files         204      204
  Lines       15523    15511      -12
=========================================
- Hits        11931    11922       -9
+ Misses       3592     3589       -3
…n .airflowignore We can ignore whole directories by removing them from the `dirs` array that `os.walk()` returns. Doing this means that we do fewer disk ops if someone has a set of modules in their dag folder that they want to ignore. Also fixes [AIRFLOW-2797] - we weren't honoring .airflowignore from a parent dir as of #3717 -- that (expected) behaviour is now back again. De-duplicate the walking code as well - we had two versions that had gotten out of sync as of #3171. So that doesn't happen again, we now only have one version. Closes #3602 from ashb/ignore-whole-dirs-airflowignore (cherry picked from commit 6b2fdbe) Signed-off-by: Bolke de Bruin <bolke@xs4all.nl>
I left an extra log call, at info level in apache#3602 that was being used for debugging.
…n .airflowignore We can ignore whole directories by removing them from the `dirs` array that `os.walk()` returns. Doing this means that we do fewer disk ops if someone has a set of modules in their dag folder that they want to ignore. Also fixes [AIRFLOW-2797] - we weren't honoring .airflowignore from a parent dir as of apache#3717 -- that (expected) behaviour is now back again. De-duplicate the walking code as well - we had two versions that had gotten out of sync as of apache#3171. So that doesn't happen again, we now only have one version. Closes apache#3602 from ashb/ignore-whole-dirs-airflowignore
I left an extra log call, at info level in apache#3602 that was being used for debugging. Closes apache#3603 from ashb/remove-extra-log
@ashb Btw, you referenced AIRFLOW-2797 in the PR, but I think that was a typo, as that has to do with using a custom Dataproc image when creating clusters... Not sure if there is a way to correct this, but it's kind of confusing :)
D'oh. No, no way to correct it anymore! Though I was able to remove the links on the wrong Jira Issue so that makes it less confusing there.
Make sure you have checked all steps below.
JIRA
Description
We can ignore whole directories by removing them from the `dirs` array that `os.walk()` returns. Doing this means that we do fewer disk ops if someone has a set of modules in their dag folder that they want to ignore.
Also fixes [AIRFLOW-2797] - we weren't honoring .airflowignore from a parent dir as of #3717 -- that (expected) behaviour is now back again.
De-duplicate the walking code as well - we had two versions that had gotten out of sync as of #3171. So that doesn't happen again, we now only have one version.
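For readers unfamiliar with the trick, here is a minimal sketch of pruning `dirs` during `os.walk()`. It is not the code from this PR. The function name `find_py_files`, the `patterns_by_dir` bookkeeping, and the pattern handling (one regular expression per non-empty line of `.airflowignore`, matched with `re.search` against the full path) are assumptions made for illustration.

```python
import os
import re


def find_py_files(root, ignore_file_name=".airflowignore"):
    """Yield .py paths under root, skipping ignored files and directories.

    Hypothetical helper for illustration only, not Airflow's implementation.
    """
    # Patterns inherited from parent directories, keyed by directory path.
    patterns_by_dir = {}
    for dirpath, dirs, files in os.walk(root):
        # Start from whatever the parent directory passed down.
        patterns = list(patterns_by_dir.get(dirpath, []))
        ignore_path = os.path.join(dirpath, ignore_file_name)
        if os.path.isfile(ignore_path):
            with open(ignore_path) as f:
                # Assumption: one regex per non-empty line.
                patterns += [re.compile(line.strip()) for line in f if line.strip()]

        # Slice assignment mutates the list in place, so os.walk() (top-down)
        # never descends into, or even lists, the directories removed here.
        dirs[:] = [
            d for d in dirs
            if not any(p.search(os.path.join(dirpath, d)) for p in patterns)
        ]
        # Remaining subdirectories inherit the patterns seen so far.
        for d in dirs:
            patterns_by_dir[os.path.join(dirpath, d)] = patterns

        for name in files:
            path = os.path.join(dirpath, name)
            if name.endswith(".py") and not any(p.search(path) for p in patterns):
                yield path
```

Note that rebinding with `dirs = [...]` would not prune anything, because `os.walk()` keeps its own reference to the original list; the in-place `dirs[:] = ...` form is what makes the skipped directories cost no further disk ops.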
Tests
Commits
Documentation
Code Quality
git diff upstream/master -u -- "*.py" | flake8 --diff