Support .airflowignore for plugins #9531
airflow/plugins_manager.py
Outdated
Hello.
This code looks very similar to:
https://github.com/apache/airflow/blob/master/airflow/utils/file.py#L125
I think you can write a generator that takes the starting path and yields each file to load.
Best regards,
Kamil
You are right.
In fact, this is pretty much the same code.
I made a generator. What do you think of this?
(Perhaps we could share the generator function with file.py, but....)
I like this code now. :-)
Moving this method to airflow.utils.file is a good idea. This will allow us to delete the duplicate code.
docs/concepts.rst
Outdated
Why the new file name? In my opinion, you can use .airflowignore and it will be easier to use.
I agree with your opinion:
"In my opinion, you can use .airflowignore and it will be easier to use."
I changed the file name from .pluginignore to .airflowignore. :-)
Could you add some tests? For testing the plugin manager, tests.test_utils.mock_plugins.mock_plugin_manager may be helpful. An ideal test would create the required files in a temporary directory (NamedTemporaryFile) and then check whether the plugins have been loaded.
Thank you for your advice.
The one test failure doesn't seem to have anything to do with this PR..... (##[error]Process completed with exit code 137. Probably not enough memory.)
@turbaszek May I ask for a review?
airflow/plugins_manager.py
Outdated
It's not particularly necessary.
I fixed it!
airflow/plugins_manager.py
Outdated
Is this cast to str necessary?
It is not necessary.
I fixed it.
airflow/plugins_manager.py
Outdated
Does this part have to be in the try clause?
No....
I fixed it.
airflow/plugins_manager.py
Outdated
Suggested change:
- for mod_attr_value in list(mod.__dict__.values()):
-     if is_valid_plugin(mod_attr_value):
-         plugin_instance = mod_attr_value()
-         plugins.append(plugin_instance)
+ for mod_attr_value in (m for m in mod.__dict__.values() if is_valid_plugin(m)):
+     plugin_instance = mod_attr_value()
+     plugins.append(plugin_instance)
No strong opinion here, but this way we may be able to drop the # pylint: disable=too-many-nested-blocks
Beautiful code!
I have applied this.
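The effect of the suggestion above can be tried outside Airflow. The following is a minimal sketch, not the code from this PR: BasePlugin, is_valid_plugin, and load_plugins here are hypothetical stand-ins written only for illustration, and the module object is built by hand.

```python
import types


class BasePlugin:
    """Hypothetical stand-in for a plugin base class."""
    name = None


def is_valid_plugin(obj):
    """Hypothetical check: a named subclass of BasePlugin, not the base itself."""
    return (
        isinstance(obj, type)
        and issubclass(obj, BasePlugin)
        and obj is not BasePlugin
        and obj.name is not None
    )


def load_plugins(mod):
    """Instantiate every valid plugin class found in the module namespace."""
    plugins = []
    # The generator expression does the filtering, so the loop body needs no nested `if`.
    for plugin_class in (m for m in mod.__dict__.values() if is_valid_plugin(m)):
        plugins.append(plugin_class())
    return plugins


class GoodPlugin(BasePlugin):
    name = "good"


class UnnamedPlugin(BasePlugin):
    pass


# Build a throwaway module object to stand in for an imported plugin file.
demo = types.ModuleType("demo_plugin_module")
demo.BasePlugin = BasePlugin
demo.GoodPlugin = GoodPlugin
demo.UnnamedPlugin = UnnamedPlugin
demo.some_constant = 42

print([plugin.name for plugin in load_plugins(demo)])
```

One difference worth checking before adopting the shorter form: the original wrapped the values in list(...), which takes a snapshot, while the generator iterates the live module namespace.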
airflow/utils/file.py
Outdated
Should we compile this only once?
It is not necessary.
I fixed it.
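For readers following the "compile once" question, here is a minimal sketch of what that could look like. The helper names compile_patterns and is_ignored are hypothetical and are not taken from airflow/utils/file.py.

```python
import re


def compile_patterns(lines):
    """Compile each non-empty line once, so matching can reuse the objects."""
    stripped = (line.strip() for line in lines)
    return [re.compile(line) for line in stripped if line]


def is_ignored(path, patterns):
    """Return True if any precompiled pattern matches somewhere in the path."""
    return any(pattern.search(path) for pattern in patterns)


# Hypothetical ignore-file content: one regular expression per line.
patterns = compile_patterns(["tests\n", "\n", r"\.sql$" + "\n"])

for candidate in ("plugins/tests/test_a.py", "plugins/sql/q.sql", "plugins/hooks/h.py"):
    print(candidate, is_ignored(candidate, patterns))
```

The re module also caches recently used patterns internally, so passing pattern strings to re.findall on every call still works; precompiling mainly makes the intent explicit.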
airflow/utils/file.py
Outdated
Suggested change:
- for subdir in dirs:
-     patterns_by_dir[os.path.join(root, subdir)] = patterns.copy()
+ patterns_by_dir = {os.path.join(root, sd): patterns.copy() for sd in dirs}
WDYT? Also, do we have to create a copy of patterns each time?
This is necessary.
A pattern that applies in a parent directory must also apply in all of its child directories. At least, that is how .airflowignore is currently specified for DAG selection.
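The inheritance rule described in this reply can be sketched as a small generator. This is an illustration under stated assumptions, not the function merged in this PR: the name iter_unignored_files is hypothetical, patterns are matched against paths relative to the base directory (an assumption made here to keep the example independent of where the tree lives), and comment handling in the ignore file is left out.

```python
import os
import re


def iter_unignored_files(base_dir_path, ignore_file_name):
    """Yield files under base_dir_path that no inherited pattern matches."""
    patterns_by_dir = {}
    for root, dirs, files in os.walk(base_dir_path):
        # Patterns handed down by the parent directory (empty at the top).
        patterns = patterns_by_dir.get(root, [])
        ignore_file_path = os.path.join(root, ignore_file_name)
        if os.path.isfile(ignore_file_path):
            with open(ignore_file_path) as ignore_file:
                lines = (line.strip() for line in ignore_file)
                patterns.extend(re.compile(line) for line in lines if line)

        def is_ignored(name):
            relative = os.path.relpath(os.path.join(root, name), base_dir_path)
            return any(pattern.search(relative) for pattern in patterns)

        # Prune ignored subdirectories in place so os.walk never descends into them.
        dirs[:] = [d for d in dirs if not is_ignored(d)]
        # Every child gets its own copy: it may extend the list with its own
        # ignore file, and siblings must not see those additions.
        patterns_by_dir.update({os.path.join(root, d): patterns.copy() for d in dirs})

        for name in files:
            if name != ignore_file_name and not is_ignored(name):
                yield os.path.join(root, name)
```

Note that this sketch calls update on the dictionary rather than rebinding it, so entries recorded for directories that os.walk has not visited yet are kept.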
airflow/utils/file.py
Outdated
Suggested change:
- if any([re.findall(p, file_path) for p in patterns]):
+ if any(re.findall(p, file_path) for p in patterns):
This should also work.
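The two forms return the same result; the difference is that any() over a generator stops at the first match, while the list version evaluates every pattern first. A minimal sketch that counts the evaluations (count_evaluations is a hypothetical helper, and the patterns and path are made up):

```python
import re


def count_evaluations(patterns, file_path, use_list):
    """Return (result, number of patterns actually tried)."""
    tried = []

    def matches(pattern):
        tried.append(pattern)
        return re.findall(pattern, file_path)

    if use_list:
        # The list is fully built before any() looks at it.
        result = any([matches(p) for p in patterns])
    else:
        # The generator is consumed lazily, so any() can stop early.
        result = any(matches(p) for p in patterns)
    return result, len(tried)


patterns = ["tests", "sql", "tmp"]
print(count_evaluations(patterns, "plugins/tests/x.py", use_list=True))
print(count_evaluations(patterns, "plugins/tests/x.py", use_list=False))
```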
@turbaszek Would this be okay?
There was an error.
.... I think the two k8s test failures have nothing to do with this PR. The same error occurs in other PRs. :(
Please ignore it. It is not related.
airflow/utils/file.py
Outdated
Suggested change:
-                         ignore_list_file: str) -> Generator[str, None, None]:
-     """
-     Search the file and return the path of the file that should not be ignored.
-     :param base_dir_path: the base path to be searched for.
-     :param ignore_file_list_name: the file name in which specifies a regular expression pattern is written.
+                         ignore_file_name: str) -> Generator[str, None, None]:
+     """
+     Search the file and return the path of the file that should not be ignored.
+     :param base_dir_path: the base path to be searched for.
+     :param ignore_file_name: the file name in which specifies a regular expression pattern is written.
WDYT?
I'm sorry.
It was a simple mistake on my part.
I fixed it.
This is the same k8s error as last time; it has nothing to do with this PR.
tests/plugins/test_plugin_ignore.py
Outdated
Can you please use the context manager?
with open(...) as file:
    file.write()
Of course!
I fixed it.
tests/plugins/test_plugin_ignore.py
Outdated
Suggested change:
- for path in detected_files:
-     self.assertNotIn(path, should_ignore_files)
+ self.assertEqual(detected_files & should_ignore_files, set())
WDYT?
It's efficient!
I fixed it!
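A minimal, self-contained sketch of the set-based assertion follows. The file names are made up for illustration, and the sketch assumes both collections are sets, since the & operator is not defined for lists.

```python
import unittest


class TestPluginIgnore(unittest.TestCase):
    def test_ignored_files_are_not_detected(self):
        # Hypothetical file names; a real test would collect these from disk.
        detected_files = {"test_load.py", "subdir/test_load_sub.py"}
        should_ignore_files = {"test_notload.py", "subdir/test_notload_sub.py"}

        # One assertion replaces the loop: the intersection must be empty.
        self.assertEqual(detected_files & should_ignore_files, set())
        # An equivalent spelling that avoids building the intersection.
        self.assertTrue(detected_files.isdisjoint(should_ignore_files))
```

With assertEqual, a failure message shows exactly which ignored files were detected, which the loop version reports only one path at a time.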
What is the purpose of those files? I don't think I see them used in the tests.
These files served as the files that should not be ignored.
(See self.assertEqual(detected_files, should_not_ignore_files) on line 87 (now 95) of test_plugin_ignore.py.)
But....
I changed the test so that test_plugin_ignore.py generates these files and the ".airflowignore" files itself, and deleted the committed files from the pull request!
tests/plugins/test_plugin_ignore.py
Outdated
Is the content of those files important? There's a lot of repeated code, so I would opt for a loop like:
for file_path, content in files_content:
    with open(file_path, "w") as f:
        f.write(content)
Do you think it will make the code clearer?
Yes!
The notation you suggest is better than my existing code.
I fixed it.
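The loop discussed above, combined with the earlier advice to build the fixture in a temporary directory, might look like the sketch below. The helper name create_files and the file layout are hypothetical; the real test may be structured differently.

```python
import os
import tempfile


def create_files(base_dir, files_content):
    """Write each (relative_path, content) pair under base_dir and return the paths."""
    created = []
    for relative_path, content in files_content:
        file_path = os.path.join(base_dir, relative_path)
        # Create intermediate directories so nested fixtures work too.
        os.makedirs(os.path.dirname(file_path), exist_ok=True)
        with open(file_path, "w") as f:
            f.write(content)
        created.append(file_path)
    return created


# Hypothetical fixture layout: an ignore file plus two plugin modules.
files_content = [
    (".airflowignore", "test_notload\n"),
    ("test_load.py", "# should be loaded\n"),
    (os.path.join("subdir", "test_notload_sub.py"), "# should be ignored\n"),
]

with tempfile.TemporaryDirectory() as tmp_dir:
    for path in create_files(tmp_dir, files_content):
        print(os.path.relpath(path, tmp_dir))
```

The temporary directory is removed when the with block exits, so the test leaves nothing behind in the repository.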
@j-y-matsubara would you mind a rebase? The k8s tests should be fixed now
Sure. All tests passed!
Is it better to use a .airflowignore in the dags folder and the plugins folder? My plugins folder is like this: Airflow needs access to the hooks and operators (my DAGs import them). It does not need access to the tests folder, and maybe not the SQL folder (some DAGs/operators read the SQL scripts from this path).
This PR adds support for .airflowignore in the PLUGINS_FOLDER.
It lets you specify files and directories in the PLUGINS_FOLDER that Airflow should intentionally ignore.
Issues Link:
#9413
Make sure to mark the boxes below before creating PR: [x]
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.
Read the Pull Request Guidelines for more information.