Skip to content

Support .airflowignore for plugins#9531

Merged
turbaszek merged 18 commits intoapache:masterfrom
j-y-matsubara:develop_plugin
Jul 4, 2020
Merged

Support .airflowignore for plugins#9531
turbaszek merged 18 commits intoapache:masterfrom
j-y-matsubara:develop_plugin

Conversation

@j-y-matsubara
Copy link
Contributor

@j-y-matsubara j-y-matsubara commented Jun 26, 2020

This PR is to support .airflowinignore in the PLUGIN FOLDER.
Specifies intentionally files / directories in the PlUGINS_FOLDER to ignore by Airflow.

Issues Link:
#9413


Make sure to mark the boxes below before creating PR: [x]

  • Description above provides context of the change
  • Unit tests coverage for changes (not needed for documentation changes)
  • Target Github ISSUE in description if exists
  • Commits follow "How to write a good git commit message"
  • Relevant documentation is updated including usage instructions.
  • I will engage committers as explained in Contribution Workflow Example.

In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.
Read the Pull Request Guidelines for more information.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello.
This code looks very similar to:
https://github.com/apache/airflow/blob/master/airflow/utils/file.py#L125
I think you can make a generator that will take the starting path and yield for each file to load.
Best regards,
Kamil

Copy link
Contributor Author

@j-y-matsubara j-y-matsubara Jun 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right.
In fact, this is pretty much the same code.

I made a generator. What do you think of this?
(Perhaps we can make the generator func share with file.py but....)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this code now. :-)

Moving this method to airflow.utils.file is a good idea. This will allow us to delete the duplicate code.

@j-y-matsubara j-y-matsubara requested a review from mik-laj June 27, 2020 10:02
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the new file name? In my opinion, you can use .airflowignore and it will be easier to use.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with your opinion.

In my opinion, you can use .airflowignore and it will be easier to use.

It was fixed from .pluginignore to .airflowignore . :-)

@mik-laj
Copy link
Member

mik-laj commented Jun 28, 2020

Could you add some tests? For plugin managers testing, tests.test_utils.mock_plugins.mock_plugin_manager may be helpful. A perfect test can create the required files in a temporary directory (NamedTemporaryFile) and then check if the plugins have been loaded.

@j-y-matsubara
Copy link
Contributor Author

j-y-matsubara commented Jun 28, 2020

Could you add some tests? For plugin managers testing, tests.test_utils.mock_plugins.mock_plugin_manager may be helpful. A perfect test can create the required files in a temporary directory (NamedTemporaryFile) and then check if the plugins have been loaded.

Thank you for your advice.
I added a test.
And moved method to airflow.utils.file.

@j-y-matsubara j-y-matsubara changed the title Support .airflowignore for plugins : .pluginingore Support .airflowignore for plugins Jun 28, 2020
@j-y-matsubara j-y-matsubara requested a review from mik-laj June 29, 2020 01:03
@j-y-matsubara
Copy link
Contributor Author

j-y-matsubara commented Jun 30, 2020

I think the one failure of tests doesn't seem to have anything to do with this PR..... (##[error]Process completed with exit code 137. Probably not enough memory. )
The same error occurs in other PRs.

@mik-laj
Copy link
Member

mik-laj commented Jul 1, 2020

@turbaszek Can I ask for review?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we make it a constant?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not particularly necessary.
I fixed!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this cast to str necessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not necessary.
I fixed.

Comment on lines 174 to 179
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this part have to be in try clause?

Copy link
Contributor Author

@j-y-matsubara j-y-matsubara Jul 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No....
I fixed

Comment on lines 188 to 191
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for mod_attr_value in list(mod.__dict__.values()):
if is_valid_plugin(mod_attr_value):
plugin_instance = mod_attr_value()
plugins.append(plugin_instance)
for mod_attr_value in (m for m in mod.__dict__.values() if is_valid_plugin(m)):
plugin_instance = mod_attr_value()
plugins.append(plugin_instance)

No strong opinion here but in this way we may be able to fix the # pylint: disable=too-many-nested-blocks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Beautiful code!
I reflect this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we compile this only once?

Copy link
Contributor Author

@j-y-matsubara j-y-matsubara Jul 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not necessary.
I fixed.

Comment on lines 123 to 124
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for subdir in dirs:
patterns_by_dir[os.path.join(root, subdir)] = patterns.copy()
patterns_by_dir = {os.path.join(root, sd): patterns.copy() for sd in dirs}

WDYT? Also, do we have to create copy of patterns each time?

Copy link
Contributor Author

@j-y-matsubara j-y-matsubara Jul 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is necessary.
A canonical pattern that is evaluated in a parent directory must also be evaluated in its parent's child directories. At least that's how .airflowignore (selection of dag) is currently specificated.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if any([re.findall(p, file_path) for p in patterns]):
if any(re.findall(p, file_path) for p in patterns):

Should work also

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.
I fixed.

@j-y-matsubara
Copy link
Contributor Author

@turbaszek
Thank you very much for the review.
I have corrected the code you pointed out!

Would this be okay?

@j-y-matsubara j-y-matsubara requested a review from turbaszek July 1, 2020 16:15
@j-y-matsubara
Copy link
Contributor Author

There was an error.
I will fix it.

@j-y-matsubara
Copy link
Contributor Author

.... I think the two failure of k8s tests doesn't seem to have anything to do with this PR. The same error occurs in other PRs. :(

@mik-laj
Copy link
Member

mik-laj commented Jul 2, 2020

Please ignore it. It is not related.

Comment on lines 95 to 99
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ignore_list_file: str) -> Generator[str, None, None]:
"""
Search the file and return the path of the file that should not be ignored.
:param base_dir_path: the base path to be searched for.
:param ignore_file_list_name: the file name in which specifies a regular expression pattern is written.
ignore_file_name: str) -> Generator[str, None, None]:
"""
Search the file and return the path of the file that should not be ignored.
:param base_dir_path: the base path to be searched for.
:param ignore_file_name: the file name in which specifies a regular expression pattern is written.

WDYT?

Copy link
Contributor Author

@j-y-matsubara j-y-matsubara Jul 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry.
It's my simple mistake.
I fixed.

@j-y-matsubara
Copy link
Contributor Author

The same k8s error that has nothing to do with this PR as last time.

Comment on lines 43 to 52
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please use the ctx manager?

with open(...) as file:
    file.write()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course!
I fixed.

Comment on lines 88 to 89
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for path in detected_files:
self.assertNotIn(path, should_ignore_files)
self.assertEqual(detected_files & should_ignore_files, set())

WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's efficient!
I fixed!

Copy link
Member

@turbaszek turbaszek Jul 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the purpose of those files? I think I don't see in tests

Copy link
Contributor Author

@j-y-matsubara j-y-matsubara Jul 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These files were used to files that should not be ignored.
( self.assertEqual(detected_files, should_not_ignore_files) Line 87 (now 95)of the test_plugin_ignore.py. )

But....
I fixed these files and ".airflowignore" files to be generated by test_plugin_ignore.py, and delete pull files!

Comment on lines 43 to 60
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is content of those files important? There's a lot of repeated code so I would opt for some loop like:

for file_path, content in files_content:
    with open(file_path) as f:
        f.wrtie(content)

Do you think it will make the code clearer?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes!
The notation you suggest is better than the existing my code.

I fixed.

@turbaszek
Copy link
Member

@j-y-matsubara would you mind a rebase? The k8s tests should be fixed now

@j-y-matsubara
Copy link
Contributor Author

@j-y-matsubara would you mind a rebase? The k8s tests should be fixed now

Sure.

All tests were passed!

@turbaszek turbaszek merged commit 5670e6f into apache:master Jul 4, 2020
@ldacey
Copy link
Contributor

ldacey commented Dec 3, 2021

Is it better to use a .airflowignore in the dags folder and the plugins folder?

My plugins folder is like this:

/plugins
  - /operators
  - /hooks
  - /sensors
  - /sql
  - /tests

Airflow needs access to the hooks and operators (my DAGs import them). It does not need access to the tests folder, and maybe not the SQL folder (some DAGs/operators read the SQL scripts from this path).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants

Comments