Parse recently modified files even if just parsed #16075

Merged
merged 1 commit into from May 26, 2021

Conversation

kaxil
Member

@kaxil kaxil commented May 25, 2021

This commit adds an optimization: recently modified files (detected by
mtime) are parsed even if `min_file_process_interval` has not yet elapsed
since they were last parsed.

This lets you raise `[scheduler] min_file_process_interval` to a higher
value such as `600` when you have a large number of files. Unchanged files
are no longer reparsed unnecessarily, while modified files are still
picked up.
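
To make the intended behaviour concrete, here is a minimal sketch of the decision described above. It is not Airflow's actual code: the function name `should_parse` and its parameters are hypothetical, and it assumes "recently modified" means the file's mtime is newer than the time its last parse finished.

```python
from typing import Optional


def should_parse(
    file_mtime: float,
    last_finish_time: Optional[float],
    now: float,
    min_file_process_interval: float,
) -> bool:
    """Hypothetical helper: decide whether a DAG file should be queued for parsing.

    All times are seconds since the epoch (e.g. os.path.getmtime / time.time).
    """
    # A file that has never been parsed is always queued.
    if last_finish_time is None:
        return True

    # Regular path: the configured interval has elapsed since the last parse.
    if now - last_finish_time >= min_file_process_interval:
        return True

    # Optimization from this PR (as assumed here): the file changed after
    # its last parse, so queue it even though the interval has not elapsed.
    return file_mtime > last_finish_time
```

With an interval of `600`, a file edited 20 seconds ago is queued right away, while an untouched file waits for the interval to elapse.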



@kaxil kaxil requested a review from ashb May 25, 2021 22:21
@boring-cyborg boring-cyborg bot added the area:Scheduler Scheduler or dag parsing Issues label May 25, 2021
@kaxil kaxil requested a review from ephraimbuddy May 25, 2021 22:22
@kaxil kaxil added this to the Airflow 2.1.1 milestone May 25, 2021
@kaxil kaxil force-pushed the mtime-optimization branch 2 times, most recently from 4edcd5b to d727034 Compare May 25, 2021 22:56
@github-actions

The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest master at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

@github-actions github-actions bot added the full tests needed We need to run full set of tests for this PR to merge label May 26, 2021
@kaxil kaxil merged commit add7490 into apache:master May 26, 2021
@kaxil kaxil deleted the mtime-optimization branch May 26, 2021 10:29
kaxil added a commit to astronomer/airflow that referenced this pull request Jun 2, 2021
(cherry picked from commit add7490)
kaxil added a commit to astronomer/airflow that referenced this pull request Jun 2, 2021
(cherry picked from commit add7490)
(cherry picked from commit 19b3f1b)
kaxil added a commit to astronomer/airflow that referenced this pull request Jun 2, 2021
(cherry picked from commit add7490)
(cherry picked from commit 19b3f1b)
(cherry picked from commit cb21b0a)
jhtimmins pushed a commit to astronomer/airflow that referenced this pull request Jun 3, 2021
(cherry picked from commit add7490)
ashb pushed a commit that referenced this pull request Jun 22, 2021
(cherry picked from commit add7490)
kaxil added a commit to astronomer/airflow that referenced this pull request Aug 10, 2021
This feature was added in apache#16075. This PR adds it to docs to avoid situations like apache#17437

closes apache#17437
kaxil added a commit that referenced this pull request Aug 10, 2021
This feature was added in #16075. This PR adds it to docs to avoid situations like #17437

closes #17437
leahecole pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Mar 10, 2022
This feature was added in apache/airflow#16075. This PR adds it to docs to avoid situations like apache/airflow#17437

closes apache/airflow#17437

GitOrigin-RevId: 7dfc52068c75b01a309bf07be3696ad1f7f9b9e2
leahecole pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Jun 4, 2022
kosteev pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Jul 10, 2022
leahecole pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Aug 27, 2022
leahecole pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Oct 4, 2022
aglipska pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Oct 7, 2022
leahecole pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Dec 7, 2022
leahecole pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Jan 27, 2023