Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AIRFLOW-4928] Move all config parses to class properties #5557

Merged
merged 8 commits into from Sep 19, 2019

Conversation

serkef
Copy link
Contributor

@serkef serkef commented Jul 10, 2019

Make sure you have checked all steps below.

Jira

Description

  • Here are some details about my PR, including screenshots of any UI changes:
    Every airflow.configuration.conf method is being called multiple times. We can avoid redundant evaluation by storing the parsed value in the class namespace.

Tests

  • My PR does not need testing for this extremely good reason:
    Is already covered. No new features are implemented.

Commits

  • My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and the classes in the PR contain docstrings that explain what it does
    • If you implement backwards incompatible changes, please leave a note in the Updating.md so we can assign it to a appropriate release

Code Quality

  • Passes flake8

@codecov-io
Copy link

codecov-io commented Jul 10, 2019

Codecov Report

Merging #5557 into master will decrease coverage by 0.6%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #5557      +/-   ##
==========================================
- Coverage   80.05%   79.45%   -0.61%     
==========================================
  Files         607      607              
  Lines       35040    35042       +2     
==========================================
- Hits        28053    27841     -212     
- Misses       6987     7201     +214
Impacted Files Coverage Δ
airflow/models/dagbag.py 92.52% <100%> (+0.07%) ⬆️
airflow/operators/postgres_operator.py 0% <0%> (-100%) ⬇️
airflow/operators/mysql_operator.py 0% <0%> (-100%) ⬇️
airflow/operators/mysql_to_hive.py 0% <0%> (-100%) ⬇️
airflow/operators/generic_transfer.py 0% <0%> (-100%) ⬇️
airflow/executors/celery_executor.py 40.74% <0%> (-35.56%) ⬇️
airflow/utils/log/wasb_task_handler.py 32.87% <0%> (-9.59%) ⬇️
airflow/utils/sqlalchemy.py 84.74% <0%> (-8.48%) ⬇️
airflow/utils/log/es_task_handler.py 88.07% <0%> (-3.67%) ⬇️
airflow/hooks/dbapi_hook.py 84.74% <0%> (-3.39%) ⬇️
... and 9 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 31f19fb...d70acb4. Read the comment docs.

airflow/models/dagbag.py Outdated Show resolved Hide resolved
@ashb
Copy link
Member

ashb commented Jul 11, 2019

Did you notice a performance improvment from doing this, or is it just "this way is better"?

@serkef
Copy link
Contributor Author

serkef commented Jul 13, 2019

@ashb I don't have a benchmark. The default arguments are evaluated in parsing, but still we don't need to evaluate multiple times. Also, two changes are in functions' bodies which is executed multiple times. (One is in a loop for every task). Configuration doesn't store the values (in that case it would be "same" cost to retrieve it from there), but instead it is only storing the plaintext and it's evaluating and "parsing" each variable on each call - I'm not 100% sure but that's what I got from skimming through the code.
Let me know if you think this doesn't provide any value and will close it.

@@ -194,7 +200,7 @@ def process_file(self, filepath, only_if_updated=True, safe_mode=True):
if mod_name in sys.modules:
del sys.modules[mod_name]

with timeout(configuration.conf.getint('core', "DAGBAG_IMPORT_TIMEOUT")):
with timeout(self.dagbag_import_timeout):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make these class-level vars be constants - self.DAGBAG_IMPORT_TIMEOUT etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point

airflow/models/dagbag.py Outdated Show resolved Hide resolved
@serkef
Copy link
Contributor Author

serkef commented Aug 25, 2019

@ashb Fixed your comments and rebased. Please take another look.

@ashb
Copy link
Member

ashb commented Sep 3, 2019

Just made a few cosmetic changes. Will merge once it's green. (Feel free to ping me if I forget again!)

@serkef
Copy link
Contributor Author

serkef commented Sep 18, 2019

@ashb It's good to go

@ashb ashb merged commit 23ec78a into apache:master Sep 19, 2019
ashb pushed a commit that referenced this pull request Oct 16, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants