
[AIRFLOW-1611] Customize logging config #2631

Closed

Conversation

Fokko
Contributor

@Fokko Fokko commented Sep 25, 2017

Dear Airflow maintainers,

The current way of loading a custom logging config using the `logging_config_path` setting does not work:
https://codecov.io/gh/Fokko/incubator-airflow/src/master/airflow/settings.py#L180

Therefore I would like to suggest changing this so we can import a logging configuration that is defined in Python as a dict. The `logging_config_path` property in the config file can be used to point to a custom logging configuration, which will then be imported.

This changes the logging configuration to make use of the Python `logging` module and makes it easily configurable. Some settings are no longer needed because they can be implemented directly in the Python config dict itself.
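
For illustration, a minimal sketch of what such a dict-based configuration could look like; it follows the standard logging.config.dictConfig schema, and the names and values here are hypothetical rather than the template shipped with this PR:

# Hypothetical custom logging config module; LOGGING_CONFIG is the
# variable name Airflow would look for in the referenced file.
LOGGING_CONFIG = {
    'version': 1,
    'disable_existing_loggers': False,
    'formatters': {
        'airflow': {
            'format': '[%(asctime)s] {%(filename)s:%(lineno)d} %(levelname)s - %(message)s',
        },
    },
    'handlers': {
        'console': {
            'class': 'logging.StreamHandler',
            'formatter': 'airflow',
        },
    },
    'root': {
        'handlers': ['console'],
        'level': 'INFO',
    },
}

In airflow.cfg, logging_config_path would then point at the file containing this dict.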

Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!

JIRA

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:

Commits

  • My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

@codecov-io

codecov-io commented Sep 25, 2017

Codecov Report

Merging #2631 into master will decrease coverage by 0.26%.
The diff coverage is 97.14%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2631      +/-   ##
==========================================
- Coverage   71.75%   71.49%   -0.27%     
==========================================
  Files         152      153       +1     
  Lines       11749    11745       -4     
==========================================
- Hits         8431     8397      -34     
- Misses       3318     3348      +30
Impacted Files Coverage Δ
airflow/config_templates/airflow_local_settings.py 100% <ø> (ø)
airflow/logging_config.py 100% <100%> (ø)
airflow/settings.py 92.53% <100%> (+3.4%) ⬆️
airflow/www/app.py 96.47% <100%> (ø) ⬆️
airflow/jobs.py 77.7% <50%> (+0.09%) ⬆️
airflow/utils/log/s3_task_handler.py 0% <0%> (-26.87%) ⬇️
airflow/utils/log/gcs_task_handler.py 0% <0%> (-23.81%) ⬇️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2f0798f...e99f80c. Read the comment docs.

@Fokko Fokko changed the title [AIRFLOW-1611] Customize logging [AIRFLOW-1611] Customize logging config Sep 26, 2017
UPDATING.md Outdated

Airflow's logging has been rewritten to use Python’s builtin `logging` module to perform logging of the application. By extending classes with the existing `LoggingMixin`, all the logging will go through a central logger. The main benefit is easier configuration of the logging via a single Python file. Disclaimer: there is still some inline configuration, but this will be removed eventually.

The loggers and handlers can be adjusted by placing a Python file on the filesystem and pointing the `logging_config_path` setting in your `airflow.cfg` to the configuration file. Please note that the variable that contains the config needs to be names `LOGGING_CONFIG`:
Member

be named (vs names)

Contributor Author

Thanks

# directly specify in the handler definitions. This is to
# provide compatibility to older remote log folder settings.
REMOTE_BASE_LOG_FOLDER = conf.get('core', 'REMOTE_BASE_LOG_FOLDER')
S3_LOG_FOLDER = ''
Member

I think for 1.9 we should have this (and others in this diff) still be present but deprecated.

Contributor Author

I understand that this is a big change, but by keeping this, we're not really cleaning up the legacy.

path = os.path.dirname(config_path)
sys.path.append(path)

filename = os.path.basename(config_path).replace('.py', '')
Member

Does this actually work if it contains '/' in it? Or should this setting be called logging_config_module?

i.e. conf/mylogging.py vs conf.mylogging?

Contributor Author

The custom config filename can be anything, but it has to be a python file because we expect a python dict, so conf/mylogging.py would be fine.

Member

Oh right, I see how it's done, because we add the directory of the file to python path.

That might lead to some counter-intuitive behaviour if my config module is "deep" in another module somewhere (say mycompany.airflow.custom_logging), which would then add mycompany/airflow/ into the Python search path, meaning that if it did its own imports of mycompany.x, they wouldn't work.

At the very least we need to document what we do with the python import path, though my preference here would be to leave any altering of the python path up to the user (via PYTHONPATH env var) and to accept something like what gunicorn does: $(MODULE_NAME):$(VARIABLE_NAME) (where VARIABLE_NAME could instead be hard-coded to be LOGGING_CONFIG)

Contributor Author

@Fokko Fokko Sep 26, 2017

Keep in mind that it is appended to the path, not prepended. So it will not alter any existing imports. The only problem might be that the import already exists.

What would you recommend? Before, I searched for a specific file on the PYTHONPATH, but this might be a bit more cumbersome for some users. On @bolkedebruin's suggestion I changed it so you can now set the path to the config file: logging_config_path=/home/fokko/airflow_config.py.

Member

Oh yes, good point about adding to the end of the path, not the start.

But here's a thought that would make this a smaller change, and mean we don't have to load another file. Given that airflow already loads 'airflow_local_settings.py', we could define a LOGGING_CONFIG dict in airflow.settings, and then use that, rather than a new python file just for logging.

i.e:

  • airflow.settings module defines the base/default LOGGING_CONFIG in dict form
  • due to the already existing `from airflow_local_settings import *`, if that defines a new LOGGING_CONFIG it would be used instead.
  • configure_logging() could then just look at airflow.settings.LOGGING_CONFIG.

Thoughts?

(Forgive me if I missed some discussions about this, I was on vacation)

Contributor Author

Thank you for the feedback. I looked into how Django is doing this, and learned from here that they just import a module. So in the airflow.cfg you'll have to specify:

LOGGING_CONFIG_CLASS = my.path.class.LOGGING_CONFIG

This has to be on the PYTHONPATH; Airflow will then look up this path and import the LOGGING_CONFIG dict, which is then loaded. What do you think about this?
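
For illustration, a rough sketch of how such a dotted path could be resolved; the helper name and error handling here are illustrative and may differ from the PR's actual implementation:

import logging.config
from importlib import import_module

def import_string(dotted_path):
    # Split e.g. "my.path.module.LOGGING_CONFIG" into module path and attribute.
    module_path, attr_name = dotted_path.rsplit('.', 1)
    module = import_module(module_path)
    return getattr(module, attr_name)

# Example usage with a hypothetical dotted path taken from the
# logging_config_class setting in airflow.cfg.
logging_config = import_string('my.path.module.LOGGING_CONFIG')
logging.config.dictConfig(logging_config)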

Member

Yeah that sounds like a sensible approach to take, (and copying from Django is almost always a good idea to me 😄)

)
except AttributeError:
# Import default logging configurations.
log.info(
Member

This is probably a warning rather than info, as it is likely a mistake

Contributor Author

Good suggestion, thanks!

config_path = conf.get('core', 'logging_config_path')

try:
path = os.path.dirname(config_path)
Member

The default for logging_config_path is an empty string, which I think causes the "Unable to load custom logging, using default config instead" message to be printed.

It would be better to print nothing in the case of the default setting and just use the stock config without logging about it.

Contributor Author

What about setting the logging level to debug for this?

@ashb
Member

ashb commented Sep 26, 2017

I'm not sure that we have the correct upgrade path from 1.8.x here -- i.e. if I am currently using S3 remote logging and I upgrade to 1.9 with this change, it will silently stop working.

I do like the idea of the change though.

# Make sure that the variable is in scope
assert (isinstance(logging_config, dict))

log.info(
Contributor

Use .format

Contributor Author

We use this way of formatting the logging messages throughout Airflow:
#2592
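
For context, the convention referenced appears to be the logging module's lazy %-style interpolation rather than str.format; for example (the message text here is illustrative):

import logging

log = logging.getLogger(__name__)
path = 'my.path.LOGGING_CONFIG'

# Preferred: arguments are only interpolated if the record is actually emitted.
log.info('Successfully imported user-defined logging config from %s', path)

# Avoided: the string is formatted eagerly, even if the message is filtered out.
log.info('Successfully imported user-defined logging config from {}'.format(path))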

)
from airflow.config_templates.default_airflow_local_settings import \
if logging_class_path == '':
from airflow.config_templates.airflow_local_settings import \
Contributor

Please use () for multiline

Contributor Author

AFAIK this does not work for imports:

[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.34)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> (from math import exp)
  File "<stdin>", line 1
    (from math import exp)
        ^

Member

The things to import go in the brackets:

from math import (
  exp as POW,
)

Contributor Author

Thanks, love it

# Make sure that the variable is in scope
assert (isinstance(logging_config, dict))
def configure_logging():
logging_class_path = conf.get("core", "logging_config_class")
Contributor

This fails when not present (exception)

Contributor Author

Good point, thanks

'Unable to load custom logging, using default config instead'
)
from airflow.config_templates.default_airflow_local_settings import \
if logging_class_path == '':
Contributor

As the above fails this is a bit ugly

module_path, class_name = dotted_path.rsplit('.', 1)
except ValueError as err:
raise ImportError(
'{} doesn\'t look like a module path'.format(dotted_path)
Contributor

No need to escape if using ""
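
That is, roughly (the variable value here is hypothetical):

dotted_path = 'not_a_dotted_path'

# Single quotes force escaping the apostrophe:
msg = '{} doesn\'t look like a module path'.format(dotted_path)
# Double quotes do not:
msg = "{} doesn't look like a module path".format(dotted_path)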

import logging
from importlib import import_module


Contributor

You are missing the tests for this function ;-)

Contributor Author

I'll wait 😁

@Fokko Fokko force-pushed the AIRFLOW-1611-customize-logging-in-airflow branch 9 times, most recently from bd0e003 to 8193c7c on September 29, 2017 11:18
@Fokko
Contributor Author

Fokko commented Sep 29, 2017

Took a while. I've extended the tests with a more complicated path: etc.airflow.config.custom_airflow_local_settings.LOGGING_CONFIG. This caused the tests to fail because the temporary path that I was creating required __init__.py files for Python 2.7.

@ahvigil

ahvigil commented Sep 29, 2017

My thinking on this was that logging_config_path could point to a JSON file that could be json.loaded, with the resulting dict passed to logging.config.dictConfig. Requiring an importable object on the Python path is a bit annoying and not entirely necessary (supporting logging_config_path=/path/to/my/config.json would be nice), but maybe this can be a follow-up issue.
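
For illustration, the suggestion amounts to something like the following; the file path is hypothetical and this is not what the PR implements:

import json
import logging.config

# Hypothetical value of logging_config_path pointing at a JSON file.
config_path = '/path/to/my/config.json'

with open(config_path) as f:
    logging_config = json.load(f)

logging.config.dictConfig(logging_config)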

@bolkedebruin
Contributor

bolkedebruin commented Sep 30, 2017

@Fokko almost LGTM. Can you make clear in UPDATING.md (if/then-wise) what one needs to do when currently using the S3 / GCS logs? I.e. what do I need to do to make it work on 1.9.0? Then we can merge.

@Fokko Fokko force-pushed the AIRFLOW-1611-customize-logging-in-airflow branch from 8193c7c to 494e5c6 on October 1, 2017 11:52
@Fokko
Contributor Author

Fokko commented Oct 1, 2017

I've updated UPDATING.md, please let me know if it is clear. Technical writing is not one of my strengths. I've also rebased against master.

@Fokko Fokko force-pushed the AIRFLOW-1611-customize-logging-in-airflow branch from 494e5c6 to 5bcc170 on October 1, 2017 12:06
UPDATING.md Outdated

If you are explicitly setting the `REMOTE_BASE_LOG_FOLDER`, you will need to:
- Create a logging configuration `airflow/config_templates/airflow_logging_settings.py` and configure the remote paths.
- Place it on the Python path.
Member

  • Place it in a directory inside the Python import path, ensuring that any __init__.py files exist so that it is importable.

@ashb
Member

ashb commented Oct 1, 2017

One minor point to add to the updating docs: I think it should be called out explicitly in UPDATING.md that to use S3Log or GCSLogs you will need a custom logging config -- otherwise people would miss it unless they checked the comment in the default template, which they might not.
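
For illustration, such a custom config enabling S3 task logs might look roughly like this; the handler class and its keyword arguments follow my reading of the template in this PR and may differ in the released version:

# Hypothetical custom LOGGING_CONFIG enabling remote task logs on S3.
import os

BASE_LOG_FOLDER = os.path.expanduser('~/airflow/logs')
S3_LOG_FOLDER = 's3://my-bucket/airflow/logs'  # hypothetical bucket
FILENAME_TEMPLATE = '{{ ti.dag_id }}/{{ ti.task_id }}/{{ ts }}/{{ try_number }}.log'

LOGGING_CONFIG = {
    'version': 1,
    'disable_existing_loggers': False,
    'formatters': {
        'airflow.task': {
            'format': '[%(asctime)s] {%(filename)s:%(lineno)d} %(levelname)s - %(message)s',
        },
    },
    'handlers': {
        's3.task': {
            'class': 'airflow.utils.log.s3_task_handler.S3TaskHandler',
            'formatter': 'airflow.task',
            'base_log_folder': BASE_LOG_FOLDER,
            's3_log_folder': S3_LOG_FOLDER,
            'filename_template': FILENAME_TEMPLATE,
        },
    },
    'loggers': {
        'airflow.task': {
            'handlers': ['s3.task'],
            'level': 'INFO',
            'propagate': False,
        },
    },
}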

@Fokko Fokko force-pushed the AIRFLOW-1611-customize-logging-in-airflow branch from 5bcc170 to e99f80c on October 1, 2017 12:50
@Fokko
Contributor Author

Fokko commented Oct 1, 2017

@ashb Thanks, I've made it more explicit.

@Fokko
Contributor Author

Fokko commented Oct 1, 2017

@ashb Can you make a suggestion?

Change the configuration of the logging to make use of the python
logging and make the configuration easy configurable. Some of the
settings which are now not needed anymore since they can easily
be implemented in the config file.
@Fokko Fokko force-pushed the AIRFLOW-1611-customize-logging-in-airflow branch from e99f80c to d71aff1 on October 2, 2017 11:27
asfgit pushed a commit that referenced this pull request Oct 2, 2017
Change the configuration of the logging to make use of the python logging and make the configuration easy configurable. Some of the settings which are now not needed anymore since they can easily be implemented in the config file.

Closes #2631 from Fokko/AIRFLOW-1611-customize-logging-in-airflow

(cherry picked from commit 3c3a65a)
Signed-off-by: Bolke de Bruin <bolke@xs4all.nl>
@asfgit asfgit closed this in 3c3a65a Oct 2, 2017
mrares pushed a commit to mrares/incubator-airflow that referenced this pull request Oct 7, 2017
mrares pushed a commit to mrares/incubator-airflow that referenced this pull request Dec 5, 2017