[AIRFLOW-1611] Customize logging config #2631
Conversation
Codecov Report
@@ Coverage Diff @@
## master #2631 +/- ##
==========================================
- Coverage 71.75% 71.49% -0.27%
==========================================
Files 152 153 +1
Lines 11749 11745 -4
==========================================
- Hits 8431 8397 -34
- Misses 3318 3348 +30
Continue to review full report at Codecov.
UPDATING.md
Outdated
Airflow's logging has been rewritten to use Python's builtin `logging` module to perform logging of the application. By extending classes with the existing `LoggingMixin`, all the logging will go through a central logger. The main benefit is easier configuration of the logging by setting a single Python file. Disclaimer: there is still some inline configuration, but this will be removed eventually.

The loggers and handlers can be adjusted by placing a Python file on the filesystem, and pointing the `logging_config_path` setting in your `airflow.cfg` to the configuration file. Please note that the variable that contains the config needs to be names `LOGGING_CONFIG`:
"be named" (vs. "names")
Thanks
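For readers following the thread, a minimal sketch of what such a custom logging configuration file could look like. The formatter name, handler name, and format string below are illustrative assumptions, not Airflow's shipped defaults; only the top-level variable name `LOGGING_CONFIG` comes from the discussion.

```python
import logging
import logging.config

# Hypothetical contents of the file pointed to by logging_config_path.
# The keys follow Python's standard dictConfig schema.
LOGGING_CONFIG = {
    'version': 1,
    'disable_existing_loggers': False,
    'formatters': {
        'airflow': {
            'format': '[%(asctime)s] {%(filename)s:%(lineno)d} '
                      '%(levelname)s - %(message)s',
        },
    },
    'handlers': {
        'console': {
            'class': 'logging.StreamHandler',
            'formatter': 'airflow',
        },
    },
    'loggers': {
        'airflow': {
            'handlers': ['console'],
            'level': 'INFO',
        },
    },
}

# Applying the dict is a one-liner with the stdlib.
logging.config.dictConfig(LOGGING_CONFIG)
```

Everything the handlers need (remote folders, levels, formatters) then lives in one Python dict instead of scattered `airflow.cfg` keys.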
# directly specify in the handler definitions. This is to
# provide compatibility to older remote log folder settings.
REMOTE_BASE_LOG_FOLDER = conf.get('core', 'REMOTE_BASE_LOG_FOLDER')
S3_LOG_FOLDER = ''
I think for 1.9 we should have this (and others in this diff) still be present but deprecated.
I understand that this is a big change, but by keeping this, we're not really cleaning up the legacy.
airflow/logging_config.py
Outdated
path = os.path.dirname(config_path)
sys.path.append(path)

filename = os.path.basename(config_path).replace('.py', '')
Does this actually work if it contains '/' in it? Or should this setting be called `logging_config_module`? I.e. `conf/mylogging.py` vs `conf.mylogging`?
The custom config filename can be anything, but it has to be a Python file because we expect a Python dict, so `conf/mylogging.py` would be fine.
Oh right, I see how it's done, because we add the directory of the file to the Python path.
That might lead to some counter-intuitive behaviour if my config module is "deep" in another module somewhere (say `mycompany.airflow.custom_logging`), which would then add `mycompany/airflow/` into the Python search path, meaning if it did its own imports of `mycompany.x`, they wouldn't work.
At the very least we need to document what we do with the python import path, though my preference here would be to leave any altering of the python path up to the user (via PYTHONPATH env var) and to accept something like what gunicorn does: $(MODULE_NAME):$(VARIABLE_NAME)
(where VARIABLE_NAME could instead be hard-coded to be LOGGING_CONFIG)
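A rough sketch of the gunicorn-style idea suggested here. The helper name, the error wording, and the default variable name are assumptions for illustration; only the `MODULE_NAME:VARIABLE_NAME` shape comes from the comment.

```python
from importlib import import_module

def load_logging_config(spec, default_var='LOGGING_CONFIG'):
    """Resolve a "pkg.module:VARIABLE" spec (or just "pkg.module",
    falling back to a hard-coded variable name) to a config dict."""
    module_name, _, variable = spec.partition(':')
    config = getattr(import_module(module_name), variable or default_var)
    if not isinstance(config, dict):
        raise ValueError('{}: expected a dict, got {}'.format(spec, type(config)))
    return config
```

Under this scheme nothing is ever appended to `sys.path`; making the module importable is left entirely to the user's `PYTHONPATH`, as the comment proposes.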
Keep in mind that it is appended to the path, not prepended. So it will not alter any existing imports. The only problem might be that the import already exists.
What would you recommend? Before, I searched for a specific file on the PYTHONPATH. But this might be a bit more cumbersome for some users. On @bolkedebruin's suggestion I changed it so you can now set the path to the config file: `logging_config_path=/home/fokko/airflow_config.py`.
Oh yes, good point about adding to the end of the path, not the start.
But here's a thought that would make this a smaller change, and mean we don't have to load another file. Given that airflow already loads 'airflow_local_settings.py', we could define a LOGGING_CONFIG dict in airflow.settings, and then use that, rather than a new python file just for logging.
i.e:
- the airflow.settings module defines the base/default LOGGING_CONFIG in dict form
- due to the already existing `from airflow_local_settings import *`, if that defines a new `LOGGING_CONFIG` it would be used instead. `configure_logging()` could then just look at `airflow.settings.LOGGING_CONFIG`.
Thoughts?
(Forgive me if I missed some discussions about this, I was on vacation)
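The star-import override pattern described in this comment can be sketched as follows. The local-settings module is simulated in-process here so the snippet is self-contained; the module and variable names follow the comment, not actual Airflow code.

```python
import sys
import types

# Stand-in for a user's airflow_local_settings.py on the PYTHONPATH.
local = types.ModuleType('airflow_local_settings')
local.LOGGING_CONFIG = {'version': 1,
                        'loggers': {'airflow': {'level': 'DEBUG'}}}
sys.modules['airflow_local_settings'] = local

# -- what a settings module could do --
LOGGING_CONFIG = {'version': 1, 'loggers': {}}  # base/default config

try:
    from airflow_local_settings import *  # may redefine LOGGING_CONFIG
except ImportError:
    pass  # no local settings: keep the default

# configure_logging() would then simply read this module's LOGGING_CONFIG.
```

Because `import *` pulls in every public name, a user-defined `LOGGING_CONFIG` silently replaces the default, which is exactly the override semantics being proposed.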
Thank you for the feedback. I looked into how Django is doing this, and learned from here that they just import a module. So in the `airflow.cfg` you'll have to specify:
LOGGING_CONFIG_CLASS = my.path.class.LOGGING_CONFIG
Which has to be on the PYTHONPATH, but Airflow will just search for this class and import the LOGGING_CONFIG dict, which is then loaded. What do you think about this?
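A hedged sketch of what such a dotted-path import could look like, modelled on Django's `import_string` (the PR later adds a similar helper in `airflow/utils/module_loading.py`; this is not the exact code):

```python
from importlib import import_module

def import_string(dotted_path):
    """Import a dotted module path and return the attribute named by the
    last segment, e.g. 'my.pkg.settings.LOGGING_CONFIG'."""
    try:
        module_path, attr_name = dotted_path.rsplit('.', 1)
    except ValueError:
        raise ImportError("{} doesn't look like a module path".format(dotted_path))
    module = import_module(module_path)
    try:
        return getattr(module, attr_name)
    except AttributeError:
        raise ImportError('Module "{}" has no "{}" attribute'.format(
            module_path, attr_name))
```

With this in place, `configure_logging()` only needs something like `logging_config = import_string(logging_class_path)` followed by a dict type check.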
Yeah that sounds like a sensible approach to take, (and copying from Django is almost always a good idea to me 😄)
airflow/logging_config.py
Outdated
)
except AttributeError:
    # Import default logging configurations.
    log.info(
This is probably a warning rather than info, as it is likely a mistake
Good suggestion, thanks!
airflow/logging_config.py
Outdated
config_path = conf.get('core', 'logging_config_path')

try:
    path = os.path.dirname(config_path)
The default for logging_config_path is an empty string, which I think causes the "Unable to load custom logging, using default config instead" message to be printed.
It would be better to print nothing in the case of the default setting and just use the stock config without logging about it.
What about setting the logging level to debug for this?
I'm not sure that we have the correct upgrade path from 1.8.x here -- i.e. if I am currently using S3 remote logging and I upgrade to this 1.9 it will silently stop working. I do like the idea of the change though.
# Make sure that the variable is in scope
assert (isinstance(logging_config, dict))

log.info(
Use .format
We use this way of formatting the logging messages throughout Airflow:
#2592
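As I read the convention referenced here, the point is to pass arguments to the logger rather than pre-formatting the message, so interpolation only happens if the record is actually emitted. A small illustration; the logger name and path value are made up:

```python
import logging

log = logging.getLogger('airflow.logging_config')
log.addHandler(logging.NullHandler())
log.setLevel(logging.INFO)

config_path = '/home/user/my_logging.py'  # illustrative value

# Eager: the string is built even when DEBUG is disabled.
log.debug('Importing logging config from {}'.format(config_path))

# Lazy: %-style args are interpolated only if the record is emitted.
log.debug('Importing logging config from %s', config_path)
log.info('Successfully imported logging config from %s', config_path)
```

Beyond consistency with the rest of the codebase, the lazy form avoids wasted formatting work on suppressed levels.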
airflow/logging_config.py
Outdated
)
from airflow.config_templates.default_airflow_local_settings import \
if logging_class_path == '':
from airflow.config_templates.airflow_local_settings import \
Please use () for multiline
AFAIK this is not working for imports:
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.34)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> (from math import exp)
File "<stdin>", line 1
(from math import exp)
^
The things to import go in the brackets:
from math import (
exp as POW,
)
Thanks, love it
airflow/logging_config.py
Outdated
# Make sure that the variable is in scope
assert (isinstance(logging_config, dict))
def configure_logging():
    logging_class_path = conf.get("core", "logging_config_class")
This fails when not present (exception)
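The failure mode flagged here can be sketched with the stdlib `configparser` standing in for Airflow's `conf` object (an assumption for illustration): `.get()` raises when the option is absent, so the lookup needs a guard.

```python
from configparser import ConfigParser, NoOptionError, NoSectionError

conf = ConfigParser()
conf.add_section('core')  # 'logging_config_class' deliberately unset

def get_logging_class_path(conf):
    """Return the configured class path, or '' when none is set."""
    try:
        return conf.get('core', 'logging_config_class')
    except (NoOptionError, NoSectionError):
        return ''  # absent: fall back to the default logging config
```

Returning an empty string keeps the later `if logging_class_path == '':` branch working for users who never set the option.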
Good point, thanks
airflow/logging_config.py
Outdated
'Unable to load custom logging, using default config instead'
)
from airflow.config_templates.default_airflow_local_settings import \
if logging_class_path == '':
As the above fails this is a bit ugly
airflow/utils/module_loading.py
Outdated
module_path, class_name = dotted_path.rsplit('.', 1)
except ValueError as err:
raise ImportError(
    '{} doesn\'t look like a module path'.format(dotted_path)
No need to escape the apostrophe if you use double quotes ("").
airflow/utils/module_loading.py
Outdated
import logging
from importlib import import_module
You are missing the tests for this function ;-)
I'll wait 😁
Force-pushed from bd0e003 to 8193c7c
Took a while. I've extended the tests with a more complicated path.
my thinking on this was that
@Fokko almost LGTM. Can you make clear in UPDATING.md (if need be) what one needs to do when currently using the S3 / GCS log? I.e. what do I need to do to make it work on 1.9.0. Then we can merge.
Force-pushed from 8193c7c to 494e5c6
I've updated the UPDATING.md, please let me know if it is clear. Technical writing is not one of my strengths. Also rebased against master.
Force-pushed from 494e5c6 to 5bcc170
UPDATING.md
Outdated
If you are explicitly setting the `REMOTE_BASE_LOG_FOLDER`, you will need to:
- Create a logging configuration `airflow/config_templates/airflow_logging_settings.py` and configure the remote paths.
- Place it on the Python path.
- Place it in a directory inside the Python import path, ensuring that any `__init__.py` files exist so that it is importable.
One minor point to add to the updating docs: I think it should be called out explicitly in UPDATING.md that to use S3Log or GCSLogs you will need a custom logging config; otherwise people would miss it unless they checked the comment in the default template, which they might not.
Force-pushed from 5bcc170 to e99f80c
@ashb Thanks, I've made it more explicit.
@ashb Can you make a suggestion?
Change the logging configuration to make use of the Python `logging` module and make it easily configurable. Some of the settings are no longer needed since they can easily be implemented in the config file.
Force-pushed from e99f80c to d71aff1
Change the logging configuration to make use of the Python `logging` module and make it easily configurable. Some of the settings are no longer needed since they can easily be implemented in the config file. Closes #2631 from Fokko/AIRFLOW-1611-customize-logging-in-airflow (cherry picked from commit 3c3a65a) Signed-off-by: Bolke de Bruin <bolke@xs4all.nl>
Change the logging configuration to make use of the Python `logging` module and make it easily configurable. Some of the settings are no longer needed since they can easily be implemented in the config file. Closes apache#2631 from Fokko/AIRFLOW-1611-customize-logging-in-airflow
Dear Airflow maintainers,
The current way of loading a custom logging config using the logging_config_path does not work:
https://codecov.io/gh/Fokko/incubator-airflow/src/master/airflow/settings.py#L180
Therefore I would like to suggest changing this so we can import a logger configuration that is defined in Python using a dict. The `logging_config_path` property in the config file can be used to point to a custom logging configuration, which will be imported.
Change the logging configuration to make use of the Python `logging` module and make it easily configurable. Some of the settings are no longer needed since they can easily be implemented in the Python config dict itself.
Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!
JIRA
Tests
Commits