[AIRFLOW-1582] Improve logging within Airflow #2592
Conversation
Force-pushed 93e7da0 to ac30395
We know it is massive, but that is mostly due to the refactoring. We will fully test it in production from tomorrow onwards. Obviously we'd like to keep iterations fast, as other merges will require a rebase every time.
This would be much easier to read if the …
Force-pushed ac30395 to d6fbda9
Agree with Alex.
Force-pushed b16a8fe to eebea51
Codecov Report
@@ Coverage Diff @@
## master #2592 +/- ##
=========================================
- Coverage 70.96% 70.7% -0.26%
=========================================
Files 150 150
Lines 11607 11566 -41
=========================================
- Hits 8237 8178 -59
- Misses 3370 3388 +18
Continue to review full report at Codecov.
The overall PR looks solid to me, this was quite the undertaking. I really appreciated the time taken to unify the logging style and convert the format calls to logging_function(msg, *args). There are a couple of things that I think need to be fixed:
- the handle_failure call with the wrong signature
- there are a couple of places with a _log which should probably be log, but happy to be corrected if I read it wrong.
Not blockers, but nice to haves:
- Not changing the docstring of the spark-sql hook
- Some of the variable renaming could be done separately.
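For illustration (not code from the PR itself; the logger name and task id below are made up), the difference between eager str.format calls and the lazy logging_function(msg, *args) style mentioned in the review is that the lazy form only interpolates the string if the log level is actually enabled:

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("example")

task_id = "example_task"  # hypothetical value

# Eager: the message string is built even though DEBUG is disabled.
log.debug("Starting task {}".format(task_id))

# Lazy: interpolation is deferred; with DEBUG disabled, no string is built.
log.debug("Starting task %s", task_id)
```

Both produce the same text when the level is enabled; the second form simply skips the formatting work for suppressed records.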
airflow/bin/cli.py
Outdated
@@ -770,7 +778,7 @@ def monitor_gunicorn(gunicorn_master_proc):
                gunicorn_master_proc_pid = int(f.read())
                break
            except IOError:
-               logging.debug("Waiting for gunicorn's pid file to be created.")
+               _log.debug("Waiting for gunicorn's pid file to be created.")
Should this be log instead of _log?
Yes, good catch
            'project not included in {var}: '
            '{input}; using project "{project}"'.format(
                var=var_name, input=table_input, project=default_project_id))
    log = LoggingMixin().logger
Why can this one not use the self.logger inherited from GoogleBigQueryBaseCursor?
There is no GoogleBigQueryBaseCursor in scope that can be used, correct me if I'm wrong.
I was thrown off because the function is referred to as a method, so I assumed it was part of BigQueryHook, but it isn't. LGTM.
Sorry to bring this up after the PR has been merged, but I think this pattern is less than ideal: using this means that all the cases outside of a class now have a log source of "airflow.utils.log.LoggingMixin.LoggingMixin":
In [7]: from airflow.utils.log.LoggingMixin import LoggingMixin
In [8]: l = LoggingMixin().logger
In [9]: l.name
Out[9]: 'airflow.utils.log.LoggingMixin.LoggingMixin'
Compare this with:
In [10]: from airflow.contrib.hooks.bigquery_hook import BigQueryHook
In [11]: BigQueryHook().logger.name
Out[11]: 'airflow.contrib.hooks.bigquery_hook.BigQueryHook'
The advantage of the latter is that different classes can be set to different log levels. In a case like this, where we need to log from outside a class, we lose that ability.
That said, what we do is probably not worse in many cases where we were just using the root logger; this just happens to be one of the few cases that was using a differently named logger. So ideally, in cases like this, we'd somehow pass __module__ in, rather than getting the module from the class.
An example of how this is different:
In [2]: from airflow.contrib.hooks.bigquery_hook import BigQueryHook
In [3]: from airflow.utils.log.LoggingMixin import LoggingMixin
In [4]: lm = LoggingMixin()
In [5]: h = BigQueryHook()
In [6]: h.logger.info("hook")
[2017-09-13 09:16:14,422] {<ipython-input-6-d9ef8875e3af>:1} INFO - hook
In [7]: lm.logger.info("mixin")
[2017-09-13 09:16:22,798] {<ipython-input-7-f4ed3f594263>:1} INFO - mixin
In [8]: import logging
In [9]: logging.getLogger('airflow.contrib').setLevel(logging.ERROR)
In [10]: lm.logger.info("mixin")
[2017-09-13 09:17:58,043] {<ipython-input-10-f4ed3f594263>:1} INFO - mixin
In [11]: h.logger.info("hook")
In [12]: # Note how there was no log produced
In [13]: h.logger.error("but error still is")
[2017-09-13 09:19:32,870] {<ipython-input-13-0b7830c22006>:1} ERROR - but error still is
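One way to get a properly named logger outside a class, along the lines suggested above, is a plain module-level logging.getLogger(__name__). This is a hypothetical sketch of the alternative being discussed, not what the PR implements:

```python
import logging

# Inside e.g. airflow/contrib/hooks/bigquery_hook.py, __name__ would be
# 'airflow.contrib.hooks.bigquery_hook', so this logger joins the
# 'airflow.contrib' hierarchy even when used outside any class.
log = logging.getLogger(__name__)

# Because logger names form a dot-separated hierarchy, a whole subtree
# can be quieted at once, exactly as in the IPython session above:
logging.getLogger('airflow.contrib').setLevel(logging.ERROR)
hook_log = logging.getLogger('airflow.contrib.hooks.bigquery_hook')

print(hook_log.isEnabledFor(logging.INFO))   # False: inherits ERROR
print(hook_log.isEnabledFor(logging.ERROR))  # True
```

A LoggingMixin() instance, by contrast, always reports the mixin's own module path, so it never participates in the per-module hierarchy.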
(I think on balance this change is still massively better, just wanted to point this out. If I have time I'll see about a PR to fix this behaviour.)
Good point @ashb. I think this should be addressed with a better way of getting the logger. LoggingMixin().logger seems a bit off anyway.
Very valid issue. Thank you for pointing it out. I will push a fix soon:
https://issues.apache.org/jira/browse/AIRFLOW-1605
@@ -32,15 +32,17 @@ def __init__(self, cloudant_conn_id='cloudant_default'):
        self.cloudant_conn_id = cloudant_conn_id

    def get_conn(self):
-       def _str(s):
+       def _str(string):
Why rename s to the name of a builtin?
This s is also passed to the logging, and I don't like one-letter variables. I've reverted the change so as to change as little as possible.
I agree with you about one-letter variables, but for a short method I think it is fine. I am not a fan of variable names which shadow builtins :)
from oauth2client.client import GoogleCredentials

logging.getLogger('GoogleCloudMLEngine').setLevel(settings.LOGGING_LEVEL)
We should put a warning in the release notes, because this will change the behavior of this operator in particular.
Good point
@@ -171,8 +169,9 @@ def _to_timestamp(cls, col):
        try:
            col = pd.to_datetime(col)
        except ValueError:
-           logging.warning(
-               "Could not convert field to timestamps: {0}".format(col.name)
+           log = LoggingMixin().logger
It makes sense here because of the classmethod.
airflow/models.py
Outdated
@@ -1361,15 +1356,14 @@ def _check_and_change_state_before_execution(
            # have been running prematurely. This should be handled in the
            # scheduling mechanism.
            self.state = State.NONE
-           msg = ("FIXME: Rescheduling due to concurrency limits reached at task "
+           msg = ("%s FIXME: Rescheduling due to concurrency limits reached at task "
This looks wrong to me, because no parameter is passed below. You might want to just remove the %s in this instance.
Something went wrong there, thanks!
airflow/settings.py
Outdated
@@ -130,8 +132,9 @@ def _configure_logging(logging_level):
    try:
        _configure_logging(logging_level)
    except ValueError:
-       logging.warning("Logging level {} is not defined. "
-                       "Use default.".format(logging_level))
+       _log.warning(
_log or log?
Thanks
@@ -136,7 +142,7 @@ def read(self, task_instance, try_number=None):
            next_try = task_instance.try_number

        if try_number is None:
-           try_numbers = list(range(next_try))
+           try_numbers = range(next_try)
Does this work in Python 2?
Don't know why this was changed. I've reverted it.
Actually, I should have said Python 3, since range does return a list in Python 2.
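To illustrate the point above (an illustrative sketch; next_try here is just a made-up counter): on Python 3, range() returns a lazy sequence rather than a list, so code that relies on list semantics needs the explicit list() wrapper:

```python
next_try = 3  # hypothetical try counter

# Python 3: range() is a lazy range object, not a list
try_numbers = range(next_try)
print(isinstance(try_numbers, list))  # False on Python 3

# Wrapping in list() gives real list semantics on both Python 2 and 3
try_numbers = list(range(next_try))
print(try_numbers)         # [0, 1, 2]
print(try_numbers + [99])  # concatenation works: [0, 1, 2, 99]
```

On Python 2, range() already returned a list, which is why the unwrapped version only breaks on Python 3 when list operations (like concatenation) are applied to the result.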
tests/operators/sensors.py
Outdated
import sys
import time
import unittest

from mock import patch
import logging
Why not keep it alphabetized?
This was my IDE.
try:
    import mock
except ImportError:
    mock = None
Asking to learn: how does this play with the from mock import ... below?
Good question. This is for a different PR, but we should make the way we import mock consistent, because I see two ways of doing it:

try:
    from unittest import mock
except ImportError:
    try:
        import mock
    except ImportError:
        mock = None

or just:

import mock
Yes, what I meant is that you have both a top-level module import and a from mock import ... below. I usually do one or the other.
Ah yes, I see now. I've removed the new import completely because it isn't required, as you mentioned. Besides that, I've also created a new ticket for cleaning up the way we import mock:
https://issues.apache.org/jira/browse/AIRFLOW-1596?jql=project%20%3D%20AIRFLOW
@saguziel I can understand the concern about having multiple commits. The thing is, it took me a couple of iterations through the code to get to the place where it is now, and because of that the separate commits did not really help. @artwr Thanks for the elaborate feedback! I'll get back to you. Cheers
airflow/models.py
Outdated
@@ -366,8 +365,8 @@ def kill_zombies(self, session=None):
            if ti.task_id in dag.task_ids:
                task = dag.get_task(ti.task_id)
                ti.task = task
-               ti.handle_failure("%s killed as zombie", ti)
-               self.logger.info('Marked zombie job %s as failed', ti)
+               ti.handle_failure("%s killed as zombie".format(ti))
You want {} if you use format.
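The mix-up being pointed out is easy to reproduce (the ti string below is a made-up stand-in for a TaskInstance): str.format only substitutes {} placeholders and leaves %s untouched:

```python
ti = "task_instance_1"  # hypothetical stand-in for a TaskInstance

# str.format substitutes {} placeholders; %s is left as-is:
print("%s killed as zombie".format(ti))  # '%s killed as zombie' - unchanged
print("{} killed as zombie".format(ti))  # 'task_instance_1 killed as zombie'
print("%s killed as zombie" % ti)        # old-style interpolation also works
```

So the %s placeholder combined with .format silently produces the literal text "%s killed as zombie" with no error raised.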
This is fixed in the latest commit, thanks
Force-pushed d70e6ed to dfa7ba9
Alright, for me the couple of blockers have been fixed. I am +1 on this as is, and if there is more we can fix it down the line. A couple of action items:
I'm also in favor of committing, with the following conditions:
What say you @Fokko? Oh, and can you make it all into one commit with a proper (more elaborate) description?
Force-pushed 1e2a933 to 5a4ef24
Clean the way of logging within Airflow. Remove the old logging.py and move to the airflow.utils.log.* interface. Remove setting the logging outside of the settings/configuration code. Move away from the string format to logging_function(msg, *args).
Force-pushed 5a4ef24 to 79db684
@bolkedebruin Sounds like a perfect plan.
I've rebased all the commits and elaborated on the message.
@@ -3529,7 +3522,8 @@ def pickle_info(self, session=None):
            d['pickle_len'] = len(pickled)
            d['pickling_duration'] = "{}".format(datetime.now() - dttm)
        except Exception as e:
-           logging.debug(e)
+           log = LoggingMixin().logger
Why is this not using self.logger? The logging mixin has been removed from this class, which seems odd.
This is because the DAG class only has two log statements in it (over 1100 lines). As far as I know, the DAG object did not extend the LoggingMixin class. Personally I would prefer to extend the DAG (or even the BaseDAG) class and use self.logger. But because this is such a large and fundamental class, I did not want to make too many changes to it.
Makes sense to have a logger available in DAG. Preferably with the name of the DAG appended for the things it is logging.
The reason I pointed this case out is that it's a method on DAG, and it did have the LoggingMixin, but we removed that in this PR (lines -2756/+2749)
Yes, now that I think of it, it also allows users to re-use the DAG's logger to emit log statements.
Good observation #2602
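A DAG-specific logger name, as suggested above, could be derived as a child of the class logger via getChild(). This is a hypothetical sketch of one way to do it, not what the follow-up PR actually implements:

```python
import logging

dag_id = "example_dag"  # hypothetical DAG id

# getChild() appends the dag_id to the parent logger's name, so records
# from a specific DAG can be filtered or leveled by name.
dag_log = logging.getLogger("airflow.models.DAG").getChild(dag_id)
print(dag_log.name)  # airflow.models.DAG.example_dag
```

Because the child sits under the parent in the logger hierarchy, level and handler configuration on the parent still applies to every per-DAG logger.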
Clean the way of logging within Airflow. Remove the old logging.py and move to the airflow.utils.log.* interface. Remove setting the logging outside of the settings/configuration code. Move away from the string format to logging_function(msg, *args). Closes apache#2592 from Fokko/AIRFLOW-1582-Improve-logging-structure
Dear Airflow maintainers,
By this PR I would like to reduce the several ways of logging to a single transparent logging structure using the already existing LoggingMixin. I've removed logging.py and moved to the airflow.utils.log.* interface, which consists of the Python logging handlers. Got rid of setting the logging configuration outside of the main settings/configuration code. Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!
JIRA
Description
Tests
Commits