
[AIRFLOW-1582] Improve logging within Airflow #2592

Closed

Conversation

Fokko
Contributor

@Fokko Fokko commented Sep 11, 2017

Dear Airflow maintainers,

With this PR I would like to consolidate the several ways of logging into a single transparent logging structure, using the already existing LoggingMixin. I've removed the old logging.py and moved to the airflow.utils.log.* interface, which consists of the standard Python logging handlers. I also got rid of configuring logging outside of the main settings/configuration code.

Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!

JIRA

Description

  • Here are some details about my PR, including screenshots of any UI changes:

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:

Commits

  • My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

@Fokko Fokko force-pushed the AIRFLOW-1582-Improve-logging-structure branch from 93e7da0 to ac30395 Compare September 11, 2017 15:33
@bolkedebruin
Contributor

@criccomini @saguziel @aoen

We know it is massive, but that is mostly due to the refactoring. We will fully test it in production from tomorrow onwards. Obviously we'd like to keep iterations fast, as every other merge will require a rebase.

@saguziel
Contributor

This would be much easier to read if the logging -> log refactors were in a separate commit from the heavy lifting.

@Fokko Fokko force-pushed the AIRFLOW-1582-Improve-logging-structure branch from ac30395 to d6fbda9 Compare September 11, 2017 21:13
@aoen
Contributor

aoen commented Sep 11, 2017

Agree with Alex.

@Fokko Fokko force-pushed the AIRFLOW-1582-Improve-logging-structure branch 2 times, most recently from b16a8fe to eebea51 Compare September 11, 2017 22:42
@codecov-io

codecov-io commented Sep 11, 2017

Codecov Report

Merging #2592 into master will decrease coverage by 0.25%.
The diff coverage is 51.78%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master   #2592      +/-   ##
=========================================
- Coverage   70.96%   70.7%   -0.26%     
=========================================
  Files         150     150              
  Lines       11607   11566      -41     
=========================================
- Hits         8237    8178      -59     
- Misses       3370    3388      +18
Impacted Files Coverage Δ
...rflow/api/common/experimental/get_task_instance.py 90% <ø> (-0.91%) ⬇️
airflow/hooks/presto_hook.py 36.95% <ø> (-2.63%) ⬇️
airflow/api/common/experimental/get_task.py 100% <ø> (ø) ⬆️
airflow/operators/pig_operator.py 0% <0%> (ø) ⬆️
airflow/operators/mssql_to_hive.py 0% <0%> (ø) ⬆️
airflow/operators/hive_to_druid.py 59.67% <0%> (-0.65%) ⬇️
airflow/operators/oracle_operator.py 0% <0%> (ø) ⬆️
airflow/operators/redshift_to_s3_operator.py 0% <0%> (ø) ⬆️
airflow/operators/slack_operator.py 0% <0%> (ø) ⬆️
airflow/executors/dask_executor.py 79.06% <0%> (-2.33%) ⬇️
... and 63 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5de632e...1e2a933. Read the comment docs.

Contributor

@artwr artwr left a comment


The overall PR looks solid to me; this was quite the undertaking. I really appreciated the time taken to unify the logging style and to convert the format calls to logging_function(msg, *args). There are a couple of things that I think need to be fixed:

  1. the handle_failure call with the wrong signature
  2. there are a couple of places with a _log which should probably be log, but I'm happy to be corrected if I read it wrong.

Not blockers, but nice-to-haves:

  1. Not changing the docstring of the spark-sql hook
  2. Some of the variable renaming could be done separately.
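The logging_function(msg, *args) style mentioned above is the stdlib's lazy interpolation; the contrast with str.format can be shown with plain logging (the logger name here is illustrative):

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("airflow.example")  # illustrative logger name

task_id = "extract"

# Eager: the message is built with str.format even if the level is disabled.
log.info("Task {} started".format(task_id))

# Lazy: logging interpolates %s only when the record is emitted, and the
# unformatted template stays available to handlers and aggregators.
log.info("Task %s started", task_id)
```

Both lines print the same message, but the lazy form skips the string work entirely when the logger's level filters the record out.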

@@ -770,7 +778,7 @@ def monitor_gunicorn(gunicorn_master_proc):
gunicorn_master_proc_pid = int(f.read())
break
except IOError:
logging.debug("Waiting for gunicorn's pid file to be created.")
_log.debug("Waiting for gunicorn's pid file to be created.")
Contributor

Should this be log instead of _log?

Contributor Author

Yes, good catch

'project not included in {var}: '
'{input}; using project "{project}"'.format(
var=var_name, input=table_input, project=default_project_id))
log = LoggingMixin().logger
Contributor

Why can this one not use the self.logger inherited from GoogleBigQueryBaseCursor?

Contributor Author

There is no GoogleBigQueryBaseCursor in scope that can be used; correct me if I'm wrong.

Contributor

I was thrown off because the function is being called a method, so I assumed it was part of BigQueryHook, but it isn't. LGTM.

Member

Sorry to bring this up after the PR has been merged, but I think this pattern is less than ideal: using this means that all the cases outside of a class now have a log source of "airflow.utils.log.LoggingMixin.LoggingMixin":

In [7]: from airflow.utils.log.LoggingMixin import LoggingMixin

In [8]: l = LoggingMixin().logger

In [9]: l.name
Out[9]: 'airflow.utils.log.LoggingMixin.LoggingMixin'

Compare this with:

In [10]: from airflow.contrib.hooks.bigquery_hook import BigQueryHook

In [11]: BigQueryHook().logger.name
Out[11]: 'airflow.contrib.hooks.bigquery_hook.BigQueryHook'

The advantage of the latter is that different classes can be set to different log levels. In a case like this, where we need to log from outside a class, we lose that ability.

That said, what we do now is probably no worse in the many cases where we were just using the root logger; this just happens to be one of the few cases that was using a differently named logger. So ideally, in cases like this, we'd somehow pass __module__ in, rather than getting the module from the class.

An example of how this is different:

In [2]: from airflow.contrib.hooks.bigquery_hook import BigQueryHook

In [3]: from airflow.utils.log.LoggingMixin import LoggingMixin

In [4]: lm = LoggingMixin()

In [5]: h = BigQueryHook()

In [6]: h.logger.info("hook")
[2017-09-13 09:16:14,422] {<ipython-input-6-d9ef8875e3af>:1} INFO - hook

In [7]: lm.logger.info("mixin")
[2017-09-13 09:16:22,798] {<ipython-input-7-f4ed3f594263>:1} INFO - mixin

In [8]: import logging

In [9]: logging.getLogger('airflow.contrib').setLevel(logging.ERROR)

In [10]: lm.logger.info("mixin")
[2017-09-13 09:17:58,043] {<ipython-input-10-f4ed3f594263>:1} INFO - mixin

In [11]: h.logger.info("hook")

In [12]: # Note how there was no log produced

In [13]: h.logger.error("but error still is")
[2017-09-13 09:19:32,870] {<ipython-input-13-0b7830c22006>:1} ERROR - but error still is
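The __module__-based approach suggested above can be sketched like this (a hypothetical fix, not part of this PR; in real module code the name would come from logging.getLogger(__name__)):

```python
import logging

# Per-package level configuration, as in the session above:
logging.getLogger("airflow").setLevel(logging.INFO)
logging.getLogger("airflow.contrib").setLevel(logging.ERROR)

# Hypothetical fix: name module-scope loggers after the module, i.e.
# logging.getLogger(__name__), instead of LoggingMixin().logger.
module_log = logging.getLogger("airflow.contrib.hooks.bigquery_hook")
mixin_log = logging.getLogger("airflow.utils.log.LoggingMixin.LoggingMixin")

# The module-named logger is filtered together with its package...
assert not module_log.isEnabledFor(logging.INFO)
# ...while the mixin-named logger escapes the 'airflow.contrib' filter.
assert mixin_log.isEnabledFor(logging.INFO)
```

The module-named logger keeps hierarchical configuration working exactly as it does for class-based loggers.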

Member

(I think on balance this change is still massively better, just wanted to point this out. If I have time I'll see about a PR to fix this behaviour.)

Contributor

Good point @ashb. I think this should be addressed with a better way of getting the logger. LoggingMixin().logger seems a bit off anyway.

Contributor Author

Very valid issue, thank you for pointing it out. I will push a fix soon:
https://issues.apache.org/jira/browse/AIRFLOW-1605

@@ -32,15 +32,17 @@ def __init__(self, cloudant_conn_id='cloudant_default'):
self.cloudant_conn_id = cloudant_conn_id

def get_conn(self):
def _str(s):
def _str(string):
Contributor

Why rename s to the name of a builtin?

Contributor Author

This s is also passed to the logging, and I don't like one-letter variables. I've reverted the change to keep the diff as small as possible.

Contributor

I agree with you about one-letter variables, but for a short method I think it is fine. I am not a fan of variable names which shadow builtins :)
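Why shadowing matters can be shown with a generic illustration (not code from this PR; describe is a made-up function):

```python
# Generic illustration: once a parameter shadows the builtin,
# the builtin is unreachable inside the function.
def describe(str):            # 'str' now names the argument, not the type
    return str(len(str))      # fails: the argument, a string, is not callable


try:
    describe("abc")
except TypeError as exc:
    shadowing_error = exc     # TypeError: 'str' object is not callable
```

Inside the function, str refers to the string argument, so the intended call to the builtin type raises TypeError.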

from oauth2client.client import GoogleCredentials

logging.getLogger('GoogleCloudMLEngine').setLevel(settings.LOGGING_LEVEL)
Contributor

We should put a warning in the release notes, because this will change the behavior of this operator in particular.

Contributor Author

Good point

@@ -171,8 +169,9 @@ def _to_timestamp(cls, col):
try:
col = pd.to_datetime(col)
except ValueError:
logging.warning(
"Could not convert field to timestamps: {0}".format(col.name)
log = LoggingMixin().logger
Contributor

It makes sense here because of the classmethod.

@@ -1361,15 +1356,14 @@ def _check_and_change_state_before_execution(
# have been running prematurely. This should be handled in the
# scheduling mechanism.
self.state = State.NONE
msg = ("FIXME: Rescheduling due to concurrency limits reached at task "
msg = ("%s FIXME: Rescheduling due to concurrency limits reached at task "
Contributor

This looks wrong to me, because no parameter is passed below. You might want to just remove the %s in this instance.

Contributor Author

Something went wrong there, thanks!

@@ -130,8 +132,9 @@ def _configure_logging(logging_level):
try:
_configure_logging(logging_level)
except ValueError:
logging.warning("Logging level {} is not defined. "
"Use default.".format(logging_level))
_log.warning(
Contributor

_log or log?

Contributor Author

Thanks

@@ -136,7 +142,7 @@ def read(self, task_instance, try_number=None):
next_try = task_instance.try_number

if try_number is None:
try_numbers = list(range(next_try))
try_numbers = range(next_try)
Contributor

does this work in Python 2?

Contributor Author

Don't know why this was changed. I've reverted it.

Contributor

Actually, this should say Python 3, since range does return a list in Python 2.
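The distinction the thread is describing, spelled out on Python 3:

```python
# In Python 3, range() returns a lazy sequence object, not a list.
try_numbers = range(3)
assert not isinstance(try_numbers, list)
assert list(try_numbers) == [0, 1, 2]

# List-only operations break on a range, which is why reverting to
# list(range(...)) keeps the old behaviour on both Python 2 and 3:
try:
    try_numbers + [3]          # TypeError on Python 3
except TypeError:
    try_numbers = list(range(3)) + [3]

assert try_numbers == [0, 1, 2, 3]
```

On Python 2, range(3) already produced the list [0, 1, 2], so list(range(...)) is the form that behaves identically on both versions.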

import sys
import time
import unittest

from mock import patch
import logging
Contributor

Why not keep it alphabetized?

Contributor Author

This was my IDE.

try:
import mock
except ImportError:
mock = None
Contributor

Asking to learn: how does this play with the from mock import ... below?

Contributor Author

Good question. That is a topic for a different PR, but we should make the way we import mock consistent, because I see two ways of doing it:

try:
    from unittest import mock
except ImportError:
    try:
        import mock
    except ImportError:
        mock = None

or just:

import mock

Contributor

Yes, what I meant is that you have both a top-level module import and a from mock import ... below. I usually do one or the other.

Contributor Author

Ah yes, I see it now. I've removed the new import completely because it isn't required, as you mentioned. Besides that, I've also created a new ticket for cleaning up the way we import mock:
https://issues.apache.org/jira/browse/AIRFLOW-1596?jql=project%20%3D%20AIRFLOW

@Fokko
Contributor Author

Fokko commented Sep 12, 2017

@saguziel I can understand the wish for multiple commits. The thing is, it took me a couple of iterations through the code to get to where it is now, and because of that the separate commits did not really help.

@artwr Thanks for the elaborate feedback!! I'll get back to you. Cheers

@@ -366,8 +365,8 @@ def kill_zombies(self, session=None):
if ti.task_id in dag.task_ids:
task = dag.get_task(ti.task_id)
ti.task = task
ti.handle_failure("%s killed as zombie", ti)
self.logger.info('Marked zombie job %s as failed', ti)
ti.handle_failure("%s killed as zombie".format(ti))
Contributor

You want {} if you use format
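The mix-up is easy to demonstrate (ti here is a stand-in string, not a real TaskInstance):

```python
ti = "<TaskInstance: example.task>"  # stand-in for the real TaskInstance

# Broken: %s is a printf-style placeholder, so str.format leaves it alone.
assert "%s killed as zombie".format(ti) == "%s killed as zombie"

# Either interpolate with the % operator...
assert "%s killed as zombie" % ti == "<TaskInstance: example.task> killed as zombie"

# ...or use {} with str.format:
assert "{} killed as zombie".format(ti) == "<TaskInstance: example.task> killed as zombie"
```

str.format silently ignores arguments that have no matching {} placeholder, so the broken form produces the literal "%s" in the failure message rather than raising an error.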

Contributor Author

This is fixed in the latest commit, thanks

@Fokko Fokko force-pushed the AIRFLOW-1582-Improve-logging-structure branch from d70e6ed to dfa7ba9 Compare September 12, 2017 06:29
@artwr
Contributor

artwr commented Sep 12, 2017

Alright, for me the couple of blockers have been fixed. I am +1 on this as is, and if there is more we can fix it down the line. A couple of action items:

  1. See what the impact will be for boto logging (non-critical)
  2. Add a note in this PR about the change in behavior for the ML engine operator, to make it easy on the next release manager.
  3. I am also fine with deleting the "line break" which is not really a line break in the databricks operator. I disagree with @bolkedebruin's characterization of it as tech debt; I do not know of a better way to do what this particular code achieves. But it's not a huge deal and it can be brought back later if needed.

@bolkedebruin
Contributor

bolkedebruin commented Sep 12, 2017

I'm also in favor of committing, with the following conditions:

  1. Documentation on the usage of logging, together with a design description, to be added in a new PR (you could take Django's as an example)
  2. Add a note to UPDATING.md detailing how one gets from the previous situation to the new situation
  3. Babysit any issues occurring after the commit (of course we will help)

What say you @Fokko ?

Oh, and can you make it all into one commit with a proper (more elaborate) description?

@Fokko Fokko force-pushed the AIRFLOW-1582-Improve-logging-structure branch from 1e2a933 to 5a4ef24 Compare September 12, 2017 21:29
Clean the way of logging within Airflow. Remove the old logging.py and
move to the airflow.utils.log.* interface. Remove setting the logging
outside of the settings/configuration code. Move away from the string
format to logging_function(msg, *args).
@Fokko Fokko force-pushed the AIRFLOW-1582-Improve-logging-structure branch from 5a4ef24 to 79db684 Compare September 12, 2017 21:34
@Fokko
Contributor Author

Fokko commented Sep 12, 2017

@bolkedebruin Sounds like a perfect plan.

  1. I've created a new ticket for it: https://issues.apache.org/jira/browse/AIRFLOW-1599. Thanks for pointing out the Django documentation; it is a nice base.
  2. A paragraph has been added to the UPDATING.md
  3. Speaks for itself

I've rebased all the commits and elaborated on the message.

@asfgit asfgit closed this in a7a5189 Sep 13, 2017
@bolkedebruin
Contributor

Merged!

kewl!

@@ -3529,7 +3522,8 @@ def pickle_info(self, session=None):
d['pickle_len'] = len(pickled)
d['pickling_duration'] = "{}".format(datetime.now() - dttm)
except Exception as e:
logging.debug(e)
log = LoggingMixin().logger
Member

Why is this not using self.logger? The logging mixin has been removed from this class, which seems odd.

Contributor Author

This is because DAG only has two log statements in it (out of over 1100 lines). As far as I know, the DAG object did not extend the LoggingMixin class. Personally I would prefer to extend the DAG (or even the BaseDAG) class and use self.logger, but because this is such a large and fundamental class, I did not want to make too many changes to it.

Contributor

Makes sense to have a logger available in DAG, preferably with the name of the DAG appended for the things it is logging.

Member

The reason I pointed this case out is that it's a method on DAG, and DAG did have the LoggingMixin, but we removed that in this PR (lines -2756/+2749).

Contributor Author

Yes, now that I think of it, it also allows users to re-use the DAG's logger to emit log statements.


@Fokko Fokko deleted the AIRFLOW-1582-Improve-logging-structure branch September 15, 2017 13:04
stlava pushed a commit to Nextdoor/airflow that referenced this pull request Sep 4, 2019
Clean the way of logging within Airflow. Remove the old logging.py and
move to the airflow.utils.log.* interface. Remove setting the logging
outside of the settings/configuration code. Move away from the string
format to logging_function(msg, *args).

Closes apache#2592 from Fokko/AIRFLOW-1582-Improve-logging-structure
7 participants