Avoid log spam & have more meaningful log when pull image in DockerOperator#12763

Merged
XD-DENG merged 5 commits into
apache:masterfrom
XD-DENG:issue-12576
Dec 3, 2020

Conversation

@XD-DENG
Member

@XD-DENG XD-DENG commented Dec 2, 2020

Closes #12576

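import docker

cli = docker.APIClient()
# stream=True yields progress events as they arrive; decode=True turns each one into a dict.
# "python:latest" stands in for self.image, the operator attribute used in the original snippet.
for x in cli.pull("python:latest", stream=True, decode=True):
    print(x)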
  • The change in this PR also makes the log more meaningful. An image normally consists of several layers, so simply printing the status is not very informative. It is much better to print the status by layer id.

Sample output when we pull image using docker.APIClient.pull(stream=True)

image
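
To illustrate the idea of printing the status by id, here is a minimal sketch that runs without a Docker daemon. The helper name `summarize_pull` and the sample events (including the layer ids `aaa111` and `bbb222`) are hypothetical; they only mimic the shape of the dicts yielded by `pull(stream=True, decode=True)`. It is not the operator's actual code.

```python
from typing import Dict, Iterable, List


def summarize_pull(events: Iterable[Dict[str, str]]) -> List[str]:
    """Return one line per status change of each layer id (hypothetical helper)."""
    latest_status: Dict[str, str] = {}
    lines: List[str] = []
    for event in events:
        layer_id = event.get("id")
        status = event.get("status", "")
        if layer_id is None:
            # Events without an id (e.g. the final summary) are always kept.
            lines.append(status)
        elif latest_status.get(layer_id) != status:
            # Keep an id-bearing event only when its status has changed.
            lines.append(f"{layer_id}: {status}")
            latest_status[layer_id] = status
    return lines


# Made-up events shaped like decoded pull output; repeated statuses simulate the spam.
sample_events = [
    {"status": "Pulling from library/python", "id": "latest"},
    {"status": "Downloading", "id": "aaa111"},
    {"status": "Downloading", "id": "aaa111"},
    {"status": "Downloading", "id": "bbb222"},
    {"status": "Download complete", "id": "aaa111"},
    {"status": "Status: Downloaded newer image for python:latest"},
]
summary = summarize_pull(sample_events)
for line in summary:
    print(line)
```

Six raw events collapse to five lines here; in a real pull, where the same status repeats many times per layer, the reduction would be much larger.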



…erator

Fixing issue reported in apache#12576

This change actually also makes the log more meaningful
@XD-DENG XD-DENG requested review from ashb, kaxil and turbaszek December 2, 2020 20:07
@XD-DENG XD-DENG changed the title Avoid log spam when pull image in DockerOperator Avoid log spam & have more meaningful log when pull image in DockerOperator Dec 2, 2020
@XD-DENG XD-DENG added the type:bug-fix Changelog: Bug Fixes label Dec 2, 2020
@XD-DENG XD-DENG added this to the Airflow 2.0.0 (rc1) milestone Dec 2, 2020
@github-actions
Contributor

github-actions Bot commented Dec 2, 2020

The PR is likely OK to be merged with just subset of tests for default Python and Database versions without running the full matrix of tests, because it does not modify the core of Airflow. If the committers decide that the full tests matrix is needed, they will add the label 'full tests needed'. Then you should rebase to the latest master or amend the last commit of the PR, and push it with --force-with-lease.

@github-actions github-actions Bot added the okay to merge It's ok to merge this PR as it does not require more tests label Dec 2, 2020
Mainly for the final two lines

{'status': 'Digest: sha256:589cc12df79de86631d447e09bf131791c661814ee3e235eaa81389f0778d6a0'}
{'status': 'Status: Downloaded newer image for python:latest'}
@XD-DENG
Member Author

XD-DENG commented Dec 2, 2020

Hi @turbaszek, I added b543036 to address the cases shown in the screenshot below.

Default__Python_

Mind taking another look?

@XD-DENG XD-DENG requested a review from turbaszek December 2, 2020 21:27
Comment thread airflow/providers/docker/operators/docker.py
Comment on lines +293 to +296
if 'id' in output:
    if latest_status.get(output['id']) != output['status']:
        self.log.info("%s: %s", output['id'], output['status'])
        latest_status[output['id']] = output['status']
Member

Suggested change
- if 'id' in output:
-     if latest_status.get(output['id']) != output['status']:
-         self.log.info("%s: %s", output['id'], output['status'])
-         latest_status[output['id']] = output['status']
+ if 'id' in output and latest_status.get(output['id']) != output['status']:
+     self.log.info("%s: %s", output['id'], output['status'])
+     latest_status[output['id']] = output['status']

Member

There's a chance that latest_status.get(output.get("id")) != output["status"] will also work

Member Author

The change you proposed may not work: if the combined condition is not matched, execution goes to the else branch and prints the status, which is no different from the earlier "spamming" behaviour.

This is why I have the nested ifs here. Could you let me know if this clarification makes sense to you?
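
The difference described above can be checked with a small sketch. It assumes, as the comment states, that the surrounding code has an else branch that logs the bare status. The function names `nested` and `flattened`, the list-based "log", and the layer id `aaa111` are hypothetical stand-ins, not the operator's code.

```python
from typing import Dict, Iterable, List

Event = Dict[str, str]


def nested(events: Iterable[Event]) -> List[str]:
    """Nested ifs: the else branch is reached only by events without an 'id'."""
    latest: Dict[str, str] = {}
    logged: List[str] = []
    for output in events:
        if "id" in output:
            if latest.get(output["id"]) != output["status"]:
                logged.append(f"{output['id']}: {output['status']}")
                latest[output["id"]] = output["status"]
        else:
            logged.append(output["status"])
    return logged


def flattened(events: Iterable[Event]) -> List[str]:
    """Single combined condition: unchanged statuses also fall into the else branch."""
    latest: Dict[str, str] = {}
    logged: List[str] = []
    for output in events:
        if "id" in output and latest.get(output["id"]) != output["status"]:
            logged.append(f"{output['id']}: {output['status']}")
            latest[output["id"]] = output["status"]
        else:
            # Reached for id-less events AND for repeated statuses of a known id.
            logged.append(output["status"])
    return logged


events = [
    {"id": "aaa111", "status": "Downloading"},
    {"id": "aaa111", "status": "Downloading"},
    {"id": "aaa111", "status": "Downloading"},
    {"status": "Status: Downloaded newer image for python:latest"},
]
nested_lines = nested(events)
flattened_lines = flattened(events)
print(len(nested_lines), len(flattened_lines))
```

With these made-up events the nested form logs two lines, while the flattened form logs all four, because every repeated "Downloading" still reaches the else branch.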

Member

@turbaszek turbaszek Dec 3, 2020

Then maybe we should invert the ifs:

if 'id' not in output:
    self.log.info("%s", output['status'])
    continue

output_id, output_status = output["id"], output["status"]
if latest_status.get(output_id) != output_status:
    self.log.info("%s: %s", output_id, output_status)
    latest_status[output_id] = output_status

I'm not a fan of too many nested ifs as pylint likes to complain about them 😉
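
As a runnable sketch of the guard-clause form suggested above, the snippet below wraps it in a standalone function and captures the log records in memory. The logger name `pull_sketch`, the `ListHandler` class, the function name `log_pull_progress`, and the sample events are all hypothetical; the real operator uses `self.log` inside its own loop.

```python
import logging
from typing import Dict, Iterable, List

log = logging.getLogger("pull_sketch")  # hypothetical logger name
log.setLevel(logging.INFO)


class ListHandler(logging.Handler):
    """Collects formatted messages in a list so the result can be inspected."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: List[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


def log_pull_progress(events: Iterable[Dict[str, str]]) -> Dict[str, str]:
    """Log id-less events as-is; log id-bearing events only on status change."""
    latest_status: Dict[str, str] = {}
    for output in events:
        if "id" not in output:
            # Guard clause: handle the simple case first and move on.
            log.info("%s", output["status"])
            continue

        output_id, output_status = output["id"], output["status"]
        if latest_status.get(output_id) != output_status:
            log.info("%s: %s", output_id, output_status)
            latest_status[output_id] = output_status
    return latest_status


handler = ListHandler()
log.addHandler(handler)

final_status = log_pull_progress(
    [
        {"id": "aaa111", "status": "Pulling fs layer"},
        {"id": "aaa111", "status": "Downloading"},
        {"id": "aaa111", "status": "Downloading"},
        {"id": "aaa111", "status": "Pull complete"},
        {"status": "Status: Downloaded newer image for python:latest"},
    ]
)
print(handler.messages)
```

The early `continue` keeps the main path at a single level of nesting inside the loop, which is the readability gain the suggestion is after.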

Member Author

Actually we already cannot avoid having to apply disable=too-many-nested-blocks, but your suggestion above is definitely fair.

I have addressed it in 23e81de (with an extremely minor change).

Co-authored-by: Tomek Urbaszek <turbaszek@gmail.com>
@XD-DENG XD-DENG merged commit 6b339c7 into apache:master Dec 3, 2020
@XD-DENG XD-DENG deleted the issue-12576 branch December 3, 2020 21:36
Labels

okay to merge: It's ok to merge this PR as it does not require more tests
type:bug-fix: Changelog: Bug Fixes

Development

Successfully merging this pull request may close these issues.

DockerOperator causes log spam when pulling image

3 participants