Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AIRFLOW-5478] Decode PythonVirtualenvOperator Output to Logs #6097

Closed
wants to merge 1 commit into from
Closed

[AIRFLOW-5478] Decode PythonVirtualenvOperator Output to Logs #6097

wants to merge 1 commit into from

Conversation

ramannanda
Copy link

  • Added decode to subprocess output

Make sure you have checked all steps below.

Jira

  • My PR addresses the following Airflow Jira issues and references them in the PR title. For example, "[AIRFLOW-XXX] My Airflow PR"
    • https://issues.apache.org/jira/browse/AIRFLOW-XXX
    • In case you are fixing a typo in the documentation you can prepend your commit with [AIRFLOW-XXX], code changes always need a Jira issue.
    • In case you are proposing a fundamental code change, you need to create an Airflow Improvement Proposal (AIP).
    • In case you are adding a dependency, check if the license complies with the ASF 3rd Party License Policy.

Description

  • Here are some details about my PR, including screenshots of any UI changes:

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:

Commits

  • My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and the classes in the PR contain docstrings that explain what it does
    • If you implement backwards incompatible changes, please leave a note in the Updating.md so we can assign it to a appropriate release

@@ -334,7 +334,7 @@ def _execute_in_subprocess(self, cmd):
self.log.info("Executing cmd\n%s", cmd)
output = subprocess.check_output(cmd,
stderr=subprocess.STDOUT,
close_fds=True)
close_fds=True).decode('utf-8')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From: https://docs.python.org/3/library/subprocess.html#subprocess.check_output

By default, this function will return the data as encoded bytes. The actual encoding of the output data may depend on the command being invoked, so the decoding to text will often need to be handled at the application level.

I think it is fine to decode it at that level, but then we should set the encoding in the check_output as well to encoding='utf-8', don't you think so?

Copy link
Member

@feluelle feluelle Sep 20, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better to add an optional/kwarg to the __init__ to pass over the command-specific encoding/decoding. Because in some cases utf-8 might not be the right choice.

@mik-laj
Copy link
Member

mik-laj commented Sep 21, 2019

Travis is sad. Can you fix it?

@codecov-io
Copy link

codecov-io commented Sep 23, 2019

Codecov Report

❗ No coverage uploaded for pull request base (master@7506c95). Click here to learn what that means.
The diff coverage is 71.42%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master    #6097   +/-   ##
=========================================
  Coverage          ?   80.05%           
=========================================
  Files             ?      608           
  Lines             ?    35054           
  Branches          ?        0           
=========================================
  Hits              ?    28061           
  Misses            ?     6993           
  Partials          ?        0
Impacted Files Coverage Δ
airflow/operators/python_operator.py 94.73% <71.42%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7506c95...3837026. Read the comment docs.

Copy link
Member

@potiuk potiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fantastic! looks great now with the optional parameter. Would you mind however to add a test for this functionality? It would be rather easy to extend tests in test_python_operator.py to test for this behaviour (and you can use the existing tests to see how things are mocked etc.).

This is super helpful if we decide to cherry-pick the change to v1-10-branch as only automated testing will help us to see if everything works after cherry-picking.

@OmerJog
Copy link
Contributor

OmerJog commented Nov 13, 2019

@ramannanda are you still working on this PR?

@mik-laj
Copy link
Member

mik-laj commented Nov 13, 2019

Is it done on master?

@stale
Copy link

stale bot commented Dec 28, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label Dec 28, 2019
@stale stale bot closed this Jan 4, 2020
@fj-sanchez
Copy link
Contributor

I can still see this issue... is it going to be fixed? What is missing here?

@mik-laj
Copy link
Member

mik-laj commented Apr 8, 2020

@fj-sanchez What version of Airflow? In Airflow 2.0, we did changes in this operator.
https://github.com/apache/airflow/blob/master/airflow/operators/python.py#L322

@fj-sanchez
Copy link
Contributor

I'm currently on 1.9.10.

@mik-laj
Copy link
Member

mik-laj commented Apr 8, 2020

This is not a significant error, so I suspect that it will not be fixed in Airflow 1.10. We try to limit changes in operators in airflow 1.10 series, because each change must be made for Airflow 2.0, and then manually repeated once again for Airflow 1.10.x. There is too much difference between these versions.

Here is discussion about ETA for Airflow 2.0: https://lists.apache.org/thread.html/r0abba3669962f101d787ad793611ba436d35c8e022aa565705778b7d%40%3Cdev.airflow.apache.org%3E

@fj-sanchez
Copy link
Contributor

Ok, I'll try to get this patch into our deployments then. Thank you.

@jasonstitt
Copy link

Curious why this is not a significant error as log output appears to be functionally unusable when newlines aren't printed as newlines?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stale Stale PRs per the .github/workflows/stale.yml policy file
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants