[AIRFLOW-5478] Decode PythonVirtualenvOperator Output to Logs #6097
airflow/operators/python_operator.py
Outdated
@@ -334,7 +334,7 @@ def _execute_in_subprocess(self, cmd):
         self.log.info("Executing cmd\n%s", cmd)
         output = subprocess.check_output(cmd,
                                          stderr=subprocess.STDOUT,
-                                         close_fds=True)
+                                         close_fds=True).decode('utf-8')
From: https://docs.python.org/3/library/subprocess.html#subprocess.check_output
By default, this function will return the data as encoded bytes. The actual encoding of the output data may depend on the command being invoked, so the decoding to text will often need to be handled at the application level.
I think it is fine to decode it at that level, but then we should set the encoding in the check_output call as well, to encoding='utf-8', don't you think so?
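For reference, a minimal sketch of the two options being discussed. The child command is made up for illustration. Note that on Python 3.6+ passing `encoding` makes `check_output` return `str` already, so the trailing `.decode()` would then no longer be needed (and would fail, since `str` has no `decode`).

```python
import subprocess
import sys

# Illustrative child process that prints two lines (not taken from the PR).
cmd = [sys.executable, "-c", "print('line one'); print('line two')"]

# Option 1: decode the returned bytes at the call site, as the diff does.
decoded = subprocess.check_output(
    cmd, stderr=subprocess.STDOUT, close_fds=True
).decode("utf-8")

# Option 2: let subprocess do the decoding via the encoding argument.
# check_output then returns str directly.
text = subprocess.check_output(
    cmd, stderr=subprocess.STDOUT, close_fds=True, encoding="utf-8"
)

print(type(decoded).__name__, type(text).__name__)
```

Either way the result is text rather than bytes; which one to pick is mostly a matter of style.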
I think it would be better to add an optional kwarg to __init__ to pass the command-specific encoding, because in some cases utf-8 might not be the right choice.
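A rough sketch of what that could look like. `SubprocessRunner` and the `output_encoding` parameter are hypothetical names used only for illustration; they are not the operator's actual class or signature.

```python
import subprocess
import sys


class SubprocessRunner:
    """Hypothetical stand-in for the operator; not Airflow's actual class."""

    def __init__(self, output_encoding="utf-8"):
        # Assumed parameter name: encoding used to decode the child's output.
        self.output_encoding = output_encoding

    def execute_in_subprocess(self, cmd):
        output = subprocess.check_output(
            cmd, stderr=subprocess.STDOUT, close_fds=True
        )
        # Decode with the caller-supplied encoding instead of hard-coding utf-8.
        return output.decode(self.output_encoding)


# Illustrative child that emits a single Latin-1 encoded byte sequence.
latin1_cmd = [sys.executable, "-c", "import sys; sys.stdout.buffer.write(b'caf\\xe9')"]
result = SubprocessRunner(output_encoding="latin-1").execute_in_subprocess(latin1_cmd)
```

With the default of utf-8 the same output would raise `UnicodeDecodeError`, which is the case the kwarg is meant to cover.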
Travis is sad. Can you fix it?
Codecov Report
@@ Coverage Diff @@
## master #6097 +/- ##
=========================================
Coverage ? 80.05%
=========================================
Files ? 608
Lines ? 35054
Branches ? 0
=========================================
Hits ? 28061
Misses ? 6993
Partials ? 0
Fantastic! Looks great now with the optional parameter. Would you mind adding a test for this functionality, though? It would be fairly easy to extend the tests in test_python_operator.py to cover this behaviour (you can use the existing tests to see how things are mocked, etc.).
This is super helpful if we decide to cherry-pick the change to v1-10-branch, as only automated testing will show us whether everything works after cherry-picking.
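As a starting point, a test along these lines might work. This is only a sketch: `run_and_decode` is a hypothetical helper standing in for the operator's subprocess call, and the real tests in test_python_operator.py may mock things differently.

```python
import subprocess
import unittest
from unittest import mock


def run_and_decode(cmd, encoding="utf-8"):
    """Hypothetical helper standing in for the operator's subprocess call."""
    raw = subprocess.check_output(cmd, stderr=subprocess.STDOUT, close_fds=True)
    return raw.decode(encoding)


class TestRunAndDecode(unittest.TestCase):
    @mock.patch("subprocess.check_output", return_value="héllo\n".encode("utf-16"))
    def test_uses_requested_encoding(self, mock_check_output):
        # The mocked call returns UTF-16 bytes; the helper must decode them.
        self.assertEqual(run_and_decode(["fake"], encoding="utf-16"), "héllo\n")
        mock_check_output.assert_called_once_with(
            ["fake"], stderr=subprocess.STDOUT, close_fds=True
        )
```

Mocking `check_output` keeps the test independent of any real virtualenv or child process, which should also make it cheap to carry over to v1-10-branch.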
@ramannanda are you still working on this PR?
Is it done on master?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I can still see this issue... is it going to be fixed? What is missing here?
@fj-sanchez What version of Airflow? In Airflow 2.0, we made changes to this operator.
I'm currently on 1.9.10.
This is not a significant error, so I suspect that it will not be fixed in Airflow 1.10. We try to limit changes to operators in the Airflow 1.10 series, because each change must be made for Airflow 2.0 and then manually repeated for Airflow 1.10.x. There is too much difference between these versions. Here is the discussion about the ETA for Airflow 2.0: https://lists.apache.org/thread.html/r0abba3669962f101d787ad793611ba436d35c8e022aa565705778b7d%40%3Cdev.airflow.apache.org%3E
Ok, I'll try to get this patch into our deployments then. Thank you.
Curious why this is not a significant error as log output appears to be functionally unusable when newlines aren't printed as newlines?
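To illustrate the symptom, a small sketch, assuming the output is interpolated into a log message with `%s` (the exact log call is not shown in this thread). Formatting a `bytes` object that way yields its repr, so newlines appear as a literal backslash followed by `n`.

```python
# Bytes as returned by check_output without decoding (sample data).
raw = b"line one\nline two\n"

# Formatting bytes with %s yields the bytes repr on Python 3,
# with escaped newlines and a leading b prefix.
before = "%s" % raw

# After decoding, the newlines are real line breaks again.
after = "%s" % raw.decode("utf-8")

print(before)
print(after)
```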