Skip to content

[AIRFLOW-1878] Fix stderr/stdout redirection for tasks#2836

Closed
bolkedebruin wants to merge 1 commit intoapache:masterfrom
bolkedebruin:AIRFLOW-1878
Closed

[AIRFLOW-1878] Fix stderr/stdout redirection for tasks#2836
bolkedebruin wants to merge 1 commit intoapache:masterfrom
bolkedebruin:AIRFLOW-1878

Conversation

@bolkedebruin
Copy link
Contributor

@bolkedebruin bolkedebruin commented Dec 2, 2017

Make sure you have checked all steps below.

JIRA

Description

  • Here are some details about my PR, including screenshots of any UI changes:

logging.StreamHandler keeps a reference to the initial stream
it has been assigned. This prevent redirection after initalization
to a logging facility.

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:

Will add

Commits

  • My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":

    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"
  • Passes git diff upstream/master -u -- "*.py" | flake8 --diff

cc: @Fokko @ashb @jgao54

@codecov-io
Copy link

codecov-io commented Dec 2, 2017

Codecov Report

Merging #2836 into master will decrease coverage by 0.07%.
The diff coverage is 85.71%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2836      +/-   ##
==========================================
- Coverage   74.05%   73.97%   -0.08%     
==========================================
  Files         160      160              
  Lines       12139    12153      +14     
==========================================
+ Hits         8989     8990       +1     
- Misses       3150     3163      +13
Impacted Files Coverage Δ
airflow/bin/cli.py 53.73% <ø> (ø) ⬆️
airflow/config_templates/airflow_local_settings.py 100% <ø> (ø) ⬆️
airflow/utils/log/logging_mixin.py 97.46% <85.71%> (-2.54%) ⬇️
airflow/task_runner/bash_task_runner.py 93.33% <0%> (-6.67%) ⬇️
airflow/utils/helpers.py 53.44% <0%> (-2.88%) ⬇️
airflow/executors/dask_executor.py 79.06% <0%> (-2.33%) ⬇️
airflow/jobs.py 79.44% <0%> (-0.45%) ⬇️
airflow/models.py 87.47% <0%> (+0.04%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9731ce6...9fc893d. Read the comment docs.

@bolkedebruin
Copy link
Contributor Author

@ashb @Fokko PTAL

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than setting self.stream each time maybe do something like this:

    @property
    def stream(self):
        if self._use_stderr:
            return sys.stderr
        return sys.stdout

(Inspiration from https://github.com/python/cpython/blob/3.6/Lib/logging/__init__.py#L1077-L1095)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I like that :)

Copy link
Member

@ashb ashb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possible improvement, looks good otherwise.

@bolkedebruin bolkedebruin force-pushed the AIRFLOW-1878 branch 6 times, most recently from e631bbd to e089c3a Compare December 6, 2017 22:04
@bolkedebruin
Copy link
Contributor Author

bolkedebruin commented Dec 6, 2017

@ashb couldn't make it work. Seems to be Python 3.5/3.6 only.

^^ fixed that

logging.StreamHandler keeps a reference to the initial stream
it has been assigned. This prevents redirection after initalization
to a logging facility.
@jgao54
Copy link

jgao54 commented Dec 11, 2017

@bolkedebruin Just ran the local master today, and looks like there is a handler bug introduced by this PR that is preventing scheduler from running the tasks.

Specifically, I think:

    @property
     def stream(self):
         if self._use_stderr:
             return sys.stderr
 
         return sys.stdout

should be

    @property
     def stream(self):
         if self._use_stderr:
             return sys.__stderr__
 
         return sys.__stdout__

@ashb
Copy link
Member

ashb commented Dec 11, 2017

I haven't tested this myself, but __stderr__ seems like explicitly the wrong thing to use given the goal of this ticket:

sys.__stdin__
sys.__stdout__
sys.__stderr__

These objects contain the original values of stdin, stderr and stdout at the start of the program. They are used during finalization, and could be useful to print to the actual standard stream no matter if the sys.std* object has been redirected.

It can also be used to restore the actual files to known working file objects in case they have been overwritten with a broken object. However, the preferred way to do this is to explicitly save the previous stream before replacing it, and restore the saved object.

(Taken from https://docs.python.org/2/library/sys.html#sys.__stdin__)

@ashb
Copy link
Member

ashb commented Dec 11, 2017

@jgao54 What was the error reported by the scheduler? Or no error, but making the change you suggested make it start actually scheduling tasks?

@bolkedebruin
Copy link
Contributor Author

I tested this locally as well and don’t have any issues?

@bolkedebruin
Copy link
Contributor Author

And no stderr are not the right thing. Are you sure you tested the latest version of the patch?

@bolkedebruin
Copy link
Contributor Author

A previous version missed setting the redirect to ti.log which creates an endless loop and prevents tasks from running.

@jgao54
Copy link

jgao54 commented Dec 11, 2017

@ashb @bolkedebruin yeah I'm testing with a fresh copy of the latest master. I'm not exactly sure why it's causing issues, as both stdout and __stdout__ have the same reference, but I am not seeing any task-level logs, and the tasks do not appear to be triggered.

@jgao54
Copy link

jgao54 commented Dec 11, 2017

@bolkedebruin, I took another look at the code and looks like this PR itself is fine. But this PR triggered logging redirection (https://github.com/apache/incubator-airflow/blob/master/airflow/jobs.py#L350-L351), which gets into an infinite loop somewhere and logging INFO - Started process (PID=11759) to work on ... repetitively in the scheduler log, and is blowing up my laptop...

@bolkedebruin
Copy link
Contributor Author

I’ll have a look today to see if I can reproduce.

@TrevorEdwards
Copy link
Contributor

I noticed the same issues as @jgao54 starting with this commit on master.

@bolkedebruin
Copy link
Contributor Author

bolkedebruin commented Dec 12, 2017

Ok I have reproduced it. It seems to have creeped in between branching for the PR itself and changes that have been made to master. Will investigate and create a patch.

^^ This PR is probably indeed unrelated as mentioned by @jgao54

@bolkedebruin
Copy link
Contributor Author

fixed by #2871

Acehaidrey pushed a commit to Acehaidrey/incubator-airflow that referenced this pull request Jan 19, 2018
logging.StreamHandler keeps a reference to the
initial stream
it has been assigned. This prevents redirection
after initalization
to a logging facility.

Closes apache#2836 from bolkedebruin/AIRFLOW-1878
hougs pushed a commit to stitchfix/incubator-airflow that referenced this pull request Jun 29, 2018
logging.StreamHandler keeps a reference to the
initial stream
it has been assigned. This prevents redirection
after initalization
to a logging facility.

Closes apache#2836 from bolkedebruin/AIRFLOW-1878
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants