Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework output redirection #5084

Merged
merged 6 commits into from
Aug 20, 2017
Merged

Rework output redirection #5084

merged 6 commits into from
Aug 20, 2017

Conversation

tgamblin
Copy link
Member

@tgamblin tgamblin commented Aug 14, 2017

This reworks I/O redirection in Spack to avoid race cases, to make output more deterministic, and to ensure that everything gets logged in build.out. Previously output could be different if you just ran spack install or ran spack install | cat. This PR tries to make the output more deterministic and to simplify the interface to the log_output context handler.

  • Simplify interface to log_output. New interface requires only one context handler instead of two. Before:

      with log_output('logfile.txt') as log_redirection:
           with log_redirection:
               # do things ... output will be logged

    After:

      with log_output('logfile.txt'):
          # do things ... output will be logged

    If you also want the output to be echoed to stdout, use the echo parameter::

      with log_output('logfile.txt', echo=True):
          # do things ... output will be logged and printed out

    And, if you just want to echo some stuff from the parent, use force_echo:

      with log_output('logfile.txt', echo=False) as logger:
          # do things ... output will be logged
    
          with logger.force_echo():
              # things here will be echoed *and* logged

    A key difference between this and the previous implementation is that everything in the context handler is logged. Previously, things like Executing phase 'configure' would not be logged, only output to the screen, so understanding phases in the build log was difficult.

  • The implementation of log_output() is different in two major ways:

    1. This implementation avoids race cases by using only one pipe (before we had a multiprocessing pipe and a unix pipe). The logger daemon stops naturally when the input stream is closed, which avoids a race in the previous implementation where we'd miss some lines of output because the parent would shut the daemon down before it was done with all output.

    2. Instead of turning output redirection on and off, which prevented some things from being logged, this version uses control characters in the output stream to enable/disable forced echoing. We're using the time-honored xon and xoff codes, which tell the daemon to echo anything between them AND write it to the log. This is how logger.force_echo() works.

  • Fix places where output could get stuck in buffers by flushing more aggressively. This makes the output printed to the terminal the same as that which would be printed through a pipe to cat or to a file. Previously these could be weirdly different, and some output would be missing when redirecting Spack to a file or pipe.

  • Simplify input and color handling in both build_environment.fork() and llnl.util.tty.log.log_output(). Neither requires an input_stream parameter anymore; we assume stdin will be forwarded if possible.

  • remove llnl.util.lang.duplicate_stream() and remove associated monkey-patching in tests, as these aren't needed if you just check whether stdin is a tty and has a fileno attribute.

  • fix exit call in SpackError.die():

    1. Previously we would use os._exit() in to avoid Spack error handling in the parent process when build processes failed. This isn't necessary anymore since build processes propagate their exceptions to the parent process.

    2. Use sys.exit instead of os._exit. This has the advantage of automatically flushing output streams on quit, so output from child processes is not lost when Spack exits.

  • add test for output capture

    1. Check that output is captured from python and forked programs
    2. Disable pytest's capsys/capfd where necessary
  • Preserve verbosity setting across installs

    • user presses v, and it's valid for the entire Spack run!

@tgamblin tgamblin requested a review from alalazo August 14, 2017 12:26
@tgamblin
Copy link
Member Author

@alalazo: this reworks a lot of the major changes from your last reworking of I/O redirection. Can you take a look at it?

@tgamblin tgamblin added the WIP label Aug 14, 2017
@tgamblin
Copy link
Member Author

Note that this currently includes #5016 but probably shouldn't. This should go in first -- I'll separate it out later.

Copy link
Member

@alalazo alalazo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I very much agree with the design and changes here, but enabling / removing verbosity is not working for me (hitting v does not alter echoing - didn't have time to debug the cause yet).


EDIT: Once in a while it seems the logger daemon hangs, and stops the whole application.


or::
with logger.force_echo():
# things here will be echoed *and* logged
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add a silence_echo method for consistency? Something like:

with log_output('logfile.txt', echo=True) as logger:
    # do things ... output will be logged and printed out
    with logger.silence_echo():
        # things here will not be echoed, but will be logged

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I almost added this and with no_logging, but I think we could easily add those when the need arises. We might need to adjust the protocol a bit, and instead of using XON/XOFF to delineate echo'd regions of actual log, we'd use XON/OFF to start/stop some control protocol. e.g.:

\xonTF\xoff\n    # echo, no log
\xonTT\xoff\n    # echo and log
\xonFT\xoff\n    # no echo, log

Personally, I'd save this for later, as I don't think it is needed right now.

self.echo = not self.echo
if stdin in rlist:
if stdin.read(1) == 'v':
echo = not echo
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am currently not able to get this part working. I tested the PR on:

$ uname -a
Linux nuvolari 4.4.0-87-generic #110~14.04.1-Ubuntu SMP Tue Jul 18 14:51:32 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

using different versions of python for spack.

Copy link
Member Author

@tgamblin tgamblin Aug 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll take a look... wonder if this is a race to see who gets to EOF first.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a case of using self.echo instead of echo down below 😄

@alalazo
Copy link
Member

alalazo commented Aug 17, 2017

When the application hangs and I hit Ctrl+C to quit, the stacktrace looks like:

==> WRITE LOCK: /home/mculpo/PycharmProjects/spack/var/spack/stage/.lock[6195093728828230899:1] [Released]
Traceback (most recent call last):
  File "/usr/lib/python3.4/multiprocessing/process.py", line 254, in _bootstrap
    self.run()
  File "/usr/lib/python3.4/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/home/mculpo/PycharmProjects/spack/lib/spack/llnl/util/tty/log.py", line 314, in _writer_daemon
    if stdin.read(1) == 'v':
KeyboardInterrupt

==> Error: Keyboard interrupt.

@tgamblin tgamblin force-pushed the features/better-io-redirection branch from 7a784fc to 625008e Compare August 20, 2017 07:13
@tgamblin
Copy link
Member Author

@alalazo: this should be ready for another review.

This fixes the v issue, preserved verbosity across package builds (so if you hit v for a dependency, it sticks), and adds tests.

I learned in the process that log_output is very similar to some functionality in py._io, and some of this output redirection interferes with pytest, so it's necessary to disable pytest's I/O capture for some of the tests.

@tgamblin
Copy link
Member Author

@alalazo: I also could not reproduce the hang you're seeing, but I put a generic exception handler around the main writer loop, so that if something goes wrong, we'll still get out.

I'm not sure the best way to wait on the logger daemon. I couldn't get it to hang, and I think that join()ing it is necessary to clean up the process. If we do a timeout of 60 on the join like before we'll just end up having a zombie process.

@tgamblin tgamblin force-pushed the features/better-io-redirection branch from 625008e to 86132f5 Compare August 20, 2017 07:49
@alalazo
Copy link
Member

alalazo commented Aug 20, 2017

For the record: I re-started python@3.5 unit tests because of #5156

@tgamblin
Copy link
Member Author

I was wracking my brains trying to figure that out. I installed a local python 3.5 to try to reproduce but couldn't.

@alalazo
Copy link
Member

alalazo commented Aug 20, 2017

@tgamblin I tried a lot of times to reproduce this error locally and a few times on Travis (by restarting unit test in #4994 ), but was not able to do so. I also noticed that the error is not specific to a single python version. I really don't get why sometimes we end up with the separator being deduced as o.

@tgamblin
Copy link
Member Author

@alalazo: I was able to reproduce and (I think) fix it locally but I still don't know exactly why it happens. I can reproduce it running spack info openmpi in Python3.

@alalazo
Copy link
Member

alalazo commented Aug 20, 2017

@tgamblin You're right. Before I tried this:

$ for ii in {1..100}; do spack info openmpi 1> /dev/null ; done

which for me calls python2. I also tried running unit tests on python 3 a couple of times and were not failing. But if I do:

$ for ii in {1..100}; do python3 bin/spack info openmpi 1> /dev/null ; done
==> Error: 'o'
==> Error: 'o'
==> Error: 'o'
==> Error: 'o'
==> Error: 'o'
==> Error: 'o'
==> Error: 'o'
==> Error: 'o'
==> Error: 'o'
==> Error: 'o'
==> Error: 'o'
==> Error: 'o'
==> Error: 'o'
==> Error: 'o'
==> Error: 'o'
==> Error: 'o'

where python 3 is:

$  python3 --version
Python 3.4.3

@tgamblin
Copy link
Member Author

See if this fixes it in spec.py:

- _separators = '[%s]' % ''.join(color_formats.keys())
+ _separators = '[\\%s]' % '\\'.join(color_formats.keys())

It seems to for me. That escapes all the special chars in the _separators regex, but I can't seem to get the original regex to match 'o' outside of Spack. Trying to reproduce that now...

@alalazo
Copy link
Member

alalazo commented Aug 20, 2017

Before I dive into the debugger: are we relying on the order of keys in a dictionary when we iterate over them?

@tgamblin
Copy link
Member Author

@alalazo: oh interesting. That might be it. If the order of the characters in _separators is randomized, ^ can come first. Here's the issue:

_separators = '[\\%s]' % '\\'.join(color_formats.keys()) 

I relied on it being last, because if it's first it means "match anything NOT in this list", which would include 'o', the first character in openmpi.

So escaping them as I did above will fix this... if ^ comes first, it'll be preceded by \. I'll make a PR.

@alalazo
Copy link
Member

alalazo commented Aug 20, 2017

@tgamblin Check this slide that shows a summary of how dictionaries are implemented in various python versions. The order of keys is scrambled for python 2.7, randomized for python 3.5, ordered for python 3.6. I always found that really evil on users 😄


At this point I wonder if the other issues we are frequently seeing (like svn) are due to the same use of dictionary keys.

@tgamblin
Copy link
Member Author

Personally I like the randomized option :), because it makes you get rid of bugs like this. Go does this too -- it enables them to make more performant parallel map implementations, too.

@tgamblin
Copy link
Member Author

I'm sure someone would argue that "randomized keys for better parallel map implementations" violates some part of import this.

Copy link
Member

@alalazo alalazo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I tried the PR again on hdf5~mpi+szip, using both python 3.4.3 and python 2.7.6 and it worked correctly (no hangs, output echoing controlled by pressing v).

@tgamblin tgamblin force-pushed the features/better-io-redirection branch from 86132f5 to 7a7a29f Compare August 20, 2017 10:33
@tgamblin
Copy link
Member Author

@alalazo: one more thing. Now redirected output is unbuffered (really, force-flushed), so that python output will be properly interleaved with subcommand output during install.

@cyrush: this should fix your long-standing issue.

@alalazo
Copy link
Member

alalazo commented Aug 20, 2017

One test failing:

_____________________________ test_package_output ______________________________
install_mockery = None, mock_fetch = None
    def test_package_output(install_mockery, mock_fetch):
        """Ensure output printed from pkgs is captured by output redirection."""
        out, err = install('-v', 'printing-package')
>       assert "BEFORE INSTALL\n==> './configure'" in out
E       assert "BEFORE INSTALL\n==> './configure'" in '  Fetch: 0.07s.  Build: 0.35s.  Total: 0.42s.\n/tmp/pytest-of-travis/pytest-0/test_package_output0/test-debian6-x86_64/clang-3.3/printing-package-1.0-xpnz2byzii5mgzx62d2tpgfwffy7qbii\n'
/home/travis/build/LLNL/spack/lib/spack/spack/test/cmd/install.py:94: AssertionError

- add fail_on_error argument
- record exception and return code when the command fails
- Simplify interface to log_output. New interface requires only one
  context handler instead of two.  Before:

      with log_output('logfile.txt') as log_redirection:
           with log_redirection:
               # do things ... output will be logged

  After:

      with log_output('logfile.txt'):
          # do things ... output will be logged

  If you also want the output to be echoed to ``stdout``, use the
  `echo` parameter::

      with log_output('logfile.txt', echo=True):
          # do things ... output will be logged and printed out

  And, if you just want to echo *some* stuff from the parent, use
  ``force_echo``:

      with log_output('logfile.txt', echo=False) as logger:
          # do things ... output will be logged

          with logger.force_echo():
              # things here will be echoed *and* logged

  A key difference between this and the previous implementation is that
  *everything* in the context handler is logged.  Previously, things like
  `Executing phase 'configure'` would not be logged, only output to the
  screen, so understanding phases in the build log was difficult.

- The implementation of `log_output()` is different in two major ways:

  1. This implementation avoids race cases by using only one pipe (before
     we had a multiprocessing pipe and a unix pipe).  The logger daemon
     stops naturally when the input stream is closed, which avoids a race
     in the previous implementation where we'd miss some lines of output
     because the parent would shut the daemon down before it was done
     with all output.

  2. Instead of turning output redirection on and off, which prevented
     some things from being logged, this version uses control characters
     in the output stream to enable/disable forced echoing.  We're using
     the time-honored xon and xoff codes, which tell the daemon to echo
     anything between them AND write it to the log.  This is how
     `logger.force_echo()` works.

- Fix places where output could get stuck in buffers by flushing more
  aggressively.  This makes the output printed to the terminal the same
  as that which would be printed through a pipe to `cat` or to a file.
  Previously these could be weirdly different, and some output would be
  missing when redirecting Spack to a file or pipe.

- Simplify input and color handling in both `build_environment.fork()`
  and `llnl.util.tty.log.log_output()`.  Neither requires an input_stream
  parameter anymore; we assume stdin will be forwarded if possible.

- remove `llnl.util.lang.duplicate_stream()` and remove associated
  monkey-patching in tests, as these aren't needed if you just check
  whether stdin is a tty and has a fileno attribute.
- Previously we would use `os._exit()` in to avoid Spack error handling
  in the parent process when build processes failed.  This isn't
  necessary anymore since build processes propagate their exceptions to
  the parent process.

- Use `sys.exit` instead of `os._exit`. This has the advantage of
  automatically flushing output streams on quit, so output from child
  processes is not lost when Spack exits.
- 'v' toggle was previously only good for the current install.
- subsequent installs needed user to press 'v' again.
- 'v' state is now preserved across dependency installs.
- Python I/O would not properly interleave (or appear) with output from
  subcommands.

- Add a flusing wrapper around sys.stdout and sys.stderr when
  redirecting, so that Python output is synchronous with that of
  subcommands.
@tgamblin tgamblin force-pushed the features/better-io-redirection branch from 439525e to aed383a Compare August 20, 2017 22:40
@tgamblin tgamblin merged commit 10bb681 into develop Aug 20, 2017
@tgamblin tgamblin deleted the features/better-io-redirection branch August 20, 2017 23:54
@cyrush
Copy link
Member

cyrush commented Aug 21, 2017

@tgamblin awesome thanks for this fix -- looks good:

surface86@cyrush:cat m.out.cz.spack.fix.test.1728086.surface113.txt 
Mon Aug 21 09:22:56 PDT 2017
Initialized empty Git repository in /g/g24/cyrush/work/2017.08.spack.msub.test/spack_error_test/spack/.git/
HEAD is now at 10bb681 Unbuffer so that output from packages appears when redirecting
==> Warning: You asked for --insecure. Will NOT check SSL certificates.
==> Warning: /g/g24/cyrush/.spack/compilers.yaml in out of date compilers format. Moved to /g/g24/cyrush/.spack/_old_compilers.yaml. Spack automatically generate a compilers config file 
==> Installing zcrash
==> Fetching http://zlib.net/fossils/zlib-1.2.10.tar.gz
==> Staging archive: /g/g24/cyrush/work/2017.08.spack.msub.test/spack_error_test/spack/var/spack/stage/zcrash-1.2.10-c6hgtrazcydsggjzpjrg4n6nq4dgdgyl/zlib-1.2.10.tar.gz
==> Created stage in /g/g24/cyrush/work/2017.08.spack.msub.test/spack_error_test/spack/var/spack/stage/zcrash-1.2.10-c6hgtrazcydsggjzpjrg4n6nq4dgdgyl
==> No patches needed for zcrash
==> Building zcrash [Package]
==> Executing phase: 'install'
==> Error: ProcessError: Command exited with status 1:
    './configure' '--CRASH!' '--prefix=/g/g24/cyrush/work/2017.08.spack.msub.test/spack_error_test/spack/opt/spack/linux-rhel6-x86_64/gcc-4.4.7/zcrash-1.2.10-c6hgtrazcydsggjzpjrg4n6nq4dgdgyl'
/g/g24/cyrush/work/2017.08.spack.msub.test/spack_error_test/spack/var/spack/repos/builtin/packages/zcrash/package.py:43, in install:
     41       def install(self, spec, prefix):
     42           cfg_args = ["--CRASH!","--prefix=%s" % prefix]
  >> 43           configure(*cfg_args)
     44           make()
     45           make("install")

See build log for details:
  /tmp/cyrush/spack-stage/spack-stage-_Q1cRm/zlib-1.2.10/spack-build.out
[exe: spack/bin/spack -k install zcrash]
{ERROR [return code: 1] from command: spack/bin/spack -k install zcrash}
Mon Aug 21 09:23:37 PDT 2017

vs before:

surface86@cyrush:cat m.out.cz.spack.asctk.test.1728085.surface113.txt
Mon Aug 21 09:22:07 PDT 2017
Initialized empty Git repository in /g/g24/cyrush/work/2017.08.spack.msub.test/spack_error_test/spack/.git/
HEAD is now at 4a6ec63 Merge pull request #940 from gzagaris/feature/doxygen-version-1.8.11
==> Warning: You asked for --insecure, which does not check SSL certificates.
==> Error: Command exited with status 1:
==> Error: Installation process had nonzero exit code.
[exe: spack/bin/spack -k install zcrash]
{ERROR [return code: 1] from command: spack/bin/spack -k install zcrash}
Mon Aug 21 09:22:34 PDT 2017

@tgamblin
Copy link
Member Author

@cyrush: awesome. You will probably also like #5016, which is next.

@cyrush
Copy link
Member

cyrush commented Aug 21, 2017

@tgamblin sounds good!

On failure, I am also thinking of automatically copying the build log out from the stage somewhere into the spack space ('last.build.error.txt'), b/c if we fail while building in a temp space on a random node, we may not be able to access the build log.

@tgamblin tgamblin mentioned this pull request Oct 21, 2017
@tgamblin tgamblin added this to the v0.11.0 milestone Nov 12, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants