
Intermittent CI issues: subprocess.CalledProcessError - Fatal Python error: _enter_buffered_busy: could not acquire lock #37

Open
jacobtomlinson opened this issue Mar 7, 2022 · 1 comment
Labels
bug Something isn't working

Comments

@jacobtomlinson
Contributor

jacobtomlinson commented Mar 7, 2022

I am noticing an intermittent subprocess.CalledProcessError in CI that mentions locks and stderr. This has been seen in https://github.com/dask-contrib/dask-ctl/runs/5448329455?check_suite_focus=true and mentioned in #36 too.

subprocess.CalledProcessError: Command '['daskctl', 'cluster', 'create', '-f', '/home/runner/work/dask-ctl/dask-ctl/dask_ctl/tests/specs/simple.yaml']' died with <Signals.SIGABRT: 6>.

Traceback (most recent call last):
  File "/opt/hostedtoolcache/Python/3.9.10/x64/lib/python3.9/threading.py", line 973, in _bootstrap_inner
    self.run()
  File "/opt/hostedtoolcache/Python/3.9.10/x64/lib/python3.9/threading.py", line 910, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/hostedtoolcache/Python/3.9.10/x64/lib/python3.9/site-packages/distributed/process.py", line 216, in _watch_process
Fatal Python error: _enter_buffered_busy: could not acquire lock for <_io.BufferedWriter name='<stderr>'> at interpreter shutdown, possibly due to daemon threads
Python runtime state: finalizing (tstate=0x55883c7ff240)

Current thread 0x00007f163932b740 (most recent call first):
<no Python frame>
Full log
============================= test session starts ==============================
platform linux -- Python 3.9.10, pytest-7.0.1, pluggy-1.0.0
rootdir: /home/runner/work/dask-ctl/dask-ctl, configfile: setup.cfg
plugins: asyncio-0.15.1, cov-3.0.0
collected 24 items

dask_ctl/discovery.py sss                                                [ 12%]
dask_ctl/lifecycle.py ssssss                                             [ 37%]
dask_ctl/proxy.py sss                                                    [ 50%]
dask_ctl/tests/test_cli.py ..F.                                          [ 66%]
dask_ctl/tests/test_dask_ctl.py .                                        [ 70%]
dask_ctl/tests/test_discovery.py .....                                   [ 91%]
dask_ctl/tests/test_lifecycle.py ..                                      [100%]

=================================== FAILURES ===================================
_________________________________ test_create __________________________________

simple_spec_path = '/home/runner/work/dask-ctl/dask-ctl/dask_ctl/tests/specs/simple.yaml'

    def test_create(simple_spec_path):
>       output = check_output(["daskctl", "cluster", "create", "-f", simple_spec_path])

dask_ctl/tests/test_cli.py:25: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/opt/hostedtoolcache/Python/3.9.10/x64/lib/python3.9/subprocess.py:424: in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

input = None, capture_output = False, timeout = None, check = True
popenargs = (['daskctl', 'cluster', 'create', '-f', '/home/runner/work/dask-ctl/dask-ctl/dask_ctl/tests/specs/simple.yaml'],)
kwargs = {'stdout': -1}
process = <Popen: returncode: -6 args: ['daskctl', 'cluster', 'create', '-f', '/home/r...>
stdout = b'Created cluster 09fa9ac9.\n', stderr = None, retcode = -6

    def run(*popenargs,
            input=None, capture_output=False, timeout=None, check=False, **kwargs):
        """Run command with arguments and return a CompletedProcess instance.
    
        The returned instance will have attributes args, returncode, stdout and
        stderr. By default, stdout and stderr are not captured, and those attributes
        will be None. Pass stdout=PIPE and/or stderr=PIPE in order to capture them.
    
        If check is True and the exit code was non-zero, it raises a
        CalledProcessError. The CalledProcessError object will have the return code
        in the returncode attribute, and output & stderr attributes if those streams
        were captured.
    
        If timeout is given, and the process takes too long, a TimeoutExpired
        exception will be raised.
    
        There is an optional argument "input", allowing you to
        pass bytes or a string to the subprocess's stdin.  If you use this argument
        you may not also use the Popen constructor's "stdin" argument, as
        it will be used internally.
    
        By default, all communication is in bytes, and therefore any "input" should
        be bytes, and the stdout and stderr will be bytes. If in text mode, any
        "input" should be a string, and stdout and stderr will be strings decoded
        according to locale encoding, or by "encoding" if set. Text mode is
        triggered by setting any of text, encoding, errors or universal_newlines.
    
        The other arguments are the same as for the Popen constructor.
        """
        if input is not None:
            if kwargs.get('stdin') is not None:
                raise ValueError('stdin and input arguments may not both be used.')
            kwargs['stdin'] = PIPE
    
        if capture_output:
            if kwargs.get('stdout') is not None or kwargs.get('stderr') is not None:
                raise ValueError('stdout and stderr arguments may not be used '
                                 'with capture_output.')
            kwargs['stdout'] = PIPE
            kwargs['stderr'] = PIPE
    
        with Popen(*popenargs, **kwargs) as process:
            try:
                stdout, stderr = process.communicate(input, timeout=timeout)
            except TimeoutExpired as exc:
                process.kill()
                if _mswindows:
                    # Windows accumulates the output in a single blocking
                    # read() call run on child threads, with the timeout
                    # being done in a join() on those threads.  communicate()
                    # _after_ kill() is required to collect that and add it
                    # to the exception.
                    exc.stdout, exc.stderr = process.communicate()
                else:
                    # POSIX _communicate already populated the output so
                    # far into the TimeoutExpired exception.
                    process.wait()
                raise
            except:  # Including KeyboardInterrupt, communicate handled that.
                process.kill()
                # We don't call process.wait() as .__exit__ does that for us.
                raise
            retcode = process.poll()
            if check and retcode:
>               raise CalledProcessError(retcode, process.args,
                                         output=stdout, stderr=stderr)
E               subprocess.CalledProcessError: Command '['daskctl', 'cluster', 'create', '-f', '/home/runner/work/dask-ctl/dask-ctl/dask_ctl/tests/specs/simple.yaml']' died with <Signals.SIGABRT: 6>.

/opt/hostedtoolcache/Python/3.9.10/x64/lib/python3.9/subprocess.py:528: CalledProcessError
----------------------------- Captured stderr call -----------------------------
/home/runner/work/dask-ctl/dask-ctl/dask_ctl/cli.py:15: FutureWarning: typename is deprecated and will be removed in a future release. Please use dask.utils.typename instead.
  from distributed.utils import typename
Exception in thread AsyncProcess Dask Worker process (from Nanny) watch process join:
Fatal Python error: _enter_buffered_busy: could not acquire lock for <_io.BufferedWriter name='<stderr>'> at interpreter shutdown, possibly due to daemon threads
Python runtime state: finalizing (tstate=0x559447276ac0)

Current thread 0x00007f7f2bd16740 (most recent call first):
<no Python frame>
=============================== warnings summary ===============================
dask_ctl/cli.py:15
  /home/runner/work/dask-ctl/dask-ctl/dask_ctl/cli.py:15: FutureWarning: typename is deprecated and will be removed in a future release. Please use dask.utils.typename instead.
    from distributed.utils import typename

dask_ctl/tests/test_cli.py::test_list
dask_ctl/tests/test_lifecycle.py::test_snippet
  /opt/hostedtoolcache/Python/3.9.10/x64/lib/python3.9/site-packages/distributed/scheduler.py:5715: UserWarning: Scheduler already contains a plugin with name worker-status; overwriting.
    warnings.warn(

dask_ctl/tests/test_cli.py::test_autocompletion
dask_ctl/tests/test_discovery.py::test_discover_cluster_names
dask_ctl/tests/test_lifecycle.py::test_snippet
  /opt/hostedtoolcache/Python/3.9.10/x64/lib/python3.9/site-packages/distributed/client.py:1354: RuntimeWarning: coroutine 'wait_for' was never awaited
    self.close()

dask_ctl/tests/test_cli.py::test_autocompletion
dask_ctl/tests/test_discovery.py::test_discover_cluster_names
dask_ctl/tests/test_lifecycle.py::test_snippet
  /opt/hostedtoolcache/Python/3.9.10/x64/lib/python3.9/site-packages/distributed/client.py:1354: RuntimeWarning: coroutine 'Client._close' was never awaited
    self.close()

dask_ctl/tests/test_discovery.py::test_discover_clusters
  /home/runner/work/dask-ctl/dask-ctl/dask_ctl/discovery.py:100: UserWarning: Cluster discovery for proxycluster timed out.
    warnings.warn(

dask_ctl/tests/test_discovery.py::test_discover_clusters
  /opt/hostedtoolcache/Python/3.9.10/x64/lib/python3.9/site-packages/_pytest/unraisableexception.py:78: PytestUnraisableExceptionWarning: Exception ignored in: <function Client.__del__ at 0x7f16362e7e50>
  
  Traceback (most recent call last):
    File "/opt/hostedtoolcache/Python/3.9.10/x64/lib/python3.9/site-packages/distributed/client.py", line 1354, in __del__
      self.close()
    File "/opt/hostedtoolcache/Python/3.9.10/x64/lib/python3.9/site-packages/distributed/client.py", line 1597, in close
      sync(self.loop, self._close, fast=True, callback_timeout=timeout)
    File "/opt/hostedtoolcache/Python/3.9.10/x64/lib/python3.9/site-packages/distributed/utils.py", line 371, in sync
      raise TimeoutError(f"timed out after {callback_timeout} s.")
  asyncio.exceptions.TimeoutError: timed out after 2 s.
  
    warnings.warn(pytest.PytestUnraisableExceptionWarning(msg))

dask_ctl/tests/test_lifecycle.py::test_snippet
Exception in thread AsyncProcess Dask Worker process (from Nanny) watch process join:
  /opt/hostedtoolcache/Python/3.9.10/x64/lib/python3.9/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
  Perhaps you already have a cluster running?
  Hosting the HTTP server on port 43767 instead
    warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

---------- coverage: platform linux, python 3.9.10-final-0 -----------
Name                         Stmts   Miss  Cover
------------------------------------------------
dask_ctl/__init__.py            10      0   100%
dask_ctl/cli.py                162     53    67%
dask_ctl/config.py               8      0   100%
dask_ctl/discovery.py           34      4    88%
dask_ctl/lifecycle.py           42      9    79%
dask_ctl/proxy.py               46      2    96%
dask_ctl/spec.py                14      1    93%
dask_ctl/tests/conftest.py      11      1    91%
dask_ctl/utils.py               18      0   100%
------------------------------------------------
TOTAL                          345     70    80%

=========================== short test summary info ============================
FAILED dask_ctl/tests/test_cli.py::test_create - subprocess.CalledProcessErro...
============ 1 failed, 11 passed, 12 skipped, 12 warnings in 34.43s ============
Traceback (most recent call last):
  File "/opt/hostedtoolcache/Python/3.9.10/x64/lib/python3.9/threading.py", line 973, in _bootstrap_inner
    self.run()
  File "/opt/hostedtoolcache/Python/3.9.10/x64/lib/python3.9/threading.py", line 910, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/hostedtoolcache/Python/3.9.10/x64/lib/python3.9/site-packages/distributed/process.py", line 216, in _watch_process
Fatal Python error: _enter_buffered_busy: could not acquire lock for <_io.BufferedWriter name='<stderr>'> at interpreter shutdown, possibly due to daemon threads
Python runtime state: finalizing (tstate=0x55883c7ff240)

Current thread 0x00007f163932b740 (most recent call first):
<no Python frame>
/home/runner/work/_temp/58de1659-75dd-4396-916b-3d1fc3132720.sh: line 1:  1706 Aborted                 (core dumped) pytest
@jacobtomlinson jacobtomlinson added the bug Something isn't working label Mar 7, 2022
@jacobtomlinson jacobtomlinson changed the title Intermittent CI issues: Fatal Python error: _enter_buffered_busy: could not acquire lock Intermittent CI issues: subprocess.CalledProcessError - Fatal Python error: _enter_buffered_busy: could not acquire lock Mar 7, 2022
@keewis
Contributor

keewis commented Jun 3, 2022

If I understand correctly, the issue is that there are daemon threads (those with thread.daemon = True) involved somewhere that try to print to stderr at shutdown, by which point the stderr buffer may already have been torn down.

Since there's no explicit use of threading here, I suppose this should actually be forwarded to distributed?

Edit: I have also occasionally seen this when shutting down (double Ctrl-C in the terminal) a jupyterlab instance with a cluster created by dask-labextension still running.
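
For illustration, here is a minimal sketch of that interaction. It is hypothetical and not taken from dask-ctl or distributed: a daemon thread that is busy writing to stderr can be frozen at interpreter shutdown while still holding the stderr buffer's lock, and the interpreter's final flush of stderr then aborts with the "_enter_buffered_busy: could not acquire lock" fatal error seen above. Something along these lines may reproduce it intermittently:

import sys
import threading
import time

def spam_stderr():
    # Keep acquiring the stderr buffer's lock by writing to it. At interpreter
    # shutdown, daemon threads are stopped abruptly; if this thread happens to
    # be frozen while holding the lock, the final flush of stderr in the main
    # thread cannot acquire it and the interpreter aborts (SIGABRT), matching
    # the "_enter_buffered_busy" error in the CI logs.
    while True:
        sys.stderr.write("worker still alive\n")
        sys.stderr.flush()

threading.Thread(target=spam_stderr, daemon=True).start()

# Let the daemon thread start writing, then exit while it is still running so
# that interpreter shutdown races with the held lock.
time.sleep(0.1)

Because it is a race, the crash would only show up on some runs, which would be consistent with the intermittent CI failures reported here.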
