
[ansible] on_receive() throws "Resource temporarily unavailable" for 20 target runs on OS X #320

Closed
dw opened this Issue Jul 24, 2018 · 2 comments

dw commented Jul 24, 2018

When running from an OS X controller targeting 20 Linux clients, early in the run the connection multiplexer process generates frequent errors like:

ERROR! [pid 69424] 14:48:23.105738 E mitogen: mitogen.core.Stream(u'unix_client.70937') crashed
Traceback (most recent call last):
  File "/Users/dmw/src/mitogen/mitogen/core.py", line 1765, in _call
    func(self)
  File "/Users/dmw/src/mitogen/mitogen/core.py", line 953, in on_receive
    buf = self.receive_side.read()
  File "/Users/dmw/src/mitogen/mitogen/core.py", line 883, in read
    s, disconnected = io_op(os.read, self.fd, n)
  File "/Users/dmw/src/mitogen/mitogen/core.py", line 287, in io_op
    return func(*args), False
OSError: [Errno 35] Resource temporarily unavailable

The client of the connection multiplexer (the task worker) subsequently generates its own error as the IO loop disconnects its now-broken socket:

[pid 69424] 14:48:23.196828 D mitogen.io: mitogen.ssh.Stream(u'ssh.foo.bar').on_receive()
The full traceback is:
Traceback (most recent call last):
  File "/Users/dmw/src/x/x-pbm/.tox/deploy/lib/python2.7/site-packages/ansible/executor/task_executor.py", line 138, in run
    res = self._execute()
  File "/Users/dmw/src/x/x-pbm/.tox/deploy/lib/python2.7/site-packages/ansible/executor/task_executor.py", line 576, in _execute
    result = self._handler.run(task_vars=variables)
  File "/Users/dmw/src/mitogen/ansible_mitogen/mixins.py", line 115, in run
    return super(ActionModuleMixin, self).run(tmp, task_vars)
  File "/Users/dmw/src/x/x-pbm/.tox/deploy/lib/python2.7/site-packages/ansible/plugins/action/normal.py", line 46, in run
    result = merge_hash(result, self._execute_module(task_vars=task_vars, wrap_async=wrap_async))
  File "/Users/dmw/src/mitogen/ansible_mitogen/mixins.py", line 326, in _execute_module
    self._connection._connect()
  File "/Users/dmw/src/mitogen/ansible_mitogen/connection.py", line 511, in _connect
    stack=mitogen.utils.cast(list(stack)),
  File "/Users/dmw/src/mitogen/mitogen/core.py", line 1114, in call_service
    return recv.get().unpickle()
  File "/Users/dmw/src/mitogen/mitogen/core.py", line 573, in get
    raise ChannelError(ChannelError.local_msg)
ChannelError: Channel closed by local end.

The problem did not occur in older versions, suggesting an issue with the new KQueuePoller.

The initial theory is that kqueue has a read event buffered for an FD number that was reused for a new socket before the event was drained, causing the IO loop to attempt a read from the new instantiation of the FD before it is ready for reading.
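
As background only (a minimal sketch, not taken from the report): errno 35 is EAGAIN on OS X, and the same OSError appears whenever a non-blocking read is attempted on an FD that the poller wrongly reported as readable, for example:

import errno
import os
import socket

# A non-blocking read on a socket with no pending data fails with EAGAIN,
# which OS X numbers as errno 35, matching the OSError raised in on_receive().
s1, s2 = socket.socketpair()
s2.setblocking(False)
try:
    os.read(s2.fileno(), 4096)
except OSError, e:
    print e.errno == errno.EAGAIN   # True (EAGAIN == 35 on OS X)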

dw added the bug and ansible labels Jul 24, 2018

dw changed the title from [ansible] on_receive() throws EAGAIN for 20 target runs on OS X to [ansible] on_receive() throws "Resource temporarily unavailable" for 20 target runs on OS X on Jul 24, 2018

dw commented Jul 24, 2018

Minimal reproduction:

import select
import socket

# Subscribe one end of a socket pair for read events.
s1, s2 = socket.socketpair()
kq = select.kqueue()
print kq.control([select.kevent(s2.fileno(), select.KQ_FILTER_READ, select.KQ_EV_ADD)], 32, 0.0)

# Make s2 readable, then close it before the pending event is drained.
s1.send('x')
s2.close()

# Delete the subscription in the same control() call that collects events.
print kq.control([select.kevent(s2.fileno(), select.KQ_FILTER_READ, select.KQ_EV_DELETE)], 32, 0.0)

Yields:

[<select.kevent ident=18446744073709551615 filter=-1 flags=0x4002 fflags=0x0 data=0x2 udata=0x0>]

Note that the FD had been closed, and that a changelist entry deleting its subscription was present in the same syscall that returned the event for it.

Also notice that the KQ_EV_ERROR bit of flags is set.

Commenting out s2.close() reveals that the event queue is emptied of the stale event (as expected) before any events are returned.
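
One possible mitigation, sketched here only as an illustration and not as the change actually made for this issue: have the poller check the flags of every kevent returned by kq.control() and skip entries carrying KQ_EV_ERROR, since those report a failed changelist entry rather than a readable FD.

import select

def readable_fds(kq, changelist, max_events, timeout):
    # Apply the changelist and collect events, but only report FDs whose
    # kevent is a genuine KQ_FILTER_READ notification. Entries flagged
    # KQ_EV_ERROR are error results for changelist entries (for example an
    # EV_DELETE against an already-closed FD) and must not be dispatched
    # as reads.
    fds = []
    for event in kq.control(changelist, max_events, timeout):
        if event.flags & select.KQ_EV_ERROR:
            continue
        if event.filter == select.KQ_FILTER_READ:
            fds.append(event.ident)
    return fds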

dw commented Jul 27, 2018

This bug is basically a manifestation of #26, which entails a poller interface change.

dw referenced this issue Aug 8, 2018: Post-run error #333 (Closed)
