Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WSL 2016 Anniversary Update tcsetattr() failure #550

Closed
dw opened this Issue Feb 22, 2019 · 19 comments

Comments

Projects
None yet
2 participants
@dw
Copy link
Owner

commented Feb 22, 2019

I think this problem might be back. I'm trying to run this on WSL using mitogen 2.5 with ansible 2.7.7 and getting the following during the Gather Facts step.

ERROR! [pid 17342] 16:05:31.543223 E mitogen: mitogen.service.Pool(0x7f089dd07c10, size=32, th='mitogen.service.Pool.7f089dd07c10.worker-0'): while invoking u'get' of u'ansible_mitogen.services.ContextService'
Traceback (most recent call last):
  File "/etc/ansible/mitogen-0.2.5/mitogen/service.py", line 544, in _on_service_call
    return invoker.invoke(method_name, kwargs, msg)
  File "/etc/ansible/mitogen-0.2.5/mitogen/service.py", line 274, in invoke
    response = self._invoke(method_name, kwargs, msg)
  File "/etc/ansible/mitogen-0.2.5/mitogen/service.py", line 260, in _invoke
    ret = method(**kwargs)
  File "/etc/ansible/mitogen-0.2.5/ansible_mitogen/services.py", line 467, in get
    reraise(*result)
  File "/etc/ansible/mitogen-0.2.5/ansible_mitogen/services.py", line 425, in _wait_or_start
    response = self._connect(key, spec, via=via)
  File "/etc/ansible/mitogen-0.2.5/ansible_mitogen/services.py", line 377, in _connect
    context = method(via=via, unidirectional=True, **spec['kwargs'])
  File "/etc/ansible/mitogen-0.2.5/mitogen/parent.py", line 2152, in ssh
    return self.connect(u'ssh', **kwargs)
  File "/etc/ansible/mitogen-0.2.5/mitogen/parent.py", line 2097, in connect
    **mitogen.core.Kwargs(kwargs))
  File "/etc/ansible/mitogen-0.2.5/mitogen/parent.py", line 2078, in _connect
    stream.connect()
  File "/etc/ansible/mitogen-0.2.5/mitogen/parent.py", line 1380, in connect
    self.pid, fd, diag_fd = self.start_child()
  File "/etc/ansible/mitogen-0.2.5/mitogen/parent.py", line 1354, in start_child
    return self.create_child(args, **self.create_child_args)
  File "/etc/ansible/mitogen-0.2.5/mitogen/parent.py", line 498, in hybrid_tty_create_child
    disable_echo(master_fd)
  File "/etc/ansible/mitogen-0.2.5/mitogen/parent.py", line 245, in disable_echo
    termios.tcsetattr(fd, flags, new)
error: (22, 'Invalid argument')
An exception occurred during task execution. To see the full traceback, use -vvv. The error was:     termios.tcsetattr(fd, flags, new)

Originally posted by @zyphermonkey in #319 (comment)

@dw

This comment has been minimized.

Copy link
Owner Author

commented Feb 22, 2019

Hey @zyphermonkey. Can you please report exact WSL version and Python version.

WSL is a moving target, so possibly this is new breakage, or you simply haven't upgraded for a while

@dw

This comment has been minimized.

Copy link
Owner Author

commented Feb 22, 2019

I think WSL version is tied to OS build number -- see Microsoft/WSL#1728 (comment)

@dw dw added ansible os:wsl labels Feb 22, 2019

@zyphermonkey

This comment has been minimized.

Copy link

commented Feb 22, 2019

uname -a

Linux <hostname> 6 3.4.0+ #1 PREEMPT Thu Aug 1 17:06:05 CST 2013 x86_64 x86_64 x86_64 GNU/Linux

OS Build

14393.2791

ansible --version

ansible 2.7.7
  config file = /root/ansible.cfg
  configured module search path = [u'/root/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/dist-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.6 (default, Nov 13 2018, 12:45:42) [GCC 4.8.4]
@dw

This comment has been minimized.

Copy link
Owner Author

commented Feb 22, 2019

Thanks. A little busy just now, but will spin up Windows VM later and try to figure this one out.

Just to confirm, was this definitely a "plain old" SSH connection? Any become: true or similar involved?

@zyphermonkey

This comment has been minimized.

Copy link

commented Feb 22, 2019

I tried it with and without become: true set and the error is the same either way.

I also tried an adhoc ping command with the same results.

@dw

This comment has been minimized.

Copy link
Owner Author

commented Feb 22, 2019

Thanks a ton for narrowing it down, it's definitely an SSH problem.

Just to confirm (probably true)

  • this happens 100% of the time
  • you can SSH from WSL to the machine normally otherwise

WSL is very funny about lifetimes and object states. For example if SSH client exitted quickly, and so nothing was left open on the TTY, maybe that could explain the error, but probably not

@zyphermonkey

This comment has been minimized.

Copy link

commented Feb 22, 2019

  • this happens 100% of the time

Yes it doesn't appear to matter what settings I chose it just fails immediately with this error.

  • you can SSH from WSL to the machine normally otherwise

I can ssh to the box with no issue from within WSL and if I'm not using mitogen everything works.

@dw dw added the user-reported label Feb 23, 2019

@dw

This comment has been minimized.

Copy link
Owner Author

commented Feb 23, 2019

It seems you're on the supposed "Redstone 1" release, aka 2016 Anniversary Update, whereas mine is "Redstone 4" aka. April 2018 update.

Do you know if your machine is particularly out of date for some reason? I'm really not familiar with Windows update cycles, so possibly this is a normal situation, perhaps a result of your corporate domain settings or similar.

The code is working fine for me locally. It seems I need to go digging in WSL's GitHub to figure out what may have broken/changed between those two releases.

I'm guessing a workaround should be more than possible -- these are very core UNIX APIs. It is likely WSL is objecting to some particular bit set in the value we supply, or similar

@dw

This comment has been minimized.

Copy link
Owner Author

commented Feb 23, 2019

Does strace work for you on that WSL install? If it does, running Ansible with a command line:

strace -f -o /tmp/foo ansible-playbook .....

Followed by

grep EINVAL /tmp/foo | tail -10

Might reveal the exact system call and flag combination that is failing

@dw dw changed the title [regression?] tcsetattr() failure WSL 2016 Anniversary Update tcsetattr() failure Feb 23, 2019

@zyphermonkey

This comment has been minimized.

Copy link

commented Feb 23, 2019

strace appears to lockup bash. When I ctrl+c to break out of it I get this.

^CError in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/usr/lib/python2.7/atexit.py", line 24, in _run_exitfuncs
    func(*targs, **kargs)
  File "/etc/ansible/mitogen-0.2.5/ansible_mitogen/process.py", line 181, in <lambda>
    atexit.register(lambda: clean_shutdown(cls.worker_sock))
  File "/etc/ansible/mitogen-0.2.5/ansible_mitogen/process.py", line 83, in clean_shutdown
    sock.recv(1)
KeyboardInterrupt

And here is the output from the strace.

# grep -i einval /tmp/ansible.trace
18    readlink("/usr/bin/python2.7", 0x7fffd10858f0, 4096) = -1 EINVAL (Invalid argument)
30    ioctl(76, SNDCTL_TMR_CONTINUE or SNDRV_TIMER_IOCTL_GPARAMS or TCSETSF, {B38400 -opost -isig -icanon -echo ...}) = -1 EINVAL (Invalid argument)
@dw

This comment has been minimized.

Copy link
Owner Author

commented Feb 23, 2019

Okay, I think I can see what's happening. You're right, it is exactly the old bug. The patch creates an IS_WSL variable, which I /think/ is set to False in your install, because the output of uname is different on these older versions.

Looks like Microsoft/WSL#423 (comment) is how we're supposed to reliably detect WSL.

If possible, would you mind catting the contents of the files listed on that ticket:

  • /proc/sys/kernel/osrelease
  • /proc/version

I want to make sure fixing IS_WSL will work for modern and older versions. Thanks again!

@dw

This comment has been minimized.

Copy link
Owner Author

commented Feb 23, 2019

You can verify it's just a flag issue by editing mitogen/core.py and changing the IS_WSL = ... near line 143 to IS_WSL = True. The library should start working

@zyphermonkey

This comment has been minimized.

Copy link

commented Feb 23, 2019

cat /proc/sys/kernel/osrelease
3.4.0-Microsoft

cat /proc/version
Linux version 3.4.0-Microsoft (Microsoft@Microsoft.com) (gcc version 4.7 (GCC) ) #1 SMP PREEMPT Wed Dec 31 14:42:53 PST 2014

I also made the change to mitogen/core.py and still getting an error

ERROR! [pid 26] 09:59:44.868879 E mitogen: mitogen.service.Pool(0x7f98154e2f90, size=32, th='mitogen.service.Pool.7f98154e2f90.worker-0'): while invoking u'get' of u'ansible_mitogen.services.ContextService'
Traceback (most recent call last):
  File "/etc/ansible/mitogen-0.2.5/mitogen/service.py", line 544, in _on_service_call
    return invoker.invoke(method_name, kwargs, msg)
  File "/etc/ansible/mitogen-0.2.5/mitogen/service.py", line 274, in invoke
    response = self._invoke(method_name, kwargs, msg)
  File "/etc/ansible/mitogen-0.2.5/mitogen/service.py", line 260, in _invoke
    ret = method(**kwargs)
  File "/etc/ansible/mitogen-0.2.5/ansible_mitogen/services.py", line 467, in get
    reraise(*result)
  File "/etc/ansible/mitogen-0.2.5/ansible_mitogen/services.py", line 425, in _wait_or_start
    response = self._connect(key, spec, via=via)
  File "/etc/ansible/mitogen-0.2.5/ansible_mitogen/services.py", line 377, in _connect
    context = method(via=via, unidirectional=True, **spec['kwargs'])
  File "/etc/ansible/mitogen-0.2.5/mitogen/parent.py", line 2152, in ssh
    return self.connect(u'ssh', **kwargs)
  File "/etc/ansible/mitogen-0.2.5/mitogen/parent.py", line 2097, in connect
    **mitogen.core.Kwargs(kwargs))
  File "/etc/ansible/mitogen-0.2.5/mitogen/parent.py", line 2078, in _connect
    stream.connect()
  File "/etc/ansible/mitogen-0.2.5/mitogen/parent.py", line 1380, in connect
    self.pid, fd, diag_fd = self.start_child()
  File "/etc/ansible/mitogen-0.2.5/mitogen/parent.py", line 1354, in start_child
    return self.create_child(args, **self.create_child_args)
  File "/etc/ansible/mitogen-0.2.5/mitogen/parent.py", line 511, in hybrid_tty_create_child
    close_fds=True,
  File "/etc/ansible/mitogen-0.2.5/mitogen/parent.py", line 302, in detach_popen
    proc = subprocess.Popen(preexec_fn=preexec_fn, **kwargs)
  File "/usr/lib/python2.7/subprocess.py", line 710, in __init__
    errread, errwrite)
  File "/usr/lib/python2.7/subprocess.py", line 1327, in _execute_child
    raise child_exception
IOError: [Errno 1] Operation not permitted
An exception occurred during task execution. To see the full traceback, use -vvv. The error was:     raise child_exception
@dw

This comment has been minimized.

Copy link
Owner Author

commented Feb 23, 2019

Hrm! That's a new issue, and another ioctl() problem.

Would you mind changing mitogen/parent.py around line 377 from:

if hasattr(termios, 'TIOCSCTTY'): 

to:

if hasattr(termios, 'TIOCSCTTY') and not mitogen.core.IS_WSL: 

I think it is the only possible explanation for that failure

Thanks a ton for your help -- it really sucks I can't think of an easy way to test this locally. My Technet subscription expired :)

@zyphermonkey

This comment has been minimized.

Copy link

commented Feb 23, 2019

That worked. I had to keep the IS_WSL = True set in mitogen/core.py though so the check for that will need updated as well.

@dw

This comment has been minimized.

Copy link
Owner Author

commented Feb 23, 2019

Thanks a ton for this! Looks like I've got everything I need. Yay remote debugging :)

dw added a commit that referenced this issue Feb 23, 2019

dw added a commit that referenced this issue Feb 23, 2019

dw added a commit that referenced this issue Feb 23, 2019

Merge remote-tracking branch 'origin/dmw'
* origin/dmw:
  issue #550: parent: add explanatory comment.
  issue #550: fix up TTY ioctls on WSL 2016 Anniversary Update
  docs: update Changelog.
  service: make service list optional.
@dw

This comment has been minimized.

Copy link
Owner Author

commented Feb 23, 2019

Hi there,

Would you mind giving current master a test? It should be fixed now

dw added a commit that referenced this issue Feb 23, 2019

@zyphermonkey

This comment has been minimized.

Copy link

commented Feb 23, 2019

Just tried the current master. It works with no errors. 👍

@dw

This comment has been minimized.

Copy link
Owner Author

commented Feb 23, 2019

Thanks so much!

@dw dw closed this Feb 23, 2019

dw added a commit that referenced this issue Mar 6, 2019

Merge remote-tracking branch 'origin/dmw'
* origin/dmw:
  issue #554: mitogen_action_script fix
  issue #554: fix Ansible 2.4 compatibility
  issue #554: don't rely on tmp_path autoremoval in test.
  issue #554: track and remove multiple make_tmp_path() calls.
  docs: update Changelog.
  docs: drastically simplify install/changelog.
  issue #552: include process identity in log messages.
  issue #550: update Changelog.

dw added a commit that referenced this issue Mar 6, 2019

Merge remote-tracking branch 'origin/026' into stable
* origin/026:
  docs: update Changelog for release.
  Bump version for release.
  issue #555: ansible: workaround ancient reload(sys) hack.
  issue #554: mitogen_action_script fix
  issue #554: fix Ansible 2.4 compatibility
  issue #554: don't rely on tmp_path autoremoval in test.
  issue #554: track and remove multiple make_tmp_path() calls.
  docs: update Changelog.
  docs: drastically simplify install/changelog.
  issue #552: include process identity in log messages.
  issue #550: update Changelog.
  issue #550: parent: add explanatory comment.
  issue #550: fix up TTY ioctls on WSL 2016 Anniversary Update
  docs: update Changelog.
  service: make service list optional.
  docs: update Changelog; closes #548.
  issue #548: always treat transport=smart as 'ssh' for mitogen_via=.
  docs: better intro paragraph.
  .ci: copy private key file to tempdir.
  os_fork: more doc tweaks
  os_fork: more doc tweaks
  os_fork: yet more doc tidyup
  os_fork: more doc tweaks
  os_fork: clean up docs
  .ci: import soak scripts.
  .ci: allow containers for different jobs to run simultaneously
  os_fork: python 3 fixes and tests.
  issue #535: activate Corker on 2.4 in master too.
  issue #535: update Changelog.
  issue #535: wire mitogen.os_fork into Broker and Pool.
  issue #535: parent: add create_socketpair(size=..) parameter.
  issue #535: introduce mitogen.os_fork module and Corker class.
  issue #535: docs: update Changelog
  issue #535: service: support Pool.defer() like Broker.defer()
  issue #535: core: unicode.encode() may take importer lock on 2.x
  issue #535: docs: fix up Select doc
  issue #535: docs: update Changelog.
  issue #535: core/select: support selecting from Latches.
  core: increase cookie field lengths to 64-bit; closes #545.
  tests: ensure serialization restrictions are in effect
  tests/bench: set process affinity in throughput.py.
  docs: update copyright year.
  docs: update Changelog.
  core: Make Latch.put(obj=) optional.
  docs: change 'unreleased' Changelog format and add a hint.
  docs: update Changelog; closes #542.
  issue #542: return of select poller, new selection logic
  issue #542: .ci: move some tests to Azure and enable Mac job.
  ansible: create stub __init__.py for sdist.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.