Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

incorrect path used only during synchronize task on FreeBSD #602

Closed
dch opened this issue Jul 23, 2019 · 5 comments

Comments

@dch
Copy link

commented Jul 23, 2019

Please drag-drop large logs as text file attachments.

Feel free to write an issue in your preferred format, however if in doubt, use
the following checklist as a guide for what to include.

  • Which version of Ansible are you running?
  • both 2.7 and 2.8 tried
  • Is your version of Ansible patched in any way?
  • no (vanilla FreeBSD pkg sysutils/ansible, python 3.6 version)
  • Are you running with any custom modules, or module_utils loaded?
  • no
  • Have you tried the latest master version from Git?
  • yes, same fail
  • FreeBSD 12.0R amd64 on both ends
  • Mention your host and target Python versions
  • std python 3.6 from FreeBSD ports at both ends
  • If reporting a performance issue, mention the number of targets and a rough
    description of your workload (lots of copies, lots of tiny file edits, etc.)

  • If reporting any kind of problem with Ansible, please include the Ansible
    version along with output of "ansible-config dump --only-changed".

TASK [poudriere : poudriere | push out port options] ********************************************************************************************************
task path: /projects/koans/ansible/roles/poudriere/tasks/jobs.yml:12
[task 68971] 18:42:13.405122 D mitogen: unix.connect(path='/tmp/mitogen_unix_pfouom0w.sock')
[mux  97281] 18:42:13.406947 D mitogen.io: KqueuePoller(0x804bbcda0): POLLIN: 14
[mux  97281] 18:42:13.407502 D mitogen: mitogen.unix.Listener('/tmp/mitogen_unix_pfouom0w.sock'): accepted mitogen.core.Stream('unix_client.68971')
[task 68971] 18:42:13.407582 D mitogen: unix.connect(): local ID is 2004, remote is 0
[mux  97281] 18:42:13.407904 D mitogen.io: Broker(0x804bbccc0).start_receive(mitogen.core.Stream('unix_client.68971'))
[task 68971] 18:42:13.408112 D mitogen.io: Router(Broker(0x80534b828)).add_handler(<bound method Router._on_del_route of Router(Broker(0x80534b828))>, 104, True)
[mux  97281] 18:42:13.408158 D mitogen.io: Waker(Broker(0x804bbccc0) rfd=10, wfd=12).defer() [immediate]
[mux  97281] 18:42:13.408399 D mitogen.io: KqueuePoller(0x804bbcda0).start_receive(81, (<Side of mitogen.core.Stream('unix_client.68971') fd 81>, <bound method Stream.on_receive of mitogen.core.Stream('unix_client.68971')>))
[task 68971] 18:42:13.408428 D mitogen.io: Router(Broker(0x80534b828)).add_handler(<bound method IdAllocator.on_allocate_id of IdAllocator(Router(Broker(0x80534b828)))>, 105, True)
[mux  97281] 18:42:13.408633 D mitogen.io: KqueuePoller(0x804bbcda0)._control(81, -1, 1)
[task 68971] 18:42:13.408840 D mitogen.io: Router(Broker(0x80534b828)).add_handler(<bound method ModuleResponder._on_get_module of ModuleResponder(Router(Broker(0x80534b828)))>, 100, True)
[mux  97281] 18:42:13.408894 D mitogen.io: Broker(0x804bbccc0)._loop_once(None, KqueuePoller(0x804bbcda0))
[mux  97281] 18:42:13.409103 D mitogen.io: KqueuePoller(0x804bbcda0).poll(None)
[task 68971] 18:42:13.409116 D mitogen.io: Router(Broker(0x80534b828)).add_handler(<bound method LogForwarder._on_forward_log of LogForwarder(Router(Broker(0x80534b828)))>, 102, True)
[task 68971] 18:42:13.409486 D mitogen.io: Router(Broker(0x80534b828)).add_handler(<bound method RouteMonitor._on_add_route of RouteMonitor()>, 103, True)
[task 68971] 18:42:13.409724 D mitogen.io: Router(Broker(0x80534b828)).add_handler(<bound method RouteMonitor._on_del_route of RouteMonitor()>, 104, True)
[task 68971] 18:42:13.409940 D mitogen.io: Router(Broker(0x80534b828)).add_handler(<bound method Router._on_detaching of Router(Broker(0x80534b828))>, 109, True)
[task 68971] 18:42:13.410420 D mitogen.io: Broker(0x80534b828).start_receive(mitogen.core.Stream('unix_listener.97281'))
[task 68971] 18:42:13.410651 D mitogen.io: Waker(Broker(0x80534b828) rfd=15, wfd=16).defer() [fd=16]
[task 68971] 18:42:13.412358 D mitogen.io: Router(Broker(0x80534b828)).add_handler(<bound method Receiver._on_receive of Receiver(Router(Broker(0x80534b828)), None)>, 1000, False)
[task 68971] 18:42:13.412755 D mitogen.io: KqueuePoller(0x804cb20b8): POLLIN: 15
[task 68971] 18:42:13.412966 D mitogen.io: Waker(Broker(0x80534b828) rfd=15, wfd=16).defer() [fd=16]
[task 68971] 18:42:13.413619 D mitogen.io: Waker(Broker(0x80534b828) rfd=15, wfd=16).on_receive()
[task 68971] 18:42:13.414572 D mitogen.io: Receiver(Router(Broker(0x80534b828)), 1000).get(timeout=None, block=True)
[task 68971] 18:42:13.414911 D mitogen.io: KqueuePoller(0x804cb20b8).start_receive(19, (<Side of mitogen.core.Stream('unix_listener.97281') fd 19>, <bound method Stream.on_receive of mitogen.core.Stream('unix_listener.97281')>))
[task 68971] 18:42:13.415218 D mitogen.io: Latch(0x804cb2518, size=0, t='MainThread').get(timeout=None, block=True)
[task 68971] 18:42:13.415564 D mitogen.io: KqueuePoller(0x804cb20b8)._control(19, -1, 1)
[task 68971] 18:42:13.416334 D mitogen.io: Latch(0x804cb2518, size=0, t='MainThread')._get_sleep(timeout=None, block=True, rfd=18, wfd=21)
[task 68971] 18:42:13.416587 D mitogen.io: Router(Broker(0x80534b828))._async_route(Message(0, 2004, 2004, 110, 1000, b"\x80\x02X'\x00\x00\x00ansible_mitogen.services.ContextServiceq\x00X\x03"..202), None)
[task 68971] 18:42:13.416699 D mitogen.io: PollPoller(0x804cb26a0).poll(None)
[task 68971] 18:42:13.417070 D mitogen.io: mitogen.core.Stream('unix_listener.97281')._send(Message(0, 2004, 2004, 110, 1000, b"\x80\x02X'\x00\x00\x00ansible_mitogen.services.ContextServiceq\x00X\x03"..202))
[task 68971] 18:42:13.417599 D mitogen.io: Broker(0x80534b828)._loop_once(None, KqueuePoller(0x804cb20b8))
[mux  97281] 18:42:13.417607 D mitogen.io: KqueuePoller(0x804bbcda0): POLLIN: 81
[task 68971] 18:42:13.417819 D mitogen.io: KqueuePoller(0x804cb20b8).poll(None)
[mux  97281] 18:42:13.417834 D mitogen.io: mitogen.core.Stream('unix_client.68971').on_receive()
[mux  97281] 18:42:13.418164 D mitogen.io: Router(Broker(0x804bbccc0))._async_route(Message(0, 2004, 2004, 110, 1000, b"\x80\x02X'\x00\x00\x00ansible_mitogen.services.ContextServiceq\x00X\x03"..202), mitogen.core.Stream('unix_client.68971'))
[mux  97281] 18:42:13.418408 D mitogen.io: Receiver(Router(Broker(0x804bbccc0)), 110)._on_receive(Message(0, 2004, 0, 110, 1000, b"\x80\x02X'\x00\x00\x00ansible_mitogen.services.ContextServiceq\x00X\x03"..202))
[mux  97281] 18:42:13.418628 D mitogen.io: Latch(0x804bce400, size=0, t='mitogen.broker').put(Message(0, 2004, 0, 110, 1000, b"\x80\x02X'\x00\x00\x00ansible_mitogen.services.ContextServiceq\x00X\x03"..202))
[mux  97281] 18:42:13.418857 D mitogen.io: Latch(0x804bce4a8, size=0, t='mitogen.broker').put(Receiver(Router(Broker(0x804bbccc0)), 110))
[mux  97281] 18:42:13.419077 D mitogen.io: Latch(0x804bce4a8, size=1, t='mitogen.broker').put() -> waking wfd=36
[mux  97281] 18:42:13.419322 D mitogen.io: Broker(0x804bbccc0)._loop_once(None, KqueuePoller(0x804bbcda0))
[mux  97281] 18:42:13.419589 D mitogen.io: PollPoller(0x804dbf978): POLLIN|POLLHUP for 35
[mux  97281] 18:42:13.419724 D mitogen.io: KqueuePoller(0x804bbcda0).poll(None)
[mux  97281] 18:42:13.420323 D mitogen.io: Latch(0x804bce4a8, size=1, t='mitogen.service.Pool.804bce320.worker-10').get() wake -> Receiver(Router(Broker(0x804bbccc0)), 110)
[mux  97281] 18:42:13.420662 D mitogen.io: Receiver(Router(Broker(0x804bbccc0)), 110).get(timeout=None, block=False)
[mux  97281] 18:42:13.420881 D mitogen.io: Latch(0x804bce400, size=1, t='mitogen.service.Pool.804bce320.worker-10').get(timeout=None, block=False)
[mux  97281] 18:42:13.421093 D mitogen.io: Latch(0x804bce400, size=1, t='mitogen.service.Pool.804bce320.worker-10').get() -> Message(0, 2004, 0, 110, 1000, b"\x80\x02X'\x00\x00\x00ansible_mitogen.services.ContextServiceq\x00X\x03"..202)
[mux  97281] 18:42:13.421391 D mitogen.io: Message(0, 2004, 0, 110, 1000, b"\x80\x02X'\x00\x00\x00ansible_mitogen.services.ContextServiceq\x00X\x03"..202).unpickle()
[mux  97281] 18:42:13.421707 D mitogen.io: Message(0, 2004, 0, 110, 1000, b"\x80\x02X'\x00\x00\x00ansible_mitogen.services.ContextServiceq\x00X\x03"..202).unpickle()
[mux  97281] 18:42:13.422335 D mitogen: mitogen.parent.Stream('default').connect()
[mux  97281] 18:42:13.569222 D mitogen.io: Latch(0x80980dd68, size=0, t='mitogen.service.Pool.804bce320.worker-10').put((<class 'mitogen.core.StreamError'>, StreamError('Child start failed: [Errno 2] No such file or directory: \'/usr/bin/python\': \'/usr/bin/python\'. Command was: /usr/bin/python -c "import codecs,os,sys;_=codecs.decode;exec(_(_(\\"eNqFkc1......\".encode(),\\"base64\\"),\\"zip\\"))"',), <traceback object at 0x809f89bc8>))

All hundereds of preceding tasks in other roles & plays run perfectly -- except this one -- which is doing a simple rsync. For some reason the selected path is not correct, it should be:

# inventory.ini:
[all:vars]
ansible_port=               2200
ansible_user=               ansible
ansible_python_interpreter= /usr/local/bin/python3.6
...
---
- name: THIS WORKS | push out pkg build lists
  copy:
    src: "files/server.pkg"
    dest: "/usr/local/etc/poudriere.d/server.pkg"
  notify: poudriere | build
  tags:
    - build_pkgs
    - poudriere
    - pkg

- name: THIS FAILS | push out port options
  synchronize:
    src: "files/poudriere.d/options/"
    dest: "/usr/local/etc/poudriere.d/options/"
    delete: yes
    recursive: yes
  notify: poudriere | build
  tags:
    - build_pkgs
@dw

This comment has been minimized.

Copy link
Owner

commented Jul 23, 2019

Hi David!

Is this the most descriptive part of the failure?

[mux 97281] 18:42:13.569222 D mitogen.io: Latch(0x80980dd68, size=0, t='mitogen.service.Pool.804bce320.worker-10').put((<class 'mitogen.core.StreamError'>, StreamError('Child start failed: [Errno 2] No such file or directory: \'/usr/bin/python\': \'/usr/bin/python\'. Command was: /usr/bin/python -c "import codecs,os,sys;_=codecs.decode;exec(_(_(\\"eNqFkc1......\".encode(),\\"base64\\"),\\"zip\\"))"',), <traceback object at 0x809f89bc8>))

Did the job hang after that, or did it print some other kind of exception?

My immediate suspicion is an internal problem with handling variables -- synchronize is a little special because it is an action plugin.

Do you have anything like delegate_to or become: true set for the entire block?

Also do you have ansible_python_interpreter set and/or does /usr/bin/python really exist on those machines.

I'm trying to get the huge stream-refactor branch merged right now as a priority for completing 2.8 support, but will follow up with this ASAP

@dch

This comment has been minimized.

Copy link
Author

commented Jul 23, 2019

hi David, this isn't urgent (I leave for a month's holiday this week) but seems curious - mitogen has IIRC always failed on this 1 role in all my plays.

Yep that line is the weird one, the path should be /usr/local/bin/python3.6. FWIW there are well over a hundred tasks and roles that run prior, on this & other machines, and the ansible_python_interpreter is set globally in ansible.cfg. It appears to be picked up correctly earlier - I added more debug output below. It's only the synchronize module that gets confused. The job blows up at this point, the subsequent step is to run handlers, which obviously aren't triggered. I attached a full log.
koans.mitogen.log

A non-mitogen run shows this:

TASK [poudriere : poudriere | push out port options] ********************************************************************************************************
task path: /projects/koans/ansible/roles/poudriere/tasks/jobs.yml:12
ok: [i09.koans] => changed=false 
  cmd: /usr/local/bin/rsync --delay-updates -F --compress --delete-after --archive --rsh=/usr/bin/ssh -S none -o Port=2200 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null --rsync-path=sudo rsync --out-format=<<CHANGED>>%i %n%L /projects/koans/ansible/files/poudriere.d/options/ ansible@i09.koans:/usr/local/etc/poudriere.d/options/
  msg: ''
  rc: 0
  stdout_lines: []

the base64-coded string appears to be this:

b"R,W=os.pipe()\nr,w=os.pipe()\nif os.fork():\n\tos.dup2(0,100)\n\tos.dup2(R,0)\n\tos.dup2(r,101)\n\tos.close(R)\n\tos.close(r)\n\tos.close(W)\n\tos.close(w)\n\tif sys.platform == 'darwin' and sys.executable == '/usr/bin/python':\n\t\tsys.executable += sys.version[:3]\n\tos.environ['ARGV0']=sys.executable\n\tos.execl(sys.executable,sys.executable+'(mitogen:dch@wintermute.skunkwerks.at:82286)')\nos.write(1,'MITO000\\n'.encode())\nC=_(os.fdopen(0,'rb').read(15844),'zip')\nfp=os.fdopen(W,'wb',0)\nfp.write(C)\nfp.close()\nfp=os.fdopen(w,'wb',0)\nfp.write(C)\nfp.close()\nos.write(1,'MITO001\\n'.encode())\nos.close(2)"
@dw

This comment has been minimized.

Copy link
Owner

commented Jul 23, 2019

The base64'd chunk is the "first stage", it's possibly the ugliest trick in the library :)

I'm not looking at this just now, but there are at least two interesting things

  • many tasks have already been running over the same connection configuration, yet somehow it's trying to start a new connection. This shouts 'variables mismatch', because those variables contribute to the key used to pick an existing connection

  • the connection being started is not over SSH or some remote transport -- it is attempting to start as a brand new child (perhaps on the local machine?). The failing connection is probably somehow related to the 'local' transport, or for some sudo use

  • /usr/bin/python is definitely wrong if you've got ansible_python_interpeter set.

  • it's an action plug-in, so the truth is probably found in the exact set of calls made in synchronize.py.

Will dig into this sometime tomorrow. Thanks for reporting!

@dw

This comment has been minimized.

Copy link
Owner

commented Aug 1, 2019

I think I can see what's happening. It does not relate to the target machine at all, but since you are running it on FreeBSD also, it relates to the local machine.

The synchronize plug-in uses connection_loader.get('local', ...) for whatever reason to construct a new Connection object that refers to the local machine. However, Connection.on_action_run() is never invoked for this new instance because the patched Action plug-in doesn't know about it.

It seems hostvars is being used to fetch ansible_python_interpreter for the local machine, and because on_action_run never gets called, the new connection has no access to hostvars.

dw added a commit that referenced this issue Aug 2, 2019

dw added a commit that referenced this issue Aug 2, 2019

Merge remote-tracking branch 'origin/602-recover-taskvars'
* origin/602-recover-taskvars:
  issue #602: update Changelog
  issue #602: recover task_vars for synchronize and meta: reset_connection
  ansible: remove cutpasted docstring
  [linear2] merge fallout: restore optimization from #491 / 7b129e8
@dw

This comment has been minimized.

Copy link
Owner

commented Aug 2, 2019

Hi David,

The problem was the same as another bug buried somewhere in the tickets :) This fix also benefits meta: reset_connection. I reproduced your configuration and verified the fix. Thanks a ton for reporting it, and apologies for the delay


This is now on the master branch and will make it into the next release. To be updated when a new release is made, subscribe to https://networkgenomics.com/mail/mitogen-announce/

Thanks for reporting this!

@dw dw closed this Aug 2, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants
You can’t perform that action at this time.