New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Broker thread crashes on debops playbook #110

Closed
stefangweichinger opened this Issue Mar 7, 2018 · 40 comments

Comments

Projects
None yet
3 participants
@stefangweichinger
Copy link

stefangweichinger commented Mar 7, 2018

Doing first steps here.
ansible 2.4.3.0 in a python virtual-env, python version = 2.7.14, all on a Fedora 28 machine.

Maybe I miss some dependencies but as I am no python-coder I have problems spotting where to start.

I get:

$ time ansible-playbook run_hostname_100_times.yml 

PLAY [tx100.local] ******************************************************************************************************************************************************************

TASK [Run hostname] *****************************************************************************************************************************************************************
13:46:41 W mitogen: get_module_source('time'): cannot find source
13:46:41 W mitogen: get_module_source('operator'): cannot find source
13:46:41 W mitogen: get_module_source('_locale'): cannot find source
13:46:41 W mitogen: get_module_source('grp'): cannot find source
13:46:41 W mitogen: get_module_source('datetime'): cannot find source
13:46:41 W mitogen: get_module_source('syslog'): cannot find source
13:46:41 W mitogen: get_module_source('itertools'): cannot find source
13:46:41 W mitogen: get_module_source('select'): cannot find source
13:46:41 W mitogen: get_module_source('_random'): cannot find source
13:46:41 W mitogen: get_module_source('binascii'): cannot find source
13:46:41 W mitogen: get_module_source('math'): cannot find source
13:46:41 W mitogen: get_module_source('fcntl'): cannot find source
13:46:41 W mitogen: get_module_source('cStringIO'): cannot find source
13:46:41 W mitogen: get_module_source('_hashlib'): cannot find source
13:46:41 W mitogen: get_module_source('cPickle'): cannot find source
13:46:41 W mitogen: get_module_source('_collections'): cannot find source
13:46:41 W mitogen: get_module_source('zlib'): cannot find source
13:46:41 W mitogen: get_module_source('bz2'): cannot find source
13:46:41 W mitogen: get_module_source('_json'): cannot find source
13:46:41 W mitogen: get_module_source('_io'): cannot find source
13:46:41 W mitogen: get_module_source('strop'): cannot find source
13:46:41 W mitogen: get_module_source('_functools'): cannot find source
13:46:41 W mitogen: get_module_source('_struct'): cannot find source
13:46:41 W mitogen: get_module_source('_heapq'): cannot find source
changed: [tx100.local] => (item=1)
changed: [tx100.local] => (item=2)
changed: [tx100.local] => (item=3)

and playbook runs through.
(how) Can I fix these messages?

I am on my way to try to use your strategy within debops-projects https://github.com/debops/debops

thanks, Stefan

@dw dw closed this in 61923f1 Mar 7, 2018

@dw

This comment has been minimized.

Copy link
Owner

dw commented Mar 7, 2018

Hi Stefan,

Mitogen's logging is not yet integrated into Ansible's, so presently soft errors (and plenty more) just get dumped on stdout. I've changed the log level of this particular error though, so it won't show up any more. Thanks for reporting.

@dw

This comment has been minimized.

Copy link
Owner

dw commented Mar 7, 2018

Debops looks very cool! Please keep filing if you find more stuff

@stefangweichinger

This comment has been minimized.

Copy link
Author

stefangweichinger commented Mar 7, 2018

trying your latest commits now within a simple debops-project dir: one target machine, running various debops-playbooks to test and measure speed etc

For now I get:

TASK [Gathering Facts] **************************************************************************************************************************************************************
14:10:41 E mitogen: _broker_main() crashed
Traceback (most recent call last):
  File "/home/sgw/projects/github/mitogen/mitogen/core.py", line 1212, in _broker_main
    self._loop_once()
  File "/home/sgw/projects/github/mitogen/mitogen/core.py", line 1194, in _loop_once
    (), timeout
  File "/home/sgw/projects/github/mitogen/mitogen/core.py", line 171, in restart
    return func(*args)
error: (9, 'Bad file descriptor')
14:10:41 E mitogen: _broker_main() crashed
Traceback (most recent call last):
  File "/home/sgw/projects/github/mitogen/mitogen/core.py", line 1212, in _broker_main
    self._loop_once()
  File "/home/sgw/projects/github/mitogen/mitogen/core.py", line 1194, in _loop_once
    (), timeout
  File "/home/sgw/projects/github/mitogen/mitogen/core.py", line 171, in restart
    return func(*args)
error: (9, 'Bad file descriptor')

and execution hangs ...

I wonder if I maybe have to add some specific paths pointing to your stuff within my local .debops/ansible.cfg

(DebOps uses a .debops.cfg within the project dir to configure paths to roles etc)

@dw dw changed the title first tests, ansible in virtual env: what do I miss here? Broker thread crashes on debops playbook Mar 7, 2018

@dw dw reopened this Mar 7, 2018

@dw

This comment has been minimized.

Copy link
Owner

dw commented Mar 7, 2018

Whoa this one looks way more interesting! Can you tell me which playbook triggered it and the command line you used? I'll try to reproduce later.

This is a bug in the Mitogen bits, not the Ansible bits :)

@dw

This comment has been minimized.

Copy link
Owner

dw commented Mar 7, 2018

It crashed during fact gathering?! Does it happen reliably?

@dw

This comment has been minimized.

Copy link
Owner

dw commented Mar 7, 2018

I have to do a bunch of hours commercial work, will be back to this later. Thanks so much for testing! It's frustrating to see these stack traces being reported, and being unable to attach a debugger ;)

@dw dw added bug ansible labels Mar 7, 2018

@stefangweichinger

This comment has been minimized.

Copy link
Author

stefangweichinger commented Mar 7, 2018

Yes, I can reproduce it every time so far.
Yes, in fact gathering.

The test is running "debops common", that means running

https://github.com/debops/debops/blob/master/ansible/playbooks/common.yml

and pulls in loads of stuff, afaik ;-)

I'll be afk as well for a while, no hurry.

@drybjed

This comment has been minimized.

Copy link

drybjed commented Mar 7, 2018

@dw Hello, DebOps author here. Mitogen looks very interesting and I hope it could make DebOps faster later on, currently playbook runs can take from 20 to 40 minutes depending on the hardware and enabled services.

I think that easiest way for you to test this would be with Vagrant, the DebOps monorepo works with Debian Stretch box on KVM (libvirt) and LXC. Just run:

git clone https://github.com/debops/debops ; cd debops
vagrant up

When the VM is set up, you can run the common playbook with:

vagrant ssh ; cd src/controller ; debops common

Since you need to add custom configuration in ansible.cfg configuration file, you can do this by adding it in the [ansible defaults] section in the .debops.cfg file. When you run the debops command, the options you added will be inserted in the Ansible configuration file automatically.

@stefangweichinger

This comment has been minimized.

Copy link
Author

stefangweichinger commented Mar 8, 2018

unsure if to comment here or in #112 now: pulled your latest commits, current behavior is about the same (ok, understood, no commits fixing this so far ;-) ).

log output.txt

@dw

This comment has been minimized.

Copy link
Owner

dw commented Mar 8, 2018

Hi Stefan,

Would you mind running it again with "ansible -vvvv" /in addition/ to MITOGEN_ROUTER_DEBUG=1 set in your process environment?

The former will produce full Ansible logs on stdout on the controller machine, the latter will additionally produce /tmp/mitogen.*.log on every machine the code runs on. Please attach it all to the ticket :)

Still trying to get a debops setup on my laptop, but looks like I'm going to have to use a remote VM simply due to size

@dw

This comment has been minimized.

Copy link
Owner

dw commented Mar 8, 2018

Oh, to be clear, I just checked in support for MITOGEN_ROUTER_DEBUG :)

@stefangweichinger

This comment has been minimized.

Copy link
Author

stefangweichinger commented Mar 8, 2018

debops_common.log
mitogen.27048.log

I had some issues with a full btrfs-subvolume right now, just to add. But I think it is not related to this issue.

@dw

This comment has been minimized.

Copy link
Owner

dw commented Mar 8, 2018

Something /very/ bad is happening:

[pid 27300] 12:52:20 D mitogen: unix.connect(): local ID is 825375285, remote is 842674736

This is in the second task of the run, right after the worker process connects to the controller process and receives 2 integers from it. They're already corrupt.

Treating those crazy huge integers as hex, we see it's the text '12:52:20'. Something is writing logs over a brand new file descriptor opened in the master process.

I'm probably going to have to reproduce your full setup for this one, looks broken. Does the debops stuff enable any other Ansible extensions that might trigger this?

@drybjed

This comment has been minimized.

Copy link

drybjed commented Mar 8, 2018

DebOps has some custom lookup plugins with a companion Python module, but this shouldn't do any logging, I don't think that interferes.

@stefangweichinger, you could try running a playbook bypassing the debops script, in the project directory run:

ansible-playbook ~/.local/share/debops/debops/ansible/playbooks/service/core.yml -l <host>

or wherever your playbooks are located. You should also show the contents of the ansible.cfg config file the debops script generated to be sure (you might want to censor some things).

Also, does this happen when you try a simple Ansible playbook, or only with DebOps playbooks and others are unaffected?

@stefangweichinger

This comment has been minimized.

Copy link
Author

stefangweichinger commented Mar 8, 2018

The suggested call of ansible-playbook runs through fine.

ansible.cfg in the directory ~sgw/projects/tx100:

# Ansible configuration file generated by DebOps, all changes will be lost.
# You can manipulate the contents of this file via `.debops.cfg`.

[ssh_connection]

[paramiko]

[defaults]
filter_plugins = /home/sgw/projects/tx100/ansible/playbooks/filter_plugins:/home/sgw/projects/tx100/ansible/plugins/filter_plugins:/home/sgw/projects/tx100/ansible/filter_plugins:/home/sgw/projects/tx100/debops/ansible/playbooks/playbooks/filter_plugins:/home/sgw/projects/tx100/debops/ansible/playbooks/filter_plugins:/home/sgw/projects/tx100/debops/ansible/plugins/filter_plugins:/home/sgw/projects/tx100/debops/ansible/filter_plugins:/home/sgw/.local/share/debops/debops/ansible/playbooks/playbooks/filter_plugins:/home/sgw/.local/share/debops/debops/ansible/playbooks/filter_plugins:/home/sgw/.local/share/debops/debops/ansible/plugins/filter_plugins:/home/sgw/.local/share/debops/debops/ansible/filter_plugins:/home/sgw/projects/debops-venv/lib/python2.7/site-packages/debops/ansible/filter_plugins:/home/sgw/.local/share/debops/debops/ansible/playbooks/filter_plugins:/usr/share/ansible/filter_plugins
roles_path = /home/sgw/projects/tx100/roles:/home/sgw/projects/tx100/ansible/roles:/home/sgw/projects/tx100/debops/ansible/roles:/home/sgw/.local/share/debops/debops/ansible/roles:/home/sgw/.local/share/debops/debops/ansible/playbooks/../roles:/home/sgw/.local/share/debops/debops/ansible/playbooks/roles:/etc/ansible/roles
connection_plugins = /home/sgw/projects/tx100/ansible/playbooks/connection_plugins:/home/sgw/projects/tx100/ansible/plugins/connection_plugins:/home/sgw/projects/tx100/ansible/connection_plugins:/home/sgw/projects/tx100/debops/ansible/playbooks/playbooks/connection_plugins:/home/sgw/projects/tx100/debops/ansible/playbooks/connection_plugins:/home/sgw/projects/tx100/debops/ansible/plugins/connection_plugins:/home/sgw/projects/tx100/debops/ansible/connection_plugins:/home/sgw/.local/share/debops/debops/ansible/playbooks/playbooks/connection_plugins:/home/sgw/.local/share/debops/debops/ansible/playbooks/connection_plugins:/home/sgw/.local/share/debops/debops/ansible/plugins/connection_plugins:/home/sgw/.local/share/debops/debops/ansible/connection_plugins:/home/sgw/projects/debops-venv/lib/python2.7/site-packages/debops/ansible/connection_plugins:/home/sgw/.local/share/debops/debops/ansible/playbooks/connection_plugins:/usr/share/ansible/connection_plugins
library = /home/sgw/projects/tx100/ansible/playbooks/library:/home/sgw/projects/tx100/ansible/plugins/library:/home/sgw/projects/tx100/ansible/library:/home/sgw/projects/tx100/debops/ansible/playbooks/playbooks/library:/home/sgw/projects/tx100/debops/ansible/playbooks/library:/home/sgw/projects/tx100/debops/ansible/plugins/library:/home/sgw/projects/tx100/debops/ansible/library:/home/sgw/.local/share/debops/debops/ansible/playbooks/playbooks/library:/home/sgw/.local/share/debops/debops/ansible/playbooks/library:/home/sgw/.local/share/debops/debops/ansible/plugins/library:/home/sgw/.local/share/debops/debops/ansible/library:/home/sgw/projects/debops-venv/lib/python2.7/site-packages/debops/ansible/library:/home/sgw/.local/share/debops/debops/ansible/playbooks/library:/usr/share/ansible/library
strategy = mitogen
ansible_managed = This file is managed remotely, all changes will be lost
inventory = /home/sgw/projects/tx100/ansible/inventory
action_plugins = /home/sgw/projects/tx100/ansible/playbooks/action_plugins:/home/sgw/projects/tx100/ansible/plugins/action_plugins:/home/sgw/projects/tx100/ansible/action_plugins:/home/sgw/projects/tx100/debops/ansible/playbooks/playbooks/action_plugins:/home/sgw/projects/tx100/debops/ansible/playbooks/action_plugins:/home/sgw/projects/tx100/debops/ansible/plugins/action_plugins:/home/sgw/projects/tx100/debops/ansible/action_plugins:/home/sgw/.local/share/debops/debops/ansible/playbooks/playbooks/action_plugins:/home/sgw/.local/share/debops/debops/ansible/playbooks/action_plugins:/home/sgw/.local/share/debops/debops/ansible/plugins/action_plugins:/home/sgw/.local/share/debops/debops/ansible/action_plugins:/home/sgw/projects/debops-venv/lib/python2.7/site-packages/debops/ansible/action_plugins:/home/sgw/.local/share/debops/debops/ansible/playbooks/action_plugins:/usr/share/ansible/action_plugins
strategy_plugins = /home/sgw/projects/github/mitogen/ansible_mitogen/plugins/strategy
vars_plugins = /home/sgw/projects/tx100/ansible/playbooks/vars_plugins:/home/sgw/projects/tx100/ansible/plugins/vars_plugins:/home/sgw/projects/tx100/ansible/vars_plugins:/home/sgw/projects/tx100/debops/ansible/playbooks/playbooks/vars_plugins:/home/sgw/projects/tx100/debops/ansible/playbooks/vars_plugins:/home/sgw/projects/tx100/debops/ansible/plugins/vars_plugins:/home/sgw/projects/tx100/debops/ansible/vars_plugins:/home/sgw/.local/share/debops/debops/ansible/playbooks/playbooks/vars_plugins:/home/sgw/.local/share/debops/debops/ansible/playbooks/vars_plugins:/home/sgw/.local/share/debops/debops/ansible/plugins/vars_plugins:/home/sgw/.local/share/debops/debops/ansible/vars_plugins:/home/sgw/projects/debops-venv/lib/python2.7/site-packages/debops/ansible/vars_plugins:/home/sgw/.local/share/debops/debops/ansible/playbooks/vars_plugins:/usr/share/ansible/vars_plugins
callback_plugins = /home/sgw/projects/tx100/ansible/playbooks/callback_plugins:/home/sgw/projects/tx100/ansible/plugins/callback_plugins:/home/sgw/projects/tx100/ansible/callback_plugins:/home/sgw/projects/tx100/debops/ansible/playbooks/playbooks/callback_plugins:/home/sgw/projects/tx100/debops/ansible/playbooks/callback_plugins:/home/sgw/projects/tx100/debops/ansible/plugins/callback_plugins:/home/sgw/projects/tx100/debops/ansible/callback_plugins:/home/sgw/.local/share/debops/debops/ansible/playbooks/playbooks/callback_plugins:/home/sgw/.local/share/debops/debops/ansible/playbooks/callback_plugins:/home/sgw/.local/share/debops/debops/ansible/plugins/callback_plugins:/home/sgw/.local/share/debops/debops/ansible/callback_plugins:/home/sgw/projects/debops-venv/lib/python2.7/site-packages/debops/ansible/callback_plugins:/home/sgw/.local/share/debops/debops/ansible/playbooks/callback_plugins:/usr/share/ansible/callback_plugins
lookup_plugins = /home/sgw/projects/tx100/ansible/playbooks/lookup_plugins:/home/sgw/projects/tx100/ansible/plugins/lookup_plugins:/home/sgw/projects/tx100/ansible/lookup_plugins:/home/sgw/projects/tx100/debops/ansible/playbooks/playbooks/lookup_plugins:/home/sgw/projects/tx100/debops/ansible/playbooks/lookup_plugins:/home/sgw/projects/tx100/debops/ansible/plugins/lookup_plugins:/home/sgw/projects/tx100/debops/ansible/lookup_plugins:/home/sgw/.local/share/debops/debops/ansible/playbooks/playbooks/lookup_plugins:/home/sgw/.local/share/debops/debops/ansible/playbooks/lookup_plugins:/home/sgw/.local/share/debops/debops/ansible/plugins/lookup_plugins:/home/sgw/.local/share/debops/debops/ansible/lookup_plugins:/home/sgw/projects/debops-venv/lib/python2.7/site-packages/debops/ansible/lookup_plugins:/home/sgw/.local/share/debops/debops/ansible/playbooks/lookup_plugins:/usr/share/ansible/lookup_plugins

(nothing sensible in there, I assume)

For reference:

python-virtual-env in /home/sgw/projects/debops-venv
mitogen-fork in /home/sgw/projects/github/mitogen/

@stefangweichinger

This comment has been minimized.

Copy link
Author

stefangweichinger commented Mar 8, 2018

We have another smaller playbook that fails with:

fatal: [tx100.local]: FAILED! => {"changed": false, "failed_when_result": true, "msg": "non-zero return code", "rc": 126, "stderr": "/bin/sh: 1: /tmp/ansible-mitogen-tmp-QETXXn/dpkg-state: Permission denied\n", "stdout": "", "stdout_lines": []}

I can add the logs but maybe that already rings a bell.
There are files on the target like /tmp/ansible-mitogen*

@dw

This comment has been minimized.

Copy link
Owner

dw commented Mar 8, 2018

That looks like a become: stanza isn't being respected. Can you please file a (terse, don't worry about that, I know your setup at this point) bug with the playbook name and I'll look at it later. This one should be an easy fix

@dw

This comment has been minimized.

Copy link
Owner

dw commented Mar 8, 2018

Hi again,

When you say a Fedora 28 machine, are you using some latest-and-greatest build, or did you mean 27? I can't find 28 download links anywhere :)

@stefangweichinger

This comment has been minimized.

Copy link
Author

stefangweichinger commented Mar 8, 2018

I somehow managed to get some pre-beta /alpha test release onto my systems.
F28 beta is announced for late march, yes.
That is a reason for using the virtualenv to be able to run downgraded Jinja2 etc

@dw

This comment has been minimized.

Copy link
Owner

dw commented Mar 8, 2018

I'm building out a Fedora 27 box right now, if it doesn't work out then I'll do whatever you did to get the 28-ish box :) (the worry is that it's kernel related and specific to a release, Red Hat kernels are super patchy)

@dw

This comment has been minimized.

Copy link
Owner

dw commented Mar 8, 2018

Also, how many CPUs does your Fedora machine have?

@stefangweichinger

This comment has been minimized.

Copy link
Author

stefangweichinger commented Mar 8, 2018

The controller machine runs on intel i7-6700 w/ 8 cores(?) ... 4 cores, 2 threads each or so.
I have smaller CPUs in thinkpads as well but all on Fedora for now.

@dw

This comment has been minimized.

Copy link
Owner

dw commented Mar 8, 2018

It's probably unimportant, it's just because the last time I saw "bad file descriptor" it was a parallelism issue, but we've already seen there is at least some kind of stream corruption happening too

@stefangweichinger

This comment has been minimized.

Copy link
Author

stefangweichinger commented Mar 8, 2018

I chose Fedora a few yrs ago because of the mix of rather up2date software and reliable updates, without having to compile my own stuff all the time (coming from gentoo linux). So far I am quite happy with the overall quality.

@stefangweichinger

This comment has been minimized.

Copy link
Author

stefangweichinger commented Mar 8, 2018

Ad "bad file descriptor": having my stuff on btrfs volumes might be an issue as well maybe. Can't tell how likely that is. And sure, we're talking linux-4.16 kernel here, rather bleeding edge.

I also don't know which file this points to, if it's on controller or target side(?)

I can start up a Thinkpad with Fedora27 and cross-check. That one should run a 4.15 kernel etc

@stefangweichinger

This comment has been minimized.

Copy link
Author

stefangweichinger commented Mar 8, 2018

same errors w/ kernel-4.14 on Fedora27

@dw

This comment has been minimized.

Copy link
Owner

dw commented Mar 8, 2018

I'm pretty sure the file descriptor in question is the UNIX socket from the worker process that was showing the corruption. Somehow the IO loop is restarting while that descriptor is still marked for reading despite already being closed. It's probably a super simple problem to fix, I just need to see it on my own machine

@stefangweichinger

This comment has been minimized.

Copy link
Author

stefangweichinger commented Mar 8, 2018

So good luck with setting up F27 and debops on it. Do you think the virtualenv plays into it?
I just thought of the SElinux-related special step needed back then:

debops/debops@042c1c6

But I should stop chatting and let you get on with your setup.

@dw

This comment has been minimized.

Copy link
Owner

dw commented Mar 8, 2018

Re: virtualenv, probably not. 70% expecting this to be a plugins issue with debops, 30% a Fedora issue. Let's find out in the next hour or so :)

@dw

This comment has been minimized.

Copy link
Owner

dw commented Mar 8, 2018

Unrelated to this, but the debops playbooks exposed another bug during saving facts.. 12b422d

@dw

This comment has been minimized.

Copy link
Owner

dw commented Mar 8, 2018

So there's good news and there's some bad news.

The good news (common.yml, Ansible 2.3.1.0):

screen shot 2018-03-09 at 04 29 25

The bad news is I haven't been able to replicate your crash, so over the weekend I'm going to try again with Fedora 28 and configuring as many vCPUs on my laptop as possible (it might still be concurrency related).

@dw

This comment has been minimized.

Copy link
Owner

dw commented Mar 8, 2018

ansible-playbook status line

z                          : ok=212  changed=0    unreachable=0    failed=0

Ansible 2.4.3.0, Google Cloud

mitogen

About 2.5x time vs. pipelining

Generate firewall rules ------------------------------------------------- 7.71s
Enforce root system account --------------------------------------------- 1.47s
Ensure ferm is installed ------------------------------------------------ 0.72s
Generate rsyslog configuration rules ------------------------------------ 0.67s
Execute PKI realm commands ---------------------------------------------- 0.66s
Ensure that APT cache is valid ------------------------------------------ 0.61s
Install requested APT packages ------------------------------------------ 0.52s
Remove firewall rules --------------------------------------------------- 0.51s
Install PKI packages ---------------------------------------------------- 0.42s
Make sure libldap is installed ------------------------------------------ 0.42s

27.28user 6.76system 0:48.54elapsed 70%CPU (0avgtext+0avgdata 71632maxresident)k
0inputs+104outputs (0major+2058238minor)pagefaults 0swaps

pipelining

Generate firewall rules ------------------------------------------------ 16.57s
Create root directories ------------------------------------------------ 12.39s
Generate rsyslog configuration rules ------------------------------------ 3.39s
Generate configuration files -------------------------------------------- 2.15s
Enforce root system account --------------------------------------------- 1.58s
Remove firewall rules --------------------------------------------------- 1.31s
Generate hosts.allow entries -------------------------------------------- 1.28s
Remove legacy APT preferences ------------------------------------------- 1.25s
Ensure ferm is installed ------------------------------------------------ 1.02s
Configure OpenNTPd ------------------------------------------------------ 0.92s

36.50user 10.98system 2:02.27elapsed 38%CPU (0avgtext+0avgdata 69592maxresident)k
0inputs+5184outputs (0major+3755633minor)pagefaults 0swaps

Ansible 2.4.3.0

mitogen

About 2.66x time vs. pipelining

Generate firewall rules ------------------------------------------------ 14.53s
Enforce root system account --------------------------------------------- 2.85s
Create root directories ------------------------------------------------- 1.96s
Gathering Facts --------------------------------------------------------- 1.93s
Remove unknown firewall rules ------------------------------------------- 1.16s
Generate rsyslog configuration rules ------------------------------------ 1.12s
Ensure that APT cache is valid ------------------------------------------ 1.11s
Ensure ferm is installed ------------------------------------------------ 1.07s
Remove firewall rules --------------------------------------------------- 1.07s
Install requested APT packages ------------------------------------------ 0.81s
Generate firewall rules ------------------------------------------------ 14.53s
Enforce root system account --------------------------------------------- 2.85s
Create root directories ------------------------------------------------- 1.96s
Gathering Facts --------------------------------------------------------- 1.93s
Remove unknown firewall rules ------------------------------------------- 1.16s
Generate rsyslog configuration rules ------------------------------------ 1.12s
Ensure that APT cache is valid ------------------------------------------ 1.11s
Ensure ferm is installed ------------------------------------------------ 1.07s
Remove firewall rules --------------------------------------------------- 1.07s
Install requested APT packages ------------------------------------------ 0.81s

60.46user 12.14system 1:32.54elapsed 78%CPU (0avgtext+0avgdata 87232maxresident)k
7464inputs+864outputs (22major+2233623minor)pagefaults 0swaps

pipelining

Generate firewall rules ------------------------------------------------ 44.74s
Create root directories ------------------------------------------------- 7.14s
Generate configuration files -------------------------------------------- 6.86s
Generate rsyslog configuration rules ------------------------------------ 6.53s
Remove firewall rules --------------------------------------------------- 4.01s
Enforce root system account --------------------------------------------- 3.44s
Remove legacy APT preferences ------------------------------------------- 3.14s
Generate hosts.allow entries -------------------------------------------- 2.80s
Install local fact scripts ---------------------------------------------- 2.32s
Generate custom log rotation configuration ------------------------------ 2.31s
Generate firewall rules ------------------------------------------------ 44.74s
Create root directories ------------------------------------------------- 7.14s
Generate configuration files -------------------------------------------- 6.86s
Generate rsyslog configuration rules ------------------------------------ 6.53s
Remove firewall rules --------------------------------------------------- 4.01s
Enforce root system account --------------------------------------------- 3.44s
Remove legacy APT preferences ------------------------------------------- 3.14s
Generate hosts.allow entries -------------------------------------------- 2.80s
Install local fact scripts ---------------------------------------------- 2.32s
Generate custom log rotation configuration ------------------------------ 2.31s

110.05user 22.05system 4:06.66elapsed 53%CPU (0avgtext+0avgdata 80388maxresident)k
0inputs+6560outputs (0major+4211810minor)pagefaults 0swaps

no pipelining

Didn't bother measuring

Ansible 2.3.1.0

mitogen

About 1.75x time vs. pipelining

Generate firewall rules ------------------------------------------------ 27.04s
Enforce root system account --------------------------------------------- 4.93s
Generate rsyslog configuration rules ------------------------------------ 2.06s
Remove firewall rules --------------------------------------------------- 1.77s
Gathering Facts --------------------------------------------------------- 1.51s
Remove unknown firewall rules ------------------------------------------- 1.31s
Ensure ferm is installed ------------------------------------------------ 1.18s
Remove custom config files when requested ------------------------------- 1.07s
Generate configuration files -------------------------------------------- 1.01s
Generate hosts.allow entries -------------------------------------------- 0.91s

88.06user 20.09system 1:59.59elapsed 90%CPU (0avgtext+0avgdata 78684maxresident)k
0inputs+112outputs (0major+2096320minor)pagefaults 0swaps

pipelining

Generate firewall rules ------------------------------------------------ 35.92s
Generate rsyslog configuration rules ------------------------------------ 7.60s
Generate configuration files -------------------------------------------- 4.11s
Create root directories ------------------------------------------------- 3.98s
Enforce root system account --------------------------------------------- 3.20s
Generate hosts.allow entries -------------------------------------------- 2.98s
Generate custom log rotation configuration ------------------------------ 2.77s
Remove firewall rules --------------------------------------------------- 2.68s
Remove legacy APT preferences ------------------------------------------- 2.40s
Install local fact scripts ---------------------------------------------- 1.97s

69.72user 18.74system 3:30.93elapsed 41%CPU (0avgtext+0avgdata 70520maxresident)k
0inputs+5088outputs (0major+3745139minor)pagefaults 0swaps

no pipelining

Generate firewall rules ------------------------------------------------ 52.78s
Generate rsyslog configuration rules ----------------------------------- 11.96s
Generate configuration files -------------------------------------------- 9.10s
Create root directories ------------------------------------------------- 7.71s
Remove firewall rules --------------------------------------------------- 5.83s
Remove legacy APT preferences ------------------------------------------- 5.01s
Generate hosts.allow entries -------------------------------------------- 4.60s
Generate custom log rotation configuration ------------------------------ 4.27s
Enforce root system account --------------------------------------------- 3.28s
Make sure required directories exist ------------------------------------ 3.20s

92.12user 36.94system 5:28.59elapsed 39%CPU (0avgtext+0avgdata 72560maxresident)k
0inputs+74176outputs (0major+5978072minor)pagefaults 0swaps
@dw

This comment has been minimized.

Copy link
Owner

dw commented Mar 8, 2018

Can you please send me the 'pip freeze' output from your virtualenv too

@stefangweichinger

This comment has been minimized.

Copy link
Author

stefangweichinger commented Mar 9, 2018

  • pip freeze
ansible==2.4.3.0
asn1crypto==0.24.0
bcrypt==3.1.4
cffi==1.11.4
cryptography==2.1.4
debops==0.7.0
enum34==1.1.6
funcsigs==1.0.2
idna==2.6
ipaddress==1.0.19
Jinja2==2.8
linecache2==1.0.0
MarkupSafe==1.0
mock==2.0.0
netaddr==0.7.19
paramiko==2.4.0
passlib==1.7.1
pbr==3.1.1
py==1.5.2
pyasn1==0.4.2
pycparser==2.18
PyNaCl==1.2.1
python2==1.2
pytz==2018.3
PyYAML==3.12
six==1.11.0
traceback2==1.4.0

Might have unneeded packages from me trying to install unnecessary deps.

  • Runtime of debops.common would only be comparable if we had the same config for the target machine, right? Plus the same initial state to do the same changes IMO. I will setup a clean debian target VM in VirtualBox tmrw.
@stefangweichinger

This comment has been minimized.

Copy link
Author

stefangweichinger commented Mar 9, 2018

new deb9 machine on local VirtualBox as target for debops

Initial run of debops.common:

Generate firewall rules ------------------------------------------------ 13.08s
Install requested APT packages ----------------------------------------- 12.99s
Install PKI packages --------------------------------------------------- 11.03s
Update APT cache -------------------------------------------------------- 9.32s
Remove unknown firewall rules ------------------------------------------- 4.63s
Save local facts -------------------------------------------------------- 4.33s
Generate configuration files -------------------------------------------- 3.16s
Generate rsyslog configuration rules ------------------------------------ 3.01s
Create root directories ------------------------------------------------- 2.69s
Ensure ferm is installed ------------------------------------------------ 2.60s
Generate firewall rules ------------------------------------------------ 13.08s
Install requested APT packages ----------------------------------------- 12.99s
Install PKI packages --------------------------------------------------- 11.03s
Update APT cache -------------------------------------------------------- 9.32s
Remove unknown firewall rules ------------------------------------------- 4.63s
Save local facts -------------------------------------------------------- 4.33s
Generate configuration files -------------------------------------------- 3.16s
Generate rsyslog configuration rules ------------------------------------ 3.01s
Create root directories ------------------------------------------------- 2.69s
Ensure ferm is installed ------------------------------------------------ 2.60s

real	2m59,550s
user	0m59,450s
sys	0m19,614s

Immediate rerun:

Generate firewall rules ------------------------------------------------ 11.74s
Create root directories ------------------------------------------------- 2.81s
Generate configuration files -------------------------------------------- 2.46s
Clone acme-tiny source code --------------------------------------------- 2.38s
Generate rsyslog configuration rules ------------------------------------ 2.31s
Remove firewall rules --------------------------------------------------- 1.70s
Remove legacy APT preferences ------------------------------------------- 1.66s
Enforce root system account --------------------------------------------- 1.29s
Remove custom config files when requested ------------------------------- 1.03s
Generate hosts.allow entries -------------------------------------------- 0.92s
Generate firewall rules ------------------------------------------------ 11.74s
Create root directories ------------------------------------------------- 2.81s
Generate configuration files -------------------------------------------- 2.46s
Clone acme-tiny source code --------------------------------------------- 2.38s
Generate rsyslog configuration rules ------------------------------------ 2.31s
Remove firewall rules --------------------------------------------------- 1.70s
Remove legacy APT preferences ------------------------------------------- 1.66s
Enforce root system account --------------------------------------------- 1.29s
Remove custom config files when requested ------------------------------- 1.03s
Generate hosts.allow entries -------------------------------------------- 0.92s

real	1m42,700s
user	0m49,001s
sys	0m15,784s

Same broker crash with mitogen enabled.

@dw

This comment has been minimized.

Copy link
Owner

dw commented Mar 9, 2018

I've finally managed to replicate it on a 16 vCPU google cloud box. Thanks for your help

@stefangweichinger

This comment has been minimized.

Copy link
Author

stefangweichinger commented Mar 9, 2018

Great to hear. I am looking forward to any news here.

@dw

This comment has been minimized.

Copy link
Owner

dw commented Mar 10, 2018

It's actually 2 bugs :) The first is a graceful shutdown problem, that's the real bug that's causing the crash. The second is that shutdown is happening between every play in a multi-play run, which is super bad juju, since it means all the SSH connections and cached code uploads are discarded

I'm attacking number 2 first because it's easier to fix, and will likely convert number 1 into a milder "crashes during exit" problem, which buys me plenty more time to fix it.

dw added a commit that referenced this issue Mar 10, 2018

issue #110: split process state out of strategy state
The strategy is reconstructed for every playbook that is included or
specified on the command line, therefore we can't store the global
Router there without losing all our SSH connections across playbooks.

dw added a commit that referenced this issue Mar 10, 2018

@dw

This comment has been minimized.

Copy link
Owner

dw commented Mar 10, 2018

I've updated the benchmark above with common.yml timings from the Google Cloud 16 CPU box. It now runs to completion, although I've found 2 more issues in the process (one with transfer_data already fixed, the other #122, probably an easy fix).

I've opened a separate ticket for the shutdown bug, and closing this ticket as it's gotten huge :)

Thanks again for reporting this, and please let me know if you bump into any more issues! I'm going to setup CI at some point that runs as much of Debops as possible, I bet there are a bunch more issues hiding in there

@dw dw closed this Mar 10, 2018

@stefangweichinger

This comment has been minimized.

Copy link
Author

stefangweichinger commented Mar 10, 2018

Confirming that it runs through now, great. I will have a look at the "new" bugs and see how things work out here with my stuff. Thanks.

dw added a commit that referenced this issue Mar 19, 2018

issue #110: split process state out of strategy state
The strategy is reconstructed for every playbook that is included or
specified on the command line, therefore we can't store the global
Router there without losing all our SSH connections across playbooks.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment