CallError: exceptions.KeyError: u'namelist' #575

Open
thbar opened this Issue Apr 2, 2019 · 12 comments

@thbar

commented Apr 2, 2019

  • Error (short version):
TASK [ANXS.postgresql : PostgreSQL | Make sure the PostgreSQL users are present] ***********************************************************************************************************************************
An exception occurred during task execution. To see the full traceback, use -vvv. The error was:     'paths': '\n    '.join(paths),
fatal: [the-host]: FAILED! => 
  msg: Unexpected failure during module execution.
  stdout: ''
  • Ansible version: 2.5.14 (not patched nor running custom modules)

  • Mitogen: v0.2.6 (I did not run master, but I checked the diff and did not notice anything that would fix this; I can try master later)

  • Idea of what the underlying problem may be:

I believe an error occurs while Mitogen tries to build an exception error message here:

    raise IOError(MAKE_TEMP_FAILED_MSG % {
        'paths': '\n    '.join(paths),
    })

The paths key is provided, but the interpolated string expects namelist:

MAKE_TEMP_FAILED_MSG = (
    u"Unable to find a useable temporary directory. This likely means no\n"
    u"system-supplied TMP directory can be written to, or all directories\n"
    u"were mounted on 'noexec' filesystems.\n"
    u"\n"
    u"The following paths were tried:\n"
    u" %(namelist)s\n"
    u"\n"
    u"Please check '-vvv' output for a log of individual path errors."
)
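
To illustrate the mismatch described above, here is a minimal, self-contained sketch. `TEMPLATE` and `render` are hypothetical stand-ins (a shortened template and a small helper), not Mitogen's code; the point is only that %-formatting with a dict raises `KeyError` when the template names a key the dict does not supply.

```python
# Hypothetical, shortened stand-in for the message template quoted above.
TEMPLATE = u"The following paths were tried:\n    %(namelist)s\n"

paths = ['/tmp', '/var/tmp']


def render(template, **values):
    """Interpolate `values` into `template` using %-formatting."""
    return template % values


# Supplying 'paths' while the template expects 'namelist' raises KeyError,
# which masks the message the code was trying to build.
try:
    render(TEMPLATE, paths='\n    '.join(paths))
    mismatch_error = None
except KeyError as e:
    mismatch_error = e

# Supplying the key the template actually names renders the message.
fixed = render(TEMPLATE, namelist='\n    '.join(paths))
```

Either renaming the dict key or renaming the placeholder would make the two agree; which one the actual fix chose is not shown in this report.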

  • Host: pipenv 9.0.1, Python 2.7.14, Mac OS X 10.14.3

  • Target: Python 2.7.12, Ubuntu 16.04.4 LTS

  • If reporting a crash or hang in Ansible, please rerun with -vvv and include
    200 lines of output around the point of the error, along with a full copy of
    any traceback or error text in the log. Beware "-vvv" may include secret
    data! Edit as necessary before posting.

Sorry I do not have (at this point) the time required to properly edit and anonymize. I'll still post what I have (anonymized short version which I got with -vvv):

The full traceback is:
Traceback (most recent call last):
  File "/Users/thbar/.local/share/virtualenvs/ansible-the-client-NS8dto-N/lib/python2.7/site-packages/ansible/executor/task_executor.py", line 104, in run
    item_results = self._run_loop(items)
  File "/Users/thbar/.local/share/virtualenvs/ansible-the-client-NS8dto-N/lib/python2.7/site-packages/ansible/executor/task_executor.py", line 334, in _run_loop
    res = self._execute(variables=task_vars)
  File "/Users/thbar/.local/share/virtualenvs/ansible-the-client-NS8dto-N/lib/python2.7/site-packages/ansible/executor/task_executor.py", line 566, in _execute
    result = self._handler.run(task_vars=variables)
  File "/Volumes/TheClient/mitogen-0.2.6/ansible_mitogen/mixins.py", line 116, in run
    return super(ActionModuleMixin, self).run(tmp, task_vars)
  File "/Users/thbar/.local/share/virtualenvs/ansible-the-client-NS8dto-N/lib/python2.7/site-packages/ansible/plugins/action/normal.py", line 46, in run
    result = merge_hash(result, self._execute_module(task_vars=task_vars, wrap_async=wrap_async))
  File "/Volumes/TheClient/mitogen-0.2.6/ansible_mitogen/mixins.py", line 356, in _execute_module
    self._connection._connect()
  File "/Volumes/TheClient/mitogen-0.2.6/ansible_mitogen/connection.py", line 721, in _connect
    self._connect_stack(stack)
  File "/Volumes/TheClient/mitogen-0.2.6/ansible_mitogen/connection.py", line 675, in _connect_stack
    stack=mitogen.utils.cast(list(stack)),
  File "/Volumes/TheClient/mitogen-0.2.6/mitogen/core.py", line 1859, in call_service
    return recv.get().unpickle()
  File "/Volumes/TheClient/mitogen-0.2.6/mitogen/core.py", line 835, in unpickle
    raise obj
CallError: exceptions.KeyError: u'namelist'
  File "<stdin>", line 3107, in _dispatch_one
  File "master:/Volumes/TheClient/mitogen-0.2.6/ansible_mitogen/target.py", line 383, in init_child
    good_temp_dir = find_good_temp_dir(candidate_temp_dirs)
  File "master:/Volumes/TheClient/mitogen-0.2.6/ansible_mitogen/target.py", line 331, in find_good_temp_dir
    'paths': '\n    '.join(paths),


fatal: [the-host]: FAILED! => 
  msg: Unexpected failure during module execution.
  stdout: ''
  • If reporting any kind of problem with Ansible, please include the Ansible
    version along with output of "ansible-config dump --only-changed".

(anonymized)

pipenv run ansible-config dump --only-changed
ANSIBLE_PIPELINING(/Volumes/TheClient/ansible-the-client/ansible.cfg) = True
DEFAULT_HOST_LIST(/Volumes/TheClient/ansible-the-client/ansible.cfg) = [u'/Volumes/TheClient/ansible-the-client/inventory']
DEFAULT_LOAD_CALLBACK_PLUGINS(/Volumes/TheClient/ansible-the-client/ansible.cfg) = True
DEFAULT_ROLES_PATH(/Volumes/TheClient/ansible-the-client/ansible.cfg) = [u'/Volumes/TheClient/ansible-the-client/roles', u'/Volumes/TheClient/ansible-the-client/custom_roles']
DEFAULT_STDOUT_CALLBACK(/Volumes/TheClient/ansible-the-client/ansible.cfg) = yaml
DEFAULT_STRATEGY(/Volumes/TheClient/ansible-the-client/ansible.cfg) = mitogen_linear
DEFAULT_STRATEGY_PLUGIN_PATH(/Volumes/TheClient/ansible-the-client/ansible.cfg) = [u'/Volumes/TheClient/mitogen-0.2.6/ansible_mitogen/plugins/strategy']
DEFAULT_VAULT_IDENTITY_LIST(/Volumes/TheClient/ansible-the-client/ansible.cfg) = ['vault-the-client-staging', 'vault-the-client-production', 'vault-vagrant']
@thbar

Author

commented Apr 2, 2019

I understand that fixing this very error will only fix the error reporting, not the underlying issue, which is related to tmp folder handling!

dw added a commit that referenced this issue Apr 2, 2019

@dw

Owner

commented Apr 2, 2019

So annoying! I have 'fixed' this stupid exception text at least 3 times now :) Change will be on master in ~20 minutes. Thanks a ton for reporting!

@dw

Owner

commented Apr 2, 2019

I'd be curious to know what is wrong with your machine to cause the error. Presumably you are not receiving it when running under regular Ansible? If so, that's a bug.

@thbar

Author

commented Apr 2, 2019

@dw you're welcome on the reporting 😄 no problem - mitogen is making my life so much better that it's totally worth helping to improve it a bit!

I'm definitely not getting the error when mitogen is disabled.

The interesting thing, though, is that this does not happen for all the target machines, only for a few.

I have an idea about what may be the culprit. I'll investigate today and report back.

@dw

Owner

commented Apr 2, 2019

Thanks for investigating. Things to look out for:

  • the fixed error message path list is missing some obvious location (which would be visible with ANSIBLE_DEBUG=1 output when running with vanilla)
  • the list shows some var or tilde-expanded path where the var or tilde is expanded in the wrong context (e.g. $HOME -> /root rather than /home/sshuser etc)
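
As a rough illustration of the second point, the sketch below shows how the same tilde path expands differently depending on whose environment does the expanding. `expand_as` is a hypothetical helper written for this example, and it assumes a POSIX system, where `os.path.expanduser` consults the `HOME` environment variable.

```python
import os


def expand_as(home, path):
    """Expand `path` as if HOME were `home`, restoring the environment after."""
    saved = os.environ.get('HOME')
    os.environ['HOME'] = home
    try:
        return os.path.expanduser(path)
    finally:
        if saved is None:
            del os.environ['HOME']
        else:
            os.environ['HOME'] = saved


# The same candidate path lands in different places per account.
as_root = expand_as('/root', '~/.ansible/tmp')
as_ssh_user = expand_as('/home/sshuser', '~/.ansible/tmp')
```

If the expansion happens in the wrong account's context, the candidate list ends up pointing at a directory the target account may not be able to write to.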

Travis is being annoying -- one of the jobs is stuck despite restarting it. The fix is on issue575 branch if you don't want to wait ;)

dw added a commit that referenced this issue Apr 2, 2019

Merge remote-tracking branch 'origin/issue575'
* origin/issue575:
  issue #574: fix ISSUE_TEMPLATE link
  issue #575: fix exception text rendering
@thbar

Author

commented Apr 2, 2019

On the 2 failing hosts, there is indeed a problem finding a non-noexec folder where mitogen would be able to work, and this is caused by the setup.

Rather than changing this (we're trying to move away from exec in some places), I'm now looking for ways to provide a specific folder for tmp here (in a place where I know things will work, e.g. /home/the-ansible-user/tmp).

Is there a way to achieve this @dw ? If you have an idea, please let me know! (I'll dig into the code too).

Also (side note), I'm kind of surprised that the beginning of this method will create not the temp dir itself, but the location where the temp dir is expected to live (and one of these folders is /home/the-ansible-user directly):

def is_good_temp_dir(path):
    """
    Return :data:`True` if `path` can be used as a temporary directory, logging
    any failures that may cause it to be unsuitable. If the directory doesn't
    exist, we attempt to create it using :func:`os.makedirs`.
    """
    if not os.path.exists(path):
        try:
            os.makedirs(path, mode=int('0700', 8))
        except OSError:
            e = sys.exc_info()[1]
            LOG.debug('temp dir %r unusable: did not exist and attempting '
                      'to create it failed: %s', path, e)
            return False
    try:
        tmp = tempfile.NamedTemporaryFile(
            prefix='ansible_mitogen_is_good_temp_dir',
            dir=path,

It won't do much harm, but I would rather not see an attempt to create a folder like /home/the-ansible-user. Is this done on purpose?

@dw

Owner

commented Apr 2, 2019

The makedirs logic is to approximate the handling of the standard ansible.module_utils, which attempts to create ~/.ansible/tmp by default. I'm very happy to tighten this up so we only ever try to makedirs the same directories as Ansible. Mitogen and Ansible differ heavily in temp file handling: Mitogen tries to have only a single location for the duration of the run, whereas Ansible creates up to 3 directories for every task.

The noexec check was added to avoid picking a filesystem where running non-Python Ansible modules (e.g. written in Go, bash or perl) would fail. I don't think Ansible has any similar check, which means we could potentially just remove the check, or move it into runner.py where it prints a descriptive error if a program run fails.
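
For readers unfamiliar with this kind of check, here is a minimal sketch of one way to probe whether files in a directory can be executed. It is not Mitogen's actual code: `can_execute_from` is a hypothetical name, and the sketch assumes that `os.access(..., os.X_OK)` reports failure for files on a noexec mount, which is my understanding of recent Linux kernels but may not hold on every platform.

```python
import os
import tempfile


def can_execute_from(path):
    """
    Return True if a file created in `path` can be marked executable and the
    OS then reports it as executable. Raises OSError if `path` is not
    writable or does not exist.
    """
    with tempfile.NamedTemporaryFile(prefix='noexec_probe_', dir=path) as tmp:
        # Set the owner execute bit; on a noexec mount the bit can usually
        # be set, but the access() check below is expected to fail.
        os.chmod(tmp.name, 0o700)
        return os.access(tmp.name, os.X_OK)


# Probe a scratch directory; the result depends on how it is mounted.
probe_result = can_execute_from(tempfile.mkdtemp())
```

A probe like this only matters for modules that are executed as programs; pure-Python modules loaded in-process would not need the directory to allow execution.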

Unfortunately bug #321 makes no reference to why the noexec check was added. It's quite a specific check, and I'm not sure I added it 'simply because'. It might have been due to an issue reported via IRC.

I'm tempted to simply disable the noexec check and wait for bug reports :)

@thbar

Author

commented Apr 2, 2019

Thanks for the context - it's rich, as expected for such a project 😄

At least on my setups, if I remove the noexec check, things just run smoothly...

I wonder why it was added in the first place, too. I certainly don't want to rush you into changing this too hastily, yet waiting for bug reports could be a good way to figure out why.

@thbar

Author

commented Apr 2, 2019

Re-reading #321, I wonder if you could find a way to avoid trying to replicate exactly what Ansible does, because it seems to be sooo complicated (but I realise this is a naive view, as a total newcomer to the project). I have not pondered the implications, though!

@thbar

Author

commented Apr 2, 2019

One last note for today: if I focus on the failing hosts, and disable the noexec check, I notice that for each host, 2 different tmp paths are mentioned in the logs:

$ pipenv run ansible-playbook build-all.yml --tags focus --diff --check --limit sv-tca-fluite99,sv-tca-geoite03 -vvv | grep "Selected"
[mux  25184] 17:30:13.473724 D mitogen.ctx.ssh.$$FIRST-IP$$: ansible_mitogen.target: Selected temp directory: u'/home/deploy/.ansible/tmp' (from [u'/home/deploy/.ansible/tmp', u'/var/tmp', u'/tmp', '/tmp', '/var/tmp', '/usr/tmp', '/home/deploy'])
[mux  25184] 17:30:16.412597 D mitogen.ctx.ssh.$$FIRST-IP$$.sudo.postgres: ansible_mitogen.target: Selected temp directory: u'/var/lib/postgresql/.ansible/tmp' (from [u'/var/lib/postgresql/.ansible/tmp', u'/var/tmp', u'/tmp', '/tmp', '/var/tmp', '/usr/tmp', '/home/deploy'])
[mux  25184] 17:30:19.125581 D mitogen.ctx.ssh.$$SECOND-IP$$: ansible_mitogen.target: Selected temp directory: u'/home/deploy/.ansible/tmp' (from [u'/home/deploy/.ansible/tmp', u'/var/tmp', u'/tmp', '/tmp', '/var/tmp', '/usr/tmp', '/home/deploy'])
[mux  25184] 17:30:21.784794 D mitogen.ctx.ssh.$$SECOND-IP$$.sudo.postgres: ansible_mitogen.target: Selected temp directory: u'/var/lib/postgresql/.ansible/tmp' (from [u'/var/lib/postgresql/.ansible/tmp', u'/var/tmp', u'/tmp', '/tmp/user/114', '/tmp/user/114', '/tmp/user/114', '/tmp', '/var/tmp', '/usr/tmp', '/home/deploy'])

Note that for a given host, the first log line mentions /home/deploy/.ansible/tmp, whereas the second one (called, I think, from the ANXS.postgresql role) attempts to use /var/lib/postgresql/.ansible/tmp.

On both hosts, this second folder would be rejected by the noexec check, because on those two machines, /var is noexec.

So well - not sure if this is useful at this point, but mentioning this in case it helps later.

@dw

Owner

commented Apr 2, 2019

Temp file handling is a complete nightmare :) It has been rewritten at least 3 or 4 times already, and every tweak breaks some install somewhere, or some older version of Ansible. It would be much better if no temp files were used at all (as in the original prototype), but far too many modules expect a temp dir to exist.

For example, with Ansible, when pipelining=False and become=true, Ansible still creates temp files in the SSH login account and sets perms so the become account can read them. Mitogen always keeps temp files within the target account -- which meant the become user homedir must be writeable, breaking some installs. That's why Mitogen has quite a huge list of candidate temp dirs -- it is known in some cases to break working temp dir setups, so we try hard to find any working configuration rather than error.
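
The "try a list of candidates until one works" idea can be sketched roughly as follows. This is an illustration only, not the code in ansible_mitogen/target.py: `is_writable_dir` and `first_usable` are hypothetical names, the probe only checks that a file can be created, and unlike the `is_good_temp_dir` excerpt quoted earlier it never creates missing directories.

```python
import os
import tempfile


def is_writable_dir(path):
    """Return True if a temporary file can be created inside `path`."""
    try:
        with tempfile.NamedTemporaryFile(prefix='probe_', dir=path):
            return True
    except OSError:
        # Covers missing directories and permission errors alike.
        return False


def first_usable(candidates):
    """Return the first candidate that passes the probe, skipping repeats."""
    seen = set()
    for path in candidates:
        if path in seen:
            continue
        seen.add(path)
        if is_writable_dir(path):
            return path
    raise IOError('no usable temp dir among:\n    '
                  + '\n    '.join(candidates))


# Usage: a missing path is skipped and the existing scratch dir is chosen.
scratch = tempfile.mkdtemp()
missing = os.path.join(scratch, 'does', 'not', 'exist')
chosen = first_usable([missing, missing, scratch])
```

Skipping repeats is a small nicety here, since the candidate lists in the logs above contain the same path more than once.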

Re: your last comment, sorry, it is one temp dir per account, not per run.

@thbar

Author

commented Apr 2, 2019

Got you! Happy to provide more testing if needed, just hit me up!
