Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OSX crash complaining of operation `in progress in another thread when fork() was called` #32499

Closed
elnur opened this issue Nov 2, 2017 · 16 comments

Comments

@elnur
Copy link

@elnur elnur commented Nov 2, 2017

ISSUE TYPE
  • Bug Report
COMPONENT NAME
  • lookup
ANSIBLE VERSION
ansible 2.4.1.0
  config file = /Users/elnur/proj/foo/ansible.cfg
  configured module search path = [u'/Users/elnur/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/local/Cellar/ansible/2.4.1.0/libexec/lib/python2.7/site-packages/ansible
  executable location = /usr/local/bin/ansible
  python version = 2.7.14 (default, Oct  6 2017, 11:26:02) [GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.37)]
CONFIGURATION
RETRY_FILES_ENABLED(/Users/elnur/proj/foo/ansible.cfg) = False

I have system variables set in .bash_profile:

export ANSIBLE_ETCD_URL=http://etcd.cloud.internal:2379
export ANSIBLE_ETCD_VERSION=v2
OS / ENVIRONMENT

Mac OS X High Sierra with latest updates. Ansible is installed using brew.

SUMMARY

Can't do an etcd lookup since upgrading to 2.4. Worked fine before 2.4.

STEPS TO REPRODUCE
- hosts: localhost
  gather_facts: yes

  pre_tasks:
    - debug:
        msg: "{{ lookup('etcd', 'deployment/environment') }}"
EXPECTED RESULTS
PLAY [localhost] **********************************************************************************************************************

TASK [Gathering Facts] ****************************************************************************************************************
ok: [localhost]

TASK [debug] **************************************************************************************************************************
ok: [localhost] => {
    "changed": false,
    "msg": "development"
}

PLAY RECAP ****************************************************************************************************************************
localhost                  : ok=2    changed=0    unreachable=0    failed=0
ACTUAL RESULTS
PLAY [localhost] **********************************************************************************************************************

TASK [Gathering Facts] ****************************************************************************************************************
ok: [localhost]

TASK [debug] **************************************************************************************************************************
objc[22402]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called.
objc[22402]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.

Ansible just hangs after outputting that error message.

@nitzmahone

This comment has been minimized.

Copy link
Member

@nitzmahone nitzmahone commented Nov 2, 2017

This is apparently due to some new security changes made in High Sierra that are breaking lots of Python things that use fork(). Rumor has it that adding
export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES before your Ansible run should clear it up. The code that's causing issues is well below Ansible in the stack.

@nitzmahone nitzmahone removed the needs_triage label Nov 2, 2017
@nitzmahone nitzmahone changed the title etcd lookup is broken OSX crash complaining of operation `in progress in another thread when fork() was called` Nov 2, 2017
@elnur

This comment has been minimized.

Copy link
Author

@elnur elnur commented Nov 3, 2017

@nitzmahone, the question is why it works fine when I revert to Ansible 2.3. Doesn't it mean it's a regression introduced in 2.4?

@elnur

This comment has been minimized.

Copy link
Author

@elnur elnur commented Nov 5, 2017

It'd be really nice to have this fixed because I need new stuff from 2.4 but I'm stuck with 2.3 because of the bug.

At least acknowledging it's an Ansible regression instead of blaming OS X would be a step in the right direction.

@nitzmahone

This comment has been minimized.

Copy link
Member

@nitzmahone nitzmahone commented Nov 7, 2017

It doesn't appear to be an Ansible regression- 2.3.0 fails for us the exact same way on High Sierra. Are you sure you didn't upgrade to High Sierra between 2.3.x and 2.4.1 (since it was released right around that time)?

This is the best examination of the underlying issue we've found: https://blog.phusion.nl/2017/10/13/why-ruby-app-servers-break-on-macos-high-sierra-and-what-can-be-done-about-it/ - so far the bugs opened on Python core for this issue have been closed (eg https://bugs.python.org/issue30837). Regardless, the issue appears to be well outside the Ansible codebase, and the behavior doesn't appear to have changed in any recent releases, so I don't think there's much we can do.

@s8i

This comment has been minimized.

Copy link

@s8i s8i commented Nov 7, 2017

@nitzmahone export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES helped me get past the initial fork safety issue - thank you

Winrm has issue still
These Windows will ping without pinging via Ansible
Also these Windows machines would also ping via Ansible previous to my High Sierra update

$ ansible all -m ping
descartes | UNREACHABLE! => {
    "changed": false,
    "msg": "ssl: HTTPSConnectionPool(host='descartes', port=5986): Max retries exceeded with url: /wsman (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x108f7e510>: Failed to establish a new connection: [Errno 61] Connection refused',))",
    "unreachable": true
}
schrodinger | UNREACHABLE! => {
    "changed": false,
    "msg": "ssl: HTTPSConnectionPool(host='schrodinger', port=5986): Max retries exceeded with url: /wsman (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x108d20290>: Failed to establish a new connection: [Errno 61] Connection refused',))",
    "unreachable": true
}
 [WARNING]: FATAL ERROR DURING FILE TRANSFER: Traceback (most recent call last):   File
"/usr/local/Cellar/ansible/2.4.1.0/libexec/lib/python2.7/site-
packages/ansible/plugins/connection/winrm.py", line 276, in _winrm_exec
self._winrm_send_input(self.protocol, self.shell_id, command_id, data, eof=is_last)   File
"/usr/local/Cellar/ansible/2.4.1.0/libexec/lib/python2.7/site-
packages/ansible/plugins/connection/winrm.py", line 256, in _winrm_send_input
protocol.send_message(xmltodict.unparse(rq))   File
"/usr/local/Cellar/ansible/2.4.1.0/libexec/lib/python2.7/site-packages/winrm/protocol.py", line
207, in send_message     return self.transport.send_message(message)   File
"/usr/local/Cellar/ansible/2.4.1.0/libexec/lib/python2.7/site-packages/winrm/transport.py", line
202, in send_message     raise WinRMTransportError('http', error_message) WinRMTransportError:
(u'http', u'Bad HTTP response returned from server. Code 500')

 [WARNING]: FATAL ERROR DURING FILE TRANSFER: Traceback (most recent call last):   File
"/usr/local/Cellar/ansible/2.4.1.0/libexec/lib/python2.7/site-
packages/ansible/plugins/connection/winrm.py", line 276, in _winrm_exec
self._winrm_send_input(self.protocol, self.shell_id, command_id, data, eof=is_last)   File
"/usr/local/Cellar/ansible/2.4.1.0/libexec/lib/python2.7/site-
packages/ansible/plugins/connection/winrm.py", line 256, in _winrm_send_input
protocol.send_message(xmltodict.unparse(rq))   File
"/usr/local/Cellar/ansible/2.4.1.0/libexec/lib/python2.7/site-packages/winrm/protocol.py", line
207, in send_message     return self.transport.send_message(message)   File
"/usr/local/Cellar/ansible/2.4.1.0/libexec/lib/python2.7/site-packages/winrm/transport.py", line
202, in send_message     raise WinRMTransportError('http', error_message) WinRMTransportError:
(u'http', u'Bad HTTP response returned from server. Code 500')

 [WARNING]: FATAL ERROR DURING FILE TRANSFER: Traceback (most recent call last):   File
"/usr/local/Cellar/ansible/2.4.1.0/libexec/lib/python2.7/site-
packages/ansible/plugins/connection/winrm.py", line 276, in _winrm_exec
self._winrm_send_input(self.protocol, self.shell_id, command_id, data, eof=is_last)   File
"/usr/local/Cellar/ansible/2.4.1.0/libexec/lib/python2.7/site-
packages/ansible/plugins/connection/winrm.py", line 256, in _winrm_send_input
protocol.send_message(xmltodict.unparse(rq))   File
"/usr/local/Cellar/ansible/2.4.1.0/libexec/lib/python2.7/site-packages/winrm/protocol.py", line
207, in send_message     return self.transport.send_message(message)   File
"/usr/local/Cellar/ansible/2.4.1.0/libexec/lib/python2.7/site-packages/winrm/transport.py", line
202, in send_message     raise WinRMTransportError('http', error_message) WinRMTransportError:
(u'http', u'Bad HTTP response returned from server. Code 500')

oracle | FAILED! => {
    "failed": true,
    "msg": "winrm send_input failed"
}
babbage | FAILED! => {
    "failed": true,
    "msg": "winrm send_input failed"
}
kitchenl1 | FAILED! => {
    "failed": true,
    "msg": "winrm send_input failed"
}
 [WARNING]: FATAL ERROR DURING FILE TRANSFER: Traceback (most recent call last):   File
"/usr/local/Cellar/ansible/2.4.1.0/libexec/lib/python2.7/site-
packages/ansible/plugins/connection/winrm.py", line 276, in _winrm_exec
self._winrm_send_input(self.protocol, self.shell_id, command_id, data, eof=is_last)   File
"/usr/local/Cellar/ansible/2.4.1.0/libexec/lib/python2.7/site-
packages/ansible/plugins/connection/winrm.py", line 256, in _winrm_send_input
protocol.send_message(xmltodict.unparse(rq))   File
"/usr/local/Cellar/ansible/2.4.1.0/libexec/lib/python2.7/site-packages/winrm/protocol.py", line
207, in send_message     return self.transport.send_message(message)   File
"/usr/local/Cellar/ansible/2.4.1.0/libexec/lib/python2.7/site-packages/winrm/transport.py", line
202, in send_message     raise WinRMTransportError('http', error_message) WinRMTransportError:
(u'http', u'Bad HTTP response returned from server. Code 500')

standup | FAILED! => {
    "failed": true,
    "msg": "winrm send_input failed"
}
kitchenl2 | UNREACHABLE! => {
    "changed": false,
    "msg": "ssl: HTTPSConnectionPool(host='kitchenl2', port=5986): Max retries exceeded with url: /wsman (Caused by ConnectTimeoutError(<urllib3.connection.VerifiedHTTPSConnection object at 0x108f21b10>, 'Connection to kitchenl2 timed out. (connect timeout=30)'))",
    "unreachable": true
}
 [WARNING]: FATAL ERROR DURING FILE TRANSFER: Traceback (most recent call last):   File
"/usr/local/Cellar/ansible/2.4.1.0/libexec/lib/python2.7/site-
packages/ansible/plugins/connection/winrm.py", line 276, in _winrm_exec
self._winrm_send_input(self.protocol, self.shell_id, command_id, data, eof=is_last)   File
"/usr/local/Cellar/ansible/2.4.1.0/libexec/lib/python2.7/site-
packages/ansible/plugins/connection/winrm.py", line 256, in _winrm_send_input
protocol.send_message(xmltodict.unparse(rq))   File
"/usr/local/Cellar/ansible/2.4.1.0/libexec/lib/python2.7/site-packages/winrm/protocol.py", line
207, in send_message     return self.transport.send_message(message)   File
"/usr/local/Cellar/ansible/2.4.1.0/libexec/lib/python2.7/site-packages/winrm/transport.py", line
202, in send_message     raise WinRMTransportError('http', error_message) WinRMTransportError:
(u'http', u'Bad HTTP response returned from server. Code 500')

shannon | FAILED! => {
    "failed": true,
    "msg": "winrm send_input failed"
}
@elnur

This comment has been minimized.

Copy link
Author

@elnur elnur commented Nov 9, 2017

It doesn't appear to be an Ansible regression- 2.3.0 fails for us the exact same way on High Sierra. Are you sure you didn't upgrade to High Sierra between 2.3.x and 2.4.1 (since it was released right around that time)?

brew lets me switch between different Ansible versions. So I switched back and forth between 2.3.0.0 and 2.4.1.0 and 2.3 works fine while 2.4 fails — both on High Sierra.

@elnur

This comment has been minimized.

Copy link
Author

@elnur elnur commented Nov 9, 2017

export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES before your Ansible run should clear it up.

That fixed it for me. I can finally use 2.4. Thanks. 🎉

@mikedlr

This comment has been minimized.

Copy link
Contributor

@mikedlr mikedlr commented Jan 8, 2018

Further comment: this is almost certainly a bug in the python which apple distributes. Apple has changed their fork implementation to disallow forking with active threads. Taking the latest python distributed from python.org seems to fix this error without needing to set the environment variable (tested on python 3.6).

andrewkroh added a commit to andrewkroh/beats-tester that referenced this issue Feb 7, 2018
Add OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES to the environment to
workaround an issue with running Ansible on MacOS 10.13 (High Sierra).

See ansible/ansible#32499
@ansibot ansibot added bug and removed bug_report labels Mar 7, 2018
@scottsb

This comment has been minimized.

Copy link
Contributor

@scottsb scottsb commented Jun 28, 2018

For anybody else who comes across this running Ansible from Vagrant, you can work around this by adding this hack to your Vagrantfile (can be placed anywhere due to how Vagrant works). This allows vagrant up/provision to be run without having to export the variable yourself (especially useful if supporting a team using the same Vagrant config, as I do):

# Temporary workaround to Python bug in macOS High Sierra which can break Ansible
# https://github.com/ansible/ansible/issues/34056#issuecomment-352862252
# This is an ugly hack tightly bound to the internals of Vagrant.Util.Subprocess, specifically
# the jailbreak method, self-described as "quite possibly, the saddest function in all of Vagrant."
# That in turn makes this assignment the saddest line in all of Vagrantfiles.
ENV["VAGRANT_OLD_ENV_OBJC_DISABLE_INITIALIZE_FORK_SAFETY"] = "YES"
@dmrzzz

This comment has been minimized.

Copy link

@dmrzzz dmrzzz commented Sep 8, 2018

Taking the latest python distributed from python.org seems to fix this error without needing to set the environment variable (tested on python 3.6).

I'm admittedly way out of my depth here, but AFAICT the above isn't as reassuring as it seems. Discussion in https://bugs.python.org/issue33725 suggests that

  • the reason we don't get this error using the python.org installers is because those installers aren't (yet) being built with a 10.13 SDK

  • the ultimate resolution is for framework and application code to either avoid or be very careful with os.fork()

If that's true, maybe this should be solvable in the Ansible codebase after all?

FWIW, I do see the error using a shiny new Python 3.7.0 from homebrew (unless I apply the environment variable workaround).

@mikedlr

This comment has been minimized.

Copy link
Contributor

@mikedlr mikedlr commented Sep 14, 2018

Trying to remember what this bug is about I came across this article which gives a good explanation.

Essentially, to safely support threaded applications we should eliminate all calls to fork except a) when immediately followed by an exec() b) at the very start of a program when creating a special worker which can then do subsequent forks but which never runs threads. Unfortunately "threaded applications" means all applications because there are now many libraries which privately embed threading.

donovanmuller added a commit to absa-subatomic/local-hadron-collider that referenced this issue Sep 25, 2018
* Verified working on OpenShift 3.10 cluster
* Added workaround for ansible/ansible#32499
* Tweaked Bitucket resource usage
@Sohair63

This comment has been minimized.

Copy link

@Sohair63 Sohair63 commented Nov 10, 2018

I upgraded my system to Mojave

  • rails - 5.1.3
  • ruby - 2.3.3

I am getting this error which is not resolved by export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES

2.3.3 :002 > User.first
objc[56844]: +[__NSPlaceholderDictionary initialize] may have been in progress in another thread when fork() was called.
objc[56844]: +[__NSPlaceholderDictionary initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.

exited rails console
@swaathi

This comment has been minimized.

Copy link

@swaathi swaathi commented Dec 7, 2018

The export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES fix does not work in Mojave.

Did you get it to work @Sohair63?

@mikedlr

This comment has been minimized.

Copy link
Contributor

@mikedlr mikedlr commented Dec 7, 2018

@swaathi please could you open a new bug report with full information - especially all software versions (MacOS,Python,Ansible etc), the exact playbook you are using that triggers the crash and exactly how you run it. I'll upgrade soon and then attempt to debug if I can reproduce. Put a link to your new bug from #49207

N.B. you have to set the environment variable in the shell before you launch the ansible process - it's not enough to set it in the ansible process. That has occasionally been a problem with debugging this situation.

@Sohair63

This comment has been minimized.

Copy link

@Sohair63 Sohair63 commented Dec 7, 2018

@swaathi Yes it is working fine for me now, but not sure how it got fixed.
maybe recent updates of Xcode fixed it.

@mera12

This comment has been minimized.

Copy link

@mera12 mera12 commented Dec 7, 2018

since I upgraded my system to Mojave
export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES fix was working I had to put it through more than once though, but now it completely stopped working.
rails -5.2.1
ruby -2.3.3

@ansible ansible locked and limited conversation to collaborators Dec 7, 2018
@dagwieers dagwieers added the macos label Jan 27, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
You can’t perform that action at this time.