advise updating controlpath settings when ssh throws 'unix domain socket "too long"' error #11536

Closed
lukehoersten opened this issue Jul 9, 2015 · 66 comments

Comments


@lukehoersten lukehoersten commented Jul 9, 2015

ISSUE TYPE

Feature Idea

COMPONENT NAME

ssh control persist

ANSIBLE VERSION

2.0

SUMMARY

When trying to use the ec2 plugin, ssh fails with this error:

SSH Error: unix_listener: "/Users/luke/.ansible/cp/ansible-ssh-ec2-255-255-255-255.compute-1.amazonaws.com-22-ubuntu.CErvOvRE5U0urCgm" too long for Unix domain socket

Here's the full example:

$ ansible -vvvv -i ec2.py -u ubuntu us-east-1 -m ping
<ec2-255-255-255-255.compute-1.amazonaws.com> ESTABLISH CONNECTION FOR USER: ubuntu
<ec2-255-255-255-255.compute-1.amazonaws.com> REMOTE_MODULE ping
<ec2-255-255-255-255.compute-1.amazonaws.com> EXEC ssh -C -tt -vvv -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/Users/luke/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 ec2-255-255-255-255.compute-1.amazonaws.com /bin/sh -c 'mkdir -p $HOME/.ansible/tmp/ansible-tmp-1436458336.4-21039895766180 && chmod a+rx $HOME/.ansible/tmp/ansible-tmp-1436458336.4-21039895766180 && echo $HOME/.ansible/tmp/ansible-tmp-1436458336.4-21039895766180'
ec2-255-255-255-255.compute-1.amazonaws.com | FAILED => SSH Error: unix_listener: "/Users/luke/.ansible/cp/ansible-ssh-ec2-255-255-255-255.compute-1.amazonaws.com-22-ubuntu.CErvOvRE5U0urCgm" too long for Unix domain socket
    while connecting to 255.255.255.255:22
It is sometimes useful to re-run the command using -vvvv, which prints SSH debug output to help diagnose the issue.

I've changed some of the sensitive info in here like the IP etc.

Author

@lukehoersten lukehoersten commented Jul 9, 2015

Added this to my ansible config to shorten the path:

[ssh_connection]
control_path = %(directory)s/%%h-%%p-%%r

Might be useful to include that in the error output or do something else more graceful instead of failing.
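
The doubled percent signs are there because the value goes through Python-style %-interpolation before ssh ever sees it. A minimal sketch of that expansion, assuming a plain `%` format against a dict that holds `directory` (the helper name `expand_control_path` and the example directory are hypothetical, not Ansible's own code):

```python
def expand_control_path(template, directory):
    """Fill in %(directory)s and collapse each %% to a literal % for ssh."""
    return template % {"directory": directory}


# The directory value here is only an illustration.
expanded = expand_control_path("%(directory)s/%%h-%%p-%%r", "/Users/luke/.ansible/cp")
print(expanded)
```

After expansion ssh receives its own %h, %p and %r tokens untouched, which is why a single percent sign in ansible.cfg would not work for them.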

@bcoca bcoca changed the title ec2 ssh unix domain socket "too long" advise updating controlpath settings when ssh throws 'unix domain socket "too long"' error Jul 10, 2015

@firemanxbr firemanxbr commented Aug 13, 2015

Same error for me! I agree with LukeHoersten on this fix.

Contributor

@johnhamelink johnhamelink commented Aug 14, 2015

Thanks for pointing your solution out @lukehoersten

Author

@lukehoersten lukehoersten commented Aug 14, 2015

No problem. Hopefully we can get a more solid fix in there. It's bad for newcomers especially.


@IkeLutra IkeLutra commented Sep 7, 2015

The ansible config has another commented-out suggestion:
control_path = %(directory)s/%%h-%%r

But yes a help message would be useful.


@DJHoltkamp DJHoltkamp commented Sep 23, 2015

I just hit this as well. I'm new and wasted huge amounts of time. Thanks for the answer! And I agree, needs to be fixed.


@mieciu mieciu commented Oct 2, 2015

I also 👍 for this feature.

Faced that today. Thanks for the hints on ansible.cfg !!


@hf16136 hf16136 commented Oct 6, 2015

Editing control_path does not work on Mac OS X El Capitan.

Contributor

@deyvsh deyvsh commented Oct 7, 2015

This works for me in El Capitan:

[ssh_connection]
control_path = %(directory)s/%%h-%%r

As @willotter pointed out, it's one of the commented out statements in https://raw.githubusercontent.com/ansible/ansible/devel/examples/ansible.cfg

Interested to know why it's an issue - since when are long pathnames a problem outside Windows?


@liul85 liul85 commented Oct 8, 2015

This works for me after upgrading to El Capitan.

[ssh_connection]
control_path = %(directory)s/%%h-%%p-%%r

@cswarth cswarth commented Oct 21, 2015

@deyvsh "why it's an issue - since when are long pathnames a problem outside Windows?"

Since El Capitan was released by Apple. Aside from a page in Chinese, this is the only page that seems to reference this new behavior in MacOS. I ran into the same issue when trying to use Tramp mode in emacs, which allows transparent access to remote files via ssh. Same error about long file names for a unix domain socket, but not as easy to work around as in Ansible.


@IkeLutra IkeLutra commented Oct 21, 2015

@cswarth The ansible config is just passed to your ssh client. You might be able to set up a control_path in your ssh config file ~/.ssh/config like this:

Host *
  ControlPath /tmp/%r@%h:%p

I don't have Mac OS X so I can't test this but this should work unless emacs passes any specific parameters through to SSH.


@emcniece emcniece commented Oct 27, 2015

@willotter I had to adapt this idea and add it to my ansible.cfg file to get it to work.

[ssh_connection]
control_path = /tmp/%%h-%%p-%%r

2017 update: looks like @willotter no longer exists :(


@madrobby madrobby commented Oct 29, 2015

@lukehoersten Thanks for this, fixed the issue for me!


@isotopp isotopp commented Nov 2, 2015

The root cause for this is at

https://github.com/openssh/openssh-portable/blob/9ada37d36003a77902e90a3214981e417457cf13/misc.c#L1070

int
unix_listener(const char *path, int backlog, int unlink_first)
{
    struct sockaddr_un sunaddr;
    int saved_errno, sock;

    memset(&sunaddr, 0, sizeof(sunaddr));
    sunaddr.sun_family = AF_UNIX;
    if (strlcpy(sunaddr.sun_path, path, sizeof(sunaddr.sun_path)) >= sizeof(sunaddr.sun_path)) {
        error("%s: \"%s\" too long for Unix domain socket", __func__,
            path);
        errno = ENAMETOOLONG;
        return -1;
    }

To know the limit (sizeof(sunaddr.sun_path)), we need to look at https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man4/unix.4.html

           struct sockaddr_un {
                   u_char  sun_len;
                   u_char  sun_family;
                   char    sun_path[104];
           };

The path is limited to 104 characters including the \0 terminator.
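
The check above can be reproduced outside ssh to see whether a given socket path will be rejected. This is a minimal sketch, assuming the 104-byte `sun_path` from the struct above; the function name `fits_in_sun_path` is made up for illustration, and the `size` parameter is there because other platforms use a different array size:

```python
SUN_PATH_SIZE = 104  # sizeof(sun_path) in the macOS struct quoted above


def fits_in_sun_path(path, size=SUN_PATH_SIZE):
    """Mirror the strlcpy test: the path plus its NUL terminator must fit."""
    return len(path.encode("utf-8")) < size


# The socket path from the error message at the top of this issue.
failing = (
    "/Users/luke/.ansible/cp/ansible-ssh-ec2-255-255-255-255"
    ".compute-1.amazonaws.com-22-ubuntu.CErvOvRE5U0urCgm"
)
print(len(failing), fits_in_sun_path(failing))
```

Note that the path ssh complains about is longer than the configured ControlPath: in the error messages in this thread, ssh has appended a dot and 16 random characters, so the template has to leave room for that suffix too.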

This is also discussed in https://en.wikibooks.org/wiki/OpenSSH/Cookbook/Multiplexing#Manually_Establishing_Multiplexed_Connections, which suggests using %C:

Starting with 6.7, the combination of %r@%h:%p and variations on it can be replaced with %C which by itself generates a hash from the concatenation of %l%h%p%r.

In the end, you want to use

[ssh_connection]
control_path = %(directory)s/%%C
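
Why a hash helps: its length does not depend on how long the hostname or username is. A rough sketch of the idea, assuming a SHA-1 hex digest over the concatenation %l%h%p%r described in the quote above (the function name `percent_c` and the sample hostnames are hypothetical, and this is not guaranteed to reproduce the exact string ssh generates):

```python
import hashlib


def percent_c(local_host, remote_host, port, remote_user):
    """Approximate %C: hex digest of the concatenation %l%h%p%r."""
    data = f"{local_host}{remote_host}{port}{remote_user}".encode("utf-8")
    return hashlib.sha1(data).hexdigest()


short = percent_c("laptop.local", "db1", 22, "ubuntu")
long_ = percent_c(
    "laptop.local",
    "ec2-255-255-255-255.compute-1.amazonaws.com",
    22,
    "ubuntu",
)
print(len(short), len(long_))  # both digests have the same length
```

With a fixed-length name, only the directory part of the control path can still push it over the limit.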

Also, you want to stay the fuck out of /tmp or any other world-writeable, world-readable location, because security.

See also http://pastebin.com/ugXKMFsv

Author

@lukehoersten lukehoersten commented Nov 2, 2015

@isotopp good suggestions. I wonder why we don't just change the default to control_path = %(directory)s/%%C to avoid future issues.


@isotopp isotopp commented Nov 2, 2015

@lukehoersten I think ansible should change the default, too. In fact, I did

[:~] $ grep -i control ~/.ssh/config
ControlMaster auto
ControlPath ~/.ssh/_%C

Ping @bcoca - see analysis and proposed changes above.

Author

@lukehoersten lukehoersten commented Nov 2, 2015

+1

Member

@bcoca bcoca commented Nov 2, 2015

because it would not work on many many OSs/distros that run even slightly older versions of openssh


@isotopp isotopp commented Nov 2, 2015

Proposed change in http://pastebin.com/ugXKMFsv changes docs and comments only. Will work with old versions of openssh, but make pointer to %C more obvious.


@allisonfong allisonfong commented Nov 12, 2015

I have a long username on my machine (11 characters), this caused my directory to go over the character limit.

https://github.com/ansible/ansible/blob/devel/examples/ansible.cfg#L216-L225

I dropped the -%%r and it solved this problem for me.


@srt32 srt32 commented Nov 12, 2015

👍 to #11536 (comment)

zachmullen added a commit to girder/covalic that referenced this issue Nov 30, 2015
@jimi-c jimi-c removed the P3 label Dec 7, 2015
Contributor

@makmanalp makmanalp commented Dec 21, 2015

I hit this error today because instead of my inventory file, I supplied my group_vars file and ansible happily parsed the encrypted file somehow and accepted something like 182937891273891723981723891723987189237189237981273981 as the hostname. SSH also didn't think that was weird before it noticed the long ControlPath. A warning for posterity - run everything with -vvvv and make sure you're pointing to the right host and all that.

leseb added a commit to ceph/ceph-ansible that referenced this issue May 11, 2017
Default ansible control_path option is too long, so we shorten it by
changing the ansible.cfg file.

For more info see: ansible/ansible#11536
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1447569

Signed-off-by: Sébastien Han <seb@redhat.com>

@thefourtheye thefourtheye commented Aug 10, 2017

My server has this problem and I don't have permissions to change it. How can I solve it on the client's end?

Contributor

@antoineco antoineco commented Aug 10, 2017

@thefourtheye it's purely a client problem, not a server problem. You can find the option to set in your ansible.cfg file earlier in this thread.


@thefourtheye thefourtheye commented Aug 10, 2017

@antoineco Oh, thank you. I am totally new to ansible and I don't even have it installed on my machine. Would having the file ansible.cfg in my home directory still work?


@dzungpv dzungpv commented Aug 10, 2017

I have the same problem. I tried every solution, including adding a config file .ansible.cfg in ~/:

[defaults]
inventory=/etc/ansible/hosts

[ssh_connection]
control_path=%(directory)s/%%h-%%r
control_path_dir=~/.ansible/cp

I also added the host and IP to ssh known_hosts, but it still does not work. The host runs Ubuntu on EC2.
This is the error:

fatal: [default]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: Warning: Permanently added 'ec2-xx-192-174-42.ap-northeast-1.compute.amazonaws.com,xx.192.174.42' (ECDSA) to the list of known hosts.\r\nunix_listener: \"/Users/name/.ansible/cp/ec2-xx-192-174-42.ap-northeast-1.compute.amazonaws.com-ubuntu.1fndG2vtHPliheeZ\" too long for Unix domain socket\r\n", "unreachable": true


@akostadinov akostadinov commented Aug 10, 2017

You're not using the proposed solution which is control_path = %(directory)s/%%C.


@dzungpv dzungpv commented Aug 10, 2017

@akostadinov Thank you, it works. Too much solution here.


@emcniece emcniece commented Aug 10, 2017

Too much solution here.

If only it was harder... curse those solution providers!


@thefourtheye thefourtheye commented Aug 10, 2017

I tried adding all the lines suggested here to the ~/ansible.cfg file on my local machine, but it hasn't helped. I am giving up.

What works for me now, is getting the IP address of the machine with nslookup and logging in with that.


@akostadinov akostadinov commented Aug 11, 2017

@thefourtheye, I'm not sure how many "lines suggested" you see here. Use the post with 50+ likes. Besides the proper option, you also need to use a configuration file that ansible knows about, in your case ~/.ansible.cfg. Try to pay attention to details: the dot in front of a user config file is a common unix convention.
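
To make the "file that ansible knows about" point concrete, here is a simplified sketch of the documented lookup order: the ANSIBLE_CONFIG environment variable, then ansible.cfg in the current directory, then ~/.ansible.cfg, then /etc/ansible/ansible.cfg. The function name `find_ansible_config` is hypothetical and this is not Ansible's own loader, only an approximation of which file would win:

```python
import os


def find_ansible_config(cwd, home, environ=None):
    """Return the first config file that exists, in lookup order, or None."""
    environ = os.environ if environ is None else environ
    candidates = []
    if environ.get("ANSIBLE_CONFIG"):
        candidates.append(environ["ANSIBLE_CONFIG"])
    candidates += [
        os.path.join(cwd, "ansible.cfg"),    # no leading dot in the project dir
        os.path.join(home, ".ansible.cfg"),  # leading dot in the home directory
        "/etc/ansible/ansible.cfg",
    ]
    for candidate in candidates:
        if os.path.isfile(candidate):
            return candidate
    return None


print(find_ansible_config(os.getcwd(), os.path.expanduser("~")))
```

A file named ~/ansible.cfg (without the dot) is only picked up if the command happens to be run from the home directory, which would explain settings that seem to be ignored.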


@thefourtheye thefourtheye commented Aug 11, 2017

@akostadinov I am sorry, that was a typo. This is what it looks like:

➜  ~ cat ~/.ansible.cfg
[ssh_connection]
control_path = %(directory)s/%%h-%%p-%%r

@jayenashar jayenashar commented Aug 11, 2017

I just want to chime in with my .ansible.cfg:

[ssh_connection]
control_path = /tmp/control_%%l_%%h_%%p_%%r

for me, directory was something ridiculously long, the latter part was just the straw that broke the camel's back. Also I have this in my .ssh/config so I can reuse the same connection:

ControlMaster                    auto
ControlPath                      /tmp/control_%l_%h_%p_%r
Contributor

@ssbarnea ssbarnea commented Aug 11, 2017

Sorry but hardcoded tmp is not only not portable but also a serious security risk. For good reasons MacOS does not allow users to write to /tmp and provides isolated (private) tmp folders for each user.

Tmp would work only if you use OS provided tmp path, something like %(tmp)s ... after patching ansible.


@akostadinov akostadinov commented Aug 11, 2017

Guys, please read the existing comments. It's ridiculous for everybody to come and ask the same thing and for somebody to add the same solution again. Use the proper config file and see #11536 (comment).

Somebody, please close the thread to avoid further spam.


@jayenashar jayenashar commented Aug 11, 2017

@ssbarnea hardcoded anything is not portable... that's why it's not the default in ansible... not sure i agree about the security issue or macOS issue since /tmp is sticky and openssh uses a sensible mode (0600) for these files.

regarding the solution using %C that requires a recent openssh...
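
For anyone who wants to check the properties being argued about here, a small sketch that reports whether a directory is world-writable, sticky, or private to its owner. The function name `describe_dir` is made up for illustration; it only inspects mode bits and says nothing about ownership or about what ssh itself does:

```python
import os
import stat
import tempfile


def describe_dir(path):
    """Report the permission bits relevant to placing control sockets in a directory."""
    mode = os.stat(path).st_mode
    return {
        "world_writable": bool(mode & stat.S_IWOTH),
        "sticky": bool(mode & stat.S_ISVTX),
        "private": (stat.S_IMODE(mode) & 0o077) == 0,  # no group/other access at all
    }


print(describe_dir(tempfile.gettempdir()))
```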

Contributor

@ssbarnea ssbarnea commented Aug 11, 2017

I don't really care about ancient ssh versions, especially on the ansible controller. In order to evolve we need to leave a few things behind, and in this case it is not really a big deal because those affected could change their config in order to continue using it.

I think it is essential for the Ansible user experience (UX) to provide defaults that will suit most users, minimizing the need for change. I doubt that we have more than 1-2% of users using versions of openssh that do not support %C.

I think that we need to implement a few critical INI variables in Ansible ASAP, because every other week we encounter bugs that are caused by the lack of them: %(tmpdir)s, %(configdir)s, %(inventorydir)s.

If we had these, people would be able to create reliable relative paths.

Sadly, in my case the problem is even worse: we are using Ansible as part of CI and, like many, we have multiple Jenkins nodes on the same machine running under the same user, so we encountered ssh session hijacking quite often. Anyway, my problem is more complex and outside the scope of this ticket.

Member

@jctanner jctanner commented Aug 11, 2017

I fixed this problem in a generic way for all versions of ssh 6 months ago. If anyone is seeing the problem with Ansible 2.3+, it is because you have set a custom control path in ansible.cfg instead of leaving it blank.

ac78347

https://github.com/ansible/ansible/blob/devel/examples/ansible.cfg#L360-L367

# The path to use for the ControlPath sockets. This defaults to a hashed string of the hostname,
# port and username (empty string in the config). The hash mitigates a common problem users
# found with long hostnames and the conventional %(directory)s/ansible-ssh-%%h-%%p-%%r format.
# In those cases, a "too long for Unix domain socket" ssh error would occur.
#
# Example:
# control_path = %(directory)s/%%h-%%r
#control_path =
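
The idea behind that default can be sketched as follows. This is an illustration only, not the code from the commit referenced above: the function name `hashed_control_path`, the use of SHA-1 and the 10-character truncation are all assumptions made for the example.

```python
import hashlib


def hashed_control_path(directory, host, port, user, digest_len=10):
    """Build a short socket path from a hash of hostname, port and username."""
    key = f"{host}-{port}-{user}".encode("utf-8")
    digest = hashlib.sha1(key).hexdigest()
    return f"{directory}/{digest[:digest_len]}"


path = hashed_control_path(
    "/Users/luke/.ansible/cp",
    "ec2-255-255-255-255.compute-1.amazonaws.com",
    22,
    "ubuntu",
)
print(path, len(path))
```

Because the file name no longer grows with the hostname, the path stays well under the 104-byte limit discussed earlier in the thread, even after ssh appends its random suffix.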

Since this conversation keeps continuing without referencing the patch above, I am going to lock it. If you have further questions about the topic, please use the mailing list.

@ansible ansible locked and limited conversation to collaborators Aug 11, 2017
@ansibot ansibot added feature and removed feature_idea labels Mar 2, 2018