Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

When Running AWX Job SSH Fails #130

Closed
knechtionscoding opened this issue Sep 12, 2017 · 9 comments
Closed

When Running AWX Job SSH Fails #130

knechtionscoding opened this issue Sep 12, 2017 · 9 comments

Comments

@knechtionscoding
Copy link

ISSUE TYPE
  • Bug Report
COMPONENT NAME

Jobs

SUMMARY

After putting a basic YUM Update playbook into AWX, configuring inventory, running the playbook returns a "connection refused failed to connect to new control master.

ENVIRONMENT
STEPS TO REPRODUCE

Add yum job to new AWX install. Run job based on template.

EXPECTED RESULTS

Yum update.

ACTUAL RESULTS

fatal: Unreachable! => failed to connect to the host via ssh: Control Socket Connect: Connect Refused. Failed to connect to new control master.

ADDITIONAL INFORMATION

I've confirmed that the host is reachable via SSH. I've confirmed that AWX is reaching the machine ( I changed the port in SSHD on the target host and I got the correct ssh error.) I've tested the adding:

ssh_args = -C -o ControlMaster=auto -o ControlPersist=60s as a variable.

I'm curious if I am missing something somewhere in my ansible config that allows Docker to SSH properly? SELINUX is turned off.

@matburt
Copy link
Member

matburt commented Sep 12, 2017

I'll be honest, I'm not sure... if you exec into the container can you connect directly from the shell there?

@knechtionscoding
Copy link
Author

Yes. Inside of the aws_task container I can ssh to all the servers correctly.

@matburt
Copy link
Member

matburt commented Sep 12, 2017

That's pretty strange stuff. It all seems to be working here with our default settings. If you can find more information to help us reproduce it then that'd be helpful.

@knechtionscoding
Copy link
Author

I'll try rebuilding on a new machine with the latest version and see if that fixes it. Closing this for now.

@techraf
Copy link

techraf commented Oct 9, 2017

I'm running into the same issue now with a freshly installed AWX.

ssh from the container works ok. AWX tasks fail, ansible from inside the container fails too.

This is a trace of a manual run:

Failed to connect to the host via ssh: OpenSSH_7.4p1, OpenSSL 1.0.2k-fips  26 Jan 2017
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 58: Applying options for *
debug1: auto-mux: Trying existing master
debug1: Stale control socket /root/.ansible/cp/2dcfc37e77, unlinking
debug2: resolving \"10.20.2.21\" port 22
debug2: ssh_connect_direct: needpriv 0
debug1: Connecting to 10.20.2.21 [10.20.2.21] port 22.
debug2: fd 3 setting O_NONBLOCK
debug1: fd 3 clearing O_NONBLOCK
debug1: Connection established.

... manual cut ...

debug1: Enabling compression at level 6.
debug1: Authentication succeeded (publickey).
Authenticated to 10.20.2.21 ([10.20.2.21]:22).
debug1: setting up multiplex master socket
debug3: muxserver_listen: temporary control path /root/.ansible/cp/2dcfc37e77.deaNouoPlmV991XZ
debug2: fd 4 setting O_NONBLOCK
debug3: fd 4 is O_NONBLOCK
debug3: fd 4 is O_NONBLOCK
debug1: channel 0: new [/root/.ansible/cp/2dcfc37e77]
debug3: muxserver_listen: mux listener channel 0 fd 4
debug2: fd 3 setting TCP_NODELAY
debug3: ssh_packet_set_tos: set IP_TOS 0x08
debug1: control_persist_detach: backgrounding master process
debug2: control_persist_detach: background process is 2097
Control socket connect(/root/.ansible/cp/2dcfc37e77): Connection refused
Failed to connect to new control master

AWX uses different path:

Failed to connect to the host via ssh: Control socket connect(/tmp/awx_17_yhnYV5/cp/10.20.2.2122awx): Connection refused
Failed to connect to new control master

@innossh
Copy link

innossh commented Oct 17, 2017

I got the same error with the latest AWX a few days ago. I guess it's the multiplex SSH connections problem in docker container.
As a workaround, I set ssh_args = -C -o ControlMaster=auto -o ControlPersist=60s -o ControlPath=/dev/shm/cp%%h-%%p-%%r on /etc/ansible/ansible.cfg in awx_web/awx_task docker containers, then it was successful to run a playbook on AWX.

@matburt Is there any way to overwrite /etc/ansible/ansible.cfg in the containers by group_vars or such vars?

@techraf
Copy link

techraf commented Oct 22, 2017

@innossh I can see this is coming from the docs, but adding -o ControlPath=/dev/shm/cp%%h-%%p-%%r might result in a disaster. It works for a single job at a time running against a single target, but effectively it creates a socket under the name:

srw-------. 1 root root 0 Oct 22 12:41 cp%h-%p-%r

if two jobs are concurrent, or a play runs against multiple targets, all plays will run on a randomly selected target. As long as credentials permit, nothing stops Ansible from running random plays against non-intended machines.

It either should be -o ControlPath=/dev/shm/cp%h-%p-%r or ssh_args = -C.

@innossh
Copy link

innossh commented Oct 23, 2017

ya, it's stupid mistake. Although it should be double percent sign %% in control_path of ansible.cfg.
So it should be ssh_args = -C -o ControlMaster=auto -o ControlPersist=60s -o ControlPath=/dev/shm/cp%h-%p-%r.

@xzy256
Copy link

xzy256 commented Feb 9, 2022

I got same error -- "Connection refused " when i run command. It's fuzzy, Running a command success in centos8, but ubuntu-20.04 failed, I change ssh_args value refer as @innossh, it was success.
as if it's a bug.
ansible: 2.9.18
awx: 17.1.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants