This repository has been archived by the owner on Jan 23, 2020. It is now read-only.

Cannot SSH into node after VM restart - no agent container #65

Open
vesylapp opened this issue May 3, 2018 · 3 comments

vesylapp commented May 3, 2018

Expected behavior

The node should be accessible via SSH after a VM restart.

Actual behavior

The node is not accessible via SSH after a VM restart:

swarm-manager000000:~$ ssh swarm-manager000002
ssh: connect to host swarm-manager000002 port 22: Connection refused

Information

  • Full output of "docker-diagnose", run from one of the instances:
swarm-manager000000:~$ docker-diagnose
OK hostname=swarm-manager000000 session=1525372578-WTs5wJ17TPj4xeSY6hyt8strirowLuoR
OK hostname=swarm-manager000001 session=1525372578-WTs5wJ17TPj4xeSY6hyt8strirowLuoR
OK hostname=swarm-manager000002 session=1525372578-WTs5wJ17TPj4xeSY6hyt8strirowLuoR
OK hostname=swarm-worker000000 session=1525372578-WTs5wJ17TPj4xeSY6hyt8strirowLuoR
OK hostname=swarm-worker000001 session=1525372578-WTs5wJ17TPj4xeSY6hyt8strirowLuoR
OK hostname=swarm-worker000002 session=1525372578-WTs5wJ17TPj4xeSY6hyt8strirowLuoR
Done requesting diagnostics.
Your diagnostics session ID is 1525372578-WTs5wJ17TPj4xeSY6hyt8strirowLuoR
Please provide this session ID to the maintainer debugging your issue.
  • After the node restart, the agent container is not running.

Steps to reproduce the behavior

  1. Go to https://docs.docker.com/docker-for-azure/
  2. Create a swarm (stable channel)
  3. Attempt to SSH into one of the nodes - works OK
  4. Restart that node VM from the Azure portal
  5. Attempt to SSH into the restarted node - fails

FrenchBen (Contributor) commented:

@FSLDev Can you look at the boot logs from the VM? Is there any information there that helps? Did the machine join the cluster? If so, you can always target that machine and deploy another SSH container that you can use.

vesylapp (Author) commented May 7, 2018

@FrenchBen

did the machine join the cluster?

Yes.

If so, you can always target that machine and deploy another ssh container, that you can use.

I tried several times but was unable to get another agent container to run correctly. Do you have a docker run incantation that works? I can't find any documentation on how to properly start the agent container.

Without setting a number of binds and/or volumes, the container just exits abnormally. I tried duplicating the environment variables and binds/volumes of a working agent container; the result appears to run somewhat correctly (sshd starts) but still does not accept incoming SSH connections, for reasons I have not been able to determine.

Any information there that helps?

Before the restart, the boot log is 2704 lines long; after the restart, it stops at line 463. The post-restart log also contains an error that did not appear before the restart: /lib/rc/sh/openrc-run.sh: line 250: can't create /sys/fs/cgroup/openrc/diagnostics-server/tasks: nonexistent directory

Here is the end of the boot log after the restart:

* Starting DHCP Client Daemon ... [ ok ]
/lib/rc/sh/openrc-run.sh: line 250: can't create /sys/fs/cgroup/openrc/diagnostics-server/tasks: nonexistent directory
 * Starting diagnostics server ... [ ok ]
 * Starting networking ... *   lo ... [ ok ]
 * Initializing random number generator ... [ ok ]
 * Starting busybox acpid ... [ ok ]
 * Running system containerd ... [ ok ]
 * Running system containers ... * [ ok ]
 * Configuring host settings from database ... [ ok ]
 * Starting Docker ...   
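To make the before/after boot log comparison described above repeatable, a small shell function can report both line counts, the lines that only appear after the restart, and the last line the post-restart boot reached. This is a minimal sketch, assuming both boot logs have been copied to local plain-text files; the function name compare_boot_logs and the file paths are hypothetical and not part of Docker for Azure.

```shell
#!/bin/sh
# Hypothetical helper: compare a boot log captured before a VM restart with one
# captured after it. Assumes both logs are saved as local text files.
compare_boot_logs() {
  before="$1"
  after="$2"
  # Line counts (tr strips the padding some wc implementations add).
  echo "before: $(wc -l < "$before" | tr -d ' ')"
  echo "after: $(wc -l < "$after" | tr -d ' ')"
  # Lines present only in the post-restart log (whole-line, fixed-string match).
  echo "new lines:"
  grep -vxF -f "$before" "$after"
  # Last line the post-restart boot reached.
  echo "last line: $(tail -n 1 "$after")"
}
```

For example, compare_boot_logs boot-before.log boot-after.log would be expected to list the cgroup error among the new lines and show the truncated "Starting Docker" line as the last one reached.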

vesylapp (Author) commented:

Any update on this?
