
"swarm init --force-new-cluster" ignores advertise-addr flag #523

Open
2 tasks done
burner-account opened this issue Dec 11, 2018 · 11 comments

Comments

@burner-account

  • This is a bug report
  • I searched existing issues before opening this one

Expected behavior

If admins follow the instructions on backup/restore (https://docs.docker.com/engine/swarm/admin_guide/), they should be able to transfer old swarm data (secrets, ...) to a new swarm.
When using
docker swarm init --force-new-cluster
to do so, admins should expect other flags (e.g. --advertise-addr) to be honored as well.
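
That is, on the restored machine one would expect the printed join command to use the address passed via --advertise-addr (NEW_IP is a placeholder for the new machine's address):

docker swarm init --force-new-cluster --advertise-addr NEW_IP
# expected output: docker swarm join --token <token> NEW_IP:2377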

Actual behavior

Following the instructions, I was able to restore the secrets of an old swarm to the new one.
Although I set --advertise-addr to the NEW_IP, the swarm initialization returns a join command of the form:
docker swarm join --token long-token-string OLD_IP:2377

Manually changing the IP in the join command allows nodes to join the swarm, but - because the old advertise address is pushed to the nodes - the swarm ends up in a half-working state (see the quick check after the list):

  • mesh routing stops working
  • stack deployments still work as expected
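
A quick way to confirm the stale address on the restored manager (a sketch; "self" is accepted by docker node inspect for the local node):

docker info --format 'node address: {{ .Swarm.NodeAddr }}'
docker node inspect self --format 'advertised: {{ .ManagerStatus.Addr }}'

Here the second command would still show OLD_IP:2377 (matching the Manager Addresses entry in the docker info output below) even though --advertise-addr was set to NEW_IP.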

Steps to reproduce the behavior

1.) in yourCorpNet, network 10.0.1.0/24, on machine 10.0.1.1
2.) init a swarm; no need to join nodes.
3.) store a swarm secret as a marker for the state backup/restore test.
4.) back up the swarm folder, see https://docs.docker.com/engine/swarm/admin_guide/#back-up-the-swarm

5.) in yourCorpNet, network 10.0.2.0/24, on machine 10.0.2.1
6.) stop docker
7.) restore the swarm folder, see https://docs.docker.com/engine/swarm/admin_guide/#restore-from-a-backup
8.) start docker
9.) docker swarm init --force-new-cluster --advertise-addr 10.0.2.1
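
For reference, a minimal sketch of steps 4.) to 9.) as shell commands, assuming the default swarm state directory /var/lib/docker/swarm and a backup archive named swarm-backup.tar.gz (both placeholders; the archive has to be copied to the new machine):

# on 10.0.1.1: back up the swarm state while the daemon is stopped
systemctl stop docker
tar -czf swarm-backup.tar.gz -C /var/lib/docker swarm
systemctl start docker

# on 10.0.2.1: restore the state, then re-init with the new address
systemctl stop docker
rm -rf /var/lib/docker/swarm
tar -xzf swarm-backup.tar.gz -C /var/lib/docker
systemctl start docker
docker swarm init --force-new-cluster --advertise-addr 10.0.2.1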

Output of docker version:

Client:
 Version:           18.09.0
 API version:       1.39
 Go version:        go1.10.4
 Git commit:        4d60db4
 Built:             Wed Nov  7 00:48:22 2018
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          18.09.0
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.4
  Git commit:       4d60db4
  Built:            Wed Nov  7 00:19:08 2018
  OS/Arch:          linux/amd64
  Experimental:     false

Output of docker info:

Server Version: 18.09.0
Storage Driver: overlay2
 Backing Filesystem: xfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: active
 NodeID: rsx29bg80yn126lf9g2p5cvyr
 Is Manager: true
 ClusterID: dqfryv63sxml0imlpc9n46jkd
 Managers: 1
 Nodes: 4
 Default Address Pool: 10.0.0.0/8  
 SubnetSize: 24
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
  Force Rotate: 0
 Autolock Managers: false
 Root Rotation In Progress: false
 Node Address: NEW_IP
 Manager Addresses:
  OLD_IP:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: c4446665cb9c30056f4998ed953e6d4ff22c7c39
runc version: 4fc53a81fb7c994640722ac585fa9ca548971871
init version: fec3683
Security Options:
 seccomp
  Profile: default
Kernel Version: 4.19.7-1.el7.elrepo.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 1.946GiB
Name: ********
ID: ORLV:D6MD:O4H7:JNVQ:AY3A:L5AO:S7FZ:JKYN:NFKF:CA63:4Q3T:SZAD
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false
Product License: Community Engine

@bmedici

bmedici commented Dec 20, 2018

Same problem here: it keeps advertising the 10.14.x.x IP, which is now unreachable, even though I force the right advertise address:

docker swarm init --force-new-cluster --advertise-addr=10.16.83.29

Same without the = sign:

docker swarm init --force-new-cluster --advertise-addr 10.16.83.29

@Mrzhangxd

The same problem

@yunweizhe11

The same problem

@sych74

sych74 commented Aug 3, 2019

This seems related to the unresolved issue moby/moby#34306.

@risyou

risyou commented May 19, 2020

As my workaround, from master node

docker swarm leave -f

Then

docker swarm init --force-new-cluster --advertise-addr 10.x.x.x

@reifnir

reifnir commented Mar 18, 2021

As my workaround, from master node

docker swarm leave -f

Then

docker swarm init --force-new-cluster --advertise-addr 10.x.x.x

That isn't a workaround if you care to keep the state of the swarm (secrets, configs, stacks, services, etc.)

@dehy

dehy commented Apr 25, 2021

I'm having the same problem here. The initial swarm was created with the wrong IP. Services were deployed. I cannot destroy the swarm state for fear of losing something. The advertise-addr flag does not seem to work: the old IP appears to be stored in the raft consensus database and is picked up again on force-new-cluster :(
I cannot add a second manager because the advertised IP is unreachable.

@TheWiresharkGuy

TheWiresharkGuy commented Jul 23, 2021

I'm having the same problem here. The initial swarm was created with the wrong IP. Services were deployed. I cannot destroy the swarm state for fear of losing something. The advertise-addr flag does not seem to work: the old IP appears to be stored in the raft consensus database and is picked up again on force-new-cluster :(
I cannot add a second manager because the advertised IP is unreachable.

Hi, I'm in the same situation, did you manage to solve this? Maybe update the IP address in the raft DB with the docker service stopped?

@reifnir

reifnir commented Jul 24, 2021

I was able to restore the swarm to a functioning, highly available state with different IP addresses (a totally different CIDR range).

I'm almost certain these were all of the steps. The next time I need to do this, I'll test these instructions and write them up more properly. This is to help anyone who's been stuck where I was.

  1. Stand up a single manager node and restore the state onto it. (Calling this Node1)
    • The node will now accept calls on the IP address on eth0 as normal.
    • Every other node will appear as offline.
    • In docker node inspect [itself], it will report that it has the IP address from the node in which the backup was taken.
    • You all know this already, just setting context.
  2. Get another node ready to join the swarm. (Calling this Node2)
  3. On Node2, use iptables to redirect all traffic destined for Node1's old IP address to the new one. It may not be necessary to redirect both 2375 and 2377 (or whatever port you use), but this worked. Ex:
iptables -t nat -A OUTPUT -p tcp -d $OLD_NODE_1_IP --dport 2375 -j DNAT --to-destination $NEW_NODE_1_IP:2375
iptables -t nat -A OUTPUT -p tcp -d $OLD_NODE_1_IP --dport 2377 -j DNAT --to-destination $NEW_NODE_1_IP:2377
  4. Have Node2 join the swarm as a worker.
  5. Wait. You may have to wait 5 or 10 minutes. Just be patient while the manager reports Node2's status as Unknown.
  6. Promote Node2 to manager.
  7. Wait. The swarm state doesn't synchronize immediately. Unless you're sure you know how to detect when a new manager has synced all of its state, just wait 10 minutes (a quick check is sketched after this list).
    • In the past, I'd messed this part up by immediately demoting the old manager and then giving it the boot from the swarm. The new manager had an incomplete view of swarm state.
    • Don't be like Jim. Be patient.
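
One way to check that the new manager has caught up before demoting or removing anything (a sketch; Node2 stands for the hostname used above):

docker node ls
docker node inspect Node2 --format '{{ .ManagerStatus.Reachability }}'

Both managers should show up as Ready in docker node ls, and the second command should print "reachable" for the promoted node.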

You can now join other managers and workers without trouble. Just be sure to get rid of that iptables rule (the removal commands are sketched below), preferably by killing that node entirely once you have 5 other managers.
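
For reference, the DNAT rules from step 3 can be removed on Node2 by repeating them with -D instead of -A (same placeholder variables as above):

iptables -t nat -D OUTPUT -p tcp -d $OLD_NODE_1_IP --dport 2375 -j DNAT --to-destination $NEW_NODE_1_IP:2375
iptables -t nat -D OUTPUT -p tcp -d $OLD_NODE_1_IP --dport 2377 -j DNAT --to-destination $NEW_NODE_1_IP:2377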

HTH

@obsidiangroup

There has been no other fix for this? Having to recreate an entire swarm just to update the advertise address is destructive and time consuming. This problem has been around for ages with no fix. It feels like no work is being done to actually solve this issue because there are "work-arounds", though these are not really viable solutions. We should not have to essentially break the swarm (even more than it already is because of this bug), create a new swarm, and then join all members to it. But right now we have no choice, which will mean some downtime and a late night/early morning.

@paliok2021

Hi, I am facing a problem with moving a swarm to a different data path port (Data Path Port: 9789). We created a new swarm on port 9789, but all nodes still use the old port 4789. We didn't check the actual communication ports and only looked at docker info, where Data Path Port: 9789 looks fine. Now I am trying to add a new server: it connects to a manager configured for port 9789, docker info on the new server shows Data Path Port: 9789, and netstat -plun shows 9789, but all the existing servers in the swarm still work on the default port 4789, which is why the new server cannot communicate with the others. It is very strange and I don't know what went wrong during the migration of the old swarm from port 4789 to the new port 9789. Does anybody have experience with this situation? It was a lot of work to move all services, secrets, configs, etc. to the new swarm, and everything else is the same.
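
Not a fix, but one way to see which VXLAN port the overlay traffic is actually using is to capture on a node while services communicate (eth0 is a placeholder for the data-path interface):

tcpdump -ni eth0 'udp port 4789 or udp port 9789'

If packets only show up on 4789, the existing nodes are still using the old data path port regardless of what docker info reports.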
