docker -H tcp://<swarm-ip>:<swarm-port> info only sees 1 node whereas swarm list sees whole cluster #1467

Closed
barbarello opened this Issue Nov 30, 2015 · 10 comments

@barbarello

barbarello commented Nov 30, 2015

I've set up a 3-node cluster, with each node running docker, swarm, and consul. All nodes are identical in that they run the same versions of the docker, swarm, and consul stack, including the same base ubuntu image:

  1. Here are the join cmds on each node:
    $docker run -d --name swarm-node-0 swarm join --addr=10.128.1.65:2375 consul://10.128.1.65:8500/swarm
    $docker run -d --name swarm-node-1 swarm join --addr=10.128.2.104:2375 consul://10.128.2.104:8500/swarm
    $docker run -d --name swarm-node-2 swarm join --addr=10.128.3.174:2375 consul://10.128.3.174:8500/swarm
  2. Here is the swarm manager cmd with no replicas:
    $docker run -d -p 8333:2375 --name swarm-manager swarm manage consul://10.128.1.65:8500/swarm
  3. Here is the docker/swarm/go stack version (same on all 3 nodes):
    $docker -H tcp://10.128.1.65:2375 version
    Client:
    Version: 1.9.1
    API version: 1.21
    Go version: go1.4.2
    Git commit: a34a1d5
    Built: Fri Nov 20 13:12:04 UTC 2015
    OS/Arch: linux/amd64
    Server:
    Version: 1.9.1
    API version: 1.21
    Go version: go1.4.2
    Git commit: a34a1d5
    Built: Fri Nov 20 13:12:04 UTC 2015
    OS/Arch: linux/amd64
  4. Cluster members as reported with swarm list:
    $docker -H tcp://10.128.1.65:8333 run --rm swarm -l debug list consul://10.128.1.65:8500/swarm
    time="2015-11-30T17:01:06Z" level=debug msg="Initializing discovery service" name=consul uri="10.128.1.65:8500/swarm"
    time="2015-11-30T17:01:06Z" level=debug msg="Initializing discovery without TLS"
    time="2015-11-30T17:01:06Z" level=debug msg="Watch triggered with 3 nodes" discovery=consul
    10.128.1.65:2375
    10.128.2.104:2375
    10.128.3.174:2375

    $docker -H tcp://10.128.2.104:2375 run --rm swarm -l debug list consul://10.128.2.104:8500/swarm
    time="2015-11-30T17:02:45Z" level=debug msg="Initializing discovery service" name=consul uri="10.128.2.104:8500/swarm"
    time="2015-11-30T17:02:45Z" level=debug msg="Initializing discovery without TLS"
    time="2015-11-30T17:02:45Z" level=debug msg="Watch triggered with 3 nodes" discovery=consul
    10.128.1.65:2375
    10.128.2.104:2375
    10.128.3.174:2375

    $docker -H tcp://10.128.3.174:2375 run --rm swarm -l debug list consul://10.128.3.174:8500/swarm
    time="2015-11-30T17:03:37Z" level=debug msg="Initializing discovery service" name=consul uri="10.128.3.174:8500/swarm"
    time="2015-11-30T17:03:37Z" level=debug msg="Initializing discovery without TLS"
    time="2015-11-30T17:03:37Z" level=debug msg="Watch triggered with 3 nodes" discovery=consul
    10.128.1.65:2375
    10.128.2.104:2375
    10.128.3.174:2375
  5. Consul cluster is also a 3 node cluster:
    consul members
    Node Address Status Type Build Protocol DC
    ip-10-128-1-65 10.128.1.65:8301 alive server 0.5.2 2 lab
    ip-10-128-2-104 10.128.2.104:8301 alive server 0.5.2 2 lab
    ip-10-128-3-174 10.128.3.174:8301 alive server 0.5.2 2 lab
  6. Consul KV query (against the consul leader):
    curl http://10.128.1.65:8500/v1/kv/swarm?recurse | python -m json.tool
    [
    {
    "CreateIndex": 11,
    "Flags": 0,
    "Key": "swarm/docker/swarm/nodes/10.128.1.65:2375",
    "LockIndex": 1,
    "ModifyIndex": 241,
    "Session": "749058b0-2167-0173-61cb-21c87b23a768",
    "Value": "MTAuMTI4LjEuNjU6MjM3NQ=="
    },
    {
    "CreateIndex": 18,
    "Flags": 0,
    "Key": "swarm/docker/swarm/nodes/10.128.2.104:2375",
    "LockIndex": 1,
    "ModifyIndex": 242,
    "Session": "4990f682-3bb5-491b-f564-46f1a301d765",
    "Value": "MTAuMTI4LjIuMTA0OjIzNzU="
    },
    {
    "CreateIndex": 14,
    "Flags": 0,
    "Key": "swarm/docker/swarm/nodes/10.128.3.174:2375",
    "LockIndex": 1,
    "ModifyIndex": 240,
    "Session": "5f36ae30-ecf5-3dc9-b714-e19062289fea",
    "Value": "MTAuMTI4LjMuMTc0OjIzNzU="
    },
    {
    "CreateIndex": 20,
    "Flags": 0,
    "Key": "swarm/docker/swarm/nodes",
    "LockIndex": 0,
    "ModifyIndex": 20,
    "Value": null
    }
    ]
  7. The swarm manager can contact docker on other nodes and itself at port 2375 as shown below:

$docker -H tcp://10.128.1.65:2375 info
Containers: 4
Images: 15
Server Version: 1.9.1
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 23
Dirperm1 Supported: false
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 3.13.0-68-generic
Operating System: Ubuntu 14.04.3 LTS
CPUs: 1
Total Memory: 992.5 MiB

$docker -H tcp://10.128.2.104:2375 info
Containers: 3
Images: 15
Server Version: 1.9.1
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 21
Dirperm1 Supported: false
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 3.13.0-68-generic
Operating System: Ubuntu 14.04.3 LTS
CPUs: 1
Total Memory: 992.5 MiB

$docker -H tcp://10.128.3.174:2375 info
Containers: 3
Images: 15
Server Version: 1.9.1
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 21
Dirperm1 Supported: false
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 3.13.0-68-generic
Operating System: Ubuntu 14.04.3 LTS
CPUs: 1
Total Memory: 992.5 MiB


Issue:
However, when querying the swarm manager with docker -H tcp://<swarm-ip>:<swarm-port> info, I get the following:

$docker -H tcp://10.128.1.65:8333 info
Containers: 3
Images: 3
Role: primary
Strategy: spread
Filters: health, port, dependency, affinity, constraint
Nodes: 1
ip-10-128-2-104: 10.128.2.104:2375
└ Containers: 3
└ Reserved CPUs: 0 / 1
└ Reserved Memory: 0 B / 1.018 GiB
└ Labels: executiondriver=native-0.2, kernelversion=3.13.0-68-generic, operatingsystem=Ubuntu 14.04.3 LTS, storagedriver=aufs
CPUs: 1
Total Memory: 1.018 GiB
Name: 163af7e872a2

I have tried this countless times, but the swarm manager ALWAYS lists ONLY the same node (10.128.2.104). It fails to list itself (10.128.1.65) and node 10.128.3.174.

Any insight appreciated.

@abronan

Contributor

abronan commented Nov 30, 2015

All nodes are identical

Hi @barbarello, are you cloning your VMs to set up your nodes? My first guess is that your docker daemons have the same unique ID because of the cloning, so the Manager will only see one Agent, since it assumes they are all the same underlying docker Engine.
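
A quick way to compare the engine IDs across all three nodes (an illustrative sketch, not part of the original thread; it assumes the node addresses above and daemons reachable on tcp 2375 without TLS):

for host in 10.128.1.65 10.128.2.104 10.128.3.174; do
  # every engine should report a distinct ID; identical IDs indicate cloned daemons
  echo "$host: $(docker -H tcp://$host:2375 info 2>/dev/null | grep ID)"
done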

@barbarello

barbarello commented Nov 30, 2015

@abronan, thanks for your prompt reply.

I'm using the same Packer-built Ubuntu AMI for all 3 nodes; docker is pre-installed in the AMI.

@barbarello

barbarello commented Nov 30, 2015

@abronan

docker -H tcp://10.128.1.65:2375 info | grep ID
docker -H tcp://10.128.2.104:2375 info | grep ID
docker -H tcp://10.128.3.174:2375 info | grep ID

all return the same ID: DSPS:LTON:S4ZH:5VJY:PU6A:MUMO:5KQT:LPMU:VN7G:PZVE:Q6ZX:E3QF

If this is what you're referring to, then I will remove docker from the base image and do a manual install of docker on each node so that the daemons have their own unique IDs.

Will report back here after that.

@MHBauer

Member

MHBauer commented Nov 30, 2015

Wait for someone else to confirm, but I think you can achieve the same thing by finding key.json, deleting the file, and restarting the docker daemon service.
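
For reference, on these Ubuntu 14.04 nodes that would look roughly like the sketch below (it assumes the default /etc/docker/key.json location for the engine's identity key and the stock upstart service name; adjust for other distros):

# run on each cloned node
sudo service docker stop
sudo rm /etc/docker/key.json   # a new identity key (and thus a new engine ID) is generated on the next start
sudo service docker start

# afterwards, each node should report a different ID, e.g.:
docker -H tcp://10.128.1.65:2375 info | grep ID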

@Farjad

Farjad commented Nov 30, 2015

Yes, what @MHBauer said should fix your issue.

@barbarello

barbarello commented Nov 30, 2015

@abronan
Removing docker from the base AMI and doing a manual docker install on each node, plus a reboot, did NOT solve the issue. The ID is still the same.

Now trying @MHBauer's suggestion.

@Farjad

Farjad commented Nov 30, 2015

@barbarello

It will continue to use the ID specified in key.json until it is deleted. I don't think reinstalling changes it.

@abronan

Contributor

abronan commented Nov 30, 2015

@barbarello Yes, because it will keep the same key.json file. The file with the ID should be explicitly removed before attempting a reinstall. Sorry for the confusion. This should work fine with @MHBauer's suggestion.

Tagging as a docs issue; we should put more emphasis on this aspect and properly warn users.

@barbarello

barbarello commented Nov 30, 2015

Hi @MHBauer, your fix worked. Thanks very much.

@abronan, thank you for updating the public-facing docs.

@aluzzardi

Contributor

aluzzardi commented Dec 3, 2015

Thanks for reporting the issue, @barbarello.

We are going to fix this ASAP in the docs (#1472) but longer term we are going to make the problem more obvious by reporting it in docker info, as explained in #1486.

@aluzzardi aluzzardi closed this Dec 3, 2015
