No route to host / connection refused - swarm mode on ARM #25892

Closed
alexellis opened this issue Aug 19, 2016 · 20 comments

Comments

@alexellis
Contributor

alexellis commented Aug 19, 2016

I've set up a Node / ExpressJS service on port 3000 with a Raspberry Pi Model 2 using the install from get.docker.com. If I run the container with docker run it works OK and returns the text "hello" from curl.

CC/ @ManoMarks @DieterReuter @StefanScherer

Expected: curl -4 localhost:3000 should return hello

Actual: curl: (7) Failed to connect to localhost port 3000: Connection refused

Image built from:

https://github.com/alexellis/arm-alpinehello/

$ docker service create --name hello --publish 3000:3000 --replicas=1 alexellis2/arm-alpinehello:latest
$ docker service ps hello
ID                         NAME     IMAGE                              NODE       DESIRED STATE  CURRENT STATE          ERROR
2m0sodl714p3qowp6hk9zq3nx  hello.1  alexellis2/arm-alpinehello:latest  pi2swarm7  Running        Running 2 minutes ago 
$ docker node ls
ID                           HOSTNAME   STATUS  AVAILABILITY  MANAGER STATUS
207yx66h5q1i3qn4zwqxy49gc    pi2swarm6  Ready   Active        Reachable
6009dx3xhfmyc5qdkjuwlzgd9    pi2swarm5  Ready   Active        
6ftjhnqqmtjkyk7r5trxcrm9m    pi2swarm2  Ready   Active        
7725832n2rsj1e39edrb02nsr *  pi2swarm1  Ready   Active        Leader
ae26fnpuyh9slt9olan725uir    pi2swarm3  Ready   Active        
cdewqt2lzd3trkretleghe2v9    pi2swarm4  Ready   Active        Reachable
evrpb5uq9li9qbw9a28spkyon    pi2swarm7  Ready   Active   
$ docker version
Client:
 Version:      1.12.1
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   23cf638
 Built:        Thu Aug 18 05:31:15 2016
 OS/Arch:      linux/arm

Server:
 Version:      1.12.1
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   23cf638
 Built:        Thu Aug 18 05:31:15 2016
 OS/Arch:      linux/arm
pi@pi2swarm1:~ $ cat /etc/issue
Raspbian GNU/Linux 8 \n \l

pi@pi2swarm1:~ $ uname -a
Linux pi2swarm1 4.4.11-v7+ #888 SMP Mon May 23 20:10:33 BST 2016 armv7l GNU/Linux
pi@pi2swarm1:~ $ 
$ docker info
Containers: 2
 Running: 2
 Paused: 0
 Stopped: 0
Images: 27
Server Version: 1.12.1
Storage Driver: overlay
 Backing Filesystem: extfs
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host null overlay
Swarm: active
 NodeID: 7725832n2rsj1e39edrb02nsr
 Is Manager: true
 ClusterID: bf29ss6elcv9xy866z2pcwymf
 Managers: 3
 Nodes: 7
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
 Node Address: 192.168.0.54
Runtimes: runc
Default Runtime: runc
Security Options:
Kernel Version: 4.4.11-v7+
Operating System: Raspbian GNU/Linux 8 (jessie)
OSType: linux
Architecture: armv7l
CPUs: 4
Total Memory: 925.5 MiB
Name: pi2swarm1
ID: XMTN:LXMA:MUKR:WDLH:AOQ5:QSZR:SOKF:6MT6:KPDW:AKO4:BIDQ:4HHG
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Username: alexellis2
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
WARNING: No kernel memory limit support
WARNING: No cpu cfs quota support
WARNING: No cpu cfs period support
WARNING: No cpuset support
Insecure Registries:
 127.0.0.0/8
@ManoMarks

I was able to reproduce the problem. However, using the wan0 address I was able to access the service.
pi@raspberrypi:~ $ curl 192.168.86.104:3000
Hellopi@raspberrypi:~ $

When I originally init'd the swarm, the join message it printed used the wan0 address:
docker swarm join \
--token sometoken \
192.168.86.104:2377

@alexellis
Contributor Author

alexellis commented Aug 20, 2016

Thanks for looking into this @ManoMarks. I ran docker swarm leave on the first Pi, which was the swarm leader. I then noted the eth0 address of the new swarm it created. That address gave connection refused initially and then worked for a single host (I guess the container was still starting).

When I join a single worker and then scale to two replicas, I get the error on a round-robin basis. Hmm, am I doing something wrong here?

Summary: a single manager works on its own; when more than one node exists in the swarm, there is a routing error.

pi@pi2swarm1:~ $ docker service scale hello=2
hello scaled to 2
pi@pi2swarm1:~ $ curl -4 http://192.168.0.54:3000
curl: (7) Failed to connect to 192.168.0.54 port 3000: No route to host
pi@pi2swarm1:~ $ curl -4 http://192.168.0.54:3000
Hellopi@pi2swarm1:~ $ curl -4 http://192.168.0.54:3000
curl: (7) Failed to connect to 192.168.0.54 port 3000: No route to host
pi@pi2swarm1:~ $ curl -4 http://192.168.0.54:3000
Hellopi@pi2swarm1:~ $ curl -4 http://192.168.0.54:3000
curl: (7) Failed to connect to 192.168.0.54 port 3000: No route to host
pi@pi2swarm1:~ $ curl -4 http://192.168.0.54:3000
Hellopi@pi2swarm1:~ $ curl -4 http://192.168.0.54:3000
curl: (7) Failed to connect to 192.168.0.54 port 3000: No route to host
pi@pi2swarm1:~ $ curl -4 http://192.168.0.54:3000
Hellopi@pi2swarm1:~ $ curl -4 http://192.168.0.54:3000
curl: (7) Failed to connect to 192.168.0.54 port 3000: No route to host
pi@pi2swarm1:~ $ 
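
The alternating pattern in the transcript above can be counted rather than eyeballed. The following is only a sketch, assuming curl is installed; the function name `probe` and its two arguments are hypothetical and not part of Docker or of the original report.

```shell
# probe URL ATTEMPTS
# Requests URL the given number of times over IPv4 and prints how many
# attempts succeeded and how many failed. "probe" is a hypothetical helper.
probe() {
  url=$1
  attempts=$2
  ok=0
  fail=0
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -s silences progress output, -o /dev/null discards the response body,
    # --max-time bounds each attempt to 5 seconds.
    if curl -4 -s -o /dev/null --max-time 5 "$url"; then
      ok=$((ok + 1))
    else
      fail=$((fail + 1))
    fi
    i=$((i + 1))
  done
  echo "ok=$ok fail=$fail"
}

# Usage, with the manager address from the transcript:
#   probe http://192.168.0.54:3000 10
```

With two replicas and one unreachable task, a roughly even split between ok and fail would be consistent with requests being balanced across both tasks.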

Diagnostics:

pi@pi2swarm1:~ $ docker service ls
ID            NAME   REPLICAS  IMAGE                              COMMAND
7gpmfme3i6fo  hello  2/2       alexellis2/arm-alpinehello:latest  
pi@pi2swarm1:~ $ docker service ps hello
ID                         NAME     IMAGE                              NODE       DESIRED STATE  CURRENT STATE               ERROR
3dekmrc17dp0rpapmffeiqxqi  hello.1  alexellis2/arm-alpinehello:latest  pi2swarm1  Running        Running 6 minutes ago       
7qtx18kyacgiknt6hwokpsnat  hello.2  alexellis2/arm-alpinehello:latest  pi2swarm2  Running        Running about a minute ago  

$ docker service inspect hello
[
    {
        "ID": "7gpmfme3i6fokgnqlgi6jnc8j",
        "Version": {
            "Index": 25
        },
        "CreatedAt": "2016-08-20T07:33:56.317634185Z",
        "UpdatedAt": "2016-08-20T07:38:56.091576739Z",
        "Spec": {
            "Name": "hello",
            "TaskTemplate": {
                "ContainerSpec": {
                    "Image": "alexellis2/arm-alpinehello:latest"
                },
                "Resources": {
                    "Limits": {},
                    "Reservations": {}
                },
                "RestartPolicy": {
                    "Condition": "any",
                    "MaxAttempts": 0
                },
                "Placement": {}
            },
            "Mode": {
                "Replicated": {
                    "Replicas": 2
                }
            },
            "UpdateConfig": {
                "Parallelism": 1,
                "FailureAction": "pause"
            },
            "EndpointSpec": {
                "Mode": "vip",
                "Ports": [
                    {
                        "Protocol": "tcp",
                        "TargetPort": 3000,
                        "PublishedPort": 3000
                    }
                ]
            }
        },
        "Endpoint": {
            "Spec": {
                "Mode": "vip",
                "Ports": [
                    {
                        "Protocol": "tcp",
                        "TargetPort": 3000,
                        "PublishedPort": 3000
                    }
                ]
            },
            "Ports": [
                {
                    "Protocol": "tcp",
                    "TargetPort": 3000,
                    "PublishedPort": 3000
                }
            ],
            "VirtualIPs": [
                {
                    "NetworkID": "79fwvv34c6di92phd5fwid8w4",
                    "Addr": "10.255.0.4/16"
                }
            ]
        },
        "UpdateStatus": {
            "StartedAt": "0001-01-01T00:00:00Z",
            "CompletedAt": "0001-01-01T00:00:00Z"
        }
    }
]

@JohannesBertens

JohannesBertens commented Aug 21, 2016

I am having the same problems, with a single manager in a 3-node swarm - the manager is set to drain.
Connecting to localhost:port only works for the node the container is running on.

@alexellis
Contributor Author

Can you try updating the docker.service file to add a --debug flag? See if you get errors. @DJBnjack
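
One way to add the flag without editing the packaged unit file is a systemd drop-in. This is a sketch under stated assumptions: the daemon binary is assumed to be /usr/bin/dockerd and the packaged command is assumed to be `dockerd -H fd://`, so check `systemctl cat docker.service` before using it. The helper name `write_debug_dropin` and the file name `debug.conf` are hypothetical.

```shell
# write_debug_dropin DIR
# Writes a systemd drop-in into DIR that starts dockerd with --debug.
# Assumption: the original ExecStart is "/usr/bin/dockerd -H fd://".
write_debug_dropin() {
  dropin_dir=$1
  mkdir -p "$dropin_dir" || return 1
  cat > "$dropin_dir/debug.conf" <<'EOF'
[Service]
# An empty ExecStart= clears the packaged command before redefining it.
ExecStart=
ExecStart=/usr/bin/dockerd -H fd:// --debug
EOF
}

# Usage (as root):
#   write_debug_dropin /etc/systemd/system/docker.service.d
#   systemctl daemon-reload && systemctl restart docker
#   journalctl -u docker.service -f
```

Removing the drop-in file and reloading systemd restores the original command.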

@alexellis
Contributor Author

The 1st use-case, as documented above, is broken on Raspbian and Arch Linux but not on Hypriot: start a web service, scale it over more than one node, then try to curl it through the manager. You will get no route to host.

2nd broken use-case (on Raspbian and Hypriot and Arch Linux):

Intercontainer communication:

$ docker network create --driver overlay armnet
$ docker service create --replicas=1 --network=armnet --name redis alexellis2/redis-arm:v6
$ docker service create --name counter --replicas=5 --network=armnet --publish 3000:3000 alexellis2/arm_redis_counter

@ManoMarks

Are you trying Raspbian Jessie Lite and Raspbian Jessie?

@JohannesBertens

JohannesBertens commented Aug 21, 2016

@alexellis - my bad, did not notice this was an open issue for ARM and not in general. I am running docker on x86 machines.

My problem seemed to have been that the services were killed/created too fast: waiting 30 seconds after killing a service before re-creating it seems to be a working work-around for me now.

@ManoMarks

I've been able to reproduce the issue with Raspbian Jessie. A single node works; two or more nodes don't.

@alexellis
Contributor Author

alexellis commented Aug 22, 2016

On the worker and the manager, when I "join" I get an error:

Failed to create testvxlan interface: error creating vxlan interface: operation not supported

Worker:

ERRO[0012] Error getting node eefxmubv8v1ad8bgb5wiib729: This node is not a swarm manager. Worker nodes can't be used to view or modify cluster state. Please run this command on a manager node or promote the current node to a manager. 
ERRO[0012] Handler for GET /v1.24/nodes/eefxmubv8v1ad8bgb5wiib729 returned error: This node is not a swarm manager. Worker nodes can't be used to view or modify cluster state. Please run this command on a manager node or promote the current node to a manager. 
This node joined a swarm as a worker.
pi@pi2swarm2:~ $ DEBU[0012] Assigning addresses for endpoint ingress-endpoint's interface on network ingress 
DEBU[0012] RequestAddress(LocalDefault/10.255.0.0/16, 10.255.0.4, map[]) 
DEBU[0012] Assigning addresses for endpoint ingress-endpoint's interface on network ingress 
ERRO[0012] Failed to create testvxlan interface: error creating vxlan interface: operation not supported 
DEBU[0012] checkEncryption(c5nmn7f, 192.168.0.54, 256, false) 
peerdbupdate in sandbox failed for ip 10.255.0.3 and mac 02:42:0a:ff:00:03: could not add neighbor entry into the sandbox: could not find the interface with name vx-000100-c5nmn
DEBU[0012] checkEncryption(c5nmn7f, <nil>, 256, true)
INFO[0000] Firewalld running: false                     
DEBU[0013] Assigning addresses for endpoint gateway_ingress-sbox's interface on network docker_gwbridge 
DEBU[0013] RequestAddress(LocalDefault/172.18.0.0/16, <nil>, map[]) 
DEBU[0013] Assigning addresses for endpoint gateway_ingress-sbox's interface on network docker_gwbridge 
DEBU[0013] Programming external connectivity on endpoint gateway_ingress-sbox (ff9eaa303e24f3683dd444e3ed9ae15d46a0ee7a34324aef708e8eee289c9524) 
DEBU[0024] 2016/08/22 07:58:40 [DEBUG] memberlist: TCP connection from=192.168.0.54:36420

DEBU[0024] pi2swarm2: Initiating bulk sync for networks [c5nmn7f23qrocnja87xs2lzj2] with node pi2swarm1 
DEBU[0034] 2016/08/22 07:58:50 [DEBUG] memberlist: TCP connection from=192.168.0.54:36422

Manager:

=(*Server).updateCluster module=ca
DEBU[0011] Assigning addresses for endpoint ingress-endpoint's interface on network ingress 
DEBU[0011] RequestAddress(LocalDefault/10.255.0.0/16, 10.255.0.3, map[]) 
DEBU[0011] Assigning addresses for endpoint ingress-endpoint's interface on network ingress 
ERRO[0011] Failed to create testvxlan interface: error creating vxlan interface: operation not supported 
DEBU[0012] checkEncryption(c5nmn7f, <nil>, 256, true)   

@justincormack
Contributor

@alexellis looks like your kernel might not have vxlan support? The check-config script might help diagnose https://raw.githubusercontent.com/docker/docker/master/contrib/check-config.sh

@DieterReuter
Contributor

DieterReuter commented Aug 22, 2016

@justincormack you are absolutely right, this is a standard Raspbian Jessie OS and this kernel did not have all the recommended Docker settings included, only the mandatory ones. In this case not all the features are working, e.g. VXLAN is not included for sure!

@alexellis
Contributor Author

@justincormack That is presumably going to stop swarm mode from functioning fully?

@DieterReuter even with HypriotOS (after reflashing 3x Model 3 Pis) the third test scenario did not work for me. If you could find time to repro it, that would be appreciated.

I've summarised the basic test scenarios I wanted to see working to be able to launch Mano's swarm visualizer tool: https://github.com/alexellis/swarmmode-tests/tree/master/arm

@alexellis
Contributor Author

@justincormack

warning: /proc/config.gz does not exist, searching other paths for kernel config ...
error: cannot find kernel config
  try running this script again, specifying the kernel config:
    CONFIG=/path/to/kernel/.config bash or bash /path/to/kernel/.config

@justincormack
Contributor

justincormack commented Aug 22, 2016

@alexellis not sure where the kernel config is (I wish people would configure /proc/config.gz!). modprobe vxlan should indicate if you have it, unless it is built in (unlikely).

I think this may be the kernel config being used, maybe someone could verify https://github.com/raspberrypi/linux/blob/rpi-4.4.y/arch/arm/configs/bcm2709_defconfig
or this
https://github.com/raspberrypi/linux/blob/rpi-4.4.y/arch/arm/configs/bcmrpi_defconfig

@popcornmix

@justincormack

sudo modprobe configs
zcat /proc/config.gz

Your link to bcm2709_defconfig is correct.
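
Once the config is readable, the VXLAN setting can be picked out of it directly. A minimal sketch, assuming the standard kernel option name `CONFIG_VXLAN`; the function name `vxlan_status` is hypothetical.

```shell
# Reads kernel config text on stdin and reports how VXLAN is configured.
# Prints "built-in" for =y, "module" for =m, and "missing" otherwise
# (including the "# CONFIG_VXLAN is not set" form).
vxlan_status() {
  case "$(grep -E '^CONFIG_VXLAN=' | head -n 1)" in
    CONFIG_VXLAN=y) echo "built-in" ;;
    CONFIG_VXLAN=m) echo "module" ;;
    *) echo "missing" ;;
  esac
}

# Usage on a Pi, following the two commands above:
#   sudo modprobe configs
#   zcat /proc/config.gz | vxlan_status
```

A result of "missing" would match the "operation not supported" error reported earlier when the daemon tried to create a vxlan interface.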

@justincormack
Contributor

The vxlan module has been included in the 4.4.19 kernel update.
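
To tell whether a given Pi already runs a kernel at or above that release, the running version can be compared against 4.4.19. This is a sketch only: it assumes GNU sort with the -V option and a release string shaped like the `4.4.11-v7+` shown earlier in this thread; `version_ge` and `kernel_base` are hypothetical helper names.

```shell
# Succeeds when version $1 is greater than or equal to version $2.
# Relies on the -V (version sort) option of GNU sort.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Strips the local suffix from a kernel release, e.g. "4.4.11-v7+" -> "4.4.11".
kernel_base() {
  printf '%s\n' "${1%%-*}"
}

# Usage on a Pi:
#   if version_ge "$(kernel_base "$(uname -r)")" 4.4.19; then
#     echo "kernel is at or above 4.4.19"
#   else
#     echo "kernel predates 4.4.19, update it first"
#   fi
```

The version number alone does not prove the module is present, so running modprobe vxlan afterwards is still a useful check.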

@ManoMarks

I tested it using @alexellis 's test suite and it worked for me after running rpi-update and rebooting. Thanks @justincormack

@justincormack
Contributor

I am going to close this now that it is resolved. Let us know if anyone has any more issues.

@alexellis
Contributor Author

It would be ideal if this worked on Arch Linux for ARM, because it has AUFS baked into the kernel and anecdotally appears to perform much faster than Overlay on Raspbian. I'll try to see what is missing; it may be vxlan and/or other things.

@MathiasRenner

@DieterReuter even with HypriotOS (after reflashing 3x Model 3 Pis) the third test scenario did not work for me. If you could find time to repro it, that would be appreciated.

@alexellis I confirm that this problem does not exist with HypriotOS v. 1.0.1 (it might have been an issue with previous v. 1.0.0)
