IP address + exported port instead of endpoint from marathon #1243

lcottereau · 2017-03-07T10:52:55Z

What version of Traefik are you using (`traefik version`)?

    Version: v1.2.0-rc2
    Codename: morbier
    Go version: go1.7.5
    Built: 2017-03-01_01:13:09PM
    OS/Arch: linux/amd64

What is your environment & configuration (arguments, toml...)?

Linux RHEL 7.2

My configuration file:

    debug = true
    accessLogsFile = "log/access.log"
    logLevel = "INFO"
    defaultEntryPoints = ["http"]
    [entryPoints]
       [entryPoints.http]
       address = "infra-q-i-mes01:8008"
    [web]
    address = ":8009"
    [web.statistics]
       RecentErrors = 10
    [marathon]
    endpoint = "http://infra-q-i-mes01:8080"
    watch = true
    domain = "infra-q-i-mes01"

The deployed app details in Marathon:

    Host: infra-q-i-mes01
    IP Addresses: 10.16.0.3
    Ports: [31855]
    Endpoints: infra-q-i-mes01:31855
    Service Discovery: n/a
    Status: Started
    Staged at: 03/03/2017 at 18:26:26
    Started at: 03/03/2017 at 18:26:27
    Version: 2017-03-03T17:13:55.405Z
    Health: Healthy
    Mesos details: link

The deployed app configuration (note the two Traefik labels):

    {
      "id": "trace",
      "cmd": null,
      "cpus": 1,
      "mem": 512,
      "disk": 0,
      "instances": 1,
      "container": {
        "docker": {
          "image": "sysadm-reg/assemblage/trace:1.4-20170303.164750-23",
          "network": "BRIDGE",
          "parameters": [],
          "portMappings": [
            {
              "containerPort": 8080,
              "protocol": "tcp",
              "name": "tomcat",
              "labels": null
            }
          ],
          "forcePullImage": true
        },
        "type": "DOCKER",
        "volumes": []
      },
      "env": {},
      "healthChecks": [
        {
          "protocol": "HTTP",
          "path": "/trace/api/health"
        }
      ],
      "labels": {
        "traefik.frontend.rule": "PathPrefix:/trace",
        "traefik.backend.loadbalancer.sticky": "true"
      },
      "uris": [ "file:///etc/catsa-anonymous-puller.credentials.tar.gz" ],
      "upgradeStrategy": {
        "minimumHealthCapacity": 0,
        "maximumOverCapacity": 0
      }
    }

What did you do?

I tried to access my application through Traefik at the URL http://infra-q-i-mes01:8008/trace/.

What did you expect to see?

I expected to see the login page of my application, trace.

What did you see instead?

I got an HTTP 502 Bad Gateway error.

Just to confirm: the IP address of the trace container, 10.16.0.3, is indeed not routable. So the problem seems to come from Traefik using the IP address provided by Marathon instead of the endpoint. Is that expected (in which case, do you know a way to configure Marathon to provide the IP address of the Docker host), or is it a bug or a configuration issue with Traefik? In any case, it seems inconsistent to me, because the IP address is that of the container while the port is the exported port (that is, on the Docker host).

The Traefik log:

    INFO[2017-03-07T11:41:34+01:00] Traefik version v1.2.0-rc2 built on 2017-03-01_01:13:09PM 
    INFO[2017-03-07T11:41:34+01:00] Using TOML configuration file /root/traefik/conf.toml 
    DEBU[2017-03-07T11:41:34+01:00] Global configuration loaded   {"GraceTimeOut":10,"Debug":true,"CheckNewVersion":true,"AccessLogsFile":"log/access.log","TraefikLogsFile":"","LogLevel":"DEBUG","EntryPoints":{"http":{"Network":"","Address":"infra-q-i-mes01:8008","TLS":null,"Redirect":null,"Auth":null,"Compress":false}},"Cluster":null,"Constraints":[],"ACME":null,"DefaultEntryPoints":["http"],"ProvidersThrottleDuration":2000000000,"MaxIdleConnsPerHost":200,"InsecureSkipVerify":false,"Retry":null,"Docker":null,"File":null,"Web":{"Address":":8009","CertFile":"","KeyFile":"","ReadOnly":false,"Statistics":{"RecentErrors":10},"Metrics":null,"Auth":null},"Marathon":{"Watch":true,"Filename":"","Constraints":[],"Endpoint":"http://infra-q-i-mes01:8080","Domain":"infra-q-i-mes01","ExposedByDefault":true,"GroupsAsSubDomains":false,"DCOSToken":"","MarathonLBCompatibility":false,"TLS":null,"DialerTimeout":60,"KeepAlive":10,"Basic":null},"Consul":null,"ConsulCatalog":null,"Etcd":null,"Zookeeper":null,"Boltdb":null,"Kubernetes":null,"Mesos":null,"Eureka":null,"ECS":null,"Rancher":null} 
    INFO[2017-03-07T11:41:34+01:00] Preparing server http &{Network: Address:infra-q-i-mes01:8008 TLS:<nil> Redirect:<nil> Auth:<nil> Compress:false} 
    INFO[2017-03-07T11:41:34+01:00] Starting provider *provider.Marathon {"Watch":true,"Filename":"","Constraints":[],"Endpoint":"http://infra-q-i-mes01:8080","Domain":"infra-q-i-mes01","ExposedByDefault":true,"GroupsAsSubDomains":false,"DCOSToken":"","MarathonLBCompatibility":false,"TLS":null,"DialerTimeout":60,"KeepAlive":10,"Basic":null} 
    INFO[2017-03-07T11:41:34+01:00] Starting provider *main.WebProvider {"Address":":8009","CertFile":"","KeyFile":"","ReadOnly":false,"Statistics":{"RecentErrors":10},"Metrics":null,"Auth":null} 
    INFO[2017-03-07T11:41:34+01:00] 0s                                           
    INFO[2017-03-07T11:41:34+01:00] Starting server on infra-q-i-mes01:8008      
    WARN[2017-03-07T11:41:34+01:00] clientTLS is nil         
    DEBU[2017-03-07T11:41:34+01:00] Creating frontend frontend-trace             
    DEBU[2017-03-07T11:41:34+01:00] Wiring frontend frontend-trace to entryPoint http 
    DEBU[2017-03-07T11:41:34+01:00] Creating route route-host-trace PathPrefix:/trace 
    DEBU[2017-03-07T11:41:34+01:00] Creating backend backend-trace               
    DEBU[2017-03-07T11:41:34+01:00] Creating load-balancer wrr                   
    DEBU[2017-03-07T11:41:34+01:00] Sticky session with cookie _TRAEFIK_BACKEND  
    DEBU[2017-03-07T11:41:34+01:00] Creating server server-trace-8790b0ed-0036-11e7-8a86-0242c8fc4f18 at http://10.16.0.3:31855 with weight 0 
    INFO[2017-03-07T11:41:34+01:00] Server configuration reloaded on infra-q-i-mes01:8008 
    WARN[2017-03-07T11:45:09+01:00] Error forwarding to http://10.16.0.3:31855, err: dial tcp 10.16.0.3:31855: getsockopt: connection refused


timoreimann · 2017-03-07T12:28:29Z

Traefik uses the IP addresses of the application's tasks, as given by each task's ipAddresses field. What does that field contain in your case? Can you paste the output of the /task endpoint for your application?

Are you using a real Marathon cluster, or possibly some reduced, local version?
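To make the mismatch concrete, here is a minimal Go sketch, assuming a trimmed-down version of the task data shown in this issue. The `marathonTask` struct and `backendURLFromTask` function are hypothetical: they only mimic what the Traefik log above shows (the first task IP combined with the first task port, which is a host port), not Traefik's actual code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// marathonTask is a hypothetical, trimmed-down view of one entry in a
// Marathon app's "tasks" array; only the fields needed here are decoded.
type marathonTask struct {
	Host        string `json:"host"`
	Ports       []int  `json:"ports"`
	IPAddresses []struct {
		IPAddress string `json:"ipAddress"`
		Protocol  string `json:"protocol"`
	} `json:"ipAddresses"`
}

// backendURLFromTask pairs the first task IP address with the first task
// (host) port, reproducing the backend URL seen in the Traefik log.
func backendURLFromTask(raw []byte) (string, error) {
	var task marathonTask
	if err := json.Unmarshal(raw, &task); err != nil {
		return "", err
	}
	if len(task.IPAddresses) == 0 || len(task.Ports) == 0 {
		return "", fmt.Errorf("task has no IP address or no port")
	}
	return fmt.Sprintf("http://%s:%d", task.IPAddresses[0].IPAddress, task.Ports[0]), nil
}

func main() {
	raw := []byte(`{
		"host": "infra-q-i-mes01",
		"ports": [31855],
		"ipAddresses": [{"ipAddress": "10.16.0.3", "protocol": "IPv4"}]
	}`)
	backendURL, err := backendURLFromTask(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(backendURL) // http://10.16.0.3:31855
}
```

The task's own `host` field is decoded but never used, which is exactly the problem being reported.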

lcottereau · 2017-03-07T12:48:52Z

Here is the output of the /task/trace endpoint:

    {
      "app": {
        "id":"/trace",
        "cmd": null,
        "args": null,
        "user": null,
        "env": {},
        "instances": 1,
        "cpus": 1,
        "mem": 512,
        "disk": 0,
        "gpus": 0,
        "executor": "",
        "constraints": [],
        "uris": ["file:///etc/catsa-anonymous-puller.credentials.tar.gz"],
        "fetch":[
          {
            "uri": "file:///etc/catsa-anonymous-puller.credentials.tar.gz",
            "extract": true,
            "executable": false,
            "cache": false
          }
        ],
        "storeUrls": [],
        "backoffSeconds": 1,
        "backoffFactor": 1.15,
        "maxLaunchDelaySeconds": 3600,
        "container": {
          "type": "DOCKER",
          "volumes": [],
          "docker": {
            "image": "sysadm-reg/assemblage/trace:1.4-20170303.164750-23",
            "network":"BRIDGE",
            "portMappings": [
              {
                "containerPort": 8080,
                "hostPort": 0,
                "servicePort": 10000,
                "protocol": "tcp",
                "name": "tomcat",
                "labels": {}
              }
            ],
            "privileged": false,
            "parameters": [],
            "forcePullImage": true
          }
        },
        "healthChecks": [
          {
            "path": "/trace/api/health",
            "protocol": "HTTP",
            "portIndex": 0,
            "gracePeriodSeconds": 300,
            "intervalSeconds": 60,
            "timeoutSeconds": 20,
            "maxConsecutiveFailures": 3,
            "ignoreHttp1xx": false
          }
        ],
        "readinessChecks": [],
        "dependencies": [],
        "upgradeStrategy": {
          "minimumHealthCapacity": 0,
          "maximumOverCapacity": 0
        },
        "labels": {
          "traefik.frontend.rule": "PathPrefix:/trace",
          "traefik.backend.loadbalancer.sticky": "true"
        },
        "acceptedResourceRoles": null,
        "ipAddress": null,
        "version": "2017-03-03T17:13:55.405Z",
        "residency": null,
        "secrets": {},
        "taskKillGracePeriodSeconds": null,
        "ports": [10000],
        "portDefinitions": [
          {
            "port": 10000,
            "protocol": "tcp",
            "labels": {}
          }
        ],
        "requirePorts": false,
        "versionInfo": {
          "lastScalingAt": "2017-03-03T17:13:55.405Z",
          "lastConfigChangeAt": "2017-03-03T17:13:55.405Z"
        },
        "tasksStaged": 0,
        "tasksRunning": 1,
        "tasksHealthy": 1,
        "tasksUnhealthy": 0,
        "deployments": [],
        "tasks": [
          {
            "id": "trace.8790b0ed-0036-11e7-8a86-0242c8fc4f18",
            "slaveId": "546d7d9b-b7de-4745-8eb8-3c2993b7b300-S0",
            "host": "infra-q-i-mes01",
            "state": "TASK_RUNNING",
            "startedAt": "2017-03-03T17:26:27.815Z",
            "stagedAt": "2017-03-03T17:26:26.074Z",
            "ports": [31855],
            "version": "2017-03-03T17:13:55.405Z",
            "ipAddresses": [
              {
                "ipAddress": "10.16.0.3",
                "protocol": "IPv4"
              }
            ],
            "appId": "/trace",
            "healthCheckResults": [
              {
                "alive": true,
                "consecutiveFailures": 0,
                "firstSuccess": "2017-03-03T17:27:07.040Z",
                "lastFailure": null,
                "lastSuccess": "2017-03-07T12:32:50.072Z",
                "lastFailureCause": null,
                "taskId": "trace.8790b0ed-0036-11e7-8a86-0242c8fc4f18"
              }
            ]
          }
        ],
        "lastTaskFailure": {
          "appId": "/trace",
          "host": "infra-q-i-mes01",
          "message": "Task was killed since health check failed. Reason: AskTimeoutException: Ask timed out on [Actor[akka://marathon/user/IO-HTTP#-1431489078]] after [20000 ms]. Sender[null] sent message of type \"spray.http.HttpRequest\".",
          "state": "TASK_KILLED",
          "taskId": "trace.c93df68c-0034-11e7-8a86-0242c8fc4f18",
          "timestamp": "2017-03-03T17:25:20.734Z",
          "version": "2017-03-03T17:13:55.405Z",
          "slaveId":"546d7d9b-b7de-4745-8eb8-3c2993b7b300-S0"
        }
      }
    }

For now, there is a single Marathon server with a single slave. Traefik and the two Marathon processes all run on the same machine (infra-q-i-mes01). However, there is no special configuration, so it should work much the same way at a larger scale.

lcottereau · 2017-03-07T12:56:16Z

What I find inconsistent is that the address used should be either:

the IP or DNS name of the host (infra-q-i-mes01) with the exposed port (31855),

or the IP or DNS name of the container (10.16.0.3) with the port inside the container (8080).

The second option wouldn't work for us, as that IP is not routable from outside infra-q-i-mes01, but at least I would understand the logic.

Once again, this might be a problem with our Marathon configuration (as suggested by app.tasks[0].ipAddresses[0].ipAddress not being consistent with app.tasks[0].host), in which case I would be grateful for any suggestions.
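The two self-consistent combinations described above can be sketched in Go as follows, assuming a BRIDGE-mode task with a single known port mapping (8080 inside the container, 31855 on the host, as in the app details in this issue). The `portMapping` type and `coherentEndpoints` function are hypothetical illustrations, not part of Traefik or Marathon.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// portMapping is a hypothetical pairing of a Docker container port with the
// host port Marathon assigned to it (for example, 8080 -> 31855).
type portMapping struct {
	ContainerPort int
	HostPort      int
}

// coherentEndpoints returns the two address/port combinations that make sense
// in Docker BRIDGE mode: the agent host with the host port, and the container
// IP with the container port. Mixing them (container IP + host port) is what
// produced the failing backend URL in this issue.
func coherentEndpoints(agentHost, containerIP string, pm portMapping) (hostSide, containerSide string) {
	hostSide = net.JoinHostPort(agentHost, strconv.Itoa(pm.HostPort))
	containerSide = net.JoinHostPort(containerIP, strconv.Itoa(pm.ContainerPort))
	return hostSide, containerSide
}

func main() {
	hostSide, containerSide := coherentEndpoints("infra-q-i-mes01", "10.16.0.3",
		portMapping{ContainerPort: 8080, HostPort: 31855})
	fmt.Println(hostSide)      // infra-q-i-mes01:31855 (reachable from other machines)
	fmt.Println(containerSide) // 10.16.0.3:8080 (in this setup, only routable from the Docker host)
}
```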

lcottereau · 2017-03-07T15:41:08Z

I tried the Mesos provider with the following configuration:

    [mesos]
    endpoint = "infra-q-i-mes01:5050"
    watch = true
    domain = "infra-q-i-mes01"
    RefreshSeconds = 30

(The last line is necessary; see #1248.)

With that configuration it worked, but the backend URLs had the form http://:31721, with no host. I found this strange, so I tried launching Traefik on a different server. There, the domain still wasn't specified in the backend URLs and consequently it didn't work.

Does that help?
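A likely reason the empty-host URL worked only when Traefik ran on the Mesos agent itself: in Go, a URL such as http://:31721 has an empty hostname, and the standard net package documents that dialing an address with an empty host (such as ":80") assumes the local system. The sketch below only parses the URL to show what is left of it; `describeBackend` is a hypothetical helper, and whether Traefik's forwarder dials exactly this way is an assumption.

```go
package main

import (
	"fmt"
	"net/url"
)

// describeBackend reports the host and port that remain in a backend URL.
// It is a hypothetical helper used only to inspect URLs like the ones the
// Mesos provider produced here.
func describeBackend(raw string) (host, port string, err error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", "", err
	}
	return u.Hostname(), u.Port(), nil
}

func main() {
	host, port, err := describeBackend("http://:31721")
	if err != nil {
		panic(err)
	}
	// The host is empty, so a dial to ":31721" targets the local machine:
	// fine when Traefik runs on the agent, broken anywhere else.
	fmt.Printf("host=%q port=%q\n", host, port) // host="" port="31721"
}
```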

timoreimann · 2017-03-09T12:06:34Z

Apologies for the delay.

I have also been wondering why Traefik uses the container IP address together with the public port by default. Chances are this was introduced by a series of changes that are no longer coherent with each other. Let's use this ticket to track the investigation and possibly drive a change.

You can get to a working state, using the Mesos slave host names along with the exposed ports, by making a slight modification to the default Marathon template file. Replace

    {{getBackendServer . $apps}}

with

    {{.Host}}
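To see the effect of that one-token change without a Marathon cluster, the Go sketch below renders a simplified template line twice with the standard text/template package. The `task` struct and the `getBackendServer`/`getPort` functions are hypothetical stand-ins for the real template context and helpers (which also take `$apps` and pick the protocol), assumed here to return the task IP and the task's host port respectively.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"text/template"
)

// task is a hypothetical, simplified stand-in for a Marathon task.
type task struct {
	Host      string
	IPAddress string
	Port      int
}

// Hypothetical stand-ins for the template helpers: getBackendServer is
// assumed to return the task IP, getPort the task's (host) port.
var funcs = template.FuncMap{
	"getBackendServer": func(t task) string { return t.IPAddress },
	"getPort":          func(t task) string { return strconv.Itoa(t.Port) },
}

// render executes a one-line template against a task and returns the result.
func render(text string, t task) (string, error) {
	tmpl, err := template.New("backend").Funcs(funcs).Parse(text)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, t); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	t := task{Host: "infra-q-i-mes01", IPAddress: "10.16.0.3", Port: 31855}
	for _, text := range []string{
		`url = "http://{{getBackendServer .}}:{{getPort .}}"`, // default behavior
		`url = "http://{{.Host}}:{{getPort .}}"`,               // workaround
	} {
		out, err := render(text, t)
		if err != nil {
			panic(err)
		}
		fmt.Println(out)
	}
	// url = "http://10.16.0.3:31855"
	// url = "http://infra-q-i-mes01:31855"
}
```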

lcottereau · 2017-03-09T12:17:15Z

The workaround you offered works perfectly (although it forces me to update the template manually on a regular basis). I gather you want me to leave the issue open to track the longer-term fix, but I thank you profusely for your quick and effective help.

I'm also available for testing, as we have a small platform for qualifying Traefik before we decide to use it in production.

timoreimann · 2017-03-09T14:35:04Z

@lcottereau glad it worked for you. 🚀

And thanks for your offer -- I suppose I'll get back to that once/if we have a correction in place.

lcottereau · 2017-03-09T16:01:43Z

OK. Thanks again, @timoreimann.

diegooliveira · 2017-03-10T02:11:28Z

@lcottereau @timoreimann when using Docker, the IP address reported by the Marathon API might not be reachable due to some Docker NAT/proxy magic. I don't see a simple way to automatically choose between app.tasks[0].ipAddresses[0].ipAddress and app.tasks[0].host other than a label. If it's OK, I might make a pull request to handle this.

lcottereau · 2017-03-10T09:51:05Z

@diegooliveira As stated above, even if the ipAddress were reachable, the port used is the exposed port (on the host), so the result would still be incorrect. It seems to me there is something else at stake here.

timoreimann · 2017-03-10T10:47:54Z

@diegooliveira thanks for chiming in, I appreciate it.

@lcottereau AFAICS, the port does not have to be a Docker-exposed port: If you schedule applications other than Docker containers via Marathon, the task port (which Traefik gives you) could be accessible and not be hidden behind a bridging interface like Docker's. There's also the IP-per-task feature in Marathon, which may give you direct access to Docker containers? (Never worked with that mode, so not exactly sure.)

I'm not exactly sure what the motivation for the initial implementation was; I'm going to dig into the git history a bit to see if I can find something.

Either way, I think it makes sense to make the host setting configurable through a label, so that users can pick what they want without having to modify the default template. Diego, if you'd like to work on that, I'd be happy to review any PR you submit.

diegooliveira · 2017-03-10T11:06:08Z

@timoreimann I'll do that

@lcottereau there are some test cases covering how the Marathon provider handles the task port; you might take a look here: https://github.com/containous/traefik/blob/master/provider/marathon_test.go#L1000. None of these test cases points to the container port. I think this is analogous to choosing between the task's host name and its IP address. Is it OK to always use the container port, or would you rather specify which one to use in a label? If you know the port in advance, you can use the traefik.port label.
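As a rough illustration of the port choice being discussed, here is a Go sketch assuming that an explicit traefik.port label takes precedence and that the first port Marathon assigned to the task is used otherwise. `choosePort` is a hypothetical function and this precedence is an assumption, not the provider's confirmed implementation.

```go
package main

import (
	"fmt"
	"strconv"
)

// choosePort picks the backend port for a task: an explicit traefik.port
// label wins; otherwise the first port Marathon assigned to the task is used.
// This precedence is an assumption made for illustration.
func choosePort(labels map[string]string, taskPorts []int) (int, error) {
	if v, ok := labels["traefik.port"]; ok {
		p, err := strconv.Atoi(v)
		if err != nil || p <= 0 || p > 65535 {
			return 0, fmt.Errorf("invalid traefik.port label %q", v)
		}
		return p, nil
	}
	if len(taskPorts) == 0 {
		return 0, fmt.Errorf("task exposes no ports")
	}
	return taskPorts[0], nil
}

func main() {
	ports := []int{31855}
	for _, labels := range []map[string]string{
		{},                       // no label: fall back to the task port
		{"traefik.port": "8080"}, // label: use the known port
	} {
		p, err := choosePort(labels, ports)
		if err != nil {
			panic(err)
		}
		fmt.Println(p) // 31855, then 8080
	}
}
```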

lcottereau · 2017-03-13T09:51:35Z

@diegooliveira in my use case, the issue lies in the DNS name/IP used (which, as I understand it, is not routable) rather than in the port, and I don't see a test related to this (except maybe TestMarathonGetBackend, but its content is not relevant). Maybe in another test file?

timoreimann · 2017-03-19T06:42:54Z

Thinking a bit more about the label-based solution, I'm starting to wonder if users would really need to distinguish the host part on a per-application basis frequently. It might seem easier to just introduce a global configuration flag (e.g., host_mode) and honor that during getBackendServer.

lcottereau · 2017-03-19T18:33:44Z

@timoreimann the global configuration flag would suit my use case

timoreimann · 2017-03-19T20:16:34Z

@diegooliveira Should we take the global config flag route? WDYT?

Gabitchov · 2017-03-24T15:43:09Z

Hi,

I've had a similar problem since I tried to migrate from «camembert» to «morbier»: my backend references the Docker container IP instead of the Marathon endpoint.

I have to postpone the migration because of this regression.

Regards

diegooliveira · 2017-03-24T17:07:46Z

@Gabitchov @timoreimann I ran some tests in environments with and without IP-per-task and found some guidelines for an implementation that is more backward compatible but also adjustable to specific use cases.

In my tests, the task IP address only seems relevant when the application definition uses IP-per-task. I'm planning to work on a patch that uses the hostname if there is no IP-per-task information in the application definition and the task IP if there is, while allowing one specific behavior to be forced through a global Marathon configuration option.
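The decision rule proposed above can be sketched in Go as follows. The `hostMode` setting and its values, the `app`/`task` types, and `backendHost` are hypothetical names, and falling back to the host name when a task reports no IP is an assumption. In the app definition posted earlier in this issue, `"ipAddress": null` would correspond to `HasIPPerTask: false`.

```go
package main

import "fmt"

// hostMode is a hypothetical global setting; the names are illustrative only.
type hostMode int

const (
	hostModeAuto   hostMode = iota // decide per application
	hostModeHost                   // always use the agent host name
	hostModeTaskIP                 // always use the task IP address
)

// app and task are hypothetical, trimmed-down Marathon structures.
type app struct {
	HasIPPerTask bool // true when the app definition sets "ipAddress"
}

type task struct {
	Host        string
	IPAddresses []string
}

// backendHost applies the proposed rule: use the task IP only for
// IP-per-task apps and the agent host name otherwise, unless a mode is forced.
func backendHost(a app, t task, mode hostMode) string {
	useIP := a.HasIPPerTask
	switch mode {
	case hostModeHost:
		useIP = false
	case hostModeTaskIP:
		useIP = true
	}
	if useIP && len(t.IPAddresses) > 0 {
		return t.IPAddresses[0]
	}
	return t.Host
}

func main() {
	tk := task{Host: "infra-q-i-mes01", IPAddresses: []string{"10.16.0.3"}}
	fmt.Println(backendHost(app{HasIPPerTask: false}, tk, hostModeAuto))   // infra-q-i-mes01
	fmt.Println(backendHost(app{HasIPPerTask: true}, tk, hostModeAuto))    // 10.16.0.3
	fmt.Println(backendHost(app{HasIPPerTask: false}, tk, hostModeTaskIP)) // 10.16.0.3
}
```

A per-application label, as suggested earlier in the thread, could feed the same `mode` parameter instead of a global setting.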

timoreimann · 2017-03-24T20:31:09Z

@diegooliveira I'm mostly positive about your approach. One concern I see is that there might be (non-Docker) applications and networking topologies which do not follow either of the two patterns we've been discussing so far. I'm not too deep into the CNI space, but I know that Mesos supports it and AFAIU it enables very different kinds of networking models, some of which may not be covered by our binary classification. For those cases, however, the manual override should hopefully do the trick.

So 👍 on moving forward with your suggestion.

timoreimann · 2017-03-24T22:10:14Z

@Gabitchov If you're fine with making a small modification to the vanilla Marathon template until better auto-detection lands, getting the Marathon provider to talk to hostnames instead of task IP addresses is pretty easy: in line 4, simply replace {{getBackendServer . $apps}} with {{.Host}}. This is how it looks for me:

    url = "{{getProtocol . $apps}}://{{.Host}}:{{getPort . $apps}}"

This change does not require (re)compiling Traefik: copying the existing template, making the adjustment, and referencing it via --marathon.filename (or the corresponding config file option) should be enough.
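Because the workaround has to be reapplied whenever the default template changes (as noted earlier in this thread), the copy-and-edit step can be scripted. A minimal Go sketch, assuming the default template text is available as a local file; `patchTemplate` and the command-line usage are hypothetical, and the helper fails loudly if the expected token is no longer present.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strings"
)

const (
	defaultToken = "{{getBackendServer . $apps}}"
	hostToken    = "{{.Host}}"
)

// patchTemplate swaps the backend-server helper for the task host in a
// Marathon template. It returns an error if the token is missing, so a
// changed upstream template is noticed instead of silently left unpatched.
func patchTemplate(tmpl string) (string, error) {
	if !strings.Contains(tmpl, defaultToken) {
		return "", errors.New("token " + defaultToken + " not found; the template may have changed")
	}
	return strings.ReplaceAll(tmpl, defaultToken, hostToken), nil
}

func main() {
	// Hypothetical usage: patch-template marathon.tmpl > marathon-host.tmpl
	// Without an argument, demonstrate on the default template line.
	input := `url = "{{getProtocol . $apps}}://{{getBackendServer . $apps}}:{{getPort . $apps}}"` + "\n"
	if len(os.Args) > 1 {
		data, err := os.ReadFile(os.Args[1])
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		input = string(data)
	}
	patched, err := patchTemplate(input)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Print(patched)
}
```

The patched file is then referenced with --marathon.filename exactly as described above.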

diegooliveira · 2017-03-26T20:01:59Z

@timoreimann I have a patch ready to fix the unsound behavior; please review #1345.

A commit referencing this issue: traefik#1243 (comment) + add ability to override the "docker build" command in the Makefile (helps to cross a corporate proxy): make DOCKER_BUILD="docker build --build-arg ..." + commit the big fat traefik binary so Docker Hub is happy. Signed-off-by: Gaetan Semet <gaetan@xeberon.net>

ldez · 2017-05-02T19:36:11Z

Fixed by #1345.

timoreimann self-assigned this Mar 9, 2017

timoreimann mentioned this issue Mar 19, 2017: ip-per-task breaks Marathon provider for DCOS (mesos versions >= 0.25.0) #1311 (closed)

timoreimann mentioned this issue Mar 29, 2017: Incorrect backend in marathon provider #1362 (closed)

timoreimann added the area/provider/marathon label Apr 6, 2017

timoreimann mentioned this issue Apr 6, 2017: Marathon backend with CNI Plugin doesn't work #1390 (closed)

timoreimann mentioned this issue Apr 22, 2017: Detect proper hostname automatically. #1345 (merged)

timoreimann added the bug label Apr 22, 2017

timoreimann added this to the 1.3 milestone Apr 22, 2017

ldez added the kind/bug/confirmed a confirmed bug (reproducible). label Apr 25, 2017

ldez removed the bug label Apr 25, 2017

timoreimann mentioned this issue Apr 30, 2017: [Rancher] No IP in backend in host networking mode #1498 (closed)

ldez closed this as completed May 2, 2017

traefik locked and limited conversation to collaborators Sep 1, 2019

traefiker added the status/5-frozen-due-to-age label Sep 1, 2019
