IP address + exported port instead of endpoint from marathon #1243

Closed
lcottereau opened this issue Mar 7, 2017 · 22 comments
@lcottereau

lcottereau commented Mar 7, 2017

What version of Traefik are you using (traefik version)?

  • Version: v1.2.0-rc2
  • Codename: morbier
  • Go version: go1.7.5
  • Built: 2017-03-01_01:13:09PM
  • OS/Arch: linux/amd64

What is your environment & configuration (arguments, toml...)?

Linux RHEL 7.2

my configuration file

    debug = true
    accessLogsFile = "log/access.log"
    logLevel = "INFO"
    defaultEntryPoints = ["http"]
    [entryPoints]
       [entryPoints.http]
       address = "infra-q-i-mes01:8008"
    [web]
    address = ":8009"
    [web.statistics]
       RecentErrors = 10
    [marathon]
    endpoint = "http://infra-q-i-mes01:8080"
    watch = true
    domain = "infra-q-i-mes01"

The deployed app details in Marathon :

    Host: infra-q-i-mes01
    IP Addresses: 10.16.0.3
    Ports: [31855]
    Endpoints: infra-q-i-mes01:31855
    Service Discovery: n/a
    Status: Started
    Staged at: 03/03/2017 à 18:26:26
    Started at: 03/03/2017 à 18:26:27
    Version: 2017-03-03T17:13:55.405Z
    Health: Healthy
    Mesos details: link

The deployed app configuration

Notice the 2 traefik labels

    {
      "id": "trace",
      "cmd": null,
      "cpus": 1,
      "mem": 512,
      "disk": 0,
      "instances": 1,
      "container": {
        "docker": {
          "image": "sysadm-reg/assemblage/trace:1.4-20170303.164750-23",
          "network": "BRIDGE",
          "parameters": [],
          "portMappings": [
            {
              "containerPort": 8080,
              "protocol": "tcp",
              "name": "tomcat",
              "labels": null
            }
          ],
          "forcePullImage": true
        },
        "type": "DOCKER",
        "volumes": []
      },
      "env": {},
      "healthChecks": [
        {
          "protocol": "HTTP",
          "path": "/trace/api/health"
        }
      ],
      "labels": {
        "traefik.frontend.rule": "PathPrefix:/trace",
        "traefik.backend.loadbalancer.sticky": "true"
      },
      "uris": [ "file:///etc/catsa-anonymous-puller.credentials.tar.gz" ],
      "upgradeStrategy": {
        "minimumHealthCapacity": 0,
        "maximumOverCapacity": 0
      }
    }

What did you do?

I try to access my application through traefik with the url http://infra-q-i-mes01:8008/trace/

What did you expect to see?

I expect to see the login webpage of my application trace.

What did you see instead?

I get an HTTP error : 502 Bad Gateway

Just to confirm, the IP address of the trace container (10.16.0.3) is indeed not routable. So the problem seems to come from Traefik using the IP address provided by Marathon instead of the endpoint. Is that normal (in which case, do you know of a way to configure Marathon to provide the IP address of the Docker host), or is it a bug or a configuration issue with Traefik? In any case, it seems inconsistent to me: the IP address is that of the container, while the port is the exported port (hence on the Docker host).

The traefik log

    INFO[2017-03-07T11:41:34+01:00] Traefik version v1.2.0-rc2 built on 2017-03-01_01:13:09PM 
    INFO[2017-03-07T11:41:34+01:00] Using TOML configuration file /root/traefik/conf.toml 
    DEBU[2017-03-07T11:41:34+01:00] Global configuration loaded   {"GraceTimeOut":10,"Debug":true,"CheckNewVersion":true,"AccessLogsFile":"log/access.log","TraefikLogsFile":"","LogLevel":"DEBUG","EntryPoints":{"http":{"Network":"","Address":"infra-q-i-mes01:8008","TLS":null,"Redirect":null,"Auth":null,"Compress":false}},"Cluster":null,"Constraints":[],"ACME":null,"DefaultEntryPoints":["http"],"ProvidersThrottleDuration":2000000000,"MaxIdleConnsPerHost":200,"InsecureSkipVerify":false,"Retry":null,"Docker":null,"File":null,"Web":{"Address":":8009","CertFile":"","KeyFile":"","ReadOnly":false,"Statistics":{"RecentErrors":10},"Metrics":null,"Auth":null},"Marathon":{"Watch":true,"Filename":"","Constraints":[],"Endpoint":"http://infra-q-i-mes01:8080","Domain":"infra-q-i-mes01","ExposedByDefault":true,"GroupsAsSubDomains":false,"DCOSToken":"","MarathonLBCompatibility":false,"TLS":null,"DialerTimeout":60,"KeepAlive":10,"Basic":null},"Consul":null,"ConsulCatalog":null,"Etcd":null,"Zookeeper":null,"Boltdb":null,"Kubernetes":null,"Mesos":null,"Eureka":null,"ECS":null,"Rancher":null} 
    INFO[2017-03-07T11:41:34+01:00] Preparing server http &{Network: Address:infra-q-i-mes01:8008 TLS:<nil> Redirect:<nil> Auth:<nil> Compress:false} 
    INFO[2017-03-07T11:41:34+01:00] Starting provider *provider.Marathon {"Watch":true,"Filename":"","Constraints":[],"Endpoint":"http://infra-q-i-mes01:8080","Domain":"infra-q-i-mes01","ExposedByDefault":true,"GroupsAsSubDomains":false,"DCOSToken":"","MarathonLBCompatibility":false,"TLS":null,"DialerTimeout":60,"KeepAlive":10,"Basic":null} 
    INFO[2017-03-07T11:41:34+01:00] Starting provider *main.WebProvider {"Address":":8009","CertFile":"","KeyFile":"","ReadOnly":false,"Statistics":{"RecentErrors":10},"Metrics":null,"Auth":null} 
    INFO[2017-03-07T11:41:34+01:00] 0s                                           
    INFO[2017-03-07T11:41:34+01:00] Starting server on infra-q-i-mes01:8008      
    WARN[2017-03-07T11:41:34+01:00] clientTLS is nil         
    DEBU[2017-03-07T11:41:34+01:00] Creating frontend frontend-trace             
    DEBU[2017-03-07T11:41:34+01:00] Wiring frontend frontend-trace to entryPoint http 
    DEBU[2017-03-07T11:41:34+01:00] Creating route route-host-trace PathPrefix:/trace 
    DEBU[2017-03-07T11:41:34+01:00] Creating backend backend-trace               
    DEBU[2017-03-07T11:41:34+01:00] Creating load-balancer wrr                   
    DEBU[2017-03-07T11:41:34+01:00] Sticky session with cookie _TRAEFIK_BACKEND  
    DEBU[2017-03-07T11:41:34+01:00] Creating server server-trace-8790b0ed-0036-11e7-8a86-0242c8fc4f18 at http://10.16.0.3:31855 with weight 0 
    INFO[2017-03-07T11:41:34+01:00] Server configuration reloaded on infra-q-i-mes01:8008 
    WARN[2017-03-07T11:45:09+01:00] Error forwarding to http://10.16.0.3:31855, err: dial tcp 10.16.0.3:31855: getsockopt: connection refused 
@timoreimann
Contributor

Traefik uses the IP addresses from the application tasks as defined by the task's ipAddresses field. What does that give for you? Can you paste the /task endpoint for your application?

Are you using a real Marathon cluster, or possibly some reduced, local version?

@lcottereau
Author

lcottereau commented Mar 7, 2017

The /task/trace endpoint is below :

    {
      "app": {
        "id":"/trace",
        "cmd": null,
        "args": null,
        "user": null,
        "env": {},
        "instances": 1,
        "cpus": 1,
        "mem": 512,
        "disk": 0,
        "gpus": 0,
        "executor": "",
        "constraints": [],
        "uris": ["file:///etc/catsa-anonymous-puller.credentials.tar.gz"],
        "fetch":[
          {
            "uri": "file:///etc/catsa-anonymous-puller.credentials.tar.gz",
            "extract": true,
            "executable": false,
            "cache": false
          }
        ],
        "storeUrls": [],
        "backoffSeconds": 1,
        "backoffFactor": 1.15,
        "maxLaunchDelaySeconds": 3600,
        "container": {
          "type": "DOCKER",
          "volumes": [],
          "docker": {
            "image": "sysadm-reg/assemblage/trace:1.4-20170303.164750-23",
            "network":"BRIDGE",
            "portMappings": [
              {
                "containerPort": 8080,
                "hostPort": 0,
                "servicePort": 10000,
                "protocol": "tcp",
                "name": "tomcat",
                "labels": {}
              }
            ],
            "privileged": false,
            "parameters": [],
            "forcePullImage": true
          }
        },
        "healthChecks": [
          {
            "path": "/trace/api/health",
            "protocol": "HTTP",
            "portIndex": 0,
            "gracePeriodSeconds": 300,
            "intervalSeconds": 60,
            "timeoutSeconds": 20,
            "maxConsecutiveFailures": 3,
            "ignoreHttp1xx": false
          }
        ],
        "readinessChecks": [],
        "dependencies": [],
        "upgradeStrategy": {
          "minimumHealthCapacity": 0,
          "maximumOverCapacity": 0
        },
        "labels": {
          "traefik.frontend.rule": "PathPrefix:/trace",
          "traefik.backend.loadbalancer.sticky": "true"
        },
        "acceptedResourceRoles": null,
        "ipAddress": null,
        "version": "2017-03-03T17:13:55.405Z",
        "residency": null,
        "secrets": {},
        "taskKillGracePeriodSeconds": null,
        "ports": [10000],
        "portDefinitions": [
          {
            "port": 10000,
            "protocol": "tcp",
            "labels": {}
          }
        ],
        "requirePorts": false,
        "versionInfo": {
          "lastScalingAt": "2017-03-03T17:13:55.405Z",
          "lastConfigChangeAt": "2017-03-03T17:13:55.405Z"
        },
        "tasksStaged": 0,
        "tasksRunning": 1,
        "tasksHealthy": 1,
        "tasksUnhealthy": 0,
        "deployments": [],
        "tasks": [
          {
            "id": "trace.8790b0ed-0036-11e7-8a86-0242c8fc4f18",
            "slaveId": "546d7d9b-b7de-4745-8eb8-3c2993b7b300-S0",
            "host": "infra-q-i-mes01",
            "state": "TASK_RUNNING",
            "startedAt": "2017-03-03T17:26:27.815Z",
            "stagedAt": "2017-03-03T17:26:26.074Z",
            "ports": [31855],
            "version": "2017-03-03T17:13:55.405Z",
            "ipAddresses": [
              {
                "ipAddress": "10.16.0.3",
                "protocol": "IPv4"
              }
            ],
            "appId": "/trace",
            "healthCheckResults": [
              {
                "alive": true,
                "consecutiveFailures": 0,
                "firstSuccess": "2017-03-03T17:27:07.040Z",
                "lastFailure": null,
                "lastSuccess": "2017-03-07T12:32:50.072Z",
                "lastFailureCause": null,
                "taskId": "trace.8790b0ed-0036-11e7-8a86-0242c8fc4f18"
              }
            ]
          }
        ],
        "lastTaskFailure": {
          "appId": "/trace",
          "host": "infra-q-i-mes01",
          "message": "Task was killed since health check failed. Reason: AskTimeoutException: Ask timed out on [Actor[akka://marathon/user/IO-HTTP#-1431489078]] after [20000 ms]. Sender[null] sent message of type \"spray.http.HttpRequest\".",
          "state": "TASK_KILLED",
          "taskId": "trace.c93df68c-0034-11e7-8a86-0242c8fc4f18",
          "timestamp": "2017-03-03T17:25:20.734Z",
          "version": "2017-03-03T17:13:55.405Z",
          "slaveId":"546d7d9b-b7de-4745-8eb8-3c2993b7b300-S0"
        }
      }
    }

For now, there is a single Marathon server with a single slave. Traefik and the two Marathon processes are all launched on the same machine (infra-q-i-mes01). There is no special configuration, though, so it should work much the same way at a larger scale...
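To make the relevant fields easier to spot, here is a minimal Go sketch that decodes a trimmed copy of the response above and prints, for each task, the host, the ports and the task IP addresses. The type names (`task`, `appResponse`) and the `parseTasks` helper are made up for this illustration and are not Traefik's or Marathon's own types; only the JSON field names come from the response.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// task mirrors only the task fields that matter for this issue.
// (Hypothetical type, not taken from Traefik or a Marathon client.)
type task struct {
	Host        string `json:"host"`
	Ports       []int  `json:"ports"`
	IPAddresses []struct {
		IPAddress string `json:"ipAddress"`
		Protocol  string `json:"protocol"`
	} `json:"ipAddresses"`
}

// appResponse mirrors the {"app": {"tasks": [...]}} envelope shown above.
type appResponse struct {
	App struct {
		ID    string `json:"id"`
		Tasks []task `json:"tasks"`
	} `json:"app"`
}

// sample is trimmed from the response pasted above.
const sample = `{"app":{"id":"/trace","tasks":[{"host":"infra-q-i-mes01","ports":[31855],"ipAddresses":[{"ipAddress":"10.16.0.3","protocol":"IPv4"}]}]}}`

// parseTasks decodes a Marathon app response and returns its tasks.
func parseTasks(raw string) ([]task, error) {
	var resp appResponse
	if err := json.Unmarshal([]byte(raw), &resp); err != nil {
		return nil, err
	}
	return resp.App.Tasks, nil
}

func main() {
	tasks, err := parseTasks(sample)
	if err != nil {
		panic(err)
	}
	for _, t := range tasks {
		// host and ports describe the Mesos slave; ipAddresses describe the task itself.
		fmt.Printf("host=%s ports=%v", t.Host, t.Ports)
		for _, ip := range t.IPAddresses {
			fmt.Printf(" ip=%s", ip.IPAddress)
		}
		fmt.Println()
	}
}
```

For the trace task this shows the host name infra-q-i-mes01 and the port 31855 next to the task IP 10.16.0.3, which are the two candidate host parts discussed in this thread.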

@lcottereau
Author

lcottereau commented Mar 7, 2017

What I find inconsistent is that the address used should be either:

  • the IP or DNS name of the host (infra-q-i-mes01) and the exposed port (31855)
  • or the IP or DNS name of the container (10.16.0.3) and the port on the container (8080)

The second option wouldn't work for us, as this IP would not be routable from outside infra-q-i-mes01, but at least I would understand the logic...

Once again, it might be a problem with our configuration of Marathon (as suggested by app.tasks[0].ipAddresses[0].ipAddress not being consistent with app.tasks[0].host), in which case I would be grateful for any suggestions.
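The two consistent pairings, and the mixed one that appears in the Traefik log, can be written out as a small Go sketch. The `endpoint` type and the three helper functions are hypothetical names used only for illustration; the values are the ones reported in this issue.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// endpoint holds the values reported for the trace task in this issue.
// (Hypothetical type for illustration only.)
type endpoint struct {
	Host          string // Mesos slave host name
	HostPort      int    // port published on the host
	ContainerIP   string // bridge-network IP of the container
	ContainerPort int    // port the application listens on inside the container
}

// hostURL pairs the host name with the port published on the host.
func hostURL(e endpoint) string {
	return "http://" + net.JoinHostPort(e.Host, strconv.Itoa(e.HostPort))
}

// containerURL pairs the container IP with the port inside the container.
func containerURL(e endpoint) string {
	return "http://" + net.JoinHostPort(e.ContainerIP, strconv.Itoa(e.ContainerPort))
}

// mixedURL pairs the container IP with the host port, as seen in the Traefik log.
func mixedURL(e endpoint) string {
	return "http://" + net.JoinHostPort(e.ContainerIP, strconv.Itoa(e.HostPort))
}

func main() {
	e := endpoint{
		Host:          "infra-q-i-mes01",
		HostPort:      31855,
		ContainerIP:   "10.16.0.3",
		ContainerPort: 8080,
	}
	fmt.Println("host + host port:          ", hostURL(e))
	fmt.Println("container + container port:", containerURL(e))
	fmt.Println("container + host port:     ", mixedURL(e))
}
```

Only the last form matches the backend address in the log (http://10.16.0.3:31855), and it mixes values from the two different network namespaces.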

@lcottereau
Author

I tried to use the mesos provider, with the following configuration

    [mesos]
    endpoint = "infra-q-i-mes01:5050"
    watch = true
    domain = "infra-q-i-mes01"
    RefreshSeconds = 30

(the last line is necessary, see #1248)

In that configuration it worked, but with backend URLs of the form http://:31721. I thought this was strange and tried launching Traefik on a different server. There, the domain still wasn't specified in the URL, and consequently it didn't work.

Does that help?

@timoreimann
Contributor

Apologies for the delay.

I also have been wondering why Traefik uses the container IP address and the public port by default. Chances are this was introduced by a series of changes which aren't coherent anymore. Let's use this ticket to track investigations and possibly drive a change.

You can get to a working state using the Mesos slave host names along with the exposed ports by making a slight modification to the default Marathon template file. Replace

    {{getBackendServer . $apps}}

with

    {{.Host}}

@lcottereau
Author

The workaround you offered works perfectly (although it forces me to manually update the template regularly). I gather you want me to leave the issue open to follow the longer-term fix, but I thank you profusely for your quick and effective help.

Also, I am available for tests with this as we have a small platform to qualify if we want to have traefik in production.

@timoreimann
Contributor

@lcottereau glad it worked for you. 🚀

And thanks for your offer -- I suppose I'll get back to that once/if we have a correction in place.

@timoreimann timoreimann self-assigned this Mar 9, 2017
@lcottereau
Author

OK. Thanks again @timoreimann .

@diegooliveira
Contributor

@lcottereau @timoreimann when using Docker, the IP address reported by the Marathon API might not be reachable due to some Docker NAT/proxy magic. I don't see a simple way to automatically choose between app.tasks[0].ipAddresses[0].ipAddress and app.tasks[0].host other than a label. If it's OK, I might make a pull request to handle this.

@lcottereau
Author

lcottereau commented Mar 10, 2017

@diegooliveira As stated above, even if the ipAddress was reachable, the fact that the port used is the exposed port (on the host) would make the result incorrect. It seems to me there is something else at stake here.

@timoreimann
Contributor

@diegooliveira thanks for chiming in, I appreciate it.

@lcottereau AFAICS, the port does not have to be a Docker-exposed port: If you schedule applications other than Docker containers via Marathon, the task port (which Traefik gives you) could be accessible and not be hidden behind a bridging interface like Docker's. There's also the IP-per-task feature in Marathon, which may give you direct access to Docker containers? (Never worked with that mode, so not exactly sure.)

I'm not exactly sure what the motivation for the initial implementation back then was; I'm going to dig a bit in git history to see if I can find something.

Either way, I think making the host setting configurable through a label so that users can pick what they want to have without having to modify the default template makes sense. Diego, if you'd like to work on that, I'd be happily reviewing any PR you submit.

@diegooliveira
Contributor

@timoreimann I'll do that

@lcottereau there are some test cases for how the Marathon provider handles the task port. You might take a look here: https://github.com/containous/traefik/blob/master/provider/marathon_test.go#L1000. None of these test cases points to the container port. I think this is the same situation as choosing between the task's host name and its IP address. Is it OK to always use the container port, or would you rather indicate which one to use in a label? If you know the port in advance, it's possible to use the traefik.port label.

@lcottereau
Author

@diegooliveira in my use case, the issue is in the DNS/IP used rather than in the port (which would be unroutable from my understanding), and I don't see a test related to this (except maybe TestMarathonGetBackend, but its content is not relevant). Maybe in another test file?

@timoreimann
Contributor

timoreimann commented Mar 19, 2017

Thinking a bit more about the label-based solution, I'm starting to wonder whether users would really need to distinguish the host part on a per-application basis frequently. It might be easier to just introduce a global configuration flag (e.g., host_mode) and honor that during getBackendServer.

@lcottereau
Copy link
Author

@timoreimann the global configuration flag would suit my use case

@timoreimann
Contributor

@diegooliveira Should we take the global config flag route? WDYT?

@Gabitchov

Hi,

I have had a similar problem since I tried to migrate from «camembert» to «morbier». My backend references the Docker container IP and not the Marathon endpoint.

I have to postpone the migration due to this regression.

Regards

@diegooliveira
Contributor

@Gabitchov @timoreimann I did some tests in environments with and without IP-per-task and found some guidelines for an implementation that is more backward compatible but also adjustable to specific use cases.

In my tests, it looks like using the IP address is only relevant when the application description uses IP-per-task. I'm planning to work on a patch that uses the hostname if there is no IP-per-task information in the application definition and the task IP if there is, while allowing one specific behavior to be forced through a global Marathon configuration option.

@timoreimann
Contributor

@diegooliveira I'm mostly positive on your approach. Somewhat of a concern I see is that there might be (non-Docker) applications and networking topologies which do not follow one of the two patterns we've been discussing so far. I'm not too deep into the CNI space, but I know that Mesos supports it and AFAIU it enables very different kinds of networking models, some of which may not be covered by our binary classification. For those cases, however, the manual override should hopefully do the trick.

So 👍 on moving forward with your suggestion.

@timoreimann
Contributor

timoreimann commented Mar 24, 2017

@Gabitchov If you're fine with making a small modification to the vanilla Marathon template until better auto-detection lands, getting the Marathon provider to speak to hostnames instead of task IP addresses is pretty easy: In line 4, simply replace {{getBackendServer . $apps}} by {{.Host}}. This is how it looks for me:

    url = "{{getProtocol . $apps}}://{{.Host}}:{{getPort . $apps}}"

This change does not require (re-)compiling Traefik: copying the existing template, making the adjustment, and referencing it via --marathon.filename (or the corresponding config file option) should be enough.
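To see what the modified template line renders to, here is a small self-contained Go sketch using text/template. The `getProtocol` and `getPort` functions below are stand-in stubs written for this example, not Traefik's implementations, and the `task` and `data` types and the surrounding range loop are likewise assumptions; only the `url = ...` line is taken from the comment above.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// task and data are hypothetical stand-ins for the values Traefik passes to the template.
type task struct {
	Host string
	Port int
}

type data struct {
	Apps  []string
	Tasks []task
}

// tmplText wraps the modified url line in a minimal loop over tasks.
const tmplText = `{{$apps := .Apps}}{{range .Tasks}}url = "{{getProtocol . $apps}}://{{.Host}}:{{getPort . $apps}}"
{{end}}`

// render executes the template with stub helper functions.
func render(d data) (string, error) {
	funcs := template.FuncMap{
		// Stubs: always plain HTTP, and the port is read straight from the task.
		"getProtocol": func(t task, apps []string) string { return "http" },
		"getPort":     func(t task, apps []string) int { return t.Port },
	}
	tmpl, err := template.New("marathon").Funcs(funcs).Parse(tmplText)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, d); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	out, err := render(data{
		Apps:  []string{"/trace"},
		Tasks: []task{{Host: "infra-q-i-mes01", Port: 31855}},
	})
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

With the task from this issue, the rendered line uses the slave host name together with the exposed port, i.e. http://infra-q-i-mes01:31855.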

@diegooliveira
Contributor

@timoreimann I have a patch ready to fix the unsound behavior; please review #1345.

@ldez ldez removed the bug label Apr 25, 2017
gsemet added a commit to gsemet/traefik that referenced this issue May 2, 2017
traefik#1243 (comment)

+ add ability to override "docker build" command in Makefile
  (help to coss corporate proxy):

    make DOCKER_BUILD="docker build --build-arg ..."

+ commit the big fat traefik binary so dockerhub is happy

Signed-off-by: Gaetan Semet <gaetan@xeberon.net>
@ldez
Contributor

ldez commented May 2, 2017

Fixed by #1345

@ldez ldez closed this as completed May 2, 2017
@traefik traefik locked and limited conversation to collaborators Sep 1, 2019