Ingresses for a sticky services with a single ingress path can get multiple cookies #1744

Closed
thomas-b-jackson opened this issue Jun 13, 2017 · 16 comments

thomas-b-jackson commented Jun 13, 2017

Do you want to request a feature or report a bug?

Bug

What did you do?

  • deploy a web app on k8s with multiple pods (3 app pods in our case)
  • create a k8s service that selects the pods and requests stickiness through the annotation traefik.backend.loadbalancer.sticky: "true"
  • deploy traefik ingress controller pods via a k8s deployment (3 traefik pods in our case)
    • configure traefik to watch a single namespace ("sets" in our case)
  • create a k8s ingress resource with a single host
    • path: /
    • kubernetes.io/ingress.class: traefik
  • start hitting endpoint from a single client via a web browser

What did you expect to see?

Since stickiness is configured, and since a single, global path was specified, all requests from a given client browser session should be directed to the same app pod.

What did you see instead?

For some web clients, all requests go to the same app pod.

  • these clients have a single traefik backend cookie
    [screenshot: snip20170612_9]

For some other web clients, requests for path1 always go to pod A, requests for path2 go to pod B, etc.

  • affected clients always have multiple traefik backend cookies, one per path
    [screenshot: screen shot 2017-06-12 at 4 18 20 pm]
  • as many as 4 cookies, each with a different path, have been observed

Output of traefik version

1.3.0

What is your environment & configuration

traefik config

apiVersion: v1
data:
  traefik.toml: |
    # traefik.toml
    logLevel = "INFO"
    defaultEntryPoints = ["http"]
    [entryPoints]
      [entryPoints.http]
      address = ":80"
    [kubernetes]
    namespaces = ["sets"]
kind: ConfigMap
metadata:
  creationTimestamp: 2017-06-06T23:08:37Z
  labels:
    app: private-ingress
  name: private-ingress
  namespace: sets
  resourceVersion: "105250320"
  selfLink: /api/v1/namespaces/sets/configmaps/private-ingress
  uid: 12642da1-4b0d-11e7-a9df-0259bfeee5dc

app ingress resource

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: traefik
  creationTimestamp: 2017-06-12T22:49:12Z
  generation: 2
  labels:
    app: jira
    env: archive
  name: jira-private-archive
  namespace: sets
  resourceVersion: "107822544"
  selfLink: /apis/extensions/v1beta1/namespaces/sets/ingresses/jira-private-archive
  uid: 5a4f53bf-4fc1-11e7-9ac4-02ba24f06372
spec:
  rules:
  - host: jiraarchive.nordstrom.net
    http:
      paths:
      - backend:
          serviceName: jira-test
          servicePort: 8080
        path: /
status:
  loadBalancer: {}

app svc

apiVersion: v1
kind: Service
metadata:
  annotations:
    traefik.backend.loadbalancer.sticky: "true"
  creationTimestamp: 2017-06-12T16:35:53Z
  labels:
    app: jira
    component: jira
    env: test
  name: jira-test
  namespace: sets
  resourceVersion: "107822074"
  selfLink: /api/v1/namespaces/sets/services/jira-test
  uid: 33b4d522-4f8d-11e7-ad5f-02587911dfec
spec:
  clusterIP: 25.0.143.41
  ports:
  - name: jira
    port: 8080
    protocol: TCP
    targetPort: 8080
  selector:
    app: jira
    component: jira
    env: test
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}

If applicable, please paste the log output in debug mode (--debug switch)

(paste your output here)
@ldez ldez added area/provider/k8s/ingress area/sticky-session kind/bug/possible a possible bug that needs analysis before it is confirmed or fixed. labels Jun 13, 2017
@ldez
Member

ldez commented Jun 13, 2017

Seems related to #1716

@thomas-b-jackson
Author

[screenshot: snip20170612_9]

[screenshot: screen shot 2017-06-12 at 4 18 20 pm]

@timoreimann
Contributor

I have been debugging the issue together with @thomas-b-jackson on Slack for a few days.

Tom also tried the current experimental Docker Hub image which includes #1716; unfortunately, it didn't make a difference.

The problem manifests both with 1.2.3 and 1.3.0, so it doesn't look like a (recent) regression either.

@thomas-b-jackson
Author

note that it seems to manifest more rarely with v1.2.3 than it does with v1.3.0

we saw it on multiple client sessions with v1.3.0, but have so far seen it on only one client session with v1.2.3

@ldez ldez added kind/bug/confirmed a confirmed bug (reproducible). and removed kind/bug/possible a possible bug that needs analysis before it is confirmed or fixed. labels Jun 13, 2017
@timoreimann
Contributor

What's interesting is that some screenshots show multiple cookies having the same content, i.e., server URLs, which means that Traefik likely considered the cookie to not contain a valid server URL anymore at some point and created another one with the same URL. The oxy library uses http.SetCookie which never overwrites but only ever adds cookie headers, so that would fit with my hypothesis.

The question would then be: why/how do we run into this case?
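
The append-only behaviour of http.SetCookie mentioned above is easy to check in isolation. The following is a minimal sketch, not the oxy code: the cookie name sticky_backend and the server URL are made-up placeholders, and the response is an in-memory recorder rather than a real Traefik response.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// setStickyTwice writes the same cookie twice to one response and returns
// the resulting Set-Cookie header values.
func setStickyTwice() []string {
	rec := httptest.NewRecorder()
	// Hypothetical cookie name and value, chosen for illustration only.
	c := &http.Cookie{Name: "sticky_backend", Value: "http://10.0.0.1:8080"}
	http.SetCookie(rec, c) // adds a Set-Cookie header
	http.SetCookie(rec, c) // adds another one; the first is not replaced
	return rec.Header()["Set-Cookie"]
}

func main() {
	for _, h := range setStickyTwice() {
		fmt.Println(h)
	}
}
```

Both calls end up as separate Set-Cookie header lines, which is consistent with the hypothesis but does not by itself explain why a second cookie would be issued.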

@timoreimann
Contributor

Preliminary results: Chrome sometimes seems to lose the sticky cookie that Traefik sets. The behavior isn't super consistent but is reproducible for sure. Other browsers don't seem to have that problem, and neither does the Nginx Ingress controller.

To be continued...

@thomas-b-jackson
Author

thomas-b-jackson commented Jun 15, 2017

after lots of trial and error we've found root cause:

the problem is that when traefik sets its cookie it doesn't specify a path

so the browser has to assume a path

chrome assumes the path of the splash page you use to access the web app in the repro steps

firefox and safari assume the root path

in chrome, for subsequent callbacks, if the callback path doesn't match the original path, chrome doesn't send the traefik cookie

if chrome doesn't send the traefik cookie, traefik responds by sending a new cookie with a different value.

@timoreimann
Contributor

The Internet indicates that not setting the path can lead to problems just like the one @thomas-b-jackson describes.

AFAICS, we should set the cookie path. One question is whether we should set it to root (/) or to the frontend's path (if specified)? The former is probably simpler to implement but would expose the sticky cookie to all frontends. The latter is presumably the correct way to do it but might lead to some tricky edge cases (I'm thinking of complicated path matchers/modifiers, or combinations thereof).

@containous/traefik any thoughts?

@m4r10k

m4r10k commented Jun 22, 2017

We are currently facing the same problem. One application with sub-paths results in multiple loadbalancer-cookies.

[screenshot]

In my opinion both cases are valid. Setting it to root (/) is probably reasonable if the match is a Host: matcher. If the matcher is of type PathPrefix:, then that path should be used. It may not be perfect, but as a first iteration it would fix applications that currently do not work at all.

@marcopaga
Contributor

I got the simple fix working for our servers. I forked the oxy libs and set the cookie path to "/". Now the clients handle the cookie correctly. I pushed it as marcopaga/traefik:1.3.5.2, based on 1.3.5 plus the simple fix commit.
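
To illustrate why setting the path to "/" helps, here is a self-contained sketch with a toy sticky handler and a cookie-aware Go client. It is not the oxy patch itself: the handler, the cookie name sticky_backend, the value server1, and the request paths are all hypothetical, and the Go cookie jar stands in for a browser that applies the default-path rule.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/cookiejar"
	"net/http/httptest"
	"sync/atomic"
)

// cookieName is a hypothetical name used only in this sketch.
const cookieName = "sticky_backend"

// issuedCookies starts a toy sticky handler that sets its cookie with the
// given Path attribute ("" means no Path), requests each path in order with
// one cookie-aware client, and returns how often a new cookie was issued.
func issuedCookies(cookiePath string, paths ...string) int {
	var issued int32
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// No sticky cookie on the request: pick a server and issue a cookie.
		if _, err := r.Cookie(cookieName); err != nil {
			atomic.AddInt32(&issued, 1)
			http.SetCookie(w, &http.Cookie{Name: cookieName, Value: "server1", Path: cookiePath})
		}
	}))
	defer srv.Close()

	jar, err := cookiejar.New(nil)
	if err != nil {
		panic(err)
	}
	client := &http.Client{Jar: jar}
	for _, p := range paths {
		resp, err := client.Get(srv.URL + p)
		if err != nil {
			panic(err)
		}
		resp.Body.Close()
	}
	return int(atomic.LoadInt32(&issued))
}

func main() {
	paths := []string{"/secure/Dashboard.jspa", "/rest/api/2/issue"}
	fmt.Println("without Path:", issuedCookies("", paths...))
	fmt.Println("with Path=/: ", issuedCookies("/", paths...))
}
```

Without a Path attribute the second request arrives without the cookie and the handler issues another one; with the root path the first cookie is reused.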

@timoreimann
Contributor

/cc @marco-jantke

@m3co-code
Contributor

m3co-code commented Aug 9, 2017

I am trying to figure out a more general solution for the problem at hand. One precondition I would like to mention is that the cookie name contains the backend name in the latest version and so we can have multiple cookies, one per backend.

I was thinking whether we still have to add a specific Path to a cookie or whether / suffices after the addition of the Backend name. Consider the following configuration:

[frontends]
  [frontends.frontend1]
  backend = "backend1"
    [frontends.frontend1.routes.test_1]
    rule = "Path:/ok"
  [frontends.frontend2]
  backend = "backend1"
    [frontends.frontend2.routes.test_1]
    rule = "Path:/notOk"

[backends]
  [backends.backend1]
    [backends.backend1.LoadBalancer]
    sticky = true
    [backends.backend1.servers.server1]
    url = "http://localhost:8081"
    [backends.backend1.servers.server2]
    url = "http://localhost:8082"

To my understanding, the desired behaviour is that requests to /ok and /notOk get the same sticky backend, and this will always be the case when we set the cookie's path to /.

If that is the case, I think we can conclude that the same holds for other rule specifics of a frontend (e.g. Host), and that including the backend's name in the cookie name actually suffices. This means that simply setting the cookie path to /, like @marcopaga did, is enough and a valid solution even in general terms.
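
The argument above can be made concrete with a small sketch. It assumes, as stated earlier in this comment, that the sticky cookie name is derived from the backend name; the prefix sticky_ and the lookup table are hypothetical stand-ins, not Traefik's actual naming scheme or routing code.

```go
package main

import (
	"fmt"
	"net/http"
)

// frontends maps each frontend's Path rule to its backend, mirroring the
// configuration above.
var frontends = map[string]string{
	"/ok":    "backend1",
	"/notOk": "backend1",
}

// stickyCookie builds a sticky cookie whose name includes the backend name
// (hypothetical "sticky_" prefix) and whose path is the root.
func stickyCookie(backend, serverURL string) *http.Cookie {
	return &http.Cookie{Name: "sticky_" + backend, Value: serverURL, Path: "/"}
}

func main() {
	for _, path := range []string{"/ok", "/notOk"} {
		c := stickyCookie(frontends[path], "http://localhost:8081")
		fmt.Printf("%s -> %s\n", path, c.String())
	}
}
```

Both frontends yield the same cookie name and path, so a browser keeps a single cookie for backend1 and sends it with requests to either path.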

@timoreimann @marcopaga WDYT about it?

@timoreimann
Contributor

@marco-jantke makes sense to me.

One caveat we should mention is that the security/privacy concern I alluded to before would still be present. Given the complexity of matching paths properly when regular expressions are involved, however, I say we move forward with the root-based solution for now. After all, a lot of folks are presumably operating Traefik in a trusted environment where these concerns don't matter.

@marcopaga would you mind submitting a PR against containous/oxy and afterwards one against containous/traefik to vendor the updated oxy package? Thanks!

@marcopaga
Contributor

Sure, great :)

@timoreimann I created the pull request for oxy to set the cookie path. Once it is merged I will create a PR for containous/traefik.

@marco-jantke I changed the commit to reflect my new knowledge regarding "iff"

@marcopaga
Contributor

marcopaga commented Aug 12, 2017

I created the PR to pick up the changes in oxy.

@traefiker traefiker added this to the 1.4 milestone Aug 13, 2017
@sriyer

sriyer commented Jan 2, 2019

@marco-jantke @timoreimann

We see the same problem in our environments; we use v1.3.3 of traefik.
Before we consider making a patch with this change, simple as it is, I'm curious whether the solution would be viable where the same backend does NOT serve requests for both prefixes /ok and /notOk: the host (header) is the same but the prefixes differ.

The situation usually happens when there are different backends that serve the UI and APIs for an application and there may be cases where the UI makes async calls to the API service. Both being served by the same domain but have different prefixes.

a hypothetical eg:
me.mydomain.com/ui
me.mydomain.com/api

users use the /ui prefix to access the application in the browser; the browser renders content based on results from async calls to /api, and both require sticky behavior.
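
One way to reason about this case is sketched below. It assumes the per-backend cookie naming described in the Aug 9 comment (which may not apply to v1.3.3) and a root cookie path; the backend names ui and api, the sticky_ prefix, the server URLs, and the use of Go's cookie jar as a stand-in for a browser are all hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/cookiejar"
	"net/url"
)

// stickyCookie builds a root-path cookie named after the backend
// (hypothetical "sticky_" prefix).
func stickyCookie(backend, serverURL string) *http.Cookie {
	return &http.Cookie{Name: "sticky_" + backend, Value: serverURL, Path: "/"}
}

// mustParse parses a URL and panics on error (acceptable for a sketch).
func mustParse(raw string) *url.URL {
	u, err := url.Parse(raw)
	if err != nil {
		panic(err)
	}
	return u
}

// cookieNamesFor simulates one browser session in which the ui and api
// backends each set their own sticky cookie, then returns the names of the
// cookies that would be sent with a request to reqURL.
func cookieNamesFor(reqURL string) []string {
	jar, err := cookiejar.New(nil)
	if err != nil {
		panic(err)
	}
	jar.SetCookies(mustParse("http://me.mydomain.com/ui/index.html"),
		[]*http.Cookie{stickyCookie("ui", "http://10.0.0.1:8080")})
	jar.SetCookies(mustParse("http://me.mydomain.com/api/items"),
		[]*http.Cookie{stickyCookie("api", "http://10.0.0.2:8080")})

	var names []string
	for _, c := range jar.Cookies(mustParse(reqURL)) {
		names = append(names, c.Name)
	}
	return names
}

func main() {
	fmt.Println(cookieNamesFor("http://me.mydomain.com/api/items"))
}
```

Under these assumptions both cookies are sent with every request to the host, and each backend would look up only the cookie carrying its own name, so the /ui and /api pods stay pinned independently. The trade-off is the one raised earlier in the thread: every backend behind the host can see the other backend's sticky cookie.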

@traefik traefik locked and limited conversation to collaborators Sep 1, 2019