can not login after upgrading to 4 #2463

Closed
phynias opened this issue Aug 4, 2018 · 43 comments

@phynias

phynias commented Aug 4, 2018

i have the following set for my web config

CONCOURSE_BASIC_AUTH_USERNAME=admin
CONCOURSE_BASIC_AUTH_PASSWORD=xxxxxx
CONCOURSE_EXTERNAL_URL=https://ci.xxxx.com

CONCOURSE_MAIN_TEAM_LOCAL_USER=admin
CONCOURSE_ADD_LOCAL_USER=admin:xxxxxx
CONCOURSE_MAIN_TEAM_ALLOW_ALL_USERS=true

now when i try and login, it just sits there spinning. if i look in the inspector i can see it pending on

https://ci.xxxx.com/sky/callback?code=btkstm4h47f23yugenht5v5s2&state=eyJyZWRpcmVjdF91cmkiOiIvIiwiZW50cm9weSI6IjQzNWY1N2NiZDNiMWZmYTMzNGZmNGUxYmRhOThmYjMxNzUxNTg4YThhYzFkNDQ2N2QxMGJkMmYyMzkyOTg1MzIifQ%3D%3D

it will finally time out after a while.

in the logs i see

Aug  4 00:21:36 ip-10-200-1-205 concourse[4491]: {"timestamp":"1533342096.544905424","source":"atc","message":"atc.dex.event","log_level":1,"data":{"fields":{},"message":"login successful: connector \"local\", name=\"\", email=\"admin\", groups=[]","session":"5"}}

if i put in the wrong login info it instantly comes back and says bad login/password.
with the correct login it just seems to time out.

after it times out i see this in the log:

Aug 4 00:27:10 ip-10-200-1-205 concourse[4491]: {"timestamp":"1533342430.388325930","source":"atc","message":"atc.sky.callback.failed-to-fetch-dex-token","log_level":2,"data":{"error":"Post https://ci.wizr.com/sky/issuer/token: dial tcp 34.210.127.211:443: i/o timeout","session":"4.277"}}
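
For anyone sifting through similar logs: a minimal Python sketch for pulling the error-level entries out of the web node's output. It assumes each line is an optional syslog prefix followed by one JSON object, as in the lines above, and that `log_level` 2 or higher marks an error (which matches the samples in this thread). The function names are hypothetical helpers, not part of Concourse.

```python
import json

# Sample line copied from the comment above (syslog prefix + JSON payload).
SAMPLE = (
    'Aug 4 00:27:10 ip-10-200-1-205 concourse[4491]: '
    '{"timestamp":"1533342430.388325930","source":"atc",'
    '"message":"atc.sky.callback.failed-to-fetch-dex-token","log_level":2,'
    '"data":{"error":"Post https://ci.wizr.com/sky/issuer/token: '
    'dial tcp 34.210.127.211:443: i/o timeout","session":"4.277"}}'
)


def parse_line(line):
    """Return the JSON payload of a log line as a dict, or None if absent."""
    start = line.find("{")
    if start == -1:
        return None
    try:
        return json.loads(line[start:])
    except json.JSONDecodeError:
        return None


def error_entries(lines, min_level=2):
    """Yield (message, error) pairs for entries at or above min_level."""
    for line in lines:
        entry = parse_line(line)
        if entry and entry.get("log_level", 0) >= min_level:
            yield entry.get("message"), entry.get("data", {}).get("error")
```

Feeding it the web node's log lines (for example `error_entries(open(path))`, with `path` pointing at wherever the logs are written) leaves only entries such as `atc.sky.callback.failed-to-fetch-dex-token` together with the underlying error text.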

@phynias
Author

phynias commented Aug 4, 2018

i also noticed this happens a lot in the logs
Aug 4 00:40:31 ip-10-200-1-205 concourse[8986]: {"timestamp":"1533343231.696431637","source":"atc","message":"atc.sky.userinfo.failed-to-parse-authorization-header","log_level":1,"data":{"session":"4.112"}}

@UniqueElphie

Yes, it is also happening for me! The login is failing most of the time due to the above error. The login randomly succeeds :(
If anyone has solved the issue, please let me know, very appreciated!! :)

@r-chris

r-chris commented Aug 6, 2018

I'm seeing the same behaviour. Further, when trying to log-in with Safari (v11.1.2) clicking on Login actually downloads an empty file called callback.dms.

@skibum55

skibum55 commented Aug 6, 2018

This happened to us also. FYI, we upgraded from 3.14. There was one user in Main and another team with a single local user. Xenial 97.3 azure stemcell.

@phynias
Author

phynias commented Aug 6, 2018

to add on, i've also set up GitHub authentication, and the GitHub side seems to work: i log in to github, and then it hits the same timeout waiting on the /sky/ redirect.

@skibum55

skibum55 commented Aug 6, 2018

Our DNS was not working correctly. We updated the resolv.conf to use 8.8.8.8 and were able to login.
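
A quick way to test this from the web node itself: a minimal Python sketch that checks whether the host in the external URL resolves and whether a TCP connection to it succeeds. It only covers DNS and basic connectivity (not TLS or the HTTP response), and the helper names are hypothetical.

```python
import socket
from urllib.parse import urlsplit


def host_and_port(external_url):
    """Extract host and port from a URL, defaulting the port by scheme."""
    parts = urlsplit(external_url)
    port = parts.port or (443 if parts.scheme == "https" else 80)
    return parts.hostname, port


def can_resolve(host):
    """True if the local resolver returns at least one address for host."""
    try:
        return bool(socket.getaddrinfo(host, None))
    except socket.gaierror:
        return False


def can_connect(host, port, timeout=5.0):
    """True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Run on the web node, `host_and_port` applied to the value of CONCOURSE_EXTERNAL_URL gives the pair to pass to `can_resolve` and `can_connect`. If either returns False there, the `failed-to-fetch-dex-token` timeout shown earlier in the thread would be the expected symptom.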

@phynias
Author

phynias commented Aug 6, 2018

@skibum55 nice! unfortunately dns is fine on my setup.

@phynias
Author

phynias commented Aug 6, 2018

so after seeing your dns fix i wanted to see why my Post was failing. i figured out that i wasn't allowing my ec2 instance to talk to itself. i fixed that, and now when i login, instead of taking a while to time out, it instantly dies on the same call:

https://ci.xxxx.com/sky/callback?code=jggh7gmybeftku4767tb3kmse&state=eyJyZWRpcmVjdF91cmkiOiIvIiwiZW50cm9weSI6ImMxNmZmMzI3YTQ0Nzg0NzZiYWE1YmFkOTliZjM2MmE0NDE5YjM5MGYyOGZlMTYzMjRhM2VhYTFkM2VmZDMxOTQifQ%3D%3D

and this is in the logs:

Aug  6 18:27:20 ip-10-200-1-205 concourse[9694]: {"timestamp":"1533580040.156766415","source":"atc","message":"atc.dex.event","log_level":1,"data":{"fields":{},"message":"login successful: connector \"local\", name=\"\", email=\"admin\", groups=[]","session":"5"}}
Aug  6 18:27:20 ip-10-200-1-205 concourse[9694]: {"timestamp":"1533580040.246052265","source":"atc","message":"atc.sky.callback.failed-to-fetch-cookie-state","log_level":2,"data":{"error":"http: named cookie not present","session":"4.46850"}}
Aug  6 18:27:21 ip-10-200-1-205 concourse[9694]: {"timestamp":"1533580041.411123037","source":"atc","message":"atc.sky.userinfo.failed-to-parse-authorization-header","log_level":1,"data":{"session":"4.46851"}}
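
The `state` parameter in the callback URL above looks like base64-encoded JSON. A minimal Python sketch for inspecting it, assuming standard base64 and a JSON payload (inferred from this sample only, not from Concourse documentation); `decode_state` is a hypothetical helper.

```python
import base64
import json
from urllib.parse import parse_qs, urlsplit

# Callback URL copied from the comment above.
CALLBACK = (
    "https://ci.xxxx.com/sky/callback?code=jggh7gmybeftku4767tb3kmse"
    "&state=eyJyZWRpcmVjdF91cmkiOiIvIiwiZW50cm9weSI6"
    "ImMxNmZmMzI3YTQ0Nzg0NzZiYWE1YmFkOTliZjM2"
    "MmE0NDE5YjM5MGYyOGZlMTYzMjRhM2VhYTFkM2Vm"
    "ZDMxOTQifQ%3D%3D"
)


def decode_state(callback_url):
    """Decode the `state` query parameter as base64-encoded JSON."""
    query = parse_qs(urlsplit(callback_url).query)
    raw = query["state"][0]  # parse_qs has already undone the %3D escapes
    return json.loads(base64.b64decode(raw))


state = decode_state(CALLBACK)
print(sorted(state))  # the keys carried in the state value
```

For this sample the payload carries a `redirect_uri` of `/` plus a random `entropy` string. The `named cookie not present` error in the log indicates that a cookie the callback handler expected never arrived, which fits a proxy dropping or altering cookies.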

@phynias
Author

phynias commented Aug 6, 2018

ok so a little progress. it seems like my nginx is causing this. when i bypass it, it now works and logs me in. it seems like nginx may be doing something to the cookies? i thought having
CONCOURSE_COOKIE_SECURE=false
set would fix it, but it does not.

@phynias
Author

phynias commented Aug 6, 2018

ok i removed nginx as my lb and put in an alb and everything works fine. not sure why, but something breaks on the newest concourse with the same nginx config i have been using.

@r-chris

r-chris commented Aug 7, 2018

Ok, generally sounds like this is a resolve issue. I'm currently testing the docker quickstart example (on an Ubuntu 16.04 server): https://github.com/concourse/concourse-docker/blob/master/docker-compose-quickstart.yml

I could imagine the docker networking setup causing similar resolve issues here, but I haven't been able to find a config that fixes the login. There is no nginx involved, and since we are running on a local network server with the server IP configured as the CONCOURSE_EXTERNAL_URL, I don't see how a DNS issue could cause this either.

@sanjay2916

sanjay2916 commented Aug 7, 2018

@phynias @UniqueElphie give the concourse server ip with port, or its dns name, in the parameter below

eg:

  • name: CONCOURSE_EXTERNAL_URL
    value: http://ap-south-1.compute.amazonaws.com:38686/

In the latest version of concourse, it redirects to localhost:8080 by default if you do not specify CONCOURSE_EXTERNAL_URL.

use the parameters below in the latest version instead of the basic authentication parameters; the password should be in bcrypt format:

  • name: CONCOURSE_ADD_LOCAL_USER
    value: test:$2y$10$yE9dQn0P1KvpynigktO5neqvb/dZQricFZH1d1PhwmGfUMVJoac0y
  • name: CONCOURSE_MAIN_TEAM_LOCAL_USER
    value: test

for reference: #2421
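
To sanity-check a CONCOURSE_ADD_LOCAL_USER value before deploying, here is a minimal Python sketch that splits it into username and password and tests whether the password has the general shape of a bcrypt hash. The regex describes the common bcrypt layout (`$2a$`, `$2b$` or `$2y$`, a two-digit cost, then 53 characters); it does not verify the hash, and both helper names are hypothetical.

```python
import re

# General layout of a bcrypt hash: version tag, two-digit cost, 53 characters.
BCRYPT_RE = re.compile(r"^\$2[aby]\$\d{2}\$[./A-Za-z0-9]{53}$")


def split_local_user(value):
    """Split 'username:password' on the first colon."""
    username, sep, password = value.partition(":")
    if not sep or not username or not password:
        raise ValueError("expected username:password")
    return username, password


def looks_bcrypted(password):
    """True if password has the shape of a bcrypt hash (not a validity check)."""
    return bool(BCRYPT_RE.match(password))


# Value taken from the example above.
user, password = split_local_user(
    "test:$2y$10$yE9dQn0P1KvpynigktO5neqvb/dZQricFZH1d1PhwmGfUMVJoac0y"
)
print(user, looks_bcrypted(password))
```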

@r-chris

r-chris commented Aug 7, 2018

Ok, so I think I figured out what is going on:

  1. We are using docker to change the port which is exposed by the running container:
    ports: ["9090:8080"]

  2. We are then defining CONCOURSE_EXTERNAL_URL to point to a local IP with that port: http://192.168.111.111:9090, but inside the docker container concourse is still running on port 8080 and doesn't actually know about the changed port.

Conclusion: Do NOT change the port without telling concourse about it! This did work until version 4 came along, but I can understand how the authentication is getting confused when it expects to run on a different port.

Workaround: either do not change the port with docker or tell concourse about the correct port to startup with. I'm going to assume CONCOURSE_BIND_PORT would be the appropriate way to set the correct startup port and there is also a CONCOURSE_BIND_IP, which I will try to use.

@r-chris

r-chris commented Aug 7, 2018

Setting CONCOURSE_BIND_PORT=9090 and ports: ["9090:9090"] does bring up the concourse front page, but login is still broken and I can see a lot of log entries still using the default 8080 port:

Post http://127.0.0.1:8080/api/v1/workers?ttl=1m0s: dial tcp 127.0.0.1:8080: connect: connection refused

[UPDATE]

This fixed it. To get the different port working, all of these have to be set:

ports: ["9090:9090"]
CONCOURSE_EXTERNAL_URL=http://192.168.111.111:9090
CONCOURSE_BIND_PORT=9090
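
The constraint described here (for the 4.0/4.1 behaviour reported in this thread) is that the docker port mapping, the port in CONCOURSE_EXTERNAL_URL and CONCOURSE_BIND_PORT all agree. A minimal Python sketch of that check, assuming a plain "HOST:CONTAINER" mapping string; `check_ports` is a hypothetical helper, not a Concourse tool.

```python
from urllib.parse import urlsplit


def check_ports(external_url, bind_port, docker_mapping):
    """Return a list of mismatches between the three port settings.

    docker_mapping is a "HOST:CONTAINER" string as used under `ports:`.
    """
    parts = urlsplit(external_url)
    url_port = parts.port or (443 if parts.scheme == "https" else 80)
    host_port, container_port = (int(p) for p in docker_mapping.split(":"))

    problems = []
    if container_port != bind_port:
        problems.append(
            f"docker forwards to {container_port}, concourse binds {bind_port}"
        )
    if host_port != url_port:
        problems.append(
            f"docker publishes {host_port}, external URL uses {url_port}"
        )
    if url_port != bind_port:
        problems.append(
            f"external URL uses {url_port}, concourse binds {bind_port}"
        )
    return problems


# The original setup from this thread: 9090 outside, default 8080 inside.
print(check_ports("http://192.168.111.111:9090", 8080, "9090:8080"))
```

The last check is the one that caught people out: the web node called itself at the external URL's port from inside the container, where nothing was listening on 9090.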

@vito
Member

vito commented Aug 22, 2018

Woops, we really shouldn't be relying on the external URL internally.

I think the problem is just this one line:

https://github.com/concourse/atc/blob/8819682a06f74aa284de084b14955f42919a64cb/atccmd/command.go#L518

If we were to change that to the bind IP/port (except 127.0.0.1 if it binds to 0.0.0.0), this should be fixed.

I'll prioritize this highly somewhere. Sorry for the turbulence everyone! @r-chris I'm also gonna prioritize #2519 which should make changing the ports work a lot more smoothly.

@vito vito added this to Icebox in Operations via automation Aug 22, 2018
@vito vito added the bug label Aug 22, 2018
@vito vito moved this from Icebox to Backlog in Operations Aug 22, 2018
@xtremerui xtremerui moved this from Backlog to In Flight in Operations Aug 23, 2018
@xtremerui xtremerui self-assigned this Aug 23, 2018
@jamieklassen jamieklassen moved this from In Flight to Done in Operations Aug 27, 2018
@eedwards-sk

eedwards-sk commented Aug 27, 2018

Does anyone know if this problem applies to the TSA port as well?

Currently for a CI solution which stands up a local concourse stack, I have to launch a concourse stack on non-standard ports (so it won't conflict with the existing stack).

Thus I have the web node docker image binding tsa to 2222 internally and web to 8080 as normal, but externally for the 'test stack' they get mapped on the host to 2323 and 8181, respectively.

@vito vito moved this from Done to Backlog in Operations Aug 31, 2018
@vito
Member

vito commented Aug 31, 2018

@pivotal-jamie-klassen Is this actually done? I see the code still using the external URL here and don't see any commits pushed.

Moving back to the backlog. I thought this was in 4.1.0 but hadn't done acceptance on it yet so I'm not sure. We may need to do another release soon to fix this. :/

@eedwards-sk I don't think this issue applies to the TSA.

@UniqueElphie

Hey @vito I just setup v4.1, but when I try to login as local user or github user, the login fails due to the same error. Is this issue fixed and verified?
The relevant logging when I try to login as local user:
{"timestamp":"1536016247.228306532","source":"atc","message":"atc.dex.event","log_level":1,"data":{"fields":{},"message":"login successful: connector \"local\", username=\"concourse\", email=\"concourse\", groups=[]","session":"5"}}
{"timestamp":"1536016249.150294065","source":"atc","message":"atc.sky.userinfo.failed-to-parse-authorization-header","log_level":1,"data":{"session":"4.104"}}
{"timestamp":"1536016254.152165890","source":"atc","message":"atc.sky.userinfo.failed-to-parse-authorization-header","log_level":1,"data":{"session":"4.105"}}
{"timestamp":"1536016255.191825628","source":"atc","message":"atc.build-tracker.track.start","log_level":0,"data":{"session":"19.48"}}
{"timestamp":"1536016255.195375443","source":"atc","message":"atc.build-tracker.track.done","log_level":0,"data":{"session":"19.48"}}
{"timestamp":"1536016257.772825003","source":"atc","message":"atc.sky.callback.failed-to-fetch-dex-token","log_level":2,"data":{"error":"Post https://externalurl/sky/issuer/token: net/http: TLS handshake timeout","session":"4.103"}}
{"timestamp":"1536016259.153844595","source":"atc","message":"atc.sky.userinfo.failed-to-parse-authorization-header","log_level":1,"data":{"session":"4.106"}}

@vito
Member

vito commented Sep 4, 2018

@UniqueElphie Correct, this ended up not making it in to 4.1. We've picked it up again and plan to push a 4.1.1 out this week.

topherbullock pushed a commit that referenced this issue Sep 4, 2018
#2463

Submodule src/github.com/concourse/atc 084a5efe2..b079f507c:
  > use bind ip for sky server config
  > clean up unused code in worker and GC
Submodule src/github.com/concourse/flag b093f65..06be967ec:
  > allow setting writer sink

Signed-off-by: Topher Bullock <cbullock@pivotal.io>
@enugentdt

@vito Actually, this very issue was one of the reasons I wanted to completely disable authentication, as I'm already using an auth solution. Most of the time, however, I think that the issues stem from not being able to access concourse directly, but being able to access the reverse proxy (so the 301 messes up the auth flow).

If there is anything else I can answer, heck, if you want to ssh into my concourse deployment and reverse proxy to see the setup, I am more than happy to help.

@vito
Member

vito commented Sep 10, 2018

@enugentdt Hmm, I say this having no experience with Pritunl Zero/BeyondCorp myself, but I wonder if you'd be able to instead configure Concourse to use it as either an OIDC or oAuth2 provider? It seems a shame to forego all of the Concourse auth flow, as you might end up missing out on things like fine-grained access control (#1317) in the future. Is the reverse-proxy aspect a hard requirement from your organisation?

@enugentdt

@vito Unfortunately, Pritunl Zero itself does not support custom OAuth providers, only SAML, and even then, it's at a price point which is unreachable for most users. I'd agree that it's a shame to lose on Concourse ABAC, though. Admittedly, the lack of OAuth drives me up a wall, and one day, I plan to PR (or fork) it.

Running this reverse proxy allows us to have an extremely resilient architecture, which might not be as easy with Concourse being exposed publicly. Zero does the usual "check if it's up" stuff and directs as it pleases.

The other side is IP addresses - using a reverse proxy lets us have less public IPs, and have services behind a centrally-secured and -audited endpoint. Not to mention the general load-balance ability of reverse proxies (CloudFlare or custom solutions).

Would Concourse be able to function as an oAuth2 provider, while still obtaining users from Google? If so, I could see that being very useful for reverse proxies that are pass-through only.

I'm going to do some testing, and see how non-authenticating proxies work with regards to Concourse, and try to pinpoint anything that would be useful with that.

If it would help, I'm @space55 on the Discord (multiple GitHub accounts, yay!), so I can chat there if it works better than on this issue.

ddadlani pushed a commit that referenced this issue Sep 11, 2018
#2463

Submodule src/github.com/concourse/atc 3379392b9..fd2a08bd0:
  > Remove internal URL from skyserver config
Submodule src/github.com/concourse/skymarshal b72c6d513..fe0656e2b:
  > Remove internal URL from skyserver config

Signed-off-by: Josh Winters <jwinters@pivotal.io>
@ddadlani
Contributor

We investigated doing an internal redirect for all of the auth components, but that doesn't work because of the way Dex is designed. To perform an internal redirect, the issuer URL in Dex needs to be set to the internal URL. This breaks for the following reasons:

  • Dex uses the issuer URL to issue tokens. Setting this URL to the internal URL (e.g. 127.0.0.1) would mean that the tokens issued are valid for any instance with that internal URL, which is too generic.

  • Dex also checks that the callback URL used by the external auth provider matches the issuer URL. If the issuer URL is the internal URL, this check would fail for external auth providers.

@siennathesane

To double tap on this, I am using Concourse behind a reverse proxy that doesn't use authentication of any form, it just performs a TLS/SSL termination.

I'm still seeing errors:

{"timestamp":"1536690818.222700834","source":"atc","message":"atc.sky.callback.failed-to-fetch-dex-token","log_level":2,"data":{"error":"Post https://externalUrl/sky/issuer/token: dial tcp publicIP:443: i/o timeout","session":"4.4"}}

Reading @vito's comments about this not making it into 4.1.1 is a bit disheartening, but I wanted to validate some assumptions around the external URLs so there was more visibility into the issue. For me, externalUrl is resolvable both within and external to my network, and publicIP is reachable by everyone.

Here are my deployed versions:

releases:
- name: concourse
  sha1: 513e3a88d135e6e2cd8a974702e2e63caa0cb82b
  url: https://bosh.io/d/github.com/concourse/concourse?v=4.1.0
  version: 4.1.0
- name: garden-runc
  sha1: 2a7c813e7e4d862e19334addf022916fb6b91eb0
  url: https://bosh.io/d/github.com/cloudfoundry/garden-runc-release?v=1.16.3
  version: 1.16.3
- name: postgres
  sha1: 24d2e2887a45258b71bc40577c0f406180e47701
  url: https://bosh.io/d/github.com/cloudfoundry/postgres-release?v=29
  version: "29"

ddadlani pushed a commit that referenced this issue Sep 13, 2018
#2463

Submodule src/github.com/concourse/atc 228d6457..40107fc6:
  > Use internal url for token requests to dex
Submodule src/github.com/concourse/skymarshal fe0656e2b..cb41319bc:
  > Use internal url for token requests to dex

Signed-off-by: Divya Dadlani <ddadlani@pivotal.io>
@troykinsella
Contributor

troykinsella commented Sep 13, 2018

Having scanned this ticket a dozen times and not being able to extract a smoking gun, I wonder if anyone could elaborate on the circumstances by which this error occurs. Specifically, the atc.sky.userinfo.failed-to-parse-authorization-header one. I'm getting this error on AWS with the ATC behind an ELB, as well as in a Vagrant environment with nothing in front. Sparing you an enumeration of things I've tried, which are copious, I feel like I've tried "everything". Any tips would be appreciated. I can provide more detail on my setup if needed.

@jwntrs
Contributor

jwntrs commented Sep 13, 2018

The main problem here is that when the ATC can't reach the external_url then login doesn't work. This happens because after the authorization_code gets issued the token exchange happens using the external_url even though both components are running on the same host. We're going to use the loopback address for the token exchange. This fix is going through our pipeline now.

ddadlani pushed a commit that referenced this issue Sep 13, 2018
#2463

Submodule src/github.com/concourse/skymarshal cb41319b..5b92fd83:
  > Always use http for loopback communication because certs

Signed-off-by: Divya Dadlani <ddadlani@pivotal.io>
@troykinsella
Contributor

@pivotal-jwinters Thanks for that. So I can prove that my ATC box has access back to its own web service (80) through the external_url, but that error persists. Are there any other alignments with other options to consider?

@jwntrs
Contributor

jwntrs commented Sep 14, 2018

@troykinsella that log message just means that an unauthenticated user is using the web interface. It doesn't mean there's a problem. It was a little spammy/misleading so we removed it for the next release.

Are you having problems logging into your concourse installation?

vito added a commit to vmware-archive/atc that referenced this issue Sep 14, 2018
we can't rely on being able to reach the external URL from the ATC for a
few reasons:

* it might be pointing to a reverse-proxy with its own auth
* it might be pointing to a reverse-proxy with SSL termination and a
  cert not trusted by the `web` node
* it might be literally unreachable from the ATC because of
  firewall/security groups/etc

so, inject a client that just re-routes requests from the external URL
to the internal URL.

concourse/concourse#2463
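
The rerouting idea in this commit message can be illustrated with a small Python sketch. This is not the actual change (which lives in the Go ATC code and is not reproduced here); it only shows the URL rewrite, with `reroute` as a hypothetical name and the internal URL assumed to be the bind address on loopback.

```python
from urllib.parse import urlsplit, urlunsplit


def reroute(url, external_url, internal_url):
    """Rewrite url to target internal_url when it points at external_url.

    Only scheme and host:port are swapped; path, query and fragment are kept.
    URLs for any other host are returned unchanged.
    """
    target = urlsplit(url)
    external = urlsplit(external_url)
    if (target.scheme, target.netloc) != (external.scheme, external.netloc):
        return url
    internal = urlsplit(internal_url)
    return urlunsplit(
        (internal.scheme, internal.netloc, target.path, target.query, target.fragment)
    )


# The token request that timed out earlier in the thread, sent to loopback instead.
print(reroute(
    "https://ci.xxxx.com/sky/issuer/token",
    "https://ci.xxxx.com",
    "http://127.0.0.1:8080",
))
```

Because only requests aimed at the external URL are rewritten, calls to outside providers (GitHub, LDAP and so on) pass through untouched, and the issuer URL that Dex advertises can stay external.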
vito added a commit that referenced this issue Sep 14, 2018
#2463

Submodule src/github.com/concourse/atc 40107fc6a..374d3ce91:
  > reroute login flow dex traffic to internal url
Submodule src/github.com/concourse/skymarshal 5b92fd837..30842c3e1:
  > Revert "Use internal url for token requests to dex"
  > Revert "Always use http for loopback communication because certs"
@antonu17
Contributor

antonu17 commented Sep 14, 2018

Hi all,
After 4.1 was released, we deployed a new concourse cluster in Kube. It has internal authentication disabled and uses LDAP.

CONCOURSE_EXTERNAL_URL=https://ci.internal.io is pointing to ingress (nginx) host for SSL and LB. Outbound is allowed, we are able to curl localhost:8080 and curl https://ci.internal.io from inside concourse-web container.

Auth doesn't work when CONCOURSE_EXTERNAL_URL is set: ingress returns 502, logs "atc.sky.userinfo.failed-to-parse-authorization-header". But if we unset CONCOURSE_EXTERNAL_URL, so it defaults to '127.0.0.1:8080', auth works. (checked via kubectl port-forward <concourse-web-pod-id> 8080:8080)

UPDATE:
Breaking news!
I was wondering why LDAP auth was working yesterday when I first checked it. Given that I didn't make any changes to the concourse-web container config and auth just stopped working, the only other thing that persisted was our database. So I created a brand new database and restarted the pod, and now authentication works again!

MORE UPDATE:
So, with an empty DB everything worked fine, until I started adding teams. I added ~40 teams and checked if I could log in. I couldn't. The same 502 error. But my fly was still authenticated, so I removed 20 teams and checked if I could log in again. I could... 😕
I repeated the test several times, and it looks like login doesn't work when there are many teams.
@vito ^^^

FINAL UPDATE:
So in our case the problem was caused by the nginx ingress controller. This is what showed up in the logs:

[error] 7910#7910: *10923710 upstream sent too big header while reading response header from upstream

We've fixed this by configuring proxy buffering in our ingress like this:

ingress:
    enabled: true
    annotations:
        kubernetes.io/ingress.class: nginx
        nginx.ingress.kubernetes.io/server-alias: "ci.internal.io"
        nginx.ingress.kubernetes.io/proxy-body-size: 8m
        nginx.ingress.kubernetes.io/proxy-buffering: "on"
        nginx.ingress.kubernetes.io/proxy-buffer-size: "32k"

The external URL and the number of teams weren't our problem. Sorry for the noise. 😊

@troykinsella
Contributor

troykinsella commented Sep 14, 2018

@pivotal-jwinters Yes, the most basic auth setup I can think of fails all login attempts:

--add-local-user admin:bcrypted-admin
--main-team-local-user admin

Tried a bcrypted password and a plain password. I'm accessing the ATC directly through an ssh tunnel over plain http.

@vito vito added accepted and removed rejected labels Sep 17, 2018
@vito vito closed this as completed Sep 17, 2018
@vito vito added this to the v4.2.0 milestone Sep 17, 2018
@troykinsella
Contributor

For posterity: 4.2.0 fixes my login issue.

@UniqueElphie

Just setup v4.2.1 and login via github/local is working perfectly now!!
Thank you so much for sorting the issue out!!! 💯

@enugentdt

Hooray! Works for me too! Thanks guys!

@siennathesane

I deployed 4.2.1 today and it works for me! Thanks @vito and team.

@jeffawang

jeffawang commented Oct 20, 2018

Hey there, I keep getting redirected to 127.0.0.1 right when I click the login button with a fresh install of concourse 4.2.1.

Could you please clarify if I need to specify an external URL for this not to happen? While some users seem to be happy, they haven't specified their setup, and I'm still having problems.

Here's my situation:

  • I'm currently trying to use local user auth (--add-local-user and --main-team-local-user).
  • My users are accessing concourse through multiple domain names, so I haven't set the external url.
  • Some or all of those domain names are unresolvable from within the concourse web container.
  • This was working before I tried to upgrade from 3.14.1.

If in v4 I must specify a single external URL, I'll have to stay on 3.14.1.

@MatthiasScholz

I can confirm. Just ran into the same issue.
Click on the login button redirects to 127.0.0.1.

@vito
Member

vito commented Oct 22, 2018

Please open new issues instead of commenting on this one - there was a very specific thing to fix, and we fixed it, and received feedback from those affected by this issue that it was indeed fixed. The title of this GitHub issue is very open-ended and we'll keep getting comments on here as long as anyone is having login issues. :) (We should maybe lock it eventually.)

@jeffawang An external URL must be configured for auth to work properly and securely. If you have a use case for multiple external URLs, we'd like to hear about it (but again, in a separate issue), but it at least sounds a bit strange.
