Login attempt fails sometimes after 1.5.3 upgrade #3547

eroji · 2020-05-05T17:07:28Z

After upgrading to 1.5.3, I'm getting these errors on the first attempt to authenticate via the /api/v1/session route after some idle period. It eventually works if I retry once or a few more times. I did not configure any of the rate limiting, and the credentials used to hit the route are those of the admin user. Not quite sure why this is happening.

time="2020-05-05T16:58:46Z" level=error msg="finished unary call with code Unknown" error="failed to enforce max concurrent logins limit: EOF" grpc.code=Unknown grpc.method=Create grpc.service=session.SessionService grpc.start_time="2020-05-05T16:58:46Z" grpc.time_ms=0.664 span.kind=server system=grpc


jannfis · 2020-05-05T17:14:42Z

Hi @eroji, can you share a few more details about your environment, please? This error suggests that the Redis cache is not available (although the problem seems to be intermittent, according to your description).

It would be useful to know:

- Have you installed ArgoCD in an HA setup?
- How did you upgrade, and from which version?

eroji · 2020-05-05T17:29:21Z

My apologies. I'm using the HA install. The only modification I made was adding the --insecure flag for argocd-server. I upgraded from 1.5.1.

alexmt · 2020-05-05T17:32:46Z

I've encountered the same issue during upgrade. The solution was to "restart" both the Redis StatefulSet and the Redis HA proxy. @eroji, can you give it a try, please?

I've seen the same issue with 1.5.1.

eroji · 2020-05-05T17:34:54Z

Trying it now.

eroji · 2020-05-05T17:37:16Z

It seems to be working? I'll check throughout the day to see if I hit this error again and report back.

alexmt · 2020-05-05T17:40:34Z

We should look for, or file, an upstream bug in the Redis HA Helm chart. It does not seem to happen often: in my case, it happened for 1 out of ~40 argocd instances.

eroji · 2020-05-06T07:56:38Z

Looks like it's still happening. I see that 1.5.4 has been released. I will try upgrading to that to see if it helps.

time="2020-05-06T07:54:14Z" level=error msg="finished unary call with code Unknown" error="failed to enforce max concurrent logins limit: EOF" grpc.code=Unknown grpc.method=Create grpc.service=session.SessionService grpc.start_time="2020-05-06T07:54:14Z" grpc.time_ms=0.886 span.kind=server system=grpc

alexmt · 2020-05-06T18:37:24Z

Hello @eroji, 1.5.4 does not include any Redis-related changes. I don't think it will help.

As a quick workaround, you can disable the concurrent login limit feature: set the env variable ARGOCD_MAX_CONCURRENT_LOGIN_REQUESTS_COUNT=0 in the argocd-server deployment.

I'm going to enable retries in the Redis client and test it on local deployments.

eroji · 2020-05-07T04:05:39Z

@alexmt not sure why but it seems like upgrading to 1.5.4 resolved the issue. I didn't have to add the env var at all...

alexmt · 2020-05-12T20:13:39Z

Created ticket in redis-ha chart repository: DandyDeveloper/charts#26

alexmt · 2020-05-13T17:51:55Z

The PR that introduces Redis retries during the login flow is merged: #3575

alexmt · 2020-05-13T17:59:08Z

I'm also adding a big WARNING about the possible Redis issue to the 1.4 -> 1.5 upgrade instructions: #3584. This is probably as much as we can do:

- Intermittent login issues fix: "fix: enable redis retries; add redis request duration metric" #3575 (cherry-picked into 1.5)
- Document possible downtime during upgrade: "fix: enable redis retries; add redis request duration metric" #3575
- Confirm with the Redis HA chart maintainers that the issue might happen only during upgrade: "[charts/redis-ha][BUG] EOF errors after upgrading to v4.5.3" DandyDeveloper/charts#26 (comment)

Once all three are done, I think the ticket can be closed. Does that look reasonable to you, @jannfis, @jessesuen?

alexmt · 2020-05-18T17:29:31Z

v1.5.5 with the Redis retries has been released. Please give it a try. Closing the ticket until we hear about Redis issues again.

asvasyanin · 2020-05-26T12:38:19Z

> v1.5.5 with the Redis retries has been released. Please give it a try. Closing the ticket until we hear about Redis issues again.

I still have this issue in 1.5.5. Like @eroji, the only modification I have is the --insecure flag.

samhuss · 2020-05-29T17:44:16Z

Same issue with v1.5.5. It works only when setting this env variable on argocd-server:
ARGOCD_MAX_CONCURRENT_LOGIN_REQUESTS_COUNT=0

In the logs I'm getting this after many timeouts; I thought it might help:

5/29/2020 7:13:54 PM 2020/05/29 17:13:54 cache: Get key="session|login.attempts|1.0.0" failed: dial tcp: i/o timeout
5/29/2020 7:13:54 PM time="2020-05-29T17:13:54Z" level=error msg="Could not retrieve login attempts: dial tcp: i/o timeout"
5/29/2020 7:14:14 PM 2020/05/29 17:14:14 cache: Get key="session|login.attempts|1.0.0" failed: dial tcp: i/o timeout
5/29/2020 7:14:14 PM time="2020-05-29T17:14:14Z" level=error msg="Could not retrieve login attempts: dial tcp: i/o timeout"
5/29/2020 7:14:34 PM 2020/05/29 17:14:34 cache: Set key="session|login.attempts|1.0.0" failed: dial tcp: i/o timeout
5/29/2020 7:14:34 PM time="2020-05-29T17:14:34Z" level=error msg="Could not update login attempts: dial tcp: i/o timeout"
5/29/2020 7:14:34 PM time="2020-05-29T17:14:34Z" level=info msg="Issuing claims: { 0 1590772474 argocd 1590772474 admin}"
5/29/2020 7:14:34 PM time="2020-05-29T17:14:34Z" level=info msg="finished unary call with code OK" grpc.code=OK grpc.method=Create grpc.service=session.SessionService grpc.start_time="2020-05-29T17:13:34Z" grpc.time_ms=60206.44 span.kind=server system=grpc
5/29/2020 7:14:35 PM time="2020-05-29T17:14:35Z" level=info msg="received unary call /session.SessionService/GetUserInfo" grpc.method=GetUserInfo grpc.request.claims="{\"iat\":1590772474,\"iss\":\"argocd\",\"nbf\":1590772474,\"sub\":\"admin\"}" grpc.request.content= grpc.service=session.SessionService grpc.start_time="2020-05-29T17:14:35Z" span.kind=server system=grpc
5/29/2020 7:14:35 PM time="2020-05-29T17:14:35Z" level=info msg="finished unary call with code OK" grpc.code=OK grpc.method=GetUserInfo grpc.service=session.SessionService grpc.start_time="2020-05-29T17:14:35Z" grpc.time_ms=0.456 span.kind=server system=grpc
5/29/2020 7:14:35 PM time="2020-05-29T17:14:35Z" level=info msg="received unary call /cluster.ClusterService/List" grpc.method=List grpc.request.claims="{\"iat\":1590772474,\"iss\":\"argocd\",\"nbf\":1590772474,\"sub\":\"admin\"}" grpc.request.content= grpc.service=cluster.ClusterService grpc.start_time="2020-05-29T17:14:35Z" span.kind=server system=grpc

onelapahead · 2020-06-03T13:14:17Z

Does this affect users signing in via an IDP such as Okta?

ajayr5 · 2020-06-22T11:07:25Z

Adding this to my argocd-server deployment resolved the issue:

env:
  - name: ARGOCD_MAX_CONCURRENT_LOGIN_REQUESTS_COUNT
    value: "0"

jannfis · 2020-06-24T17:58:11Z

Often, when log entries like these

5/29/2020 7:14:34 PM 2020/05/29 17:14:34 cache: Set key="session|login.attempts|1.0.0" failed: dial tcp: i/o timeout

can be observed, there is a problem with in-cluster DNS resolution, some other connectivity issue within the cluster, or the Redis pod is not running at all.

ajayr5 · 2020-06-25T14:41:48Z

> Often, when log entries like these
> 5/29/2020 7:14:34 PM 2020/05/29 17:14:34 cache: Set key="session|login.attempts|1.0.0" failed: dial tcp: i/o timeout
> can be observed, there is a problem with either in-cluster DNS resolution or otherwise interconnectivity issues within the cluster or the redis pod is not running at all.

I get this issue only when creating the cluster on a bare-metal Azure VM. It works perfectly fine with a cluster on an EC2 instance.
Now I'm getting an error while adding a git repo:
rpc error: code = Unknown desc = Get "https://gitlab.com/xxxxx/xxxxxxx.git/info/refs?service=git-upload-pack": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

creet0007 · 2020-06-30T17:26:02Z

> env:
>   - name: ARGOCD_MAX_CONCURRENT_LOGIN_REQUESTS_COUNT
>     value: "0"

Where exactly should I add these values? Can you show me a screenshot?

ajayr5 · 2020-07-01T07:18:47Z

> env:
>   - name: ARGOCD_MAX_CONCURRENT_LOGIN_REQUESTS_COUNT
>     value: "0"
>
> Where exactly should I add these values? Can you show me a screenshot?

Add this to the argocd-server Deployment in install.yaml. You can try adding it at https://github.com/argoproj/argo-cd/blob/master/manifests/install.yaml#L2646
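For anyone unsure where the variable goes, here is a minimal sketch of the relevant part of the argocd-server Deployment. It assumes the container is named `argocd-server`, as I believe it is in the upstream manifest. All other fields are omitted, so treat it as a fragment to merge into the existing Deployment rather than a complete manifest.

```yaml
# Fragment of the argocd-server Deployment; unrelated fields are omitted.
# The container name is an assumption based on the upstream install.yaml.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: argocd-server
spec:
  template:
    spec:
      containers:
        - name: argocd-server
          env:
            # 0 disables the concurrent login limit (the workaround from this thread).
            - name: ARGOCD_MAX_CONCURRENT_LOGIN_REQUESTS_COUNT
              value: "0"
```

Note that `env` is an ordinary key on the container and takes no leading dash; only the entries beneath it start with `- `. The value is quoted because environment variable values must be strings.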

creet0007 · 2020-07-02T10:39:22Z

> Add this to the argocd-server Deployment in install.yaml. You can try adding it at https://github.com/argoproj/argo-cd/blob/master/manifests/install.yaml#L2646

I got this:

error: error validating "install.yaml": error validating data: ValidationError(Deployment.spec.template.spec.containers[0]): unknown field "-env" in io.k8s.api.core.v1.Container; if you choose to ignore these errors, turn validation off with --validate=false

creet0007 · 2020-07-02T11:14:15Z

It works now. Thanks a lot :)

rachelwang20 · 2020-08-07T20:24:47Z

Made the change in #4049.

eroji added the bug label ("Something isn't working") May 5, 2020

jannfis added the bug/in-triage label ("This issue needs further triage to be correctly classified") May 5, 2020

alexmt mentioned this issue May 12, 2020

fix: enable redis retries; add redis request duration metric #3575

Merged

alexmt closed this as completed in #3575 May 12, 2020

alexmt reopened this May 12, 2020

alexmt mentioned this issue May 13, 2020

Upgrade to 1.5.4 from 1.4.3 is failing #3572

Closed

alexmt added the bug/severity:criticial ("A critical bug in ArgoCD, possibly resulting in data loss or severe degraded overall functionality") and bug/priority:high ("Should be fixed in the next patch release") labels and removed the bug/in-triage label May 13, 2020

jannfis added the component:api label ("API bugs and enhancements") May 14, 2020

alexmt closed this as completed May 18, 2020

dudadornelles mentioned this issue May 27, 2020

StatefulSet with OnDelete Update Strategy stuck progressing #1881

Closed

alexmt reopened this May 29, 2020

alexmt added this to the v1.6 GitOps Engine milestone May 29, 2020

jameshochadel mentioned this issue Jun 8, 2020

Admin login in 1.5.6 works in Chrome, but not Safari #3727

Closed


alexmt modified the milestones: v1.6 GitOps Engine, v1.7 Jun 17, 2020

WaldoFR mentioned this issue Jun 29, 2020

App synch fails with ComparisonError rpc error: code = DeadlineExceeded desc = context deadline exceeded #3864

Open


rachelwang20 self-assigned this Aug 3, 2020

rachelwang20 closed this as completed Aug 7, 2020

rachelwang20 mentioned this issue Aug 7, 2020

feat: Switch to use weighted semaphore instead of Redis (Distributed lock) #4049

Merged

jessesuen mentioned this issue Aug 24, 2020

failed to enforce max concurrent logins limit #4148

Closed
