
fix: Add initial implementation for rate limiting failed logins #3404

Merged (13 commits) on Apr 21, 2020

Conversation

@jannfis (Member) commented Apr 13, 2020

PR Summary

This PR implements rate limiting of failed login attempts in ArgoCD and thereby
addresses CVE-2020-8827.

Implementation details

The rate limiter uses ArgoCD's Redis cache to store information about
failed login attempts for non-SSO users, i.e. it targets SessionManager's
VerifyUsernamePassword() method.

For this purpose, we introduce the application cache to SessionManager and
make it a mandatory component.

The cache key used is session|login.attempts, and it stores a data structure
containing information about failed login attempts. When the password for a
given account has been entered incorrectly a configurable number of consecutive
times (failures) within a configurable timeframe (failure window), validation
of subsequent logon attempts is delayed by a configurable number of seconds
(delay). The delay is removed after a successful login or when the last login
attempt was made outside the failure window.
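
A minimal sketch of what the cached record and the delay check could look like; the type names, fields, and exact bookkeeping are illustrative and not taken verbatim from the implementation:

```go
package session

import "time"

// LoginAttempts maps an account name to its failure record; this is the kind
// of structure stored under the "session|login.attempts" cache key.
type LoginAttempts map[string]LoginAttempt

// LoginAttempt records consecutive login failures for a single account.
type LoginAttempt struct {
	FailCount int       // number of consecutive failed attempts
	LastFail  time.Time // time of the most recent failure
}

// shouldDelay reports whether the next validation for this account should be
// delayed, given the configured failure threshold and failure window.
func shouldDelay(a LoginAttempt, maxFailCount int, failureWindow time.Duration, now time.Time) bool {
	if a.FailCount < maxFailCount {
		return false
	}
	// A failure window of 0 disables the window check; otherwise failures
	// older than the window no longer trigger the delay.
	if failureWindow > 0 && now.Sub(a.LastFail) > failureWindow {
		return false
	}
	return true
}
```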

So as not to reintroduce account name guessing, the limiter makes no distinction
between existing and non-existing accounts. To prevent possible DoS attacks
on the cache's memory consumption by unauthenticated users, a maximum number
of entries in the cache is enforced. When this limit is reached, a random
entry in the cache is deleted before a new entry is added.
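
An illustrative sketch of that eviction strategy, reusing the LoginAttempts type from the sketch above; Go's unspecified map iteration order is used here to pick a pseudo-random victim:

```go
// storeAttempt caps the attempts map at maxCacheSize entries by evicting one
// random existing entry before a record for a new account is added.
func storeAttempt(attempts LoginAttempts, username string, attempt LoginAttempt, maxCacheSize int) {
	if _, known := attempts[username]; !known && len(attempts) >= maxCacheSize {
		for victim := range attempts {
			delete(attempts, victim)
			break
		}
	}
	attempts[username] = attempt
}
```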

Design decision 1: Focus on accounts, not origins

The algorithm doesn't take the origin of the logon failure (i.e. the remote IP
address) into account. Logon failures are counted per account, regardless
of where the logon attempts originate from. For one, we simply cannot
reliably determine the origin of the logon request, because we might be
behind a complex reverse-proxy structure (load balancers, Ingress, etc.).
Second, an attacker might hide behind a huge number of different source
addresses, e.g. by utilizing anonymous HTTP proxy networks, a botnet, etc. On the
other hand, legitimate users might all be connecting via, for example, a corporate
HTTP proxy server, thus sharing the same source IP. Extracting the real user's
source address reliably and correctly in all possible setups is next to
impossible.

Design decision 2: Delay attempts, but do not lock accounts

As a measure to prevent locking out legitimate users without reliably knowing
their origin, we chose to apply a (configurable) delay before actually
validating a username/password combination and letting the requester know the
result. As described above, this delay kicks in after a number of failures
within a given failure window. This means that, while a delay is active,
legitimate users are affected by it as well: if a legitimate user enters their
valid credentials, they have to wait for the delay period before the system
actually logs them in.

Design decision 3: Do not exponentially increase delay

To prevent an attacker from deliberately locking legitimate users out of
their accounts, we chose not to increase the delay exponentially to a possibly
indefinite time. The goal is to make the logon process slow enough to prevent
an attacker from quickly trying different password combinations, while still
giving legitimate users a reasonable chance to use their account while it is
being (or was) under a brute-force attempt. The algorithm can be configured to
increase the delay successively by a fixed amount of time, up to a configurable
maximum limit, as sketched below.
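
A sketch of that calculation under the assumption of a purely linear increase; the parameter names mirror the environment variables documented below and are not the actual code:

```go
// computeDelay returns the delay to apply before validating the next attempt:
// zero below the failure threshold, then a start value that grows by a fixed
// increment per additional failure, capped at a configurable maximum.
func computeDelay(failCount, maxFailCount int, delayStart, delayIncrease, delayMax time.Duration) time.Duration {
	if failCount < maxFailCount {
		return 0
	}
	delay := delayStart + time.Duration(failCount-maxFailCount)*delayIncrease
	if delay > delayMax {
		delay = delayMax
	}
	return delay
}
```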

Configuration and defaults

The feature can be configured by setting the following environment variables
on the argocd-server pod (a sketch of how they could be read, with their
defaults, follows the list):

  • ARGOCD_SESSION_MAX_FAIL_COUNT: Maximum number of failed logins before the
    delay kicks in. Default: 5.

  • ARGOCD_SESSION_FAILURE_DELAY_START: Time in seconds the authentication
    is delayed for when the limiter first becomes active. Default: 3

  • ARGOCD_SESSION_FAILURE_DELAY_INCREASE: Time in seconds the authentication
    delay should be increased on consecutive login failures after max fail count
    has been reached. Default: 2

  • ARGOCD_SESSION_FAILURE_DELAY_MAX: Max time in seconds the authentication
    delay can be increased to. Default: 30

  • ARGOCD_SESSION_FAILURE_WINDOW: Number of seconds for the failure window.
    Default: 300 (5 minutes). If this is set to 0, the failure window is
    disabled and the delay kicks in after 10 consecutive logon failures,
    regardless of the time frame in which they happened.

  • ARGOCD_SESSION_MAX_CACHE_SIZE: Maximum number of entries allowed in the
    cache. Default: 1000
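
A hypothetical helper showing how these variables and their defaults could be read; the variable names and default values come from the list above, while the helper itself is only a sketch:

```go
package session

import (
	"os"
	"strconv"
)

// envInt reads an integer from the environment, falling back to a default
// when the variable is unset or not a valid number.
func envInt(name string, def int) int {
	if v := os.Getenv(name); v != "" {
		if n, err := strconv.Atoi(v); err == nil {
			return n
		}
	}
	return def
}

// loadLimiterSettings collects the documented settings with their defaults.
func loadLimiterSettings() (maxFailCount, delayStart, delayIncrease, delayMax, failureWindow, maxCacheSize int) {
	return envInt("ARGOCD_SESSION_MAX_FAIL_COUNT", 5),
		envInt("ARGOCD_SESSION_FAILURE_DELAY_START", 3),
		envInt("ARGOCD_SESSION_FAILURE_DELAY_INCREASE", 2),
		envInt("ARGOCD_SESSION_FAILURE_DELAY_MAX", 30),
		envInt("ARGOCD_SESSION_FAILURE_WINDOW", 300),
		envInt("ARGOCD_SESSION_MAX_CACHE_SIZE", 1000)
}
```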

Notes

  • The UI has not been adapted to indicate the delay. This is left for a later
    point in time. It should display a spinner or something along these lines
    while waiting for session creation by the server. For this, a new API method
    to get the currently applied delay has to be implemented.

  • This change also introduces a maximum username length of 32 characters, mainly
    to prevent cache pollution and DoS caused by insanely long usernames in invalid
    requests. This has been documented in the user documentation.

sonarcloud bot commented Apr 13, 2020

Kudos, SonarCloud Quality Gate passed!

Bugs: 0 (rating A)
Vulnerabilities: 0 (rating A), 0 Security Hotspots to review
Code Smells: 2 (rating A)

Coverage: 93.6%
Duplication: 0.0%

codecov bot commented Apr 13, 2020

Codecov Report

Merging #3404 into master will increase coverage by 0.24%.
The diff coverage is 86.13%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3404      +/-   ##
==========================================
+ Coverage   42.90%   43.15%   +0.24%     
==========================================
  Files         179      179              
  Lines       19953    20142     +189     
  Branches      272      237      -35     
==========================================
+ Hits         8561     8692     +131     
- Misses      10396    10454      +58     
  Partials      996      996              
| Impacted Files | Coverage Δ |
|---|---|
| util/oidc/oidc.go | 7.57% <0.00%> (ø) |
| server/cache/cache.go | 75.51% <53.84%> (-9.86%) ⬇️ |
| util/session/sessionmanager.go | 69.88% <92.30%> (+18.60%) ⬆️ |
| server/server.go | 57.59% <100.00%> (ø) |
| util/cache/appstate/cache.go | 93.10% <100.00%> (ø) |
| ui/src/app/applications/components/utils.tsx | 49.83% <0.00%> (ø) |
| ui/src/app/shared/services/projects-service.ts | 19.71% <0.00%> (ø) |
| server/application/application.go | 28.25% <0.00%> (+0.21%) ⬆️ |

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 75d9f23...2a8c32a. Read the comment docs.

@jannfis jannfis marked this pull request as ready for review April 13, 2020 16:52
@jannfis (Member, Author) commented Apr 17, 2020

I think codecov might be drunk.

@alexmt alexmt merged commit 76bacfd into argoproj:master Apr 21, 2020