
Allow token refresh for multiple requests in a small window #36872

Closed
epixa opened this issue Dec 20, 2018 · 19 comments · Fixed by #39631 or #38382
Labels
>enhancement :Security/Authentication Logging in, Usernames/passwords, Realms (Native/LDAP/AD/SAML/PKI/etc)


@epixa
Contributor

epixa commented Dec 20, 2018

The problem

The current behavior for refreshing a token is to immediately invalidate a refresh token when it is used the first time. In principle this is a sensible way to prevent the refresh token from being used maliciously, but in practice it can trivially break a client that is making many requests to Elasticsearch.

In Kibana, the consequence is that sessions using our token-based providers (saml and token) occasionally get destroyed as multiple requests race to refresh an expired access token. This problem will only get worse as we expand the usage of canvas expressions, which can result in more parallel requests.

Let's say a user allows their session to idle for a bit and their access token expires; then they click refresh on a dashboard. Requests A, B, and C are fired off to Elasticsearch in parallel, all three get rejected due to an expired access token, and all three attempt to refresh the session using the same refresh token. A succeeds and returns a new session cookie to the client. B and C fail since the refresh token has already been used. The client sees the failures of B and C and assumes the session must be dead, so it either errors or sends the user to the login form with a cleared session.

Only processing a single refresh token in Kibana for parallel requests isn't practical because there could be multiple Kibana instances behind a load balancer handling requests for the same session.
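To make the failure mode concrete, here is a minimal sketch in TypeScript (hypothetical names; not actual Kibana or Elasticsearch code) of how a one-time-use refresh token turns concurrent refresh attempts into session loss:

class TokenService {
  // A refresh token is invalidated on first use, mirroring the current behavior.
  private usedRefreshTokens = new Set<string>();

  async refresh(refreshToken: string): Promise<string> {
    if (this.usedRefreshTokens.has(refreshToken)) {
      throw new Error("invalid_grant: refresh token already used");
    }
    this.usedRefreshTokens.add(refreshToken);
    return "new-access-token";
  }
}

async function main() {
  const tokenService = new TokenService();
  // Requests A, B, and C all saw an expired access token and race to refresh
  // with the same refresh token from the shared session.
  const results = await Promise.allSettled([
    tokenService.refresh("shared-refresh-token"), // A: succeeds, new session cookie
    tokenService.refresh("shared-refresh-token"), // B: rejected -> session cleared
    tokenService.refresh("shared-refresh-token"), // C: rejected -> session cleared
  ]);
  results.forEach((result, i) => console.log("ABC"[i], result.status));
}

main();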

The proposal

I propose that we add an optional nonce property to the refresh_token grant type. If the first request to refresh a token contains a nonce, then subsequent refresh token requests for that same refresh token using the same nonce will return the same new access token.

A client like Kibana can generate a random nonce value each time a user does a page reload on Kibana, and it will include the nonce in any refresh token requests that occur for that user during this time. Unlike the access token and refresh token, the nonce is never stored in the session itself.

There can be a "refresh window" associated with the nonce feature as well, maybe 5 minutes or something like that, after which the token cannot be refreshed even with a matching nonce.
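For illustration, a refresh call under this proposal might look like the following; the grant itself is the existing token API, while the nonce parameter is the proposed addition and does not exist today (token and nonce values are placeholders):

POST /_xpack/security/oauth2/token
{
  "grant_type" : "refresh_token",
  "refresh_token" : "vLBPvmAB6KvwvJZr27cS",
  "nonce" : "d27c5dd0e4d3f534d8b0c3e2"
}

A second request within the window carrying the same refresh_token and nonce would receive the same newly issued access token instead of an invalid_grant error.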

cc @elastic/kibana-security

@epixa epixa added >enhancement :Security/Authentication Logging in, Usernames/passwords, Realms (Native/LDAP/AD/SAML/PKI/etc) labels Dec 20, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-security

@jkakavas
Member

That's a tricky one :) My initial reaction (and I shared this with @kobelb in the issue where we came across this) is that this needs to be fixed on the client side. All the solutions that come to mind, though, either require shared server-side state in Kibana or do not handle the "HA with many Kibana instances" scenario.

I think the suggested approach adds a lot of complexity, and I'd like to propose a different solution: one where we do not invalidate refresh tokens on first use. The original decision to revoke refresh tokens on use still makes sense and is in the spirit of the specification:

The authorization server MAY revoke the old refresh token after issuing a new refresh token to the client.

but I consider revisiting it a sensible trade-off in order to resolve the issue at hand with little to no risk involved.

Suggested Solution

We stop revoking refresh tokens on first use. This will resolve the issue at hand: Requests A, B, and C are fired off to Elasticsearch in parallel, all three get rejected due to an expired access token, and all three attempt to refresh the session using the same refresh token. A succeeds, gets a new access token, and returns a new session cookie to the client. B and C also succeed, but get new access tokens and refresh tokens that are stored in the user's session cookie and overwrite the existing one. The end result is that the last of the concurrent requests to get a response dictates which access token and refresh token are stored in the user's session cookie.

This means that nothing needs to change on Kibana's side. Any concerns regarding the possibility of A, B, and C returning different access/refresh tokens, all of which would be valid and have the same expiration (give or take a few ms)?
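For reference, the refresh grant under discussion is the existing token API call (token value is a placeholder); under this suggestion, B and C would simply repeat it successfully with the same refresh_token:

POST /_xpack/security/oauth2/token
{
  "grant_type" : "refresh_token",
  "refresh_token" : "vLBPvmAB6KvwvJZr27cS"
}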

Standards and Best Practices compliance

There is nothing in the original standard, the OAuth 2.0 Threat Model and Security Considerations, or the recent OAuth 2.0 Security Best Current Practice draft that mandates revoking refresh tokens on use.

Threat Model

Threat: Retaining persistent access

Attack 1: A legitimate client could use the same refresh token multiple times.

Countermeasures: The legitimate client would get a new access token and a new refresh token each time, but

  • the new access token would grant them the same privileges as the original access token (or as the access token they would have received from a single refresh);
  • the new refresh tokens would extend the duration of their access (each new one grants the option to refresh access for another 24h), but this is no different from sequentially refreshing with a new refresh token each time.

As such, this threat degrades into normal functionality and should not be considered a risk.

Attack 2: An attacker could use the same leaked refresh token multiple times.

Countermeasures: As above, this doesn't give any additional access to an attacker. The ability to explicitly invalidate tokens (either specific tokens or all tokens for a user/realm combination) is another mitigating factor for this attack. Finally, we authenticate the client on the refresh grant, so leaked refresh tokens cannot be used by other clients (outside Kibana).

Threat: Performance degradation / Denial of Service

Attack 1: A malicious user can continuously refresh an existing refresh token. Each of these requests would give them a new refresh token and access token, and they could then also start using this new refresh token to get additional ones, etc. Compared to the old approach, this would allow the malicious user to create documents in the security index at an exponential rate, which, depending on deployment-specific factors, could hurt the performance of the Token Service.

Countermeasures: We already delete expired tokens, so there is an upper limit to the number of documents they can create, as we periodically delete documents containing expired refresh tokens. Additionally, we can decide on a sensible maximum number of refreshes that would still resolve the issue at hand while further mitigating this threat.

@kobelb
Contributor

kobelb commented Dec 21, 2018

Do we need to take the logout/token-invalidation behavior into consideration here? If we could potentially have multiple access/refresh tokens that were created from the initial refresh token, it becomes quite complicated (perhaps impossible) to ensure that on logout "all" of the refresh/access tokens are invalidated. This is one of the main complaints I hear from users about our existing basic auth provider: that on logout they can re-use the previous session cookies (which would contain the previous access/refresh tokens) to authenticate subsequently.

@jkakavas
Member

@kobelb wouldn't a call to invalidate all of the user's tokens satisfy the logout requirement?

DELETE /_xpack/security/oauth2/token
{
  "username" : "user"
}

@kobelb
Contributor

kobelb commented Dec 21, 2018

@jkakavas I forgot we could do that now, that's awesome! Would there be a way to limit this to only the tokens associated with the instance of Kibana, so it wouldn't log them out of all other instances of Kibana or their other usages of tokens?

@jkakavas
Member

Would there be a way to limit this to only the tokens associated with the instance of Kibana

No, that's a good catch. I didn't think about this scenario at all. I don't think that we have a good way of differentiating that. We could add an API parameter to also filter by OAuth2 client, but there is also a high probability that these different Kibana instances will use the same user to authenticate to Elasticsearch (that principal is what we track as the client). The original proposal will also need a solution around this, as B, C, and D would create new refresh tokens too, and on logout we would revoke only the one that D created.

I'll go back to the drawing board and re-evaluate @epixa's proposal as well.

@albertzaharovits
Contributor

I agree with @jkakavas's assessment that we will not be introducing new attack vectors by not invalidating the refresh token on use. But I worry that we might increase the possibility of leaking refresh tokens if we generate new ones on each reuse. I would propose that we not generate a new refresh token for each use of another refresh token, and instead keep the original one valid for subsequent uses.

@jkakavas
Member

jkakavas commented Jan 9, 2019

I would propose that we not generate a new refresh token for each use of another refresh token, and instead keep the original one valid for subsequent uses.

In terms of token leakage, that would also broaden the attack surface in the time dimension, as we would have fewer refresh tokens but with a longer validity period (i.e. an attacker finds a refresh token logged somewhere from 4-5 days ago which might still be active).
This also suffers from the same issue with regard to access token invalidation on logout as the previous proposal.

In general, the fact that the refresh token can only be used by the original client makes token leakage a reduced-impact threat, as an attacker would also need to compromise the client's credentials (usually Kibana for now) in order to exploit a leaked refresh token.

I'll go back to the drawing board and re-evaluate @epixa's proposal as well.

I'll get to it in the next couple of days.

@jkakavas
Member

We discussed this today in our team meeting. Jay came up with an idea to simplify the original proposal by not requiring a nonce and using a window of 1 min. We can probably bring this down to a few seconds, as we're trying to solve for concurrent requests. The idea is that we would return the same access token for the duration of this window, as the original proposal also suggests.
I have an idea for an implementation that would allow this and plan to give it a shot unless someone brings up a concern that we have missed. @epixa did you have any specific reasons why the nonce would be useful in a way that is not covered by just a time window?

@epixa
Contributor Author

epixa commented Jan 17, 2019

I don't have strong opinions about the nonce, but I do think it solves a different problem that is possible within the time window.

In a UI client, if an attacker gets access to the access token for another user's session, they almost certainly have access to the refresh token as well. Both of these things must be associated with a session in order to be useful.

We can't prevent an attacker from initiating the session reset themselves, but without the nonce feature, two things are true:

  1. If the attacker is the one that first initiates the token refresh, the original user remains unaware that something is wrong as their session will also seamlessly refresh (to an already-exploited new token). If a nonce was in place, the original user's session would fail to refresh and we could potentially error with something like "this refresh token has already been used" which might tip them off to a problem.
  2. If the original user is the one that first initiates the token refresh, the attacker can still safely refresh the token anyway (still unbeknownst to the user). If the nonce was in place, the attacker's request would be rejected.

To "quietly" exploit the nonce feature, the attacker needs to attack a different mechanism entirely in order to get a valid nonce. The act of getting the session info isn't enough.

With the time window alone, no additional exploit is necessary; the timing just needs to be right. This window could be made easier to exploit if the attacker can determine approximately when a token was due to expire. I notice the token service itself returns expires_in, so if that info is available on an ongoing basis from ES, then it would be pretty easy for an attacker to exploit the window. If it's not easily available through ES, it's also not unlikely that a client would store the whole token service payload (including expires_in) along with a created_at timestamp in the session, in which case the attacker could exploit the window using the info they had on hand.
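For context, the create token response does include the expiry; a response looks roughly like this (example values in the shape of the token API response):

{
  "access_token" : "dGhpcyBpcyBub3QgYSByZWFsIHRva2VuIGJ1dCBpdCBpcw==",
  "type" : "Bearer",
  "expires_in" : 1200,
  "refresh_token" : "vLBPvmAB6KvwvJZr27cS"
}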

I'm comfortable leaving it to this team to decide the extent to which this is acceptable in our auth threat model.

@albertzaharovits
Contributor

@epixa If I'm following correctly, the nonce is a use-once token that, unlike the refresh token (which is also use-once atm), is generated by Kibana and not by ES. For that matter, the nonce is generated instead of stored.

I think the nonce is a raw method of binding the access-refresh token pair to a Kibana instance.
Instead, and this is not mentioned in the previous description, the refresh token is bound to the credentials of the Kibana process that generated the access-refresh token pair in the first place. So, an attacker does not simply have to race for the refresh, or guess the generated nonce; they also have to know the secret Kibana password.

I guess this is not how things work right now, but each authenticator process in a multiple-Kibana deployment should use different secret credentials. That way, the nonce is not required to identify a Kibana instance (it is still useful if credentials are shared, but then I don't think the refresh token is useful anymore).

@jkakavas
Member

I don't think the nonce binds the access-refresh token pair to a client. It binds the pair to a user

A client like Kibana can generate a random nonce value each time a user does a page reload on Kibana,

and Court's argument was that this can be used as a canary token of sorts, to indicate that only one user is allowed to refresh a refresh token (someone else attempting to refresh again with a leaked refresh token within the proposed allowed window will not be successful, as they won't send the same nonce).

In a UI client, if an attacker gets access to the access token for another user's session, they almost certainly have access to the refresh token as well. Both of these things must be associated with a session in order to be useful.

The premise of this attack presupposes that someone (in increasing order of complexity / decreasing order of probability) either

a. happens to get a Kibana auth cookie lying around and tries to use it as is
b. has access to the encryptionKey value (thus access to kibana.yml and probably the kibana user credentials) and gets a Kibana auth cookie somewhere (logged in debug logs on the server? - assuming physical access), so that they can decrypt an auth cookie they got somehow, extract the values of the access token and refresh token, and use them with the token API
c. has access to the kibana process
d. is in a MITM position between Elasticsearch and a client and can see raw traffic, which means that they need to be doing SSL stripping, as the token service requires TLS on the HTTP layer of Elasticsearch.

We can't protect against c and d. They have the kibana user credentials, access to the values of access_token and refresh_token, can see the nonce being sent, etc. Silently refreshing a token is the least they could do.

b has physical access to the Kibana server; I believe they can do far more destructive things than silently refreshing a user's token.

So, we're trying to solve this for a: this attacker can't see the refresh token, as it is encrypted, and even if they could decrypt it, they cannot use it on their own because they don't have the credentials of the client that originally got the refresh token. They can only trick Kibana into refreshing it for them, taking advantage of our refresh window.

Let's assume the refresh window is 10sec and the access token is valid for 10 mins.

  • Without the refresh window and multiple refreshes, the attacker would be able to refresh the token anytime within the first 10 mins of access token validity, before the legitimate user refreshed it because their access token expired. The legitimate user would get an error when their access token expired and would need to re-authenticate.
  • With the refresh window and multiple refreshes, the attacker would still be able to do the above, plus do it without the user noticing, if they could time the refresh (by sending a request to Kibana) somewhere between 9m50s and 10m10s. They would then need to keep doing this every 10 min to retain access without logging out the legitimate user.

I'm leaning toward this being an acceptable risk, given the prerequisites for exploiting it. The stealth property of this attack (which is the only thing an attacker gains here compared to our current behavior) is, I think, not particularly interesting from an attacker's perspective in web environments, where "you were logged out, please reauthenticate" messages are common enough not to make a legitimate user suspicious.

@epixa if you think that there is an easy way to have Kibana predictably send the same value for all XHR requests of the same user that might come concurrently, then this doesn't complicate our implementation that much. But if it's a significant effort on your side, I'm not sure we gain much by doing so.

@albertzaharovits
Contributor

I don't think the nonce binds the access-refresh token pair to a client. It binds the pair to a user

A client like Kibana can generate a random nonce value each time a user does a page reload on Kibana,

and Court's argument was that this can be used as a canary token of sorts, to indicate that only one user is allowed to refresh a refresh token (someone else attempting to refresh again with a leaked refresh token within the proposed allowed window will not be successful, as they won't send the same nonce (1)).

Understood, thanks for clarifying this @jkakavas! But from (1) above, and this below:

They can only trick Kibana into refreshing it for them, taking advantage of our refresh window.

I understand that the threat model is that the attacker can trick Kibana into refreshing the token, but cannot trick it into also using the correct nonce?

If that is so, it sounds pretty low risk to me.

@epixa
Contributor Author

epixa commented Jan 17, 2019

@kobelb I'd like your thoughts on this:

Generating a nonce on page load in Kibana and passing it through on all requests to Elasticsearch should be pretty easy. We'd pass the nonce to the UI via injectVars, add it as a header for all HTTP requests to the backend, and then update the authenticate logic to pass the nonce through in the event of refresh.
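A rough sketch of that plumbing, with all names hypothetical (injectVars, the header name, and the refresh helper are illustrative, not actual Kibana APIs):

import { randomBytes } from "crypto";

// Assumed shapes for the pieces involved (illustrative only).
declare function injectVars(vars: { sessionNonce: string }): void; // server -> UI handoff
declare function addDefaultRequestHeader(name: string, value: string): void; // UI HTTP layer

// 1. On page load, the Kibana server generates a nonce and hands it to the UI.
const sessionNonce = randomBytes(16).toString("hex");
injectVars({ sessionNonce });

// 2. The UI echoes the nonce back as a header on every request to the Kibana server.
addDefaultRequestHeader("kbn-session-nonce", sessionNonce);

// 3. On an expired access token, the authenticate logic forwards the nonce on the
//    refresh grant, so concurrent refreshes carrying the same nonce would all
//    receive the same newly issued tokens under the original proposal.
async function refreshAccessToken(refreshToken: string, nonce: string): Promise<void> {
  const body = {
    grant_type: "refresh_token",
    refresh_token: refreshToken,
    nonce, // the proposed parameter from this issue
  };
  // ... POST body to /_xpack/security/oauth2/token ...
}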

However, multiple browser tabs complicate the situation as each would have its own nonce, which at best would be a bad experience as we break other tabs and at worst would log the user out anyway as the logout operation would fire when refresh failed on another tab.

We could address this cross-tab issue by generating the nonce in sessionStorage instead. The downside to this approach is that the nonce is a little more sticky than it was before, and it's hard to say how much that impacts the value of the nonce from a security perspective. Since our tokens are stored in a cookie and the nonce is in sessionStorage, pwning the session token doesn't mean pwning the nonce, but that feels like an implementation detail of storage that would be easy to overlook and could realistically change in the future.

At this point, I'm leaning toward just abandoning the nonce concept entirely which is the prevailing preference from the ES side anyway.

@kobelb
Contributor

kobelb commented Jan 17, 2019

However, multiple browser tabs complicate the situation as each would have its own nonce, which at best would be a bad experience as we break other tabs and at worst would log the user out anyway as the logout operation would fire when refresh failed on another tab.

We could address this cross-tab issue by generating the nonce in sessionStorage instead. The downside to this approach is that the nonce is a little more sticky than it was before, and it's hard to say how much that impacts the value of the nonce from a security perspective. Since our tokens are stored in a cookie and the nonce is in sessionStorage, pwning the session token doesn't mean pwning the nonce, but that feels like an implementation detail of storage that would be easy to overlook and could realistically change in the future.

At this point, I'm leaning toward just abandoning the nonce concept entirely which is the prevailing preference from the ES side anyway.

I think this is the biggest issue that we're going to run into when trying to use a nonce for this mechanism. If the attacker is able to access the cookie using some mechanism in the browser (which is even less likely because it has the HttpOnly flag, so they can't read it client-side), they'll definitely be able to access sessionStorage. This means that they're most likely going to get access to the nonce by intercepting a network request to Kibana, which will have both the nonce and the session, so we're already compromised in this scenario.

I'm also leaning toward abandoning the nonce concept, given the recent discussions.

@epixa
Contributor Author

epixa commented Jan 17, 2019

Let's consider the nonce detail of this proposal abandoned from Kibana's side then.

@jkakavas
Member

@epixa thanks for raising this, and thanks for the iterations @kobelb, @albertzaharovits, @jaymode. I'll move forward with the original proposal without the nonce.

jkakavas added a commit that referenced this issue Mar 1, 2019
This change adds support for the concurrent refresh of access
tokens as described in #36872.
In short, it allows subsequent client requests to refresh the same token that
come within a predefined window of 60 seconds to be handled as duplicates
of the original one, and thus receive the same response with the same newly
issued access token and refresh token.
In order to support that, two new fields are added to the token document. One
contains the instant (in epoch millis) when a given refresh token is refreshed,
and one contains a pointer to the token document that stores the new
refresh token and access token created by the original refresh.
A side effect of this change, which was however also an intended enhancement
for the token service, is that we needed to stop encrypting the string
representation of the UserToken while serializing. (This was necessary because we
correctly used a new IV every time we encrypted a token during serialization, so
subsequent serializations of the exact same UserToken would produce
different access token strings.)

This change also handles the serialization/deserialization BWC logic:

- In mixed clusters we keep creating tokens in the old format and
consume only old format tokens
- In upgraded clusters, we start creating tokens in the new format but
still remain able to consume old format tokens (that could have been
created during the rolling upgrade and are still valid)

Resolves #36872

Co-authored-by: Jay Modi jaymode@users.noreply.github.com
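To illustrate the two new fields, a token document after a first refresh might look roughly like this (field names and values are illustrative, not the actual index mapping):

{
  "access_token" : {
    "user_token" : "<serialized UserToken>"
  },
  "refresh_token" : {
    "token" : "vLBPvmAB6KvwvJZr27cS",
    "invalidated" : false,
    "refresh_time" : 1551434400000,
    "superseded_by" : "token_6U7dGMCAQTaQ4_YTHiDSbQ"
  }
}

Here refresh_time would be the instant of the first refresh in epoch millis, and superseded_by would point to the document holding the newly issued token pair, which is what lets duplicate refreshes within the 60-second window return the same response.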
jkakavas added a commit to jkakavas/elasticsearch that referenced this issue Mar 1, 2019
(Same commit message as above.)
jkakavas added a commit that referenced this issue Mar 1, 2019
This is a backport of #38382

(Same commit message as above.)
@epixa
Contributor Author

epixa commented Mar 2, 2019

I think the change was temporarily reverted based on the latest comments in the PR, so I'm reopening this. Feel free to close if I'm mistaken.

@epixa epixa reopened this Mar 2, 2019
@jkakavas
Member

jkakavas commented Mar 2, 2019

You're right Court, I will re-close this on Monday upon submitting the PR again.

jkakavas added a commit that referenced this issue Mar 4, 2019
Co-authored-by: Jay Modi jaymode@users.noreply.github.com

(Same commit message as above, with one addition to the BWC logic:)

    When reading/writing TokensInvalidationResult objects, we take into
    consideration that pre-7.1.0 these contained an integer field that carried
    the attempt count

Resolves #36872
jkakavas added a commit that referenced this issue Mar 5, 2019
This is a backport of #39631

Co-authored-by: Jay Modi jaymode@users.noreply.github.com

(Same commit message as above.)