Allow token refresh for multiple requests in a small window #36872
Comments
Pinging @elastic/es-security
That's a tricky one :) My initial reaction (and I shared this with @kobelb in the issue where we came across this) is that this needs to be fixed on the client side. All the solutions that come to mind, though, either require shared server-side state in Kibana or do not handle the "HA with many Kibana instances" scenario. I think the suggested approach adds a lot of complexity and I'd like to propose a different solution, one where we do not invalidate refresh tokens on first use. The original decision to revoke refresh tokens on use still makes sense and is in the spirit of the specification, but I consider revisiting it a sensible trade-off in order to resolve the issue at hand with low to no risk involved.

Suggested Solution
We stop revoking refresh tokens on first use. This resolves the issue at hand, as subsequent requests presenting the same refresh token would also succeed, so nothing needs to change on Kibana's side. Any concerns regarding the possibility of misuse are addressed in the threat model below.

Standards and Best Practices compliance
There is nothing in the original standard, the OAuth 2.0 Threat Model and Security Considerations, or the recent OAuth 2.0 Security Best Current Practice draft that mandates revoking refresh tokens on use.

Threat Model

Threat: Retaining persistent access
Attack 1: A legitimate client could use the same refresh token multiple times.
Countermeasures: The legitimate client would get a new access token and a new refresh token each time, but no access beyond what it already has. As such, this threat degrades into normal functionality and should not be considered a risk.
Attack 2: An attacker could use the same leaked refresh token multiple times.
Countermeasures: As above, this doesn't give any additional access to an attacker. The ability to explicitly invalidate tokens (either specific tokens or all tokens for a user/realm combination) is another mitigating factor for this attack. Finally, we do authenticate the client on the refresh call, so a leaked refresh token alone is not enough.

Threat: Performance degradation / Denial of Service
Attack 1: A malicious user can continuously refresh an existing refresh token. Each of these requests would give them a new refresh token and access token, and they could then also start using the new refresh tokens to get additional ones, and so on. Compared to the old approach, this would allow the malicious user to create documents in the security index at an exponential rate, which, depending on deployment-specific factors, could hurt the performance of the Token Service.
Countermeasures: We already delete expired tokens, so there is an upper limit to the number of documents they can create, as we periodically delete documents containing expired refresh tokens. Additionally, we can decide on a sensible maximum number of refreshes that would still resolve the issue at hand but further mitigate this threat.
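For illustration only, a minimal sketch of what the "maximum number of refreshes" countermeasure could look like, using made-up names (`refreshCount`, `MAX_REFRESHES`) and a made-up limit; this is not the behaviour that actually shipped in the Token Service:

```java
// Sketch of a cap on refresh-token reuse: reuse is allowed, but only a bounded
// number of times, which limits how quickly a malicious client can grow the
// security index by chaining refreshes. All names and limits are hypothetical.
final class RefreshCapSketch {
    static final int MAX_REFRESHES = 50; // hypothetical cap per refresh-token chain

    static final class RefreshTokenState {
        int refreshCount;      // how many times this refresh token has been used
        boolean invalidated;   // explicitly revoked (logout, admin action)
    }

    static boolean mayRefresh(RefreshTokenState state) {
        return !state.invalidated && state.refreshCount < MAX_REFRESHES;
    }

    public static void main(String[] args) {
        RefreshTokenState state = new RefreshTokenState();
        state.refreshCount = 51;
        System.out.println(mayRefresh(state)); // false: the cap has been exceeded
    }
}
```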
Do we need to take the logout/token-invalidation behavior into consideration here? If we could potentially have multiple access/refresh tokens that were created from the initial refresh token, it becomes quite complicated (perhaps impossible) to ensure that on logout "all" of the refresh/access tokens are invalidated. This is one of the main complaints I hear from users about our existing basic auth provider: that on logout they can re-use the previous session cookies (which would contain the previous access/refresh tokens) to authenticate subsequently.
@kobelb wouldn't a call to invalidate all tokens of the user satisfy the logout requirement?
@jkakavas I forgot we could do that now, that's awesome! Would there be a way to limit this to only the tokens associated with the instance of Kibana, so it wouldn't log them out of all other instances of Kibana, or their other usages of tokens?
No, that's a good catch. I didn't think about this scenario at all. I don't think that we have a good way of differentiating that. We could add an API parameter to also filter by that. I'll go back to the drawing board and re-evaluate @epixa's proposal also.
I agree with @jkakavas's assessment that we will not be introducing new attack vectors by not invalidating the refresh token on use. But I worry that we might increase the possibility of leaking refresh tokens if we generate new ones on each reuse. I would propose that we not generate a new refresh token for each use of another refresh token, and keep the original one valid for subsequent uses.
In terms of token leakage, that would also broaden the attack surface in the time dimension, as we would have fewer refresh tokens but with a longer validity period (i.e. an attacker finds a refresh token logged somewhere from 4-5 days ago which might still be active). In general, the fact that the refresh token can only be used by the original client makes token leakage a reduced-impact threat, as an attacker would also need to compromise the client (usually Kibana for now) credentials in order to exploit a leaked refresh token.

I'll get to it in the next couple of days.
We discussed this today in our team meeting. Jay came up with an idea to simplify the original proposal by not requiring a nonce and using a window of 1 minute. We can probably bring this down to a few seconds as we're trying to solve for concurrent requests. The idea is that we would return the same access token for the duration of this window, as the original proposal also suggests.
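A minimal sketch of that windowed idea, with illustrative names (this is not the actual Elasticsearch `TokenService` code): the first use of a refresh token records when it happened and which tokens it produced, and a second use of the same refresh token inside the window is treated as a duplicate and gets the same tokens back.

```java
import java.time.Duration;
import java.time.Instant;

final class WindowedRefreshSketch {
    static final Duration REFRESH_WINDOW = Duration.ofSeconds(60);

    record TokenPair(String accessToken, String refreshToken) {}

    // State kept alongside the stored refresh token.
    static final class RefreshState {
        Instant firstRefreshedAt;     // null until the refresh token is used once
        TokenPair issuedOnFirstUse;   // tokens minted by the original refresh
    }

    static TokenPair refresh(RefreshState state, Instant now) {
        if (state.firstRefreshedAt == null) {
            // First refresh: mint new tokens and remember them for possible duplicates.
            state.firstRefreshedAt = now;
            state.issuedOnFirstUse = new TokenPair("new-access-token", "new-refresh-token");
            return state.issuedOnFirstUse;
        }
        if (Duration.between(state.firstRefreshedAt, now).compareTo(REFRESH_WINDOW) <= 0) {
            return state.issuedOnFirstUse; // duplicate of the original request within the window
        }
        throw new IllegalStateException("refresh token was already used outside the allowed window");
    }

    public static void main(String[] args) {
        RefreshState state = new RefreshState();
        Instant t0 = Instant.now();
        System.out.println(refresh(state, t0));                // first use mints new tokens
        System.out.println(refresh(state, t0.plusSeconds(5))); // duplicate inside the window, same tokens
    }
}
```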
I don't have strong opinions about this one way or the other. In a UI client, if an attacker gets access to the access token for another user's session, they almost certainly have access to the refresh token as well. Both of these things must be associated with a session in order to be useful. We can't prevent an attacker from initiating the session reset themselves, but without the nonce they can't do so quietly.

To "quietly" exploit the refresh with a nonce in place, the attacker would also need the nonce. With the time window alone, no additional exploit is necessary, the timing just needs to be right. This window could be made easier to exploit if the attacker can determine approximately when a token was due to expire. I notice the token service itself returns `expires_in`, so if that info is available on an ongoing basis from ES, then it would be pretty easy for an attacker to exploit the window. If it's not easily available through ES, it's also not unlikely that a client would store the whole token service payload (including `expires_in`) along with a created_at timestamp in the session, so in that case the attacker could exploit the window using the info they had on hand. I'm comfortable leaving it to this team to decide the extent to which this is acceptable in our auth threat model.
@epixa if I'm following correctly, I guess this is not how things work right now, but each authenticator process in a multiple-Kibana deployment should use different secret credentials. In this way, the server side could tell which instance performed a given refresh.
I don't think that would help with the problem at hand, and Court's argument was that the nonce can be used as a canary token of sorts, to indicate that only one user is allowed to refresh a refresh token (someone else attempting to refresh again with a leaked refresh token within the proposed allowed window will not be successful, as they won't send the same nonce).
The premise of this attack presupposes that someone (in increasing order of complexity/decreasing order of probability) either a. happens to get a Kibana auth cookie lying around and tries to use it as is
So, we're trying to solve this for concurrent requests. Let's assume the refresh window is 10 seconds and the access token is valid for 10 minutes.
I'm leaning toward this being an acceptable risk given the prerequisites to exploit it. The stealth property of this attack (which is the only thing an attacker gains here as compared to our current behavior) is, I think, not particularly interesting from an attacker perspective in web environments, where "you were logged out, please reauthenticate" messages are common enough not to make a legitimate user suspicious. @epixa if you think that there is an easy way to have Kibana predictably send the same value for all XHR requests of the same user that might come concurrently, then it doesn't complicate our implementation that much. But if it's a significant effort from your side, I'm not sure we gain much by doing so.
Understood, thanks for clarifying this @jkakavas! But from (1) above, and this below, I understand that the threat model is that the attacker can trick Kibana into refreshing the token but cannot trick it into also using the correct nonce. If that is so, it sounds pretty low risk to me.
@kobelb I'd like your thoughts on this: Generating a nonce on page load in Kibana and passing it through on all requests to Elasticsearch should be pretty easy. We'd pass the nonce to the UI when the page is served. However, multiple browser tabs complicate the situation, as each would have its own nonce. We could address this cross-tab issue by generating the nonce in storage shared across tabs. At this point, I'm leaning toward just abandoning the nonce concept entirely, which is the prevailing preference from the ES side anyway.
I think this is the biggest issue that we're going to run into when trying to use a nonce for this mechanism. If the attacker is able to access the cookie using some mechanism in the browser (which is even less likely because it has the HttpOnly flag, so they can't access it client-side), they'll definitely be able to access sessionStorage. This means that they're most likely going to get access to the nonce by intercepting a network request to Kibana, which will have both the nonce and the session, so we're already compromised in this scenario. I'm also leaning toward abandoning the nonce concept, given the recent discussions.
Let's consider the nonce dropped, then.
@epixa thanks for raising this, and thanks for the iterations @kobelb, @albertzaharovits, @jaymode. I'll move forward with the original proposal without the nonce.
This change adds support for the concurrent refresh of access tokens as described in #36872. In short, it allows subsequent client requests to refresh the same token that come within a predefined window of 60 seconds to be handled as duplicates of the original one and thus receive the same response with the same newly issued access token and refresh token. In order to support that, two new fields are added to the token document: one contains the instant (in epoch millis) when a given refresh token is refreshed, and one contains a pointer to the token document that stores the new refresh token and access token created by the original refresh. A side effect of this change, which was however also an intended enhancement for the token service, is that we needed to stop encrypting the string representation of the UserToken while serializing. (It was necessary because we correctly used a new IV every time we encrypted a token during serialization, so subsequent serializations of the same exact UserToken would produce different access token strings.) This change also handles the serialization/deserialization BWC logic:
- In mixed clusters we keep creating tokens in the old format and consume only old format tokens
- In upgraded clusters, we start creating tokens in the new format but remain able to consume old format tokens (that could have been created during the rolling upgrade and are still valid)

Resolves #36872

Co-authored-by: Jay Modi jaymode@users.noreply.github.com
This is a backport of #38382 (same description as the change above). Resolves #36872. Co-authored-by: Jay Modi jaymode@users.noreply.github.com
I think the change was temporarily reverted based on the latest comments in the PR, so I'm reopening this. Feel free to close if I'm mistaken.
You're right Court, I will re-close this on Monday upon submitting the PR again.
Co-authored-by: Jay Modi jaymode@users.noreply.github.com

This change adds support for the concurrent refresh of access tokens as described in #36872. In short, it allows subsequent client requests to refresh the same token that come within a predefined window of 60 seconds to be handled as duplicates of the original one and thus receive the same response with the same newly issued access token and refresh token. In order to support that, two new fields are added to the token document: one contains the instant (in epoch millis) when a given refresh token is refreshed, and one contains a pointer to the token document that stores the new refresh token and access token created by the original refresh. A side effect of this change, which was however also an intended enhancement for the token service, is that we needed to stop encrypting the string representation of the UserToken while serializing. (It was necessary because we correctly used a new IV every time we encrypted a token during serialization, so subsequent serializations of the same exact UserToken would produce different access token strings.) This change also handles the serialization/deserialization BWC logic:
- In mixed clusters we keep creating tokens in the old format and consume only old format tokens
- In upgraded clusters, we start creating tokens in the new format but remain able to consume old format tokens (that could have been created during the rolling upgrade and are still valid)
- When reading/writing TokensInvalidationResult objects, we take into consideration that pre-7.1.0 these contained an integer field that carried the attempt count

Resolves #36872
This is a backport of #39631 (same description as the change above). Resolves #36872. Co-authored-by: Jay Modi jaymode@users.noreply.github.com
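The "stop encrypting the UserToken on serialization" side effect mentioned in the change descriptions above follows from a general property of symmetric encryption with a fresh IV: the output is non-deterministic, so the service could never hand back a byte-identical access token string for a duplicate refresh. A small standalone demonstration of that property using plain JDK crypto (this is not Elasticsearch code):

```java
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

public class IvDemo {
    public static void main(String[] args) throws Exception {
        SecretKey key = KeyGenerator.getInstance("AES").generateKey();
        byte[] plaintext = "serialized-user-token".getBytes(StandardCharsets.UTF_8);
        SecureRandom random = new SecureRandom();

        // Encrypt the same plaintext twice, each time with a fresh IV as GCM requires.
        for (int i = 0; i < 2; i++) {
            byte[] iv = new byte[12];
            random.nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
            System.out.println(Base64.getEncoder().encodeToString(cipher.doFinal(plaintext)));
        }
        // The two printed values differ even though the plaintext and key are identical,
        // which is why an encrypted token string cannot be reproduced for a duplicate refresh.
    }
}
```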
The problem
The current behavior for refreshing a token is to immediately invalidate a refresh token when it is used the first time. In principle this is a sensible way to prevent the refresh token from being used maliciously, but in practice it can trivially break a client that is making many requests to Elasticsearch.
In Kibana, the consequence is that sessions using our token-based providers (saml and token) occasionally get destroyed as multiple requests race to refresh an expired access token. This problem will only get worse as we expand the usage of canvas expressions which can result in more requests in parallel.
Let's say a user allows their session to idle for a bit and their access token expires, then they click refresh on a dashboard. Requests `A`, `B`, and `C` are fired off to Elasticsearch in parallel, all three get rejected due to an expired access token, and all three attempt to refresh the session using the same refresh token. `A` succeeds and returns a new session cookie to the client. `B` and `C` fail since the refresh token has already been used. The client sees the failures of `B` and `C` and assumes the session must be dead, so it either errors or sends the user to the login form with a cleared session.

Only processing a single refresh token in Kibana for parallel requests isn't practical because there could be multiple Kibana instances behind a load balancer handling requests for the same session.
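A toy reproduction of that race, using assumed names and a simple flag to stand in for the single-use refresh token (this is neither Kibana nor Elasticsearch code): three concurrent requests share one refresh token, only one wins, and the other two observe a failure they misread as a dead session.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicBoolean;

public class RefreshRaceDemo {
    public static void main(String[] args) throws Exception {
        AtomicBoolean refreshTokenUnused = new AtomicBoolean(true); // single-use semantics
        ExecutorService pool = Executors.newFixedThreadPool(3);

        List<Callable<String>> attempts = List.of(
            () -> tryRefresh("A", refreshTokenUnused),
            () -> tryRefresh("B", refreshTokenUnused),
            () -> tryRefresh("C", refreshTokenUnused));

        // Which request wins varies from run to run, but exactly one ever succeeds.
        for (Future<String> result : pool.invokeAll(attempts)) {
            System.out.println(result.get());
        }
        pool.shutdown();
    }

    static String tryRefresh(String request, AtomicBoolean unused) {
        // compareAndSet models "the refresh token is invalidated on first use"
        return unused.compareAndSet(true, false)
            ? request + ": refreshed, new session cookie issued"
            : request + ": refresh failed, client assumes the session is dead";
    }
}
```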
The proposal
I propose that we add an optional `nonce` property to the `refresh_token` grant type. If the first request to refresh a token contains a nonce, then subsequent refresh token requests for that same refresh token using the same nonce will return the same new access token.

A client like Kibana can generate a random nonce value each time a user does a page reload on Kibana, and it'll include the nonce in any refresh token requests that occur for that user during this time. Unlike the access token and refresh token, the nonce is never stored in the session itself.
There can be a "refresh window" associated with the nonce feature as well, maybe 5 minutes or something like that, where afterwards the token cannot be refreshed even with a matching nonce.
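For clarity, a minimal sketch of the server-side check this nonce proposal implies, with illustrative names; as the discussion above shows, this variant was ultimately dropped in favour of a plain time window and never shipped.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Objects;

final class NonceRefreshSketch {
    static final Duration REFRESH_WINDOW = Duration.ofMinutes(5); // hypothetical window

    // A repeated refresh only gets the original response back if it carries the same
    // nonce that was stored on the first refresh and arrives within the window.
    static boolean isDuplicateOfOriginalRefresh(String storedNonce, Instant firstRefreshedAt,
                                                String requestNonce, Instant now) {
        return storedNonce != null
            && Objects.equals(storedNonce, requestNonce)
            && Duration.between(firstRefreshedAt, now).compareTo(REFRESH_WINDOW) <= 0;
    }

    public static void main(String[] args) {
        Instant firstUse = Instant.now();
        System.out.println(isDuplicateOfOriginalRefresh("abc123", firstUse, "abc123", firstUse.plusSeconds(30))); // true
        System.out.println(isDuplicateOfOriginalRefresh("abc123", firstUse, "other", firstUse.plusSeconds(30)));  // false: wrong nonce
    }
}
```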
cc @elastic/kibana-security