Issues with the refresh endpoint endlessly redirecting after signin #190
Comments
Thanks for reporting this. What are your values for parameters To debug, it's probably best to set the log level to DEBUG for a while, if that's acceptable to you (it logs all requests, including sensitive data). That will show the responses generated by the Lambda@Edge functions; it would be good to see if the
That would be great. |
I do have some debug logs that should have been captured. Just for the Also, it'll take a few for me to get those XHR's, but will do. I'll post here with additional data. So far, for what it's worth, I rolled back to |
Those settings are vanilla and should work. |
@ottokruse
|
@ottokruse LMK if I'm wrong about that and you need some bit of data that was redacted. If it's not an identifying piece of data for the user or our app, I'd be happy to add it back. I had to change the type to |
The largest issues with the redirect loop are:
We've got a handful of power-users that are going to test our app with |
Thanks that was very helpful. I can see the redirect loop, going back and forth to The root error is the redirect loop. Interestingly, from the HAR file the refresh seems to work okay:
All good, but then checkAuth decides again to send the user to Need to look closer into the reason for that. Best next step is to enable info logging, that will show the exact reason why checkAuth thinks the JWT is invalid in the logs:
Have a look at what the error says exactly. You didn't add the JWTs to the HAR (makes sense), but it would be good to look at them, maybe run them through jwt.io, and verify with your own eyes that the JWTs in the cookies in the response from |
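As an alternative to pasting tokens into jwt.io, the payload can be inspected locally. The following is a minimal sketch, assuming a runtime that provides `atob` and `TextDecoder` (current browsers and recent Node.js versions). The helper names `decodeJwtPayload` and `secondsUntilExpiry` are made up for illustration and are not part of this repository. It only decodes the token and does not verify the signature, so it is suitable for debugging, not for authorization decisions.

```javascript
// Decode the payload (middle segment) of a JWT WITHOUT verifying its signature.
// Hypothetical debugging helper, not part of cloudfront-authorization-at-edge.
function decodeJwtPayload(jwt) {
  const segments = jwt.split(".");
  if (segments.length !== 3) {
    throw new Error("Expected a JWT with three dot-separated segments");
  }
  // JWTs use base64url: map back to standard base64 and restore the padding.
  const base64 = segments[1].replace(/-/g, "+").replace(/_/g, "/");
  const padded = base64 + "=".repeat((4 - (base64.length % 4)) % 4);
  // atob returns a binary string; decode it as UTF-8 so non-ASCII claims survive.
  const bytes = Uint8Array.from(atob(padded), (c) => c.charCodeAt(0));
  return JSON.parse(new TextDecoder().decode(bytes));
}

// Seconds until the token expires (negative if it has already expired).
function secondsUntilExpiry(jwt, nowMs = Date.now()) {
  const { exp } = decodeJwtPayload(jwt);
  return exp - Math.floor(nowMs / 1000);
}
```

Running `secondsUntilExpiry` on the idToken cookie value before and after the refresh request should show whether the refreshed cookie really carries a later `exp` claim.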
Can you share more details on this too? Maybe there is a clue here. Just tested again on 2.1.0 and for me this button works fine: it signs you out under the hood and redirects to the Cognito Hosted UI to sign in again. |
The value is set to We are using ReactJS and Amplify here, and I have the amplify config set to use cookies. Which does work, it bypasses the amplify sign in page. I could turn off amplify from the UI if needed for testing. After redeploying The scenario that caused it was one of our users had their computer update and reboot overnight, and when they opened up the page this morning, it triggered a "sea of redirect problems". My logging was reverted after the 2.0.19 deploy, so I'll update it to at least the INFO level and see if I can't get you exactly what the errors are saying. I'll also decode the JWTs from that redacted |
Cool. What cookie settings do you have in your Amplify config? This is a thing that changed in v2.1.0, recommend now to have config like this (to use host only cookies):
Also might be wise to have lines like this in your web app, to let Amplify do the refresh: cloudfront-authorization-at-edge/src/cfn-custom-resources/react-app/react-app/src/App.js Lines 44 to 46 in 3a79edd
|
@HudsonAkridge could you finally solve this issue? |
Hi @HudsonAkridge ! Did you ever get to the bottom of this? Did the logs reveal anything useful? |
Closing for now, can re-open if further info surfaces |
Hi @ottokruse , long time no chat 🙂 We had some other stuff come up and just got around to giving this another go, thinking maybe something updated in the last 6+ or so months to solve the problem. Did another deploy with the latest ( So, still not sure what's happening/going on here. What I'm planning to do tomorrow is:
The only thing that seems non-vanilla about our setup, is our pre-existing custom configured Cognito pool and how we've got some custom domains set up with Amplify. So the goal is to remove as many variables from our setup as possible to create a control and see if we can recreate the issue then. It just doesn't make any sense that we're the only ones running into this with (potentially) thousands of consumers of this system. Thoughts? |
Hi @HudsonAkridge :) Great to chat again though too bad the issue is still there then.
Great plan. And do set logLevel to DEBUG then when you deploy the stack. Also, share the Amplify config? Eg the stuff you put in |
Hi @ottokruse I hope to have a write-up here soon to help the next person that comes along with a similar issue. Right now, I have sort of a question from a little out in left field, which is: The only issue I'm having now is that once the Now, I could do that/negotiate that with Cognito directly here, like the So I was trying to figure out how to get the cookie updated in the background while the user is making API calls to a different sub-domained endpoint (e.g. if the app is So far, I've got:
Option 2 is the less attractive route as I would like to re-use the behavior your great library here is set up to use. We just need to force some page in the At least, that's my current thought, but very much open to input/suggestions here. Thank you for all of your work/help with this, and I'm super excited it looks like we might finally get to use this for real (once we solve this problem that is). |
Here's what I've done for the moment to try and alleviate the issue. Added a check to do a background no-cache refresh of a simple empty JSON file with a cooldown of 15 seconds before I get the
import Cookies from "universal-cookie";

const cookies = new Cookies();

// Returns the name of the first cookie whose key matches the filter, or null.
const getIdTokenCookieValue = (obj, filter) => {
  for (const key in obj) {
    if (Object.prototype.hasOwnProperty.call(obj, key) && filter.test(key)) {
      return key;
    }
  }
  return null;
};

const backgroundRefreshCooldownInMs = 15 * 1000; // 15 seconds
let lastBackgroundRefreshTimestamp = null;

export const getUserToken = async () => {
  if (!lastBackgroundRefreshTimestamp || (Date.now() - lastBackgroundRefreshTimestamp) >= backgroundRefreshCooldownInMs) {
    // Trigger a potential background token refresh so the next token we read is updated
    lastBackgroundRefreshTimestamp = Date.now();
    await fetch("backgroundLoad.json", { cache: "no-store" });
  }
  // Read the cookies only after the background fetch, so a refreshed token is picked up
  const allCookies = cookies.getAll();
  const cognitoIdTokenCookieName = getIdTokenCookieValue(allCookies, /idToken/);
  return allCookies[cognitoIdTokenCookieName];
};
I've got a pretty long expiration set for the idToken, so I'll report back in the morning if this works the way we want. If it does, people are welcome to use my code above for their needs if they're also getting rid of Amplify in their SPA. My goal here is to get off Amplify completely for our ReactJS app. Amplify was nice as a bootstrap in the beginning, but it's not particularly flexible and has some other oddities. It's simpler to just use this repo deployment and the above code to get the |
🎉
Awesome. I'm pretty curious.
I like that idea. If
// Do a dummy fetch every minute, to refresh tokens (which is a no-op if they don't need refresh)
setInterval(() => fetch("backgroundLoad.json", {cache: "no-store"}), 1 * 60 * 1000)
UPDATE: this leaves a chance of the tokens expiring within the first minute, on returning visits where the user already has cookies that are about to expire (within 1 minute). Set the cache-control header on But your approach of only refreshing if needed is certainly more sophisticated. |
One more thought. If I understand your situation it is roughly this:
If that's indeed true, then a good way forward would be to change 3 above: front the APIs with the Auth@Edge CloudFront as well. Then refreshes will be seamless out of the box: as you've seen, Auth@Edge redirects automatically to the refresh endpoint and back to the requested location, so you don't need to trigger token refresh manually in your SPA. A side benefit is that this also eliminates CORS preflight requests, so latency improves. |
@HudsonAkridge I would love to hear about how you solved the problem because we've also been experiencing something similar to this for a long time. We have a similar setup to what you described with the pre-existing custom configured Cognito user pool being fed into the CloudFormation deployment. |
@ottokruse @james1050 I'll try to briefly type up the approach now, with a more detailed explanation later when I have more time. For now, these are some of the things done to make this work, starting from a fresh deploy.
"cookieSettings": {
  "idToken": "Path=/; Secure; Domain=yourdomain.com; SameSite=Lax",
  "accessToken": "Path=/; Secure; Domain=yourdomain.com; SameSite=Lax",
  "refreshToken": "Path=/; Secure; Domain=yourdomain.com; SameSite=Lax",
  "nonce": "Path=/; Secure; HttpOnly; Domain=yourdomain.com; SameSite=Lax"
}
This should be the setting for all
Now, in the frontend client code. Remove Amplify. Like, all of it. You shouldn't need it because this repo @ottokruse and co have built pretty much does it all for you. Create or modify the method used to get the JWT for your application's headers (in my case, I need the JWT to attach to a call to
import Cookies from "universal-cookie";
import jwt_decode from "jwt-decode";

const cookies = new Cookies();

// Returns the name of the first cookie whose key matches the filter, or null.
const getIdTokenCookieValue = (obj, filter) => {
  for (const key in obj) {
    if (Object.prototype.hasOwnProperty.call(obj, key) && filter.test(key)) {
      return key;
    }
  }
  return null;
};

const rateLimitInMs = 15 * 1000; // 15 seconds
const refreshTokenWindowInTicks = 15; // 15 seconds (Unix time, in seconds)
let lastBackgroundRefreshTimestamp = null;

export const getUserToken = async () => {
  let allCookies = cookies.getAll();
  // Calculate the idToken expiration. If this fails for some reason, we should be reloading/re-signing in to the app completely,
  // so an error at this point should halt execution
  const initialIdTokenCookieName = getIdTokenCookieValue(allCookies, /idToken/);
  const initialIdToken = allCookies[initialIdTokenCookieName];
  const decodedToken = jwt_decode(initialIdToken);
  const initialIdTokenExpires = decodedToken.exp;
  const currentTicks = Math.floor(Date.now() / 1000); // Convert JS milliseconds to Unix seconds for the comparison
  const remainingIdTokenExpirationTicks = initialIdTokenExpires - currentTicks;
  const isWithinTokenExpirationWindow = remainingIdTokenExpirationTicks <= refreshTokenWindowInTicks;
  // Rate limit the background fetch so we don't hammer this in case someone is impatient and keeps clicking
  const isNotRateLimited = !lastBackgroundRefreshTimestamp || (Date.now() - lastBackgroundRefreshTimestamp) >= rateLimitInMs;
  if (isNotRateLimited && isWithinTokenExpirationWindow) {
    lastBackgroundRefreshTimestamp = Date.now();
    await fetch("backgroundLoad.json", { cache: "no-store" });
    console.log("Refreshed page in background.");
    // Re-read the cookies, otherwise we would return the token captured before the refresh
    allCookies = cookies.getAll();
  }
  // This may be a different idToken cookie name based on the Cognito hash; we can't be sure it will match the one we retrieved above
  const currentIdTokenCookieName = getIdTokenCookieValue(allCookies, /idToken/);
  return allCookies[currentIdTokenCookieName];
};
I'm not saying that the code above is the most optimized way to do this, or the cleanest, just that it's fairly simple and it works. Also, add the What should happen is that if a token is expired, it should do a fetch of that JSON file behind the scenes, which will load the JSON file, hitting the route, firing off the This prevents hammering anything on the server and only executes it once, when needed. Occasionally an individual request will fail on the background fetch, and you just have to click a link somewhere else in your app again (assuming it's always calling this So far, so good. I hope this helps someone else 🙂 |
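One possible refinement of the rate limiting described above: with a timestamp cooldown, callers that arrive while a refresh is still running skip the fetch and may read the old cookie. The sketch below shares a single in-flight refresh between concurrent callers instead. `createSingleFlightRefresher` is a hypothetical helper name, and this is an illustration of the idea rather than code from this thread or repository.

```javascript
// Hypothetical helper: concurrent callers share one in-flight background refresh
// instead of each starting (or skipping) their own.
function createSingleFlightRefresher(doRefresh) {
  let inFlight = null;
  return function refresh() {
    if (!inFlight) {
      inFlight = Promise.resolve()
        .then(doRefresh)
        // Clear the slot when the refresh settles, so a later call starts a new one.
        .finally(() => {
          inFlight = null;
        });
    }
    return inFlight;
  };
}

// Example wiring (assumes the same backgroundLoad.json file used above):
// const refreshTokens = createSingleFlightRefresher(() =>
//   fetch("backgroundLoad.json", { cache: "no-store" })
// );
// await refreshTokens(); // then read the idToken cookie
```

Every caller awaits the same promise, so each of them reads the cookie only after the refresh request has completed. If the refresh fails, the rejection reaches all waiting callers, which can then decide whether to retry.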
Cheers and thanks for the explanation, happy it's sorted for you! @james1050 if this doesn't help you, I can look along with you. Send me a har file of the situation at ottokrus at amazon dot nl |
Hi mate. Can you be a bit more exact? Can you explain the steps to reproduce? And what is the technical problem? If you click on it, what does it say? Is it the same message reported at the start of this issue? |
Thanks for the quick reply. Here is what is happening. Two years ago we deployed an instance of this solution in Static Website mode. myexampledomain.com/site1/index.html Currently, we have deployed another instance of this solution with the latest codebase in another AWS account. Version: 2.1.9 So, after launching any of the URLs mentioned, after 5 minutes of inactivity one of the URLs randomly fails to Also, I have one more question: in past versions of this codebase, the Check Auth function used to have this piece of code. |
Can you share a HAR file perhaps so I can see in detail what goes on? Send it to me directly via email.
JWT refreshing is now lazy: it's only done when the JWT actually expires. (The reason was that with a small expiry window for JWTs, which some people configure, you could run into infinite refreshes with the previous code, which refreshed pre-emptively.) Anyway, this is proving to be a tricky problem, and the way to solve it will be to monitor the browser network tab (and record a HAR, please) while looking at the CloudWatch logs at the same time, so you can see why the functions do what they do (set logging to DEBUG for this). |
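To illustrate why a pre-emptive refresh can loop forever with short-lived JWTs, here is a small sketch comparing the two strategies. The function names and the 10-minute threshold are assumptions chosen for the example and are not taken from the repository's code.

```javascript
// Pre-emptive: refresh whenever the token is within `thresholdSeconds` of expiring.
function shouldRefreshPreemptively(expSeconds, nowSeconds, thresholdSeconds) {
  return expSeconds - nowSeconds < thresholdSeconds;
}

// Lazy: refresh only once the token has actually expired.
function shouldRefreshLazily(expSeconds, nowSeconds) {
  return nowSeconds >= expSeconds;
}

// A token that was issued just now with a 5-minute lifetime...
const now = 1700000000;
const freshTokenExp = now + 5 * 60;

// ...already falls inside a (hypothetical) 10-minute pre-emptive threshold,
// so every request would trigger yet another refresh.
const preemptive = shouldRefreshPreemptively(freshTokenExp, now, 10 * 60); // true
const lazy = shouldRefreshLazily(freshTokenExp, now); // false
```

With the lazy check, the freshly issued token is accepted until its `exp` time passes, so a refresh cannot immediately trigger another refresh.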
Looking closely at the network tab screenshot from @HudsonAkridge I can see that it is actually If I look at the screenshot, the 1st request (is that in response to a redirect?) to refreshauth (first line in the screenshot) succeeds and comes back with HTTP 200. But that means it is actually the error page, because if the JWTs had been refreshed successfully, you'd see a 307 instead (easy to understand from the code here: https://github.com/aws-samples/cloudfront-authorization-at-edge/blob/master/src/lambda-edge/refresh-auth/index.ts). What confuses me is that upon loading the error page, favicon.ico is apparently fetched, and it also triggers the redirect to refreshauth, which now works because I see a 307 (!) But even though it works, its cookies and JWTs are apparently not persisted, or something else is wrong, which is why the subsequent request to favicon.ico triggers the redirect to refreshauth again, which returns 307 (so tokens must again be refreshed) but still doesn't work, and so on until the browser has had enough of it. This is one way to rack up the number of calls to Cognito! We should do something about it. Questions I have, which can probably only be answered by going through the HAR file AND looking at the Lambda@Edge logs side by side:
|
Users should never go directly to /refreshauth, they go to wherever they want to go, get redirected to /refreshauth where the JWT refresh magic happens, and then get redirected back to where they came from. Users should never see the /refreshauth URL, if all goes well the redirect to there and back to where they came from is so fast they do not even notice it. But if the error page is displayed, then they would see it. Maybe we should detect if the user goes to /refreshauth directly (without being redirected) and then redirect them back without trying the refresh. There is no foolproof way for this that I know of, but checking the absence of the Referer header could be pragmatic enough. Also, I kinda like not triggering refresh for favicon.ico. But let's not jump the gun and first get to the bottom of why this happens. |
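The Referer idea could look roughly like the sketch below. It assumes the standard Lambda@Edge request shape, where `request.headers` maps lowercase header names to arrays of `{ key, value }` objects. `guardRefreshRequest` and `isFaviconRequest` are hypothetical names, and this is not how the repository's refresh handler is implemented. As noted, the check is not foolproof: browsers can omit the Referer header for other reasons too.

```javascript
// Hypothetical guard for the refresh path: if the request has no Referer header,
// assume the user navigated there directly and send them elsewhere without refreshing.
// Returns a Lambda@Edge-style redirect response, or null to continue with the refresh.
function guardRefreshRequest(request, fallbackPath = "/") {
  const referer = request.headers["referer"];
  const hasReferer =
    Array.isArray(referer) && referer.length > 0 && referer[0].value !== "";
  if (hasReferer) {
    return null;
  }
  return {
    status: "307",
    statusDescription: "Temporary Redirect",
    headers: {
      location: [{ key: "location", value: fallbackPath }],
    },
  };
}

// Hypothetical check that could be used to skip token refresh for favicon requests.
function isFaviconRequest(request) {
  return request.uri === "/favicon.ico";
}
```

A handler could call `guardRefreshRequest` first and return its result when it is not null, falling through to the normal refresh logic otherwise.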
Hi mate, I tried something: I pulled a version of the code where the lazy refresh doesn't happen, updated CheckAuthHandler with that piece of code, and things seem to be working now. I suspect the issue has something to do with the lazy refresh (I can't say for sure, but the behavior does align). Why does the user navigate to refreshauth directly? I don't think the user does that. In my case, the issue occurs when I visit the site after the token has expired (after 5 minutes). Hope this helps! |
Yeah it is likely, the lazy refresh was introduced in v2.1.0, which is the 1st release in which this issue was reported. But I still want to understand why this happens now with the lazy refresh before making any fixes. |
Just tried to reproduce it myself (against a fresh and vanilla deployment out of the Serverless Repo) but the refresh works flawlessly for me: Will be hard to fix this without a reliable way to reproduce it :| @JaysenWankhade can you check if in your case also the 1st refresh goes wrong (shows error page) after which favicon.ico triggers the loop? |
Wait, you mention also seeing this in 2.07 and 2.09. If so, it can't be the lazy refresh then. |
We're using v2.1.0 (vanilla, nothing fancy/special). Occasionally, a user will get stuck in this redirect loop. I suspect it's when the JWT expires, and a lazy evaluation happens. Not all the time, but sometimes, it will put the browser into this constant back-and-forth redirect ping that keeps showing the message/image displayed below. If you look at the XHR requests, you'll see that it's just endlessly redirecting from the /refresh endpoint to the Cognito /auth and /login endpoints. I can post the XHRs (once I sanitize them), if needed.
This might be due to a refactor done recently for this version that has the JWT refreshed lazily, perhaps?
I've tried with the JWT expiration set to 1 hour, 12 hours, 24 hours, and 5 minutes and the same thing. It just changes how fast the cycles go before this issue manifests/gets recreated.
For the end user, there's no solution other than clearing cookies to correct this problem. Once they sign in again, the problem goes away for several refreshes, some unknown number (can't recreate it consistently, it appears to be a timing thing) before it hits again.
I'm going to try reverting back to a previous version of the app (v2.0.19) to see if this has any positive impact.