Timeouts in session closing and closed reminders #5999
One common error I see is this:
Another is this:
I suspect this is the problem
As opening reminders have a 0% error rate, I'm guessing the db query on that side is done more efficiently. We need to quickly replicate the same optimization on the closing reminders side.
Both submission edit and save requests are suffering from the first error (
I am currently operating on the assumption that three methods call this method: one for sending session closing reminders, one for sending session closed emails, and a test (which is likely not relevant to this issue). As "session closing" reminders are not a particularly new feature, this may explain why a rollback does not solve the issue. I currently have two possible solutions in mind, although neither of them is "perfect": 1: 2: Other considerations for this solution: editing the session to change its closing time will have to update these boolean values as well. Therefore, more time will be needed to ensure this does not break any existing features, and this solution will take longer to roll out as a result.
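To illustrate the consideration about keeping boolean values in sync with the closing time, here is a minimal sketch in Python (chosen for brevity; it is not the project's code). The `Session` class, the two flag names, and `update_closing_time` are all hypothetical, and the sketch assumes the flags record whether each email has already been sent.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Session:
    closing_time: datetime
    closing_email_sent: bool = False  # hypothetical flag, not from the thread
    closed_email_sent: bool = False   # hypothetical flag, not from the thread

    def update_closing_time(self, new_closing_time):
        """Change the deadline and reset both flags so emails can be sent again."""
        if new_closing_time != self.closing_time:
            self.closing_time = new_closing_time
            self.closing_email_sent = False
            self.closed_email_sent = False


deadline = datetime(2016, 1, 2, 12, 0, tzinfo=timezone.utc)
session = Session(deadline, closing_email_sent=True, closed_email_sent=True)

# Extending the deadline must clear the flags, or no new reminder would go out.
session.update_closing_time(deadline + timedelta(days=2))
print(session.closing_email_sent, session.closed_email_sent)
```

Whether the flags should always be reset, or only when the new closing time lies in the future, is a design decision this sketch does not settle.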
@chowyb your observations are all correct; I've been working on it for the past hour (2nd solution, since time is not really on our side), but I still need to run some tests. Hope you can take a look later when it's done. @damithc what errors are coming out of the submission edit page/save and the results page? They can't be the non-private feedback sessions error.
This looks like a combination of
I don't think all problems are caused by that query, but fixing it is likely to solve one of them. I think retrieving only sessions with deadlines in the past 24 hours (instead of the entire list) should solve that problem?
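A rough sketch of the 24-hour window idea, written in Python for brevity rather than in the project's own code. `Session`, `closing_time`, and `sessions_closed_within` are hypothetical names. The sketch filters an in-memory list only to show the condition; the actual saving would presumably come from applying the same bounds inside the datastore query so that the entire list is never fetched.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Session:
    name: str
    closing_time: datetime


def sessions_closed_within(sessions, now, window=timedelta(hours=24)):
    """Return sessions whose closing time falls in (now - window, now]."""
    earliest = now - window
    return [s for s in sessions if earliest < s.closing_time <= now]


now = datetime(2016, 1, 2, 12, 0, tzinfo=timezone.utc)
sessions = [
    Session("closed long ago", now - timedelta(days=30)),
    Session("closed recently", now - timedelta(hours=3)),
    Session("still open", now + timedelta(hours=5)),
]

recent = sessions_closed_within(sessions, now)
print([s.name for s in recent])
```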
Most common error for submission edit/save is
Another common warning (not an error) I see in submission edit/save as well as results page is this:
This is not an error, but it's new and frequent.
Changing the issue title since we need to work on solving things one at a time. @damithc
All the other high error rates are caused by 104.
I managed to reduce the severity of the problem by splitting the traffic between 5 versions of the app. 104 errors still happen, but less often than before. So it is related to the traffic level. That's why I suspect the 104 error may be a memory issue.
Looks like it's the results pages (with big result sets) that are causing the 104. Other pages could be collateral damage (i.e. when the server goes down, it takes all other ongoing requests with it). Note that when we load the results page of a big course, we still try to load responses of all the questions at the same time. This could be the thing that sends the server over the memory limit.
@chowyb can you look into the results page issue? As an interim measure, we can stop auto-loading the per-question view for any course with more than 30 respondents.
BTW, the problem with
@damithc I'll look into the issue, but from memory there is a preset limit on how many responses are present (10k?) before the system defaults to not auto-loading. It might be possible to reduce this limit instead.
Yes, let's try to reduce that limit. 10k is definitely too high. 1k?
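The limit being discussed amounts to a simple threshold check, sketched below in Python for brevity. The constant name `MAX_RESPONSES_FOR_AUTOLOAD` and the function `should_autoload` are hypothetical, the value 1000 is just the "1k" floated above, and treating the limit as inclusive is an assumption.

```python
# Hypothetical constant; 1000 is the "1k" proposal, not a confirmed setting.
MAX_RESPONSES_FOR_AUTOLOAD = 1000


def should_autoload(response_count, limit=MAX_RESPONSES_FOR_AUTOLOAD):
    """Auto-load results only when the response count is within the limit."""
    return response_count <= limit


for count in (250, 1000, 10000):
    print(count, should_autoload(count))
```

Above the limit, the page would fall back to loading results on demand instead of all at once.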
@chowyb, for the questions view, the limit is based on the number of respondents (there's a constant set for that in
@damithc, for the submission page errors, are the errors only from large courses? It is strange that we are getting an increased number of errors on the submit page.
The error rate for submission save is not high anymore after I spread the load between 10 servers. I think when many people are trying to submit, we run out of memory and kill all the parallel requests.
Fixed in #6001