Timeouts in the KernelCI dashboard Tests and Jobs sections #235

Open
laura-nao opened this issue Sep 1, 2023 · 4 comments

@laura-nao

Browsing the Tests and Jobs sections of the KernelCI dashboard (https://linux.kernelci.org/) often results in the following error: "Error while loading data from the server (error code: 0). Please contact the website administrator"

Sometimes the results load correctly after refreshing the page; at other times they don't appear for several minutes.
I observed that browsing through the Jobs section usually results in fewer timeouts, but they still happen.

Browsing the Tests section is more convenient when looking for the results of a specific test plan. Is there any way the reliability of this page can be improved?

@gctucker
Contributor

gctucker commented Sep 1, 2023

Could you please provide full URLs where this happens? I used the web dashboard and didn't hit any timeout.

Yes, using the Tests page directly doesn't work; only the Jobs page works. It may be that some slow queries initiated via the broken pages caused timeouts when going through Jobs.

I think the solution here is to remove the tabs in the navigation bar and only follow the Jobs approach.

@laura-nao
Author

Regarding the Jobs section, this morning I got a timeout on https://linux.kernelci.org/job/mainline/branch/master/ and https://linux.kernelci.org/job/next/branch/master/ . After refreshing the page, the results were loaded after a few seconds.

On a side note, I get the same errors when trying to load results from the SoC page too, for instance: https://linux.kernelci.org/soc/mediatek/job/mainline/

@gctucker
Contributor

gctucker commented Sep 1, 2023

Yes, the SoC page has been known to hit timeouts too. OK, thanks for confirming; these are all known issues that have been there since the beginning. I think they're becoming more visible as more people use the frontend. As it's now a deprecated system, we can only remove the parts that cause timeouts: fixing it would take days, and we need to focus on the new API and web dashboard instead.

We might also need to do something at the backend level to prevent long queries from causing some kind of DoS, probably by reducing the timeouts for MongoDB queries and Celery tasks so that they abort earlier. We can easily have a "query storm" if someone issues a series of really intensive queries that take several seconds each to process. Queries for regular dashboard usage going via the Jobs page typically take less than 100ms, and often a lot less than that when hitting the cache.
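
To put rough numbers on the "query storm" scenario, here is a minimal Python sketch of a deliberately simplified model: a burst of identical queries arrives at once, a fixed pool of workers processes them one at a time, and an optional timeout aborts any query that runs too long. The function name `drain_time_s`, the pool of 4 workers, the burst of 40 queries and the 8-second and 2-second figures are all hypothetical values picked for illustration; none of them are measurements from the KernelCI backend.

```python
import math


def drain_time_s(n_queries, query_time_s, workers, timeout_s=None):
    """Seconds a burst of identical queries keeps the worker pool busy.

    Simplified model (an assumption, not the real backend): all queries
    arrive at the same moment, each worker handles one query at a time,
    and a query running longer than timeout_s is aborted at timeout_s.
    """
    if timeout_s is None:
        per_query = query_time_s
    else:
        per_query = min(query_time_s, timeout_s)
    # Each "round" keeps every worker busy with one query.
    rounds = math.ceil(n_queries / workers)
    return rounds * per_query


# Hypothetical storm: 40 heavy queries of 8 s each on 4 workers.
print(drain_time_s(40, 8, 4))               # 80
# Same storm with a hypothetical 2 s timeout: queries abort early.
print(drain_time_s(40, 8, 4, timeout_s=2))  # 20
# Regular ~100 ms queries are unaffected by a 2 s timeout.
print(drain_time_s(40, 0.1, 4, timeout_s=2))
```

In this model the time the pool stays saturated scales with the per-query cost, so capping that cost with a shorter timeout bounds how long regular sub-100ms requests can be stuck behind a storm. The aborted queries still fail, so this only protects other users and does not make the slow pages work.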

@gctucker
Contributor

gctucker commented Sep 4, 2023

Quick update: I'm wrapping things up for the Early Access phase, which starts today. We'll then probably be able to look into this issue again later this week or next week. If this is more urgent and someone is available to help, please let us know.
