ci: models: avoid polling jobs waiting more than a week on the backend #1012

Open
wants to merge 1 commit into master

Conversation

@chaws (Collaborator) commented Dec 6, 2021

SQUAD has no way to tell whether a TestJob has actually been worked on by its backend. It might be that the device is out of service or that the backend is undergoing an unusually long maintenance. Over time, jobs in this scenario start clogging up the fetch queue, delaying other jobs waiting to be fetched.

I hardcoded the cutoff to one week, because that is the usual pattern I have observed, but it could be made a backend setting as well if requested.

Signed-off-by: Charles Oliveira <charles.oliveira@linaro.org>
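
For illustration, a minimal sketch of the cutoff described above. The function and field names are hypothetical, not SQUAD's actual code; only the hardcoded one-week threshold comes from this PR.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hardcoded one-week threshold, as described in the PR text.
MAX_WAIT = timedelta(days=7)

def should_keep_polling(submitted_at: datetime,
                        now: Optional[datetime] = None) -> bool:
    """Return False once a job has been waiting on its backend for over a week."""
    now = now or datetime.now(timezone.utc)
    return (now - submitted_at) <= MAX_WAIT

# Example: a job submitted 8 days ago would be dropped from the polling queue.
old_job = datetime.now(timezone.utc) - timedelta(days=8)
assert should_keep_polling(old_job) is False
```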
@chaws (Collaborator, Author) commented Dec 9, 2021

@mrchapp There are a few jobs in NXP that have been waiting for over a week in their LAVA instance (like this one: https://lavalab.nxp.com/scheduler/job/744848), and this PR acts exactly on this kind of job. Especially on NXP, there are old hanging jobs that take ~10 seconds just to get a response from the LAVA instance.

@mrchapp (Contributor) commented Dec 9, 2021

I'm thinking that we will eventually want to get those lagged results, even if only for data-mining purposes.

Can we ping the LAVA server first and decide, based on that, whether a round of fetching should be initiated? I guess what we want to avoid is the continual timeouts from an unresponsive server.
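
A rough sketch of that "ping first" idea: a plain HTTP GET with a short timeout is used here purely as an illustration, a real check might go through LAVA's API instead, and all names are hypothetical.

```python
import requests

def backend_is_responsive(base_url: str, timeout: float = 5.0) -> bool:
    """Quick reachability check so an unresponsive server does not stall a fetch round."""
    try:
        response = requests.get(base_url, timeout=timeout)
        return response.status_code < 500
    except requests.RequestException:
        return False

# e.g. only poll NXP jobs if backend_is_responsive("https://lavalab.nxp.com/")
```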

@chaws (Collaborator, Author) commented Dec 10, 2021

You have a good point. I think I will revisit the LAVA/SQUAD signals and have LAVA tell SQUAD when a job is ready for fetching. Sometimes jobs have a Submitted status, but sometimes they don't, and I don't fully understand why.

By default, SQUAD attempts to fetch jobs regardless of whatever signal LAVA sent about them.

One solution is to have SQUAD avoid polling jobs with status="Submitted" (like this NXP job). Then, whenever LAVA signals SQUAD that the job is ready, it will be queued and then fetched. The downside is that if the LAVA lab fails to notify SQUAD, the job will never be fetched.
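
A rough sketch of that approach: "Submitted" comes from the discussion above, but every other name here (job attributes, the notification hook, the enqueue_fetch callback) is hypothetical.

```python
SKIP_STATUSES = {"Submitted"}

def jobs_to_poll(jobs):
    """Keep only jobs whose last known status suggests the backend has started them."""
    return [job for job in jobs if job.job_status not in SKIP_STATUSES]

def on_lava_notification(job, new_status, enqueue_fetch):
    """Queue a fetch when LAVA signals that the job is ready.

    If the lab never sends this notification, the job is never fetched --
    the downside mentioned above.
    """
    if new_status != "Submitted":
        enqueue_fetch(job)
```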
