schedule: do not report status for first and last in suite jobs #1472
@tchaikov I guess you're familiar with --first-in-suite and --last-in-suite jobs. Why would we ever want to push their statuses to paddles? |
@susebot run deploy |
I don't remember exactly, but I think the email logic needs to know when to send the report, and for that it needs the status of those jobs, so we cannot remove them. |
@vasukulkarni |
3e05a70
to
6193f2c
Compare
The changes look ok to me, but the manual task to go over it requires quite a bit of coordination in labs. |
Yeah, we will need help from @djgalloway anyway. I will spend some time testing in an isolated environment before I remove the DNM label. |
Commit 3e05a70 is OK. |
retest this please |
teuthology/worker.py
Outdated
@@ -191,8 +191,6 @@ def prep_job(job_config, log_file_path, archive_dir):
def run_job(job_config, teuth_bin_path, archive_dir, verbose):
    safe_archive = safepath.munge(job_config['name'])
    if job_config.get('first_in_suite') or job_config.get('last_in_suite'):
        if teuth_config.results_server:
            report.try_delete_jobs(job_config['name'], job_config['job_id'])
It seems we still need this to support older users: they may not be able to update their teuthology sandboxes to the latest code, so some users may continue to add these jobs.
The only option I see is to add a try/except block to keep the worker from dying, and perhaps a couple of retries with a pause. That would handle a situation we have downstream: paddles has received the job to add, but the request is still waiting its turn in some thread, while the worker has already picked up the LIS and FIS jobs and started processing them, trying to delete a job that has not been created yet.
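A minimal sketch of the retry idea described above, not the actual change. The helper name `delete_with_retries` and its `tries`, `pause`, and `sleep` parameters are hypothetical; in the worker, `delete_fn` would presumably be `report.try_delete_jobs` from the diff above.

```python
import logging
import time

log = logging.getLogger(__name__)


def delete_with_retries(delete_fn, run_name, job_id,
                        tries=3, pause=5.0, sleep=time.sleep):
    """Call delete_fn(run_name, job_id), retrying on failure.

    Hypothetical helper: returns True once the call succeeds and False
    if every attempt raised, so the caller (the worker) never dies
    because of a failed delete.
    """
    for attempt in range(1, tries + 1):
        try:
            delete_fn(run_name, job_id)
            return True
        except Exception as exc:
            log.warning("Deleting job %s/%s failed (attempt %d of %d): %s",
                        run_name, job_id, attempt, tries, exc)
            # Pause only if another attempt follows, to give the results
            # server time to finish creating the job.
            if attempt < tries:
                sleep(pause)
    return False
```

Under these assumptions the call in `run_job` could become something like `delete_with_retries(report.try_delete_jobs, job_config['name'], job_config['job_id'])`. Whether the real function raises when the job is missing is not confirmed here.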
Corresponding PR #1524
Addresses the issue where a teuthology run gets stuck with first_in_suite or last_in_suite jobs in the queued state.

Attention: This change requires the following steps, which are not mutually exclusive:
1) Restart the teuthology workers on the server; otherwise the old worker code will try to remove the reported job from paddles and exit with an unexpected exception.
2) Update the user's teuthology runner environment to recent code, because new workers will not clean up FIS and LIS jobs; they will remain in paddles and the run will get stuck.

Requires: a34fb6a
Fixes: http://tracker.ceph.com/issues/43291
Signed-off-by: Kyr Shatskyy <kyrylo.shatskyy@suse.com>
6193f2c
to
b164cdc
Compare
@susebot run deploy |
Commit b164cdc is NOT OK. |
Both failed jobs are due to an unrelated issue:
|
Another run passed: |