Skip to content
This repository

Jobs are aborted prematurely #117

Open
kparal opened this Issue December 16, 2011 · 2 comments

2 participants

Kamil Páral Lucas Meneghel Rodrigues
Kamil Páral

The jobs are aborted too early. I have max runtime set to 2 hours, but the jobs are aborted even after 1 minute of runtime! It seems that there is some watchdog running periodically every hour (in my case every second minute of every hour) and it just cancels some (all?) jobs that are running at that time.

$ grep job_ ~autotest/global_config.ini
job_timeout_default: 72
job_max_runtime_hrs_default: 2
$ grep 9542 ~autotest/logs/scheduler.log.2011-12-14-04.00.22 
12/15 19:01:04 INFO |scheduler_:0496| Assigning host virt24.qa to entry no host/9542 (9542) Queued
12/15 19:01:04 INFO |scheduler_:0516| creating block 9542/7
12/15 19:01:09 INFO |scheduler_:0673| post-koji-build:rpmlint.noarch/6/None (job 9542, entry 9542) scheduled on virt24.qa, status=Queued
12/15 19:01:09 INFO |scheduler_:0561| virt24.qa/9542 (9542) Queued -> Verifying
12/15 19:01:15 INFO |scheduler_:0561| virt24.qa/9542 (9542) Verifying [active] -> Verifying
12/15 19:01:20 INFO |scheduler_:0561| virt24.qa/9542 (9542) Verifying [active] -> Pending
12/15 19:01:20 INFO |scheduler_:0561| virt24.qa/9542 (9542) Pending [active] -> Starting
12/15 19:01:25 INFO |scheduler_:0561| virt24.qa/9542 (9542) Starting [active] -> Running
12/15 19:01:25 INFO |drone_mana:0539| command = ['nice', '-n', '10', '/usr/share/autotest/server/autoserv', '-p', '-r', u'/usr/share/autotest/results/9542-autotest/virt24.qa', '-m', u'virt24.qa', '-u', u'autotest', '-l', u'post-koji-build:rpmlint.noarch', '-P', u'9542-autotest/virt24.qa', '-n', '/usr/share/autotest/results/drone_tmp/attach.1142', '-c']
12/15 19:01:25 INFO |drone_mana:0565| monitoring pidfile /usr/share/autotest/results/9542-autotest/virt24.qa/.autoserv_execute
12/15 19:02:12 WARNI|monitor_db:0085| Aborting entry virt24.qa/9542 (9542) due to max runtime
12/15 19:02:12 WARNI|scheduler_:0183| initialized <class 'autotest_lib.scheduler.scheduler_models.HostQueueEntry'> 9542 instance requery is updating: {'aborted': (0, 1)}
12/15 19:02:12 INFO |monitor_db:0734| Aborting virt24.qa/9542 (9542) Running [active,aborted]
12/15 19:02:12 INFO |scheduler_:0561| virt24.qa/9542 (9542) Running [active,aborted] -> Gathering
12/15 19:02:23 INFO |monitor_db:0734| Aborting virt24.qa/9542 (9542) Gathering [active,aborted]
12/15 19:02:23 INFO |drone_mana:0565| monitoring pidfile /usr/share/autotest/results/9542-autotest/virt24.qa/.collect_crashinfo_execute
12/15 19:02:23 INFO |drone_mana:0539| command = ['nice', '-n', '10', '/usr/share/autotest/server/autoserv', '-p', '--pidfile-label=collect_crashinfo', '--use-existing-results', '--collect-crashinfo', '-m', u'virt24.qa', '-r', u'/usr/share/autotest/results/9542-autotest/virt24.qa']
12/15 19:02:23 INFO |drone_mana:0540| log file = localhost:/usr/share/autotest/results/9542-autotest/virt24.qa/.collect_crashinfo.log
12/15 19:02:29 INFO |monitor_db:0734| Aborting virt24.qa/9542 (9542) Gathering [active,aborted]
12/15 19:02:29 INFO |scheduler_:0561| virt24.qa/9542 (9542) Gathering [active,aborted] -> Parsing
12/15 19:02:34 INFO |monitor_db:0734| Aborting virt24.qa/9542 (9542) Parsing [aborted]
12/15 19:02:34 INFO |drone_mana:0565| monitoring pidfile /usr/share/autotest/results/9542-autotest/virt24.qa/.parser_execute
12/15 19:02:34 INFO |drone_mana:0539| command = ['nice', '-n', '10', '/usr/share/autotest/tko/parse', '--write-pidfile', '-l', '2', '-r', '-o', u'/usr/share/autotest/results/9542-autotest/virt24.qa']
12/15 19:02:34 INFO |drone_mana:0540| log file = localhost:/usr/share/autotest/results/9542-autotest/virt24.qa/.parse.log
12/15 19:02:39 INFO |monitor_db:0734| Aborting virt24.qa/9542 (9542) Parsing [aborted]
12/15 19:02:39 INFO |scheduler_:0561| virt24.qa/9542 (9542) Parsing [aborted] -> Archiving
12/15 19:02:44 INFO |monitor_db:0734| Aborting virt24.qa/9542 (9542) Archiving [aborted]
12/15 19:02:44 INFO |drone_mana:0565| monitoring pidfile /usr/share/autotest/results/9542-autotest/virt24.qa/.archiver_execute
12/15 19:02:44 INFO |drone_mana:0539| command = ['nice', '-n', '10', '/usr/share/autotest/server/autoserv', '-p', '--pidfile-label=archiver', '-r', u'/usr/share/autotest/results/9542-autotest/virt24.qa', '--use-existing-results', '--control-filename=control.archive', '/usr/share/autotest/scheduler/archive_results.control.srv']
12/15 19:02:44 INFO |drone_mana:0540| log file = localhost:/usr/share/autotest/results/9542-autotest/virt24.qa/.archiving.log
12/15 19:02:49 INFO |monitor_db:0734| Aborting virt24.qa/9542 (9542) Archiving [aborted]
12/15 19:02:49 INFO |scheduler_:0561| virt24.qa/9542 (9542) Archiving [aborted] -> Aborted
12/15 19:02:49 INFO |drone_mana:0577| forgetting pidfile /usr/share/autotest/results/9542-autotest/virt24.qa/.autoserv_execute
12/15 19:02:49 INFO |drone_mana:0577| forgetting pidfile /usr/share/autotest/results/9542-autotest/virt24.qa/.collect_crashinfo_execute
12/15 19:02:49 INFO |drone_mana:0577| forgetting pidfile /usr/share/autotest/results/9542-autotest/virt24.qa/.parser_execute
12/15 19:02:49 INFO |drone_mana:0577| forgetting pidfile /usr/share/autotest/results/9542-autotest/virt24.qa/.archiver_execute
$ grep 9568 ~autotest/logs/scheduler.log.2011-12-14-04.00.22 
12/15 22:50:27 INFO |scheduler_:0496| Assigning host virt26.qa to entry no host/9568 (9568) Queued
12/15 22:50:27 INFO |scheduler_:0516| creating block 9568/4
12/15 22:50:32 INFO |scheduler_:0673| post-koji-build:rpmguard.noarch/6/None (job 9568, entry 9568) scheduled on virt26.qa, status=Queued
12/15 22:50:32 INFO |scheduler_:0561| virt26.qa/9568 (9568) Queued -> Verifying
12/15 22:50:37 INFO |scheduler_:0561| virt26.qa/9568 (9568) Verifying [active] -> Verifying
12/15 22:50:42 INFO |scheduler_:0561| virt26.qa/9568 (9568) Verifying [active] -> Pending
12/15 22:50:42 INFO |scheduler_:0561| virt26.qa/9568 (9568) Pending [active] -> Starting
12/15 22:50:48 INFO |scheduler_:0561| virt26.qa/9568 (9568) Starting [active] -> Running
12/15 22:50:48 INFO |drone_mana:0539| command = ['nice', '-n', '10', '/usr/share/autotest/server/autoserv', '-p', '-r', u'/usr/share/autotest/results/9568-autotest/virt26.qa', '-m', u'virt26.qa', '-u', u'autotest', '-l', u'post-koji-build:rpmguard.noarch', '-P', u'9568-autotest/virt26.qa', '-n', '/usr/share/autotest/results/drone_tmp/attach.1211', '-c']
12/15 22:50:48 INFO |drone_mana:0565| monitoring pidfile /usr/share/autotest/results/9568-autotest/virt26.qa/.autoserv_execute
12/15 23:02:21 WARNI|monitor_db:0085| Aborting entry virt26.qa/9568 (9568) due to max runtime
12/15 23:02:21 WARNI|scheduler_:0183| initialized <class 'autotest_lib.scheduler.scheduler_models.HostQueueEntry'> 9568 instance requery is updating: {'aborted': (0, 1)}
12/15 23:02:21 INFO |monitor_db:0734| Aborting virt26.qa/9568 (9568) Running [active,aborted]
12/15 23:02:21 INFO |scheduler_:0561| virt26.qa/9568 (9568) Running [active,aborted] -> Gathering
12/15 23:02:32 INFO |monitor_db:0734| Aborting virt26.qa/9568 (9568) Gathering [active,aborted]
12/15 23:02:32 INFO |drone_mana:0565| monitoring pidfile /usr/share/autotest/results/9568-autotest/virt26.qa/.collect_crashinfo_execute
12/15 23:02:32 INFO |drone_mana:0539| command = ['nice', '-n', '10', '/usr/share/autotest/server/autoserv', '-p', '--pidfile-label=collect_crashinfo', '--use-existing-results', '--collect-crashinfo', '-m', u'virt26.qa', '-r', u'/usr/share/autotest/results/9568-autotest/virt26.qa']
12/15 23:02:32 INFO |drone_mana:0540| log file = localhost:/usr/share/autotest/results/9568-autotest/virt26.qa/.collect_crashinfo.log
12/15 23:02:37 INFO |monitor_db:0734| Aborting virt26.qa/9568 (9568) Gathering [active,aborted]
12/15 23:02:37 INFO |scheduler_:0561| virt26.qa/9568 (9568) Gathering [active,aborted] -> Parsing
12/15 23:02:42 INFO |monitor_db:0734| Aborting virt26.qa/9568 (9568) Parsing [aborted]
12/15 23:02:42 INFO |drone_mana:0565| monitoring pidfile /usr/share/autotest/results/9568-autotest/virt26.qa/.parser_execute
12/15 23:02:42 INFO |drone_mana:0539| command = ['nice', '-n', '10', '/usr/share/autotest/tko/parse', '--write-pidfile', '-l', '2', '-r', '-o', u'/usr/share/autotest/results/9568-autotest/virt26.qa']
12/15 23:02:42 INFO |drone_mana:0540| log file = localhost:/usr/share/autotest/results/9568-autotest/virt26.qa/.parse.log
12/15 23:02:47 INFO |monitor_db:0734| Aborting virt26.qa/9568 (9568) Parsing [aborted]
12/15 23:02:47 INFO |scheduler_:0561| virt26.qa/9568 (9568) Parsing [aborted] -> Archiving
12/15 23:02:52 INFO |monitor_db:0734| Aborting virt26.qa/9568 (9568) Archiving [aborted]
12/15 23:02:52 INFO |drone_mana:0565| monitoring pidfile /usr/share/autotest/results/9568-autotest/virt26.qa/.archiver_execute
12/15 23:02:52 INFO |drone_mana:0539| command = ['nice', '-n', '10', '/usr/share/autotest/server/autoserv', '-p', '--pidfile-label=archiver', '-r', u'/usr/share/autotest/results/9568-autotest/virt26.qa', '--use-existing-results', '--control-filename=control.archive', '/usr/share/autotest/scheduler/archive_results.control.srv']
12/15 23:02:52 INFO |drone_mana:0540| log file = localhost:/usr/share/autotest/results/9568-autotest/virt26.qa/.archiving.log
12/15 23:02:57 INFO |monitor_db:0734| Aborting virt26.qa/9568 (9568) Archiving [aborted]
12/15 23:02:57 INFO |scheduler_:0561| virt26.qa/9568 (9568) Archiving [aborted] -> Aborted
12/15 23:02:57 INFO |drone_mana:0577| forgetting pidfile /usr/share/autotest/results/9568-autotest/virt26.qa/.autoserv_execute
12/15 23:02:57 INFO |drone_mana:0577| forgetting pidfile /usr/share/autotest/results/9568-autotest/virt26.qa/.collect_crashinfo_execute
12/15 23:02:57 INFO |drone_mana:0577| forgetting pidfile /usr/share/autotest/results/9568-autotest/virt26.qa/.parser_execute
12/15 23:02:57 INFO |drone_mana:0577| forgetting pidfile /usr/share/autotest/results/9568-autotest/virt26.qa/.archiver_execute
$ grep 9574 ~autotest/logs/scheduler.log.2011-12-14-04.00.22
12/15 23:50:31 INFO |scheduler_:0496| Assigning host virt26.qa to entry no host/9574 (9574) Queued
12/15 23:50:31 INFO |scheduler_:0516| creating block 9574/4
12/15 23:50:36 INFO |scheduler_:0673| post-koji-build:rpmguard.noarch/6/None (job 9574, entry 9574) scheduled on virt26.qa, status=Queued
12/15 23:50:36 INFO |scheduler_:0561| virt26.qa/9574 (9574) Queued -> Verifying
12/15 23:50:41 INFO |scheduler_:0561| virt26.qa/9574 (9574) Verifying [active] -> Verifying
12/15 23:50:46 INFO |scheduler_:0561| virt26.qa/9574 (9574) Verifying [active] -> Pending
12/15 23:50:46 INFO |scheduler_:0561| virt26.qa/9574 (9574) Pending [active] -> Starting
12/15 23:50:52 INFO |scheduler_:0561| virt26.qa/9574 (9574) Starting [active] -> Running
12/15 23:50:52 INFO |drone_mana:0539| command = ['nice', '-n', '10', '/usr/share/autotest/server/autoserv', '-p', '-r', u'/usr/share/autotest/results/9574-autotest/virt26.qa', '-m', u'virt26.qa', '-u', u'autotest', '-l', u'post-koji-build:rpmguard.noarch', '-P', u'9574-autotest/virt26.qa', '-n', '/usr/share/autotest/results/drone_tmp/attach.1230', '-c']
12/15 23:50:52 INFO |drone_mana:0565| monitoring pidfile /usr/share/autotest/results/9574-autotest/virt26.qa/.autoserv_execute
12/16 00:02:25 WARNI|monitor_db:0085| Aborting entry virt26.qa/9574 (9574) due to max runtime
12/16 00:02:25 WARNI|scheduler_:0183| initialized <class 'autotest_lib.scheduler.scheduler_models.HostQueueEntry'> 9574 instance requery is updating: {'aborted': (0, 1)}
12/16 00:02:25 INFO |monitor_db:0734| Aborting virt26.qa/9574 (9574) Running [active,aborted]
12/16 00:02:25 INFO |scheduler_:0561| virt26.qa/9574 (9574) Running [active,aborted] -> Gathering
12/16 00:02:36 INFO |monitor_db:0734| Aborting virt26.qa/9574 (9574) Gathering [active,aborted]
12/16 00:02:36 INFO |drone_mana:0565| monitoring pidfile /usr/share/autotest/results/9574-autotest/virt26.qa/.collect_crashinfo_execute
12/16 00:02:36 INFO |drone_mana:0539| command = ['nice', '-n', '10', '/usr/share/autotest/server/autoserv', '-p', '--pidfile-label=collect_crashinfo', '--use-existing-results', '--collect-crashinfo', '-m', u'virt26.qa', '-r', u'/usr/share/autotest/results/9574-autotest/virt26.qa']
12/16 00:02:36 INFO |drone_mana:0540| log file = localhost:/usr/share/autotest/results/9574-autotest/virt26.qa/.collect_crashinfo.log
12/16 00:02:41 INFO |monitor_db:0734| Aborting virt26.qa/9574 (9574) Gathering [active,aborted]
12/16 00:02:41 INFO |scheduler_:0561| virt26.qa/9574 (9574) Gathering [active,aborted] -> Parsing
12/16 00:02:46 INFO |monitor_db:0734| Aborting virt26.qa/9574 (9574) Parsing [aborted]
12/16 00:02:46 INFO |drone_mana:0565| monitoring pidfile /usr/share/autotest/results/9574-autotest/virt26.qa/.parser_execute
12/16 00:02:46 INFO |drone_mana:0539| command = ['nice', '-n', '10', '/usr/share/autotest/tko/parse', '--write-pidfile', '-l', '2', '-r', '-o', u'/usr/share/autotest/results/9574-autotest/virt26.qa']
12/16 00:02:46 INFO |drone_mana:0540| log file = localhost:/usr/share/autotest/results/9574-autotest/virt26.qa/.parse.log
12/16 00:02:51 INFO |monitor_db:0734| Aborting virt26.qa/9574 (9574) Parsing [aborted]
12/16 00:02:51 INFO |scheduler_:0561| virt26.qa/9574 (9574) Parsing [aborted] -> Archiving
12/16 00:02:56 INFO |monitor_db:0734| Aborting virt26.qa/9574 (9574) Archiving [aborted]
12/16 00:02:56 INFO |drone_mana:0565| monitoring pidfile /usr/share/autotest/results/9574-autotest/virt26.qa/.archiver_execute
12/16 00:02:56 INFO |drone_mana:0539| command = ['nice', '-n', '10', '/usr/share/autotest/server/autoserv', '-p', '--pidfile-label=archiver', '-r', u'/usr/share/autotest/results/9574-autotest/virt26.qa', '--use-existing-results', '--control-filename=control.archive', '/usr/share/autotest/scheduler/archive_results.control.srv']
12/16 00:02:56 INFO |drone_mana:0540| log file = localhost:/usr/share/autotest/results/9574-autotest/virt26.qa/.archiving.log
12/16 00:03:01 INFO |monitor_db:0734| Aborting virt26.qa/9574 (9574) Archiving [aborted]
12/16 00:03:01 INFO |scheduler_:0561| virt26.qa/9574 (9574) Archiving [aborted] -> Aborted
12/16 00:03:01 INFO |drone_mana:0577| forgetting pidfile /usr/share/autotest/results/9574-autotest/virt26.qa/.autoserv_execute
12/16 00:03:01 INFO |drone_mana:0577| forgetting pidfile /usr/share/autotest/results/9574-autotest/virt26.qa/.collect_crashinfo_execute
12/16 00:03:01 INFO |drone_mana:0577| forgetting pidfile /usr/share/autotest/results/9574-autotest/virt26.qa/.parser_execute
12/16 00:03:01 INFO |drone_mana:0577| forgetting pidfile /usr/share/autotest/results/9574-autotest/virt26.qa/.archiver_execute
Lucas Meneghel Rodrigues
Owner

Kamil, I remember James reported it a while ago, but assumed it was resolved with the new autotest released. Well, I'll need some time to review this, I'm in the middle of a move to another city so please bear with me.

Kamil Páral

It was not :/ We'll hold, don't worry :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.