tasks/cephfs: time out on ceph-fuses that don't die #453

jcsp · 2015-06-03T09:19:42Z

Handle cases where we have, e.g., poked the fuse abort
file for a process but it is still not dying. Because
this is a special class of error (unlike, e.g., when
we force umount something because the network is gone),
raise the error instead of trying again to kill
the client.

Fixes: #11835
Signed-off-by: John Spray <john.spray@redhat.com>
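
A minimal sketch of the behaviour the description above asks for, not the code from this PR: poll a process a bounded number of times and, if it is still alive, log and raise rather than attempting another kill. `wait_for_exit`, `ProcessStillRunning`, and the use of `subprocess` are assumptions made for illustration; the actual change lives in tasks/cephfs/fuse_mount.py and relies on teuthology's own helpers.

```python
import logging
import subprocess
import sys
import time

log = logging.getLogger(__name__)


class ProcessStillRunning(Exception):
    """Hypothetical stand-in for a 'gave up waiting' error."""


def wait_for_exit(proc, tries=5, sleep=0.1):
    """Poll `proc` up to `tries` times; raise if it never exits."""
    for _ in range(tries):
        if proc.poll() is not None:
            return proc.returncode
        time.sleep(sleep)
    # Deliberately no second kill attempt here: surface the problem instead.
    log.error("process failed to terminate after %d tries", tries)
    raise ProcessStillRunning(proc.pid)


if __name__ == "__main__":
    # A child that exits promptly is reaped normally.
    quick = subprocess.Popen([sys.executable, "-c", "pass"])
    print(wait_for_exit(quick, tries=50))

    # A child that lingers triggers the error path.
    stuck = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"])
    try:
        wait_for_exit(stuck, tries=3)
    except ProcessStillRunning:
        print("gave up waiting")
    finally:
        # Clean up the demo child so the sketch leaves nothing behind.
        stuck.kill()
        stuck.wait()
```

Raising here, rather than retrying, keeps a stuck client visible as a test failure instead of hiding it behind a forced cleanup.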

ceph-jenkins · 2015-06-03T09:20:02Z

Refer to this link for build results (access rights to CI server needed):
http://jenkins.ceph.com//job/ceph-qa-suite-pull-requests/384/
Tests passed for this pull request.

gregsfortytwo · 2015-06-03T18:35:34Z

tasks/cephfs/fuse_mount.py

+        except MaxWhileTries:
+            log.error("process failed to terminate after unmount.  This probably "
+                      "indicates a bug within ceph-fuse.")
+            raise


What does this actually do to the teuthology test runs that hit it? I've no idea but I think most throws in the post-yield cleanup result in hung jobs and nothing getting cleaned up...

jcsp · 2015-06-03

@gregsfortytwo in run_tasks.py, in the part following the "Unwinding manager..." log message, it looks to me like exceptions in teardown are explicitly handled.

It's entirely possible for a rogue process to be left lying around, but that's why we have nuke.
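
The teardown behaviour described in this reply can be illustrated with a simplified sketch. This is not teuthology's run_tasks.py: the `run_tasks` function, the `task` generator, and the log message below are hypothetical, and the sketch only shows the general pattern. Context managers are entered in order and then unwound in reverse, and an exception raised by one manager's cleanup is logged so that the remaining managers still get to run.

```python
import contextlib
import logging

log = logging.getLogger(__name__)


@contextlib.contextmanager
def task(name, events, fail_in_cleanup=False):
    """Hypothetical task: setup before the yield, cleanup after it."""
    events.append("setup " + name)
    yield
    events.append("cleanup " + name)
    if fail_in_cleanup:
        raise RuntimeError(name + " failed to clean up")


def run_tasks(managers):
    """Enter every manager, then unwind them all in reverse order."""
    entered = []
    failures = []
    for manager in managers:
        manager.__enter__()
        entered.append(manager)
    while entered:
        manager = entered.pop()
        try:
            manager.__exit__(None, None, None)
        except Exception as exc:
            # Record the failure but keep unwinding the remaining managers.
            log.error("Manager failed: %s", exc)
            failures.append(exc)
    return failures


events = []
failures = run_tasks([
    task("ceph", events),
    task("ceph-fuse", events, fail_in_cleanup=True),
])
print(events)
print(failures)
```

In this sketch the failing cleanup of the second task does not prevent the first task's cleanup from running, which is the property the reply relies on.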

gregsfortytwo · 2015-06-09T22:08:34Z

Looks like this is busting things up; several failures in ubuntu-2015-06-09_11:25:56-fs-greg-fs-testing---basic-multi.

2015-06-09T12:18:44.892 INFO:teuthology.orchestra.run.plana91:Running: 'sudo fusermount -u /home/ubuntu/cephtest/mnt.0'
2015-06-09T12:18:45.017 INFO:tasks.cephfs.fuse_mount.ceph-fuse.0.plana91.stderr:ceph-fuse[19072]: fuse finished with error 0 and tester_r 0
2015-06-09T12:18:45.041 INFO:teuthology.orchestra.run.plana91:Running: "stat --file-system '--printf=%T\n' -- /home/ubuntu/cephtest/mnt.0"
2015-06-09T12:18:45.050 DEBUG:tasks.cephfs.fuse_mount:ceph-fuse not mounted, got fs type 'ext2/ext3'
2015-06-09T12:18:45.050 ERROR:teuthology.run_tasks:Manager failed: ceph-fuse
Traceback (most recent call last):
  File "/home/teuthworker/src/teuthology_master/teuthology/run_tasks.py", line 125, in run_tasks
    suppress = manager.__exit__(*exc_info)
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/var/lib/teuthworker/src/ceph-qa-suite_greg-fs-testing/tasks/ceph_fuse.py", line 135, in task
    mount.umount_wait()
  File "/var/lib/teuthworker/src/ceph-qa-suite_greg-fs-testing/tasks/cephfs/fuse_mount.py", line 239, in umount_wait
    run.wait(self.fuse_daemon, 30)
  File "/home/teuthworker/src/teuthology_master/teuthology/orchestra/run.py", line 393, in wait
    not_ready = list(processes)
TypeError: 'RemoteProcess' object is not iterable

ceph-jenkins · 2015-06-10T09:56:38Z

Refer to this link for build results (access rights to CI server needed):
http://jenkins.ceph.com//job/ceph-qa-suite-pull-requests/395/
Tests passed for this pull request.

jcsp · 2015-06-10T09:56:44Z

Oops, that was a typo in the call to run.wait() (it wants a list of processes). Updated.
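
To illustrate the failure in the traceback above, here is a simplified stand-in for a wait helper that expects a collection of processes. `FakeProcess` and this `wait` function are hypothetical and are not teuthology's classes; only the `list(processes)` line is taken from the traceback.

```python
class FakeProcess:
    """Hypothetical process handle; deliberately not iterable."""

    def __init__(self, name):
        self.name = name
        self.finished = False

    def wait(self):
        self.finished = True


def wait(processes, timeout=None):
    """Wait for every process in `processes` (timeout ignored in this sketch)."""
    not_ready = list(processes)  # raises TypeError for a bare, non-iterable object
    for proc in not_ready:
        proc.wait()


daemon = FakeProcess("ceph-fuse")

try:
    wait(daemon, 30)    # the typo: a single object instead of a list
except TypeError as exc:
    print(exc)

wait([daemon], 30)      # wrapping the process in a list satisfies the helper
print(daemon.finished)
```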

Our ffsb and fsync tests contain so many small writes at random offsets that it can take >10 minutes to commit all of them to disk if we get a slower OSD cluster. 15 minutes is still a plenty-fast timeout for this stage compared to just hanging and losing the logs!

Signed-off-by: Greg Farnum <gfarnum@redhat.com>

ceph-jenkins · 2015-09-16T21:29:03Z

Refer to this link for build results (access rights to CI server needed):
http://jenkins.ceph.com//job/ceph-qa-suite-pull-requests/640/
Test FAILed.

gregsfortytwo reviewed Jun 3, 2015

gregsfortytwo assigned jcsp and gregsfortytwo and unassigned jcsp Jun 3, 2015

gregsfortytwo added the wip-greg-testing label Jun 3, 2015

gregsfortytwo added a commit that referenced this pull request Jun 9, 2015

Merge remote-tracking branch 'origin/wip-11835'

11f9dc5

gregsfortytwo assigned jcsp and unassigned gregsfortytwo Jun 9, 2015

jcsp force-pushed the wip-11835 branch from 0b7d065 to 07eb03a Compare June 10, 2015 09:56

jcsp assigned gregsfortytwo and unassigned jcsp Jun 10, 2015

gregsfortytwo added a commit that referenced this pull request Jun 13, 2015

Merge remote-tracking branch 'origin/wip-11835' into greg-fs-testing

9eebee6

gregsfortytwo added a commit that referenced this pull request Aug 24, 2015

Merge remote-tracking branch 'origin/wip-11835' into greg-fs-testing

7a1c059

gregsfortytwo added a commit that referenced this pull request Aug 31, 2015

Merge remote-tracking branch 'origin/wip-11835' into greg-fs-testing

6bfd6a0

gregsfortytwo added a commit that referenced this pull request Sep 16, 2015

Merge remote-tracking branch 'origin/wip-11835' into greg-fs-testing

1807963

gregsfortytwo added a commit that referenced this pull request Sep 21, 2015

Merge pull request #453 from ceph/wip-11835

e3c9947

tasks/cephfs: time out on ceph-fuses that don't die Reviewed-by: Greg Farnum <gfarnum@redhat.com>

gregsfortytwo merged commit e3c9947 into master Sep 21, 2015

gregsfortytwo deleted the wip-11835 branch September 21, 2015 23:09
