ceph-disk: Adding retry loop in get_partition_dev() #14275

Merged
1 commit merged into master from evelu-cephdisk-race on Apr 4, 2017

Conversation

4 participants
@ErwanAliasr1
Contributor

ErwanAliasr1 commented Mar 31, 2017

There are very rare cases where get_partition_dev() is called before the actual partition is available in /sys/block/<device>.

It appears that waiting a very short time is usually enough for the partition to be populated.

Analysis:
update_partition() is supposed to be enough to avoid any race between the events sent by parted/sgdisk/partprobe and
the actual creation of the /sys/block/<device>/* entries.
On our CI that race occurs pretty often, but it has never been possible to reproduce it locally.

This patch is more a workaround than a fix for the real problem.
It retries after a very short delay to give the device a chance to appear.
This approach has been successful on the CI.

Note this patch does not change the timing when the device is created on time; when the bug occurs it only adds delays of 1/5th of a second each, up to 2 seconds in total.

Typical output from a build running on the CI with this code:
command_check_call: Running command: /usr/bin/udevadm settle --timeout=600
get_dm_uuid: get_dm_uuid /dev/sda uuid path is /sys/dev/block/8:0/dm/uuid
get_partition_dev: Try 1/10 : partition 2 for /dev/sda does not in /sys/block/sda
get_partition_dev: Found partition 2 for /dev/sda after 1 tries
get_dm_uuid: get_dm_uuid /dev/sda uuid path is /sys/dev/block/8:0/dm/uuid
get_dm_uuid: get_dm_uuid /dev/sda2 uuid path is /sys/dev/block/8:2/dm/uuid

fixes: #19428

Signed-off-by: Erwan Velu <erwan@redhat.com>

@yuriw requested review from @ghost and tchaikov Mar 31, 2017

@ghost added the bug fix and core labels Apr 3, 2017

@ghost

ghost commented Apr 3, 2017

flake8 runtests: commands[0] | flake8 --ignore=H105,H405,E127 ceph_disk tests
ceph_disk/main.py:692:36: E226 missing whitespace around arithmetic operator
ceph_disk/main.py:715:32: E226 missing whitespace around arithmetic operator
ERROR: InvocationError: '/home/jenkins-build/build/workspace/ceph-pull-requests/src/ceph-disk/.tox/flake8/bin/flake8 --ignore=H105,H405,E127 ceph_disk tests'
@ghost

ghost commented Apr 3, 2017

pushed to ceph-ci, will run the ceph-disk suite when it finishes building

@ghost

ghost commented Apr 4, 2017

teuthology-suite -k distro --verbose --suite ceph-disk --ceph evelu-cephdisk-race --machine-type vps

@leseb

Contributor

leseb commented Apr 4, 2017

@dachary since @ErwanAliasr1 is on PTO I just pushed a new version with flake8 fixes.

@ghost

ghost approved these changes Apr 4, 2017

@ghost merged commit 9f64925 into master Apr 4, 2017

3 checks passed

Signed-off-by: all commits in this PR are signed
Unmodified Submodules: submodules for the project are unmodified
default: Build finished.
@leseb

Contributor

leseb commented Apr 4, 2017

@dachary thanks for the merge :).
@dachary @ktdreyer can we mark this as a backport for Jewel?

@tchaikov deleted the evelu-cephdisk-race branch Apr 4, 2017

@ktdreyer

Member

ktdreyer commented Apr 4, 2017

Sure, just update http://tracker.ceph.com/issues/19428 from "New" to "Pending Backport", and ensure that "jewel" is listed in the "Backport" field.

@leseb

Contributor

leseb commented Apr 4, 2017

@ktdreyer I'm not the owner, so I cannot do it, and @ErwanAliasr1 is on PTO. Can someone do it for us? @dachary @ktdreyer? Thanks!

@ktdreyer

Member

ktdreyer commented Apr 4, 2017

@leseb would you please coordinate with David Galloway (@djgalloway)? He can grant your Redmine account access.

@leseb

Contributor

leseb commented Apr 4, 2017

@ktdreyer done, thanks!

@ktdreyer

Member

ktdreyer commented Apr 4, 2017

@leseb will you be doing the backport?

@dachary can help you get it tested through teuthology.

@smithfarm

Contributor

smithfarm commented Apr 4, 2017

@ktdreyer @leseb jewel backport is here: #14329

@leseb

Contributor

leseb commented May 11, 2017

@smithfarm do we also have a backport for ceph:kraken?

@smithfarm

Contributor

smithfarm commented May 11, 2017

@leseb I added the kraken backport to http://tracker.ceph.com/issues/19428

@leseb

Contributor

leseb commented May 11, 2017

@smithfarm thanks!
