Fix disk partitioning race condition and using partition number 0 #2234
chewi wants to merge 3 commits
Code Review
This pull request modifies the disk partitioning logic to validate partition starts and sizes against sysfs after partitioning, and updates partition number types from uint64 to int. However, several critical issues were identified in the review. Changing getRealStartAndSize to return a map instead of a slice introduces non-deterministic ordering and causes partition overwrites when multiple partitions use number 0. Additionally, the new validation logic, partx commands, and deletion checks do not account for partition 0 (which represents the next available partition and does not exist in sysfs or /dev), leading to potential runtime failures. Reverting the map changes to slices and skipping partition 0 in sysfs/partx operations is recommended.
This was broken since partx was used in commit c2cc56c. Passing 0 to partx causes it to try and add all the partitions, which will almost always fail because the kernel will usually already know about at least some of them. This changes `getRealStartAndSize()` to also determine and return the resulting partition numbers so that subsequent operations use these instead of 0. sgdisk does support `--new=0`, but it has no way to report which partition number it actually used.
Signed-off-by: James Le Cuirot <jlecuirot@microsoft.com>
`partx --add` will fail if the kernel is already aware of the new partition. It was always theoretically possible that udev might trigger early, and that appears to be happening now. Allow partx to fail and then check that added/updated partitions have the right start sector and size and that deleted partitions are absent once udev has settled.
Signed-off-by: James Le Cuirot <jlecuirot@microsoft.com>
We previously disallowed this because we didn't resolve the partition numbers to their final values, making it impossible to determine whether they would actually exist in the end. Now we do. The error message that was shown mentioned partitions having a start or size of 0 rather than the number. It's not clear why this was.
Signed-off-by: James Le Cuirot <jlecuirot@microsoft.com>
I've admittedly lost the output of the race condition triggering, but this is what was going on underneath.
We started reliably seeing this in Flatcar after some batch updates. We don't know exactly which update triggered it, but it was probably systemd. While we could have looked into systemd's changes, I strongly felt that this code was always potentially racy. There was never anything stopping the kernel picking up the partition changes before partx had a chance to run.
This change therefore allows partx to fail and then checks that added/updated partitions have the right start sector and size and that deleted partitions are absent once udev has settled.
On first submitting this change, Gemini highlighted that I had broken the feature that allows you to specify partition number 0 to get the next available slot. On testing this, I found that this was already broken since c2cc56c. Passing 0 to partx causes it to try and add all the partitions, which will almost always fail because the kernel will usually already know about at least some of them.
If anything, my initial change had improved the situation by ignoring the partx failure, but I have now fixed the issue properly. This changes `getRealStartAndSize()` to also determine and return the resulting partition numbers so that subsequent operations use these instead of 0. sgdisk does support `--new=0`, but it has no way to report which partition number it actually used.
Following that, I was able to drop the restriction that prevented users from deleting a partition while creating partition number 0. We previously disallowed this because we didn't resolve the partition numbers to their final values, making it impossible to determine whether they would actually exist in the end.
The error message that was shown mentioned partitions having a start or size of 0 rather than the number. It's not clear why this was.
New cases have been added to `tests/positive/partitions/complex-mb.go` to cover all this. The creation of the `new-auto3` partition triggers the race condition on Fedora 44. I don't know why just this one does, but at least that confirms the fix works.