Fix disk partitioning race condition and using partition number 0 #2234

Open
chewi wants to merge 3 commits into coreos:main from chewi:partx-race

@chewi chewi commented Jun 1, 2026

I've admittedly lost the original output from when the race condition triggered, but this is what was going on underneath:

$ partx --add --nr 1 --verbose /dev/nvme0n1
partition: none, disk: /dev/nvme0n1, lower: 1, upper: 1
/dev/nvme0n1: partition table type 'gpt' detected
range recount: max partno=9, lower=1, upper=1
partx: /dev/nvme0n1: adding partition #1 failed: Device or resource busy
partx: /dev/nvme0n1: error adding partition 1

We started reliably seeing this in Flatcar after a batch of updates. We don't know exactly which update triggered it, but it was probably systemd. While we could have looked into systemd's changes, I strongly felt that this code was always potentially racy. There was never anything stopping the kernel from picking up the partition changes before partx had a chance to run.

This change therefore allows partx to fail and then checks that added/updated partitions have the right start sector and size and that deleted partitions are absent once udev has settled.

On first submitting this change, Gemini highlighted that I had broken the feature that allows you to specify partition number 0 to get the next available slot. On testing this, I found that it had already been broken since c2cc56c. Passing 0 to partx causes it to try to add all the partitions, which will almost always fail because the kernel will usually already know about at least some of them.

If anything, my initial change had improved the situation by ignoring the partx failure, but I have now fixed the issue properly. This changes getRealStartAndSize() to also determine and return the resulting partition numbers so that subsequent operations use these instead of 0.

sgdisk does support --new=0, but it has no way to report which partition number it actually used.

Following that, I was able to drop the restriction that prevented users from deleting a partition while creating partition number 0. We previously disallowed this because we didn't resolve the partition numbers to their final values, making it impossible to determine whether they would actually exist in the end.

The error message that was shown mentioned partitions having a start or size of 0 rather than the number. It's not clear why this was.

New cases have been added to tests/positive/partitions/complex-mb.go to cover all this. The creation of the new-auto3 partition triggers the race condition on Fedora 44. I don't know why just this one does, but at least that confirms the fix works.

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request modifies the disk partitioning logic to validate partition starts and sizes against sysfs after partitioning, and updates partition number types from uint64 to int. However, several critical issues were identified in the review. Changing getRealStartAndSize to return a map instead of a slice introduces non-deterministic ordering and causes partition overwrites when multiple partitions use number 0. Additionally, the new validation logic, partx commands, and deletion checks do not account for partition 0 (which represents the next available partition and does not exist in sysfs or /dev), leading to potential runtime failures. Reverting the map changes to slices and skipping partition 0 in sysfs/partx operations is recommended.

6 comment threads on internal/exec/stages/disks/partitions.go (all outdated)
@chewi chewi marked this pull request as draft June 1, 2026 18:51
chewi added 3 commits June 8, 2026 17:01
Specifying partition number 0 has been broken since partx came into use
in commit c2cc56c. Passing 0 to partx causes it to try to add all the
partitions, which will almost always fail because the kernel will usually
already know about at least some of them.

This changes getRealStartAndSize() to also determine and return the
resulting partition numbers so that subsequent operations use these
instead of 0.

sgdisk does support --new=0, but it has no way to report which partition
number it actually used.

Signed-off-by: James Le Cuirot <jlecuirot@microsoft.com>

`partx --add` will fail if the kernel is already aware of the new
partition. It was always theoretically possible that udev might trigger
early, and that appears to be happening now.

Allow partx to fail and then check that added/updated partitions have
the right start sector and size and that deleted partitions are absent
once udev has settled.

Signed-off-by: James Le Cuirot <jlecuirot@microsoft.com>

We previously disallowed deleting a partition while creating partition
number 0 because we didn't resolve the partition numbers to their final
values, making it impossible to determine whether they would actually
exist in the end. Now we do.

The error message that was shown mentioned partitions having a start or
size of 0 rather than the number. It's not clear why this was.

Signed-off-by: James Le Cuirot <jlecuirot@microsoft.com>
@chewi chewi changed the title from "Fix disk partitioning race condition between the kernel and partx" to "Fix disk partitioning race condition and using partition number 0" Jun 8, 2026
@chewi chewi marked this pull request as ready for review June 8, 2026 16:35