Fix disk partitioning race condition and using partition number 0 by chewi · Pull Request #2234 · coreos/ignition

chewi · 2026-06-01T16:34:10Z

I've admittedly lost the output from when the race condition triggered, but this is what was going on underneath:

```
$ partx --add --nr 1 --verbose /dev/nvme0n1
partition: none, disk: /dev/nvme0n1, lower: 1, upper: 1
/dev/nvme0n1: partition table type 'gpt' detected
range recount: max partno=9, lower=1, upper=1
partx: /dev/nvme0n1: adding partition #1 failed: Device or resource busy
partx: /dev/nvme0n1: error adding partition 1
```

We started reliably seeing this in Flatcar after some batch updates. We don't know exactly which update triggered it, but it was probably systemd. While we could have looked into systemd's changes, I strongly felt that this code was always potentially racy: there was never anything stopping the kernel from picking up the partition changes before partx had a chance to run.

This change therefore allows partx to fail and then checks that added/updated partitions have the right start sector and size and that deleted partitions are absent once udev has settled.

On first submitting this change, Gemini highlighted that I had broken the feature that allows you to specify partition number 0 to get the next available slot. On testing this, I found that it had already been broken since c2cc56c. Passing 0 to partx causes it to try to add all the partitions, which will almost always fail because the kernel will usually already know about at least some of them.

If anything, my initial change had improved the situation by ignoring the partx failure, but I have now fixed the issue properly. The fix changes getRealStartAndSize() to also determine and return the resulting partition numbers so that subsequent operations use these instead of 0.

sgdisk does support --new=0, but it has no way to report which partition number it actually used.
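Since sgdisk won't report the number it picked, the number has to be predicted. A minimal Go sketch of one way to do that, assuming a 0 request takes the lowest free slot (sgdisk's documented "first available" behaviour for `--new=0`) and that requests are processed in order after any deletions. `resolveNumbers` is a hypothetical helper, not the PR's getRealStartAndSize(). It returns a slice in request order rather than a map keyed by the requested number, so two requests for 0 cannot overwrite each other.

```go
package main

import "fmt"

// resolveNumbers predicts the final number of each new partition.
// existing lists the numbers still in use after deletions; requested lists
// the numbers asked for, where 0 means "first available"; maxParts is the
// number of partition table entries (128 by default for GPT).
func resolveNumbers(existing, requested []int, maxParts int) ([]int, error) {
	used := make(map[int]bool)
	for _, n := range existing {
		used[n] = true
	}
	// Results stay in request order, so repeated 0s get distinct numbers.
	resolved := make([]int, 0, len(requested))
	for i, n := range requested {
		if n == 0 {
			// Scan upwards from 1 for the lowest unused slot.
			n = 1
			for n <= maxParts && used[n] {
				n++
			}
			if n > maxParts {
				return nil, fmt.Errorf("request %d: partition table is full", i)
			}
		} else if used[n] {
			return nil, fmt.Errorf("request %d: partition %d already in use", i, n)
		}
		used[n] = true
		resolved = append(resolved, n)
	}
	return resolved, nil
}

func main() {
	// Partitions 1, 2 and 4 remain after deletions; create three new ones.
	nums, err := resolveNumbers([]int{1, 2, 4}, []int{0, 9, 0}, 128)
	fmt.Println(nums, err) // [3 9 5] <nil>
}
```

With the numbers resolved up front, the later partx call and sysfs checks can target concrete partitions instead of passing 0 along.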

Following that, I was able to drop the restriction that prevented users from deleting a partition while creating partition number 0. We previously disallowed this because we didn't resolve the partition numbers to their final values, making it impossible to determine whether they would actually exist in the end.
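To illustrate why resolved numbers make the combination safe, here is a small hypothetical Go helper (`mustBeAbsent`, not from the PR) that works out which deleted partitions should really be gone. It assumes created numbers have already been resolved from any 0 requests: a slot freed by a deletion and then reused by a new partition must instead be checked for the new partition's start and size.

```go
package main

import (
	"fmt"
	"sort"
)

// mustBeAbsent returns the deleted partition numbers that should not exist
// once udev has settled, i.e. those not reused by a newly created partition.
// created must already contain resolved numbers, never 0.
func mustBeAbsent(deleted, created []int) []int {
	reused := make(map[int]bool, len(created))
	for _, n := range created {
		reused[n] = true
	}
	var absent []int
	for _, n := range deleted {
		if !reused[n] {
			absent = append(absent, n)
		}
	}
	sort.Ints(absent) // deterministic order for checking and error messages
	return absent
}

func main() {
	// Delete 3 and 6; a 0 request resolved to 3, so only 6 must be gone.
	fmt.Println(mustBeAbsent([]int{3, 6}, []int{3})) // [6]
}
```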

The error message that was shown referred to partitions having a start or size of 0 rather than a partition number of 0. It's not clear why.

New cases have been added to tests/positive/partitions/complex-mb.go to cover all this. The creation of the new-auto3 partition triggers the race condition on Fedora 44. I don't know why only this one does, but at least that confirms the fix works.

gemini-code-assist

Code Review

This pull request modifies the disk partitioning logic to validate partition starts and sizes against sysfs after partitioning, and updates partition number types from uint64 to int. However, several critical issues were identified in the review. Changing getRealStartAndSize to return a map instead of a slice introduces non-deterministic ordering and causes partition overwrites when multiple partitions use number 0. Additionally, the new validation logic, partx commands, and deletion checks do not account for partition 0 (which represents the next available partition and does not exist in sysfs or /dev), leading to potential runtime failures. Reverting the map changes to slices and skipping partition 0 in sysfs/partx operations is recommended.

This has been broken since commit c2cc56c started using partx. Passing 0 to partx causes it to try to add all the partitions, which will almost always fail because the kernel will usually already know about at least some of them. This changes getRealStartAndSize() to also determine and return the resulting partition numbers so that subsequent operations use these instead of 0. sgdisk does support --new=0, but it has no way to report which partition number it actually used.

Signed-off-by: James Le Cuirot <jlecuirot@microsoft.com>

`partx --add` will fail if the kernel is already aware of the new partition. It was always theoretically possible that udev might trigger early, and that appears to be happening now. Allow partx to fail and then check that added/updated partitions have the right start sector and size and that deleted partitions are absent once udev has settled.

Signed-off-by: James Le Cuirot <jlecuirot@microsoft.com>

We previously disallowed this because we didn't resolve the partition numbers to their final values, making it impossible to determine whether they would actually exist in the end. Now we do. The error message that was shown mentioned partitions having a start or size of 0 rather than the number. It's not clear why this was.

Signed-off-by: James Le Cuirot <jlecuirot@microsoft.com>

gemini-code-assist Bot reviewed Jun 1, 2026


chewi marked this pull request as draft June 1, 2026 18:51

chewi force-pushed the partx-race branch from 5540de3 to 1514e0b June 2, 2026 15:27

chewi added 3 commits June 8, 2026 17:01

chewi force-pushed the partx-race branch from 1514e0b to f9ac93d June 8, 2026 16:07

chewi changed the title ~~Fix disk partitioning race condition between the kernel and partx~~ Fix disk partitioning race condition and using partition number 0 Jun 8, 2026

chewi marked this pull request as ready for review June 8, 2026 16:35

chewi mentioned this pull request Jun 8, 2026

sys-apps/ignition: Fix giving partition number 0 to get next available · flatcar/scripts#4077 (draft, 2 tasks)


chewi wants to merge 3 commits into coreos:main from chewi:partx-race
