
Support 4k storage #4974

Open
ij1 opened this issue Apr 13, 2019 · 51 comments
Labels
C: installer · C: storage · hardware support · P: default · T: enhancement

Comments

@ij1

ij1 commented Apr 13, 2019

Qubes OS version

R4.0

Affected component(s) or functionality

VMs not working/starting right from a fresh install.

Brief summary

Right after a fresh install, all VMs fail to mount their root filesystem and therefore fail to start beyond the point where they expect /dev/xvda3 to be available. This happens on a device with 4kB logical and physical block sizes (an NVMe drive). This was not a problem in R3.2 (which used file-based VM storage by default).

To Reproduce

Steps to reproduce the behavior:

  1. Install Qubes to a drive with a 4kB sector size (both logical and physical). (I put /boot on a SATA drive with 512B sectors to avoid BIOS/NVMe boot challenges; the rest of the system is on the NVMe with 4kB sectors.)
  2. Firstboot stuff fails.
  3. After clicking "finish" in firstboot, find out that no VM will start successfully (which explains the firstboot failures, I guess).
  4. Look at the VM logs and find this:
[    0.887548] blkfront: xvda: flush diskcache: enabled; persistent grants: enabled; indirect descriptors: enabled;
[    0.902355] blkfront: xvdb: flush diskcache: enabled; persistent grants: enabled; indirect descriptors: enabled;
[    0.924386] blkfront: xvdc: flush diskcache: enabled; persistent grants: enabled; indirect descriptors: enabled;
[    0.940325] blkfront: xvdd: flush diskcache: enabled; persistent grants: enabled; indirect descriptors: enabled;
Waiting for /dev/xvda* devices...
Qubes: Doing R/W setup for TemplateVM...
[    1.049451] random: sfdisk: uninitialized urandom read (4 bytes read)
[    1.052481]  xvdc: xvdc1
[    1.060250] random: mkswap: uninitialized urandom read (16 bytes read)
Setting up swapspace version 1, size = 8 GiB (8589930496 bytes)
no label, UUID=...
Qubes: done.
mount: wrong fs type, bad option, bad superblock on /dev/xvda,
       missing codepage or helper program, or other error

       In some cases useful info is found in syslog - try
       dmesg | tail or so.
Waiting for /dev/xvdd device...
mount: /dev/xvdd is write-protected, mounting read-only
[    1.099814] EXT4-fs (xvdd): mounting ext3 file system using the ext4 subsystem
[    1.106796] EXT4-fs (xvdd): mounted filesystem with ordered data mode. Opts: (null)
mount: /sysroot not mounted or bad option

       In some cases useful info is found in syslog - try
       dmesg | tail or so.
[    1.119049] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x1e335a008d5, max_idle_ns: 440795216613 ns
mount: /sysroot not mounted or bad option

       In some cases useful info is found in syslog - try
       dmesg | tail or so.
switch_root: failed to mount moving /sysroot to /: Invalid argument
switch_root: failed. Sorry.
[    1.217841] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000100
...

Expected behavior

VMs would start. Firstboot stuff would work. Drives with 4kB sector size would work.

Additional context

I've tracked this down to the handling of the partition table. With 512B sectors, the location of the GPT differs from its location with 4kB sectors, so VMs fail to find the correct partition table on xvda. The partition start/end values will obviously also be off by a factor of 8, because the templates are built(?) with an assumption of a 512B sector size.

I'm not sure whether there are other 512B-sector assumptions for the other /dev/xvd* drives.

Solutions you've tried

I cloned a template and tried to manually fix the clone's partition table (in dom0, through /dev/qubes_dom0/...). There was plenty of space before the first partition; however, at the end the drive is so tight on space that the secondary GPT won't fit, so the tail of the xvda3 partition was truncated slightly, and I didn't try to resize its filesystem first (this probably causes some problems, potentially corruption). With the fixed partition table I could start VMs (though there are some other problems/oddities that might be due to the incomplete firstboot or the non-fixed Fedora template; I only fixed the Debian one, which I mainly use). I could possibly enlarge the relevant LV slightly to avoid the truncation at the tail of xvda3, but I haven't tried that yet.

I tried to find out whether I could somehow force the pv/vg/lv chain to fake the logical sector size, but couldn't find anything in the manpages.

Libvirt might be able to fake the logical_block_size but I've not yet tried that.

Relevant documentation you've consulted

During install, I used the custom install steps to do manual partitioning (but I think that's irrelevant).

Related, non-duplicate issues

None that I could find; some other issues involve failures to mount root, but the causes are different.

@ij1 ij1 added P: default Priority: default. Default priority for new issues, to be replaced given sufficient information. T: bug Type: bug report. A problem or defect resulting in unintended behavior in something that exists. labels Apr 13, 2019
@marmarek
Member

Sector size is advertised by the block backend in xenstore (xenstore-ls /local/domain/0/backend/vbd/$DOMID/51712), but I don't see any option to force a specific value.

This issue is really unfortunate, because a lot of places in Qubes assume you can freely transfer a disk image and it will work just fine. This includes cloning VMs (including cloning to a different storage pool), backup/restore, etc. So the solution here should be either:

  • find a way to construct a disk image to work on both 4k and 512 sector size
  • force VM to see 512 sector size

The second one may come with a performance penalty. The first one would not have this problem, but I'm not sure it's possible. I'm fine with making the partition table 4K-aligned, as long as it also works with a 512B sector size. But it isn't clear to me that this would be enough.

Partition table and filesystem are built here: https://github.com/QubesOS/qubes-linux-template-builder/blob/master/prepare_image#L63-L83

Another idea would be to revert to a filesystem directly on /dev/xvda (without any partition table). This may not be as simple as it sounds, because we need to fit grub somewhere (in the HVM-with-in-VM-kernel case).

But this all may not work for other cases, including other OS. Imagine installing some OS (Linux, Windows, whatever) in a standalone HVM and then moving it to another storage pool (or restoring a backup on another machine). Those cases may require emulating constant sector size.

Sadly, I don't have any hardware with 4k physical sector size to test on. I'll try to find a way to emulate one.

BTW, another issue arising from the 4k sector size is 8GB of swap instead of 1GB. But this should be easy to fix in this script.
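
For illustration, here is a guess at the pattern behind the factor of 8 (not the actual script): a swap partition sized in raw sectors silently scales with the sector size, while a byte-based size does not:

echo 'size=2097152' | sfdisk /dev/xvdc   # 2097152 sectors: 1 GiB at 512B/sector, but 8 GiB at 4096B
echo 'size=1GiB'    | sfdisk /dev/xvdc   # explicit byte unit: sector-size agnostic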

@marmarek
Member

A lot of useful info: https://superuser.com/questions/679725/how-to-correct-512-byte-sector-mbr-on-a-4096-byte-sector-disk
There is also a script to parse 512-byte GPT on 4k disk (and map it using loop devices).
Using this, one workaround would be to adjust init.sh to rewrite the GPT if a sector-size mismatch is detected (in either direction). This requires the partitions to be 4k-aligned beforehand, but it should be doable.
But this is far from a complete solution, given non-template-based Linux use cases.

@ij1
Author

ij1 commented Apr 14, 2019

There's not much to worry about regarding 4k alignment; it is already there in the template. From what I gathered, partition table tools nowadays enforce at least 4k alignment and warn if it would be violated (some may use even larger alignment). This is why I managed to rewrite the template's partition table so easily in the first place (except for the truncation issue).

I don't think forcing a 512B sector size would in itself come with a large penalty, since in practice the filesystems inside will use something larger than 512B (depending on how the relevant block layers handle the larger contiguous units, of course, but I'd guess that would not cause performance problems). So it would mostly matter for booting up correctly. What I'd rather avoid, though, is forcing my drive's firmware to use a 512B sector size, as that would explore less tested corners of the firmware and possibly have a significant performance impact too (I know my NVMe drive can do 512B, but I don't know if all 4k drives can, so this needs to be handled anyway).

Btw, USB HDDs might expose 4kB sectors when not behind the SATA-to-USB converter; perhaps you have one you could disassemble to get such a device? (losetup also seems able to fake it, as noted below)

Like I said, libvirt supposedly has a way to configure logical_block_size, but I don't know whether it can really fake it:
https://libvirt.org/formatdomain.html
...or is that only for KVM?

I'll probably try to use the file backend for the main system for now (that's what was used in R3.2, right?). The NVMe drive should clone fast anyway :-) so the biggest downside I know of is a non-issue. Can the installer do that automatically if I simply reinstall (that is, how does it choose which type of storage pool to use by default), or do I have to set everything up manually afterwards, skipping the firstboot stuff to avoid it failing? I could then look into the 4k stuff while everything else keeps working fine with 512B. That would also let me easily test copying across 512B and 4k, but that looks rather scary to begin with; so far almost nothing seems portable from one sector size to the other, from what I've read.

If the partition table were removed from xvda, grub might have a similar 4k-vs-512 issue anyway, so that might not solve anything (sectors were mentioned somewhere when I tried to look into what kind of format it uses, which sounds bad), but this needs deeper investigation.

@ij1
Author

ij1 commented Apr 14, 2019

Losetup seems able to fake logical block size:

util-linux/util-linux@a1a4159
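
For anyone wanting to reproduce without 4Kn hardware, a minimal sketch using that losetup feature (requires util-linux >= 2.30 and a reasonably recent kernel; the file name is illustrative):

truncate -s 10G disk.img
loopdev=$(losetup --sector-size 4096 --find --show disk.img)
blockdev --getss "$loopdev"   # reports 4096 instead of the usual 512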

@marmarek
Member

...or is that only for KVM?

Yes, I think it's KVM only.

Can the installer do that automatically if I simply reinstall

If you choose btrfs during installation, Qubes will use that instead of LVM.

@ij1
Author

ij1 commented Apr 15, 2019

Couldn't the faking be done the other way around? My intuition is that, in the block code, log 4096 -> phys 512 is far simpler than log 512 -> phys 4096.

Or is there some particular reason why 512B is still needed for the VM disk format, which is almost internal to Qubes? VMs will obviously see the end result, but they should have little reason to care how the sector size is defined by the "internal" format. Or is there some other OS that only works with 512B?

That would leave just a few things to address:

  • Block-layer support for log 4096 -> phys 512 (or any 2^n, to be more generic).
  • Template-building code to create a 4k GPT (sfdisk doesn't seem to support forcing a sector size, unlike some other partitioning tools, but losetup could be used to fake it).
  • Fixing the 512B assumptions (such as the one with swap partition sizing); hopefully there aren't that many.
  • A one-time migration (at a new release?), which would be at most as complex as the GPT-rewrite workaround needed for supporting both sizes, probably somewhat less.

@marmarek
Member

Or is there some particular reason why 512 is still needed for the VM disk format that is almost internal to Qubes

I'm not sure about disks emulated by QEMU, and then there are the Windows PV drivers. Recently I've seen some patches flying around fixing a 512B-sector-size assumption somewhere there, so there may still be more issues like this.
Given the various elements involved, I think 512B is simply safer in terms of compatibility.

@brendanhoar

brendanhoar commented Jul 8, 2019

Can I throw in another alignment data point to consider: the LVM chunk_size, which can range from 64KB to 1MB.

Policy-wise, Qubes may want to ensure that any physical partitions (or partitions inside LVM LVs) created by Qubes tools and/or the installer are 1MB-aligned, primarily for performance reasons. Probably not as critical as the baseline fixes to make 4K-logical-sector drives work, but since that requires changes anyway, consider enforcing a much stricter alignment going forward (see the volatile volume issue #5151).

Brendan

@arno01

arno01 commented Jan 19, 2020

If anyone needs 4Kn templates right now, you can use my patch from https://gist.github.com/arno01/ae31e1e9098591dadde3d1fc8c707000

I have also found that partprobe fails to detect the partitions on loop devices created with a custom sector size (losetup: -b / --sector-size) that doesn't correspond to the sector size of the backing disk, on Linux < 4.18-rc4.


And there is some interesting discussion about 4Kn sector disks.
IIUC, the point Alan Cox makes there is that this kind of problem should be solved at the partitioning level, not in xenbus / LVM / the Linux kernel.

@rustybird

rustybird commented Oct 22, 2022

This will become a bigger problem with R4.2, where cryptsetup >= 2.4.0 (Fedora >= 35) will automatically create dm-crypt with 4K logical sectors even on 512e drives.

Ideas:

  • Invoke cryptsetup luksFormat with an explicit --sector-size=512 argument for the LVM Thin installation layout (fix for 512e drives; sketched after this list)

  • Or attach thin volumes to the VM via loop devices / dm-ebs (fix for 4Kn and 512e drives)

    • With an optimization to skip the loop device / dm-ebs setup (passing the thin volume straight through) when it already has the right logical block size, plus a Volume.logical_block_size property (defaulting to 512), this could be a way to gradually opt into 4K storage volumes in general.
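
A minimal sketch of the first idea (the device path is illustrative; --sector-size is a standard LUKS2 luksFormat option):

cryptsetup luksFormat --type luks2 --sector-size 512 /dev/nvme0n1p3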

@brendanhoar

I wonder if Qubes pools should specify the sector size of their underlying storage technology, and whether importing volumes should involve a conversion step?

B

@rustybird

rustybird commented Oct 22, 2022

Conversion during import would mean parsing VM data in dom0 😬

Or a DisposableVM I guess.

@rustybird

rustybird commented Oct 22, 2022

Ok someone should definitely write a DisposableVM-powered converter for common volume contents.

But automatic conversion won't be possible in all cases (like standalone HVMs, where a volume could contain anything, e.g. block-size-dependent filesystems like XFS that might not be straightforward to upgrade), so even with a very good converter there's still a need for:

  • per-volume metadata recording the appropriate block size for its current content
  • a mechanism to present the volume to the VM with that block size, even if the storage pool's ideal block size is different, e.g. after restoring from a backup

@HW42

HW42 commented Oct 22, 2022

This will become a bigger problem with R4.2, where cryptsetup >= 2.4.0 (Fedora >= 35) will automatically create dm-crypt with 4K logical sectors even on 512e drives.

Interesting. Do you know how they implement this? I thought this direction was the tricky one: a block device should guarantee atomic writes per sector (in other words, you should always see either the version before the write or a fully updated sector, never a mix), so a proper implementation likely needs a journal.

  • a mechanism to present the volume to the VM with that block size, even if the storage pool's ideal block size is different, e.g. after restoring from a backup

At least on the dm level, support seems to exist: https://www.kernel.org/doc/html/latest/admin-guide/device-mapper/dm-ebs.html
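
A hedged sketch of what that could look like with dm-ebs, presenting a 512B-sector volume with a 4096B logical block size (the volume path is illustrative; the dm table length and the two trailing ebs/ubs parameters are in 512-byte units):

vol=/dev/qubes_dom0/vm-work-root
dmsetup create work-root-4k --table "0 $(blockdev --getsz "$vol") ebs $vol 0 8 1"
blockdev --getss /dev/mapper/work-root-4k   # now reports 4096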

@rustybird

cryptsetup >= 2.4.0 (Fedora >= 35) will automatically create dm-crypt with 4K logical sectors even on 512e drives.

Do you know how they implement this? Because I thought this direction is the tricky one, because a block device should guarantee atomic writes per sector

I'm kinda curious too about how writes really work for kernel -> 512e drive communication.

Pure speculation: since both the kernel and the drive know that the drive's physical block size is 4K, maybe the kernel just always writes batches of 8 * 512B logical blocks, and when the drive sees logical blocks coming in fast enough, one immediately following another, it figures out that read-modify-write can be avoided? Or there could be some explicit way for the kernel to signal to the drive that it's aware of 512e and guarantees to send 4K blocks merely encoded as batches of 512B blocks.

https://www.kernel.org/doc/html/latest/admin-guide/device-mapper/dm-ebs.html

Huh, thanks! I wonder if that's better than a loop device.

@DemiMarie

I can think of at least two solutions:

  1. Place the partition table for a 512e device between the protective MBR and the 4Kn GPT. There are 7 512-byte sectors in this space, which allows for up to 6 partitions. This is enough to fit all three partitions used by Qubes OS, plus one extra covering the 4Kn partition table. The only problem with this approach is that if the block size is 4K, the protective MBR will appear to extend past the end of the device. I suspect this is harmless.
  2. Use a partition table that is not part of the image but is instead overlaid on it at runtime. This can be done using dm-linear (see the sketch below).
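
A rough sketch of option 2 with purely illustrative offsets. dm-linear tables are always expressed in 512-byte sectors regardless of the device's logical block size, which is what makes the overlay sector-size agnostic:

dev=/dev/xvda
start=2048                                    # hypothetical start of the root area
len=$(( $(blockdev --getsz "$dev") - start ))
dmsetup create xvda-root --table "0 $len linear $dev $start"
mount /dev/mapper/xvda-root /sysroot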

@rustybird

rustybird commented Oct 24, 2022

@DemiMarie I don't get (1). Would there be some script in the VM's initrd to rewrite the partition table ("activating" the stashed away 512B or 4K version) depending on xvda's current logical block size?

Dynamically switching back and forth between 512B and 4K partitioning in general seems like it could make resizing the volume (resize-rootfs-if-needed.sh and resize-rootfs) a little scary...

@marmarek
Member

Generally, I'd try to avoid any kind of conversion at startup and go for emulation when necessary. That means:

  • recording expected block size by the volume
  • emulating it, if the underlying pool has different block size

And then, either build templates in two flavors, or convert at install time (as part of qvm-template-postprocess), if reasonably easy.

Can we get away without emulating 4k bs on 512 bs devices?

@rustybird

Once there's a way to attach a volume as 4K, why even bother building (or converting to) 512B templates?

Can we get away without emulating 4k bs on 512 bs devices?

Forcing --sector-size=4096 for luksFormat in the installer (or reencrypt in the upgrade script) even on drives reporting 512B physical sectors would have the same effect.

I'd guess almost all of those drives (that it makes sense to install a Qubes storage pool on) actually have 4K physical sectors anyway, but it's misreported by shoddy firmware or an adapter.
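
For anyone checking their own hardware, the reported sizes are easy to inspect; a 512e drive shows 512/4096 and a true 4Kn drive 4096/4096:

lsblk -o NAME,LOG-SEC,PHY-SEC /dev/nvme0n1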

@rustybird

rustybird commented Oct 26, 2022

https://gitlab.com/cryptsetup/cryptsetup/-/issues/585

Thanks! Latest updates to the proposal:

  • the workaround for cryptsetup is to ensure good partitioning
  • clarified the default-value mechanism for the property; it was too vague
  • the legacy 'file' driver doesn't use /etc/xen/scripts/block, but the 'linux-kernel' driver does

@marmarek
Member

marmarek commented Oct 29, 2022

Sadly, I don't have any hardware with 4k physical sector size to test on. I'll try to find a way to emulate one.

Here is openQA run with 4kn emulated disk: https://openqa.qubes-os.org/tests/52712
fdisk there reports:

fdisk -l /dev/nvme0n1
Disk /dev/nvme0n1: 80 GiB, 85899345920 bytes, 20971520 sectors
Disk model: QEMU NVMe Ctrl                          
Units: sectors of 1 * 4096 = 4096 bytes
Sector size (logical/physical): 4096 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes

And every VM fails to start, as expected.

Note to self: set HDDMODEL=nvme,physical_block_size=4096,logical_block_size=4096
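
For reference, that HDDMODEL value roughly corresponds to a plain QEMU invocation along these lines (image path, memory size, and serial number are illustrative):

qemu-system-x86_64 -m 4096 \
  -drive file=disk.img,if=none,id=nvme0,format=raw \
  -device nvme,serial=test4kn,drive=nvme0,physical_block_size=4096,logical_block_size=4096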

@marmarek
Member

marmarek commented Apr 7, 2023

Given that the failure mode on 4.2 is worse than on 4.1, I think we should have it in 4.2. The plan outlined by @rustybird in #4974 (comment) looks good. @rustybird, are you up for implementing this?

@DemiMarie

@marmarek: what about write tearing? 4K sector writes on a 512e drive are not guaranteed to be atomic, and IIRC are not atomic on some low-end SSDs in the event of power failure. XFS takes precautions against this.

@andrewdavidwong andrewdavidwong added T: enhancement Type: enhancement. A new feature that does not yet exist or improvement of existing functionality. and removed T: bug Type: bug report. A problem or defect resulting in unintended behavior in something that exists. labels Apr 7, 2023
@andrewdavidwong andrewdavidwong changed the title VM partition layout expects 512B sector size, VMs fail to start from 4096B drive Support 4k storage Apr 7, 2023
@rustybird

Maybe we should get the R4.2 regression affecting 512e disks out of the way first, by hacking the installer to use cryptsetup luksFormat --sector-size=512 whenever the LVM Thin layout has been selected and the destination disk has a 512-byte logical block size.

But yes, I'm also still interested in at least starting on the proposal's phases 1 and 2, to fix the existing lvm_thin (and zfs driver too?) incompatibility with 4Kn drives. Not sure how long it will take, though.

Phases 3 and 4 are where things would get spicy, with the whole atomicity question. If a Qubes storage volume is exposed to the VM as a 4K block device even though the disk hardware might not provide atomic 4K writes, will this cause the filesystem on the volume to falsely rely on 4K writes being atomic when it otherwise wouldn't have - either for its own purposes in preserving its data structures' integrity, or as some sort of ineffective guarantee to an application writing data into a file? TBD... There's an interesting writeup: https://stackoverflow.com/a/61832882

marmarek added a commit to marmarek/qubes-anaconda that referenced this issue Aug 13, 2023
See patch description for details

QubesOS/qubes-issues#4974
@andrewdavidwong andrewdavidwong removed this from the Release 4.2 milestone Aug 13, 2023
@marmarek
Member

Maybe we should get the R4.2 regression affecting 512e disks out of the way first, by hacking the installer to use cryptsetup luksFormat --sector-size=512 whenever the LVM Thin layout has been selected and the destination disk has a 512-byte logical block size.

I tried in QubesOS/qubes-anaconda#28
But it didn't work: https://openqa.qubes-os.org/tests/80043:

[2023-08-14 05:27:22] Waiting for /dev/xvda* devices...
[2023-08-14 05:27:22] Qubes: Doing R/W setup for TemplateVM...
[2023-08-14 05:27:23] [    2.836815]  xvdc: xvdc1 xvdc3
[2023-08-14 05:27:23] Setting up swapspace version 1, size = 1073737728 bytes
[2023-08-14 05:27:24] [    3.052640] random: crng init done
[2023-08-14 05:27:24] UUID=50685a9b-25ff-4cb6-b107-7170602a08e6
[2023-08-14 05:27:24] Qubes: done.
[2023-08-14 05:27:24] mount: mounting /dev/mapper/dmroot on /sysroot failed: Invalid argument
[2023-08-14 05:27:24] Waiting for /dev/xvdd device...
[2023-08-14 05:27:24] [    3.086464] /dev/xvdd: Can't open blockdev
[2023-08-14 05:27:24] [    3.086937] EXT4-fs (xvdd): mounting ext3 file system using the ext4 subsystem
[2023-08-14 05:27:24] [    3.089277] EXT4-fs (xvdd): mounted filesystem with ordered data mode. Quota mode: none.
[2023-08-14 05:27:24] mount: mounting none on /sysroot/lib/modules failed: No such file or directory
[2023-08-14 05:27:24] [    3.172895] EXT4-fs (xvdd): unmounting filesystem.
[2023-08-14 05:27:24] mount: can't read '/proc/mounts': No such file or directory
[2023-08-14 05:27:24] BusyBox v1.36.0 (2023-01-10 00:00:00 UTC) multi-call binary.
[2023-08-14 05:27:24] Usage: switch_root [-c CONSOLE_DEV] NEW_ROOT NEW_INIT [ARGS]
[2023-08-14 05:27:24] Free initramfs and switch to another root fs:
[2023-08-14 05:27:24] chroot to NEW_ROOT, delete all in /, move NEW_ROOT to /,
[2023-08-14 05:27:24] execute NEW_INIT. PID must be 1. NEW_ROOT must be a mountpoint.
[2023-08-14 05:27:24] 	-c DEV	Reopen stdio to DEV after switch
[2023-08-14 05:27:24] [    3.193656] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000100

I think I did set it correctly:

root# cryptsetup luksDump /dev/nvme0n1p3; echo efDdj-$?-
LUKS header information
Version:       	2
Epoch:         	3
Metadata area: 	16384 [bytes]
Keyslots area: 	16744448 [bytes]
UUID:          	c51bf27f-c7e1-443b-917d-b95db5145101
Label:         	(no label)
Subsystem:     	(no subsystem)
Flags:       	(no flags)

Data segments:
  0: crypt
	offset: 16777216 [bytes]
	length: (whole device)
	cipher: aes-xts-plain64
	sector: 512 [bytes]
...

Any ideas?

@rustybird

rustybird commented Aug 14, 2023

@marmarek:

But it didn't work: https://openqa.qubes-os.org/tests/80043:

The test had HDDMODEL configured with physical_block_size=4096,logical_block_size=4096, so it's emulating a 4Kn drive. To test 512e (in the sense that causes the R4.2 regression due to the new cryptsetup) it should be physical_block_size=4096,logical_block_size=512.

Hardcoding --sector-size=512 doesn't help with 4Kn drives, because dm-crypt's sector_size cannot shrink the logical block size (compared to the underlying block device); we can only avoid enlarging it.

@DemiMarie

@marmarek: I think overlaying two different GPTs is the only reasonable approach here. At some point everyone will be using bcachefs or another filesystem that does not care about sector size, but we are not there yet.

@marmarek
Member

@marmarek: I think overlaying two different GPTs is the only reasonable approach here.

Is it only a partition table issue? What about the filesystem?
But also, the remark about dynamic resizing is a valid one: only one partition table will be updated, so if you migrate to a disk with a different logical block size, you'll get a truncated partition.
But if it's just about the partition table, maybe we can do the conversion in the initramfs before mounting anything? Assuming everything is 4k-aligned, it should be technically possible, right?
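
A rough sketch of what such an initramfs-side conversion could look like, under heavy assumptions (not actual Qubes code; assumes every partition is 4K-aligned and that sfdisk, awk, and a losetup with --sector-size support are available in the initramfs): read the 512B-based GPT through a loop device faking 512B sectors, divide every sector value by 8, and rewrite the table for the 4096B view of the device.

dev=/dev/xvda
if [ "$(blockdev --getss "$dev")" = 4096 ]; then
    loop=$(losetup --sector-size 512 --find --show "$dev")
    sfdisk --dump "$loop" > /tmp/gpt-512
    losetup --detach "$loop"
    awk '
        /^(first-lba|last-lba|sector-size):/ { next }   # let sfdisk recompute these
        {
            out = ""; rest = $0
            while (match(rest, /(start|size)= *[0-9]+/)) {
                tok = substr(rest, RSTART, RLENGTH)
                split(tok, kv, "=")
                out = out substr(rest, 1, RSTART - 1) kv[1] "=" int(kv[2] / 8)
                rest = substr(rest, RSTART + RLENGTH)
            }
            print out rest
        }' /tmp/gpt-512 > /tmp/gpt-4k
    sfdisk "$dev" < /tmp/gpt-4k
fi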

@DemiMarie

@marmarek: I think overlaying two different GPTs is the only reasonable approach here.

Is it only a partition table issue? What about the filesystem? But also, the remark about dynamic resizing is a valid one: only one partition table will be updated, so if you migrate to a disk with a different logical block size, you'll get a truncated partition. But if it's just about the partition table, maybe we can do the conversion in the initramfs before mounting anything? Assuming everything is 4k-aligned, it should be technically possible, right?

Yup, it's just about the partition table, and we can avoid the dynamic-resize problem by changing the partition table with our own tools that understand the different layout.

@marmarek
Member

@rustybird, what do you think about adjusting the partition table in the initramfs?

@rustybird

I like the simplicity of it, especially compared to my 4-phase slog of a proposal.

The adjustment script should probably bail out early if the root volume looks too nonstandard? E.g. if xvda3 is not an ext4 filesystem (or another filesystem type that's whitelisted as known to be logical-block-size agnostic).

@DemiMarie

Which filesystem types should be on the allowlist?

@rustybird

rustybird commented Aug 14, 2023

IIRC ext3 and btrfs are fine too; XFS definitely isn't.

@maybebyte

maybebyte commented Dec 22, 2023

Overall, I think these kind of problems are a good reason to switch to BTRFS-by-default [...]

Uhm, I think we are getting a bit off-track here. This issue is about things not working on devices that report 4096 B logical sectors. Switching the default storage pool wouldn't even solve this, beside the other questions it raises (this doesn't mean it's a bad idea, but not for this issue).

Setting aside the question of what a good default would be for Qubes OS, using btrfs instead of LVM + ext4 actually does work around this issue on my 1 TB NVMe SSD (the model is a SanDisk WD Blue SN570). It reports 4096 for both logical and physical sector size because I followed the instructions in the ArchWiki entry on Advanced Format a while back.1

Anyway, I ran into the same issue the OP mentioned while installing R4.2 on my desktop using the default disk configuration scheme, but everything works OK so far on a reinstall using btrfs. I wouldn't even have thought to do this if not for this issue and this comment on a duplicate issue.

It makes me wonder whether other filesystems with properties similar to btrfs would also work, such as ZFS (I'm not suggesting Qubes OS needs to adopt ZFS; I'm just thinking out loud). Based on others' comments here, adjusting the drive to emulate a 512B sector size again may not be a viable workaround, but I haven't tested it.

Footnotes

  1. I'm aware that Advanced Format is a hard drive specific thing, but that's what the wiki entry is called.

@DemiMarie

That’s really interesting, but it actually makes sense: since BTRFS is copy-on-write, it can (at the expense of performance) make arbitrarily small writes atomic.
