Partition remapper (draft)

The remapper provides a way to migrate (non-destructively) from one partition layout to another without relying on external storage, networking, or enough RAM to hold the complete set of data.

Motivation

Usually, if you want to re-partition your disk or change the filesystem, you need to move your data to another storage device, write the new partition table, create the new filesystems and move the data over to them.

For nixos-assimilate we need to write the Nix store to the existing filesystem, reboot into the new kernel of the new configuration, move to a new partitioning scheme and continue booting from the previously written Nix store. Of course, we could just provide a bare-minimum configuration expression that is built after re-partitioning, but that would introduce quite a lot of additional complexity and would be error-prone, particularly in the face of networking failures.

A further advantage is that the remapper is not limited to writing the Nix store: it can also retain existing data, which could be useful in order to roll back to a previously working state.

Basic design

The partition remapper consists of two virtual block devices that are backed by the real disk, so we have three devices in total (sketched below):

  • Virtual source device
  • Virtual destination device
  • Physical target device
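
To make the roles concrete, here is a minimal model of the three devices. This is an illustrative sketch only; the names and types are made up and not part of any actual implementation:

```python
from dataclasses import dataclass
from typing import Dict

@dataclass
class RemapperSetup:
    """The three devices involved in one remapping run.

    Both virtual devices translate their block numbers to positions on the
    same physical target device: the source presents the old layout
    (initially an identity mapping), the destination presents the new one.
    """
    physical: str               # the real disk, e.g. "/dev/sda"
    source_map: Dict[int, int]  # old virtual block -> physical block
    dest_map: Dict[int, int]    # new virtual block -> physical block
```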

Virtual source device

Firstly, the source device behaves exactly like the physical target device, so we can properly mount the filesystem(s) read-only on top of it. The reason for it to initially behave the same way (i.e., it is read-write and writes go to the actual physical target device) is that some filesystems need to write to disk even during a read-only mount, for example to replay their journal.

Secondly, we set the source device itself read-only as well, and now the destination device is ready to receive the data.
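
For illustration, flipping a block device to read-only can be done from user space with the BLKROSET ioctl (the same call that blockdev --setro issues). A minimal sketch; the actual remapper may do this differently:

```python
import fcntl
import os
import struct

BLKROSET = 0x125d  # _IO(0x12, 93) from <linux/fs.h>

def set_read_only(device: str) -> None:
    """Mark a block device read-only at the kernel level, as we do with the
    virtual source device once the mount-time writes have settled."""
    fd = os.open(device, os.O_RDONLY)
    try:
        fcntl.ioctl(fd, BLKROSET, struct.pack("i", 1))
    finally:
        os.close(fd)
```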

Virtual destination device

The destination device maps the new partitioning layout and has a caching mechanism which writes data to the physical device only once the old contents of that location have been successfully read from the source device and thus are no longer needed.

Example schema

[Figure: example remapping of blocks]

The above figure shows a device consisting of 5 blocks which should be remapped like this:

  • Block 1 should go to block 4
  • Block 2 should go to block 3
  • Block 3 should go to block 1
  • Block 4 should go to block 5
  • Block 5 should go to block 2

And these are the detailed steps to do the transformation (a code sketch of this strategy follows the list):

  1. Move block 1 into the cache (because block 4 wasn't transferred yet).
  2. Move block 2 into the cache (because block 3 wasn't transferred yet).
  3. Move block 3 to destination block 1 (because we already have block 1 in the cache) and move cached block 2 to destination block 3.
  4. Move block 4 into the cache (because block 5 wasn't transferred yet); now that block 4 has been read, we can move cached block 1 to destination block 4.
  5. Move block 5 to destination block 2 (block 2 was already read in step 2) and move the remaining cached block 4 to destination block 5.
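
A minimal Python sketch of this caching strategy. The block I/O callbacks are hypothetical stand-ins for the virtual source and destination devices:

```python
def remap_blocks(read_block, write_block, mapping):
    """Remap blocks in place according to `mapping` (source -> destination),
    holding data back in a RAM cache until its destination slot is safe,
    i.e. until the old contents of that slot have been read."""
    cache = {}       # source block number -> data waiting for its slot
    visited = set()  # source blocks that have already been read
    for src in sorted(mapping):
        data = read_block(src)
        visited.add(src)
        dst = mapping[src]
        if dst in visited:
            write_block(dst, data)  # old contents of dst already read: safe
        else:
            cache[src] = data       # dst still holds unread data: hold back
        # flush any cached block whose destination slot has become safe
        for s in [s for s in cache if mapping[s] in visited]:
            write_block(mapping[s], cache.pop(s))
    assert not cache, "every cached block must have been flushed"

# Replaying the five-block example from above on an in-memory "disk":
disk = {n: "data%d" % n for n in range(1, 6)}
remap_blocks(disk.get,
             lambda n, d: disk.__setitem__(n, d),
             {1: 4, 2: 3, 3: 1, 4: 5, 5: 2})
assert disk == {4: "data1", 3: "data2", 1: "data3", 5: "data4", 2: "data5"}
```

Walking through the loop reproduces steps 1 to 5 above exactly; note that the cache never holds more than two blocks in this example.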

Problems

Of course, this method has a few problems which need to be solved before we can implement it:

Virtual source device revisits

Especially for filesystems based on B-trees, metadata could be revisited more than once; we also need to avoid hard links or other means of deduplication, because they cause the same physical blocks to be read twice. We could solve this by hooking into the VFS and working only on the extent level, but it's not clear how well this works together with LVM. So let's investigate that and/or maybe find a better solution.
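
To illustrate why revisits are a problem: once a source block has been transferred, its physical location may already contain destination data, so a second read must either be served from the cache or fail. A hypothetical guard (names made up):

```python
def read_block_checked(read_block, visited, cache, src):
    """Refuse to re-read a source block whose physical location may already
    have been overwritten with destination data (illustrative only)."""
    if src in visited:
        if src in cache:
            return cache[src]  # still held back in RAM, safe to serve again
        raise IOError("block %d revisited after transfer; data is gone" % src)
    visited.add(src)
    return read_block(src)
```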

Too much overlap

If our cache is quite small and there is too much overlap between the old and the new layout, we could run out of RAM. A way to mitigate this would be to temporarily relocate the cached data to a known-visited area of the physical disk, where we can be sure that it doesn't corrupt the mounted source filesystem(s).
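
A sketch of how such a spill area could be chosen, under the assumption that we track which source blocks were read and which destination blocks were written (hypothetical bookkeeping):

```python
def spill_candidates(visited, written):
    """Physical blocks that can temporarily hold spilled cache data: their old
    contents were already read from the source device, and no destination
    data has been written there yet. (Sketch only; the spilled data must of
    course be moved again before the destination device writes to such a
    block.)"""
    return sorted(visited - written)
```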

"Special" file systems

One example that comes to mind is a LUKS container, where it would make sense to fill the device with random data before formatting. In this particular case we could write the random data afterwards, because we already know which blocks are used by the new filesystem. However, there could be other cases, so let's check that first.
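
A sketch of the "randomize afterwards" idea, assuming we know the set of blocks the new filesystem occupies (the helper names are hypothetical):

```python
import os

def randomize_unused(write_block, used, total_blocks, block_size=4096):
    """After remapping into a LUKS container, overwrite every block that the
    new filesystem does not use with random data (illustrative sketch)."""
    for blk in range(total_blocks):
        if blk not in used:
            write_block(blk, os.urandom(block_size))
```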