Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add support for unified image layout #1870

Merged
merged 5 commits into from Dec 17, 2021

Conversation

bcressey
Copy link
Contributor

@bcressey bcressey commented Dec 16, 2021

Issue number:
N/A

Description of changes:
Add support for a "unified" image layout, where the OS and data partitions reside on the same disk.

In order to resize the data partition when it's on the same disk as the root filesystem, switch from our homegrown growpart tool to systemd-repart, which knows how to tell the kernel about changes to individual partition entries.

Refactor the unit dependencies for setting up /local to take advantage of the new repart functionality, which lets us an ordinary mount unit for the filesystem.

Testing done:

  • "unified" and "split" layout AMI registration works
  • "unified" and "split" layout OVA creation works
  • /local is correctly resized to fill the disk for both "unified" and "split"
  • systemd-repart and growpart resize the disk to the same end sector
  • overlayfs and bind mount directories are created and labeled

Terms of contribution:

By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.

The repart service is a oneshot, so we can't use a drop-in to replace
the `ExecStart` command that runs by default, which attempts to add
or grow defined partitions on the same device as the root filesystem.

This is OK for "unified" images, where the data partition is at the
end of the device, but not OK for "split" images, where it lives on a
different device.

Signed-off-by: Ben Cressey <bcressey@amazon.com>
Switch from our `growpart` tool to `systemd-repart` to resize the
data partition. For a unified root+data image.

`growpart` uses the `gptman` crate, which calls the BLKRRPART ioctl
to tell the kernel to re-read the partition table. This call fails if
the device contains mounted partitions.

`systemd-repart` uses the newer BLKPG ioctl, which manipulates the
kernel's view of individual partitions. This works even if the root
filesystem is present on the same device and already mounted. It also
avoids the need to handle the partition symlink going away and coming
back, since udev does not get the change event that triggers this.

The two tools differ in how much free space is left on the device
after the last partition is resized. `growpart` ends the partition
one sector before the last 1 MiB boundary, while `systemd-repart`
ends it just before the GPT label. Both tools run on every boot.

To avoid problems on downgrade after a newer release resizes the data
filesystem beyond where the older release will end the partition, we
constrain `systemd-repart` to leave the older number of free sectors.

Since `/local` can be mounted during the resize operation, we can use
a real mount unit for it, which greatly simplifies the dependencies,
and allows us to decouple the "prepare" logic from "resize" logic.

Signed-off-by: Ben Cressey <bcressey@amazon.com>
For some targets such as bare metal systems, the requirement for a
separate block device to hold the data partition is unworkable.

Implement a "unified" image layout, which places the data partition
after the final OS partition, and is suitable for targets which may
only have one disk.

The old "split" layout remains the default.

Signed-off-by: Ben Cressey <bcressey@amazon.com>
Comment on lines -120 to +128
%{S:1001} %{S:1002} %{S:1003} %{S:1004} %{S:1005} \
%{S:1006} %{S:1007} %{S:1008} %{S:1009} %{S:1010} %{S:1011} %{S:1012} \
%{S:1015} %{S:1040} %{S:1041} %{S:1060} %{S:1061} %{S:1062} %{S:1080} \
%{S:1001} %{S:1002} %{S:1003} %{S:1004} %{S:1005} %{S:1006} %{S:1007} \
%{S:1008} %{S:1009} %{S:1010} %{S:1011} %{S:1012} %{S:1013} %{S:1015} \
%{S:1040} %{S:1041} %{S:1042} %{S:1060} %{S:1061} %{S:1062} %{S:1080} \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This hurts to review 😰

echo "No OVF template or VMDK images, skipping OVA build"
exit 0
else
# Warn the user if a VMDK exists but an OVF template does not. Assume we do not
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I don't know if this is intended, but there is an extra space here

Suggested change
# Warn the user if a VMDK exists but an OVF template does not. Assume we do not
# Warn the user if a VMDK exists but an OVF template does not. Assume we do not

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to preserve the existing "two spaces after a period" convention in this section, even though I belong to the "one space after a period" tribe.

Makefile.toml Outdated
fi

root_image_size_bytes="$(measure_image "${root_vmdk_path}")"
root_image_size_gib="$((root_image_size_bytes / 1024 / 1024 / 1024))"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible for 1024 * 1024 * 1024 and 1024 / 1024 / 1024 to be constants?

@@ -101,6 +104,9 @@ install -p -m 0644 %{S:11} %{buildroot}%{_cross_factorydir}%{_cross_sysconfdir}
install -d %{buildroot}%{_cross_factorydir}%{_cross_sysconfdir}/wicked/ifconfig
install -p -m 0644 %{S:1000} %{buildroot}%{_cross_factorydir}%{_cross_sysconfdir}/wicked/ifconfig

install -d %{buildroot}%{_cross_libdir}/repart.d
install -p -m 0644 %{S:96} %{buildroot}%{_cross_libdir}/repart.d/80-local.conf
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this a drop-in, and not part of the unit itself?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

systemd-repart looks for its config files in /usr/lib/repart.d, like howsystemd-tmpfiles looks in /usr/lib/tmpfiles.d.

Although the existing AWS and VMware variants use the "split" image
layout, custom variants for these platforms might use the "unified"
layout instead.

Adapt the AMI registration and OVA creation logic to account for the
possibility that we only build a single disk image.

Signed-off-by: Ben Cressey <bcressey@amazon.com>
Using `state_t` as the label makes the directories read-only for all
unprivileged containers, even if they have access via a host mount.

Signed-off-by: Ben Cressey <bcressey@amazon.com>
@bcressey
Copy link
Contributor Author

Added bytes_in_gib constant to make the calculations more readable.

Copy link
Contributor

@arnaldo2792 arnaldo2792 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fi

root_image_size_gib="$(($(stat -c %s "${root_image}") / 1024 / 1024 / 1024))"
if [ "${is_split}" == "yes" ] ; then
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels like we are at the limit of what should be done with scriptlets inside of a Makefile.toml!

Copy link
Contributor Author

@bcressey bcressey Dec 17, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider the gauntlet thrown down!

(But yes - we definitely need a different approach.)

Copy link
Contributor

@zmrow zmrow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🥇

@bcressey bcressey merged commit 0710193 into bottlerocket-os:develop Dec 17, 2021
@bcressey bcressey deleted the unified-layout branch December 17, 2021 03:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants