
podvm-mkosi: dm-verity, reproducible builds & measurements #1606

Merged

Conversation


@katexochen katexochen commented Nov 30, 2023

This introduces dm-verity protection for the root disk as well as fully reproducible builds.

Developers should note that / is now read-only. All writes at runtime should happen to /run.

This is a big step in our remote attestation story. By introducing
a dm-verity protected root file system, the content of our root FS is
measured, including binaries and configuration that are part of the
image. The root FS is now read-only.

Signed-off-by: Paul Meyer <49727155+katexochen@users.noreply.github.com>
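For readers new to dm-verity, the gist is that the root FS image gets a Merkle hash tree whose single root hash is what ends up being measured; only blocks whose hashes chain up to that root hash can be read. A minimal sketch using cryptsetup's veritysetup directly, independent of how mkosi/systemd wire this up in the actual build (image names and mount point are hypothetical):

    # Build the hash tree for the read-only root FS image; this prints the root hash.
    veritysetup format rootfs.raw rootfs.verity
    # Map the verified device; reads fail if any block doesn't match the hash tree.
    veritysetup open rootfs.raw verified-root rootfs.verity <root-hash-from-format>
    mount -o ro /dev/mapper/verified-root /sysroot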
@katexochen katexochen added the podvm Related to podvm images label Nov 30, 2023
@katexochen katexochen force-pushed the feat/mkosi-dmverity-nix branch 2 times, most recently from b593296 to 77ce6e6 Compare November 30, 2023 17:45
As the root file system is now read-only, files that need to be written
must reside in /run.

Signed-off-by: Paul Meyer <49727155+katexochen@users.noreply.github.com>
sshd generates server keys on boot. This change moves the generated
keys to /run, as their original destination is now read-only. Fedora
uses a custom script, 'sshd-keygen', to generate the keys; it cannot be
configured with a different target path, so we need to vendor the script
and fix the path there.

Signed-off-by: Paul Meyer <49727155+katexochen@users.noreply.github.com>
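For illustration, the relocation amounts to generating the host keys into the writable tmpfs and pointing sshd at them; a minimal sketch, with paths that are assumptions rather than the ones used by the vendored sshd-keygen script:

    # Generate host keys under /run instead of the read-only /etc/ssh.
    mkdir -p /run/sshd-keys
    ssh-keygen -q -t ed25519 -N '' -f /run/sshd-keys/ssh_host_ed25519_key
    # sshd_config then needs a matching directive:
    #   HostKey /run/sshd-keys/ssh_host_ed25519_key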
Working with the new images, two pieces of information about an image are
of fundamental interest both for devs and for users: the vTPM PCR values
of the image and whether the image is a debug image or intended for
production use. This change adds printing capabilities for this
information.

As production images don't allow access via SSH or serial console, we
print everything to the serial console after boot. For dev convenience,
the debug image will also print the image info on SSH login.

Signed-off-by: Paul Meyer <49727155+katexochen@users.noreply.github.com>
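A minimal sketch of what printing this information after boot could look like; the tool (tpm2-tools) and the output format are assumptions, not necessarily what the unit in this PR does:

    # Dump the image variant and the vTPM measurements to the serial console.
    echo "Image variant: debug" > /dev/console
    tpm2_pcrread sha256 > /dev/console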
This primarily enables the serial console. It is helpful for debug images,
which allow access through the serial console for debugging, as well as
for production images, which display PCR values and don't allow login.

Signed-off-by: Paul Meyer <49727155+katexochen@users.noreply.github.com>
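For reference, enabling a serial console usually boils down to the kernel command line plus a getty unit; a sketch assuming ttyS0 (the PR's exact mechanism may differ):

    # Kernel command line gets console=ttyS0; then enable a login getty on it.
    systemctl enable serial-getty@ttyS0.service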
Explicitly adding packages and enabling units that were running before.

Signed-off-by: Paul Meyer <49727155+katexochen@users.noreply.github.com>
@katexochen katexochen marked this pull request as ready for review November 30, 2023 17:53
@katexochen katexochen changed the title from "podvm-mkosi: squashFS, dm-verity, reproducible builds" to "podvm-mkosi: dm-verity, reproducible builds & measurements" Nov 30, 2023

bpradipt commented Dec 1, 2023

I built a standalone image for testing and it worked without any issues. Currently testing out a peer-pod scenario. Will update.


bpradipt commented Dec 1, 2023

I failed to successfully test the e2e flow. The agent-protocol-forwarder is failing to start:

usr/local/bin/agent-protocol-forwarder: failed to open /peerpod/daemon.json: open /peerpod/daemon.json: no such file or directory

This shouldn't have happened, as the code changes are there. I'll have to recheck my environment and redo this next week.

Comment on lines 15 to 17
--build-arg CAA_SRC=https://github.com/katexochen/cloud-api-adaptor \
--build-arg CAA_SRC_REF=feat/mkosi-dmverity-nix \
--no-cache \
Contributor

We could parameterize those in the Makefile or just drop them before merge. I opt for the latter to reduce the noise.

Contributor Author

Yes, that commit should be dropped.


bpradipt commented Dec 1, 2023

So the binaries agent-protocol-forwarder and process-user-data under podvm-mkosi/resources/binaries-tree/usr/local/bin/ didn't have the updated file path. Not sure why, since the code at https://github.com/katexochen/cloud-api-adaptor branch:feat/mkosi-dmverity-nix has the changes.
Anyway, I copied the files manually and rebuilt the image. After this I was able to successfully create the pod.

One quick observation was an increase in pod startup time. Earlier, with Ubuntu on t3.small, it took roughly 60s. Now it took close to 2 min.
systemd-analyze showed this

[root@fedora ~]# systemd-analyze blame
6min 41.394s afterburn-checkin.service
      3.756s dev-tpm0.device
      3.756s sys-devices-LNXSYSTM:00-LNXSYBUS:00-MSFT0101:00-tpm-tpm0.device
      3.683s dev-ttyS13.device
      3.683s sys-devices-platform-serial8250-tty-ttyS13.device
      3.681s sys-devices-platform-serial8250-tty-ttyS12.device
      3.681s dev-ttyS12.device
      3.680s sys-devices-platform-serial8250-tty-ttyS15.device
      3.680s dev-ttyS15.device
      3.677s sys-devices-platform-serial8250-tty-ttyS1.device
      3.677s dev-ttyS1.device
      3.676s sys-devices-platform-serial8250-tty-ttyS10.device
      3.676s dev-ttyS10.device
      3.674s sys-devices-platform-serial8250-tty-ttyS11.device
      3.674s dev-ttyS11.device
      3.671s dev-ttyS22.device
      3.671s sys-devices-platform-serial8250-tty-ttyS22.device
      3.670s dev-ttyS20.device
      3.670s sys-devices-platform-serial8250-tty-ttyS20.device
      3.669s sys-devices-platform-serial8250-tty-ttyS14.device
      3.669s dev-ttyS14.device
      3.668s sys-devices-platform-serial8250-tty-ttyS16.device
      3.668s dev-ttyS16.device
      3.666s dev-tpmrm0.device
      3.666s sys-devices-LNXSYSTM:00-LNXSYBUS:00-MSFT0101:00-tpmrm-tpmrm0.device
      3.665s sys-devices-platform-serial8250-tty-ttyS17.device

Will investigate this further next week.

@mkulke mkulke left a comment

Great work! Also, the ssh debug shell works now and there's a super-cute issue screen 😃

I built an image and ran a pod successfully on it. When failing to launch an encrypted image, I noticed that we were using the wrong config file for kata-agent.

cat /etc/systemd/system/kata-agent.service | grep agent-config
ExecStart=/usr/local/bin/kata-agent --config /etc/agent-config.toml
ExecStopPost=/usr/local/bin/kata-agent-clean --config /run/peerpod/agent-config.toml
# Now specified in the agent-config.toml Environment="KATA_AGENT_SERVER_ADDR=unix:///run/kata-containers/agent.sock"

Fixed the line, rebuilt the image, pushed it to a gallery and checked the fix - all within 10 min (90% of that is gallery publishing). It's hard to overstate the improvement to the developer experience!

...ofc I forgot to build w/ AA_KBC but it should work 🙄

Last login: Fri Dec  1 18:32:19 2023 from 172.16.0.2
                        ...
                    .-=**++*+=:                 PodVM image for cloud-api-adaptor
                 :=**=:.    .-+*+-.             ---------------------------------
             .-+*+-.            :=+*=:          Part of Confidential Containers project
          .-+*=:                   .-+*+-
        -+*=:           .-:            -+*=.    Image variant: debug
       :*+           :=*****+-.          :**
       =*-       .-+************=:        +*.   OS: Fedora Linux 38 (Thirty Eight) x86_64
       +*:      +****************#%-      +*:   Host: Virtual Machine Hyper-V UEFI Release v4.1
       +*.      +************##%@@@=      =*:   Kernel: 6.6.2-101.fc38.x86_64
       +*.      +*********#%%@@@@@@=      =*:   Shell: bash 5.2.21
       +*.      +********#@@@@@@@@@=      =*:   Terminal: /dev/pts/0
       +*.      +********#@@@@@@@@@=      =*:   CPU: 19/01 (2) @ 2.445GHz
       =*:      =********#@@@@@@@@%:      +*:   Memory: 341MiB / 7477MiB (4%)
       =*-        :=*****#@@@@@%+:        **.   Disk (/): 243M / 243M (100%)
       :*+.          :-+*#@@#=.           :*+
        :+*+:            -:            .-+*=.
          .-+*+-.                  :=+*=:
             .:=*+-:           .:=*+=:
                 :-+*+-.   .:=+*=-.
                     :=+*+**+-.
                         ..

[root@fedora ~]# journalctl -u kata-agent | grep AA
Dec 01 18:30:51 fedora kata-agent[672]: [2023-12-01T18:30:51Z ERROR attestation_agent::rpc::getresource::ttrpc] Call AA-KBC to get resource failed: AA does not support the given KBC module! Module: cc_kbc
Dec 01 18:30:52 fedora kata-agent[672]: [2023-12-01T18:30:52Z ERROR attestation_agent::rpc::getresource::ttrpc] Call AA-KBC to get resource failed: AA does not support the given KBC module! Module: cc_kbc


mkulke commented Dec 1, 2023

One quick observation was an increase in pod startup time. Earlier, with Ubuntu on t3.small, it took roughly 60s. Now it took close to 2 min. systemd-analyze showed this

I noticed the same and opened this issue to look into it


mkulke commented Dec 1, 2023

I can confirm: w/ the fix and a rebuilt AA, image layer decryption also works 🎉

@katexochen
Contributor Author

Link check failure is unrelated, caused by #1508

@mkulke mkulke left a comment

LGTM

@@ -5,9 +5,11 @@ Wants=process-user-data.service
After=netns@podns.service process-user-data.service

[Service]
ExecStart=/usr/local/bin/kata-agent --config /etc/agent-config.toml
# Temporary fix for
Contributor

Leftover?

Contributor Author

Not really, just haven't opened that issue/PR yet.


bpradipt commented Dec 5, 2023

Retesting on AWS. I'll update my findings. Overall this looks good to me.

ExecStart=/usr/local/bin/kata-agent --config /etc/agent-config.toml
# Temporary fix for
ExecStartPre=mkdir -p /run/kata-containers
ExecStart=/usr/local/bin/kata-agent --config /run/peerpod/agent-config.toml
Member

A question came to my mind when going through the changes again.

@katexochen can the systemd config files for mkosi-based builds be kept under the podvm-mkosi directory?

This will give us some time to cleanly deprecate packer-based builds and eventually remove them or move them out to a separate repo.

Also, having the required dependencies for mkosi under a separate directory will make subsequent changes to mkosi-based builds easier.

Contributor Author

katexochen can the systemd config files for mkosi-based builds be kept under the podvm-mkosi directory?

Not sure I understand your proposal here. You don't want this PR to have any changes inside of /podvm? The binaries container files for mkosi are in that directory, and mkosi uses the same /podvm/files as the legacy images.

There is no way to scope these changes to /podvm-mkosi, as other components are touched, like process-user-data, the adaptor and the forwarder, as well as the guest-components.

Member

katexochen can the systemd config files for mkosi-based builds be kept under the podvm-mkosi directory?

Not sure I understand your proposal here. You don't want this PR to have any changes inside of /podvm? The binaries container files for mkosi are in that directory, and mkosi uses the same /podvm/files as the legacy images.

Yes, I'm looking at having all mkosi-related configs and deps under podvm-mkosi to help with deprecating packer-based builds.
I created a related issue to track this - #1615

There is no way to scope these changes to /podvm-mkosi, as other components are touched, like process-user-data, the adaptor and the forwarder, as well as the guest-components.

I'm only referring to configs (like the systemd scripts, Dockerfile, Makefile). Apologies if it was not clear.

At a high level, this is what I was thinking

  • Copy the systemd config files in podvm/files/etc to podvm-mkosi
  • Include Makefile.inc in the podvm-mkosi/Makefile to build the binaries (will need some adjustments)
  • Move the fedora dockerfiles under podvm-mkosi as fedora is only used for podvm-mkosi builds

I plan to handle these in a separate PR as part of #1615. However, since this PR is touching the systemd files, can these files be copied to podvm-mkosi and used from there?

If it's too much work, then leave it. Will handle it separately.

Contributor Author

Move the fedora dockerfiles under podvm-mkosi as fedora is only used for podvm-mkosi builds

Okay for me; as you said, this is independent and has no impact either way.

Copy the systemd config files in podvm/files/etc to podvm-mkosi
Include Makefile.inc in the podvm-mkosi/Makefile to build the binaries (will need some adjustments)

I think duplicating this stuff will lead to a mess. There are no mkosi-specific modifications to the Makefile.inc, so why would we duplicate that? Regarding the /files, I don't see benefits either. The changes that this PR is making must be made to the legacy podvm files too, otherwise the legacy images will just break. There are no mkosi-specific updates to these files. If you really don't want these changes in the legacy images, you would basically need to fork guest-components and cloud-api-adaptor, at least.

In general, I don't think it is a good approach to isolate the two image builds more than necessary from each other. We want to migrate -- making both approaches fully independent will rather lead to a fork than to a migration.

Member

The migration is going to take time and I am exploring ways to make it smooth. I'm also looking at ways to have a path forward for ongoing experimentation (by end users of the upstream project and developers) with workloads on peer-pods. This is important to identify functional issues with the peer-pods approach.

Currently, I can't experiment with AI use cases involving large models and image sizes, as this breaks with this PR due to the removal of the bind mount. We don't have a way to embed the container images in the podvm yet. And spinning up large-memory instances and keeping them running for experiments and debugging is very costly. Also, the required instances are not always available in the specific region where the cluster is deployed.

Experimenting with GPU workloads is another example. Supported packages are available only for Ubuntu and RHEL. We can enable GPU support only for packer-based builds and let users experiment with their workload and see whether the peer-pods approach works for them or not.
In parallel, it gives us time to test F37 NVIDIA packages on F38 mkosi images and, if it works, enable it for the mkosi-built image. Or wait for F38 packages to be released to enable it in the mkosi-built image.

I feel isolating the two builds will help with the velocity of the project at this stage and, most importantly, give us time to focus on mkosi-based builds knowing that packer-based images are available to continue with experiments, both for us and for end users of the upstream peer-pods project.

Are there other alternatives that we can look into?

Contributor

I'll be able to take a more thorough look into it tomorrow, but I agree, this PR shouldn't break packer builds. (If that were the case, we'd need more reviewers to ack.) I'm optimistic we're able to introduce this iteratively without having to mirror everything.

Contributor Author

Currently, I can't experiment with AI use cases involving large models and image sizes, as this breaks with this PR due to the removal of the bind mount.

Ah, now this is actually a concrete problem we can work on. I asked about the bind mounts a while ago in Slack, and my understanding after that conversation was that the mounts weren't used; that's why I removed them. There are some easy ways to unblock you in this regard:

  • We can copy the files like you've said (fine for me if we have a reason like this)
  • We can disable the units in the mkosi build
  • We can remove the files in the docker build step of the binaries so they are not included in the tarball (my preferred way)

In parallel, it gives us time to test F37 NVIDIA packages on F38 mkosi images and, if it works, enable it for the mkosi-built image. Or wait for F38 packages to be released to enable it in the mkosi-built image.

I would need to check, but I don't think the image has any F38-specific dependencies. We could try to downgrade the full image to F37, but note that F37 has been EOL since yesterday (if I understand their schedule correctly).

And we could also reconsider building both fedora + ubuntu images with mkosi if we can't get AI stuff running on fedora.

@mkulke mkulke Dec 7, 2023

We can remove the files in the docker build step of the binaries so they are not included in the tarball (my preferred way)

Since we clone + build the repo in the docker builds, I guess binaries + config files are somewhat coupled, so there might be inconsistencies if we retrieve /etc/** from somewhere in the file tree (HEAD) while the binaries are copied from a docker layer (built from some specific ref).

Since there is a plan to keep packer-based builds in-tree for the time being, masking the units could be a viable option.

Is the removal of the bind mounts relevant for the functionality of this PR or is this a drive-by fix?

Contributor Author

I've re-added the units in /podvm and added some lines to remove them during the mkosi build.
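A minimal sketch of what "remove them during the mkosi build" could look like; the post-processing hook, the $BUILDROOT variable, and the unit names are assumptions, not taken from this PR:

    # In an image post-processing step, drop the bind-mount units from the mkosi
    # image tree while keeping them in /podvm for the legacy packer build.
    rm -f "$BUILDROOT/etc/systemd/system/var-lib-containerd.mount"
    rm -f "$BUILDROOT/etc/systemd/system/var-lib-kubelet.mount"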


bpradipt commented Dec 6, 2023

I still can't get a working pod deployment on AWS. This time I get an error

 [mount.fuse [overlay /var/lib/containerd/tmpmounts/containerd-mount2222888048 -o workdir=/var/lib/containerd-nydus/snapshots/33/work -o upperdir=/var/lib/containerd-nydus/snapshots/33/fs -o lowerdir=/var/lib/containerd-nydus/snapshots/27/fs -o io.katacontainers.volume=eyJ2b2x1bWVfdHlwZSI6ImltYWdlX2d1ZXN0X3B1bGwiLCJvcHRpb25zIjpbImNvbnRhaW5lcmQuaW8vc25hcHNob3QvY3JpLmxheWVyLWRpZ2VzdD1zaGEyNTY6NWU4MTE3YzBiZDI4YWVjYWQwNmY3ZTc2ZDRkM2I2NDczNGQ1OWMxYTBhNDQ1NDFkMTgwNjBjZDhmYmEzMGM1MCIsImNvbnRhaW5lcmQuaW8vc25hcHNob3QvbnlkdXMtcHJveHktbW9kZT10cnVlIl0sImltYWdlX3B1bGwiOnsibWV0YWRhdGEiOnsiY29udGFpbmVyZC5pby9zbmFwc2hvdC9jcmkubGF5ZXItZGlnZXN0Ijoic2hhMjU2OjVlODExN2MwYmQyOGFlY2FkMDZmN2U3NmQ0ZDNiNjQ3MzRkNTljMWEwYTQ0NTQxZDE4MDYwY2Q4ZmJhMzBjNTAiLCJjb250YWluZXJkLmlvL3NuYXBzaG90L255ZHVzLXByb3h5LW1vZGUiOiJ0cnVlIn19fQ== -o ro -t nydus-overlayfs]] failed: "": exec: "mount.fuse": executable file not found in $PATH

Most likely this error has nothing to do with this PR.

@stevenhorsman or @mkulke any idea about the above error ? I'm using v0.8.0 on EKS and deployed the containerd (v1.7) via the operator.


bpradipt commented Dec 6, 2023


Able to run a pod finally. I had to install fuse on the EKS worker node

@mkulke
Copy link
Contributor

mkulke commented Dec 6, 2023

@bpradipt Yes, that's a known issue w/ nydus and CoCo 0.8. It will not work OOTB on EKS; you need to install fuse.


bpradipt commented Dec 6, 2023

And I get the beautiful issue screen

[screenshot of the issue screen]


bpradipt commented Dec 6, 2023

I could also successfully spin up an AMD SEV-SNP instance. So my basic pod creation workflow is working fine with both CVM and non-CVM.

@katexochen katexochen force-pushed the feat/mkosi-dmverity-nix branch 2 times, most recently from 6cbbc9f to 988912f Compare December 7, 2023 08:26
Removing the mount units as there currently isn't a way to secure them.

This triggered a bug in kata-agent/ttrpc-rust as the path of the socket
is not created automatically. Adding a workaround to the kata-agent
unit until this is fixed upstream.

Signed-off-by: Paul Meyer <49727155+katexochen@users.noreply.github.com>
Disabling the newer systemd-measure services as they are not needed and
use a lot of PCRs.

Signed-off-by: Paul Meyer <49727155+katexochen@users.noreply.github.com>
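A sketch of how such units can be turned off during the image build; assuming the units in question are the systemd-pcrphase family, which is not confirmed by this PR:

    # Mask the PCR-extension units so they don't consume PCRs we don't rely on.
    systemctl mask systemd-pcrphase.service systemd-pcrphase-sysinit.service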
It is a common task to check the PCRs when using measured boot, so
we are providing an alias to do that with ease.

Signed-off-by: Paul Meyer <49727155+katexochen@users.noreply.github.com>
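For illustration, such an alias could be as simple as the following; the alias name and the use of tpm2-tools are assumptions:

    # e.g. shipped via /etc/profile.d/
    alias pcrs='tpm2_pcrread sha256'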
@bpradipt bpradipt left a comment

/lgtm
Thanks @katexochen
I couldn't test the latest changes due to issues on my side. But based on my previous tests and discussions this looks good.

To include confidential-containers/guest-components#401

Signed-off-by: Paul Meyer <49727155+katexochen@users.noreply.github.com>
@katexochen
Contributor Author

Fixed the commit message, which was too long.

@katexochen katexochen merged commit 0657f2c into confidential-containers:main Dec 13, 2023
16 checks passed
@katexochen katexochen deleted the feat/mkosi-dmverity-nix branch December 13, 2023 16:00