Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Linux containers on FreeBSD #7000

Merged
merged 1 commit into from Jun 9, 2022

Conversation

akhramov
Copy link
Contributor

This allows running Linux containers on FreeBSD and modifies the
mounts so that they represent the linux emulated filesystems, as per:
https://wiki.freebsd.org/LinuxJails

Supersedes #5480 per convo with @gizahNL in OCI Slack chat

@k8s-ci-robot
Copy link

Hi @akhramov. Thanks for your PR.

I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

oci/spec_opts.go Outdated
Comment on lines 380 to 387
//FreeBSD has support for running Linux binaries via syscall redirection, while that would be enough
//for some binaries others depend on emulated linux file systems (procfs & sysfs).
//So when we are running FreeBSD and our image OS is Linux we modify our mounts to mount the emulated
//filesystems.
//We do it here since here we have access to both the OCI spec & the image OS.
if runtime.GOOS == "freebsd" && ociimage.OS == "linux" {
freebsdLinuxEmulationMounts(s)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Carrying over the comment I left on the previous PR:

I'd like to see this extracted out into a function like appendOSMounts, which is then separately defined in each of spec_opts_freebsd.go/spec_opts_linux.go/spec_opts_windows.go with whatever content is appropriate there. The Linux one would likely be empty, but I imagine there could be some use for this on Windows with the LCOW/WCOW work.

In addition to providing a possibly-reusable abstraction for FreeBSD and Windows to achieve similar goals, an advantage is keeping FreeBSD-specific filesystems like "linprocfs" and "linsysfs" out of the common file that is compiled for all operating systems.

I'd also like to see a WithLinuxEmulationMounts functional option that clients of containerd can use without having to use WithImageConfig/WithImageConfigArgs or even having an image with a declared OS of "linux".

If you're not interested in doing that work, I can pick up and carry this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to see this extracted out into a function like appendOSMounts

That makes a lot of sense, implemented. As for WithLinuxEmulationMounts I'm not that familiar with the codebase yet, so I would defer it to you

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That works for me, I can pick up WithLinuxEmulationMounts in a follow-up PR.

// The Variant field will be empty if arch != ARM.
Variant: cpuVariant(),
})
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can you add a newline at the end of this file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch, thanks!

@@ -377,6 +377,7 @@ func WithImageConfigArgs(image Image, args []string) SpecOpts {
return fmt.Errorf("unknown image config media type %s", ic.MediaType)
}

appendOSMounts(s, ociimage.OS)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the linter is failing here because it's also attempting to lint & compile on MacOS (darwin), where this new function is not defined. My apologies for not realizing that in my original suggestion, but I think you can fix this by either (a) adding a new spec_opts_notfreebsd.go and moving the definition for everything not-FreeBSD there (including Windows and Linux) or (b) additionally adding a spec_opts_darwin.go for MacOS.

@akhramov akhramov force-pushed the freebsd_linux_containers branch 2 times, most recently from 92f180e to e72aeaf Compare May 31, 2022 07:58
Copy link
Member

@samuelkarp samuelkarp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for continuing to work with me on this! I've got a few more questions and suggestions after looking at the code a bit more closely.

oci/spec_opts_freebsd.go Outdated Show resolved Hide resolved
Destination: "/proc",
Type: "linprocfs",
Source: "linprocfs",
Options: []string{},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the in-place case above, we're adding "nosuid" and "noexec" as options, but in this case we're not. Is there any particular reason for the difference?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No reason, we should have nosuid & noexec in both ases.

Comment on lines 77 to 78
mounts = append(mounts, specs.Mount{
Destination: "/sys",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we handle /sys in-place like we're doing for /proc and /dev/fd too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, but that is less relevant with the new approach I have taken. I will implement this if the approach does not work.


if !haveDevFd {
mounts = append(mounts, specs.Mount{
Destination: "/dev/fd",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also be adding devfs for /dev?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect we already have both from the default mounts:

func defaultMounts() []specs.Mount {
return []specs.Mount{
{
Destination: "/dev",
Type: "devfs",
Source: "devfs",
Options: []string{"ruleset=4"},
},
{
Destination: "/dev/fd",
Type: "fdescfs",
Source: "fdescfs",
Options: []string{},
},
}
}

@akhramov
Copy link
Contributor Author

@samuelkarp I drastically simplified the code. Let me know if I'm missing something. The reasoning is below.


We should already have /dev & /dev/fd as default mounts:

func defaultMounts() []specs.Mount {
return []specs.Mount{
{
Destination: "/dev",
Type: "devfs",
Source: "devfs",
Options: []string{"ruleset=4"},
},
{
Destination: "/dev/fd",
Type: "fdescfs",
Source: "fdescfs",
Options: []string{},
},
}
}

We need to provide /sys, /proc, and /dev/shm per https://wiki.freebsd.org/LinuxJails and https://github.com/opencontainers/runtime-spec/blob/1c3f411f041711bbeecf35ff7e93461ea6789220/config-linux.md#default-filesystems

/sys and /proc are backed by their corresponding file-systems, which I include in appendOSMounts

/dev/shm is backed by the linuxulator kernel module (albeit the device is not whitelisted by devfs ruleset 4, but that's a separate issue).

This allows running Linux containers on FreeBSD and modifies the
mounts so that they represent the linux emulated filesystems, as per:
https://wiki.freebsd.org/LinuxJails

Co-authored-by: Gijs Peskens <gijs@peskens.net>, Samuel Karp <samuelkarp@users.noreply.github.com>
Signed-off-by: Artem Khramov <akhramov@pm.me>
// Default returns the default matcher for the platform.
func Default() MatchComparer {
return Ordered(DefaultSpec(), specs.Platform{
OS: "linux",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because most containers image are for Linux today?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is ordered with default (FreeBSD) first and Linux as a fallback.

@dmcgowan dmcgowan added this to Ready For Review in Code Review Jun 1, 2022
@akhramov
Copy link
Contributor Author

akhramov commented Jun 4, 2022

Alright, CI seems green 🎉 It seems like we are good to go.

@samuelkarp is there any items I missed?

@samuelkarp
Copy link
Member

@akhramov The previous approach modified the /proc and /dev/fd mounts in-place if they existed rather than appending them. Did you change to append unconditionally because you'd expect the mounts at the end of the spec to hide any previous mount to those mountpoint and the hidden mount is irrelevant?

@samuelkarp
Copy link
Member

Aside from the question about mounts, I've tested this and it works!

$ sudo ctr run --rm --runtime wtf.sbk.runj.v1 --tty --snapshotter zfs docker.io/library/alpine:latest test sh -c 'cat /etc/os-release && uname -a'
NAME="Alpine Linux"
ID=alpine
VERSION_ID=3.16.0
PRETTY_NAME="Alpine Linux v3.16"
HOME_URL="https://alpinelinux.org/"
BUG_REPORT_URL="https://gitlab.alpinelinux.org/alpine/aports/-/issues"
Linux 3.17.0 FreeBSD 13.1-RELEASE releng/13.1-n250148-fc952ac2212 GENERIC x86_64 Linux

@akhramov
Copy link
Contributor Author

akhramov commented Jun 7, 2022

@samuelkarp

Did you change to append unconditionally because you'd expect the mounts at the end of the spec to hide any previous mount to those mountpoint and the hidden mount is irrelevant?

Not really, but it is up to the runtime (us) to create mountpoints at these locations. Since we don't have these as default mountpoints, I don't expect them to exist at all. Uh, I hope I'm not missing anything.

@samuelkarp
Copy link
Member

samuelkarp commented Jun 7, 2022

Since we don't have these as default mountpoints, I don't expect them to exist at all.

Mounts can be arbitrarily defined in the runtime configuration which is an argument to the functional option returned by WithImageConfigArgs (see the signature func(ctx context.Context, client Client, c *containers.Container, s *Spec) error; the s *Spec is an arbitrary configuration which may already have mounts defined). (It's worth keeping in mind that containerd's primary interface/first-class interface is the thick client library of which this is a part, which then interacts with the daemon over a gRPC API. Clients like ctr and firecracker-containerd and so forth then use this library.)

@akhramov
Copy link
Contributor Author

akhramov commented Jun 7, 2022

@samuelkarp got it, thanks. So what should happen if a client overrides a default mountpoint?

@samuelkarp
Copy link
Member

A client can choose which functional options to apply (including ones it supplies itself) and can also directly manipulate the object; ultimately the *Spec at the end of those functional options being applied is what gets sent to the underlying runtime (typically runc on Linux and runj on FreeBSD). The question here is: what behavior is appropriate for this functional option? I think it's probably fine for it to append-only and cause any prior mounts at that mountpoint to be shadowed (and on Linux I'd be reasonably confident of the behavior), but I'm not sure if there are footguns I don't know about on FreeBSD (and I think you know more than me). Specific areas I'd be concerned about include performance and what the behavior might be if a shadowed mount was a network filesystem that went away during the lifetime of the container.

@akhramov
Copy link
Contributor Author

akhramov commented Jun 8, 2022

@samuelkarp I think the shadowing behavior is pretty much the same on FreeBSD as on Linux.

@emaste, can you kindly confirm?

Copy link
Member

@samuelkarp samuelkarp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @akhramov! If the shadowing behavior is roughly the same, I think we can go ahead and merge this and adjust to the more-complicated replace-in-place logic if we discover it being a problem.

LGTM!

@emaste
Copy link

emaste commented Jun 8, 2022

@samuelkarp I think the shadowing behavior is pretty much the same on FreeBSD as on Linux.

@emaste, can you kindly confirm?

That is my expectation. I asked some folks who would know for certain, but haven't heard back.

@samuelkarp
Copy link
Member

@emaste Thanks!

@akhramov and @gizahNL Thank you so much for working on this and pushing through! Now that containerd/project#90 is approved I'll go ahead and merge this myself. 😄

@samuelkarp samuelkarp merged commit 2b4b0cf into containerd:main Jun 9, 2022
Code Review automation moved this from Ready For Review to Done Jun 9, 2022
freebsd-git pushed a commit to freebsd/freebsd-ports that referenced this pull request Jun 11, 2022
@kbruner kbruner mentioned this pull request Sep 7, 2022
@dmcgowan dmcgowan added this to the 1.7 milestone Oct 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Development

Successfully merging this pull request may close these issues.

None yet

7 participants