run singularity in (unprivileged) k8s pod #5857
Comments
Related to #5806 and, possibly, #2397? If there is another way to do this which doesn't use the setuid approach, that would be even better. I would have thought the setuid mode would be more likely to work than the user namespace mode.
It'd be expected that the
Marking this as a 'wontfix' per the above, but pinging @cclerget just in case he has a different viewpoint on it.
Hi Ryan,
The remaining problem, which I have been trying for over a year to find a Kubernetes admin to help me test, is that the Singularity -p option to give an unprivileged PID namespace is disabled in Docker and Kubernetes by default. The Kubernetes option is documented as being in the PodSecurityPolicy, under allowedProcMountTypes: Unmasked. I referred to this a couple of days ago in #5454.
@DrDaveD I'm a kubernetes admin :) I am very glad to finally have a working recipe for unprivileged Singularity in k8s pods; the part I had been missing, and still need to narrow down more, is defining the seccomp profile needed for Singularity to work - this should be easy if it only needs the unshare system call, as you mentioned in the other issue.
What is the advantage of using -p? I don't think a separate PID namespace is essential for my use case. Trying that I get
Setting this in the psp:
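Based on the allowedProcMountTypes mention above, the PSP change is presumably of this shape (a sketch, not the exact snippet from the comment; the policy name and the surrounding fields are illustrative):

```yaml
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: unprivileged-singularity   # hypothetical name
spec:
  # allow pods to request an unmasked /proc
  allowedProcMountTypes: ["Default", "Unmasked"]
  # the remaining fields are whatever the existing (restrictive) policy uses, e.g.:
  runAsUser:
    rule: MustRunAsNonRoot
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  volumes: ["configMap", "secret", "emptyDir", "projected", "downwardAPI"]
```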
and procMount in the container securityContext. It took some digging, but I found that the ProcMountType feature gate is also needed.
Regarding a seccomp profile that allows Singularity to work, I tried applying this file https://github.com/moby/moby/blob/master/profiles/seccomp/default.json#L384 with 'unshare' included on line 384 as unconditionally allowed, but ran into this:
so there must be a number of additional syscalls that are needed. I think I'm getting them logged to the audit logs using SCMP_ACT_LOG, but it could take a lot of digging to enumerate all the required ones.
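For reference, the logging approach works because the default action in moby's default.json is an errno-returning deny (SCMP_ACT_ERRNO); switching just that field to SCMP_ACT_LOG means calls outside the allow-list are permitted but written to the audit log, so the missing syscalls can be enumerated without breaking the workload. A minimal sketch of the one changed field:

```json
{
  "defaultAction": "SCMP_ACT_LOG"
}
```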
The singularity -p option is essential for complete isolation between unrelated payloads under a pilot job. We always use singularity -cip for isolation. That kubernetes feature gate you found is what the admin of the OSG service kubernetes cluster is up to, and I'm waiting on him to get around to it. It would be great if you could try it in the meanwhile. I was aware that unshare is only the first system call that singularity needs which is blocked by the default docker/kubernetes seccomp profile. @jthiltges made a complete profile, although I don't know where it is. In my opinion, it is much better to use
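For reference, the -cip flags mentioned above are the short forms of three isolation options; an illustrative invocation (the image path here is only a placeholder, not from this thread):

```shell
# --contain (-c): use minimal /dev and empty writable directories instead of sharing from the host
# --ipc (-i):     run in a new IPC namespace
# --pid (-p):     run in a new PID namespace
singularity exec -cip /path/to/payload.sif /bin/true
```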
With Docker, adding clone, mount, setns, and unshare seemed sufficient to get Singularity running. This gist has the seccomp changes I'd used for testing: https://gist.github.com/jthiltges/02f93509bd92f3fc9a276bbc2e966d35/revisions
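The gist is not reproduced here, but the shape of the change is an extra entry in the syscalls allow-list of Docker's default profile, roughly like this (a sketch in the moby seccomp JSON format; it would be appended to the existing "syscalls" array rather than used on its own):

```json
{
  "names": ["clone", "mount", "setns", "unshare"],
  "action": "SCMP_ACT_ALLOW"
}
```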
@jthiltges thanks for the pointer, I'll take a look!
One more thing which probably goes without saying, but for completeness: untrusted code needs to be started as an unprivileged user, not as a fake root user.
Fake root (user namespace ID remapping) is not supported in kubernetes yet anyway. I enabled the ProcMountType=true feature gate and applied the same YAML changes described above, this time with no PSP complaints. kubectl get pod -o yaml shows "procMount: Unmasked", but using the -p Singularity option still returns

From my point of view the application is completely contained inside the pod. ATLAS' use of Singularity to protect different parts of the workload from each other inside the pod is a separate matter; I am not sure if there is a firm policy on that.
genuinetools/img#212 suggests that unmasked may only work with containerd?
Hi @rptaylor,
You also need to allow
That's an interesting and potentially helpful thread you found. As I read it, though, it wasn't working with containerd either in the end, although it seemed to get further. Can you still see the /proc mask mounts it refers to inside your pod, in /proc/mounts? A year and a half has elapsed since that thread; please list your software versions too for the record. It's not clear to me if the person testing was using a new enough version of kubernetes; it was apparently a new feature in kubernetes 1.13 according to this thread. That thread notes that the PSP only allows it; it also has to be enabled in the container spec. Did you include that? The example in the img thread includes
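The container-level counterpart is the procMount field in the container's securityContext, presumably something of this shape (a sketch; the pod and container names are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: singularity-test   # hypothetical name
spec:
  containers:
  - name: main
    image: git.computecanada.ca:4567/rptaylor/misc/atlas-grid-centos7-singbuild
    securityContext:
      # requires the ProcMountType feature gate and a PSP that allows Unmasked
      procMount: Unmasked
```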
When I use docker-ce-20.10.3-3.el7, API version 1.41 (seen with "docker version"), I see those mounts under /proc until I add
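Judging from the systempaths=unconfined feature mentioned in the next comment, the Docker option being referred to is presumably along these lines (a sketch for plain Docker, outside kubernetes):

```shell
# run the image with Docker's /proc (and /sys) path masking disabled,
# then look at what is mounted over /proc
docker run --rm -it \
  --security-opt systempaths=unconfined \
  git.computecanada.ca:4567/rptaylor/misc/atlas-grid-centos7-singbuild \
  cat /proc/mounts
```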
Yes, my earlier comment showed that: #5857 (comment)
/proc/mounts is
k8s v1.19.7. Perhaps the kubernetes implementation of this alpha feature gate has not been updated since the new systempaths=unconfined Docker feature was added. Anyway, dockershim is deprecated; it will probably eventually work with containerd. Hopefully the k8s feature will reach beta too.
@rptaylor - having a bit of a difficult time following the whole thread. Were you successful in the end?
Please, what was the resolution and what steps were taken?
I think the only thing needed to make unprivileged Singularity work in unprivileged k8s pods is using an unconfined seccomp profile, or otherwise allowing the various syscalls that would otherwise be blocked by the default docker seccomp profile (if your PSP applies seccomp). Also, the nodes are EL8, which may have something to do with it; it might also work on EL7 if you enable the max user namespaces sysctl. However, full PID isolation (the stuff about procMount unmasked) does not seem to work currently, at least not with Docker.
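For anyone trying this on EL7 nodes, the sysctl referred to is user.max_user_namespaces, which is typically 0 there by default; a minimal sketch (the limit value is arbitrary):

```shell
# allow unprivileged user namespaces on an EL7 node
echo "user.max_user_namespaces = 15000" > /etc/sysctl.d/90-userns.conf
sysctl -p /etc/sysctl.d/90-userns.conf
```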
@rptaylor - any luck with this? I was eventually able to replicate all the steps you did and hit the same problem. It looks like dockershim expects to pass the
I wonder if the problem is in this line of kubernetes code and the fact that it appears to only set MaskedPaths if
I'm not sure, but it would make more sense to try this with containerd, since dockershim is deprecated.
It works well enough for now; it would be interesting to try further with containerd in the future. In any case this is a useful Singularity-related discussion, but not a Singularity issue per se.
Hi all -- Got it working! The recipe is:
(you can do as @jthiltges did and develop a more refined seccomp profile if you'd like.) Verify that the created pod has
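A related check, for the masked /proc submounts discussed earlier in the thread (a sketch):

```shell
# inside the pod: with a fully unmasked /proc there should be no extra
# proc or tmpfs entries mounted over subdirectories of /proc
grep " /proc/" /proc/mounts
```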
SADLY it appears that the nvidia container runtime violates item (3). Without a GPU assigned:
(and Singularity works with PID namespaces). With a GPU assigned:
(and the kernel effectively considers
That's great news, Brian! Can you give any more details for the record about how exactly to "Enable the feature gate for unmasked proc in the kube apiserver"? |
In order to enable the feature gate, I had to add a command line flag to the kube-apiserver pod. In my on-prem cluster (deployed via kubeadm), the flag looks like this:
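For a kubeadm-deployed control plane that means editing the static pod manifest at /etc/kubernetes/manifests/kube-apiserver.yaml; given the feature gate named earlier in the thread, the flag is presumably of this form (a sketch, not the exact line from the comment):

```yaml
# /etc/kubernetes/manifests/kube-apiserver.yaml (excerpt)
spec:
  containers:
  - name: kube-apiserver
    command:
    - kube-apiserver
    - --feature-gates=ProcMountType=true
    # ...the cluster's existing flags remain unchanged
```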
I am trying to figure out how to use Singularity inside a k8s pod. If the pod is privileged it works, but I want to make it more secure and use a non-privileged pod. My first attempt is based on using a setuid installation of Singularity.
I have done the following:
- built Singularity 3.7.1 for CentOS7 and installed it into a container image (git.computecanada.ca:4567/rptaylor/misc/atlas-grid-centos7-singbuild)

Nevertheless I am running into issues, I think because Singularity perhaps does not expect to already be running inside a container namespace.
Version of Singularity:
3.7.1
Expected behavior
singularity could start a (nested) container inside a k8s pod.
Actual behavior
The last line is a debug message I added which confirms the error is occurring here: https://github.com/hpcng/singularity/blob/master/cmd/starter/c/starter.c#L550
Perhaps when it tries to read "ns/mnt" it is not the right mount namespace?
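Speculating on a way to check that from inside the pod, one could compare the mount namespace links directly; a sketch using standard procfs paths (none of this is from the issue itself):

```shell
# the mount namespace of the current shell inside the pod
readlink /proc/self/ns/mnt
# the mount namespace of the pod's init process (PID 1)
readlink /proc/1/ns/mnt
# if the two differ, starter's read of ns/mnt may be resolving a different
# namespace than the one it expects
```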
Steps to reproduce this behavior
Run this container image on kubernetes (I can provide kubectl access to the pod if needed)
What OS/distro are you running
The kubelet node is CentOS8 and the container image is based on CentOS7.
How did you install Singularity
Built 3.7.1 for CentOS7 and installed into container image.