Add first portion of container SSH Daemon #197

sargun · 2018-11-06T20:47:33Z

This PR is a WIP of adding a thing that makes it so we can embed sshd into the container.

codecov · 2018-11-06T20:54:15Z

Codecov Report

Merging #197 into master will increase coverage by 0.78%.
The diff coverage is 77.31%.

@@            Coverage Diff             @@
##           master     #197      +/-   ##
==========================================
+ Coverage   31.77%   32.55%   +0.78%     
==========================================
  Files          66       68       +2     
  Lines        8445     8592     +147     
==========================================
+ Hits         2683     2797     +114     
- Misses       5441     5458      +17     
- Partials      321      337      +16

Impacted Files	Coverage Δ
config/config.go	`97.44% <100%> (+0.34%)`	⬆️
executor/runtime/docker/docker_linux.go	`52.29% <50%> (-0.54%)`	⬇️
executor/runtime/docker/docker_ssh.go	`65.38% <65.38%> (ø)`
executor/runtime/docker/docker_image_pull.go	`73.17% <73.17%> (ø)`
executor/runtime/docker/docker.go	`52.04% <81.15%> (+1%)`	⬆️
executor/runtime/types/types.go	`77.77% <0%> (-2.03%)`	⬇️

coveralls · 2018-11-06T21:06:30Z

Pull Request Test Coverage Report for Build 2472

44 of 238 (18.49%) changed or added relevant lines in 5 files are covered.
3 unchanged lines in 2 files lost coverage.
Overall coverage decreased (-0.1%) to 22.801%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
executor/runtime/docker/docker_linux.go	0	5	0.0%
executor/runtime/docker/docker_image_pull.go	16	58	27.59%
executor/runtime/docker/docker_ssh.go	0	62	0.0%
executor/runtime/docker/docker.go	0	85	0.0%

Files with Coverage Reduction	New Missed Lines	%
executor/runtime/docker/docker.go	1	5.56%
executor/runtime/types/types.go	2	44.09%

Totals
Change from base Build 2367:	-0.1%
Covered Lines:	2501
Relevant Lines:	10969

💛 - Coveralls

sargun · 2018-11-08T01:21:25Z

@rgulewich this is ready-to-review.

gabrielhartmann · 2018-11-08T17:13:52Z

@sargun : could you please provide a little more description about what this PR does? For instance, does this intend to take care of the lifecycle of the systemd services you'll be launching or just provide the capability to launch these services?

This commit adds an embedded ssh daemon per container. It uses an SSH configuration which runs on port 7522 (the configuration is currently embedded in the code). The configuration sets up some principals, based on conventions that our SSH CA uses to specify what account / app / instance a user has access to. The way it actually does this is by creating a Docker container, with a volume that has a relocatable SSH daemon installed within it. This Dockerfile is in the project. Once this volume is mounted within the container, the executor sets up the config file, the principals allowed, and copies the allowed CA list from the host to the container. After this is done, it starts up a templated systemd unit, called "titus-sshd", which performs a custom nsenter, and sets the apparmor profile of the SSH'd users to the same as the container.

sargun · 2018-11-08T17:27:06Z

This commit adds an embedded ssh daemon per container. It uses an SSH
configuration which runs on port 7522 (the configuration is currently
embedded in the code).

The configuration sets up some principals, based on conventions that
our SSH CA uses to specify what account / app / instance a user has
access to.

The way it actually does this is by creating a Docker container, with a
volume that has a relocatable SSH daemon installed within it. This
Dockerfile is in the project. Once this volume is mounted within the
container, the executor sets up the config file, the principals allowed,
and copies the allowed CA list from the host to the container.

After this is done, it starts up a templated systemd unit, called
"titus-sshd", which performs a custom nsenter, and sets the apparmor
profile of the SSH'd users to the same as the container.

sargun · 2018-11-08T17:27:11Z

@gabrielhartmann ^

gabrielhartmann · 2018-11-08T17:31:09Z

Thanks @sargun. So what happens when the executor crashes? Leaked systemd service?

sargun · 2018-11-08T18:20:58Z

Are you asking what happens when the executor crash-stops, or when it gets into undefined state? @gabrielhartmann

gabrielhartmann · 2018-11-08T18:39:32Z

crash-stops. Really my question is, do these systemd services get cleaned up on expected failures?

sargun · 2018-11-08T18:47:25Z

I'm glad you asked.
So, the executor mount /proc/$PID1OFCONTAINER on a directory, which is like /var/lib/titus-inits/%i (where %i is the Task ID)

So, at the (Re)start of the systemd unit, it checks for the presence of the /var/lib/titus-inits/%i/ns (https://github.com/Netflix/titus-executor/pull/197/files#diff-8351ac04b205be9439973e0d6b78e916R3), which checks if the unit is alive.

Since we join the pid namespace of the task using nsenter, the task will actually stop when the pid namespace of the container stops.

So, basically the crash-stop behaviour should be something like this:

Executor crashes
Mesos realizes this, and kills all processes in pid cgroup for the container
The pid namespace of the container gets torn down, because PID 1 in the container's pid namespace exits
All processes (including SSHD) stop
Systemd thinks about restarting the unit, but then realizes that the PID 1 of the container is dead, and since it's an ephemeral / templatized systemd unit, it goes away.

gabrielhartmann · 2018-11-08T19:23:23Z

Does the foo.service file get leaked? I assume it's still on a filesystem if systemd is able "think about" restarting the unit. Not a big deal if it is leaked as they're uniquely named correct?

sargun · 2018-11-08T19:47:13Z

No, it's a template unit. So, it doesn't leak.

…

On Thu, Nov 8, 2018 at 11:23 AM Gabriel Hartmann ***@***.***> wrote: Does the foo.service file get leaked? I assume it's still on a filesystem if systemd is able "think about" restarting the unit. Not a big deal if it is leaked as they're uniquely named correct? — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#197 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAtyRGqZ6btgOgJY43oUpl4fwMqFYBd3ks5utIStgaJpZM4YRPs5> .

sargun · 2018-11-08T20:07:42Z

Templated units by themselves cannot start, they need an "instance" to be started. The instance lifecycle is tied to the lifecycle (whether or not it's up) of the PID1.

rgulewich · 2018-11-09T16:53:03Z

executor/runtime/docker/docker_ssh.go

+		return err
+	}
+
+	caData, err := ioutil.ReadFile(c.Config.ContainerSSHDCAFile)


Is this shared between all of the containers?

So, the configuration can be set at executor start up time to point to a different file. It gets copied into the container at startup time, so if the file changes, the container wont reflect it.

I guess, different question, what is "this" that you refer to?

The CA file that this is pointing to. I'm wondering if it's one CA file used for all ssh containers on the host, or if it's intended to change per SSH container.

It's configurable, but by default it's one CA file for all containers on the host (we can override it via config, or mutable deploy)

rgulewich · 2018-11-09T16:55:53Z

executor/runtime/docker/docker.go

+	noDashes := strings.Replace(r.cfg.ContainerSSHDImage, ":", "-", -1)
+	noAts := strings.Replace(noDashes, "@", "-", -1)
+	noSlashes := strings.Replace(noAts, "/", "_", -1)
+	*containerName = "titus-sshd-" + noSlashes


Why is this based on the image, and not on the ID of the container that it's meant to ssh into?

because this container / volume is shared amongst all containers that refer to the same image. As you can see, it's not deleted / terminated. This way, we can reuse the container for "VolumesFrom"

So the container is shared, but there's a separate running sshd executable per container to ssh into (as per the titus-sshd systemd unit below)?

rgulewich · 2018-11-09T17:46:15Z

@sargun Could you give a quick overview of how IAM accounts, certs, and ssh user accounts tie together? As an app owner, what do I need to configure to get ssh access, and what accounts can I ssh in as?

rgulewich · 2018-11-09T19:18:38Z

executor/runtime/docker/docker.go

+	}
+
+	// Do this with backoff
+	timer := time.NewTicker(100 * time.Millisecond)


Does this ever stop? If the ContainerCreate call above fails for a reason that's not related to another container creation in progress (so it will never be created), what's the behaviour?

The timer itself will stop when we return from this function, see the defer.

The loop itself can terminate under one of two conditions:

The container shows up and responds to inspect

If the context is cancelled. The context descends from:

// Prepare host state (pull image, create fs, create container, etc...) for the container func (r *DockerRuntime) Prepare(parentCtx context.Context, c *runtimeTypes.Container, binds []string) error { // nolint: gocyclo var sshdContainerName string l := log.WithField("taskID", c.TaskID) l.WithField("prepareTimeout", r.dockerCfg.prepareTimeout).Info("Preparing container") ctx, cancel := context.WithTimeout(parentCtx, r.dockerCfg.prepareTimeout) defer cancel()

group, errGroupCtx := errgroup.WithContext(ctx)

So, if the errgroup has an error, the context is cancelled. Alternatively, if the func (r *Runner) prepareContainer(ctx context.Context, updateChan chan update) update times out, it cancels the context.

Also, if we get a message from the master saying to die, it exits.

Perfect, thanks.

sargun · 2018-11-10T00:33:33Z

@rgulewich that's managed by the CA infrastructure. It's...complicated.

rgulewich · 2018-11-12T16:28:58Z

inject/titus-nsenter.c

+		.nstype	= CLONE_NEWUTS,
+		.name	= "uts",
+	},
+	/* Order here is intentional. We do pid NS last. */


Why do we do pid last?

It opens up an exploit path if we do PID before any of the other ones. Specifically, the attack vector is that the container shows up in the PID namespace (/proc/...) in the container before it's changed over all its namespaces to container namespaces.

If someone breaks in using some new vector that we don't know of while this change is happening, they can traverse back up to the host OS's namespaces.

russell-lewis · 2018-11-12T17:49:34Z

executor/runtime/docker/docker_ssh.go

+Subsystem sftp /titus/sshd/usr/lib64/misc/sftp-server
+
+PidFile /run/sshd.pid
+`


sshd config lgtm.

russell-lewis · 2018-11-12T17:55:42Z

executor/runtime/docker/docker_ssh.go

+			fmt.Sprintf("%s:%s:%s", c.TitusInfo.GetAppName(), accountID, c.TaskID),              // key has access to any username for this given app in this given account, with this task ID
+			fmt.Sprintf("%s:%s", c.TitusInfo.GetAppName(), accountID),                           // key has access to any username for this given app in this given account
+			c.TaskID,                                 // key has access to any username on this task ID
+			fmt.Sprintf("%s:%s", username, c.TaskID), // key has access to this given username on this task ID


I'll get back to you on this later today.

sargun requested a review from rgulewich November 6, 2018 20:47

sargun force-pushed the sshd branch from c8f8d7f to f8933f4 Compare November 6, 2018 20:59

sargun force-pushed the sshd branch from 0067647 to 5127390 Compare November 6, 2018 21:32

sargun changed the title ~~Add first portion of container SSHd [WIP]~~ Add first portion of container SSHd Nov 8, 2018

sargun requested a review from gabrielhartmann November 8, 2018 01:21

sargun requested a review from russell-lewis November 8, 2018 01:26

sargun changed the title ~~Add first portion of container SSHd~~ Add first portion of container Nov 8, 2018

Sargun Dhillon and others added 2 commits November 8, 2018 09:20

Add titus-sshd Dockerfile

0608570

sargun changed the title ~~Add first portion of container~~ Add first portion of container SSH Daemon Nov 8, 2018

sargun force-pushed the sshd branch from caf6225 to 9403403 Compare November 8, 2018 17:26

rgulewich reviewed Nov 9, 2018

View reviewed changes

rgulewich reviewed Nov 12, 2018

View reviewed changes

rgulewich approved these changes Nov 12, 2018

View reviewed changes

sargun merged commit ac6235b into master Nov 12, 2018

sargun deleted the sshd branch November 12, 2018 18:11

russell-lewis reviewed Nov 12, 2018

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add first portion of container SSH Daemon #197

Add first portion of container SSH Daemon #197

sargun commented Nov 6, 2018

codecov bot commented Nov 6, 2018 •

edited

Loading

coveralls commented Nov 6, 2018 •

edited

Loading

sargun commented Nov 8, 2018

gabrielhartmann commented Nov 8, 2018

sargun commented Nov 8, 2018

sargun commented Nov 8, 2018

gabrielhartmann commented Nov 8, 2018

sargun commented Nov 8, 2018

gabrielhartmann commented Nov 8, 2018 •

edited

Loading

sargun commented Nov 8, 2018

gabrielhartmann commented Nov 8, 2018

sargun commented Nov 8, 2018 via email

sargun commented Nov 8, 2018

rgulewich Nov 9, 2018

sargun Nov 9, 2018

sargun Nov 9, 2018

rgulewich Nov 9, 2018

sargun Nov 10, 2018

rgulewich Nov 9, 2018

sargun Nov 9, 2018

rgulewich Nov 9, 2018

sargun Nov 10, 2018

rgulewich commented Nov 9, 2018

rgulewich Nov 9, 2018

sargun Nov 10, 2018

rgulewich Nov 12, 2018

sargun commented Nov 10, 2018

rgulewich Nov 12, 2018

sargun Nov 12, 2018

russell-lewis Nov 12, 2018

russell-lewis Nov 12, 2018

Add first portion of container SSH Daemon #197

Add first portion of container SSH Daemon #197

Conversation

sargun commented Nov 6, 2018

codecov bot commented Nov 6, 2018 • edited Loading

Codecov Report

coveralls commented Nov 6, 2018 • edited Loading

Pull Request Test Coverage Report for Build 2472

💛 - Coveralls

sargun commented Nov 8, 2018

gabrielhartmann commented Nov 8, 2018

sargun commented Nov 8, 2018

sargun commented Nov 8, 2018

gabrielhartmann commented Nov 8, 2018

sargun commented Nov 8, 2018

gabrielhartmann commented Nov 8, 2018 • edited Loading

sargun commented Nov 8, 2018

gabrielhartmann commented Nov 8, 2018

sargun commented Nov 8, 2018 via email

sargun commented Nov 8, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rgulewich commented Nov 9, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sargun commented Nov 10, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

codecov bot commented Nov 6, 2018 •

edited

Loading

coveralls commented Nov 6, 2018 •

edited

Loading

gabrielhartmann commented Nov 8, 2018 •

edited

Loading