Skip to content
This repository has been archived by the owner on Jan 10, 2023. It is now read-only.

Add first portion of container SSH Daemon #197

Merged
merged 2 commits into from
Nov 12, 2018
Merged

Add first portion of container SSH Daemon #197

merged 2 commits into from
Nov 12, 2018

Conversation

sargun
Copy link
Contributor

@sargun sargun commented Nov 6, 2018

This PR is a WIP of adding a thing that makes it so we can embed sshd into the container.

@sargun sargun requested a review from rgulewich November 6, 2018 20:47
@codecov
Copy link

codecov bot commented Nov 6, 2018

Codecov Report

Merging #197 into master will increase coverage by 0.78%.
The diff coverage is 77.31%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #197      +/-   ##
==========================================
+ Coverage   31.77%   32.55%   +0.78%     
==========================================
  Files          66       68       +2     
  Lines        8445     8592     +147     
==========================================
+ Hits         2683     2797     +114     
- Misses       5441     5458      +17     
- Partials      321      337      +16
Impacted Files Coverage Δ
config/config.go 97.44% <100%> (+0.34%) ⬆️
executor/runtime/docker/docker_linux.go 52.29% <50%> (-0.54%) ⬇️
executor/runtime/docker/docker_ssh.go 65.38% <65.38%> (ø)
executor/runtime/docker/docker_image_pull.go 73.17% <73.17%> (ø)
executor/runtime/docker/docker.go 52.04% <81.15%> (+1%) ⬆️
executor/runtime/types/types.go 77.77% <0%> (-2.03%) ⬇️

@coveralls
Copy link

coveralls commented Nov 6, 2018

Pull Request Test Coverage Report for Build 2472

  • 44 of 238 (18.49%) changed or added relevant lines in 5 files are covered.
  • 3 unchanged lines in 2 files lost coverage.
  • Overall coverage decreased (-0.1%) to 22.801%

Changes Missing Coverage Covered Lines Changed/Added Lines %
executor/runtime/docker/docker_linux.go 0 5 0.0%
executor/runtime/docker/docker_image_pull.go 16 58 27.59%
executor/runtime/docker/docker_ssh.go 0 62 0.0%
executor/runtime/docker/docker.go 0 85 0.0%
Files with Coverage Reduction New Missed Lines %
executor/runtime/docker/docker.go 1 5.56%
executor/runtime/types/types.go 2 44.09%
Totals Coverage Status
Change from base Build 2367: -0.1%
Covered Lines: 2501
Relevant Lines: 10969

💛 - Coveralls

@sargun sargun changed the title Add first portion of container SSHd [WIP] Add first portion of container SSHd Nov 8, 2018
@sargun
Copy link
Contributor Author

sargun commented Nov 8, 2018

@rgulewich this is ready-to-review.

@gabrielhartmann
Copy link
Contributor

@sargun : could you please provide a little more description about what this PR does? For instance, does this intend to take care of the lifecycle of the systemd services you'll be launching or just provide the capability to launch these services?

@sargun sargun changed the title Add first portion of container SSHd Add first portion of container Nov 8, 2018
Sargun Dhillon and others added 2 commits November 8, 2018 09:20
This commit adds an embedded ssh daemon per container. It uses an SSH
configuration which runs on port 7522 (the configuration is currently
embedded in the code).

The configuration sets up some principals, based on conventions that
our SSH CA uses to specify what account / app / instance a user has
access to.

The way it actually does this is by creating a Docker container, with a
volume that has a relocatable SSH daemon installed within it. This
Dockerfile is in the project. Once this volume is mounted within the
container, the executor sets up the config file, the principals allowed,
and copies the allowed CA list from the host to the container.

After this is done, it starts up a templated systemd unit, called
"titus-sshd", which performs a custom nsenter, and sets the apparmor
profile of the SSH'd users to the same as the container.
@sargun sargun changed the title Add first portion of container Add first portion of container SSH Daemon Nov 8, 2018
@sargun
Copy link
Contributor Author

sargun commented Nov 8, 2018

This commit adds an embedded ssh daemon per container. It uses an SSH
configuration which runs on port 7522 (the configuration is currently
embedded in the code).

The configuration sets up some principals, based on conventions that
our SSH CA uses to specify what account / app / instance a user has
access to.

The way it actually does this is by creating a Docker container, with a
volume that has a relocatable SSH daemon installed within it. This
Dockerfile is in the project. Once this volume is mounted within the
container, the executor sets up the config file, the principals allowed,
and copies the allowed CA list from the host to the container.

After this is done, it starts up a templated systemd unit, called
"titus-sshd", which performs a custom nsenter, and sets the apparmor
profile of the SSH'd users to the same as the container.

@sargun
Copy link
Contributor Author

sargun commented Nov 8, 2018

@gabrielhartmann ^

@gabrielhartmann
Copy link
Contributor

Thanks @sargun. So what happens when the executor crashes? Leaked systemd service?

@sargun
Copy link
Contributor Author

sargun commented Nov 8, 2018

Are you asking what happens when the executor crash-stops, or when it gets into undefined state? @gabrielhartmann

@gabrielhartmann
Copy link
Contributor

gabrielhartmann commented Nov 8, 2018

crash-stops. Really my question is, do these systemd services get cleaned up on expected failures?

@sargun
Copy link
Contributor Author

sargun commented Nov 8, 2018

I'm glad you asked.
So, the executor mount /proc/$PID1OFCONTAINER on a directory, which is like /var/lib/titus-inits/%i (where %i is the Task ID)

So, at the (Re)start of the systemd unit, it checks for the presence of the /var/lib/titus-inits/%i/ns (https://github.com/Netflix/titus-executor/pull/197/files#diff-8351ac04b205be9439973e0d6b78e916R3), which checks if the unit is alive.

Since we join the pid namespace of the task using nsenter, the task will actually stop when the pid namespace of the container stops.

So, basically the crash-stop behaviour should be something like this:

  1. Executor crashes
  2. Mesos realizes this, and kills all processes in pid cgroup for the container
  3. The pid namespace of the container gets torn down, because PID 1 in the container's pid namespace exits
  4. All processes (including SSHD) stop
  5. Systemd thinks about restarting the unit, but then realizes that the PID 1 of the container is dead, and since it's an ephemeral / templatized systemd unit, it goes away.

@gabrielhartmann
Copy link
Contributor

Does the foo.service file get leaked? I assume it's still on a filesystem if systemd is able "think about" restarting the unit. Not a big deal if it is leaked as they're uniquely named correct?

@sargun
Copy link
Contributor Author

sargun commented Nov 8, 2018 via email

@sargun
Copy link
Contributor Author

sargun commented Nov 8, 2018

Templated units by themselves cannot start, they need an "instance" to be started. The instance lifecycle is tied to the lifecycle (whether or not it's up) of the PID1.

return err
}

caData, err := ioutil.ReadFile(c.Config.ContainerSSHDCAFile)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this shared between all of the containers?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, the configuration can be set at executor start up time to point to a different file. It gets copied into the container at startup time, so if the file changes, the container wont reflect it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess, different question, what is "this" that you refer to?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CA file that this is pointing to. I'm wondering if it's one CA file used for all ssh containers on the host, or if it's intended to change per SSH container.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's configurable, but by default it's one CA file for all containers on the host (we can override it via config, or mutable deploy)

noDashes := strings.Replace(r.cfg.ContainerSSHDImage, ":", "-", -1)
noAts := strings.Replace(noDashes, "@", "-", -1)
noSlashes := strings.Replace(noAts, "/", "_", -1)
*containerName = "titus-sshd-" + noSlashes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this based on the image, and not on the ID of the container that it's meant to ssh into?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because this container / volume is shared amongst all containers that refer to the same image. As you can see, it's not deleted / terminated. This way, we can reuse the container for "VolumesFrom"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the container is shared, but there's a separate running sshd executable per container to ssh into (as per the titus-sshd systemd unit below)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

@rgulewich
Copy link
Contributor

@sargun Could you give a quick overview of how IAM accounts, certs, and ssh user accounts tie together? As an app owner, what do I need to configure to get ssh access, and what accounts can I ssh in as?

}

// Do this with backoff
timer := time.NewTicker(100 * time.Millisecond)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this ever stop? If the ContainerCreate call above fails for a reason that's not related to another container creation in progress (so it will never be created), what's the behaviour?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The timer itself will stop when we return from this function, see the defer.

The loop itself can terminate under one of two conditions:

  1. The container shows up and responds to inspect
  2. If the context is cancelled. The context descends from:
// Prepare host state (pull image, create fs, create container, etc...) for the container
func (r *DockerRuntime) Prepare(parentCtx context.Context, c *runtimeTypes.Container, binds []string) error { // nolint: gocyclo
	var sshdContainerName string
	l := log.WithField("taskID", c.TaskID)
	l.WithField("prepareTimeout", r.dockerCfg.prepareTimeout).Info("Preparing container")

	ctx, cancel := context.WithTimeout(parentCtx, r.dockerCfg.prepareTimeout)
	defer cancel()

group, errGroupCtx := errgroup.WithContext(ctx)

So, if the errgroup has an error, the context is cancelled. Alternatively, if the func (r *Runner) prepareContainer(ctx context.Context, updateChan chan update) update times out, it cancels the context.

Also, if we get a message from the master saying to die, it exits.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect, thanks.

@sargun
Copy link
Contributor Author

sargun commented Nov 10, 2018

@rgulewich that's managed by the CA infrastructure. It's...complicated.

.nstype = CLONE_NEWUTS,
.name = "uts",
},
/* Order here is intentional. We do pid NS last. */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we do pid last?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It opens up an exploit path if we do PID before any of the other ones. Specifically, the attack vector is that the container shows up in the PID namespace (/proc/...) in the container before it's changed over all its namespaces to container namespaces.

If someone breaks in using some new vector that we don't know of while this change is happening, they can traverse back up to the host OS's namespaces.

@sargun sargun merged commit ac6235b into master Nov 12, 2018
@sargun sargun deleted the sshd branch November 12, 2018 18:11
Subsystem sftp /titus/sshd/usr/lib64/misc/sftp-server

PidFile /run/sshd.pid
`

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sshd config lgtm.

fmt.Sprintf("%s:%s:%s", c.TitusInfo.GetAppName(), accountID, c.TaskID), // key has access to any username for this given app in this given account, with this task ID
fmt.Sprintf("%s:%s", c.TitusInfo.GetAppName(), accountID), // key has access to any username for this given app in this given account
c.TaskID, // key has access to any username on this task ID
fmt.Sprintf("%s:%s", username, c.TaskID), // key has access to this given username on this task ID

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll get back to you on this later today.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants