The document describes a runner for executing asynchronous job piles. The jobs are run by the daemon, known as piled, while the management tasks of starting/stopping/monitoring are carried out using a command-line tool, pilec.
piled is a daemon that manages and supervises the jobs.
The daemon can be started as a systemd unit:
[Unit]
Description=piled daemon
[Service]
ExecStart=/usr/bin/piled --address localhost:9999 --config /etc/piled/daemon.yaml
Delegate=true
Or through systemd-run:
$ sudo systemd-run -p Delegate=true -t --wait -- ./piled --config $PWD/server.yaml
The daemon supports the fundamental operations of starting a job, stopping it, observing its output, and querying its status.
Each job is assigned an ID by the daemon, in the format pile-<uuid>, which is returned to the client in the start response. The job ID must be used in all subsequent requests for that job.
The daemon uses a simple config file:
# server's certificate and key; either absolute paths or paths relative to
# this config file
cert: certs/server-cert.pem
key: certs/server-key.pem
# CA certificate used, in addition to the system pool, to verify client certificates
ca-cert: certs/ca-cert.pem
# address to listen on, in a format accepted by URL parsing; the hostname is ignored
listen-address: tcp://localhost:9999
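For illustration, here is a minimal sketch of how such a config could be loaded, assuming a Go implementation and gopkg.in/yaml.v3; the struct and field names are assumptions, not piled's actual types:

package main

import (
	"fmt"
	"os"

	"gopkg.in/yaml.v3"
)

// serverConfig mirrors the example config above; names are assumed for illustration.
type serverConfig struct {
	Cert          string `yaml:"cert"`           // server certificate (absolute or relative to the config file)
	Key           string `yaml:"key"`            // server private key
	CACert        string `yaml:"ca-cert"`        // additional CA certificate
	ListenAddress string `yaml:"listen-address"` // e.g. tcp://localhost:9999
}

func loadServerConfig(path string) (*serverConfig, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	var cfg serverConfig
	if err := yaml.Unmarshal(data, &cfg); err != nil {
		return nil, err
	}
	return &cfg, nil
}

func main() {
	cfg, err := loadServerConfig("/etc/piled/daemon.yaml")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(cfg.ListenAddress)
}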
Jobs run as pid 1 in their own PID namespace. Note that a new proc filesystem is mounted on top of /proc, hence other processes are not trivially discoverable.
The jobs are run in a separate network namespace. However, no means of network configuration are provided by piled. It is up to the job to set the network up as desired.
The jobs are run in a separate mount namespace. For the purpose of mounting a new /proc, the propagation of the / mount is changed recursively to private. Thus, changes to mounts are not propagated from the job's mount namespace to the parent, nor in the reverse direction.
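To make the isolation above concrete, here is a rough Go sketch under the same assumptions (new PID, mount and network namespaces, / made recursively private, a fresh /proc mounted before exec'ing the job); it is not the actual piled code:

package main

import (
	"log"
	"os"
	"os/exec"
	"syscall"
)

func main() {
	if len(os.Args) > 1 && os.Args[1] == "shim" {
		// child: pid 1 of the new PID namespace; isolate the mount table,
		// mount a fresh /proc and exec into the job
		if err := syscall.Mount("", "/", "", syscall.MS_REC|syscall.MS_PRIVATE, ""); err != nil {
			log.Fatal(err)
		}
		if err := syscall.Mount("proc", "/proc", "proc", 0, ""); err != nil {
			log.Fatal(err)
		}
		log.Fatal(syscall.Exec(os.Args[2], os.Args[2:], os.Environ()))
	}

	// parent: re-exec ourselves as the shim inside fresh PID, mount and network namespaces
	cmd := exec.Command("/proc/self/exe", append([]string{"shim"}, os.Args[1:]...)...)
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Cloneflags: syscall.CLONE_NEWPID | syscall.CLONE_NEWNS | syscall.CLONE_NEWNET,
	}
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	if err := cmd.Run(); err != nil {
		log.Fatal(err)
	}
}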
The piled runner manages cgroups by itself. Because of this, it is assumed that piled is executed under a systemd scope/service with delegation turned on (e.g. Delegate=true in the unit properties). This is mandatory so that piled does not interfere with systemd's cgroup management and vice versa. See https://systemd.io/CGROUP_DELEGATION/ for details.
The daemon sets up a hierarchy as follows, where piled.service is a delegation point from systemd.
-- piled.service -- runner       -- 1234 (piled)
                 \- pile-<uuid>  - 9999 (job1)
                 \- pile-<uuid>  - 12341 (job2)
                 \- pile-<uuid>  - 3333 (job3)
The leaf groups are prepared and their resource limits configured by the runner. Processes are placed into their cgroups through a shim whose job is to move the process into the leaf group and then exec into the provided command.
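A hypothetical sketch of that shim step, assuming the leaf group path is passed as the first argument; the real shim may look different:

package main

import (
	"log"
	"os"
	"path/filepath"
	"strconv"
	"syscall"
)

func main() {
	// usage (assumed): shim <leaf-cgroup-dir> <command> [args...]
	leaf, command := os.Args[1], os.Args[2:]

	// move ourselves into the leaf group prepared by the runner
	pid := []byte(strconv.Itoa(os.Getpid()))
	if err := os.WriteFile(filepath.Join(leaf, "cgroup.procs"), pid, 0644); err != nil {
		log.Fatal(err)
	}

	// replace the shim with the job; cgroup membership is inherited across exec
	log.Fatal(syscall.Exec(command[0], command, os.Environ()))
}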
Communication with piled is done only over TCP INET sockets. Other means of communication are not supported.
Connections are authenticated using mTLS, with TLS pinned to version 1.3. Upon connection, the server verifies the client's certificate using the system CA pool and any additional CA certificates provided in the configuration. See the limitations section for further discussion.
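A sketch of what the server-side TLS setup could look like under these requirements, using Go's crypto/tls (file names taken from the example config, everything else assumed):

package main

import (
	"crypto/tls"
	"crypto/x509"
	"log"
	"os"
)

func serverTLSConfig(certFile, keyFile, caFile string) (*tls.Config, error) {
	cert, err := tls.LoadX509KeyPair(certFile, keyFile)
	if err != nil {
		return nil, err
	}
	pool, err := x509.SystemCertPool()
	if err != nil {
		return nil, err
	}
	caPEM, err := os.ReadFile(caFile)
	if err != nil {
		return nil, err
	}
	pool.AppendCertsFromPEM(caPEM)
	return &tls.Config{
		Certificates: []tls.Certificate{cert},
		ClientCAs:    pool,                           // system pool plus the configured CA
		ClientAuth:   tls.RequireAndVerifyClientCert, // mTLS: clients must present a valid certificate
		MinVersion:   tls.VersionTLS13,
		MaxVersion:   tls.VersionTLS13,
	}, nil
}

func main() {
	cfg, err := serverTLSConfig("certs/server-cert.pem", "certs/server-key.pem", "certs/ca-cert.pem")
	if err != nil {
		log.Fatal(err)
	}
	_ = cfg // would typically be wrapped in gRPC transport credentials
}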
Authorization is based on simple tokens. The token could be a JWT with scopes that identify which operations are allowed. For simplicity, the code uses hardcoded strings, with two known tokens: one for read-only access (output, status) and one allowing read-write access (start, stop, output, status). The token is included in each request. The demo code does not provide any means of 'generating' the tokens.
In theory, if Unix socket communication were supported, the authorization scheme could be extended to observe peercred and allow read-write access for all root-owned processes.
Jobs are assumed to complete at some point. When a job is requested to stop, for simplicity its whole cgroup is killed through cgroup.kill, so that there are no runaway processes (with limitations as described at the end of the document).
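As a sketch of the stop path under the hierarchy shown earlier (paths assumed), killing the whole group boils down to a single write:

package main

import (
	"log"
	"os"
	"path/filepath"
)

// killJobCgroup writes "1" into the group's cgroup.kill, which sends SIGKILL
// to every process in that cgroup (cgroup v2).
func killJobCgroup(delegatedRoot, jobCgroup string) error {
	return os.WriteFile(filepath.Join(delegatedRoot, jobCgroup, "cgroup.kill"), []byte("1"), 0644)
}

func main() {
	// the delegated root and group name are assumptions based on the hierarchy above
	if err := killJobCgroup("/sys/fs/cgroup/piled.service", "pile-<uuid>"); err != nil {
		log.Fatal(err)
	}
}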
When a job is started, its output is redirected to a file derived from the job's name. piled maintains a storage location under /tmp/piled/output/. There is a single output file for each job, collecting both stdout and stderr (similarly to syslog, the journal, etc.). Stdin is connected to /dev/null. When streaming, the job's output is delivered in chunks of arbitrary size; the client is expected to print the chunks directly to stdout (similarly to what journalctl would do) without any formatting or line separation.
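A rough sketch of that output wiring, with the file naming and helper names assumed for illustration:

package main

import (
	"log"
	"os"
	"os/exec"
	"path/filepath"
)

// startWithOutput wires a job's stdio as described above: stdout and stderr
// share one file under /tmp/piled/output/ and stdin is /dev/null. The exact
// file name derivation is an assumption.
func startWithOutput(jobID string, argv []string) (*exec.Cmd, error) {
	if err := os.MkdirAll("/tmp/piled/output", 0700); err != nil {
		return nil, err
	}
	out, err := os.OpenFile(filepath.Join("/tmp/piled/output", jobID),
		os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0600)
	if err != nil {
		return nil, err
	}
	devnull, err := os.Open(os.DevNull)
	if err != nil {
		return nil, err
	}
	cmd := exec.Command(argv[0], argv[1:]...)
	cmd.Stdin = devnull
	cmd.Stdout, cmd.Stderr = out, out // a single file collects both streams
	return cmd, cmd.Start()
}

func main() {
	if _, err := startWithOutput("pile-example", []string{"/usr/bin/echo", "hello from job"}); err != nil {
		log.Fatal(err)
	}
}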
No assumptions should be made about PATH or any other environment variable. Jobs are assumed to be specified using full command paths. Should there be a need to set specific environment bits, the command needs to be wrapped in a shell, e.g. `sh -c 'export FOO=bar; exec myprogram'`
Example usage. Start a trivial job:
$ ./pilec --config client.yaml \
start -- /usr/bin/bash -c 'echo Hello from job'
pile.c74cb008-9b63-439c-a158-12cedcf41733
Start a long running job:
$ ./pilec --config client.yaml \
start -- /usr/bin/bash -c 'while true; do echo "$(date) -- log"; sleep 5; done'
pile.7954ce77-4b7f-4fed-b6c6-0868174bf6dd
Query the status of a job:
$ ./pilec --config client.yaml \
status pile.7954ce77-4b7f-4fed-b6c6-0868174bf6dd
active
$ ./pilec --config client.yaml \
status pile.4fc399e9-d813-4df2-b33d-24f876aa407e
exited
$ ./pilec --config client.yaml \
status pile.15343990-15aa-4d80-a82b-8896ab1e1de9
failed (status=3)
See the output of a job:
$ ./pilec --config client.yaml \
output pile.7954ce77-4b7f-4fed-b6c6-0868174bf6dd
Fri Apr 8 20:38:07 CEST 2022 -- log
Fri Apr 8 20:38:12 CEST 2022 -- log
Fri Apr 8 20:38:17 CEST 2022 -- log
...<output continues>
Stop a job:
$ ./pilec --config client.yaml stop pile.7954ce77-4b7f-4fed-b6c6-0868174bf6dd
stopped (status=0)
pilec uses a simple config file:
# client's certificate and key
cert: certs/client-cert.pem
key: certs/client-key.pem
# CA certificate
ca-cert: certs/ca-cert.pem
# server address, in a format accepted by the gRPC address parser
address: dns:localhost:9999
# authorization token
token: admin-token
Overrides for the address and token can be passed on the command line as the --address and --token parameters.
Resource control is implemented on top of the cgroup v2 controllers. For simplicity, the resource constraints are expressed in a format applicable to the *.max knob of a given controller; thus there is cpu.max, io.max and memory.max for each of the supported controllers.
While there are more knobs, implementing support for each and every one is outside the scope of piled.
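As an illustration of the *.max approach, a minimal sketch of applying such limits to a job's leaf group (paths and example values assumed):

package main

import (
	"log"
	"os"
	"path/filepath"
)

// applyLimits writes raw values into the *.max knobs of a job's leaf group;
// the path layout and the concrete values are assumptions.
func applyLimits(leaf string, limits map[string]string) error {
	for knob, value := range limits {
		if err := os.WriteFile(filepath.Join(leaf, knob), []byte(value), 0644); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	err := applyLimits("/sys/fs/cgroup/piled.service/pile-<uuid>", map[string]string{
		"memory.max": "128M",             // cap memory at 128 MiB
		"cpu.max":    "50000 100000",     // 50 ms of CPU time per 100 ms period
		"io.max":     "8:0 rbps=1048576", // limit reads on device 8:0 to 1 MiB/s
	})
	if err != nil {
		log.Fatal(err)
	}
}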
The jobs are run inside the same cgroup namespace as the piled daemon, although in a per-job leaf group. Since the daemon is within the system hierarchy, the complete hierarchy is also visible to the jobs, making it possible for rogue jobs to escape their original cgroup.
No effort is made to observe cgroup event notifications. Any status updates happen when requested by the client.
A new clean tmpfs is not mounted over the job's /tmp, /run or /var/run locations in its mount namespace. This allows content to be shared with the jobs, but has the potential of triggering an information leak, e.g. /tmp/.X11-unix is trivially accessible.
The same applies to /sys or /dev.
No effort is made to sandbox the processes using LSMs, seccomp or capabilities. Everything runs with the same security context as piled.
No effort is made to preserve state across piled restarts.
Normally either the CN or the SANs would need to match the address on which the server listens. However, for simplicity of starting the server on an arbitrary host, verification of the server's certificate does not include the name in the certificate; instead, a certificate that is valid (given the system CA pool or the supplementary CA certificate) and carries the well-known CN=piled is considered good enough.
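A sketch of what such relaxed client-side verification could look like in Go; this is an assumption about the approach, not necessarily the exact piled code:

package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"log"
	"os"
)

// verifyPiledServer validates the presented chain against the given pool and
// then checks for the well-known CN instead of matching the listen address.
func verifyPiledServer(pool *x509.CertPool) func([][]byte, [][]*x509.Certificate) error {
	return func(rawCerts [][]byte, _ [][]*x509.Certificate) error {
		certs := make([]*x509.Certificate, 0, len(rawCerts))
		for _, raw := range rawCerts {
			c, err := x509.ParseCertificate(raw)
			if err != nil {
				return err
			}
			certs = append(certs, c)
		}
		if len(certs) == 0 {
			return errors.New("no server certificate presented")
		}
		opts := x509.VerifyOptions{Roots: pool, Intermediates: x509.NewCertPool()}
		for _, c := range certs[1:] {
			opts.Intermediates.AddCert(c)
		}
		if _, err := certs[0].Verify(opts); err != nil {
			return err
		}
		if certs[0].Subject.CommonName != "piled" {
			return errors.New("unexpected server certificate CN")
		}
		return nil
	}
}

func main() {
	pool, err := x509.SystemCertPool()
	if err != nil {
		log.Fatal(err)
	}
	if ca, err := os.ReadFile("certs/ca-cert.pem"); err == nil {
		pool.AppendCertsFromPEM(ca)
	}
	cfg := &tls.Config{
		MinVersion:            tls.VersionTLS13,
		InsecureSkipVerify:    true, // name checks skipped; the chain is verified manually above
		VerifyPeerCertificate: verifyPiledServer(pool),
	}
	_ = cfg // would be handed to the client's gRPC transport credentials
}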