Feature Request: podman pod start cant yet be used from systemd #4433

Open
zem opened this issue Nov 3, 2019 · 6 comments

@zem zem commented Nov 3, 2019

While I was playing around with podman, I wondered why systemd startup is so overcomplicated.

My problem is that I can either start single containers by writing a systemd unit for each container, or I can use podman generate systemd, which I have to redo each and every time a container is upgraded.

So why can't I just write a systemd template like this:

[Unit]
Description=Podman POD %I
After=network.target

[Service]
KillMode=none
ExecStartPre=/usr/bin/podman pod exists %i
ExecStart=/usr/bin/podman pod start %i
ExecStop=/usr/bin/podman pod stop -t 10 %i

[Install]
WantedBy=multi-user.target

Once I have configured my pods I can just start them by using: systemctl enable pod@foo ; systemctl start pod@foo

It turns out that systemd needs to stay attached to one container to detect if the service is running, therefore I wrote a little workaround like this:

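#!/bin/bash
# Start the pod, then attach to its infra container so that systemd
# has a foreground process to track.
POD="${1}"
echo podman pod start "${POD}"
podman pod start "${POD}"
# Take the first line of the inspect output that mentions infraContainerID.
JSON=$(podman pod inspect "${POD}" | grep infraContainerID | head -n 1)
# Strip the trailing quote, then everything up to and including ': "'.
ID=${JSON%\"}
ID=${ID##*: \"}
echo "${JSON}"
echo podman attach "${ID}"
# exec replaces this shell, so systemd tracks podman attach directly.
exec podman attach "${ID}"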

and modified pod@.service:

ExecStart=/etc/podman-pod-start_attached.sh %i
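
The quote stripping in the wrapper above relies on the matched line ending in a double quote. If the inspect output puts a comma after the value, the ID would come out with the quote and comma still attached. A slightly more tolerant extraction could look like the sketch below. It assumes the field still appears as "infraContainerID": "<id>" on a single line, and the function name extract_infra_id is made up for illustration.

```shell
#!/bin/bash
# Read JSON on stdin and print the value of the first "infraContainerID" field.
# Assumption: key and value sit on the same line, as in the output that the
# wrapper script greps. A trailing comma after the value is tolerated.
extract_infra_id() {
    sed -n 's/.*"infraContainerID"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' \
        | head -n 1
}

# Intended use inside the wrapper (needs podman, so left as a comment):
#   ID=$(podman pod inspect "${POD}" | extract_infra_id)
#   exec podman attach "${ID}"
```

Because the function only reads stdin, it can be tried against a saved copy of the inspect output before it goes into a unit.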

My proposal for an elegant solution would be to add a parameter --attach_to_infra to podman pod start, which attaches STDIN and STDOUT to the infra container, and to add pod@.service to the upstream code.

Member

@vrothberg vrothberg commented Nov 3, 2019

Hi @zem, thanks for reaching out. Have you looked into using podman generate systemd? It will generate a set of services for a pod. Not only for the infra container but also for the containers, to allow for individual restart policies and proper dependency management via systemd.

Author

@zem zem commented Nov 3, 2019

Hi @vrothberg, yes, I have looked into generate, in case that was not clear from my second paragraph.
I have some problems with podman generate systemd:

First of all, it generates all unit files concatenated on stdout, leaving me to tear them apart in an editor. (I know I can do that with a script, too.)

Secondly, those init scripts are pinned to the container IDs, which change weekly if not more frequently because containers are immutable. This means I have to tear down the old container unit files and replace them with the new ones each time.

Not to mention that the unit file names change with the container IDs.

You can now estimate the lines of code a user/sysadmin has to implement to render those service unit files whenever they need to be regenerated. It is probably even easier to write templates from scratch than to use podman generate systemd at all.

As for the restart policy: podman has one. During container creation you can set --restart=always, which causes podman to monitor whether a container has crashed. This is more than sufficient for all of my current use cases. I can still write my own units to do dependency management if necessary, but that would get much easier with the --attach_to_infra parameter suggested above.

Member

@vrothberg vrothberg commented Nov 3, 2019

First of all, it generates all unit files concatenated on stdout, leaving me to tear them apart in an editor. (I know I can do that with a script, too.)

There's a CLI flag --files which instructs Podman to generate files instead of printing on stdout. We added it to make it more user-friendly and to prevent users from having to untangle the big output.

Secondly, those init scripts are pinned to the container IDs, which change weekly if not more frequently because containers are immutable. This means I have to tear down the old container unit files and replace them with the new ones each time.

There's the --name flag which will instruct Podman to use the name instead of the ID of a pod/container. However, any generated file will still be specific to each individual (infra) container. This is because we need the PIDFile to point to the conmon process of each container, so systemd can actually know if the container/pod is running or not.

PIDFile=/run/user/1000/overlay-containers/63f940a01fce29f1def57ef7babcb82906047e7bc120b7a84af1f3d1809be8bf/userdata/conmon.pid

That's a snippet of a service pointing to such a PIDFile. I think it would be a nice improvement to be able to also have a path with the container-name embedded (optionally, maybe a symlink?) which would avoid the need to re-generate. @baude @mheon @rhatdan WDYT? This way, we don't need to regenerate the service files if a container gets updated but the name remains.
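
As a rough illustration of what such a PIDFile lets a supervisor do (this is not how systemd itself is implemented), the sketch below reads a PID from a file and checks whether that process still exists. The function name pidfile_alive is hypothetical.

```shell
#!/bin/bash
# Return 0 if the PID stored in the given file belongs to a live process.
# Note: kill -0 also fails for processes owned by another user, so run this
# as the user that owns the conmon process (or as root).
pidfile_alive() {
    local pidfile="$1" pid=""
    [ -r "$pidfile" ] || return 1
    # read returns non-zero when the file lacks a trailing newline,
    # so fall back to checking whether anything was read at all.
    read -r pid < "$pidfile" || [ -n "$pid" ] || return 1
    # Reject empty or non-numeric content.
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;
    esac
    kill -0 "$pid" 2>/dev/null
}

# Example (path shape taken from the snippet above, ID left as a placeholder):
#   pidfile_alive /run/user/1000/overlay-containers/<id>/userdata/conmon.pid
```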

You can now estimate the lines of code a user/sysadmin has to implement to render those service unit files whenever they need to be regenerated. It is probably even easier to write templates from scratch than to use podman generate systemd at all.

The process seems straightforward to me:

Whenever a pod/container is updated, we can run podman generate systemd --files --name and copy the generated files to the appropriate systemd (system or user) path (maybe creating a subdir for each pod to ease replacement), then reload the daemon to pick up the updated service files, and finally start the service.
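
The steps above can be sketched as a small shell function. This is only an outline under some assumptions: the function name regenerate_pod_units is made up, /etc/systemd/system is assumed as the target for system-wide units, and the pod's unit is assumed to be named pod-<name>.service when --name is used. The PODMAN and SYSTEMCTL variables exist only so the commands can be swapped out for a dry run.

```shell
#!/bin/bash
# Commands can be overridden, e.g. PODMAN=echo for a dry run.
PODMAN="${PODMAN:-podman}"
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

# Regenerate the unit files for a pod and restart its service.
# $1: pod name, $2: unit directory (assumed default: /etc/systemd/system)
regenerate_pod_units() {
    local pod="$1"
    local unit_dir="${2:-/etc/systemd/system}"
    local workdir status=0
    workdir="$(mktemp -d)" || return 1

    # Run in a subshell so the caller's working directory is untouched.
    (
        cd "$workdir" || exit 1
        # --files writes the units into the current directory,
        # --name uses names instead of IDs.
        "$PODMAN" generate systemd --files --name "$pod" || exit 1
        cp ./*.service "$unit_dir"/ || exit 1
    ) || status=1

    rm -rf "$workdir"
    [ "$status" -eq 0 ] || return 1

    # Make systemd pick up the new files, then restart the pod's unit.
    "$SYSTEMCTL" daemon-reload || return 1
    "$SYSTEMCTL" restart "pod-${pod}.service"
}

# Example: regenerate_pod_units foo
```

Stale unit files from containers that no longer exist are not cleaned up here; a per-pod subdirectory, as suggested above, would make that part easier.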

As for the restart policy: podman has one. During container creation you can set --restart=always, which causes podman to monitor whether a container has crashed. This is more than sufficient for all of my current use cases.

One service file for a pod is not sufficient for systemd to know if the pod is running or not. Imagine a service-critical container of the Pod fails. With only one service file, systemd does not know about the failing container and can't enforce the restart policy. A systemctl status $service would falsely report the service to be running although maybe all containers except the infra-container have failed. That's why we decided to have one file for the infra-container which serves as the central unit of the pod, and one service for each container.

Author

@zem zem commented Nov 3, 2019

Oh, I somehow overlooked the --files option.

To be honest, I did not immediately understand why using --restart=always is not sufficient, but you are probably referring to those conmon processes (one for each container). I guess if one of those fails without warning, it can cause trouble.

Anyway, having the pid file accessible under the container name sounds very good to me. :)

Collaborator

@mheon mheon commented Nov 3, 2019

I wonder if adding a container name -> ID symlink somewhere might be worthwhile - an easy filesystem way of identifying the specific container associated with a name... Probably not worth it unless we're going to expose a lot more via the filesystem than we do now, though.
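
To make the symlink idea concrete, here is a minimal sketch. Nothing like this exists in podman as far as this thread shows: the by-name directory and the function name link_name_to_id are purely hypothetical, and the base directory stands in for something like the overlay-containers path quoted earlier.

```shell
#!/bin/bash
# Create (or update) base/by-name/<name> as a symlink to the container's
# ID directory, so that files under it can be reached by container name.
# $1: base directory, $2: container name, $3: container ID
link_name_to_id() {
    local base="$1" name="$2" id="$3"
    [ -d "$base/$id" ] || return 1
    mkdir -p "$base/by-name" || return 1
    # -n replaces an existing link instead of descending into its target.
    ln -sfn "../$id" "$base/by-name/$name"
}

# A unit could then reference a stable path such as
#   PIDFile=<base>/by-name/<name>/userdata/conmon.pid
```

When a container is recreated under the same name, calling the function again simply repoints the link to the new ID.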

@vrga vrga commented Nov 9, 2019

I'd like to add another potentially lovely use case that would be enabled by being able to run containers straight from systemd.

[Unit]
Description="Pihole container service..."
Wants=network.target multi-user.target
After=network.target

[Service]
Type=forking
TimeoutSec=30
RestartSec=10
Restart=always
ExecStart=/usr/bin/podman container run \
  --rm \
  --name=%N \
  --network podman \
  -p 10.0.0.1:53:53/tcp -p 10.0.0.1:53:53/udp \
  -p 10.0.0.1:8091:80 \
  -p 10.0.0.1:8092:443 \
  -v /etc/localtime:/etc/localtime:ro \
  -v /etc/timezone:/etc/timezone:ro \
  -v "/opt/pihole/etc/pihole/:/etc/pihole/" \
  -v "/opt/pihole/etc/dnsmasq.d/:/etc/dnsmasq.d/" \
  -e ServerIP="10.8.0.1" \
  -e DNS1="1.1.1.1" \
  -e DNS2="8.8.8.8" \
  -e VIRTUAL_HOST="<redacted>" \
  pihole/pihole:latest

Ignoring the content of the launch itself, what it allows is running ephemeral containers as system services.

This pretty much allows me to run these services mostly hands-off, with no need to update the units every time I update the container or anything like that.

I've had this working somewhat with docker and using this tool, https://github.com/ibuildthecloud/systemd-docker, but it was always flaky.

One problem with running podman generate is that it's a bit iffy when you wish to add some customizations on top.

One idea that comes to mind is to somehow use podman inspect pihole -t container --format "{{.ConmonPidFile}}" to provide systemd with the location of the pidfile immediately after run, but I haven't got the faintest clue whether systemd will actually allow it.

The biggest problem is that starting the container through systemd hangs both that invocation of systemctl and podman until some magical timeout gets hit.
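
One likely reason for the hang is that Type=forking makes systemd wait for the ExecStart process to fork and exit, which a foreground podman run never does. A possible way around both this and the pidfile lookup is to run detached and tell podman where to write the conmon pidfile. The sketch below prints such a unit. It assumes your podman version supports --conmon-pidfile on podman run, and the function name and the /run/<name>-conmon.pid path are hypothetical choices.

```shell
#!/bin/bash
# Print a unit file for an ephemeral container to stdout.
# $1: container name, $2: image
render_container_unit() {
    local name="$1" image="$2"
    # Hypothetical pidfile location; any path writable by the service works.
    local pidfile="/run/${name}-conmon.pid"
    cat <<EOF
[Unit]
Description=Podman container ${name}
After=network.target

[Service]
Type=forking
PIDFile=${pidfile}
Restart=always
ExecStart=/usr/bin/podman run -d --rm --name ${name} --conmon-pidfile ${pidfile} ${image}
ExecStop=/usr/bin/podman stop -t 10 ${name}

[Install]
WantedBy=multi-user.target
EOF
}

# Example:
#   render_container_unit pihole pihole/pihole:latest > /etc/systemd/system/pihole.service
```

With -d, podman run returns once the container is started, so systemd can read the PIDFile and track conmon instead of waiting for a timeout. Port, volume and environment options would go into the ExecStart line as in the unit above.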

Will drop a few more comments as I figure things out.
