dscreate: "start = False" and "systemd = False" have no effect #6099

Open
minfrin opened this issue Feb 18, 2024 · 8 comments · May be fixed by #6101
Labels
needs triage The issue will be triaged during scrum

Comments

@minfrin
Contributor

minfrin commented Feb 18, 2024

Issue Description
When automating the configuration of 389ds, it is not possible to prevent the server being started via systemd's systemctl start.

This creates a deadlock, as dscreate's hidden systemd calls collide with the systemd unit that runs the automation.

Package Version and Platform:

  • Platform: RHEL9
  • Package and version: 389-ds-base-2.3.6-3.el9.x86_64
  • Browser

Steps to Reproduce
Steps to reproduce the behavior:

  1. Add start = False and systemd = False to slapd-test.inf
  2. Run "/usr/sbin/dscreate from-file /etc/dirsrv/slapd-test.inf"
  3. dscreate hangs, because it just executed "systemctl start dirsrv@test" exactly as it was told not to, inside a systemd service that runs before dirsrv@test.
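
For reference, the answer file from step 1 could be generated roughly as follows. This is only a sketch: `start` and `systemd` are the options from this report, while the `[slapd]` section and the `instance_name` key are assumptions about the template layout and are not shown anywhere in this issue.

```python
import configparser
import io


def build_inf(instance_name: str) -> str:
    """Return the text of a minimal answer file with start and systemd disabled."""
    cfg = configparser.ConfigParser()
    # Both flags are read from the [general] section (general['start'], general['systemd']).
    cfg["general"] = {"start": "False", "systemd": "False"}
    # Assumption: the instance is named through instance_name in [slapd].
    cfg["slapd"] = {"instance_name": instance_name}
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    print(build_inf("test"))
```

The resulting text would be saved as /etc/dirsrv/slapd-test.inf before running step 2.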

Expected results
Systemd is not touched, 389ds is not started, exactly as the options require.

Additional context
This problem has come up a few times:

https://bugzilla.redhat.com/show_bug.cgi?id=1872910
#5452

The "systemd" option is broken because it's conditional on a variable called self.containerised:

if self.containerised:
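
To make the effect of that guard concrete, here is a toy model. It is not the lib389 code: the function names and the `general` dictionary are hypothetical stand-ins, and the only thing taken from the source is that the user's value is read inside a `containerised` check.

```python
def resolve_systemd_guarded(build_default: bool, general: dict, containerised: bool) -> bool:
    """Model of the reported behaviour: the user's value is read only in a container."""
    systemd = build_default
    if containerised:
        systemd = general.get("systemd", build_default)
    return systemd


def resolve_systemd_unconditional(build_default: bool, general: dict) -> bool:
    """Model of the expected behaviour: the user's value wins whenever it is present."""
    return general.get("systemd", build_default)


if __name__ == "__main__":
    user = {"systemd": False}
    # Outside a container the guarded version falls back to the build default.
    print(resolve_systemd_guarded(True, user, containerised=False))
    # The unconditional version honours the answer file.
    print(resolve_systemd_unconditional(True, user))
```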

A search through the code for the "start" option shows that this option is never parsed.

@minfrin added the "needs triage" label Feb 18, 2024
@minfrin
Contributor Author

minfrin commented Feb 18, 2024

Over time, the general['systemd'] call has been handled sanely, then commented out, then commented back in but inside the containerised if.

It looks like the work to create dscontainer broke dscreate.

https://github.com/389ds/389-ds-base/blame/7ffb2eb118acd83f0b941f69bf4a460d91240efd/src/lib389/lib389/instance/setup.py#L782

@vashirov
Member

It would really help us to understand your use case and how you want to use automation.

If I add start = False, the server is not started after the instance creation. However, the systemd service is created and enabled:

# systemctl status dirsrv@localhost | head -n5
○ dirsrv@localhost.service - 389 Directory Server localhost.
     Loaded: loaded (/usr/lib/systemd/system/dirsrv@.service; enabled; preset: disabled)
    Drop-In: /usr/lib/systemd/system/dirsrv@.service.d
             └─custom.conf
     Active: inactive (dead) since Mon 2024-02-19 03:58:34 EST; 3min 0s ago

Do you also want the service to remain disabled?

Also, the systemd key is not present in the template that dscreate creates:

# dscreate create-template | grep systemd -c
0

Because it's an implementation detail and it's not supposed to be controlled by the user. It's present in the defaults.inf file to indicate that the target platform that was used for the build supports systemd and lib389 can use it to create unit files, start/stop service via systemctl. RHEL has systemd, so systemd = True is present and should not be modified in the template. Otherwise, you will get unexpected results.

@minfrin
Contributor Author

minfrin commented Feb 19, 2024

> It would really help us to understand your use case and how you want to use automation.

The automation of the directory server happens inside a systemd service, which is set using systemd's "Before" functionality to run before dirsrv@test (actually dirsrv@%i, but you get the idea).

dscreate creates a problem because, hidden under the hood and ignoring two explicit flags telling dscreate absolutely not to do this, it starts up the service using systemd on this line of the code:

subprocess.check_output(["systemctl", "start", "dirsrv@%s" % self.serverid], stderr=subprocess.STDOUT)
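
The hang can be read as a cycle of "waits on" relationships: `systemctl start` blocks until the start job finishes, and the `Before=` ordering holds that job until the automation unit has finished starting. The sketch below only models that reasoning with a small graph. The node labels are descriptive strings chosen for illustration, not systemd objects, and `find_cycle` is a hypothetical helper.

```python
from typing import Dict, List, Optional


def find_cycle(waits_on: Dict[str, str]) -> Optional[List[str]]:
    """Follow single 'waits on' edges from each node and return the first cycle found."""
    for start in waits_on:
        path = [start]
        seen = {start}
        node = start
        while node in waits_on:
            node = waits_on[node]
            if node in seen:
                # Close the loop so the cycle starts and ends on the same node.
                return path[path.index(node):] + [node]
            path.append(node)
            seen.add(node)
    return None


# Each key cannot make progress until its value has finished.
WAITS_ON = {
    "dirsrv-autodiscovery@test (activating)": "dscreate from-file",
    "dscreate from-file": "systemctl start dirsrv@test",
    "systemctl start dirsrv@test": "start job for dirsrv@test",
    # Ordered by Before=, so the job is held until the automation unit is up.
    "start job for dirsrv@test": "dirsrv-autodiscovery@test (activating)",
}

if __name__ == "__main__":
    print(find_cycle(WAITS_ON))
```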

> If I add start = False, the server is not started after the instance creation. However, the systemd service is created and enabled:

The service is indeed started - I know because it triggers a deadlock as below, and because the code tells me it's started:

[root@seawitch ~]# systemctl status dirsrv-autodiscovery@test
● dirsrv-autodiscovery@test.service - 389 Directory Server Autodiscovery test.
     Loaded: loaded (/usr/lib/systemd/system/dirsrv-autodiscovery@.service; disabled; preset: disabled)
     Active: activating (start) since Sun 2024-02-18 21:36:17 SAST; 14h ago
   Main PID: 5154 (bash)
      Tasks: 10 (limit: 100377)
     Memory: 53.2M
        CPU: 15.037s
     CGroup: /system.slice/system-dirsrv\x2dautodiscovery.slice/dirsrv-autodiscovery@test.service
             ├─5154 /bin/bash /usr/libexec/device-autodiscovery/389ds start test
             ├─5158 /bin/bash /usr/libexec/device-autodiscovery/389ds start test
             ├─5160 logger -t /usr/libexec/device-autodiscovery/389ds
             ├─5169 /bin/bash /usr/libexec/device-autodiscovery/389ds start test
             ├─5175 /bin/bash /usr/libexec/device-autodiscovery/389ds.d/05-create start test
             ├─5176 /bin/bash /usr/libexec/device-autodiscovery/389ds.d/05-create start test
             ├─5178 logger -t /usr/libexec/device-autodiscovery/389ds.d/05-create
             ├─5180 /bin/bash /usr/libexec/device-autodiscovery/389ds.d/05-create start test
             ├─5196 /usr/bin/python3 /usr/sbin/dscreate from-file /etc/dirsrv/slapd-test.inf
             └─5357 systemctl start dirsrv@test <--- XXXX started by dscreate when it should not have been

Feb 18 21:36:17 seawitch systemd[1]: Starting 389 Directory Server Autodiscovery test....
Feb 18 21:36:17 seawitch dirsrv-autodiscovery[5184]: creating dirsrv server instance test...
Feb 18 21:36:17 seawitch bash[5180]: Creating test under
Feb 18 21:36:17 seawitch bash[5180]: Notice: Creating test under
Feb 18 21:36:17 seawitch bash[5196]: Starting installation ...
Feb 18 21:36:17 seawitch bash[5196]: Validate installation settings ...
Feb 18 21:36:17 seawitch bash[5196]: Create file system structures ...
Feb 18 21:36:18 seawitch bash[5196]: Perform SELinux labeling ...
XXXX deadlocked here XXXX

We also have dscreate arbitrarily deciding that this particular instance should be started at next boot, which is none of dscreate's business. In our case we have "create a directory server but leave it in a disabled state". Obviously I can work around this by disabling the service afterwards, but this is ugly.

> Because it's an implementation detail and it's not supposed to be controlled by the user. It's present in the defaults.inf file to indicate that the target platform that was used for the build supports systemd and lib389 can use it to create unit files, start/stop service via systemctl. RHEL has systemd, so systemd = True is present and should not be modified in the template. Otherwise, you will get unexpected results.

dscreate used to be a perl script, and has been replaced by the current python implementation. Unfortunately the python code didn't follow compatibility with the perl code, and this appears to be why things don't work.

My biggest need is for the start flag to be properly implemented so the perl code behaviour returns.

Systemd is currently needed, but beggars can't be choosers, and if I have to manually recreate the missing systemd bits such as the tmpfiles.d config, then so be it.

@vashirov
Member

> The automation of the directory server happens inside a systemd service, which is set using systemd's "Before" functionality to run before dirsrv@test (actually dirsrv@%i, but you get the idea).

Thank you for the clarification. This looks like a corner case to me and is definitely something we do not test against.

> dscreate creates a problem because, hidden under the hood and ignoring two explicit flags telling dscreate absolutely not to do this, it starts up the service using systemd on this line of the code:

start = False works as expected: the instance is not running after the installation. The option says nothing about starting the server during the installation.

# start (bool)
# Description: Starts the instance after the install completes. If false, the instance is created but not started.
# Default value: True
;start = True

But with systemd = False there is indeed a bug. It works only during the containerized installation using dscontainer and not dscreate. I think we should allow this in dscreate too, since there is a use case such as yours.

> dscreate used to be a perl script, and has been replaced by the current python implementation. Unfortunately the python code didn't follow compatibility with the perl code, and this appears to be why things don't work.

Hyrum's law strikes again :)

> My biggest need is for the start flag to be properly implemented so the perl code behaviour returns.

I will submit a PR to fix the systemd override, but in the meantime you can set with_systemd to 0 in /usr/share/dirsrv/inf/defaults.inf and change it back to 1 after the installation.
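
A sketch of how that temporary edit could be scripted so the original value is always restored. It assumes the key appears on its own line as `with_systemd = <value>`; the helper name `with_systemd_disabled` is hypothetical, and the sample file content in the demo is made up.

```python
import re
import tempfile
from contextlib import contextmanager
from pathlib import Path

# Matches the key and its assignment, leaving the value to be replaced.
_KEY = re.compile(r"^([ \t]*with_systemd[ \t]*=[ \t]*)\S+", re.MULTILINE)


@contextmanager
def with_systemd_disabled(defaults_inf: Path):
    """Temporarily set with_systemd to 0, restoring the original file on exit."""
    original = defaults_inf.read_text()
    defaults_inf.write_text(_KEY.sub(r"\g<1>0", original))
    try:
        yield
    finally:
        # Put the file back exactly as it was, even if the installation failed.
        defaults_inf.write_text(original)


if __name__ == "__main__":
    # Demo against a throwaway file rather than the real defaults.inf.
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "defaults.inf"
        sample.write_text("[slapd]\nwith_systemd = 1\n")
        with with_systemd_disabled(sample):
            print(sample.read_text())
        print(sample.read_text())
```

On a real host the context manager would wrap the dscreate call and be given /usr/share/dirsrv/inf/defaults.inf, which requires root.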

vashirov added a commit to vashirov/389-ds-base that referenced this issue Feb 19, 2024
Bug Description:
`systemd = False` doesn't override `with_systemd = 1` from
`defaults.inf` when used with `dscreate`. It is only effective when
setup is running in a containerized environment (via `dscontainer`).
But for some special use cases it's important that DS installation runs
without systemd.

Fix Description:
Remove the condition for overriding systemd flag.
`systemd` option is not exposed in the default template, it's listed
there only when `--advanced` flag is used, so it should not affect
regular installations.

Fixes: 389ds#6099
@vashirov linked a pull request Feb 19, 2024 that will close this issue
@minfrin
Contributor Author

minfrin commented Feb 20, 2024

This should fix the systemd flag, that's perfect - much appreciated.

The start flag does appear to be implemented, but it starts the server and then stops it, rather than never starting it.

if general['start']:
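
The difference between the two readings of `start` can be shown with a toy model. This is an interpretation of the comments in this thread, not the installer's code: the function names and action strings are hypothetical.

```python
from typing import List


def install_observed(start: bool) -> List[str]:
    """Reading described above: the server is started during setup regardless."""
    actions = ["create instance", "start server"]
    if not start:
        # start = False only decides whether the server is left running.
        actions.append("stop server")
    return actions


def install_requested(start: bool) -> List[str]:
    """Reading requested here: with start = False the server is never started."""
    actions = ["create instance"]
    if start:
        actions.append("start server")
    return actions


if __name__ == "__main__":
    print(install_observed(False))
    print(install_requested(False))
```

Under the first reading a start still happens during installation, which is enough to trigger the ordering problem described earlier.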

By way of background, we've been configuring services inside systemd in a just-in-time fashion so that all systems start the same way on a clean boot every time. To do this, there is one service before the main service that writes config files (in the 389ds case, it runs dscreate if needed), and one service after the main service that writes to the main service (creates/destroys backends, etc). Systemd's Before and After keep everything running in the correct order.

@fostermi

The suggested solution to set with_systemd = 0 doesn't work for me. Attempting to create a custom Docker image using dscreate with an answer file results in

DEBUG: [Errno 2] No such file or directory: 'systemd-detect-virt'

during the build. I'm trying to create a barebones Docker image, but WITH all the custom plugin settings I want to use instead of having to orchestrate them outside of the container deployment.
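
The `[Errno 2]` text is what Python reports when it tries to execute a program that is not on the PATH. A minimal sketch of probing for the tool first is below. This is not how lib389 does it: `detect_virt` is a hypothetical helper, and only the binary name comes from the message above.

```python
import shutil
import subprocess
from typing import Optional


def detect_virt(binary: str = "systemd-detect-virt") -> Optional[str]:
    """Return the tool's output, or None when the binary is not installed."""
    if shutil.which(binary) is None:
        # Typical of a minimal container image that ships without systemd.
        return None
    # The exit status is not checked because the tool can exit non-zero.
    result = subprocess.run([binary], capture_output=True, text=True, check=False)
    return result.stdout.strip()


if __name__ == "__main__":
    print(detect_virt())
```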

@fostermi

I think this line overwrites any changes I make to the defaults.

('LINE', 'with_systemd', '0'),

@fostermi

fostermi commented Mar 15, 2024

Also, it seems that setting start = False has no effect when systemd is not running (as in a container) and is disabled. I get this message:

#11 0.577 DEBUG: DEBUG: starting with ['/usr/sbin/ns-slapd', '-D', '/etc/dirsrv/slapd-localhost', '-i', '/data/run/slapd-localhost.pid'

From this line

self.log.debug("DEBUG: starting with %s" % cmd)
because __init__.py checks to see if systemd is running and if not enters this block of code:

self.log.debug("systemd status -> False")
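
The branching described here can be sketched as a choice between two commands. This is a toy model, not the lib389 source: `start_command` and its parameters are hypothetical, and the ns-slapd arguments are limited to those visible in the truncated log line above, so the real command may carry more.

```python
from typing import List


def start_command(serverid: str, systemd_running: bool,
                  config_dir: str, pid_file: str) -> List[str]:
    """Pick how the server would be launched, depending on whether systemd is up."""
    if systemd_running:
        return ["systemctl", "start", "dirsrv@%s" % serverid]
    # Without systemd the daemon is executed directly.
    return ["/usr/sbin/ns-slapd", "-D", config_dir, "-i", pid_file]


if __name__ == "__main__":
    print(start_command("localhost", False,
                        "/etc/dirsrv/slapd-localhost",
                        "/data/run/slapd-localhost.pid"))
```

Neither branch of this model consults `start`, which matches the behaviour reported in this comment.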

Is there no way to create a custom Docker image using dscreate like we used to do with setup.pl?

3 participants