Better systemd integration #5963

jtru · 2019-03-24T15:52:42Z

I believe this fixes #5826 while also reducing implementation complexity by dynamically re-using libsystemd.so components.

The patch as implemented only changes behaviour if the configuration specifies redis to be supervised by systemd explicitly. If the server is supervised by systemd "accidentally" only (i.e. by supervised auto being set and the redis unit file specifying Type=notify for the service), the current behaviour of reporting readiness very early after startup is preserved. This should prevent users who aren't aware of the concequences of having a systemd Type=notify service in operation from having to deal with any changes.

dlopen() and use libsystemd.so's sd_notify() to signal the systemd service manager what state redis-server is in. Also, if redis-server has been configured to be supervised by systemd explicitly (i.e. systemd supervision wasn't auto-detected by looking at the environment), report server status to systemd more correctly: The service status will be switched from "loading" to "ready" after redis-server is ready to actually serve client requests. TODO: Make sure redis-server correctly reports service readiness if configured as a replica.

I'm not 100% sure that having a "server.masterhost" alone is enough to make this decision, but all usage scenarios that I personally know and have operated should be covered by this, according to my testing.

ulidtko · 2019-03-28T13:05:31Z

Good one!

The use of STATUS=... is particularly good and I like it. (That shows up in systemctl status for those unfamiliar.)

However, I strongly disagree with the differing behavior of supervised systemd and supervised auto [in presence of systemd]. IMO, this is way more surprising than a change of currently-incorrect behavior.

Still, this PR is a good step towards fixing #5826 and I'd love to see what @antirez has to say about it.

antirez · 2019-03-28T18:24:40Z

Thanks, I wonder if @lamby has some wisdom about that.

antirez · 2019-03-28T18:25:47Z

Good one!

The use of STATUS=... is particularly good and I like it. (That shows up in systemctl status for those unfamiliar.)

However, I strongly disagree with the differing behavior of supervised systemd and supervised auto [in presence of systemd]. IMO, this is way more surprising than a change of currently-incorrect behavior.

Still, this PR is a good step towards fixing #5826 and I'd love to see what @antirez has to say about it.

Could you explain all this in terms of systemd illiterate persons? That would help, thanks!

lamby · 2019-03-28T18:37:18Z

Nothing particular from me, but +1 for generally better systemd integration but also +1 the -1 (hah!) regarding different behaviour... far too magical. :)

antirez · 2019-03-28T18:39:25Z

@jtru in light of two people saying -1 to the change of the behavior, would you be open to change the patch in that regard? (here I'm working as mediator because I know zero about systemd, yet want to merge that).

lamby · 2019-03-28T18:43:08Z

Bonus points for minimising the whitespace changes in the diff too!

jtru · 2019-03-28T21:09:36Z

Thank you all for chiming in on the matter! :) I apologize for my lenghty response, but I think it's necessary, in order to not make redis users potentially shoot themsevles in the foot...

@lamby and @antirez - I'm very open to make the changes you requested. I'd just like to have clarification on this - you would want the changed behaviour implemented, not the current behaviour (where "current" is "signal systemd that redis is ready to go without actually having loaded the dataset yet") while merely dlopen()ing libsystemd.so to do it, right? ;)

The original concern that caused me to stick to the split behaviour between auto-detection and explicit configuration is the following:

systemd services have a property named TimeoutStartUSec - it defines the maximum amount of time that may pass between requesting a service to start, and that service actually advancing to the ready state. On my system, that property defaults to 90 seconds. That's certainly not a problem with redis' current relaxed way of handling that, where READY=1 gets sent to systemd quickly.

However, with the new approach that my patch implements, redis won't progress to the ready state until the dataset has loaded (which, imo, is the correct way to behave). On our systems at work, bigger instances actually take more than 120 seconds to do that pretty regularly. With the default TimeoutStartUSec value (that most users won't change, and might not even be aware of), systemd would shoot that loading instance down with SIGTERM before it could finish loading, making the service fail. The only way to avoid this behaviour, afaik, is for the administrator to raise the timeout for that particular unit, or to change the system-wide default.

Now suppose a user has redis in operation with Type=notify and supervised auto in redis.conf (because that's supported since 3.2 or so, and it works for them - even though not ideally, and/or their distro shipped the redis service unit defined that way), and they're not aware of TimeoutStartUSec at all. The next time they'll start their large-ish instance, redis won't get the necessary work to signal "READY=1" done fast enough, and get killed. Everything has worked as designed and configured, but the user hasn't been aware of all the details. Downtime ensues.

Since redis.conf documents some of the limitiations of supervision that exist today, that comment could be extended to include a warning to keep an eye on the servce unit's TimeoutStartUSec as well for the supervised systemd option, and prevent such a thing from happening. I realize that it's probably unlikely for events to play out like this, but if you consider it a scenario that's worth trying to prevent, please let me know what you think!

(And now, some optional reading about systemd-related stuff, since @antirez requested it :))

systemd defines a protocol where a service can tell systemd what state (loading, ready, stopping, failed, ...) it's in its lifecycle, so that systemd can initiate or defer other actions (i.e. begin starting or stopping other service) based on that information. systemctl displays that state to the administrator, and also offers a simple mechanism for the supervised service to display concise status text (as opposed to emitting a log line/journal entry that might quickly scroll out of the last N lines of a buffer for a busy service) to the operator. It looks like this (watch out for the "Status:" line):

# systemctl -l status test-redis.service
● test-redis.service - Redis demo service for type=notify
   Loaded: loaded (/etc/systemd/system/test-redis.service; static; vendor preset: disabled)
   Active: activating (start) since Thu 2019-03-28 21:28:51 CET; 1s ago
 Main PID: 6050 (redis-server)
   Status: "Waiting for MASTER <-> REPLICA sync"
    Tasks: 4 (limit: 4915)
   Memory: 3.1M
   CGroup: /system.slice/test-redis.service
           └─6050 /home/colo/codebase/redis/src/redis-server 127.0.0.1:6379

lamby · 2019-03-28T21:20:27Z

I'm afraid this has very quickly got above what I know about systemd so will be unable to comment/review/etc. :)

jtru · 2019-03-28T21:27:44Z

Hm, I didn't mean to scare anyone away :/ I'll try to give a quick tl;dr overview: My patch makes it possible for redis to correctly inform systemd to start services that depend on it. However, for users where redis startup (= the time until clients can meaningfully be served after spawning) is slow, there's a config option in systemd (not redis) that might get the redis service killed during startup. That possible interaction should be hinted at somewhere.

lamby · 2019-03-28T21:32:13Z

Hm, I didn't mean to scare anyone away

Don't worry, I didn't take it that way. I just lack the systemd knowledge and, alas, I lack the time to invest in learning it to have any kind of +0 or -0 here. Good luck. :)

ulidtko · 2019-03-29T10:18:11Z

sd_notify crash course

So we want to deploy an application which uses Redis, let's call that mogrifier. How do we usually do that? We write an initscript, /etc/init.d/mogrifier — and then the classic SysV init system takes care of starting the process on boot, etc. With systemd, it's just the same: we write a unit file, /etc/systemd/system/mogrifier.service — then systemd knows how to start and babysit the service for us.

Both systemd units & initscripts can have service dependencies, whereby mogrifier.service can declare please don't ever start me before redis.service, I won't work in that case. A line which reads After=redis.service in mogrifier.service declares something close to that effect (the exact interpretation is messy, let me skip).

So all is fine and dandy, services can depend on each other, some order can be instilled into the chaotic and unpredictable boot process, and big swaths of reconnection&readiness-waiting crapcode can be not written in mogrify and other application services (who are the least concerned with system supervision, cannot properly do it and are not responsible for it). Right? Right, up until we ask ourselves: how does systemd know when redis.service has properly started and mogrifier.service can commence?

That's the crucial question, and the answer has two cases: Type=notify case, and Type=simple case. (The mandatory doc link for the "types" is man 5 systemd.service; it's not necessary to click it anyway).

Type=simple (Type=oneshot, Type=forking, Type=exec)

In this case, systemd has no choice but assume that service is ready as soon as its main process has started.

This sort-of works, except sometimes it doesn't. There's a scheduling race between mogrifier main() and redis main().

Said another way, there's a small window of time (hit by chance only sometimes) when
mogrifier can attempt to connect to redis which has not yet fully initialized (didn't reach listen() on the socket yet, hasn't loaded dataset yet, etc etc) — eventhough mogrifier has been started after redis by the init system, exactly as requested.

Type=notify

Hopefully, you can see the race problem above. The sd_notify API has the sole purpose to solve that. Systemd services declare they use this API by having a line Type=notify in the unit file.

In this case, the service itself tells to the supervisor when initial startup has completed — via sending READY=1.

Assuming service dependencies were configured correctly, this feature allows to completely eliminate the startup race problem, so BusyLoadingError never happens again in the clients. Which is immensely helpful (to some people).

@antirez @lamby please see the above, I wrote it specifically to help you review the PR. Questions welcome of course.

ulidtko · 2019-03-29T10:39:58Z

@jtru

[...] Everything has worked as designed and configured, but the user hasn't been aware of all the details. Downtime ensues.

I see your point; let me counter it like this:

this sort of downtime is trivial to diagnose & troubleshoot:

Job for foobar.service failed because a timeout was exceeded.

Plus crystal-clear logs, plus near-100% reproduceability in the scenarios like you describe;
it's just like any other bug/misconfig, I mean you have to test your upgrades before production;
the current as-I-hold-incorrect behavior which you propose to save in supervised auto is orders of magnitude more difficult to reproduce, diagnose and test.

I do recognize there are situations when correctness has to be sacrificed for backward compatibility. But this isn't one of those cases. Just choose your nightmare: explaining TimeoutStartUSec & how to CI and test VS hunting obscure races which magically disappear when you switch away from default config?..

lamby · 2019-03-29T12:50:54Z

I'm not sure why I'm being asked to review this. It should not block on me.

ulidtko · 2019-04-01T07:30:24Z

Hey @jtru, just to make it clear: looks like this PR is currently blocked on you.

@jtru in light of two people saying -1 to the change of the behavior, would you be open to change the patch in that regard? (here I'm working as mediator because I know zero about systemd, yet want to merge that).

I.e. the supervised auto behavior is to be changed.

As I see, there's no complaints about dlopen() and the deduplication of sd_notify logic (which is awesome!).

jtru · 2019-04-01T08:52:05Z

@antirez if you tell me to make the supervised auto (and having detected systemd as the acting service manager) and supervised systemd to work alike, I'll happily implement that. (I'm not yet sure that you actually favour the proposed behaviour over the current one.)

If you'd like, I would also include a word of warning in the default/sample redis.conf about the perils that systemd's TimeoutStartUSec may hold when systemd supervision is active. Maybe that's a fitting occasion to include a default/sample redis.service systemd unit, too (modeled after utils/redis_init_script)?

ulidtko · 2019-04-26T10:55:04Z

A sample redis.service to me looks pretty much essential. You can document in comments the need to increase start (and stop too?..) timeout, and actually do this by way of example.

jtru · 2019-04-26T11:12:13Z

I'll have some time to work on this over the upcoming weekend (UTC+2 here) - I'll try my best to make everyone involved in this ticket happy ;)

antirez · 2019-04-26T15:10:42Z

Thanks, just want to say that as @lamby I also lack the necessary systemd skill, so it's better to handle that in a "most of the community should be happy". Of course trying always to make simple stuff in the Redis style when possible. Thank you.

@tomcat1102

This commit fixes bug reported at redis#5998. Thanks to @tomcat1102.

…ge reply (or POSTPONED_ARRAY) when redis appends the blocked client reply list to the real client, it didn't bother to check if it is in fact the master client. so a slave executing that module command will send replies to the master, causing the master to send the slave error responses, which will mess up the replication offset (slave will advance it's replication offset, and the master does not)

Always make redis communicate its actual status to systemd according to the Type=Notify service readiness protocol (refer to sd_notify(3) for details), no matter if systemd supervision was configured or auto-detected.

jtru · 2019-04-27T15:18:18Z

Hrmm... as you can probably tell, I'm not very familiar with github's git workflow practises ;) I'm not sure how my history (actually, my PR's series of commits to apply) ended up like this, but I'll try to sort it out (if all else fails, I'd just create a new branch from current unstable with my changes and create a new PR, if that is acceptable?) asap.

As for the content of my changeset, this is what I intend to provide:

No difference in behaviour between auto-detected and manually configured systemd supervision
Issue a short warning for the user to take care of Timeout(Start|Stop)Sec whenever supervised by systemd
Prevent the redis install helper script from creating sysv initscripts on hosts that run systemd (and really should use service units instead)
Provide an example redis-server systemd service unit file in utils/, next to the redis_init_script example

jtru · 2019-04-27T16:46:06Z

I took the easy way out and created a new branch with new history, leading to the same result (but omitting the one unrelated whitespace change @lamby understandably did not like ;)) - it's to be found at #6052.

ulidtko · 2019-05-02T10:34:14Z

@jtru yea, I think you just did git pull, and that did git merge --ff and borked your branch for you 😃

I highly recommend people to git config --global branch.autosetuprebase always; you can repair this branch by:

git rebase origin/unstable
git push -f

guillemj · 2019-05-27T21:22:38Z

Hey, @jtru and @ulidtko have you considered sending EXTEND_TIMEOUT_USEC= notifications? That could be a way to perhaps solve the timeout control from the redis side. This is also one of the events supported too now by start-stop-daemon on sysvinit-based systems (see https://manpages.debian.org/unstable/start-stop-daemon).

guillemj · 2019-05-27T21:34:40Z

src/server.c

-
-    serverLog(LL_NOTICE, "supervised by systemd, will signal readiness");
-    if ((fd = socket(AF_UNIX, SOCK_DGRAM, 0)) == -1) {
+    handle = dlopen("libsystemd.so", RTLD_LAZY);


This is not a very good idea for various reasons. It will require systems to install the development packages to get the .so symlink to the current SONAME (say libsystemd.so.0). This also disregards any ABI, as we are using the non-versioned filename, which once libsystemd bumps its SOVERSION to say 1, it can happily crash or misbehave, etc. It also makes it very hard to spot when this needs to be updated, and no way to prevent loading the wrong library.

The only sane and safe way to use an external library in an optional way would be to create some kind of redis plugin which dynamically links at build-time against libsystemd.so, and then redis can safely use dlopen on its own plugin, because its ABI is guaranteed to match when built together, and the libsystemd ABI is respected because we are always linking against a versioned shared library.

ulidtko · 2019-05-28T09:27:12Z

Hey @guillemj,

have you considered sending EXTEND_TIMEOUT_USEC= notifications?

Ah yes, please see #6052, it's a re-make of this PR. Also, your comment on dlopen("libsystemd.so") has a point; could you please carry it over to #6052, and elaborate how to do this properly?..

@jtru see? This is the reason I'd use git push -f to my own fork-branch, instead of re-submitting another "v2" PR. Just to avoid fragmenting the discussion, like it starts to happen now. "The easy way out" starts to backfire :)

jtru · 2019-05-28T19:09:42Z

Please look at #6052 instead. I'm sorry for the confusion caused :/

jtru added 7 commits March 24, 2019 14:48

Move redisCommunicateSystemd() prototype to the proper header file

1746a46

Fix whitespace - change tabs to spaces

d441c5a

Report systemd service readiness when replica syncing finishes

51e2d2e

Improve systemd service status string

45166ae

If running as replica, let replication status set service readiness

0d52e7f

I'm not 100% sure that having a "server.masterhost" alone is enough to make this decision, but all usage scenarios that I personally know and have operated should be covered by this, according to my testing.

Trim gratuitous newline

6a13150

antirez and others added 5 commits April 27, 2019 17:06

Fix assert comparison in fetchClusterSlotsConfiguration().

19822ee

ACL: regression test for redis#5998.

ef7dda8

ACL: Fix memory leak in ACLResetSubcommandsForCommand().

036f4a3

This commit fixes bug reported at redis#5998. Thanks to @tomcat1102.

Aesthetic change to redis#5962 to conform to Redis style.

247ea0a

antirez and others added 12 commits April 27, 2019 17:06

Test: disable module testing for now.

c7f7ba6

fix: missing initialization.

57216c9

Add runtest-moduleapi with commandfilter coverage.

a98f5c7

removed obsolete warning as per - redis#5291

2f767a9

FIX: core dump in redis-benchmark when the -r is the last arg

6a252bc

Fix start and end key initialize

a6480e2

Fix memleak in bitfieldCommand

9525fc7

build fix

888cea0

If supervised by systemd, warn about Timeout(Start|Stop)Sec

e3100c4

Bail out of installation if systemd is the host's service manager

2e1ec5b

Provide an example systemd service unit file for redis-server

2663e15

jtru mentioned this pull request Apr 27, 2019

Better systemd integration v2 #6052

Merged

guillemj reviewed May 27, 2019

View reviewed changes

jtru closed this May 28, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Better systemd integration #5963

Better systemd integration #5963

jtru commented Mar 24, 2019

ulidtko commented Mar 28, 2019 •

edited

antirez commented Mar 28, 2019

antirez commented Mar 28, 2019

lamby commented Mar 28, 2019

antirez commented Mar 28, 2019

lamby commented Mar 28, 2019

jtru commented Mar 28, 2019

lamby commented Mar 28, 2019

jtru commented Mar 28, 2019

lamby commented Mar 28, 2019

ulidtko commented Mar 29, 2019

ulidtko commented Mar 29, 2019

lamby commented Mar 29, 2019

ulidtko commented Apr 1, 2019

jtru commented Apr 1, 2019

ulidtko commented Apr 26, 2019

jtru commented Apr 26, 2019

antirez commented Apr 26, 2019

jtru commented Apr 27, 2019 •

edited

jtru commented Apr 27, 2019

ulidtko commented May 2, 2019

guillemj commented May 27, 2019

guillemj May 27, 2019

ulidtko commented May 28, 2019

jtru commented May 28, 2019

Better systemd integration #5963

Better systemd integration #5963

Conversation

jtru commented Mar 24, 2019

ulidtko commented Mar 28, 2019 • edited

antirez commented Mar 28, 2019

antirez commented Mar 28, 2019

lamby commented Mar 28, 2019

antirez commented Mar 28, 2019

lamby commented Mar 28, 2019

jtru commented Mar 28, 2019

lamby commented Mar 28, 2019

jtru commented Mar 28, 2019

lamby commented Mar 28, 2019

ulidtko commented Mar 29, 2019

sd_notify crash course

Type=simple (Type=oneshot, Type=forking, Type=exec)

Type=notify

ulidtko commented Mar 29, 2019

lamby commented Mar 29, 2019

ulidtko commented Apr 1, 2019

jtru commented Apr 1, 2019

ulidtko commented Apr 26, 2019

jtru commented Apr 26, 2019

antirez commented Apr 26, 2019

jtru commented Apr 27, 2019 • edited

jtru commented Apr 27, 2019

ulidtko commented May 2, 2019

guillemj commented May 27, 2019

guillemj May 27, 2019

Choose a reason for hiding this comment

ulidtko commented May 28, 2019

jtru commented May 28, 2019

ulidtko commented Mar 28, 2019 •

edited

jtru commented Apr 27, 2019 •

edited