Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better systemd integration #5963

Closed
wants to merge 24 commits into from
Closed

Conversation

jtru
Copy link
Contributor

@jtru jtru commented Mar 24, 2019

I believe this fixes #5826 while also reducing implementation complexity by dynamically re-using libsystemd.so components.

The patch as implemented only changes behaviour if the configuration specifies redis to be supervised by systemd explicitly. If the server is supervised by systemd "accidentally" only (i.e. by supervised auto being set and the redis unit file specifying Type=notify for the service), the current behaviour of reporting readiness very early after startup is preserved. This should prevent users who aren't aware of the concequences of having a systemd Type=notify service in operation from having to deal with any changes.

dlopen() and use libsystemd.so's sd_notify() to signal the systemd
service manager what state redis-server is in.

Also, if redis-server has been configured to be supervised by systemd
explicitly (i.e. systemd supervision wasn't auto-detected by looking at
the environment), report server status to systemd more correctly: The
service status will be switched from "loading" to "ready" after
redis-server is ready to actually serve client requests.

TODO: Make sure redis-server correctly reports service readiness if
configured as a replica.
I'm not 100% sure that having a "server.masterhost" alone is enough to
make this decision, but all usage scenarios that I personally know and
have operated should be covered by this, according to my testing.
@ulidtko
Copy link
Contributor

ulidtko commented Mar 28, 2019

Good one!

The use of STATUS=... is particularly good and I like it. (That shows up in systemctl status for those unfamiliar.)

However, I strongly disagree with the differing behavior of supervised systemd and supervised auto [in presence of systemd]. IMO, this is way more surprising than a change of currently-incorrect behavior.

Still, this PR is a good step towards fixing #5826 and I'd love to see what @antirez has to say about it.

@antirez
Copy link
Contributor

antirez commented Mar 28, 2019

Thanks, I wonder if @lamby has some wisdom about that.

@antirez
Copy link
Contributor

antirez commented Mar 28, 2019

Good one!

The use of STATUS=... is particularly good and I like it. (That shows up in systemctl status for those unfamiliar.)

However, I strongly disagree with the differing behavior of supervised systemd and supervised auto [in presence of systemd]. IMO, this is way more surprising than a change of currently-incorrect behavior.

Still, this PR is a good step towards fixing #5826 and I'd love to see what @antirez has to say about it.

Could you explain all this in terms of systemd illiterate persons? That would help, thanks!

@lamby
Copy link
Contributor

lamby commented Mar 28, 2019

Nothing particular from me, but +1 for generally better systemd integration but also +1 the -1 (hah!) regarding different behaviour... far too magical. :)

@antirez
Copy link
Contributor

antirez commented Mar 28, 2019

@jtru in light of two people saying -1 to the change of the behavior, would you be open to change the patch in that regard? (here I'm working as mediator because I know zero about systemd, yet want to merge that).

@lamby
Copy link
Contributor

lamby commented Mar 28, 2019

Bonus points for minimising the whitespace changes in the diff too!

@jtru
Copy link
Contributor Author

jtru commented Mar 28, 2019

Thank you all for chiming in on the matter! :) I apologize for my lenghty response, but I think it's necessary, in order to not make redis users potentially shoot themsevles in the foot...

@lamby and @antirez - I'm very open to make the changes you requested. I'd just like to have clarification on this - you would want the changed behaviour implemented, not the current behaviour (where "current" is "signal systemd that redis is ready to go without actually having loaded the dataset yet") while merely dlopen()ing libsystemd.so to do it, right? ;)

The original concern that caused me to stick to the split behaviour between auto-detection and explicit configuration is the following:

systemd services have a property named TimeoutStartUSec - it defines the maximum amount of time that may pass between requesting a service to start, and that service actually advancing to the ready state. On my system, that property defaults to 90 seconds. That's certainly not a problem with redis' current relaxed way of handling that, where READY=1 gets sent to systemd quickly.

However, with the new approach that my patch implements, redis won't progress to the ready state until the dataset has loaded (which, imo, is the correct way to behave). On our systems at work, bigger instances actually take more than 120 seconds to do that pretty regularly. With the default TimeoutStartUSec value (that most users won't change, and might not even be aware of), systemd would shoot that loading instance down with SIGTERM before it could finish loading, making the service fail. The only way to avoid this behaviour, afaik, is for the administrator to raise the timeout for that particular unit, or to change the system-wide default.

Now suppose a user has redis in operation with Type=notify and supervised auto in redis.conf (because that's supported since 3.2 or so, and it works for them - even though not ideally, and/or their distro shipped the redis service unit defined that way), and they're not aware of TimeoutStartUSec at all. The next time they'll start their large-ish instance, redis won't get the necessary work to signal "READY=1" done fast enough, and get killed. Everything has worked as designed and configured, but the user hasn't been aware of all the details. Downtime ensues.

Since redis.conf documents some of the limitiations of supervision that exist today, that comment could be extended to include a warning to keep an eye on the servce unit's TimeoutStartUSec as well for the supervised systemd option, and prevent such a thing from happening. I realize that it's probably unlikely for events to play out like this, but if you consider it a scenario that's worth trying to prevent, please let me know what you think!

(And now, some optional reading about systemd-related stuff, since @antirez requested it :))

systemd defines a protocol where a service can tell systemd what state (loading, ready, stopping, failed, ...) it's in its lifecycle, so that systemd can initiate or defer other actions (i.e. begin starting or stopping other service) based on that information. systemctl displays that state to the administrator, and also offers a simple mechanism for the supervised service to display concise status text (as opposed to emitting a log line/journal entry that might quickly scroll out of the last N lines of a buffer for a busy service) to the operator. It looks like this (watch out for the "Status:" line):

# systemctl -l status test-redis.service
● test-redis.service - Redis demo service for type=notify
   Loaded: loaded (/etc/systemd/system/test-redis.service; static; vendor preset: disabled)
   Active: activating (start) since Thu 2019-03-28 21:28:51 CET; 1s ago
 Main PID: 6050 (redis-server)
   Status: "Waiting for MASTER <-> REPLICA sync"
    Tasks: 4 (limit: 4915)
   Memory: 3.1M
   CGroup: /system.slice/test-redis.service
           └─6050 /home/colo/codebase/redis/src/redis-server 127.0.0.1:6379

@lamby
Copy link
Contributor

lamby commented Mar 28, 2019

I'm afraid this has very quickly got above what I know about systemd so will be unable to comment/review/etc. :)

@jtru
Copy link
Contributor Author

jtru commented Mar 28, 2019

Hm, I didn't mean to scare anyone away :/ I'll try to give a quick tl;dr overview: My patch makes it possible for redis to correctly inform systemd to start services that depend on it. However, for users where redis startup (= the time until clients can meaningfully be served after spawning) is slow, there's a config option in systemd (not redis) that might get the redis service killed during startup. That possible interaction should be hinted at somewhere.

@lamby
Copy link
Contributor

lamby commented Mar 28, 2019

Hm, I didn't mean to scare anyone away

Don't worry, I didn't take it that way. I just lack the systemd knowledge and, alas, I lack the time to invest in learning it to have any kind of +0 or -0 here. Good luck. :)

@ulidtko
Copy link
Contributor

ulidtko commented Mar 29, 2019

sd_notify crash course

So we want to deploy an application which uses Redis, let's call that mogrifier. How do we usually do that? We write an initscript, /etc/init.d/mogrifier — and then the classic SysV init system takes care of starting the process on boot, etc. With systemd, it's just the same: we write a unit file, /etc/systemd/system/mogrifier.service — then systemd knows how to start and babysit the service for us.

Both systemd units & initscripts can have service dependencies, whereby mogrifier.service can declare please don't ever start me before redis.service, I won't work in that case. A line which reads After=redis.service in mogrifier.service declares something close to that effect (the exact interpretation is messy, let me skip).

So all is fine and dandy, services can depend on each other, some order can be instilled into the chaotic and unpredictable boot process, and big swaths of reconnection&readiness-waiting crapcode can be not written in mogrify and other application services (who are the least concerned with system supervision, cannot properly do it and are not responsible for it). Right? Right, up until we ask ourselves: how does systemd know when redis.service has properly started and mogrifier.service can commence?

That's the crucial question, and the answer has two cases: Type=notify case, and Type=simple case. (The mandatory doc link for the "types" is man 5 systemd.service; it's not necessary to click it anyway).

Type=simple (Type=oneshot, Type=forking, Type=exec)

In this case, systemd has no choice but assume that service is ready as soon as its main process has started.

This sort-of works, except sometimes it doesn't. There's a scheduling race between mogrifier main() and redis main().

Said another way, there's a small window of time (hit by chance only sometimes) when
mogrifier can attempt to connect to redis which has not yet fully initialized (didn't reach listen() on the socket yet, hasn't loaded dataset yet, etc etc) — eventhough mogrifier has been started after redis by the init system, exactly as requested.

Type=notify

Hopefully, you can see the race problem above. The sd_notify API has the sole purpose to solve that. Systemd services declare they use this API by having a line Type=notify in the unit file.

In this case, the service itself tells to the supervisor when initial startup has completed — via sending READY=1.

Assuming service dependencies were configured correctly, this feature allows to completely eliminate the startup race problem, so BusyLoadingError never happens again in the clients. Which is immensely helpful (to some people).


@antirez @lamby please see the above, I wrote it specifically to help you review the PR. Questions welcome of course.

@ulidtko
Copy link
Contributor

ulidtko commented Mar 29, 2019

@jtru

[...] Everything has worked as designed and configured, but the user hasn't been aware of all the details. Downtime ensues.

I see your point; let me counter it like this:

  • this sort of downtime is trivial to diagnose & troubleshoot:

    Job for foobar.service failed because a timeout was exceeded.

    Plus crystal-clear logs, plus near-100% reproduceability in the scenarios like you describe;

  • it's just like any other bug/misconfig, I mean you have to test your upgrades before production;

  • the current as-I-hold-incorrect behavior which you propose to save in supervised auto is orders of magnitude more difficult to reproduce, diagnose and test.

I do recognize there are situations when correctness has to be sacrificed for backward compatibility. But this isn't one of those cases. Just choose your nightmare: explaining TimeoutStartUSec & how to CI and test VS hunting obscure races which magically disappear when you switch away from default config?..

@lamby
Copy link
Contributor

lamby commented Mar 29, 2019

I'm not sure why I'm being asked to review this. It should not block on me.

@ulidtko
Copy link
Contributor

ulidtko commented Apr 1, 2019

Hey @jtru, just to make it clear: looks like this PR is currently blocked on you.

@jtru in light of two people saying -1 to the change of the behavior, would you be open to change the patch in that regard? (here I'm working as mediator because I know zero about systemd, yet want to merge that).

I.e. the supervised auto behavior is to be changed.

As I see, there's no complaints about dlopen() and the deduplication of sd_notify logic (which is awesome!).

@jtru
Copy link
Contributor Author

jtru commented Apr 1, 2019

@antirez if you tell me to make the supervised auto (and having detected systemd as the acting service manager) and supervised systemd to work alike, I'll happily implement that. (I'm not yet sure that you actually favour the proposed behaviour over the current one.)

If you'd like, I would also include a word of warning in the default/sample redis.conf about the perils that systemd's TimeoutStartUSec may hold when systemd supervision is active. Maybe that's a fitting occasion to include a default/sample redis.service systemd unit, too (modeled after utils/redis_init_script)?

@ulidtko
Copy link
Contributor

ulidtko commented Apr 26, 2019

A sample redis.service to me looks pretty much essential. You can document in comments the need to increase start (and stop too?..) timeout, and actually do this by way of example.

@jtru
Copy link
Contributor Author

jtru commented Apr 26, 2019

I'll have some time to work on this over the upcoming weekend (UTC+2 here) - I'll try my best to make everyone involved in this ticket happy ;)

@antirez
Copy link
Contributor

antirez commented Apr 26, 2019

Thanks, just want to say that as @lamby I also lack the necessary systemd skill, so it's better to handle that in a "most of the community should be happy". Of course trying always to make simple stuff in the Redis style when possible. Thank you.

antirez and others added 5 commits April 27, 2019 17:06
…ge reply (or POSTPONED_ARRAY)

when redis appends the blocked client reply list to the real client, it didn't
bother to check if it is in fact the master client. so a slave executing that
module command will send replies to the master, causing the master to send the
slave error responses, which will mess up the replication offset
(slave will advance it's replication offset, and the master does not)
@jtru
Copy link
Contributor Author

jtru commented Apr 27, 2019

Hrmm... as you can probably tell, I'm not very familiar with github's git workflow practises ;) I'm not sure how my history (actually, my PR's series of commits to apply) ended up like this, but I'll try to sort it out (if all else fails, I'd just create a new branch from current unstable with my changes and create a new PR, if that is acceptable?) asap.

As for the content of my changeset, this is what I intend to provide:

  1. No difference in behaviour between auto-detected and manually configured systemd supervision
  2. Issue a short warning for the user to take care of Timeout(Start|Stop)Sec whenever supervised by systemd
  3. Prevent the redis install helper script from creating sysv initscripts on hosts that run systemd (and really should use service units instead)
  4. Provide an example redis-server systemd service unit file in utils/, next to the redis_init_script example

@jtru
Copy link
Contributor Author

jtru commented Apr 27, 2019

I took the easy way out and created a new branch with new history, leading to the same result (but omitting the one unrelated whitespace change @lamby understandably did not like ;)) - it's to be found at #6052.

@ulidtko
Copy link
Contributor

ulidtko commented May 2, 2019

@jtru yea, I think you just did git pull, and that did git merge --ff and borked your branch for you 😃

I highly recommend people to git config --global branch.autosetuprebase always; you can repair this branch by:

  • git rebase origin/unstable
  • git push -f

@guillemj
Copy link
Contributor

Hey, @jtru and @ulidtko have you considered sending EXTEND_TIMEOUT_USEC= notifications? That could be a way to perhaps solve the timeout control from the redis side. This is also one of the events supported too now by start-stop-daemon on sysvinit-based systems (see https://manpages.debian.org/unstable/start-stop-daemon).


serverLog(LL_NOTICE, "supervised by systemd, will signal readiness");
if ((fd = socket(AF_UNIX, SOCK_DGRAM, 0)) == -1) {
handle = dlopen("libsystemd.so", RTLD_LAZY);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a very good idea for various reasons. It will require systems to install the development packages to get the .so symlink to the current SONAME (say libsystemd.so.0). This also disregards any ABI, as we are using the non-versioned filename, which once libsystemd bumps its SOVERSION to say 1, it can happily crash or misbehave, etc. It also makes it very hard to spot when this needs to be updated, and no way to prevent loading the wrong library.

The only sane and safe way to use an external library in an optional way would be to create some kind of redis plugin which dynamically links at build-time against libsystemd.so, and then redis can safely use dlopen on its own plugin, because its ABI is guaranteed to match when built together, and the libsystemd ABI is respected because we are always linking against a versioned shared library.

@ulidtko
Copy link
Contributor

ulidtko commented May 28, 2019

Hey @guillemj,

have you considered sending EXTEND_TIMEOUT_USEC= notifications?

Ah yes, please see #6052, it's a re-make of this PR. Also, your comment on dlopen("libsystemd.so") has a point; could you please carry it over to #6052, and elaborate how to do this properly?..


@jtru see? This is the reason I'd use git push -f to my own fork-branch, instead of re-submitting another "v2" PR. Just to avoid fragmenting the discussion, like it starts to happen now. "The easy way out" starts to backfire :)

@jtru
Copy link
Contributor Author

jtru commented May 28, 2019

Please look at #6052 instead. I'm sorry for the confusion caused :/

@jtru jtru closed this May 28, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Systemd readiness signalling is implemented incorrectly