Automatically send status notifications to systemd (#1029)
* First pass on systemd integration

This adds a somewhat naive `SystemdService` class that will send appropriate notifications to systemd or do nothing if it doesn't detect that the process is being run by systemd. It pretty much just notifies that the process is ready, is stopping, and actively notifies the watchdog while it's running. It doesn't reach into the actual job system to check that things are OK, or hook into ActiveSupport notifications to tell systemd about other events (like restarting/reloading).

Fixes #1027.

* Log using notifications

* Add some docs

* Handle graceful shutdown notifications better

* Update Sorbet typing information

* Systemd errors should probably always log

* Add tests

* Add example systemd configuration file

* Make Sorbet happy

Stubbed constants are very not cool as far as Sorbet is concerned, so the way I wrote these tests before broke the linter. OTOH, trying to figure out how to make this acceptable did get some slightly nicer test setup.

* Skip socket tests on JRuby

* Move vendored sd_notify into lib/

* Add note about test skipping

It's good to know why these are skipped so we can stop skipping them in the future if the JRuby issue is fixed (or someone figures out a workaround).
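
For readers unfamiliar with the notification mechanism this commit relies on, here is a minimal sketch of it. This is not GoodJob's implementation. It assumes a Unix-like system and CRuby (the commit notes that socket tests are skipped on JRuby). The helper names `capture_notification` and `send_state` and the socket file name are hypothetical. A throwaway datagram socket stands in for systemd, and the sender writes `READY=1` to the path that systemd would normally supply in `NOTIFY_SOCKET`.

```ruby
require "socket"
require "tmpdir"

# Stand-in for systemd (hypothetical helper): bind a UNIX datagram socket at a
# temporary path, let the block send to it, and return the datagram received.
def capture_notification
  Dir.mktmpdir do |dir|
    path = File.join(dir, "notify.sock")
    listener = Socket.new(:UNIX, :DGRAM)
    listener.bind(Addrinfo.unix(path, :DGRAM))
    begin
      yield path
      listener.recv(1024)
    ensure
      listener.close
    end
  end
end

# Sender side (hypothetical helper): do nothing when no socket path is given,
# otherwise write the state string as a single datagram.
def send_state(state, socket_path)
  return nil unless socket_path

  Addrinfo.unix(socket_path, :DGRAM).connect { |socket| socket.write(state) }
end

received = capture_notification { |path| send_state("READY=1", path) }
puts received
```

Returning `nil` when no socket path is present is what lets the same code run unchanged outside systemd.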
Mr0grog committed Aug 6, 2023
1 parent 7f3b4de commit f879537
Showing 10 changed files with 449 additions and 3 deletions.
3 changes: 2 additions & 1 deletion .rubocop.yml
@@ -22,10 +22,11 @@ AllCops:
- pkg/**/*
- spec/test_app/**/*
- tmp/**/*
- vendor/**/*
- scripts/**/*
- gemfiles/**/*
- Brewfile
# Vendored dependencies
- lib/good_job/sd_notify.rb
NewCops: enable

Gemspec/DevelopmentDependencies:
92 changes: 92 additions & 0 deletions examples/systemd/goodjob.service
@@ -0,0 +1,92 @@
# This is an example systemd service configuration that keeps GoodJob running.
#
# Customize this file based on your bundler location, app directory, etc.
#
# TO RUN AS A USER SERVICE...
# Customize and copy this file to ~/.config/systemd/user/goodjob.service
# Then run:
# - systemctl --user enable goodjob
# - systemctl --user {start,stop,restart,status} goodjob
# Also you might want to run:
# - loginctl enable-linger username
# So that the service is not killed when the user logs out.
#
# TO RUN AS A SYSTEM SERVICE...
# Customize and copy this to:
# (on CentOS) /usr/lib/systemd/system/goodjob.service
# (on Ubuntu) /lib/systemd/system/goodjob.service
# Then run (you may need to use `sudo`):
# - systemctl enable goodjob
# - systemctl {start,stop,restart,status} goodjob
#
# This file corresponds to a single GoodJob process. Add multiple copies
# to run multiple processes (goodjob-1, goodjob-2, etc).
#
# Use `journalctl --unit goodjob -rn 100` to view the last 100 log lines.
# Or `journalctl --unit goodjob --follow` to view live log output.
#
[Unit]
Description=GoodJob Background Job Processor
# Start only once the network is available.
# If running Postgres locally and it's also managed by systemd, consider adding
# `postgresql.service` (this list is space-separated).
After=network.target

# See these pages for lots of options:
#
# https://www.freedesktop.org/software/systemd/man/systemd.service.html
# https://www.freedesktop.org/software/systemd/man/systemd.exec.html
#
# THOSE PAGES ARE CRITICAL FOR ANY LINUX DEVOPS WORK; read them multiple
# times! systemd is a critical tool for all developers to know and understand.
#
[Service]
# Type=notify is supported as of GoodJob v3.17.0. In earlier versions, use
# Type=simple and remove the WatchdogSec line.
Type=notify
# If systemd doesn't get pinged by GoodJob at least this often, restart GoodJob.
WatchdogSec=5s

WorkingDirectory=<PATH_TO_YOUR_RAILS_APP>
# The actual command to run.
# If you use the system's ruby:
ExecStart=/usr/local/bin/bundle exec good_job start
# If you use rbenv:
# ExecStart=/bin/bash -lc 'exec /home/<USERNAME>/.rbenv/shims/bundle exec good_job start'
# If you use rvm in production without a gemset and your Ruby version is 2.6.5:
# ExecStart=/home/<USERNAME>/.rvm/gems/ruby-2.6.5/wrappers/bundle exec good_job start
# If you use rvm in production with a gemset and your Ruby version is 2.6.5:
# ExecStart=/home/<USERNAME>/.rvm/gems/ruby-2.6.5@gemset-name/wrappers/bundle exec good_job start
# If you use rvm in production with a gemset and the Ruby version/gemset is specified in a
# .ruby-version, .ruby-gemset, or .rvmrc file in the working directory:
# ExecStart=/home/<USERNAME>/.rvm/bin/rvm in <PATH_TO_YOUR_RAILS_APP> do bundle exec good_job start

# Uncomment these lines if you are going to use this as a system service.
# If using it as a user service, leave them commented out, or you will get an
# error when trying to start the service.
# !!! Change this to your deploy user account if you are using this as a system service !!!
# User=<USERNAME>
# Group=<USERGROUP>
# UMask=0002

# Set any environment variables your application needs, one `Environment=X` line
# per environment variable.
Environment=RAILS_ENV=production
# Greatly reduce Ruby memory fragmentation and heap usage:
# https://www.mikeperham.com/2018/04/25/taming-rails-memory-bloat/
Environment=MALLOC_ARENA_MAX=2

# If GoodJob crashes, restart after a short delay:
RestartSec=1s
Restart=always

# Send output to the systemd journal. You can view it with:
# journalctl --unit goodjob
# To send output to a file, set the path here instead.
StandardOutput=journal
StandardError=journal

# This will default to "bundler" if we don't specify it.
SyslogIdentifier=goodjob

[Install]
WantedBy=multi-user.target
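
To make the `WatchdogSec=5s` setting above concrete: systemd exports the timeout to the service in `WATCHDOG_USEC` as a number of microseconds, and the `SystemdService` class in this commit pings at half that interval. The sketch below covers the arithmetic only; the helper name `watchdog_ping_interval` is hypothetical and does not appear in GoodJob.

```ruby
# Convert the string systemd exports in WATCHDOG_USEC (microseconds) into the
# ping interval in seconds, which is half of the watchdog timeout.
def watchdog_ping_interval(watchdog_usec)
  timeout_seconds = Integer(watchdog_usec) / 1_000_000.0
  timeout_seconds / 2
end

# WatchdogSec=5s corresponds to WATCHDOG_USEC=5000000.
interval = watchdog_ping_interval("5000000")
puts interval
```

With the example configuration, a ping goes out roughly every 2.5 seconds, so a single delayed ping does not immediately trip the 5 second timeout.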
1 change: 1 addition & 0 deletions lib/good_job.rb
@@ -32,6 +32,7 @@
require "good_job/probe_server"
require "good_job/scheduler"
require "good_job/shared_executor"
require "good_job/systemd_service"

# GoodJob is a multithreaded, Postgres-based, ActiveJob backend for Ruby on Rails.
#
8 changes: 6 additions & 2 deletions lib/good_job/cli.rb
@@ -94,10 +94,12 @@ def start
       GoodJob.configuration.options.merge!(options.symbolize_keys)
       configuration = GoodJob.configuration
       capsule = GoodJob.capsule
+      systemd = GoodJob::SystemdService.new

       Daemon.new(pidfile: configuration.pidfile).daemonize if configuration.daemonize?

       capsule.start
+      systemd.start

       if configuration.probe_port
         probe_server = GoodJob::ProbeServer.new(port: configuration.probe_port)
@@ -114,8 +116,10 @@ def start
         break if @stop_good_job_executable || capsule.shutdown?
       end

-      capsule.shutdown(timeout: configuration.shutdown_timeout)
-      probe_server&.stop
+      systemd.stop do
+        capsule.shutdown(timeout: configuration.shutdown_timeout)
+        probe_server&.stop
+      end
     end

     default_task :start
18 changes: 18 additions & 0 deletions lib/good_job/log_subscriber.rb
@@ -161,6 +161,24 @@ def cleanup_preserved_jobs(event)
      end
    end

    # @!macro notification_responder
    def systemd_watchdog_start(event)
      interval = event.payload[:interval]

      info do
        "Pinging systemd watchdog every #{interval.round(1)} seconds"
      end
    end

    # @!macro notification_responder
    def systemd_watchdog_error(event)
      exception = event.payload[:error]

      error do
        "Error pinging systemd: #{exception.class}: #{exception}\n #{exception.backtrace}"
      end
    end

    # @!endgroup

    # Get the logger associated with this {LogSubscriber} instance.
157 changes: 157 additions & 0 deletions lib/good_job/sd_notify.rb
@@ -0,0 +1,157 @@
# frozen_string_literal: true

# This is a copy of https://github.com/agis/ruby-sdnotify as of v0.1.1
# (commit 21240f1)
# Any changes have been marked with "FORK:" comments.
#
# It is included here because it is a very small gem, and doing so reduces
# the number of dependencies and the supply chain risks they pose.

# FORK: nest SdNotify inside the GoodJob module to prevent name collisions in
# case a GoodJob user also uses the actual sd_notify gem.
module GoodJob

# The MIT License
#
# Copyright (c) 2017, 2018, 2019, 2020 Agis Anastasopoulos
#
# Permission is hereby granted, free of charge, to any person obtaining a copy of
# this software and associated documentation files (the "Software"), to deal in
# the Software without restriction, including without limitation the rights to
# use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
# the Software, and to permit persons to whom the Software is furnished to do so,
# subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
# FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
# COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
# IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
# CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

require "socket"

# SdNotify is a pure-Ruby implementation of sd_notify(3). It can be used to
# notify systemd about state changes. Methods of this package are no-op on
# non-systemd systems (eg. Darwin).
#
# The API maps closely to the original implementation of sd_notify(3),
# therefore be sure to check the official man pages prior to using SdNotify.
#
# @see https://www.freedesktop.org/software/systemd/man/sd_notify.html
module SdNotify
  # Exception raised when there's an error writing to the notification socket
  class NotifyError < RuntimeError; end

  READY = "READY=1"
  RELOADING = "RELOADING=1"
  STOPPING = "STOPPING=1"
  STATUS = "STATUS="
  ERRNO = "ERRNO="
  MAINPID = "MAINPID="
  WATCHDOG = "WATCHDOG=1"
  FDSTORE = "FDSTORE=1"

  def self.ready(unset_env=false)
    notify(READY, unset_env)
  end

  def self.reloading(unset_env=false)
    notify(RELOADING, unset_env)
  end

  def self.stopping(unset_env=false)
    notify(STOPPING, unset_env)
  end

  # @param status [String] a custom status string that describes the current
  # state of the service
  def self.status(status, unset_env=false)
    notify("#{STATUS}#{status}", unset_env)
  end

  # @param errno [Integer]
  def self.errno(errno, unset_env=false)
    notify("#{ERRNO}#{errno}", unset_env)
  end

  # @param pid [Integer]
  def self.mainpid(pid, unset_env=false)
    notify("#{MAINPID}#{pid}", unset_env)
  end

  def self.watchdog(unset_env=false)
    notify(WATCHDOG, unset_env)
  end

  def self.fdstore(unset_env=false)
    notify(FDSTORE, unset_env)
  end

  # @return [Boolean] true if the service manager expects watchdog keep-alive
  # notification messages to be sent from this process.
  #
  # That is the case if the $WATCHDOG_USEC environment variable is set to a
  # positive integer, and the $WATCHDOG_PID variable is unset or set to the
  # PID of the current process.
  #
  # @note Unlike sd_watchdog_enabled(3), this method does not mutate the
  # environment.
  def self.watchdog?
    wd_usec = ENV["WATCHDOG_USEC"]
    wd_pid = ENV["WATCHDOG_PID"]

    return false if !wd_usec

    begin
      wd_usec = Integer(wd_usec)
    rescue
      return false
    end

    return false if wd_usec <= 0
    return true if !wd_pid || wd_pid == $$.to_s

    false
  end

  # Notify systemd with the provided state, via the notification socket, if
  # any.
  #
  # Generally this method will be used indirectly through the other methods
  # of the library.
  #
  # @param state [String]
  # @param unset_env [Boolean]
  #
  # @return [Integer, nil] the number of bytes written to the notification
  # socket or nil if there was no socket to report to (eg. the program wasn't
  # started by systemd)
  #
  # @raise [NotifyError] if there was an error communicating with the systemd
  # socket
  #
  # @see https://www.freedesktop.org/software/systemd/man/sd_notify.html
  def self.notify(state, unset_env=false)
    sock = ENV["NOTIFY_SOCKET"]

    return nil if !sock

    ENV.delete("NOTIFY_SOCKET") if unset_env

    begin
      Addrinfo.unix(sock, :DGRAM).connect do |s|
        s.close_on_exec = true
        s.write(state)
      end
    rescue StandardError => e
      raise NotifyError, "#{e.class}: #{e.message}", e.backtrace
    end
  end
end

# FORK: Finish nesting inside GoodJob.
end
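
The decision made by `SdNotify.watchdog?` above can be restated as a pure function over an environment hash, which makes the individual cases easy to check. This is only an illustrative sketch: the name `watchdog_enabled?` and its explicit `env` and `pid` parameters are hypothetical and not part of the vendored library, which reads `ENV` and `$$` directly.

```ruby
# Returns true when the given environment asks this process (identified by pid)
# to send watchdog keep-alive pings.
def watchdog_enabled?(env, pid)
  raw_usec = env["WATCHDOG_USEC"]
  return false if raw_usec.nil?

  # Integer(..., exception: false) returns nil instead of raising on bad input.
  usec = Integer(raw_usec, exception: false)
  return false if usec.nil? || usec <= 0

  watchdog_pid = env["WATCHDOG_PID"]
  watchdog_pid.nil? || watchdog_pid == pid.to_s
end

results = {
  not_under_systemd: watchdog_enabled?({}, 123),
  enabled_for_any_pid: watchdog_enabled?({ "WATCHDOG_USEC" => "5000000" }, 123),
  enabled_for_this_pid: watchdog_enabled?({ "WATCHDOG_USEC" => "5000000", "WATCHDOG_PID" => "123" }, 123),
  meant_for_other_pid: watchdog_enabled?({ "WATCHDOG_USEC" => "5000000", "WATCHDOG_PID" => "456" }, 123),
  malformed_value: watchdog_enabled?({ "WATCHDOG_USEC" => "soon" }, 123),
  zero_timeout: watchdog_enabled?({ "WATCHDOG_USEC" => "0" }, 123)
}
results.each { |name, value| puts "#{name}: #{value}" }
```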
69 changes: 69 additions & 0 deletions lib/good_job/systemd_service.rb
@@ -0,0 +1,69 @@
# frozen_string_literal: true

require 'concurrent/timer_task'
require 'good_job/sd_notify'

module GoodJob # :nodoc:
  #
  # Manages communication with systemd to notify it about the status of the
  # GoodJob CLI. If it doesn't look like systemd is controlling the process,
  # SystemdService doesn't do anything.
  #
  class SystemdService
    def self.task_observer(_time, _output, thread_error) # :nodoc:
      # The observer runs after every watchdog ping; thread_error is nil when
      # the ping succeeded, so only report actual errors.
      return if thread_error.nil? || thread_error.is_a?(Concurrent::CancelledOperationError)

      ActiveSupport::Notifications.instrument("systemd_watchdog_error.good_job", { error: thread_error })
      GoodJob._on_thread_error(thread_error)
    end

    # Indicates whether the service is actively notifying systemd's watchdog.
    def notifying?
      @watchdog&.running? || false
    end

    # Notify systemd that the process is ready. If the service is configured in
    # systemd to use the watchdog, this will also start pinging the watchdog.
    def start
      GoodJob::SdNotify.ready
      run_watchdog
    end

    # Notify systemd that the process is stopping and stop pinging the watchdog
    # if currently doing so. If given a block, it will wait for the block to
    # complete before stopping watchdog notifications, so systemd has a clear
    # indication when graceful shutdown started and finished.
    def stop
      GoodJob::SdNotify.stopping

      yield if block_given?

      @watchdog&.kill
      @watchdog&.wait_for_termination
    end

    private

    def run_watchdog
      return false unless GoodJob::SdNotify.watchdog?

      # Systemd recommends pinging the watchdog at half the configured interval:
      # https://www.freedesktop.org/software/systemd/man/sd_watchdog_enabled.html
      interval = watchdog_interval / 2

      ActiveSupport::Notifications.instrument("systemd_watchdog_start.good_job", { interval: interval })
      @watchdog = Concurrent::TimerTask.execute(execution_interval: interval) do
        GoodJob::SdNotify.watchdog
      end
      @watchdog.add_observer(self.class, :task_observer)

      true
    end

    def watchdog_interval
      return 0.0 unless GoodJob::SdNotify.watchdog?

      Integer(ENV.fetch('WATCHDOG_USEC')) / 1_000_000.0
    end
  end
end
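
The watchdog loop above is built on `Concurrent::TimerTask`. As a rough illustration of the same idea using only the standard library, the sketch below swaps in a plain `Thread` with `sleep`. It is not how GoodJob does it: the class name `SimpleWatchdog` is hypothetical, the ping is a block rather than a real `SdNotify.watchdog` call, and unlike a timer task this version pings once immediately before its first sleep.

```ruby
# Hypothetical stand-in for the TimerTask-based watchdog: calls the given block
# immediately and then once per interval until stopped.
class SimpleWatchdog
  def initialize(interval, &ping)
    @interval = interval
    @ping = ping
    @thread = nil
  end

  def running?
    !@thread.nil? && @thread.alive?
  end

  def start
    @thread = Thread.new do
      loop do
        @ping.call
        sleep @interval
      end
    end
    self
  end

  # Mirrors the kill-then-wait sequence in SystemdService#stop.
  def stop
    @thread&.kill
    @thread&.join
    self
  end
end

pings = Queue.new
watchdog = SimpleWatchdog.new(0.01) { pings << "WATCHDOG=1" }
watchdog.start
first_three = Array.new(3) { pings.pop } # blocks until three pings have arrived
watchdog.stop
puts first_three.inspect
puts watchdog.running?
```

Waiting for the thread to finish after killing it means that no further pings can be sent once `stop` returns, which is the property the `stop` method above relies on.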