Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

StartUnixWorker(): don't exit() on fork() failure #8427

Merged
merged 1 commit into from Jan 14, 2021
Merged

Conversation

Al2Klimov
Copy link
Member

... but let the caller handle the failure.

Not to stop working completely just because of fork() failure during a reload.

@Al2Klimov Al2Klimov added bug Something isn't working area/cli Command line helpers labels Oct 30, 2020
@Al2Klimov Al2Klimov added this to To review in v2.13.0 merge window via automation Oct 30, 2020
@icinga-probot icinga-probot bot added this to the 2.13.0 milestone Oct 30, 2020
Copy link
Contributor

@julianbrost julianbrost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When this case happend, what's the state of the old worker process? Won't this already have ceased operation then and leave the umbrella process in a running state, but nothing is happening anymore? Will the fork be retried automatically?

lib/cli/daemoncommand.cpp Show resolved Hide resolved
v2.13.0 merge window automation moved this from To review to Changes requested Jan 8, 2021
@Al2Klimov
Copy link
Member Author

  1. Still running.
  2. No.
  3. No.

Copy link
Contributor

@julianbrost julianbrost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So now that I understand more of the code around this, this PR makes more sense to me.

This changes the meaning of the return value -1, so this should be adjusted as well:

* @return The worker's PID on success, -1 on failure (if the worker couldn't load its config)

pid_t nextWorker = StartUnixWorker(configs);
if (nextWorker == -1) {
Log(LogCritical, "Application", "Found error in config: reloading aborted");

... but let the caller handle the failure.

Not to stop working completely just because of fork() failure during a reload.
@Al2Klimov
Copy link
Member Author

➜  icinga2 git:(bugfix/fork-exit) prefix/sbin/icinga2 daemon -c tz.cpp
[2021-01-14 14:17:24 +0100] information/cli: Icinga application loader (version: v2.12.0-427-g931b9307a)
[2021-01-14 14:17:24 +0100] information/cli: Loading configuration file(s).
[2021-01-14 14:17:24 +0100] critical/config: Error: syntax error, unexpected T_IDENTIFIER
Location: in tz.cpp: 4:5-4:8
tz.cpp(2): #include <iostream>
tz.cpp(3):
tz.cpp(4): int main() {
               ^^^^
tz.cpp(5):  std::cout << boost::locale::time_zone::global() << std::endl;
tz.cpp(6): }
[2021-01-14 14:17:24 +0100] critical/cli: Config validation failed. Re-run with 'icinga2 daemon -C' after fixing the config.
➜  icinga2 git:(bugfix/fork-exit) touch 8427.conf; (sleep 10; rm 8427.conf; kill -HUP `cat prefix/var/run/icinga2/*.pid`) & prefix/sbin/icinga2 daemon -c 8427.conf
[2] 39212
[2021-01-14 14:19:45 +0100] information/cli: Icinga application loader (version: v2.12.0-427-g931b9307a)
[2021-01-14 14:19:45 +0100] information/cli: Loading configuration file(s).
[2021-01-14 14:19:45 +0100] information/ConfigItem: Committing config item(s).
[2021-01-14 14:19:45 +0100] information/ConfigItem: Instantiated 1 IcingaApplication.
[2021-01-14 14:19:45 +0100] information/ScriptGlobal: Dumping variables to file '/Users/aklimov/NET/WS/icinga2/prefix/var/cache/icinga2/icinga2.vars'
[2021-01-14 14:19:45 +0100] information/ConfigObject: Restoring program state from file '/Users/aklimov/NET/WS/icinga2/prefix/var/lib/icinga2/icinga2.state'
[2021-01-14 14:19:45 +0100] information/ConfigObject: Restored 281 objects. Loaded 0 new objects without state.
[2021-01-14 14:19:45 +0100] information/ConfigItem: Triggering Start signal for config items
[2021-01-14 14:19:45 +0100] information/ConfigItem: Activated all objects.
[2021-01-14 14:19:50 +0100] information/Application: Got reload command: Starting new instance.
[2]  - 39212 done       ( sleep 10; rm 8427.conf; kill -HUP `cat prefix/var/run/icinga2/*.pid`; )
[2021-01-14 14:19:50 +0100] information/cli: Loading configuration file(s).
[2021-01-14 14:19:50 +0100] critical/cli: Could not compile config files: Error: Function call 'std::ifstream::open' for file '8427.conf' failed with error code 2, 'No such file or directory'

	(0) Compiling configuration file '8427.conf'
[2021-01-14 14:19:50 +0100] critical/Application: Found error in config: reloading aborted
^C[2021-01-14 14:20:02 +0100] information/Application: Received request to shut down.
[2021-01-14 14:20:03 +0100] information/Application: Shutting down...
[2021-01-14 14:20:03 +0100] information/ConfigObject: Dumping program state to file '/Users/aklimov/NET/WS/icinga2/prefix/var/lib/icinga2/icinga2.state'
[2021-01-14 14:20:03 +0100] information/IcingaApplication: Icinga has shut down.
➜  icinga2 git:(bugfix/fork-exit) touch 8427.conf; (sleep 10; kill -HUP `cat prefix/var/run/icinga2/*.pid`) & prefix/sbin/icinga2 daemon -c 8427.conf
[2] 39249
[2021-01-14 14:20:41 +0100] information/cli: Icinga application loader (version: v2.12.0-427-g931b9307a)
[2021-01-14 14:20:41 +0100] information/cli: Loading configuration file(s).
[2021-01-14 14:20:41 +0100] information/ConfigItem: Committing config item(s).
[2021-01-14 14:20:41 +0100] information/ConfigItem: Instantiated 1 IcingaApplication.
[2021-01-14 14:20:41 +0100] information/ScriptGlobal: Dumping variables to file '/Users/aklimov/NET/WS/icinga2/prefix/var/cache/icinga2/icinga2.vars'
[2021-01-14 14:20:41 +0100] information/ConfigObject: Restoring program state from file '/Users/aklimov/NET/WS/icinga2/prefix/var/lib/icinga2/icinga2.state'
[2021-01-14 14:20:41 +0100] information/ConfigObject: Restored 2 objects. Loaded 0 new objects without state.
[2021-01-14 14:20:41 +0100] information/ConfigItem: Triggering Start signal for config items
[2021-01-14 14:20:41 +0100] information/ConfigItem: Activated all objects.
[2021-01-14 14:20:46 +0100] information/Application: Got reload command: Starting new instance.
[2]  - 39249 done       ( sleep 10; kill -HUP `cat prefix/var/run/icinga2/*.pid`; )
[2021-01-14 14:20:46 +0100] information/cli: Loading configuration file(s).
[2021-01-14 14:20:46 +0100] information/ConfigItem: Committing config item(s).
[2021-01-14 14:20:46 +0100] information/ConfigItem: Instantiated 1 IcingaApplication.
[2021-01-14 14:20:46 +0100] information/ScriptGlobal: Dumping variables to file '/Users/aklimov/NET/WS/icinga2/prefix/var/cache/icinga2/icinga2.vars'
[2021-01-14 14:20:46 +0100] information/Application: Reload done, old process shutting down. Child process with PID '39271' is taking over.
[2021-01-14 14:20:46 +0100] information/Application: Received request to shut down.
[2021-01-14 14:20:46 +0100] information/Application: Shutting down...
[2021-01-14 14:20:46 +0100] information/ConfigObject: Dumping program state to file '/Users/aklimov/NET/WS/icinga2/prefix/var/lib/icinga2/icinga2.state'
[2021-01-14 14:20:46 +0100] information/IcingaApplication: Icinga has shut down.
[2021-01-14 14:20:46 +0100] information/ConfigObject: Restoring program state from file '/Users/aklimov/NET/WS/icinga2/prefix/var/lib/icinga2/icinga2.state'
[2021-01-14 14:20:46 +0100] information/ConfigObject: Restored 2 objects. Loaded 0 new objects without state.
[2021-01-14 14:20:46 +0100] information/ConfigItem: Triggering Start signal for config items
[2021-01-14 14:20:46 +0100] information/ConfigItem: Activated all objects.
^C[2021-01-14 14:20:58 +0100] information/Application: Received request to shut down.
[2021-01-14 14:20:59 +0100] information/Application: Shutting down...
[2021-01-14 14:20:59 +0100] information/ConfigObject: Dumping program state to file '/Users/aklimov/NET/WS/icinga2/prefix/var/lib/icinga2/icinga2.state'
[2021-01-14 14:20:59 +0100] information/IcingaApplication: Icinga has shut down.
➜  icinga2 git:(bugfix/fork-exit)

@Al2Klimov Al2Klimov marked this pull request as ready for review January 14, 2021 13:21
v2.13.0 merge window automation moved this from Changes requested to Approved Jan 14, 2021
@Al2Klimov Al2Klimov merged commit d82e498 into master Jan 14, 2021
v2.13.0 merge window automation moved this from Approved to Merged Jan 14, 2021
@icinga-probot icinga-probot bot deleted the bugfix/fork-exit branch January 14, 2021 16:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/cli Command line helpers bug Something isn't working
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

None yet

2 participants