New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

State file not updated on reload #6689

Closed
west0rmann opened this Issue Oct 16, 2018 · 5 comments

Comments

Projects
None yet
3 participants
@west0rmann

west0rmann commented Oct 16, 2018

Expected Behavior

Save current state to /var/lib/icinga2/icinga2.state after running "systemctl reload icinga2"

Current Behavior

State file will be persisted every 5 minutes, on restart and on shutdown - but not on reload.

Possible Solution

This was the reload behaviour in Icinga 2.8.4 (from debug log):

[2018-10-15 14:46:04 +0200] information/Application: Got reload command: Starting new instance.
[2018-10-15 14:46:06 +0200] information/Application: Received request to shut down.
[2018-10-15 14:46:07 +0200] information/Application: Shutting down...

This is the current reload behaviour in Icinga 2.10.1:

[2018-10-15 14:45:12 +0200] information/Application: Got reload command: Starting new instance.
[2018-10-15 14:45:12 +0200] information/Application: Reload requested, letting new process take over.

There is a new method "void Application::SigUsr2Handler(int)" in base/application.cpp that does not save the state file. If I add the shutdown code from the main loop, the state file is saved again:

void Application::SigUsr2Handler(int)
{
	Log(LogInformation, "Application", "Reload requested, letting new process take over.");
#ifdef HAVE_SYSTEMD
	sd_notifyf(0, "MAINPID=%lu", (unsigned long) m_ReloadProcess);
#endif /* HAVE_SYSTEMD */

	/* Write the PID of the new process to the pidfile before this
	 * process exits to keep systemd happy.
	 */
	Application::Ptr instance = GetInstance();
	try {
		instance->UpdatePidFile(Configuration::PidPath, m_ReloadProcess);
	} catch (const std::exception&) {
		/* abort restart */
		Log(LogCritical, "Application", "Cannot update PID file. Aborting restart operation.");
		return;
	}

	instance->ClosePidFile(false);

+	ConfigObject::StopObjects();
+	Application::GetInstance()->OnShutdown();

	Exit(0);
}

Steps to Reproduce (for bugs)

  1. acknowledge a service in a problem state
  2. perform an reload after this operation
  3. the service is not acknowledged anymore

Your Environment

Icinga2 2.10.0 on Oracle Linux 7.5

@dnsmichi

This comment has been minimized.

Member

dnsmichi commented Oct 16, 2018

Awesome catch, thanks a lot 👍 ❤️ Can you provide me with your name/mail address, I'd like to credit you as commit author :)

@dnsmichi dnsmichi self-assigned this Oct 16, 2018

@dnsmichi dnsmichi added this to the 2.10.1 milestone Oct 16, 2018

@dnsmichi

This comment has been minimized.

Member

dnsmichi commented Oct 16, 2018

Tests

Further steps to reproduce:

  • Use a slightly large configuration which increases the reload time to at least 10s:
    many.conf.zip

  • Terminal 1 watches the state file

  • Terminal 2 sends a modify attribute POST Request against an existing host object, followed by a kill HUP signal.

  • Terminal 3 watches the host object with GET in all states.

Terminal 1: State file

watch -n 1 -c 'ls -la icinga2.state'

Terminal 2: Modify attribute and reload

date; curl -k -s -u root:icinga -X POST -H 'Accept: application/json' 'https://localhost:5665/v1/objects/hosts/mbmif.int.netways.de' -d '{ "attrs": { "vars.reload_test": 1 } }' ; kill -HUP $(pidof icinga2); date

Terminal 3: Watch modified attribute

watch -n 1 -c 'curl -k -s -u root:icinga https://localhost:5665/v1/objects/hosts/mbmif.int.netways.de?attrs=vars'

Current behaviour

  • Icinga 2 is running, prepare test case.

screen shot 2018-10-16 at 11 09 14

  • Modified attribute and reload triggered

screen shot 2018-10-16 at 11 09 27

  • Reload in progress

screen shot 2018-10-16 at 11 09 42

  • Reload done, modified attribute gone

screen shot 2018-10-16 at 11 10 09

Fix in place

  • Prepare test case from running patched Icinga 2 instance

screen shot 2018-10-16 at 11 41 17

  • Modify the attribute and trigger the reload

screen shot 2018-10-16 at 11 41 28

  • Reload in progress, the state file has been updated.

screen shot 2018-10-16 at 11 41 45

  • Reload done, modified attribute is read and still there.

screen shot 2018-10-16 at 11 43 16

  • With the new log line, you'll see when the reload is finished.

screen shot 2018-10-16 at 11 43 44

Confirmed working.

dnsmichi added a commit that referenced this issue Oct 16, 2018

Add missing shutdown/program state dumps for SIGUSR2 reload handler
Credits to @west0rmann finding the issue and providing the initial fix.

fixes #6689
fixes #6592

dnsmichi added a commit that referenced this issue Oct 16, 2018

Add missing shutdown/program state dumps for SIGUSR2 reload handler
Credits to @west0rmann finding the issue and providing the initial fix.

fixes #6689
fixes #6592
@kevinstiller

This comment has been minimized.

kevinstiller commented Oct 16, 2018

Can confirm that acks disappear after I reload shortly after the ack. How can I help? 710 Hosts, 10K Services.

@dnsmichi

This comment has been minimized.

Member

dnsmichi commented Oct 16, 2018

@kevinstiller test the snapshot packages, please :)

@kevinstiller

This comment has been minimized.

kevinstiller commented Oct 16, 2018

idk how to change :-X

btw: my Version is r2.9.1-1, at least the icinga cluster check returns:
Icinga 2 has been running for 1 day, 9 hours, 6 minutes and 44 seconds. Version: r2.9.1-1

Is that possible?

Im at home an can't verify on cli :-/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment