
[dev.icinga.com #13255] Deprecate cluster/client mode "bottom up" w/ repository.d and node update-config #4798

Closed
icinga-migration opened this Issue Nov 18, 2016 · 23 comments


icinga-migration commented Nov 18, 2016

This issue has been migrated from Redmine: https://dev.icinga.com/issues/13255

Created by mfriedrich on 2016-11-18 17:08:40 +00:00

Assignee: mfriedrich
Status: Resolved (closed on 2016-11-23 14:35:03 +00:00)
Target Version: 2.6.0
Last Update: 2016-12-13 07:28:19 +00:00 (in Redmine)

Backport?: Not yet backported
Include in Changelog: 1

This isn't a short-term decision; it has been discussed for many months now. This issue is a working task to

  1. Announce and explain the change
  2. Provide technical insights on the problems
  3. Provide migration examples

Deprecation and removal

v2.6 will add a deprecation warning.

After that, we plan to support the feature for at least one year, or two major versions, in 2017. This early announcement ahead of the final EOL should give everyone enough time to prepare for the change.

Note: Removing the bottom-up synchronisation will not disable the cluster communication between client and master nodes. Client nodes will still attempt to send check results to master nodes, and the masters will accept them as long as the config objects exist (previously generated with node update-config). Removing node update-config and the repository only disables the built-in bottom-up object sync.
That way Icinga 2 will continue to run and monitor your environment. You are still advised to migrate to one of the "top down" modes.

A short summary of why and what

Getting started with Icinga 2 should be simple. You shouldn't need configuration tricks; there should be just one right way to do it.

When we designed and implemented the first client setup more than two years ago, the bottom-up method sounded reasonable. You install the icinga2 package on the client, which already ships the example configuration in conf.d/, and extend it as needed. Syncing that information to the parent node was more or less syncing a list of the objects available on the client to the master.

Community and team members agreed on this mode during design and implementation in early 2014, as the months approached the v2.2 release. It wasn't easy to fulfil all the requirements while also putting in the work hours needed to actually release on that fixed date in late 2014. We tried to get v2.2 into Debian Jessie stable, but that didn't work out.
At some point the bottom-up approach wasn't enough. In the late stages of preparing the v2.2 package, a simple "execute a check command on another endpoint" was demanded. That was when an unplanned feature called "command endpoint" was designed on short notice and integrated into the v2.2 release.

Looking back, the design for an agent/client did not work out well. We have two different modes, and in addition the cluster config sync can be used as well. So there are actually three different methods (config sync, command endpoint, node update-config). You can even combine config sync and command endpoint to get the best of both. The latter is common best practice in customer setups and community environments - something we learned while supporting our colleagues and fellow community members.

One problem which also came up: the documentation presented the "bottom up" approach first, describing the client setup using CLI commands. In addition, we had the entire "cluster" chapter, which described the cluster config sync, HA capabilities and even more. All in all, the documentation was a confusing mess. Users who started with the Icinga 2 client just read the first chapter ("bottom up") and then ran into many of the currently unresolved issues.

It just wasn't clear that you can use a different mode; putting that into a documentation chapter isn't easy after all. For v2.5 we decided to purge the existing documentation on client and cluster setup and configuration entirely and rewrite it from scratch, changing the preferred mode priority and adding more details, including pictures. The scenarios described in the documentation come from actual production environments and have proven themselves.

Still, fixing the documentation made it clear once more: the configuration modes are overly complicated to understand, on top of other configuration tasks (zones, endpoints, and such).

Common problems with "bottom up"

#9332 No object attributes are synced from the child to the parent. This is a real drawback if you want to configure, for example, notification apply rules on the master based on group membership or custom attributes. The main design goal was not to expose any sensitive security information from the child to the parent. There's still an open issue without a solution.

#11121 node update-config uses a changelog repository which requires dumping changes there; once that's ready, a final commit turns them into actual configuration objects underneath the "repository.d" directory. While some sort of staging was planned, the only outcome was an overly complex implementation for dumping configuration from a JSON file containing host/service object names. It also caused problems with large numbers of child hosts: parsing the repository files synced from clients, then generating a changelog, and then committing it.

#13018 Furthermore, it exposes a somewhat experimental config API on the CLI - "icinga2 repository". It lacks array/dictionary support and has many other problems. We decided a while back not to maintain this CLI command; an official deprecation announcement was about to come as well.

#7632 Another noticeable impact on the cluster communication between endpoints is the requirement to sync the client's host/service objects to the parent node. This involves sending a JSON blob and writing it into a file on the master. Such messages aren't required with the "top down" approach, and they can affect performance and I/O.

#10054 node update-config and the bottom-up mode were originally designed as a sort of "auto discovery" for clients, which means everyone manages the configuration on the client itself. That sounds appealing if you are thinking of config management tools such as Puppet, Ansible, etc. Modern environments should definitely have them - but what if you don't? Maintenance tasks on each client, multiplied by the number of hosts, aren't something admins prefer these days.

Another problem is the asynchronous nature of the bottom-up repository sync: after the Puppet agent has deployed the configuration, it cannot immediately trigger a "node update-config" run on the master (it must wait a while). As it turns out, our managed-services team tried to implement it that way and finally gave up. Lessons learned the hard way. Even so, others kept adding a cronjob on top of config management to semi-automate the config updates pulled from the client.

#9964 Organising blacklists and whitelists on the master by zone, host and service patterns isn't easy and has some bugs. They can only be managed on the CLI; there are no configuration objects involved. Such blacklists are necessary if you don't want to monitor, for example, the "load" check on all systems. If you don't specify any, "node update-config" will generate config objects for every host and service received from the client.

#10980 Multiple cluster levels aren't possible. There is no proxy mode in which a satellite would itself copy the client's repository up to the master. The larger your environment gets, the more interesting a master/satellite/client setup becomes. That is something you probably cannot plan beforehand, and those giving input didn't think of such a scenario either.

#8213 HA master setups are not supported either.

Generally speaking, the bottom-up mode causes more problems and bugs while not fitting into the Icinga stack built around a central master with Icinga Web 2, the Icinga 2 API and Icinga Director. From a maintainer perspective, we are focusing on one solution going forward.

Migration

Hint: More details are updated in the documentation: https://docs.icinga.com/icinga2/snapshot/doc/module/icinga2/toc#!/icinga2/snapshot/doc/module/icinga2/chapter/distributed-monitoring#distributed-monitoring-bottom-up-migration-top-down

The bottom-up mode generates configuration on the master. This import depends on user interaction: unless something changes on the client and you run node update-config again, the configuration files in "repository.d" remain untouched.

"repository.d" is organised as a tree of object types, which you can easily extract.

A best practice strategy:

  1. Decide whether you want to sync the configuration top down to the client again, or switch to command_endpoint execution on the client.
  2. Extract the Zone and Endpoint objects first.
  3. Then go for the Host and Service objects.
  4. Remove the "zone" attribute from host and service objects.
  5. Proceed with one of the two possible modes, whichever suits you best.

Command Endpoint Execution

  1. Assuming that the master zone is called "master", create a new directory called /etc/icinga2/zones.d/master
  2. Put the Zone, Endpoint and Host object information into that directory, e.g. as a .conf file
  3. Either add the service objects directly to the host's .conf file, or ...
  4. Identify common service objects and create static apply rules

Once done, you also need to define the command_endpoint. The easiest way is to reuse the existing naming scheme generated from repository.d:

object Zone "icinga2-client1.localdomain" {
  endpoints = [ "icinga2-client1.localdomain" ]
  parent = "master"
}

object Endpoint "icinga2-client1.localdomain" {
  //...
}

object Host "icinga2-client1.localdomain" {
  //define custom attributes to identify e.g. the os
  vars.os_type = "Linux"
  vars.os_desc = "Ubuntu 16.04 LTS"

  //define that this host is using command endpoint, take the same name as the endpoint (FQDN)
  vars.client_endpoint = name
}

vim /etc/icinga2/zones.d/master/services.conf


apply Service "load" {
  check_command = "load"

  command_endpoint = host.vars.client_endpoint
  //...

  assign where host.vars.client_endpoint //only for command endpoint hosts
}

Note: If you are using your own local custom commands, you need to copy them to the master node manually. To do so, create a global zone and put the commands.conf underneath zones.d/. Ensure that all clients have the global zone configured and that "conf.d" is disabled in their icinga2.conf file.
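As a sketch, assuming a global zone named "global-templates" (the zone name and the check command are made-up examples, not taken from this issue):

```
/* zones.conf - on the master and on every client */
object Zone "global-templates" {
  global = true
}

/* /etc/icinga2/zones.d/global-templates/commands.conf - on the master */
object CheckCommand "my-custom-check" {
  command = [ PluginDir + "/check_my_custom" ] //assumed plugin name
}
```

The config sync then distributes commands.conf to every node that has the global zone configured.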

Configuration Sync

This is a bit trickier, as you need to exclude "conf.d" from all your clients' icinga2.conf config files and then restart them. This "flushes" the client objects and allows the master to synchronise the same configuration objects.

Note: Use this mode only if you plan to use your clients as satellites with local check execution and a replay log on connection loss. For a simple migration, you should prefer the command endpoint execution mode.

On the client:

vim /etc/icinga2/icinga2.conf

//include_recursive "conf.d"

systemctl restart icinga2

Then proceed:

  1. Assuming that the master zone is called "master", create a new directory called /etc/icinga2/zones.d/master
  2. Put the Zone and Endpoint object information into that directory (or into zones.conf)
  3. Create a new directory for each client's name (FQDN) underneath /etc/icinga2/zones.d for example /etc/icinga2/zones.d/icinga2-client1.localdomain (you can follow the documentation here)
  4. Put the Host and Service objects into this directory (make sure to have the "zone" attribute removed from all objects!)
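Assuming the same example client name as in the command endpoint section above, the result might look like this (the address is a made-up placeholder):

```
/* /etc/icinga2/zones.d/master/zones.conf (steps 1 and 2) */
object Endpoint "icinga2-client1.localdomain" {
  //...
}

object Zone "icinga2-client1.localdomain" {
  endpoints = [ "icinga2-client1.localdomain" ]
  parent = "master"
}

/* /etc/icinga2/zones.d/icinga2-client1.localdomain/hosts.conf (steps 3 and 4) */
object Host "icinga2-client1.localdomain" {
  check_command = "hostalive"
  address = "192.168.2.111" //placeholder - use the client's real address
  //no "zone" attribute here - the parent directory already determines the zone
}
```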

If you prefer to identify common service objects, leave them out of the client zones and instead a) configure a global template zone and b) put all service apply rules into this global template zone. Note: The clients must have the global zone configured as well.
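A minimal sketch of such a global template zone, again assuming the example name "global-templates":

```
/* zones.conf - on the master and on every client */
object Zone "global-templates" {
  global = true
}

/* /etc/icinga2/zones.d/global-templates/services.conf - on the master */
apply Service "load" {
  check_command = "load"
  //executed locally on each client within its own zone
  assign where host.zone != "master"
}
```

The assign expression is just an illustration; in practice you would match on custom attributes or host groups of your own.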

Changesets

2016-11-23 14:33:28 +00:00 by mfriedrich dc29924

Deprecate the client 'bottom up' mode w/ node update-config

This includes deprecation warnings and migration documentation.

fixes #13255


icinga-migration commented Nov 18, 2016

Updated by mfriedrich on 2016-11-18 17:10:40 +00:00

  • Relates set to 13257

icinga-migration commented Nov 18, 2016

Updated by mfriedrich on 2016-11-18 17:10:45 +00:00

  • Relates set to 9332

icinga-migration commented Nov 18, 2016

Updated by mfriedrich on 2016-11-18 17:10:54 +00:00

  • Relates set to 7632

icinga-migration commented Nov 18, 2016

Updated by mfriedrich on 2016-11-18 17:11:01 +00:00

  • Relates set to 12808

icinga-migration commented Nov 18, 2016

Updated by mfriedrich on 2016-11-18 17:11:07 +00:00

  • Relates set to 11121

icinga-migration commented Nov 18, 2016

Updated by mfriedrich on 2016-11-18 17:11:12 +00:00

  • Relates set to 10980

icinga-migration commented Nov 18, 2016

Updated by mfriedrich on 2016-11-18 17:11:18 +00:00

  • Relates set to 10054

icinga-migration commented Nov 18, 2016

Updated by mfriedrich on 2016-11-18 17:11:24 +00:00

  • Relates set to 9964

icinga-migration commented Nov 18, 2016

Updated by mfriedrich on 2016-11-18 17:11:41 +00:00

  • Relates set to 8213

icinga-migration commented Nov 18, 2016

Updated by mfriedrich on 2016-11-18 17:11:50 +00:00

  • Relates set to 13018

icinga-migration commented Nov 23, 2016

Updated by mfriedrich on 2016-11-23 14:34:38 +00:00

  • Status changed from New to Assigned
  • Assigned to set to mfriedrich

icinga-migration commented Nov 23, 2016

Updated by mfriedrich on 2016-11-23 14:35:03 +00:00

  • Status changed from Assigned to Resolved
  • Done % changed from 0 to 100

Applied in changeset dc29924.


icinga-migration commented Nov 23, 2016

Updated by mfriedrich on 2016-11-23 14:53:04 +00:00

  • Relates set to 9267

icinga-migration commented Nov 23, 2016

Updated by mfriedrich on 2016-11-23 15:19:09 +00:00

  • Description updated
  • Priority changed from Low to Normal
  • Start Date set to 2016
  • Done % changed from 0 to 100

icinga-migration commented Dec 10, 2016

Updated by mfriedrich on 2016-12-10 15:17:21 +00:00

  • Relates set to 10411

icinga-migration commented Dec 13, 2016

Updated by gbeutner on 2016-12-13 07:28:19 +00:00

  • Subject changed from Deprecation of cluster/client mode "bottom up" w/ repository.d and node update-config to Deprecate cluster/client mode "bottom up" w/ repository.d and node update-config

icinga-migration commented Jan 14, 2017

Updated by mfriedrich on 2017-01-14 13:06:42 +00:00

  • Relates deleted 9267

icinga-migration commented Jan 14, 2017

Updated by mfriedrich on 2017-01-14 13:08:09 +00:00

  • Relates deleted 9332

icinga-migration commented Jan 14, 2017

Updated by mfriedrich on 2017-01-14 13:08:47 +00:00

  • Relates deleted 10054

icinga-migration commented Jan 14, 2017

Updated by mfriedrich on 2017-01-14 13:09:02 +00:00

  • Relates deleted 10411

@icinga-migration icinga-migration added this to the 2.6.0 milestone Jan 17, 2017


m-reiter commented Apr 19, 2017

The link to the documentation in the Migration section seems dead.


m-reiter commented Apr 19, 2017

Yes, I found it without problems. I just thought that since this issue is the first hit on a Google search for "icinga2 migrate bottom up top down", someone might want to update the link in the description itself to actually point somewhere.
