This repository has been archived by the owner. It is now read-only.

[dev.icinga.com #4199] multiple idomod modules: only first gets data from registered callback functions #1282

Closed
icinga-migration opened this issue May 18, 2013 · 4 comments
Labels
bug
@icinga-migration icinga-migration commented May 18, 2013

This issue has been migrated from Redmine: https://dev.icinga.com/issues/4199

Created by mfriedrich on 2013-05-18 17:45:50 +00:00

Assignee: mfriedrich
Status: Resolved (closed on 2013-05-22 11:29:35 +00:00)
Target Version: 1.10
Last Update: 2013-05-22 11:29:35 +00:00 (in Redmine)

Icinga Version: 1.9.0
OS Version: any

Once a module is loaded via dlopen(), it registers its callback functions. If the same binary is loaded as several modules, the exported symbol space is shared between them, and only the first registration wins.

http://pubs.opengroup.org/onlinepubs/009695399/functions/dlopen.html

Symbols introduced into a program through calls to dlopen() may be used in relocation activities. Symbols so introduced may duplicate symbols already defined by the program or previous dlopen() operations. To resolve the ambiguities such a situation might present, the resolution of a symbol reference to symbol definition is based on a symbol resolution order. Two such resolution orders are defined: load or dependency ordering. Load order establishes an ordering among symbol definitions, such that the definition first loaded (including definitions from the image file and any dependent objects loaded with it) has priority over objects added later (via dlopen()). Load ordering is used in relocation processing. Dependency ordering uses a breadth-first order starting with a given object, then all of its dependencies, then any dependents of those, iterating until all dependencies are satisfied. With the exception of the global symbol object obtained via a dlopen() operation on a file of 0, dependency ordering is used by the dlsym() function. Load ordering is used in dlsym() operations upon the global symbol object.
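To make the shared symbol space concrete, here is a minimal sketch, assuming a glibc-based Linux system where `libm.so.6` and its `cos` symbol are available (on older glibc, link with `-ldl`). The helper name `shares_symbol_space` is made up for this illustration and is not part of Icinga. It opens the same file twice and checks whether both calls return the same handle and resolve the symbol to the same address.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if two dlopen() calls on the same file
 * yield the same handle and resolve `symbol` to the same address,
 * 0 if they differ, and -1 if the library could not be loaded. */
int shares_symbol_space(const char *lib, const char *symbol)
{
    void *h1 = dlopen(lib, RTLD_NOW);
    void *h2 = dlopen(lib, RTLD_NOW);
    void *s1, *s2;
    int result;

    if (h1 == NULL || h2 == NULL) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        if (h1 != NULL)
            dlclose(h1);
        if (h2 != NULL)
            dlclose(h2);
        return -1;
    }

    s1 = dlsym(h1, symbol);
    s2 = dlsym(h2, symbol);

    /* Same file means one mapped object, so handle and symbol match. */
    result = (h1 == h2) && (s1 != NULL) && (s1 == s2);

    dlclose(h2);
    dlclose(h1);
    return result;
}
```

Calling `shares_symbol_space("libm.so.6", "cos")` should report that the second dlopen() did not create a second, independent copy, which is the same situation two idomod modules pointing at one `idomod.so` end up in.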

Changesets

2013-05-18 18:51:17 +00:00 by (unknown) 5248386

nebmods: fix multiple modules sharing same symbols not receiving callback data

when dlopen() is asked to load several modules with the same file name,
it registers only the symbols of the first one and does not override
them with those of later modules loaded from the same binary.

this is bad, as e.g. multiple idomods will try to register the same
callback function symbols, and only the first one wins. this is also
the reason why only the first instance_name gets populated in the
idoutils database.

the initial handshake of ido2db with idomod only works because the core
calls the nebmodule_init() function symbol (from the first module) once
per configured module, each time with different arguments. each module
should have at least a different instance_name defined in its
idomod.cfg, so every init call does its own handshake (and an insert
into the icinga_instances table). afterwards, however, no data is
received for the later modules, because the callbacks have no such
per-module arguments (how should the core know anyway, it's a
_callback_ function that exists only once in symbol space!).

basically, the root issue is an omd patch that has been in place for
more than 2 years; this fix restores the old behaviour (plus a debug
idea stolen from Andreas Ericsson, thanks).
the solution is rather simple: make a temp copy of the binary under
some random name, and let dlopen load that copy into memory. then all
the registered function symbols remain per module, and the callbacks
and everything else work as expected.

as this is a behavioural change, it should get a note in Changelog once
merged into release trees.

refs #4199

2013-05-22 12:46:13 +00:00 by (unknown) a84cd3d

update Changelog for #4199

refs #4199

@icinga-migration icinga-migration commented May 18, 2013

Updated by mfriedrich on 2013-05-18 18:18:22 +00:00

  • Status changed from New to Assigned
  • Assigned to set to mfriedrich
  • Target Version set to 1.9.1

the root cause is the omd patch, which removed the temporary copy of the neb module itself. reverting that patch and adding a revamped version of the original code lets us trick dlopen again into seeing at least 2 different binaries with 2 different symbol spaces, so that each module has its own registered callback functions.

with this fixed, 2 loaded idomod modules will not only open 2 ido2db connections, but also send actual data. the initial api handshake, which sends instance_name over to ido2db, does not require any callback function symbol; it is done during neb module init!

define module{
        module_name     idomod1
        module_type     neb
        path            /usr/lib/idomod.so
        #path            /usr/bin/idomod.o
        args            config_file=/etc/icinga/idomod.cfg
        }

define module{
        module_name     idomod2
        module_type     neb
        path            /usr/lib/idomod.so
        args            config_file=/etc/icinga/idomod2.cfg
        }

# grep instance_name /etc/icinga/idomod*.cfg
/etc/icinga/idomod2.cfg:instance_name=icinga-dev2
/etc/icinga/idomod.cfg:instance_name=icinga-dev

mysql> select i.instance_name as instance, count(*) from icinga_services s join icinga_instances i on s.instance_id=i.instance_id group by i.instance_name;
+-------------+----------+
| instance    | count(*) |
+-------------+----------+
| icinga-dev  |     4094 |
| icinga-dev2 |     4094 |
+-------------+----------+
2 rows in set (0.01 sec)

mysql> select i.instance_name as instance, count(*) from icinga_hosts h join icinga_instances i on h.instance_id=i.instance_id group by i.instance_name;
+-------------+----------+
| instance    | count(*) |
+-------------+----------+
| icinga-dev  |      274 |
| icinga-dev2 |      274 |
+-------------+----------+
2 rows in set (0.00 sec)

i haven't seen much benefit from the direct dlopen loading (only some tmpfs permission issues), so i do not see a showstopper for fixing this in 1.9.1 - marking this as CHANGE then, of course.

I consider this a bug that disables a feature of Icinga. Now that we have the socket queue in ido2db, loading more than one idomod module sounds like a good (backup) idea too.

What do others think?

@icinga-migration icinga-migration commented May 18, 2013

Updated by mfriedrich on 2013-05-18 18:40:11 +00:00

Ok, some more details.

When the first idomod module gets loaded via dlopen, everything is fine: its symbols get registered and so on.
Once a second module with the same filename is loaded, dlopen makes sure that it does not override the already registered symbols. As the dlopen() specification linked above puts it:

Only a single copy of an object file is brought into the address space, even if dlopen() is invoked multiple times in reference to the file, and even if different pathnames are used to reference the file.

So, how does the initial handshake with ido2db and the different instance_names happen? Well, during neb_load_module() the neb module gets loaded directly, but the module's init function is still the first symbol registered. That does not hurt here, because for every module that is attempted to be loaded (even if its symbols are ignored by dlopen!) the init function is called with that module's arguments.
These arguments contain the path to the idomod.cfg configuration file, where the different instance_name is defined. So to speak, the first module's init function symbol takes care of sending the initial handshake data for all to-be-loaded modules.
Later on, when the core itself is running and calls the callback functions, only the module with the first registered function symbols (first dlopen handle!) actually gets the data; the other modules won't get any updates.
By copying each module to some random name in temp and making dlopen load these copies, the symbol space won't get scrambled, and each module works with its very own symbols.
My first attempt, making dlopen use only local symbols, failed, as that serves a different purpose (RTLD_LOCAL vs RTLD_GLOBAL).
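The temp-copy approach could be sketched roughly as follows. This is not the actual nebmods code, only an illustration under stated assumptions: the helper names `copy_to_temp` and `load_private_copy` and the `/tmp/nebmodXXXXXX` template are hypothetical, and the sketch assumes a POSIX system where an already mapped file may be unlinked and where the temp directory allows mapping files as executable.

```c
#define _POSIX_C_SOURCE 200809L  /* for mkstemp() and fdopen() */
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: copy `src` into a fresh mkstemp() file.
 * `tmpl` must end in "XXXXXX" and receives the generated name.
 * Returns 0 on success, -1 on error. */
int copy_to_temp(const char *src, char *tmpl)
{
    char buf[8192];
    size_t n;
    int ok = 1;
    int fd;
    FILE *out;
    FILE *in = fopen(src, "rb");

    if (in == NULL)
        return -1;
    fd = mkstemp(tmpl);
    if (fd < 0) {
        fclose(in);
        return -1;
    }
    out = fdopen(fd, "wb");
    if (out == NULL) {
        close(fd);
        unlink(tmpl);
        fclose(in);
        return -1;
    }
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) {
            ok = 0;
            break;
        }
    }
    if (ferror(in))
        ok = 0;
    fclose(in);
    if (fclose(out) != 0)
        ok = 0;
    if (!ok) {
        unlink(tmpl);
        return -1;
    }
    return 0;
}

/* Hypothetical helper: load a private copy of the module at `path`, so
 * that dlopen() sees a distinct file and maps it as a separate object. */
void *load_private_copy(const char *path)
{
    char tmpl[] = "/tmp/nebmodXXXXXX";  /* assumed temp location */
    void *handle;

    if (copy_to_temp(path, tmpl) != 0)
        return NULL;
    handle = dlopen(tmpl, RTLD_NOW);
    /* The name is no longer needed once the object is mapped. */
    unlink(tmpl);
    return handle;
}
```

With something like this, two `load_private_copy("/usr/lib/idomod.so")` calls would be expected to return two different handles, each with its own copy of the module's symbols and static data. The cost is one extra file copy per module load, plus the requirement that the temp directory is writable and not mounted noexec.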

@icinga-migration icinga-migration commented May 22, 2013

Updated by mfriedrich on 2013-05-22 07:30:56 +00:00

  • Target Version changed from 1.9.1 to 1.10

a change, so major release tree needed.

@icinga-migration icinga-migration commented May 22, 2013

Updated by mfriedrich on 2013-05-22 11:29:35 +00:00

  • Status changed from Assigned to Resolved
  • Done % changed from 0 to 100

merged to next. tests required.

@icinga-migration icinga-migration added this to the 1.10 milestone Jan 17, 2017