Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Workaround issues with ld from binutils 2.29 #266

Merged
merged 7 commits into from Dec 14, 2017

Commits on Dec 12, 2017

  1. Med: qblog.h: better explanation + behaviour of QB_LOG_INIT_DATA

    Based on better understanding how link-time callsite collection works,
    put a better description for the macro.  Also based on poor user
    experience in case that feature does not work well, say because
    the linker deliberately changes the previously settled visibility
    of the section boundary symbols (happened in ld from binutils-2.29,
    fix is forthcoming), tweak the assertion message a bit, together
    with an extension of the general intro to point that macro out.
    Also play fair, include the <assert.h> header, which this macro
    requires.
    
    - use case:
      /usr/sbin/pacemakerd --features
    
    - before:
      pacemakerd: utils.c:69: common:
      Assertion `0' failed
    
    - after:
      pacemakerd: utils.c:69: common:
      Assertion `"non-empty implicit callsite section" && QB_ATTR_SECTION_START != QB_ATTR_SECTION_STOP' failed.
    
    Restructuring of the assertion inspired by the suggestion of Ferenc
    Wágner (for the subsequent commit, actually).
    
    And regarding:
    > as a side effect, it can ensure the boundary-denoting symbols for
    > the target collection area are kept alive with some otherwise unkind
    > linkers
    this was actually empirically discovered in one particular combination
    with ld.bfd @ binutils 2.29, and extensively with 2.29.1 (placing here
    a forward reference for the following commits that elaborate on the
    libqb-target cross-combination matrix and the findings).
    
    Signed-off-by: Jan Pokorný <jpokorny@redhat.com>
    jnpkrn committed Dec 12, 2017
    Copy the full SHA
    5e53701 View commit details
    Browse the repository at this point in the history
  2. build: configure: check section boundary symbols present in the test

    There was an idea to make apparently run-time test set to that effect,
    but it would make cross-building harder.  So arrange the test as if it
    would be meant for AC_TRY_RUN, but achieve the same as with the first
    assertion by the means of inspecting the linked result with, possibly
    target-specific, nm utility.
    
    While arranging the test for AC_TRY_RUN, based on feedback by Ferenc
    Wágner, unify and increase usability of the run-time error signalling
    through assertions.
    
    Signed-off-by: Jan Pokorný <jpokorny@redhat.com>
    jnpkrn committed Dec 12, 2017
    Copy the full SHA
    6c99487 View commit details
    Browse the repository at this point in the history
  3. tests: new sort of tests dubbed "functional", cover linker vs. logging

    These are for quick manual sanity checking, assuming the target audience
    -- maintainers -- are clear on the context of use and the purpose
    (perhaps with the help of static files for comparison and/or additional
    checking harness, usually available through "make check", but not to be
    confused with regular unit + broader tests).  These test are meant to be
    compiled on demand only, not during the standard building routine, for
    which a trick leveraging GNUmakefile-Makefile precedence with GNU make
    was devised (GNU make/gmake already required by configure script for
    other reasons [some pattern-based matching not available with FreeBSD's
    default "make", IIRC], so this introduces no new build dependency).
    
    The respective new tests are meant to simulate logging variants in two
    different library consumption models:
      a. regular: linking against system-wide library
      b. developmental: consuming library from a local sub-checkout tree,
                        using libtool conventions and hence attaching the
                        library through libqb.la intermediate library
                        descriptor of libtool
    and between up to three possibly affected logging system participants
    (discrete compilation units):
      1. libqb itself will emit log messages in boundary conditions or
         for tracing purposes
      2. client program that consumes libqb's logging API directly
      3. ditto, but the client program furthermore links with a library
         (referred to as "interlib") that itself exercises the logging
         API (it's also linked with libqb) -- through induction, this
         should cover whole class of N interlib cases
    
    Especially the latter perspective makes for a test matrix to possibly
    (hopefully) demonstrate a fix allowing to cope with the changed
    behaviour of ld from binutils 2.29+ wrt. boundaries denoting symbols for
    a (custom) orphan section that are no longer externally visible.  Such
    commit is in the pipeline...
    
    Developmental consumption model (a.) is now also tested automatically
    in Travis CI runs and as a part of %check within upstream-suggested
    libqb.spec for RPM packaging, whereas the regular one (b.) serves as
    a building block for new log_test_mock.sh runner of said test matrix
    -- it iterates through all the possible permutations of linker-imposed
    implicit visibility of mentioned symbols between various affected
    link participants all making use of logging (see 1. - 3. above) so as
    to demonstrate A/ the impact of the problem (see table below), and
    subsequently B/ that the fix is effective in all these situations
    (updated table will be provided as well) once it lands.  This script
    also allows convoluting the test matrix further, notably with on-demand
    defusing the self-checks based on QB_LOG_INIT_DATA macro, which is
    of significance as demonstrated below (and will become even more
    important with upcoming patches in this series).
    
    * * *
    
    Current state for such matrix, in which participants 1. - 3. map like:
      1. ~ libqb(Y)
      2. ~ "direct"
      3. ~ libX(Y)  [a.k.a. interlib]
    and where "X(Y)" denotes "X linked with linker Y":
      X(a) .. ld.bfd < 2.29
      X(b) .. ld.bfd = 2.29 (and only 2.29),
    goes like this:
    
    +=========+=========+=========+=========+=========+=========+=========+
    #client(x)#        libqb(a) usage       #        libqb(b) usage       #
    #   vvv   #---------+---------+---------+---------+---------+---------+
    #    V    #  direct | libX(a) : libX(b) #  direct | libX(a) : libX(b) #
    +=========+=========+=========+=========+=========+=========+=========+
    #  x = a  #   OK    |   OK    : BAD[*2] # BAD[*1] | BAD[*D] : BAD[*3] #
    #  x = b  # BAD[*A] | BAD[*B] : BAD[*C] # BAD[*1] | BAD[*C] : BAD[*3] #
    +=========+=========+=========+=========+=========+=========+=========+
    
    whereas if we swap 2.29 for 2.29.1, i.e., X(b) .. ld.bfd = 2.29.1, we
    can observe a somewhat simpler story (DEP ~ "depends"):
    
    +=========+=========+=========+=========+=========+=========+=========+
    #client(x)#        libqb(a) usage       #        libqb(b) usage       #
    #   vvv   #---------+---------+---------+---------+---------+---------+
    #    V    #  direct | libX(a) : libX(b) #  direct | libX(a) : libX(b) #
    +=========+=========+=========+=========+=========+=========+=========+
    #  x = a  #   OK    |   OK    : DEP[*J] # BAD[*1] | BAD[*1] : BAD[*L] #
    #  x = b  # DEP[*I] | DEP[*I] : DEP[*K] # BAD[*1] | BAD[*1] : BAD[*L] #
    +=========+=========+=========+=========+=========+=========+=========+
    
    * * *
    
    [*1] client logging not working
    [*2] interlib logging not working
    [*3] both client and interlib logging not working
    
    [*A] boils down to [*1], unless QB_LOG_INIT_DATA used on client side,
         which fails on 'implicit callsite section is populated' assertion
    [*B] boils down to [*1], unless QB_LOG_INIT_DATA used on interlib side,
         which fails on 'implicit callsite section is populated' assertion
    [*C] boils down to [*3], unless QB_LOG_INIT_DATA used on interlib side,
         which fails on 'implicit callsite section is populated' assertion
    [*D] boils down to [*3], unless QB_LOG_INIT_DATA used on interlib side,
         which makes it boil down just to [*1] (hypothesis: mere internal
         self-reference to the section's boundary symbols makes them
         overcome some kind of symbol garbage collection at the linkage
         stage, so they are exposed even they wouldn't be otherwise as
         demonstrated with the initial, plain case of [*3])
    
    [*I] boils down to [*1], unless QB_LOG_INIT_DATA used on client side,
         which makes it, likely through self-reference keepalive (see
         below) work OK
    [*J] boils down to [*2], unless QB_LOG_INIT_DATA used on interlib side,
         which makes it, likely through self-reference keepalive (see
         below) work OK
    [*K] boils down to [*3], unless QB_LOG_INIT_DATA used on both client
         and interlib side, which makes it, likely through self-reference
         keepalive (see below) work OK  (it's expected that this a mere
         composite of situations [*I] and [*J] with consequences as stated)
    [*L] boils down to [*3], unless QB_LOG_INIT_DATA used on interlib side
         (sufficient?), which makes it, likely through self-reference
         keepalive (see below) boil down just to [*1]
    
    * * *
    
    Note: as observed with [*D] case (libqb linked with ld.bfd < 2.29
    whereas interlib and its client linked with ld.bfd = 2.29), the exact
    availability of a working logging doesn't depend solely on the linkers
    in question, but generally (further investigation out of scope) the
    conclusion is that when 2.29 ld.bfd is involved somewhere in the chain
    of logging-related discrete compilation units, also (self-)referencing
    of the section's boundary denoting symbols is a deciding factor whether
    particular logging source will be honored.  This may be a result of
    some internal linkage garbage collection mechanisms involved.
    Anyway, it is supposed that the fix to broken-by-linkage logging can be
    proclaimed complete once all combinations pass barring QB_LOG_INIT_DATA
    usage (incurring the mentioned active referential use of the symbols),
    along with a spin using it everywhere for good measure.
    
    For another level of the analysis depth, one can further play with
    combinations of -n{sc,cl,il} options (explained upon -h switch) to
    log_test_mock.sh (taking an oracle, this is added mostly to justify
    the upcoming self-check test change because linker-script-based
    workaround for newer linkers will cause the section boundary symbols
    to be present regardless if that section is utilized, leading to
    a self-inflicted breakage due to these empty section symbols suddenly
    winning in the symbol resolution mechanism).
    
    Signed-off-by: Jan Pokorný <jpokorny@redhat.com>
    jnpkrn committed Dec 12, 2017
    Copy the full SHA
    4546106 View commit details
    Browse the repository at this point in the history
  4. tests: add a script to generate callsite-heavy logging client...

    ...so as to evaluate use of resources.  In particular, the intention
    here is to uncover the observable differences between the same logging
    code built with callsite section (default when available) and
    purposefully (overriding that default by force) without it.
    To allow for the latter being applied conveniently, new macro,
    QB_KILL_ATTRIBUTE_SECTION, can be defined in the compile-time of the
    client code that wishes to opt-out from the callsite section feature.
    
    * * *
    
    Following is a discussion on these differences, sticking with the
    logging client code generated with the script defaults, i.e., as it
    would be run with these switches:
      --callsite-count=3640
      --branching-factor=3
      --callsites-per-fnc=10
      --round-count=1000
    and then built twice (as detailed in
    tests/functional/log_external/Makefile.am):
      * log_callsite_bench_sectionfull
        - with callsite section
      * log_callsite_bench_sectionless
        - without callsite section, imposed with -DQB_KILL_ATTRIBUTE_SECTION
          in CPPFLAGS
    
    --> Static size of the executable:
    
    $ size -B log_callsite_bench_section* | tr '\t' ' ' | tr -s ' ' \
      | cut --complement -d' ' -f6 | column -t
    > text    data    bss  dec     filename
    > 82761   146180  4    228945  log_callsite_bench_sectionfull
    > 190000  588     4    190592  log_callsite_bench_sectionless
    
    We can see that sectionfull is few kB bigger in the object sections
    of interest, though the text-data ratio changes considerably, with
    code section being cut in half in comparison to sectionless, which
    can actually help the code locality (and hence utilization
    of CPU caches) in the former case.
    
    --> Dynamic memory/heap operations:
    
    $ valgrind --log-fd=1 ./log_callsite_bench_section{full,less} 2>/dev/null
    
    total heap usage:
    - log_callsite_bench_sectionfull:
    > 88 allocs, 87 frees, 3,427 bytes allocated[*]
    - log_callsite_bench_sectionless:
    > 11,894 allocs, 11,893 frees, 486,035 bytes allocated[*]
    
    [*] "32 bytes in 1 blocks still reachable" looks rather as a spurious
        warning on the valgrind's side (matter of dynamic linking library)
    
    Apparently, sectionless keeps stirring the heap constantly, with all
    the possible downsides associated with that, like hitting the page
    faults leading to less timely execution.
    
    --> Run-time efficiency:
    
    $ time ./log_callsite_bench_section{full,less} 2>/dev/null
    
    mean attempts out of 3 consecutive runs:
    - log_callsite_bench_sectionfull:
    > real	0m1.298s
    > user	0m0.965s
    > sys	0m0.331s
    
    - log_callsite_bench_sectionless:
    > real	0m1.436s
    > user	0m1.067s
    > sys	0m0.365s
    
    As expected, we can observe sectionfull is slightly faster/more
    efficient.
    
    * * *
    
    Based on the above, we can conclude that leveraging the callsite
    section for logging as facilitated by the toolchain intrinsics is
    beneficial, especially for performance-critical applications (corosync
    being the showcase here).  Therefore it's desired to struggle for
    retaining this nifty trick despite some troubles emerged with recent
    binutils releases (starting with 2.29) and the changed behaviour we
    relied on so far in respective ld.bfd linkers (as mentioned in
    preceding commits).  That motive is immediately followed -- well,
    judging the impact fairly, actually outclassed -- with the intention
    to preserve binary compatibility (incl. continuous library support for
    callsite section offloading spread in the existing client space widely
    for quite some years already) to the utmost extent possible.
    
    Signed-off-by: Jan Pokorný <jpokorny@redhat.com>
    jnpkrn committed Dec 12, 2017
    Copy the full SHA
    32555d8 View commit details
    Browse the repository at this point in the history
  5. Med: add extra run-time (client, libqb) checks that logging will work

    Now that the previous commit provides a foundation for what exactly can
    go wrong with ld.bfd = 2.29+ linker, let's start reconciling that with
     some reasonable assurance that logging is not silently severed, because
    realizing the logs are missing is otherwise bound to happen when the
    logs are suddently pretty crucial analytical resource :-)
    We'll proceed in two steps as detailed.
    
    * * *
    
    As a first step, the table below concludes how the test matrix overview
    introduced with a message for the preceding commit (also introducing
    log_test_mock.sh runner which got reused here) looks as of this
    refreshed sanity check, once QB_LOG_INIT_DATA macro at hand gets applied
    (meaning "for non-libqb logging participants" so as not complicate the
    matrix further).  That macro is nothing triggered directly, it will just
    plant a constructor-like function (to be invoked automatically early in
    the execution) that will run through the checks (one original and couple
    of new ones as of this changeset).
    
    Note that for libqb users, this implies a new link dependency on libdl,
    because they may opt-in for refreshed QB_LOG_INIT_DATA sanity check that
    calls out to dlopen/dlsym/dladdr directly in case of "attribute section"
    being available for the particular platform, and hence immediately needs
    those symbols resolved in link time.  Hence, add this conditional link
    dependency to libqb.pc pkg-config file under Libs variable -- we
    actually restore the occurrence of "-ldl" there as it used to be present
    until commit 56754d0.  While doing so, also move immediate link
    dependencies of libqb (if any, currently not but that may be
    a regression arising from the cleanup related to the mentioned commit)
    represented with the LIBS autoconf variable under Libs.private variable
    in libqb.pc, where it belongs per pkg-config documentation.
    
    The promised table follows, but first as a recap, "X(Y)" denotes
    "X linked with linker Y":
      X(a) .. ld.bfd < 2.29
      X(b) .. ld.bfd = 2.29 (and only 2.29)
    
    goes like this (values in <angle brackets> denote non-trivial change
    [not mere rewording] introduced as of this commit, in comparison to
    the table stated in the preceding commit):
    
    +=========+=========+=========+=========+=========+=========+=========+
    #client(x)#        libqb(a) usage       #        libqb(b) usage       #
    #   vvv   #---------+---------+---------+---------+---------+---------+
    #    V    #  direct | libX(a) : libX(b) #  direct | libX(a) : libX(b) #
    +=========+=========+=========+=========+=========+=========+=========+
    #  x = a  #   OK    |   OK    : BAD[*2] #<BAD[*E]>|<BAD[*F]>:<BAD[*G]>#
    #  x = b  # BAD[*A] | BAD[*B] : BAD[*C] #<BAD[*E]>|<BAD[*F]>:<BAD[*G]>#
    +=========+=========+=========+=========+=========+=========+=========+
    
    Woefully, nothing changes if we swap binutils 2.29 for 2.29.1, i.e.,
    X(b) .. ld.bfd = 2.29.1, compared to previous state, i.e., the second
    table from the previous commit is still applicable for that situation.
    The added sanity checks are useful nonetheless, consider for example,
    that attribute-section-less libqb is what gets run-time linked to an
    attribute-section-full target.  The most precise check we could use
    -- a custom logger function applied in a self-test scheme -- is not
    available at the point the macro-defined function gets invoked, simply
    because qb_log_init hasn't been invoked by the time that constructor
    gets triggered.  However, what we can do is to add a non-trapping
    libqb-residing reverse-testing of the client space that (and once it)
    voluntarily initiates qb_log_init (delivering abruption all of a sudden
    at some unanticipated, as opposed to a well-timed like with
    constructors, execution point, seems pretty bad idea + libqb as
    a library is a mere helper, not an undertaker :) -- this check then
    only announces, via syslog (the only pre-enabled logging target),
    the target's logging may be severed.
    
    * * *
    
    Hence, as a promised second step, after incorporating the syslog
    change (and extending log_test_mock.sh so as to capture syslog
    stream within the container), not much changes with the table above,
    i.e., X(b) .. ld.bfd = 2.29:
    [*A] in addition, unless QB_LOG_INIT_DATA used on client side,
         syslog carries this notice:
      "(libqb) log module hasn't observed target chain supplied callsite
       section, target's and/or libqb's build is at fault, preventing
       reliable logging (unless qb_log_init invoked in no-custom-logging
       context unexpectedly, or target chain built purposefully without
       these sections)"
         logged by libqb proper
    [*C] in addition, unless QB_LOG_INIT_DATA used on interlib side,
         syslog carries this notice:
      "(libqb) log module hasn't observed target chain supplied callsite
       section, target's and/or libqb's build is at fault, preventing
       reliable logging (unless qb_log_init invoked in no-custom-logging
       context unexpectedly, or target chain built purposefully without
       these sections)"
         logged by libqb proper
    [*E] in addition, unless QB_LOG_INIT_DATA used on client side,
         syslog carries this warning:
      "(libqb) log module has observed target chain supplied section
       unpopulated, target's and/or libqb's build is at fault, preventing
       reliable logging (unless qb_log_init invoked in no-custom-logging
       context unexpectedly)"
         logged by libqb proper
    [*F] in addition, unless QB_LOG_INIT_DATA used on interlib side,
         syslog carries this warning:
      "(libqb) log module has observed target chain supplied section
       unpopulated, target's and/or libqb's build is at fault, preventing
       reliable logging (unless qb_log_init invoked in no-custom-logging
       context unexpectedly)"
         logged by libqb proper
    [*G] in addition, unless QB_LOG_INIT_DATA used on interlib side,
         syslog carries this warning:
      "(libqb) log module has observed target chain supplied section
       unpopulated, target's and/or libqb's build is at fault, preventing
       reliable logging (unless qb_log_init invoked in no-custom-logging
       context unexpectedly)"
         logged by libqb proper
    
    but desirably changes with "X(b) .. ld.bfd = 2.29.1" one
    (DEP ~ "depends"):
    
    +=========+=========+=========+=========+=========+=========+=========+
    #client(x)#        libqb(a) usage       #        libqb(b) usage       #
    #   vvv   #---------+---------+---------+---------+---------+---------+
    #    V    #  direct | libX(a) : libX(b) #  direct | libX(a) : libX(b) #
    +=========+=========+=========+=========+=========+=========+=========+
    #  x = a  #   OK    |   OK    : DEP[*J] #<BAD[*M]>|<BAD[*M]>:<BAD[*L]>#
    #  x = b  #<DEP[*N]>| DEP[*I] :<DEP[*O]>#<BAD[*M]>|<BAD[*M]>:<BAD[*L]>#
    +=========+=========+=========+=========+=========+=========+=========+
    
    * * *
    
    [*1] client logging not working
    [*2] interlib logging not working
    [*3] both client and interlib logging not working
    
    [*A] boils down to [*1], unless QB_LOG_INIT_DATA used on client side,
         which then fails on
      "implicit callsite section is populated, otherwise target's build
       is at fault, preventing reliable logging"
         assertion
    [*B] boils down to [*1], unless QB_LOG_INIT_DATA used on interlib side,
         which then fails on
      "implicit callsite section is populated, otherwise target's build
       is at fault, preventing reliable logging"
         assertion
    [*C] boils down to [*3], unless QB_LOG_INIT_DATA used on interlib side,
         which then fails on
      "implicit callsite section is populated, otherwise target's build
       is at fault, preventing reliable logging"
         assertion
    [*E] boils down to [*1], unless QB_LOG_INIT_DATA used on client side,
         which then fails on
      "implicit callsite section is self-observable, otherwise target's
       and/or libqb's build is at fault, preventing reliable logging"
         assertion
    [*F] boils down to [*3], unless QB_LOG_INIT_DATA used on interlib side,
         which then fails on
      "libqb's callsite section is populated, otherwise libqb's build is
       at fault, preventing reliable logging"
         assertion
    [*G] boils down to [*3], unless QB_LOG_INIT_DATA used on interlib side,
         which then fails on
      "implicit callsite section is self-observable, otherwise target's
       and/or libqb's build is at fault, preventing reliable logging"
         assertion
    
    [*I] boils down to [*1], unless QB_LOG_INIT_DATA used on client side,
         which makes it, likely through self-reference keepalive (see
         below), work OK
    [*J] boils down to [*2], unless QB_LOG_INIT_DATA used on interlib side,
         which makes it, likely through self-reference keepalive (see
         below), work OK
    [*K] boils down to [*3]
         in addition, syslog carries this notice:
      "(libqb) log module hasn't observed target chain supplied callsite
       section, target's and/or libqb's build is at fault, preventing
       reliable logging (unless qb_log_init invoked in no-custom-logging
       context unexpectedly, or target chain built purposefully without
       these sections)"
         logged by libqb proper
    [*L] boils down to [*3], unless QB_LOG_INIT_DATA used on interlib side
         (sufficient?), which makes it, likely through self-reference
         keepalive (see below), boil down just to [*1];
         in addition, syslog carries this notice:
      "(libqb) log module hasn't observed target chain supplied callsite
       section, target's and/or libqb's build is at fault, preventing
       reliable logging (unless qb_log_init invoked in no-custom-logging
       context unexpectedly, or target chain built purposefully without
       these sections)"
         logged by libqb proper
    [*M] boils down to [*1];
         in addition, syslog carries this notice:
      "(libqb) log module hasn't observed target chain supplied callsite
       section, target's and/or libqb's build is at fault, preventing
       reliable logging (unless qb_log_init invoked in no-custom-logging
       context unexpectedly, or target chain built purposefully without
       these sections)"
         logged by libqb proper
    [*N] boils down to [*M], unless QB_LOG_INIT_DATA used on client side,
         which makes it, likely through self-reference keepalive (see
         below), work OK
    [*O] boils down to [*K], unless QB_LOG_INIT_DATA used on both client
         and interlib side, which makes it, likely through self-reference
         keepalive (see below), work OK  (it's expected that this a mere
         composite of situations [*I] and [*J] with consequences as stated)
    
    * * *
    
    Note: the only problematic (i.e. not captured automatically by the
    QB_LOG_INIT_DATA macro presumably utilized at every non-libqb logging
    system participant in the form of a discrete compilation unit)
    combination with 2.29 is the one intersecting at "BAD[*2]" pertaining
    "everything but interlib compiled with ld.bfd < 2.29".  It would, of
    course, be solvable as well, but presumably not in an easy way, and
    that use case should not be as frequent.
    
    Takeway: whenever your target (library or client program) actively
    utilizes logging (meaning it emits at least a single log message,
    otherwise there's an imminent danger of possibly even run-terminating
    false positive in the self-check mechanism!),
    
      YOU ARE strongly ENCOURAGED TO USE QB_LOG_INIT_DATA macro function
      at (exactly) one of the source code files (presumably the main one)
      per respective target's compilation unit.
    
    It will alleviate the hassles possibly caused by downgrading libqb
    to the linker-vs-libqb incompatibly compiled one or in similar
    circumstances arising merely from the linker behaviour change,
    which the current build system/code shake is all about.
    
    Signed-off-by: Jan Pokorný <jpokorny@redhat.com>
    jnpkrn committed Dec 12, 2017
    Copy the full SHA
    cdb0269 View commit details
    Browse the repository at this point in the history
  6. High: bare fix for libqb logging not working with ld.bfd/binutils 2.29+

    (or rather [read on]: "bare" fix, now that we established means to
    analyse the impact of the linker-dependent misbehaviour and to detect
    some of its symptoms in preceding two commits, respectively)
    
    Initially with the help of the internal test suite and the failing log
    test, it was eventually discovered[1] that these binutils commits going
    to the recent 2.29 release affected the treatment of _start_SECNAME
    and __stop_SECNAME symbols denoting the boundary start/stop addresses
    of a SECNAME orphan section -- specifically in libqb context a custom
    section (SECNAME=__verbose) used for link-time ("run-time amortizing")
    callsite collection when there's a support in the toolchain[*]:
    
    https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=cbd0eecf261c2447781f8c89b0d955ee66fae7e9
    https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=b27685f2016c510d03ac9a64f7b04ce8efcf95c4
    https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=7dba9362c172f1073487536eb137feb2da30b0ff
    
    The first one explicitly states:
    > Also __start_SECNAME and __stop_SECNAME symbols are marked as hidden
    > by ELF linker so that __start_SECNAME and __stop_SECNAME symbols for
    > section SECNAME in different modules are unique.
    
    The problem is that libqb silently depends on the previous status quo
    ld.bfd linker behaviour of keeping those symbols externally visible,
    which was apparently not granted as it has deliberately changed per
    above.
    
    And then for 2.29.1 release of binutils once again, as someone actually
    noticed something went overboard with the 2.29 changes:
    
    http://lists.gnu.org/archive/html/bug-binutils/2017-08/msg00195.html
    (overview of the original bug discussion, rather than directly
    https://sourceware.org/bugzilla/show_bug.cgi?id=21964, which is
    a result of a conflct resolution when restoring bugzilla backup)
    https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=487b6440dad57440939fab7afdd84a218b612796
    
    At least that change doesn't invalidate all the effort being put into
    the original version of the changeset, only the configure script check
    had to be refined so as not to miss the "orphan section magic not
    working properly out of the box, without band aid" observation
    (see the inline comment) -- the workaround arrangement needs
    to be applied in that case as well.
    
    * * *
    
    So regarding the solution itself, the core of the fix was sketched at
    the original Fedora targeted bug against binutils[2].  In short, we are
    using a custom linker script that (re)describes the mentioned custom
    orphan output section, or better yet, assuredly pushes that section, and
    more importantly, it's own boundary denoting symbols, through into the
    resulting executable when it's being linked (as in compile-time step).
    
    This solution alone, while working for the non-libqb (more on that
    below) logging participants, is not good enough, as it requires all
    libqb targets to start using new incantation (namely "-Wl,foo.t" switch)
    in the final link step during compilation, which might be solvable
    with a tweak in libqb's pkg-config file under assumption that practice
    of using "pkg-config --libs libqb" is rigidly followed.  Which is likely
    a false expectation, and furthermore only for the regular consumption
    model, as it doesn't cover the least bit the developmental one (refer
    to previous-but-one "tests" commit message), e.g. applied for internal
    examples + tests (but no local sub-checkout tree usage can be excluded).
    
    So further extensions were devised to cover both consumption models:
    
    - a. regular:
      courtesy of binutils maintainer[3], we follow an idea to make libqb.so
      (i.e. what the targets link against) rather a linker script on its
      own, which first include the version-specified (e.g.  libqb.so.0) file
      into the link, then lists, in situ, the content of the linker script
      per above, hence -lqb linking has the same effect as having both
      "-lqb -Wl,foo.t" explicitly in the link command prior to this trick
    
    - b. developmental:
      to eliminate any kind of race condition arising from the attempt
      to post-modify libqb.la libtool archive file generated internally
      by libtool, we sort of abuse "inherited_linker_flags" variable
      within this file format, as it forms an accumulative value across
      the whole transitive dependencies chain (if not impaired per the
      note below), fitting exactly our purpose of injecting "-Wl,foo.t"
      switch equivalent for those libtool-linking by L{D,IB}ADD'ing
      libqb.la; it's then enough to craft a custom libtool archive file
      declaring that value, and hook it into such dependency chain through
      libqb_la_LIBADD, and with a little bit of further fiddling, it works
      as desired (note that double occurrence of "-Wl,foo.t" equivalent
      present at some stages of sorting this trick turned out to be,
      surprisingly, counter-productive, which should now demistify the
      very existence of effectively empty qblog_script_noop.ld file);
    
      NOTE: some forms of libtool distribution (debian + derivatives ones
      in particular) undermine natural transitive dependency propagation
      with a deliberate cut off (https://bugs.debian.org/702737), so we
      need to ensure the "impairment" is not happening by force (corosync
      precedent: corosync/corosync@0f1dc5c1)
      ^ something like this needs to be applied for any such "private
        consumer" (although it hopefully goes without saying this way
        of consuming libqb outside of it's own playground is hardly
        the Right Thing) if portability is important, nonetheless!
    
    * * *
    
    On the address of linker script workaround, there are linkers out there
    that do not support the trick, for instance:
    - ld.gold from binutils (but it has hardly ever been working with
      orphan sections, anyway:
      https://sourceware.org/bugzilla/show_bug.cgi?id=22291)
    - ancient versions of ld.bfd, e.g. 10+ years old one used as a native
      system linker even in the most recent releases of FreeBSD, unless
      GCC toolchain is used instead
    If these are hit when (because) the compiler has already demonstrated it
    supports "section" attribute, the build system configuration is forcibly
    stopped, simply to stay conceptually compatible with the prior state in
    which the affinity to leverage that feature hasn't been called off
    under any circumstances.  One is, however, able to achieve exactly
    this behaviour with --enable-nosection-fallback switch, but if some
    other participants in the logging, possibly linked with a more friendly
    linker, do utilize this orphan section, logging may silently break
    (another reason to require an explicit sign-off).
    
    Another note, the particular self-check change slightly touched in the
    previous commit but otherwise predating this whole effort by far needs
    to be modified now once again, this time because linker-script-based
    workaround for newer linkers as stated causes the section boundary
    symbols to be present regardless if that section is utilized, leading
    to a self-inflicted breakage due to these empty section symbols suddenly
    winning in the symbol resolution mechanism (previously the empty section
    would be dropped incl. the boundary symbols), causing problems down the
    line.  It also makes this very check self-contained in the same
    compilation unit that trigggers it, whereas previously it used to be the
    said "arbitrary" winner and things kept silently working just because
    failure condition -- empty section -- would be implicitly isolated.
    
    Last but not least, libqb itself needs to be linked with the mentioned
    "-Wl,foo.t" equivalent for its own outgoing log messages to be honoured
    under all circumstances, which is already achieved with the arrangement
    for b. above, and by experiments, further redefinition of those boundary
    denoting symbols as weak was necessary so as to make them truly global
    within libqb.so proper (at least with binutils 2.29).
    
    * * *
    
    To provide a high-level prioritized overview of what drove the approach:
    
    - PRESERVATION OF BINARY COMPATIBILITY (ABI), which is achieved except
      for a single "ABI nongracefulness" I am aware of but that's more
      a consequence of slightly incorrect assumptions in the logic of
      QB_LOG_INIT_DATA macro function predating this whole affair by
      a long shot and which the patchset finally rectifies:
    
      if in the run-time dynamic link, following is combined:
      (. libqb, arbitrary variant: pre-/post-fix, binutils < / >= 2.29)
      . an "intermediate" library (something that the end executable links
        with) triggering QB_LOG_INIT_DATA macro and being built with
        pre-fix libqb (and perhaps only with binutils < 2.29)
      . end executable using no libqb's logging at all, but being built
        with post-fix libqb (and arbitrary binutils < / >= 2.29)
      then, unlike when executable is built with pre-fix libqb, the
      special callsite data containing section in the ELF structure
      of the executable is created + its boundary denoting symbols
      defined within, despite the section being empty (did not happen
      with pre-fix libqb), and because the symbols defined within the
      target program have priority over that of shared libraries in the
      symbol resolution fallback scheme, the assertion of QB_LOG_INIT_DATA
      of the mentioned intermediate library will actually be evaluating
      the inequality of boundaries for the section of the executable(!)
      rather than it's own (or whatever higher prio symbols are hit,
      presumably only present if the section at that level is non-empty,
      basically a generalization of the story so far);
    
      the problem then manifests as unability to run said executable
      as it will fail because of the intermediate library inflicted
      assertion (sadly with very unhelpful "Assertion `0' failed"
      message);
    
      fortunately, there's enough flexibility so as how to fix
      this, either should be fine:
      . have everything in the executable's library dependency closure
        that links against libqb assurably (compile-time) linked with one
        variant of libqb only (either all pre-fix or post-fix, mind the
        apparent limitation of binutils' versions with the former)
      . have the end executable (that does not use logging at all as
        discussed precondition) linked using substitution like this:
        s/-lqb/-l:libqb.so.0/  (you may need to adapt the number later)
        and you may also need to add this CPPFLAG for the executable:
        -DQB_KILL_ATTRIBUTE_SECTION
    
    - as high level of isolation of the client space from the linker
      (respectively toolchain) subtleties as possible (no new compilation
      flags and such required, plus there's no way to hook any dynamic
      computational ad-hoc decision when the compilation is about to
      happen, anyway), and in turn, versatility is preserved as much as
      possible
    
    * * *
    
    Finally, let's have a look how the already well-known test matrix
    overview changes as of this commit, but first as a recap,
    "X(Y)" denotes "X linked with linker Y":
      X(a) .. ld.bfd < 2.29
      X(b) .. ld.bfd = 2.29 (+ 2.29.1 and hopefully on)
    
    and here you are (values in <angle brackets> denote non-trivial change
    [not mere rewording] introduced as of this commit, in comparison to the
    table stated in the preceding commit):
    
    +=========+=========+=========+=========+=========+=========+=========+
    #client(x)#        libqb(a) usage       #        libqb(b) usage       #
    #   vvv   #---------+---------+---------+---------+---------+---------+
    #    V    #  direct | libX(a) : libX(b) #  direct | libX(a) : libX(b) #
    +=========+=========+=========+=========+=========+=========+=========+
    #  x = a  #   OK    |   OK    :  <OK>   #  <OK>   |  <OK>   :  <OK>   #
    #  x = b  #  <OK>   |  <OK>   :  <OK>   #  <OK>   |  <OK>   :  <OK>   #
    +=========+=========+=========+=========+=========+=========+=========+
    
    Everything is green \o/
    
    * * *
    
    Note: as of this fix, it is assumed that the non-green counterpart of
    this table in the message for the preceding commit (loosely though[!],
    as the occurrence of empty callsite section can no longer be attributed
    to something bad going on as of this fix that enforces its presence
    unconditionally, whereas it would be suppressed when unused before
    with kind linkers, hence some other conditions can be witnessed
    especially when QB_LOG_INIT_DATA misused in no-logging context)
    doubles as an indicator how will mixing the logging participants wrt.
    linker+libqb version work out, when "X(Y)" becomes read as "X linked
    with linker Y under additional restriction on libqb version when
    compile-time link is performed of the particular part":
      X(a) .. ld.bfd < 2.29 OR [arbitrary ld.bfd AND libqb after this fix)
      X(b) .. ld.bfd = 2.29 (and likely on) AND libqb up to, but excluding
              this fix
    
    * * *
    
    Let's also state some imperfections and loops kept open:
    
    Deficiencies:
    * whenever anything is compiled against our install-time-modified
      libqb.so so as to force the visibility of the discussed symbols
      (or when compiling [with] libqb internally):
    > /usr/bin/ld: warning: ../lib/qblog_script.ld contains output sections; did you forget -T?
      - not solvable as long as we use the linker script, and there's
        hardly any other way not requiring the libqb consumers to adapt
        in any aspect
    * as already mentioned, lacking compatibility with ld.gold linker and
      won't foreseeably be (cf. https://bugzilla.redhat.com/1500898#c7)
      - please stick with ld.bfd (i.e. default ld linker), which you
        had to do in the past anyway (at least for compiling libqb
        itself)
    
    Open questions:
    * should we enable attribute((__section__)) for powerpc and other minor
      platforms if the feature is proved to be working there as well?
      and if/when that's going to happen, we need to figure out the
      transition plan to be spread throughout an extended period to keep
      the transition smooth -- notably when now-with-callsite-section
      clients will get run-time linked with callsite-section-not-a-default
      libqb (say upon it's downgrade), and for that, the libqb's support
      alone should be enabled year(s) ahead of the actual client space...
    
    * * *
    
    [*] basically GCC's section("SECNAME") __attribute__ annotation of the
    global variables + linker described behaviour previously mistakenly
    taken for granted
    
    References:
    [1] http://oss.clusterlabs.org/pipermail/developers/2017-July/000503.html
    [2] https://bugzilla.redhat.com/show_bug.cgi?id=1477354#c2 + comment 8
    [3] https://bugzilla.redhat.com/show_bug.cgi?id=1477354#c9
    
    Signed-off-by: Jan Pokorný <jpokorny@redhat.com>
    jnpkrn committed Dec 12, 2017
    Copy the full SHA
    20246f5 View commit details
    Browse the repository at this point in the history
  7. Low: fix internal object symbol's leak & expose run-time lib version

    The object in question has never been published through the header file,
    hence it's presumably safe to make it static as it's meant to be.
    
    On the other hand, QB_LOG_INIT_DATA macro from qblog.h has already
    started to depend on that symbol so as to locate the library handle
    for libqb itself correctly.  This is trivially fixed by finally exposing
    library versioning info in run-time ("online") as a structure with
    members corresponding to compile-time ("offline") counterparts from
    qbconfig.h header file, which are admittedly of very limited use
    as opposed to the newly introduced dynamic info, plus lower-cased
    equivalent of QB_VER_STR.  Better than to roll out a futile data object
    serving as an artificial anchor for the above purpose, and this was
    due for a while, afterall.
    
    In turn, also bump "current" and "age" of fields of the libtool's
    "-version-info" versioning system.
    
    Signed-off-by: Jan Pokorný <jpokorny@redhat.com>
    jnpkrn committed Dec 12, 2017
    Copy the full SHA
    c011b12 View commit details
    Browse the repository at this point in the history