Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Med: libcrmcommon/iso8601: fix train of thinkos regarding time offsets #1759

Closed
wants to merge 3 commits into from

Conversation

jnpkrn
Copy link
Contributor

@jnpkrn jnpkrn commented Apr 25, 2019

Previously, the above example would be parsed as (+0020), which may
apparently derail the carefully polished execution plans of the users.
Also add a respective test to capture any regression.

@jnpkrn
Copy link
Contributor Author

jnpkrn commented Apr 25, 2019

This weigh in on the side speaking for dropping undertested and
burdensome code we have better alternatives for in glib we already
rely on anyway ... coincidentally, the topic touched on literally
few hours back at another PR:

#1756 (comment)

@jnpkrn
Copy link
Contributor Author

jnpkrn commented Apr 25, 2019

(Hmm, any reason Travis hasn't kicked in?)

@@ -240,3 +240,7 @@ Duration: -1 month
Duration ends at: 2009-02-28 00:00:00Z
=#=#=#= End test: 2009-03-31 - 1 Month - OK (0) =#=#=#=
* Passed: iso8601 - 2009-03-31 - 1 Month
=#=#=#= Begin test: Test correct plus-sign at timezone parsing =#=#=#=
Date: 1556221374
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

jenkins is getting 1556228574 here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it must be even more funny :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, time zone qualified date -> seconds since epoch (UTC) shall
be a very deterministic constant, without any other factor involved.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, it's even more broken than expected...

Will fix that as well, but would shift ditching that code to
a mid-term goal rather than long-term.

It also implements unsolicited superset of the ISO 8601 formats
(just as it works with a superset of XML, relying on implementation
details of libxml we've hit earlier).

@kgaillot
Copy link
Contributor

(Hmm, any reason Travis hasn't kicked in?)

The first run after access restrictions are enabled needs approval -- which has been done, so the next push should trigger it.

@jnpkrn
Copy link
Contributor Author

jnpkrn commented Apr 26, 2019 via email

@jnpkrn jnpkrn changed the title Med: fix a thinko when parsing time offset with a '+' sign (e.g. +0200) Med: libcrmcommon/iso8601: fix train of thinkos regarding time offsets Apr 26, 2019
@kgaillot
Copy link
Contributor

Still not coming in and grabbing the CI tasks?

I just submitted a new PR, and it took longer than normal, but both CI deployments started.

Interestingly, for this PR, travis did run, it just didn't put the link on this page:

https://travis-ci.org/ClusterLabs/pacemaker/builds/525063666

It's possible it's just due to the situation at the time the PR was created. Maybe try deleting and re-creating to verify.

@jnpkrn
Copy link
Contributor Author

jnpkrn commented Apr 26, 2019

It has caught up.
Thanks!

@jnpkrn
Copy link
Contributor Author

jnpkrn commented Apr 26, 2019

Fixed the (typical) problem cts-X.in vs. cts-X.

lib/common/iso8601.c Outdated Show resolved Hide resolved
It's not very clear why the whole parsing is so diverted from the actual
ISO 8601 standard prescribed syntax, but multiple problems were
discovered when testing something as basic as 2019-04-25T21:42:54+0200:

1/ the proclaimed ISO 8601 date/time format parser is schizophrenic
   in a sense it both (a) supports undocumented extensions above what
   the standard defines (like a tolerance to white space at various
   places within the format string, like mentioned in 2/), i.e.,
   implementing a support for a superset of ISO 8601, and at the same
   time, (b) doesn't implement support for the mandated syntactical
   constructions (said optional support for white spaces is turned into
   requiring them where are undue per the standard, meaning that the
   above would be treated as "2019-04-25T21:42:54 of whatever currently
   configured timezone"), i.e., effectively implementing a subset of
   the standard

2/ with 1/ fixed, that is, at least at the location affecting parsing
   of the above example input, '+' sign (e.g. +0200) would consume
   artificially a position where digit was expected (the above would
   be garbled and parsed as if it effectively was +0020)

These may, apparently, derail the carefully tweaked execution plans
of the users.

* * *

The identified problems are fixed while retaining the original
tolerance (against the standard), and a respective tests to capture
any regressions relevant to these fixes is added (two variants of
the same test, actually, so as to limit the possiblity of the problem
marked 1/ being masked -- simply because the runner's timezone matches
that marked with an offset, as happened in the first iteration of this
fix, thanks to CI/Ken for noticing it :-)

Sooner rather than later, we should swap this highly problematic
chunk of code for some offloaded solution, Glib's GDateTime:
https://developer.gnome.org/glib/stable/glib-GDateTime.html#g-date-time-new-from-iso8601
being a prime target.

Such a change will likely need to be accompanied some schema
upgrade procedures as well, so as not keep users on their own
with configurations that once worked well (only because of an
unexpectedly botched and untested implementation).
i.e., "you can rely on the (mis)parsing to be deterministic if you
(a) use the very same parsing code = same version, and (b) you don't
deliberately switch system's time zone settings unless always specify
the timezone offsets[*] (which is rather self-understood, hopefully)".

Wrapped in more diplomatic wordings towards the users, indeed.  We need
to be able to fix some apparent faults (see the previous "fix train of
thinkos regarding time offsets" commit).  Downright incompatible changes
like migrating to an external date/time parser and/or dropping the
support of unintendedly lax approach to ISO 8601 formats will need to
be accompanied with the respective schema migration routines anyway
(but the discipline established with this commit will even more
required at that time, since inputs to crm_rule are user-private
part of the configuration we have no way to mark stale/sanitize
automatically).

[*] note that, sadly, 1/ point in the said preceding commit would
    fail that assumption previously, but hopefully, it was the last
    such timezone-related problem of this kind (fingers crosed)

P.S. It's no accident that "hopefully" was used twice above.
Sadly, despite the standard references throughout, the outputs were
ostentatiously ignoring that and were producing non-conforming outputs
instead.  This is now tackled, and we also care to provide well-formed
inputs to cts-cli tests that are very related (note that -E switch to
iso8601 are meant for testing only, hence shall be rather un-publicized
at some point).

There was some fallout in terms of changed expected outputs for
cts-scheduler as well.
" versions of pacemaker, possibly -- depending on the"
" permanency of the arising decisions -- mandating"
" any such to be re-evaluated anew upon an update).",
pcmk_option_paragraph},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove this note. It engenders more confusion than clarity, and man pages / help output is not the right place to document release notes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree it sounds maybe too harsh.

Do you suggest to document the expected problems arising from
real parser bugs elsewhere?

The "closure" makes it as least good-in-bad, so shall be documented
somewhere...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bugs are documented in the change log. We simply don't need to warn users about fixed bugs. There are at least hundreds of potential problems with mixed-version clusters due to bugs that are fixed in the newer version. The documentation already strongly recommends that mixed versions be used only during a brief rolling upgrade window to minimize the chance for problems. Also, the fact that no one's ever reported the bug suggests that it's a rare usage; the average user shouldn't be bothered with it.

Copy link
Contributor Author

@jnpkrn jnpkrn Apr 29, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that we should educate users to use the software
correctly wrt. slight changes in behaviour and precomputed assumptions
-- it's much safer to re-evaluate some decision making variables in the
game after each update than to stay totally static there.

Especially when/if the parser is to be changed completely one
day -- CIB is mostly fine, it's the state we have access to and
can reconcile the situation with it. But we don't have the
visibility into all the surrounding scripting etc. so that's why
I think it'd be wise to lead an example of very dynamic adaptations
whereever possible.

It's not limited to date/time parsing as such, it shall be a very
general guidance, actually.

Copy link
Contributor Author

@jnpkrn jnpkrn Apr 29, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very contrived, next to bogus example, to demonstrate what I mean:

  • user has a home-grown algorithm to allocate new cron tasks (local
    only) it is given as uniformly throughout the day as possible (for
    some reason it is critical for the overall business)

  • as a last step, this algoritm consults crm_rule and crossmatches
    selected rule ids against the time currently predestined for the
    new task to be scheduled

  • so far so good, now two branches

a) user upgrades pacemaker, and suddenly, the rules containing
a timezone without space gets the offset with moved precision
from units of hours to tens of minutes (as we have just seen)

  • only documentation may help, since it's the new pacemaker not
    fitting the old state of affairs (due to a slight behaviour
    altering)

  • this cannot be solved with schema upgrade, since if we would
    leave the buggy behaviour as was and just did the contra-shift,
    it would totally confuse users when observed afterwards
    ("you know, it's not exactly as you see it, since the SW
    contained a bug, so we mitigated that with this inverse
    action, and you are now mandated to parse the timezone
    in your brain just as the SW does -- at least we think
    it's what you actually wanted, because the zone +00:20
    is nowhere to be found") -- it feels a far better choice
    to just fix the parsing, even if at the same expense as
    above of slightly changing the behaviour, since only then
    it's actually evaluated correctly and the date spec
    observed by the user truly captures the reality
    (is not mangled as with the former variant), i.e., one-off
    inconvenience but sanity going forward

b) user upgrades pacemaker, is educated only partly so that she
rushes to have the above cron tasks rescheduled reflecting
new answers to "is given rule active at given point in time?",
but alas, forgetting she can't make these decisions based
on other version of the pacemaker in her simulated cluster
environments using VMs on her laptop

  • the above documentation shall highlight the necessity
    of version/scope lock-step for decisions about the
    "system" to stay on the safest side

Now, how to sell it to the users, that SW perfection is unattainable
in general, and that pieces that are not totally self-contained
(without effect on the outer world) -- which pacemaker as a distributed
system can hardly be (causing so many compatibility headaches) -- and
hence some level of users' cooperation and assistance and carefulness
is needed so they can live in peace and accord with such SW...

@jnpkrn jnpkrn changed the title Med: libcrmcommon/iso8601: fix train of thinkos regarding time offsets [WIP] Med: libcrmcommon/iso8601: fix train of thinkos regarding time offsets Apr 30, 2019
@jnpkrn
Copy link
Contributor Author

jnpkrn commented Apr 30, 2019

I've thought this through and the plan is to start a lightweight
framework written in Python consisting of relatively loosely coupled
modules (particular fixups) that would have this repertoire of methods
(approaximately):

isEligible(context) -> bool

  • detects if the fixup is applicable at all
  • here, the prerequisite is "source pacemaker deployment contains parser
    bugs fixed in this PR"

fixUp(context) -> new CIB | None

  • run the fixup, returns new, fixed CIB, or nothing if the user
    didn't opt-in for any change in due curse of the method

There would be TUI runner for the all/specified fixups:

# pcmk-update-fixup
Source version to update-and-fix from (autodetected: [2.0.1]): <enter>
Target version to update-and-fix onto (expected: [2.0.2]): <enter>
Running eligible fixup module 1/1: timedate-zone-nonspace...
 * affected rule my-rule-1 (2019-04-25T21:42:54+0200)
   - previously, it would be parsed as 2019-04-25T21:42:54+0020, and
     it is unclear whether you prefer:
     (a) literal meaning of how the rule was specified (i.e., changing
         the effective behaviour slightly, since previously it was
         treated differently as mentioned) -- this option is preferred
         if nothing from the wider system relies on the exact
         already established physical time anchoring
     (b) retaining the effect of the previous mis-parsing, in which case,
         this fixup will change that specification to literal
         2019-04-25T21:42:54+0020, since the new version no longer
         misinterprets the original specification
   - preferred option ([a]/b/A/B): <enter>
   - date/time specification used within rule my-rule-1 kept as was
Done running fixup module 1/1: timedate-zone-nonspace

No effective changes made, your CIB is already compatible with the upgrade
in respect to your declared intentions.

(capital A/B denotes "stick with the respective decision for all such
instances of the same fixup")

The framework/modules shall be reasonably usable directly as a Python
package for, e.g., higher level management tools to use.

We could then handle small issues of this kind routinelly without
thinking, while staying 100% honest to the users. It would also
help substantially more than 1000+ words written on the topic.
It could now briefly describe the problem of configuration
interpretation subtleties, and refer to pcmk-update-fixup tool
in the context of pacemaker updates.

Also note that it would only serve the ambiguous resolutions like
this very one. Straightforward, undisputable changes shall be scheduled
for a major CIB schema upgrade with automatic handling in the form an
XSL code as has been the norm so far.

@jnpkrn
Copy link
Contributor Author

jnpkrn commented May 7, 2019

In #1756, another ISO 8601 parsing bug was discovered:
#1756 (comment)

nonetheless, it was fixed in a self-contained manner, meaning that it
doesn't depend on this in any way -- in fact, this PR looks like a more
a slow running process to figure out the directions, whether to apply
the cautious approach as proposed to stay on the safe side.

@kgaillot
Copy link
Contributor

This will be forward-ported to the current code base as part of #2737

@kgaillot kgaillot closed this Jul 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants