New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Med: libcrmcommon/iso8601: fix train of thinkos regarding time offsets #1759
Conversation
This weigh in on the side speaking for dropping undertested and |
(Hmm, any reason Travis hasn't kicked in?) |
@@ -240,3 +240,7 @@ Duration: -1 month | |||
Duration ends at: 2009-02-28 00:00:00Z | |||
=#=#=#= End test: 2009-03-31 - 1 Month - OK (0) =#=#=#= | |||
* Passed: iso8601 - 2009-03-31 - 1 Month | |||
=#=#=#= Begin test: Test correct plus-sign at timezone parsing =#=#=#= | |||
Date: 1556221374 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
jenkins is getting 1556228574 here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So it must be even more funny :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean, time zone qualified date -> seconds since epoch (UTC) shall
be a very deterministic constant, without any other factor involved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, it's even more broken than expected...
Will fix that as well, but would shift ditching that code to
a mid-term goal rather than long-term.
It also implements unsolicited superset of the ISO 8601 formats
(just as it works with a superset of XML, relying on implementation
details of libxml we've hit earlier).
The first run after access restrictions are enabled needs approval -- which has been done, so the next push should trigger it. |
28985e7
to
4d8d232
Compare
On 26/04/19 14:19 +0000, Ken Gaillot wrote:
> (Hmm, any reason Travis hasn't kicked in?)
The first run after access restrictions are enabled needs approval
-- which has been done, so the next push should trigger it.
Still not coming in and grabbing the CI tasks?
…--
Jan (Poki)
|
I just submitted a new PR, and it took longer than normal, but both CI deployments started. Interestingly, for this PR, travis did run, it just didn't put the link on this page: https://travis-ci.org/ClusterLabs/pacemaker/builds/525063666 It's possible it's just due to the situation at the time the PR was created. Maybe try deleting and re-creating to verify. |
4d8d232
to
2121307
Compare
It has caught up. |
2121307
to
2b2a3f7
Compare
Fixed the (typical) problem |
2b2a3f7
to
c9551b0
Compare
c9551b0
to
2e18e00
Compare
It's not very clear why the whole parsing is so diverted from the actual ISO 8601 standard prescribed syntax, but multiple problems were discovered when testing something as basic as 2019-04-25T21:42:54+0200: 1/ the proclaimed ISO 8601 date/time format parser is schizophrenic in a sense it both (a) supports undocumented extensions above what the standard defines (like a tolerance to white space at various places within the format string, like mentioned in 2/), i.e., implementing a support for a superset of ISO 8601, and at the same time, (b) doesn't implement support for the mandated syntactical constructions (said optional support for white spaces is turned into requiring them where are undue per the standard, meaning that the above would be treated as "2019-04-25T21:42:54 of whatever currently configured timezone"), i.e., effectively implementing a subset of the standard 2/ with 1/ fixed, that is, at least at the location affecting parsing of the above example input, '+' sign (e.g. +0200) would consume artificially a position where digit was expected (the above would be garbled and parsed as if it effectively was +0020) These may, apparently, derail the carefully tweaked execution plans of the users. * * * The identified problems are fixed while retaining the original tolerance (against the standard), and a respective tests to capture any regressions relevant to these fixes is added (two variants of the same test, actually, so as to limit the possiblity of the problem marked 1/ being masked -- simply because the runner's timezone matches that marked with an offset, as happened in the first iteration of this fix, thanks to CI/Ken for noticing it :-) Sooner rather than later, we should swap this highly problematic chunk of code for some offloaded solution, Glib's GDateTime: https://developer.gnome.org/glib/stable/glib-GDateTime.html#g-date-time-new-from-iso8601 being a prime target. Such a change will likely need to be accompanied some schema upgrade procedures as well, so as not keep users on their own with configurations that once worked well (only because of an unexpectedly botched and untested implementation).
2e18e00
to
bbd54bf
Compare
i.e., "you can rely on the (mis)parsing to be deterministic if you (a) use the very same parsing code = same version, and (b) you don't deliberately switch system's time zone settings unless always specify the timezone offsets[*] (which is rather self-understood, hopefully)". Wrapped in more diplomatic wordings towards the users, indeed. We need to be able to fix some apparent faults (see the previous "fix train of thinkos regarding time offsets" commit). Downright incompatible changes like migrating to an external date/time parser and/or dropping the support of unintendedly lax approach to ISO 8601 formats will need to be accompanied with the respective schema migration routines anyway (but the discipline established with this commit will even more required at that time, since inputs to crm_rule are user-private part of the configuration we have no way to mark stale/sanitize automatically). [*] note that, sadly, 1/ point in the said preceding commit would fail that assumption previously, but hopefully, it was the last such timezone-related problem of this kind (fingers crosed) P.S. It's no accident that "hopefully" was used twice above.
341be88
to
d5ebc55
Compare
Sadly, despite the standard references throughout, the outputs were ostentatiously ignoring that and were producing non-conforming outputs instead. This is now tackled, and we also care to provide well-formed inputs to cts-cli tests that are very related (note that -E switch to iso8601 are meant for testing only, hence shall be rather un-publicized at some point). There was some fallout in terms of changed expected outputs for cts-scheduler as well.
d5ebc55
to
ecc689f
Compare
" versions of pacemaker, possibly -- depending on the" | ||
" permanency of the arising decisions -- mandating" | ||
" any such to be re-evaluated anew upon an update).", | ||
pcmk_option_paragraph}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove this note. It engenders more confusion than clarity, and man pages / help output is not the right place to document release notes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree it sounds maybe too harsh.
Do you suggest to document the expected problems arising from
real parser bugs elsewhere?
The "closure" makes it as least good-in-bad, so shall be documented
somewhere...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bugs are documented in the change log. We simply don't need to warn users about fixed bugs. There are at least hundreds of potential problems with mixed-version clusters due to bugs that are fixed in the newer version. The documentation already strongly recommends that mixed versions be used only during a brief rolling upgrade window to minimize the chance for problems. Also, the fact that no one's ever reported the bug suggests that it's a rare usage; the average user shouldn't be bothered with it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that we should educate users to use the software
correctly wrt. slight changes in behaviour and precomputed assumptions
-- it's much safer to re-evaluate some decision making variables in the
game after each update than to stay totally static there.
Especially when/if the parser is to be changed completely one
day -- CIB is mostly fine, it's the state we have access to and
can reconcile the situation with it. But we don't have the
visibility into all the surrounding scripting etc. so that's why
I think it'd be wise to lead an example of very dynamic adaptations
whereever possible.
It's not limited to date/time parsing as such, it shall be a very
general guidance, actually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very contrived, next to bogus example, to demonstrate what I mean:
-
user has a home-grown algorithm to allocate new cron tasks (local
only) it is given as uniformly throughout the day as possible (for
some reason it is critical for the overall business) -
as a last step, this algoritm consults
crm_rule
and crossmatches
selected ruleid
s against the time currently predestined for the
new task to be scheduled -
so far so good, now two branches
a) user upgrades pacemaker, and suddenly, the rules containing
a timezone without space gets the offset with moved precision
from units of hours to tens of minutes (as we have just seen)
-
only documentation may help, since it's the new pacemaker not
fitting the old state of affairs (due to a slight behaviour
altering) -
this cannot be solved with schema upgrade, since if we would
leave the buggy behaviour as was and just did the contra-shift,
it would totally confuse users when observed afterwards
("you know, it's not exactly as you see it, since the SW
contained a bug, so we mitigated that with this inverse
action, and you are now mandated to parse the timezone
in your brain just as the SW does -- at least we think
it's what you actually wanted, because the zone +00:20
is nowhere to be found") -- it feels a far better choice
to just fix the parsing, even if at the same expense as
above of slightly changing the behaviour, since only then
it's actually evaluated correctly and the date spec
observed by the user truly captures the reality
(is not mangled as with the former variant), i.e., one-off
inconvenience but sanity going forward
b) user upgrades pacemaker, is educated only partly so that she
rushes to have the above cron tasks rescheduled reflecting
new answers to "is given rule active at given point in time?",
but alas, forgetting she can't make these decisions based
on other version of the pacemaker in her simulated cluster
environments using VMs on her laptop
- the above documentation shall highlight the necessity
of version/scope lock-step for decisions about the
"system" to stay on the safest side
Now, how to sell it to the users, that SW perfection is unattainable
in general, and that pieces that are not totally self-contained
(without effect on the outer world) -- which pacemaker as a distributed
system can hardly be (causing so many compatibility headaches) -- and
hence some level of users' cooperation and assistance and carefulness
is needed so they can live in peace and accord with such SW...
I've thought this through and the plan is to start a lightweight isEligible(context) -> bool
fixUp(context) -> new CIB | None
There would be TUI runner for the all/specified fixups:
(capital The framework/modules shall be reasonably usable directly as a Python We could then handle small issues of this kind routinelly without Also note that it would only serve the ambiguous resolutions like |
In #1756, another ISO 8601 parsing bug was discovered: nonetheless, it was fixed in a self-contained manner, meaning that it |
This will be forward-ported to the current code base as part of #2737 |
Previously, the above example would be parsed as (+0020), which may
apparently derail the carefully polished execution plans of the users.
Also add a respective test to capture any regression.