-
Notifications
You must be signed in to change notification settings - Fork 87
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bison regression since 3.3.90 up to and including 3.7.4 #72
Comments
Wow. This is really bad. I was in the process of releasing 3.7.5, but I might have to delay this until I could take some time to study your problem more closely. Thanks a lot for the careful analysis! I would very much like to have the grammar that triggered the problem, please. Cheers! |
Here's the grammar: https://gist.github.com/bazsi/842eaa86760da2c13d65b5d0b6b1dcfa |
btw: I went back to an earlier version of syslog-ng (3.29) that doesn't yet have the 3.4.x+ updates, so this will spew a lot of warnings with the latest master. I did this to be able to bisect from 3.0.5 (which worked) up to 3.7.4. So don't be alarmed by the amount of warnings this will generate, we are using these options: |
I was looking at the patch, and although I don't really understand the internals of bison, but one thing is a bit strange. I am using hunks of the patch revert, so "-" is in fact the wrong version and "+" is the correct (but slow) one. This is the change that converted the use of an array to the use of a bitset. The single location that flips bits in our bitset:
pos[i] gets assigned "place", which is an index of some kind (e.g not just 0 or 1). Then the read side of the bitset:
If you look at the conditional that turns ok to false, we check if pos[k] is EQUAL to "res", not just non-zero, but is equal to the loop variable in pack_vector(). This means that whereas earlier we only turned ok to false when pos[k] was equal to the current loop variable, now we do that every time even if that was set in an earlier iteration. I might be completely at loss here, but the conversion does not seem to be equivalent. |
scrap my last comment. sorry for the noise. |
The problem is... I don't know that part very well either....
I don't see that. In the |
I think that the part: /* Store PLACE into POS_SET. PLACE might not belong to the set
of possible values for instance with useless tokens. It
would be more satisfying to eliminate the need for this
'if'. */
if (0 <= nstates + place)
bitset_set (pos_set, nstates + place); is worth worrying (I mean, the |
I think I have the problem, I just can't come up with a good solution yet. I have added some logging into these places, and I've found differences between the "good" and the "bad" versions and it indeed relates to the condition before bitset_set(). It seems that a bitset_set() is not invoked on a "place" value of -147, because "nstates + place" is indeed less than zero (nstates being 70) When executing bison in the "good" version, -147 is matched in the original "pos" array and thus skipped in pack_vector(). So the problem seems to be that we indeed need to eliminate that conditional by shifting the bitset even more. So the base "nstates" is not the right base that the code uses right now. We need to find the one that shifts "place" to be a positive value and thus eliminate the conditional. I am not sure what "place" is, for me the most negative value is -147, which does not relate to the number of states (70), the number of vectors (86), the number of entries (50). This is the access log of the "pos_set" bitset.
The problem starts at the 9th res XXX is ok, where pack_vector() considers -147 and accepts it, and if you see earlier in the log, -147 should already have been stored in pos_set, but it wasn't, since -147 + nstates is still less than zero. So my conclusion is that we indeed need to drop the conditional and do that by shifting the bitset even more to positive values so that it can encapsulate all potential bits.
|
I'm at the same place, and I have a working solution: really implementing a set of integers on top of bitsets. The problem is that the "base" (the smallest int to represent) may change, and that's where the current implementation is wrong.
I.e., we have 32915 bits of range, although we only need to go from -147 to 29. I'm looking a being less eager right now. |
Balazs, I will not have enough time to finish this today. I shall address this tomorrow in the morning. Thanks a lot for having found the bad commit, this was an immense help! Cheers! |
Thanks a lot for your efforts, really appreciate them.
I probably have my signature on file with the FSF, but I am willing to do
any paperwork necessary.
I problem with submitting the code directly is that it's not just my
copyright in there. I sold the company that originally wrote syslog-ng,
thus some of the copyright is already at its new owner. And I am not in any
decisive capacity.
Maybe if I would remove the code sections, comments and change naming and
only retain the structure of the grammar that would suffice to reproduce
the problem and would get rid off any copyrightable material.
What do you think?
…On Sat, Jan 23, 2021, 18:44 Akim Demaille ***@***.***> wrote:
Balazs, I will not have enough time to finish this today. I shall address
this tomorrow in the morning. Thanks a lot for having found the bad commit,
this was an immense help!
What I really wish I had right now is a small grammar I could add to the
test suite to check that in the future. If I can't design one, would you
agree to sign disclaimers to the FSF so that I could add your grammar to
the test suite?
Cheers!
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#72 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAFOK5Q6D33EK5QZBYJCBGTS3MDHBANCNFSM4WPTCNWA>
.
|
Hi Balazs
Well, I'm sorry in the first place for having let the bug in...
I need something specific to Bison.
I can't tell. And actually, the FSF is not really concerned about that: that's really the problem of the current owner. Of course, if I were them, I would definitely understand that:
But then again, I can't tell. However I confess that the most precious test case would be some parsing that does fail without the fix. I dislike tests that just look at the generated tables: some day we might want to change the implementation of these tables. They are irrelevant per-se, what matters is that the parser behaves correctly. Unfortunately reproducing your problem is already difficult when comparing the tables (none of the test case in Bison's test suite actually change with the fix!), it seems even more delicate to have a running example. So I would really like to have not just your grammar go into the test suite, but also the input that triggers the bad behavior. We still don't care about the genuine actions, they are still irrelevant. But I need something to exercise the behavior rather than the details of the implementation. |
In some rare conditions, the generated parser can be wrong when there are useless tokens. Reported by Balázs Scheidler. #72 Balázs managed to prove that the bug was introduced in commit af1c6f9 Author: Theophile Ranquet <ranquet@lrde.epita.fr> Date: Tue Nov 13 10:38:49 2012 +0000 tables: use bitsets for a performance boost Suggested by Yuri at <http://lists.gnu.org/archive/html/bison-patches/2012-01/msg00000.html>. The improvement is marginal for most grammars, but notable for large grammars (e.g., PosgreSQL's postgre.y), and very large for the sample.y grammar submitted by Yuri in http://lists.gnu.org/archive/html/bison-patches/2012-01/msg00012.html. Measured with --trace=time -fsyntax-only. parser action tables postgre.y sample.y Before 0,129 (44%) 37,095 (99%) After 0,117 (42%) 5,046 (93%) * src/tables.c (pos): Replace this set of integer coded as an unsorted array of integers with... (pos_set): this bitset. which was implemented long ago, but that I installed only recently (March 2019), first published in v3.3.90. That patch introduces a bitset to represent a set of integers. It managed negative integers by using a (fixed) base (the smallest integer to represent). It avoided negative accesses into the bitset by ignoring integers smaller than the base, under the asumption that these cases correspond to useless tokens that are ignored anyway. While it turns out to be true for all the test cases in the test suite (!), Balázs' use case demonstrates that it is not always the case. So we need to be able to accept negative integers that are smaller than the current base. "Amusingly" enough, the aforementioned patch was visibly unsure about itself: /* Store PLACE into POS_SET. PLACE might not belong to the set of possible values for instance with useless tokens. It would be more satisfying to eliminate the need for this 'if'. */ This commit needs several improvements in the future: - support from bitset for bit assignment and shifts - amortized resizing of pos_set - test cases * src/tables.c (pos_set_base, pos_set_dump, pos_set_set, pos_set_test): New. Use them instead of using bitset_set and bitset_test directly.
In some rare conditions, the generated parser can be wrong when there are useless tokens. Reported by Balázs Scheidler. #72 Balázs managed to prove that the bug was introduced in commit af1c6f9 Author: Theophile Ranquet <ranquet@lrde.epita.fr> Date: Tue Nov 13 10:38:49 2012 +0000 tables: use bitsets for a performance boost Suggested by Yuri at <http://lists.gnu.org/archive/html/bison-patches/2012-01/msg00000.html>. The improvement is marginal for most grammars, but notable for large grammars (e.g., PosgreSQL's postgre.y), and very large for the sample.y grammar submitted by Yuri in http://lists.gnu.org/archive/html/bison-patches/2012-01/msg00012.html. Measured with --trace=time -fsyntax-only. parser action tables postgre.y sample.y Before 0,129 (44%) 37,095 (99%) After 0,117 (42%) 5,046 (93%) * src/tables.c (pos): Replace this set of integer coded as an unsorted array of integers with... (pos_set): this bitset. which was implemented long ago, but that I installed only recently (March 2019), first published in v3.3.90. That patch introduces a bitset to represent a set of integers. It managed negative integers by using a (fixed) base (the smallest integer to represent). It avoided negative accesses into the bitset by ignoring integers smaller than the base, under the asumption that these cases correspond to useless tokens that are ignored anyway. While it turns out to be true for all the test cases in the test suite (!), Balázs' use case demonstrates that it is not always the case. So we need to be able to accept negative integers that are smaller than the current base. "Amusingly" enough, the aforementioned patch was visibly unsure about itself: /* Store PLACE into POS_SET. PLACE might not belong to the set of possible values for instance with useless tokens. It would be more satisfying to eliminate the need for this 'if'. */ This commit needs several improvements in the future: - support from bitset for bit assignment and shifts - amortized resizing of pos_set - test cases * src/tables.c (pos_set_base, pos_set_dump, pos_set_set, pos_set_test): New. Use them instead of using bitset_set and bitset_test directly.
Released in 3.7.5. |
Due to akimd/bison#72 we are going to need at least bison 3.7.5, make that explicit in the configure scripts and the grammar. Signed-off-by: Balazs Scheidler <bazsi77@gmail.com>
Due to akimd/bison#72 and issue 74 we are going to need at least bison 3.7.6, make that explicit in the configure scripts and the grammar. Signed-off-by: Balazs Scheidler <bazsi77@gmail.com>
bison minimum version had to be updated to version 3.7.6 (see reasons in [1], [2]) bison installation from source tarball was introduced in [3] We decided that platforms except the `syslog-ng-tarball` image must build syslog-ng from tarball. `syslog-ng-kira` image should be added as another exception, so Kira-syslog-ng can build syslog-ng from source as well. [1] akimd/bison#74 [2] akimd/bison#72 [3] syslog-ng#3588 Signed-off-by: Gabor Nagy <gabor.nagy@oneidentity.com>
Due to akimd/bison#72 and issue 74 we are going to need at least bison 3.7.6, make that explicit in the configure scripts and the grammar. Signed-off-by: Balazs Scheidler <bazsi77@gmail.com>
bison minimum version had to be updated to version 3.7.6 (see reasons in [1], [2]) bison installation from source tarball was introduced in [3] We decided that platforms except the `syslog-ng-tarball` image must build syslog-ng from tarball. `syslog-ng-kira` image should be added as another exception, so Kira-syslog-ng can build syslog-ng from source as well. [1] akimd/bison#74 [2] akimd/bison#72 [3] syslog-ng#3588 Signed-off-by: Gabor Nagy <gabor.nagy@oneidentity.com>
I am pretty positive that af1c6f9 (using bitsets instead of a linear lookup table) introduced a subtle regression that is difficult to trigger, but I have ran into. I have confirmed that reverting this patch "fixes" the problem for me on top of 3.7.4.
Description of the problem
Due to the way we use bison generated grammars in syslog-ng, we have a lot of unused terminals and non-terminals in our grammar (and we suppress the generated warning).
I wanted to introduce a new %token and while the very same grammar (without the %token) worked (I was using 3.7.4), simply by adding the new token (but no rules) caused the parser to behave incorrectly.
This is the only change on the grammar that triggers the incorrect behavior:
The grammar is pretty long and complex (because of the way we generate it from its source files), so I am not attaching here, but I can do that too if needed, let me however provide the information how I found the buggy patch.
What happens is that with the only change above, bison reduces using the wrong rule. Here's the debug output of the parser:
Good
As you can see, the entire input is properly consumed, reductions happen via rules:
The very same in the bad case:
In this case the input file and the grammar is the same, bison does not have the revert, e.g. it is vanilla 3.7.4:
In this case, we have two leftover tokens and reductions happen via:
Comparing bison outputs
The report file is the same in both cases (-r all), there's a difference between the generated .c file though:
I suspect there's an off-by-one error somewhere. I am yet to diagnose the patch itself, fortunately that single change is what triggers the bug and it can easily be reverted on top of the latest release.
Let me know what else you would need.
The text was updated successfully, but these errors were encountered: