Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bison regression since 3.3.90 up to and including 3.7.4 #72

Closed
bazsi opened this issue Jan 23, 2021 · 13 comments
Closed

bison regression since 3.3.90 up to and including 3.7.4 #72

bazsi opened this issue Jan 23, 2021 · 13 comments

Comments

@bazsi
Copy link

bazsi commented Jan 23, 2021

I am pretty positive that af1c6f9 (using bitsets instead of a linear lookup table) introduced a subtle regression that is difficult to trigger, but I have ran into. I have confirmed that reverting this patch "fixes" the problem for me on top of 3.7.4.

Description of the problem

Due to the way we use bison generated grammars in syslog-ng, we have a lot of unused terminals and non-terminals in our grammar (and we suppress the generated warning).

I wanted to introduce a new %token and while the very same grammar (without the %token) worked (I was using 3.7.4), simply by adding the new token (but no rules) caused the parser to behave incorrectly.

This is the only change on the grammar that triggers the incorrect behavior:

diff --git a/modules/csvparser/csvparser-grammar.ym b/modules/csvparser/csvparser-grammar.ym
index 27177db05..93304e01e 100644
--- a/modules/csvparser/csvparser-grammar.ym
+++ b/modules/csvparser/csvparser-grammar.ym
@@ -60,6 +60,7 @@
 %token KW_NULL
 %token KW_CHARS
 %token KW_STRINGS
+%token KW_DROP_INVALID
 
 %type  <ptr> parser_expr_csv
 %type   <num> parser_csv_flags

The grammar is pretty long and complex (because of the way we generate it from its source files), so I am not attaching here, but I can do that too if needed, let me however provide the information how I found the buggy patch.

What happens is that with the only change above, bison reduces using the wrong rule. Here's the debug output of the parser:

Good

Next token is token ')' (5.47: )
Reducing stack by rule 29 (line 841):
-> $$ = nterm string_list_build (5.47: )
Entering state 57
Stack now 0 1 3 6 7 12 24 40 40 40 57
Reducing stack by rule 28 (line 840):
   $1 = nterm string (5.42-46: )
   $2 = nterm string_list_build (5.47: )
-> $$ = nterm string_list_build (5.42-46: )
Entering state 57
Stack now 0 1 3 6 7 12 24 40 40 57
Reducing stack by rule 28 (line 840):
   $1 = nterm string (5.36-40: )
   $2 = nterm string_list_build (5.42-46: )
-> $$ = nterm string_list_build (5.36-46: )
Entering state 57
Stack now 0 1 3 6 7 12 24 40 57
Reducing stack by rule 28 (line 840):
   $1 = nterm string (5.30-34: )
   $2 = nterm string_list_build (5.36-46: )
-> $$ = nterm string_list_build (5.30-46: )
Entering state 42
Stack now 0 1 3 6 7 12 24 42
Reducing stack by rule 27 (line 836):
   $1 = nterm string_list_build (5.30-46: )
-> $$ = nterm string_list (5.30-46: )
Entering state 41
Stack now 0 1 3 6 7 12 24 41
Next token is token ')' (5.47: )
Shifting token ')' (5.47: )
Entering state 58
Stack now 0 1 3 6 7 12 24 41 58
Reducing stack by rule 13 (line 437):
   $1 = token KW_COLUMNS (5.22-28: )
   $2 = token '(' (5.29: )
   $3 = nterm string_list (5.30-46: )
   $4 = token ')' (5.47: )
-> $$ = nterm parser_csv_opt (5.22-47: )
Entering state 18
Stack now 0 1 3 6 7 18
Reading a token
Next token is token ')' (5.48: )
Reducing stack by rule 5 (line 424):
-> $$ = nterm parser_csv_opts (5.48: )
Entering state 30
Stack now 0 1 3 6 7 18 30
Reducing stack by rule 4 (line 423):
   $1 = nterm parser_csv_opt (5.22-47: )
   $2 = nterm parser_csv_opts (5.48: )
-> $$ = nterm parser_csv_opts (5.22-47: )
Entering state 17
Stack now 0 1 3 6 7 17
Next token is token ')' (5.48: )
Shifting token ')' (5.48: )
Entering state 29
Stack now 0 1 3 6 7 17 29
Reducing stack by rule 3 (line 413):
   $1 = token KW_CSV_PARSER (5.11-20: )
   $2 = token '(' (5.21: )
   $3 = nterm $@1 (5.22: )
   $4 = nterm parser_csv_opts (5.22-47: )
   $5 = token ')' (5.48: )
-> $$ = nterm parser_expr_csv (5.11-48: )
Entering state 4
Stack now 0 1 4
Reducing stack by rule 1 (line 408):
   $1 = token LL_CONTEXT_PARSER (5.11-20: )
   $2 = nterm parser_expr_csv (5.11-48: )
Stack now 0

As you can see, the entire input is properly consumed, reductions happen via rules:

  • Reducing stack by rule 29 (line 841):
  • Reducing stack by rule 28 (line 840):
  • Reducing stack by rule 28 (line 840):
  • Reducing stack by rule 28 (line 840):
  • Reducing stack by rule 27 (line 836):
  • Reducing stack by rule 13 (line 437):
  • Reducing stack by rule 5 (line 424):
  • Reducing stack by rule 4 (line 423):
  • Reducing stack by rule 3 (line 413):
  • Reducing stack by rule 1 (line 408):

The very same in the bad case:

In this case the input file and the grammar is the same, bison does not have the revert, e.g. it is vanilla 3.7.4:

Next token is token ')' (5.47: )
Shifting token ')' (5.47: )
Entering state 61
Stack now 0 1 3 6 7 12 24 40 40 40 61
Reducing stack by rule 9 (line 433):
   $1 = nterm string (5.30-34: )
   $2 = nterm string (5.36-40: )
   $3 = nterm string (5.42-46: )
   $4 = token ')' (5.47: )
-> $$ = nterm parser_csv_opt (5.30-47: )
Entering state 18
Stack now 0 1 3 6 7 12 24 18
Reading a token
Next token is token ')' (5.48: )
Reducing stack by rule 5 (line 424):
-> $$ = nterm parser_csv_opts (5.48: )
Entering state 30
Stack now 0 1 3 6 7 12 24 18 30
Reducing stack by rule 4 (line 423):
   $1 = nterm parser_csv_opt (5.30-47: )
   $2 = nterm parser_csv_opts (5.48: )
-> $$ = nterm parser_csv_opts (5.30-47: )
Entering state 17
Stack now 0 1 3 6 7 12 24 17
Next token is token ')' (5.48: )
Shifting token ')' (5.48: )
Entering state 29
Stack now 0 1 3 6 7 12 24 17 29
Reducing stack by rule 3 (line 413):
   $1 = nterm $@1 (5.22: )
   $2 = token KW_COLUMNS (5.22-28: )
   $3 = token '(' (5.29: )
   $4 = nterm parser_csv_opts (5.30-47: )
   $5 = token ')' (5.48: )
-> $$ = nterm parser_expr_csv (5.22-48: )
Entering state 4
Stack now 0 1 3 6 4
Reducing stack by rule 1 (line 408):
   $1 = token '(' (5.21: )
   $2 = nterm parser_expr_csv (5.22-48: )
Stack now 0 1 3
Cleanup: popping token KW_CSV_PARSER (5.11-20: )
Cleanup: popping token LL_CONTEXT_PARSER (5.11-20: )

In this case, we have two leftover tokens and reductions happen via:

  • Reducing stack by rule 9 (line 433):
  • Reducing stack by rule 5 (line 424):
  • Reducing stack by rule 4 (line 423):
  • Reducing stack by rule 3 (line 413):
  • Reducing stack by rule 1 (line 408):

Comparing bison outputs

The report file is the same in both cases (-r all), there's a difference between the generated .c file though:

$ diff -u rossz.c jo.c
--- rossz.c	2021-01-23 10:29:45.522057550 +0000
+++ jo.c	2021-01-23 10:30:38.365153239 +0000
@@ -2235,7 +2235,7 @@
 };
 #endif
 
-#define YYPACT_NINF (-148)
+#define YYPACT_NINF (-149)
 
 #define yypact_value_is_default(Yyn) \
   ((Yyn) == YYPACT_NINF)
@@ -2249,13 +2249,13 @@
      STATE-NUM.  */
 static const yytype_int16 yypact[] =
 {
-      -4,  -142,    16,  -147,  -148,  -148,  -148,   -87,  -146,  -143,
-    -142,  -141,  -140,  -139,  -138,  -137,  -136,  -136,   -87,  -148,
-    -124,  -124,  -124,  -124,  -124,  -125,  -124,  -124,  -124,  -148,
-    -148,  -148,  -148,  -135,  -148,  -124,  -134,  -133,  -148,  -132,
-    -124,  -131,  -148,  -129,  -126,  -124,  -148,  -147,  -148,  -123,
-    -122,  -121,  -148,  -148,  -148,  -148,  -148,  -148,  -148,  -124,
-    -124,  -148,  -148,  -148,  -148,  -148,  -120,  -119,  -148,  -148
+      -4,  -142,    16,  -146,  -149,  -149,  -149,   -87,  -143,  -141,
+    -140,  -139,  -138,  -137,  -136,  -135,  -134,  -148,   -87,  -149,
+    -124,  -124,  -124,  -124,  -124,  -125,  -124,  -124,  -124,  -149,
+    -149,  -149,  -149,  -133,  -149,  -124,  -132,  -131,  -149,  -130,
+    -124,  -127,  -149,  -123,  -122,  -121,  -149,  -147,  -149,  -120,
+    -119,  -118,  -149,  -149,  -149,  -149,  -149,  -149,  -149,  -124,
+    -124,  -149,  -149,  -149,  -149,  -149,  -117,  -116,  -149,  -149
 };
 
   /* YYDEFACT[STATE-NUM] -- Default reduction number in state STATE-NUM.
@@ -2275,8 +2275,8 @@
   /* YYPGOTO[NTERM-NUM].  */
 static const yytype_int16 yypgoto[] =
 {
-    -148,  -148,  -148,  -148,    29,  -148,  -148,     1,  -148,    14,
-    -148,   -20,    28,    -9,    12,  -148
+    -149,  -149,  -149,  -149,     4,  -149,  -149,   -16,  -149,     8,
+    -149,   -20,    28,    -9,    12,  -149
 };
 
   /* YYDEFGOTO[NTERM-NUM].  */
@@ -2292,10 +2292,10 @@
 static const yytype_int8 yytable[] =
 {
        8,    36,     1,    39,    40,    48,    49,    50,    51,    31,
-      31,     3,    32,    32,    43,    44,     5,     6,    20,     9,
-      40,    21,    22,    23,    24,    25,    26,    27,    28,    29,
-      52,    54,    55,    56,    58,    59,    43,    44,    60,    66,
-      40,    61,    63,    64,    65,    68,    69,    30,    62,    53,
+      31,     3,    32,    32,    43,    44,     5,    29,     6,     9,
+      40,    20,    30,    21,    22,    23,    24,    25,    26,    27,
+      28,    62,    52,    54,    55,    56,    43,    44,    58,    66,
+      40,    59,    60,    53,    61,    63,    64,    65,    68,    69,
       38,    67,    57,     0,     0,     0,     0,     0,     0,     0,
        0,     0,     0,     0,     0,     0,     0,    10,    11,    12,
       13,    14,    15,    16
@@ -2304,10 +2304,10 @@
 static const yytype_int16 yycheck[] =
 {
       87,    21,     6,    23,    24,    25,    26,    27,    28,   134,
-     134,   153,   137,   137,   161,   162,     0,   164,   164,   106,
-      40,   164,   164,   164,   164,   164,   164,   164,   164,   165,
-     165,   165,   165,   165,   165,   164,   161,   162,   164,    59,
-      60,   165,   165,   165,   165,   165,   165,    18,    47,    35,
+     134,   153,   137,   137,   161,   162,     0,   165,   164,   106,
+      40,   164,    18,   164,   164,   164,   164,   164,   164,   164,
+     164,    47,   165,   165,   165,   165,   161,   162,   165,    59,
+      60,   164,   164,    35,   165,   165,   165,   165,   165,   165,
       22,    60,    40,    -1,    -1,    -1,    -1,    -1,    -1,    -1,
       -1,    -1,    -1,    -1,    -1,    -1,    -1,   154,   155,   156,
      157,   158,   159,   160

I suspect there's an off-by-one error somewhere. I am yet to diagnose the patch itself, fortunately that single change is what triggers the bug and it can easily be reverted on top of the latest release.

Let me know what else you would need.

@akimd
Copy link
Owner

akimd commented Jan 23, 2021

Wow. This is really bad.

I was in the process of releasing 3.7.5, but I might have to delay this until I could take some time to study your problem more closely. Thanks a lot for the careful analysis!

I would very much like to have the grammar that triggered the problem, please.

Cheers!

@bazsi
Copy link
Author

bazsi commented Jan 23, 2021

@bazsi
Copy link
Author

bazsi commented Jan 23, 2021

btw: I went back to an earlier version of syslog-ng (3.29) that doesn't yet have the 3.4.x+ updates, so this will spew a lot of warnings with the latest master. I did this to be able to bisect from 3.0.5 (which worked) up to 3.7.4.

So don't be alarmed by the amount of warnings this will generate, we are using these options:
-Wno-yacc -Wno-other -Werror=conflicts-sr -Werror=conflicts-rr

@bazsi
Copy link
Author

bazsi commented Jan 23, 2021

I was looking at the patch, and although I don't really understand the internals of bison, but one thing is a bit strange. I am using hunks of the patch revert, so "-" is in fact the wrong version and "+" is the correct (but slow) one.

This is the change that converted the use of an array to the use of a bitset. The single location that flips bits in our bitset:

@@ -750,12 +746,7 @@ pack_table (void)
         /* Action of I were already coded for S.  */
         place = base[s];
 
-      /* Store PLACE into POS_SET.  PLACE might not belong to the set
-         of possible values for instance with useless tokens.  It
-         would be more satisfying to eliminate the need for this
-         'if'.  */
-      if (0 <= nstates + place)
-        bitset_set (pos_set, nstates + place);
+      pos[i] = place;
       base[order[i]] = place;
     }
 

pos[i] gets assigned "place", which is an index of some kind (e.g not just 0 or 1). Then the read side of the bitset:

@@ -665,8 +659,10 @@ pack_vector (vector_number vector)
               ok = false;
           }
 
-        if (ok && bitset_test (pos_set, nstates + res))
-          ok = false;
+        if (ok)
+          for (int k = 0; k < vector; k++)
+            if (pos[k] == res)
+              ok = false;
       }
 
       if (ok)

If you look at the conditional that turns ok to false, we check if pos[k] is EQUAL to "res", not just non-zero, but is equal to the loop variable in pack_vector(). This means that whereas earlier we only turned ok to false when pos[k] was equal to the current loop variable, now we do that every time even if that was set in an earlier iteration.

I might be completely at loss here, but the conversion does not seem to be equivalent.

@bazsi
Copy link
Author

bazsi commented Jan 23, 2021

scrap my last comment. sorry for the noise.

@akimd
Copy link
Owner

akimd commented Jan 23, 2021

I don't really understand the internals of bison

The problem is... I don't know that part very well either....

If you look at the conditional that turns ok to false, we check if pos[k] is EQUAL to "res", not just non-zero, but is equal to the loop variable in pack_vector().

I don't see that. In the + version above, we iterate over the whole pos (an array of ints) to see if res is in it. Its position in pos is irrelevant.

@akimd
Copy link
Owner

akimd commented Jan 23, 2021

I think that the part:

      /* Store PLACE into POS_SET.  PLACE might not belong to the set
         of possible values for instance with useless tokens.  It
         would be more satisfying to eliminate the need for this
         'if'.  */
      if (0 <= nstates + place)
        bitset_set (pos_set, nstates + place);

is worth worrying (I mean, the if: it should be unconditional). And it would also explain why this went unnoticed for so long but hit you: it would be related to useless tokens, that most people don't have.

@bazsi
Copy link
Author

bazsi commented Jan 23, 2021

I think I have the problem, I just can't come up with a good solution yet.

I have added some logging into these places, and I've found differences between the "good" and the "bad" versions and it indeed relates to the condition before bitset_set().

It seems that a bitset_set() is not invoked on a "place" value of -147, because "nstates + place" is indeed less than zero (nstates being 70) When executing bison in the "good" version, -147 is matched in the original "pos" array and thus skipped in pack_vector().

So the problem seems to be that we indeed need to eliminate that conditional by shifting the bitset even more. So the base "nstates" is not the right base that the code uses right now. We need to find the one that shifts "place" to be a positive value and thus eliminate the conditional.

I am not sure what "place" is, for me the most negative value is -147, which does not relate to the number of states (70), the number of vectors (86), the number of entries (50).

This is the access log of the "pos_set" bitset.

  • Whenever you see "res XXX is ok", that means that we haven't found a res value.
  • When you see "not storing: place=XXXX" that means that we don't store a value in the bitset because of the conditional.

The problem starts at the 9th res XXX is ok, where pack_vector() considers -147 and accepts it, and if you see earlier in the log, -147 should already have been stored in pos_set, but it wasn't, since -147 + nstates is still less than zero.

So my conclusion is that we indeed need to drop the conditional and do that by shifting the bitset even more to positive values so that it can encapsulate all potential bits.

res -87 is ok
not storing: place=-87
not storing: place=-87
res -20 is ok
res -125 is ok
not storing: place=-125
res -124 is ok
not storing: place=-124
not storing: place=-124
not storing: place=-124
not storing: place=-124
not storing: place=-124
not storing: place=-124
not storing: place=-124
not storing: place=-124
not storing: place=-124
not storing: place=-124
not storing: place=-124
not storing: place=-124
res -147 is ok
not storing: place=-147   <---- THIS IS WHERE WE DON'T STORE -147
res -4 is ok
res -142 is ok
not storing: place=-142
res 16 is ok
res -147 is ok                  <----- THIS IS WHERE THE BUG IS TRIGGERED -147 is already encountered at this point, but wasn't stored in pos_set.
not storing: place=-147
res -146 is ok
not storing: place=-146
res -143 is ok
not storing: place=-143
res -142 is ok
not storing: place=-142
res -141 is ok
not storing: place=-141
res -140 is ok
not storing: place=-140
res -139 is ok
not storing: place=-139
res -138 is ok
not storing: place=-138
res -137 is ok
not storing: place=-137
res -136 is ok
not storing: place=-136
res -136 is ok
not storing: place=-136
res -135 is ok
not storing: place=-135
res -134 is ok
not storing: place=-134
res -133 is ok
not storing: place=-133
res -132 is ok
not storing: place=-132
res -131 is ok
not storing: place=-131
res -129 is ok
not storing: place=-129
res -126 is ok
not storing: place=-126
res -124 is ok
not storing: place=-124
res -123 is ok
not storing: place=-123
res -122 is ok
not storing: place=-122
res -121 is ok
not storing: place=-121
res -120 is ok
not storing: place=-120
res -119 is ok
not storing: place=-119
res 29 is ok
res 1 is ok
res 14 is ok
res 28 is ok
res -9 is ok
res 12 is ok
res -87 is ok
not storing: place=-87
not storing: place=-87
res -20 is ok
res -125 is ok
not storing: place=-125
res -124 is ok
not storing: place=-124
not storing: place=-124
not storing: place=-124
not storing: place=-124
not storing: place=-124
not storing: place=-124
not storing: place=-124
not storing: place=-124
not storing: place=-124
not storing: place=-124
not storing: place=-124
not storing: place=-124
res -147 is ok
not storing: place=-147
res -4 is ok
res -142 is ok
not storing: place=-142
res 16 is ok
res -147 is ok
not storing: place=-147
res -146 is ok
not storing: place=-146
res -143 is ok
not storing: place=-143
res -142 is ok
not storing: place=-142
res -141 is ok
not storing: place=-141
res -140 is ok
not storing: place=-140
res -139 is ok
not storing: place=-139
res -138 is ok
not storing: place=-138
res -137 is ok
not storing: place=-137
res -136 is ok
not storing: place=-136
res -136 is ok
not storing: place=-136
res -135 is ok
not storing: place=-135
res -134 is ok
not storing: place=-134
res -133 is ok
not storing: place=-133
res -132 is ok
not storing: place=-132
res -131 is ok
not storing: place=-131
res -129 is ok
not storing: place=-129
res -126 is ok
not storing: place=-126
res -124 is ok
not storing: place=-124
res -123 is ok
not storing: place=-123
res -122 is ok
not storing: place=-122
res -121 is ok
not storing: place=-121
res -120 is ok
not storing: place=-120
res -119 is ok
not storing: place=-119
res 29 is ok
res 1 is ok
res 14 is ok
res 28 is ok
res -9 is ok
res 12 is ok

@akimd
Copy link
Owner

akimd commented Jan 23, 2021

I'm at the same place, and I have a working solution: really implementing a set of integers on top of bitsets. The problem is that the "base" (the smallest int to represent) may change, and that's where the current implementation is wrong.
I've also discovered that we are using way too much bits. On your grammar I have:

pos_set (32915, -147) = -147 -146 -143 -142 -141 -140 -139 -138 -137 -136 -135 -134 -133 -132 -131 -129 -126 -124 -123 -122 -121 -120 -119 -58 -20 -9 -4 1 12 14 16 28 29

I.e., we have 32915 bits of range, although we only need to go from -147 to 29. I'm looking a being less eager right now.

@akimd
Copy link
Owner

akimd commented Jan 23, 2021

Balazs, I will not have enough time to finish this today. I shall address this tomorrow in the morning. Thanks a lot for having found the bad commit, this was an immense help!
What I really wish I had right now is a small grammar I could add to the test suite to check that in the future. If I can't design one, would you agree to sign disclaimers to the FSF so that I could add your grammar to the test suite?

Cheers!

@bazsi
Copy link
Author

bazsi commented Jan 23, 2021 via email

@akimd
Copy link
Owner

akimd commented Jan 24, 2021

Hi Balazs

Thanks a lot for your efforts, really appreciate them.

Well, I'm sorry in the first place for having let the bug in...

I probably have my signature on file with the FSF,

I need something specific to Bison.

but I am willing to do any paperwork necessary. I problem with submitting the code directly is that it's not just my copyright in there. I sold the company that originally wrote syslog-ng, thus some of the copyright is already at its new owner. And I am not in any decisive capacity. Maybe if I would remove the code sections, comments and change naming and only retain the structure of the grammar that would suffice to reproduce the problem and would get rid off any copyrightable material. What do you think?

I can't tell. And actually, the FSF is not really concerned about that: that's really the problem of the current owner. Of course, if I were them, I would definitely understand that:

  • the code is GPL anyway, so it does not make any difference for them
  • none of the genuine content of the code will be used, we just need the parser (not the actions)
  • this is to ensure that a dependency of their tool (Bison) does not wreck the tool

But then again, I can't tell.

However I confess that the most precious test case would be some parsing that does fail without the fix. I dislike tests that just look at the generated tables: some day we might want to change the implementation of these tables. They are irrelevant per-se, what matters is that the parser behaves correctly. Unfortunately reproducing your problem is already difficult when comparing the tables (none of the test case in Bison's test suite actually change with the fix!), it seems even more delicate to have a running example.

So I would really like to have not just your grammar go into the test suite, but also the input that triggers the bad behavior. We still don't care about the genuine actions, they are still irrelevant. But I need something to exercise the behavior rather than the details of the implementation.

akimd added a commit that referenced this issue Jan 24, 2021
In some rare conditions, the generated parser can be wrong when there
are useless tokens.

Reported by Balázs Scheidler.
#72

Balázs managed to prove that the bug was introduced in

    commit af1c6f9
    Author: Theophile Ranquet <ranquet@lrde.epita.fr>
    Date:   Tue Nov 13 10:38:49 2012 +0000

    tables: use bitsets for a performance boost

    Suggested by Yuri at
    <http://lists.gnu.org/archive/html/bison-patches/2012-01/msg00000.html>.

    The improvement is marginal for most grammars, but notable for large
    grammars (e.g., PosgreSQL's postgre.y), and very large for the
    sample.y grammar submitted by Yuri in
    http://lists.gnu.org/archive/html/bison-patches/2012-01/msg00012.html.
    Measured with --trace=time -fsyntax-only.

    parser action tables    postgre.y     sample.y
    Before                 0,129 (44%)  37,095 (99%)
    After                  0,117 (42%)   5,046 (93%)

    * src/tables.c (pos): Replace this set of integer coded as an unsorted
    array of integers with...
    (pos_set): this bitset.

which was implemented long ago, but that I installed only recently
(March 2019), first published in v3.3.90.

That patch introduces a bitset to represent a set of integers.  It
managed negative integers by using a (fixed) base (the smallest
integer to represent).  It avoided negative accesses into the bitset
by ignoring integers smaller than the base, under the asumption that
these cases correspond to useless tokens that are ignored anyway.
While it turns out to be true for all the test cases in the test suite
(!), Balázs' use case demonstrates that it is not always the case.

So we need to be able to accept negative integers that are smaller
than the current base.

"Amusingly" enough, the aforementioned patch was visibly unsure about
itself:

    /* Store PLACE into POS_SET.  PLACE might not belong to the set
       of possible values for instance with useless tokens.  It
       would be more satisfying to eliminate the need for this
       'if'.  */

This commit needs several improvements in the future:
- support from bitset for bit assignment and shifts
- amortized resizing of pos_set
- test cases

* src/tables.c (pos_set_base, pos_set_dump, pos_set_set, pos_set_test):
New.
Use them instead of using bitset_set and bitset_test directly.
akimd added a commit that referenced this issue Jan 24, 2021
In some rare conditions, the generated parser can be wrong when there
are useless tokens.

Reported by Balázs Scheidler.
#72

Balázs managed to prove that the bug was introduced in

    commit af1c6f9
    Author: Theophile Ranquet <ranquet@lrde.epita.fr>
    Date:   Tue Nov 13 10:38:49 2012 +0000

    tables: use bitsets for a performance boost

    Suggested by Yuri at
    <http://lists.gnu.org/archive/html/bison-patches/2012-01/msg00000.html>.

    The improvement is marginal for most grammars, but notable for large
    grammars (e.g., PosgreSQL's postgre.y), and very large for the
    sample.y grammar submitted by Yuri in
    http://lists.gnu.org/archive/html/bison-patches/2012-01/msg00012.html.
    Measured with --trace=time -fsyntax-only.

    parser action tables    postgre.y     sample.y
    Before                 0,129 (44%)  37,095 (99%)
    After                  0,117 (42%)   5,046 (93%)

    * src/tables.c (pos): Replace this set of integer coded as an unsorted
    array of integers with...
    (pos_set): this bitset.

which was implemented long ago, but that I installed only recently
(March 2019), first published in v3.3.90.

That patch introduces a bitset to represent a set of integers.  It
managed negative integers by using a (fixed) base (the smallest
integer to represent).  It avoided negative accesses into the bitset
by ignoring integers smaller than the base, under the asumption that
these cases correspond to useless tokens that are ignored anyway.
While it turns out to be true for all the test cases in the test suite
(!), Balázs' use case demonstrates that it is not always the case.

So we need to be able to accept negative integers that are smaller
than the current base.

"Amusingly" enough, the aforementioned patch was visibly unsure about
itself:

    /* Store PLACE into POS_SET.  PLACE might not belong to the set
       of possible values for instance with useless tokens.  It
       would be more satisfying to eliminate the need for this
       'if'.  */

This commit needs several improvements in the future:
- support from bitset for bit assignment and shifts
- amortized resizing of pos_set
- test cases

* src/tables.c (pos_set_base, pos_set_dump, pos_set_set, pos_set_test):
New.
Use them instead of using bitset_set and bitset_test directly.
@akimd
Copy link
Owner

akimd commented Jan 24, 2021

Released in 3.7.5.

@akimd akimd closed this as completed Jan 24, 2021
@bazsi bazsi mentioned this issue Feb 11, 2021
MrAnno pushed a commit to MrAnno/syslog-ng that referenced this issue Feb 19, 2021
Due to akimd/bison#72 we are going to need at
least bison 3.7.5, make that explicit in the configure scripts and
the grammar.

Signed-off-by: Balazs Scheidler <bazsi77@gmail.com>
bazsi added a commit to bazsi/syslog-ng that referenced this issue Apr 22, 2021
Due to akimd/bison#72 and issue 74 we are
going to need at least bison 3.7.6, make that explicit in the
configure scripts and the grammar.

Signed-off-by: Balazs Scheidler <bazsi77@gmail.com>
gaborznagy pushed a commit to gaborznagy/syslog-ng that referenced this issue Apr 23, 2021
bison minimum version had to be updated to version 3.7.6 (see reasons in [1], [2])
bison installation from source tarball was introduced in [3]

We decided that platforms except the `syslog-ng-tarball` image must build syslog-ng from tarball.
`syslog-ng-kira` image should be added as another exception, so Kira-syslog-ng can build syslog-ng from source as well.

[1] akimd/bison#74
[2] akimd/bison#72
[3] syslog-ng#3588

Signed-off-by: Gabor Nagy <gabor.nagy@oneidentity.com>
bazsi added a commit to bazsi/syslog-ng that referenced this issue Apr 29, 2021
Due to akimd/bison#72 and issue 74 we are
going to need at least bison 3.7.6, make that explicit in the
configure scripts and the grammar.

Signed-off-by: Balazs Scheidler <bazsi77@gmail.com>
bazsi pushed a commit to bazsi/syslog-ng that referenced this issue Apr 29, 2021
bison minimum version had to be updated to version 3.7.6 (see reasons in [1], [2])
bison installation from source tarball was introduced in [3]

We decided that platforms except the `syslog-ng-tarball` image must build syslog-ng from tarball.
`syslog-ng-kira` image should be added as another exception, so Kira-syslog-ng can build syslog-ng from source as well.

[1] akimd/bison#74
[2] akimd/bison#72
[3] syslog-ng#3588

Signed-off-by: Gabor Nagy <gabor.nagy@oneidentity.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants