yysyntax_error: fix for consistent error with lookahead.
* NEWS (2.5): Document.
* data/yacc.c (yysyntax_error): In a verbose syntax error
message while in a consistent state with a default action (which
must be an error action given that yysyntax_error is being
invoked), continue to drop the expected token list, but don't
drop the unexpected token unless there actually is no lookahead.
Moreover, handle that internally instead of returning 1 to tell
the caller to do it.  With that meaning of 1 gone, renumber
return codes more usefully.
(yyparse, yypush_parse): Update yysyntax_error usage.  Most
importantly, set yytoken to YYEMPTY when there's no lookahead.
* data/glr.c (yyreportSyntaxError): As in yacc.c, don't drop the
unexpected token unless there actually is no lookahead.
* data/lalr1.cc (yy::parser::parse): If there's no lookahead,
pass yyempty_ not yyla.type to yysyntax_error_.
(yy::parser::yysyntax_error_): Again, don't drop the unexpected
token unless there actually is no lookahead.
* data/lalr1.java (YYParser::parse): If there's no lookahead,
set yytoken to yyempty_ before invoking yysyntax_error.
(YYParser::yysyntax_error): Again, don't drop the unexpected
token unless there actually is no lookahead.
* tests/conflicts.at (parse.error=verbose and consistent
errors): Extend test group to further reveal how the previous
use of the simple "syntax error" message was too general.  Test
yacc.c, glr.c, lalr1.cc, and lalr1.java.  No longer an expected
failure.
* tests/java.at (AT_JAVA_COMPILE, AT_JAVA_PARSER_CHECK): Move
to...
* tests/local.at: ... here.
(_AT_BISON_OPTION_PUSHDEFS): Push AT_SKEL_JAVA_IF definition.
(AT_BISON_OPTION_POPDEFS): Pop it.
(AT_FULL_COMPILE): Extend to handle Java.
Joel E. Denny committed Nov 7, 2010
1 parent 25a648d commit d2060f0
Showing 11 changed files with 785 additions and 449 deletions.
36 changes: 36 additions & 0 deletions ChangeLog
@@ -1,3 +1,39 @@
2010-11-07 Joel E. Denny <jdenny@clemson.edu>

yysyntax_error: fix for consistent error with lookahead.
* NEWS (2.5): Document.
* data/yacc.c (yysyntax_error): In a verbose syntax error
message while in a consistent state with a default action (which
must be an error action given that yysyntax_error is being
invoked), continue to drop the expected token list, but don't
drop the unexpected token unless there actually is no lookahead.
Moreover, handle that internally instead of returning 1 to tell
the caller to do it. With that meaning of 1 gone, renumber
return codes more usefully.
(yyparse, yypush_parse): Update yysyntax_error usage. Most
importantly, set yytoken to YYEMPTY when there's no lookahead.
* data/glr.c (yyreportSyntaxError): As in yacc.c, don't drop the
unexpected token unless there actually is no lookahead.
* data/lalr1.cc (yy::parser::parse): If there's no lookahead,
pass yyempty_ not yyla.type to yysyntax_error_.
(yy::parser::yysyntax_error_): Again, don't drop the unexpected
token unless there actually is no lookahead.
* data/lalr1.java (YYParser::parse): If there's no lookahead,
set yytoken to yyempty_ before invoking yysyntax_error.
(YYParser::yysyntax_error): Again, don't drop the unexpected
token unless there actually is no lookahead.
* tests/conflicts.at (parse.error=verbose and consistent
errors): Extend test group to further reveal how the previous
use of the simple "syntax error" message was too general. Test
yacc.c, glr.c, lalr1.cc, and lalr1.java. No longer an expected
failure.
* tests/java.at (AT_JAVA_COMPILE, AT_JAVA_PARSER_CHECK): Move
to...
* tests/local.at: ... here.
(_AT_BISON_OPTION_PUSHDEFS): Push AT_SKEL_JAVA_IF definition.
(AT_BISON_OPTION_POPDEFS): Pop it.
(AT_FULL_COMPILE): Extend to handle Java.

2010-11-07 Joel E. Denny <jdenny@clemson.edu>

yysyntax_error: more preparation for readability of next patch.
43 changes: 35 additions & 8 deletions NEWS
@@ -223,14 +223,41 @@ Bison News
Bison now warns when a character literal is not of length one. In
some future release, Bison will report an error instead.

** Verbose error messages fixed for nonassociative tokens.

When %error-verbose is specified, syntax error messages produced by
the generated parser include the unexpected token as well as a list of
expected tokens. Previously, this list erroneously included tokens
that would actually induce a syntax error because conflicts for them
were resolved with %nonassoc. Such tokens are now properly omitted
from the list.
** Verbose syntax error message fixes:

When %error-verbose or `#define YYERROR_VERBOSE' is specified, syntax
error messages produced by the generated parser include the unexpected
token as well as a list of expected tokens. The effect of %nonassoc
on these verbose messages has been corrected in two ways, but
additional fixes are still being implemented:

*** When %nonassoc is used, there can exist parser states that accept no
tokens, and so the parser does not always require a lookahead token
in order to detect a syntax error. Because no unexpected token or
expected tokens can then be reported, the verbose syntax error
message described above is suppressed, and the parser instead
reports the simpler message, "syntax error". Previously, this
suppression was sometimes erroneously triggered by %nonassoc when a
lookahead was actually required. Now verbose messages are
suppressed only when all previous lookaheads have already been
shifted or discarded.

*** Previously, the list of expected tokens erroneously included tokens
that would actually induce a syntax error because conflicts for them
were resolved with %nonassoc in the current parser state. Such
tokens are now properly omitted from the list.

*** Expected token lists are still often wrong due to state merging
(from LALR or IELR) and default reductions, which can both add and
subtract valid tokens. Canonical LR almost completely fixes this
problem by eliminating state merging and default reductions.
However, there is one minor problem left even when using canonical
LR and even after the fixes above. That is, if the resolution of a
conflict with %nonassoc appears in a later parser state than the one
at which some syntax error is discovered, the conflicted token is
still erroneously included in the expected token list. We are
currently working on a fix to eliminate this problem and to
eliminate the need for canonical LR.
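
The first fix above comes down to one check on whether a lookahead is present. The following is a minimal sketch in C, not the skeleton code itself: `TOKEN_EMPTY`, `choose_format`, and its parameters are hypothetical names introduced for illustration. The message strings are the ones listed in the skeletons' `YYCASE_` tables. The fallback to the one-token message when more than four tokens are expected is an assumption based on the five-argument maximum.

```c
#include <stddef.h>

/* Hypothetical stand-in for the skeleton's "no lookahead" marker.  */
enum { TOKEN_EMPTY = -2 };

/* Pick the message format from the number of reported tokens: one for
   the unexpected token plus one per expected token.  N_EXPECTED must be
   non-negative.  Assumption: at most four expected tokens are listed;
   with more, only the unexpected token is reported.  */
static const char *
choose_format (int token, int n_expected)
{
  static const char *const formats[] = {
    "syntax error",
    "syntax error, unexpected %s",
    "syntax error, unexpected %s, expecting %s",
    "syntax error, unexpected %s, expecting %s or %s",
    "syntax error, unexpected %s, expecting %s or %s or %s",
    "syntax error, unexpected %s, expecting %s or %s or %s or %s"
  };
  int count = 0;

  /* Without a lookahead there is nothing to report, so count stays 0
     and the simple message is chosen.  */
  if (token != TOKEN_EMPTY)
    count = 1 + (n_expected <= 4 ? n_expected : 0);

  return formats[count];
}
```

In this sketch, a consistent state whose default action is an error corresponds to `n_expected == 0`: the unexpected token is still named, and only the expected list is dropped.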

** Destructor calls fixed for lookaheads altered in semantic actions.

52 changes: 36 additions & 16 deletions data/glr.c
@@ -2081,11 +2081,7 @@ yyreportSyntaxError (yyGLRStack* yystackp]b4_user_formals[)
#if ! YYERROR_VERBOSE
yyerror (]b4_lyyerror_args[YY_("syntax error"));
#else
int yyn;
yyn = yypact[yystackp->yytops.yystates[0]->yylrState];
if (YYPACT_NINF < yyn && yyn <= YYLAST)
{
yySymbol yytoken = YYTRANSLATE (yychar);
yySymbol yytoken = yychar == YYEMPTY ? YYEMPTY : YYTRANSLATE (yychar);
size_t yysize0 = yytnamerr (NULL, yytokenName (yytoken));
size_t yysize = yysize0;
size_t yysize1;
@@ -2096,23 +2092,47 @@ if (YYPACT_NINF < yyn && yyn <= YYLAST)
const char *yyformat = 0;
/* Arguments of yyformat. */
char const *yyarg[YYERROR_VERBOSE_ARGS_MAXIMUM];
/* Number of reported tokens (one for the "unexpected", one per
"expected"). */
int yycount = 0;

/* There are many possibilities here to consider:
- If this state is a consistent state with a default action, then
the only way this function was invoked is if the default action
is an error action. In that case, don't check for expected
tokens because there are none.
- The only way there can be no lookahead present (in yychar) is if
this state is a consistent state with a default action. Thus,
detecting the absence of a lookahead is sufficient to determine
that there is no unexpected or expected token to report. In that
case, just report a simple "syntax error".
- Don't assume there isn't a lookahead just because this state is a
consistent state with a default action. There might have been a
previous inconsistent state, consistent state with a non-default
action, or user semantic action that manipulated yychar.
- Of course, the expected token list depends on states to have
correct lookahead information, and it depends on the parser not
to perform extra reductions after fetching a lookahead from the
scanner and before detecting a syntax error. Thus, state merging
(from LALR or IELR) and default reductions corrupt the expected
token list. However, the list is correct for canonical LR with
one exception: it will still contain any token that will not be
accepted due to an error action in a later state.
*/
if (yytoken != YYEMPTY)
{
int yyn = yypact[yystackp->yytops.yystates[0]->yylrState];
yyarg[yycount++] = yytokenName (yytoken);
if (!yypact_value_is_default (yyn))
{
/* Start YYX at -YYN if negative to avoid negative indexes in
YYCHECK. In other words, skip the first -YYN actions for this
state because they are default actions. */
int yyxbegin = yyn < 0 ? -yyn : 0;

/* Stay within bounds of both yycheck and yytname. */
int yychecklim = YYLAST - yyn + 1;
int yyxend = yychecklim < YYNTOKENS ? yychecklim : YYNTOKENS;

/* Number of reported tokens (one for the "unexpected", one per
"expected"). */
int yycount = 0;
int yyx;

yyarg[yycount++] = yytokenName (yytoken);

for (yyx = yyxbegin; yyx < yyxend; ++yyx)
if (yycheck[yyx + yyn] == yyx && yyx != YYTERROR
&& !yytable_value_is_error (yytable[yyx + yyn]))
@@ -2128,13 +2148,16 @@ if (YYPACT_NINF < yyn && yyn <= YYLAST)
yysize_overflow |= yysize1 < yysize;
yysize = yysize1;
}
}
}

switch (yycount)
{
#define YYCASE_(N, S) \
case N: \
yyformat = S; \
break
YYCASE_(0, YY_("syntax error"));
YYCASE_(1, YY_("syntax error, unexpected %s"));
YYCASE_(2, YY_("syntax error, unexpected %s, expecting %s"));
YYCASE_(3, YY_("syntax error, unexpected %s, expecting %s or %s"));
@@ -2175,9 +2198,6 @@ if (YYPACT_NINF < yyn && yyn <= YYLAST)
yyerror (]b4_lyyerror_args[YY_("syntax error"));
yyMemoryExhausted (yystackp);
}
}
else
yyerror (]b4_lyyerror_args[YY_("syntax error"));
#endif /* YYERROR_VERBOSE */
yynerrs += 1;
}
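
The expected-token scan in the hunk above can be tried on toy tables. This is a sketch under stated assumptions, not generated parser code: every `toy_` name, the table contents, and `collect_expected` are invented for illustration, and comparing against `TOY_TABLE_ERROR` is a simplification of the skeleton's `yytable_value_is_error` check.

```c
#include <stddef.h>

/* Toy parser tables, invented for illustration.  */
enum { TOY_NTOKENS = 4, TOY_LAST = 5, TOY_TERROR = 1, TOY_TABLE_ERROR = -1 };

static const char *const toy_tname[TOY_NTOKENS] =
  { "$end", "error", "'a'", "'b'" };

/* toy_check[i] is the token that owns slot i, or -1 if the slot is unused.  */
static const int toy_check[TOY_LAST + 1] = { -1, -1, 0, 1, -1, 3 };

/* toy_table[i] is the action stored in slot i.  */
static const int toy_table[TOY_LAST + 1] = { 0, 0, 7, 8, 0, 9 };

/* Store in EXPECTED the names of the tokens acceptable in the state whose
   table offset is YYN, writing at most MAX entries.  Return how many
   names were stored.  */
static int
collect_expected (int yyn, const char *expected[], int max)
{
  /* Start at -YYN if YYN is negative to avoid negative indexes.  */
  int xbegin = yyn < 0 ? -yyn : 0;
  /* Stay within bounds of both toy_check and toy_tname.  */
  int checklim = TOY_LAST - yyn + 1;
  int xend = checklim < TOY_NTOKENS ? checklim : TOY_NTOKENS;
  int count = 0;
  int x;

  for (x = xbegin; x < xend; ++x)
    if (toy_check[x + yyn] == x && x != TOY_TERROR
        && toy_table[x + yyn] != TOY_TABLE_ERROR && count < max)
      expected[count++] = toy_tname[x];
  return count;
}
```

With these tables, offset 2 owns the slots for `$end`, `error`, and `'b'`. The `error` token is skipped, so the scan reports `$end` and `'b'`.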
60 changes: 44 additions & 16 deletions data/lalr1.cc
@@ -862,7 +862,8 @@ m4_ifdef([b4_lex_param], [, ]b4_lex_param)));])[
{
++yynerrs_;
error (]b4_args(b4_locations_if([yyla.location]),
[[yysyntax_error_ (yystack_[0].state, yyla.type)]])[);
[[yysyntax_error_ (yystack_[0].state,
yyempty ? yyempty_ : yyla.type)]])[);
}

]b4_locations_if([[
@@ -979,26 +980,52 @@ b4_error_verbose_if([state_type yystate, int yytoken],
[int, int])[)
{]b4_error_verbose_if([[
std::string yyres;
int yyn = yypact_[yystate];
if (yypact_ninf_ < yyn && yyn <= yylast_)
// Number of reported tokens (one for the "unexpected", one per
// "expected").
size_t yycount = 0;
// Its maximum.
enum { YYERROR_VERBOSE_ARGS_MAXIMUM = 5 };
// Arguments of yyformat.
char const *yyarg[YYERROR_VERBOSE_ARGS_MAXIMUM];

/* There are many possibilities here to consider:
- If this state is a consistent state with a default action, then
the only way this function was invoked is if the default action
is an error action. In that case, don't check for expected
tokens because there are none.
- The only way there can be no lookahead present (in yytoken) is
if this state is a consistent state with a default action.
Thus, detecting the absence of a lookahead is sufficient to
determine that there is no unexpected or expected token to
report. In that case, just report a simple "syntax error".
- Don't assume there isn't a lookahead just because this state is
a consistent state with a default action. There might have
been a previous inconsistent state, consistent state with a
non-default action, or user semantic action that manipulated
yyla. (However, yyla is currently not documented for users.)
- Of course, the expected token list depends on states to have
correct lookahead information, and it depends on the parser not
to perform extra reductions after fetching a lookahead from the
scanner and before detecting a syntax error. Thus, state
merging (from LALR or IELR) and default reductions corrupt the
expected token list. However, the list is correct for
canonical LR with one exception: it will still contain any
token that will not be accepted due to an error action in a
later state.
*/
if (yytoken != yyempty_)
{
yyarg[yycount++] = yytname_[yytoken];
int yyn = yypact_[yystate];
if (!yy_pact_value_is_default_ (yyn))
{
/* Start YYX at -YYN if negative to avoid negative indexes in
YYCHECK. In other words, skip the first -YYN actions for
this state because they are default actions. */
int yyxbegin = yyn < 0 ? -yyn : 0;

/* Stay within bounds of both yycheck and yytname. */
int yychecklim = yylast_ - yyn + 1;
int yyxend = yychecklim < yyntokens_ ? yychecklim : yyntokens_;

// Number of reported tokens (one for the "unexpected", one per
// "expected").
size_t yycount = 0;
// Its maximum.
enum { YYERROR_VERBOSE_ARGS_MAXIMUM = 5 };
// Arguments of yyformat.
char const *yyarg[YYERROR_VERBOSE_ARGS_MAXIMUM];
yyarg[yycount++] = yytname_[yytoken];
for (int yyx = yyxbegin; yyx < yyxend; ++yyx)
if (yycheck_[yyx + yyn] == yyx && yyx != yyterror_
&& !yy_table_value_is_error_ (yytable_[yyx + yyn]))
@@ -1011,6 +1038,8 @@ b4_error_verbose_if([state_type yystate, int yytoken],
else
yyarg[yycount++] = yytname_[yyx];
}
}
}

char const* yyformat = 0;
switch (yycount)
@@ -1019,13 +1048,15 @@ b4_error_verbose_if([state_type yystate, int yytoken],
case N: \
yyformat = S; \
break
YYCASE_(0, YY_("syntax error"));
YYCASE_(1, YY_("syntax error, unexpected %s"));
YYCASE_(2, YY_("syntax error, unexpected %s, expecting %s"));
YYCASE_(3, YY_("syntax error, unexpected %s, expecting %s or %s"));
YYCASE_(4, YY_("syntax error, unexpected %s, expecting %s or %s or %s"));
YYCASE_(5, YY_("syntax error, unexpected %s, expecting %s or %s or %s or %s"));
#undef YYCASE_
}

// Argument number.
size_t yyi = 0;
for (char const* yyp = yyformat; *yyp; ++yyp)
@@ -1036,9 +1067,6 @@ b4_error_verbose_if([state_type yystate, int yytoken],
}
else
yyres += *yyp;
}
else
yyres = YY_("syntax error");
return yyres;]], [[
return YY_("syntax error");]])[
}
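
Once a format has been chosen, the message is built by replacing each `%s` with the next token name. The following C sketch shows that substitution step only. `expand_format` and its parameters are hypothetical names, the fixed-size output buffer is an assumption made to keep the example self-contained, and any name cleanup the skeleton applies to token names is omitted.

```c
#include <stddef.h>
#include <string.h>

/* Copy FORMAT into OUT, replacing each "%s" with the next entry of ARGS
   while entries remain.  At most OUTSIZE - 1 bytes are written, followed
   by a terminating null byte.  Return the length of the result.  */
static size_t
expand_format (char *out, size_t outsize, const char *format,
               const char *const args[], size_t nargs)
{
  size_t len = 0;   /* bytes written so far, excluding the terminator */
  size_t argi = 0;  /* index of the next argument to substitute */
  const char *p;

  if (outsize == 0)
    return 0;
  for (p = format; *p; ++p)
    {
      if (p[0] == '%' && p[1] == 's' && argi < nargs)
        {
          const char *arg = args[argi++];
          size_t n = strlen (arg);
          if (n > outsize - 1 - len)
            n = outsize - 1 - len;  /* truncate instead of overflowing */
          memcpy (out + len, arg, n);
          len += n;
          ++p;                      /* skip the 's' */
        }
      else if (len < outsize - 1)
        out[len++] = *p;
    }
  out[len] = '\0';
  return len;
}
```

For example, the format `"syntax error, unexpected %s, expecting %s"` with the arguments `'+'` and `NUM` expands to `syntax error, unexpected '+', expecting NUM`.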
53 changes: 43 additions & 10 deletions data/lalr1.java
@@ -627,6 +627,8 @@ else if ((yyn = yytable_[yyn]) <= 0)
if (yyerrstatus_ == 0)
{
++yynerrs_;
if (yychar == yyempty_)
yytoken = yyempty_;
yyerror (]b4_locations_if([yylloc, ])[yysyntax_error (yystate, yytoken));
}
Expand Down Expand Up @@ -727,17 +729,52 @@ private String yysyntax_error (int yystate, int tok)
{]b4_error_verbose_if([[
if (yyErrorVerbose)
{
int yyn = yypact_[yystate];
if (yypact_ninf_ < yyn && yyn <= yylast_)
/* There are many possibilities here to consider:
- Assume YYFAIL is not used. It's too flawed to consider.
See
<http://lists.gnu.org/archive/html/bison-patches/2009-12/msg00024.html>
for details. YYERROR is fine as it does not invoke this
function.
- If this state is a consistent state with a default action,
then the only way this function was invoked is if the
default action is an error action. In that case, don't
check for expected tokens because there are none.
- The only way there can be no lookahead present (in tok) is
if this state is a consistent state with a default action.
Thus, detecting the absence of a lookahead is sufficient to
determine that there is no unexpected or expected token to
report. In that case, just report a simple "syntax error".
- Don't assume there isn't a lookahead just because this
state is a consistent state with a default action. There
might have been a previous inconsistent state, consistent
state with a non-default action, or user semantic action
that manipulated yychar. (However, yychar is currently out
of scope during semantic actions.)
- Of course, the expected token list depends on states to
have correct lookahead information, and it depends on the
parser not to perform extra reductions after fetching a
lookahead from the scanner and before detecting a syntax
error. Thus, state merging (from LALR or IELR) and default
reductions corrupt the expected token list. However, the
list is correct for canonical LR with one exception: it
will still contain any token that will not be accepted due
to an error action in a later state.
*/
if (tok != yyempty_)
{
StringBuffer res;
// FIXME: This method of building the message is not compatible
// with internationalization.
StringBuffer res =
new StringBuffer ("syntax error, unexpected ");
res.append (yytnamerr_ (yytname_[tok]));
int yyn = yypact_[yystate];
if (!yy_pact_value_is_default_ (yyn))
{
/* Start YYX at -YYN if negative to avoid negative
indexes in YYCHECK. In other words, skip the first
-YYN actions for this state because they are default
actions. */
int yyxbegin = yyn < 0 ? -yyn : 0;
/* Stay within bounds of both yycheck and yytname. */
int yychecklim = yylast_ - yyn + 1;
int yyxend = yychecklim < yyntokens_ ? yychecklim : yyntokens_;
@@ -746,11 +783,6 @@ private String yysyntax_error (int yystate, int tok)
if (yycheck_[x + yyn] == x && x != yyterror_
&& !yy_table_value_is_error_ (yytable_[x + yyn]))
++count;
// FIXME: This method of building the message is not compatible
// with internationalization.
res = new StringBuffer ("syntax error, unexpected ");
res.append (yytnamerr_ (yytname_[tok]));
if (count < 5)
{
count = 0;
@@ -762,6 +794,7 @@ private String yysyntax_error (int yystate, int tok)
res.append (yytnamerr_ (yytname_[x]));
}
}
}
return res.toString ();
}
}