yacc start keyword and parsetab caching #52

Closed
matthew-brett opened this issue Dec 23, 2014 · 1 comment

@matthew-brett

I think I found a bug with parse table caching and the start keyword to
yacc.yacc().

This script illustrates the problem:

""" Nasty behavior for start=
"""

tokens = ['FOO', 'BAR']

t_FOO = r'foo'
t_BAR = r'bar'


def p_foo_bar(p):
    ' foo_bar : FOO BAR'
    p[0] = 'have foobar'


def p_bar(p):
    ' bar : BAR '
    p[0] = 'have bar'


if __name__ == '__main__':
    import os
    from ply import lex, yacc
    lex.lex()
    # Remove previously written parse tables
    if os.path.exists('parsetab.py'):
        os.unlink('parsetab.py')
    if os.path.exists('parsetab.pyc'):
        os.unlink('parsetab.pyc')
    # Generate a parser with a non-default start rule.
    # (Commenting out the next two lines makes the final assert pass.)
    parser = yacc.yacc(start='bar')
    assert parser.parse('bar') == 'have bar'
    # Generate a parser with the default start rule and another tabmodule
    parser = yacc.yacc(start='foo_bar', tabmodule='another')
    # This works
    assert parser.parse('foobar') == 'have foobar'
    # Generate a parser with the default start rule and the default tabmodule
    parser = yacc.yacc(start='foo_bar')
    # The following fails with "yacc: Syntax error at line 1, token=FOO"
    assert parser.parse('foobar') == 'have foobar'

Investigating further, I think this is what happens: the changes to the
start symbol around line 3129 of yacc.py get written to the parsetab module,
but they do not change the signature of the parsetab module. When yacc.yacc()
is then called with another start symbol (or the default), it reads the lex /
yacc symbols from the relevant module or class, computes the signature, finds
that it matches the cached parsetab signature, and uses the cached parsetab,
even though the specified (or default) start symbol differs from the start
symbol in the previously written parsetab. This can be very confusing, because
the start symbol actually used depends on which one was written first.
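
To make the suspected mechanism concrete, here is a toy model of a signature-keyed table cache. It is only a sketch and not PLY's code: `grammar_signature`, `build_parser` and `_table_cache` are hypothetical names, and the "table" is reduced to the start symbol it was built for. The assumption being illustrated is that the signature is computed from the grammar rules alone, so the requested start symbol plays no part in the cache lookup.

```python
import hashlib

# Hypothetical stand-in for the written parsetab module:
# maps a grammar signature to the start symbol the table was built with.
_table_cache = {}


def grammar_signature(rules):
    """Digest of the grammar rules only; the start symbol is left out."""
    digest = hashlib.md5()
    for rule in rules:
        digest.update(rule.encode("utf-8"))
    return digest.hexdigest()


def build_parser(rules, start=None):
    """Return the start symbol of the table that ends up being used."""
    if start is None:
        # Default start symbol: left-hand side of the first rule.
        start = rules[0].split(":")[0].strip()
    sig = grammar_signature(rules)
    if sig in _table_cache:
        # Signature matches, so the cached table is reused as-is,
        # whatever start symbol was requested this time.
        return _table_cache[sig]
    _table_cache[sig] = start
    return start


RULES = ["foo_bar : FOO BAR", "bar : BAR"]

first = build_parser(RULES, start="bar")       # builds and caches the table
second = build_parser(RULES, start="foo_bar")  # silently reuses the 'bar' table
print(first, second)
```

In this model the second call gets a table built for `bar` although it asked for `foo_bar`, which matches the symptom in the script above: whichever start symbol is written first wins.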

It wasn't clear to me what the right fix is. I wonder whether yacc() should
set the start symbol in the lexer / grammar symbols before checking the
signatures, something like:

diff --git a/ply/yacc.py b/ply/yacc.py
index f70439e..e50d81c 100644
--- a/ply/yacc.py
+++ b/ply/yacc.py
@@ -3054,6 +3054,10 @@ def yacc(method='LALR', debug=yaccdebug, module=None, tabmodule=tab_module, star
     else:
         pdict = get_caller_module_dict(2)

+    # Set start symbol if specified
+    if start is not None:
+        pdict['start'] = start
+
     # Collect parser information from the dictionary
     pinfo = ParserReflect(pdict,log=errorlog)
     pinfo.get_all()

This changes the value returned by pinfo.signature(), so it forces yacc()
to regenerate the parsetab module unless the explicit start symbol is the
same as the one used for the cached table.
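
The effect of the patch can be sketched with a small stand-alone example. This is a simplified model, not PLY's implementation: the `signature` function below is hypothetical and only assumes that the start symbol, when present in the grammar dictionary, is mixed into the digest along with the rules.

```python
import hashlib


def signature(rules, start=None):
    """Digest of the grammar; an explicit start symbol is folded in first."""
    digest = hashlib.md5()
    if start is not None:
        digest.update(start.encode("utf-8"))
    for rule in rules:
        digest.update(rule.encode("utf-8"))
    return digest.hexdigest()


RULES = ["foo_bar : FOO BAR", "bar : BAR"]

sig_bar = signature(RULES, start="bar")
sig_foo_bar = signature(RULES, start="foo_bar")
sig_default = signature(RULES)

# Different explicit start symbols give different signatures,
# so a table cached for one would not be reused for the other.
print(sig_bar != sig_foo_bar)
# The same explicit start symbol reproduces the same signature,
# so the cached table stays valid.
print(signature(RULES, start="bar") == sig_bar)
```

One consequence in this model: an explicit `start="foo_bar"` and the implicit default also produce different signatures (`sig_foo_bar != sig_default`), even though they name the same rule. That would cost an extra regeneration but would not select the wrong table.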

Thanks a lot for PLY, I have had good use from it.

@dabeaz
Owner

dabeaz commented Apr 16, 2015

Fixed in PLY 3.5.

@dabeaz dabeaz closed this as completed Apr 16, 2015
moloney added a commit to moloney/nibabel that referenced this issue Jun 10, 2015
No longer need to work around ply issue
(dabeaz/ply#52)