Please make the parsetab.py files reproducible #79

Closed
lamby opened this Issue Sep 19, 2015 · 11 comments

@lamby

commented Sep 19, 2015

Whilst working on the Debian reproducible builds effort, I noticed that python-ply generates parsetab.py files with non-deterministic contents.

I first had a quick go at fixing this by adding a bunch of sorts inside write_table, but looking deeper into the data structures it appears that "more" determinism is needed to ensure that the states are consistently numbered across builds. There are a whole bunch of iterations over dicts' items() throughout the table generation which, as you are no doubt aware, are non-deterministic. I'm sure some of these are harmless from a reproducibility point of view, so simply adding sorted() everywhere would be a total mess.

Of course, one solution would be to wontfix this and simply decree that these files are non-deterministic, but that would mean Debian etc. could not ship these useful optimisations, as they would render the package unreproducible.
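
To make the problem concrete, here is a minimal sketch (not PLY code) of how iteration order over a hashed container of strings can change from one interpreter run to the next. It assumes Python 3.7+ for `capture_output`, uses a set of made-up grammar symbol names, and relies on the `PYTHONHASHSEED` environment variable to vary string hashing between child processes. `run_with_seed` is a hypothetical helper for this illustration only.

```python
import os
import subprocess
import sys

# One-liner executed in a child interpreter; the symbol names are made up.
SNIPPET = "names = {'expr', 'term', 'factor', 'NUMBER', 'PLUS', 'TIMES'}; print(%s)"


def run_with_seed(expression, seed):
    """Evaluate `expression` in a fresh interpreter with a fixed hash seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", SNIPPET % expression],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Distinct outputs seen across several seeds.
raw_outputs = {run_with_seed("list(names)", seed) for seed in range(1, 6)}
sorted_outputs = {run_with_seed("sorted(names)", seed) for seed in range(1, 6)}

print("distinct orderings without sorting:", len(raw_outputs))
print("distinct orderings with sorted():  ", len(sorted_outputs))
```

The sorted variant always yields a single ordering. The unsorted one usually yields several, which is the kind of variation that ends up in a generated file if anything derived from such an iteration (state numbers, for instance) is written out.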

@joeedh

Contributor

commented Oct 8, 2015

It might not be a bad idea to use an ordered dict class. There was even a version of ply where I had to do this (it was relying on dict keys having a consistent order).
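
As a rough illustration of the ordered-dict idea (a sketch under assumptions, not how PLY numbers its states), the snippet below assigns numbers in first-seen order using `collections.OrderedDict`. The `number_states` function and the item strings are hypothetical.

```python
from collections import OrderedDict


def number_states(item_sets):
    """Assign state numbers in the order the item sets are first seen.

    Hypothetical stand-in for state numbering; PLY's real data differs.
    """
    numbering = OrderedDict()
    for item_set in item_sets:
        if item_set not in numbering:
            numbering[item_set] = len(numbering)
    return numbering


states = number_states([
    "S' -> . expr",
    "expr -> . term",
    "S' -> . expr",      # duplicate: keeps its original number
    "term -> . NUMBER",
])

for item_set, number in states.items():
    print(number, item_set)
```

This only helps if the order in which items are inserted is itself deterministic. Note also that since Python 3.7 a plain `dict` preserves insertion order as a language guarantee, so `OrderedDict` mainly matters for older interpreters.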

@LocutusOfBorg

commented May 10, 2016

Hi, ping? :)

@dabeaz

Owner

commented Aug 30, 2016

Dictionaries are used all over the place in yacc. Not sure whether I can easily fix this.

@refi64

commented Aug 30, 2016

@dabeaz Could you just do sorted(dct.items()) instead of dct.items()?
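
A minimal sketch of what that suggestion could look like at the point where a table is written out. `render_table` is a hypothetical helper, not PLY's `write_table`, and the table contents are invented for the example.

```python
def render_table(name, table):
    """Render a dict as Python source with keys emitted in sorted order.

    Hypothetical helper for illustration; not part of PLY.
    """
    lines = ["%s = {" % name]
    for key, value in sorted(table.items()):
        lines.append("    %r: %r," % (key, value))
    lines.append("}")
    return "\n".join(lines) + "\n"


# Same contents, different insertion order.
first = {"expr": 1, "term": 2, "factor": 3}
second = {"factor": 3, "term": 2, "expr": 1}

print(render_table("goto_table", first))
print(render_table("goto_table", first) == render_table("goto_table", second))
```

Sorting at write time makes the text independent of iteration order, but it does not help if the values themselves (such as state numbers) were assigned in a non-deterministic order earlier on. In Python 3 it also requires the keys to be mutually comparable.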

@johnyf

commented Aug 30, 2016

Would such a change affect performance though? Should it be for the debugging implementation only, and not any optimized ones?

@refi64

commented Aug 30, 2016

@johnyf I think that would only be run when parsetab.py is being written, though, so it would be only for the first run.

@dabeaz

Owner

commented Aug 30, 2016

Suggest using yacc(write_tables=False) to disable the creation of the parsetab.py file entirely.

Background: The whole reason that parsetab.py file is there in the first place is that the first version of PLY was written on a 200 MHz PC and the parser table creation was slow. To make startup faster on subsequent runs, parsetab.py was written and used as a kind of cache. I'm not even sure it matters now. For one, machines are a LOT faster. Also, PLY switched over to a different, much faster, algorithm ages ago (generating the tables for C with some 353 states takes about 0.3s on my current machine).

Honestly, I've been thinking about ditching all of this parsetab.py/lextab.py business entirely in some future version.

@lamby

Author

commented Aug 30, 2016

(Not all dicts would have to be changed, mind you...)

@johnyf

commented Aug 30, 2016

In my experience with promela, the parsetab.py is useful and does accelerate runs. I would prefer that it remains available as functionality. Quantifying the difference would require collecting measurements from running a representative collection of large parsers.
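
For anyone wanting to collect such measurements, a rough timing harness might look like the sketch below. It does not use PLY at all: `build_table` is a hypothetical stand-in for table construction, and a pickle file stands in for the cached table, so the numbers say nothing about PLY itself. The same pattern could be pointed at a real parser build.

```python
import pickle
import tempfile
import timeit
from pathlib import Path


def build_table(n_states=2000):
    """Hypothetical stand-in for table construction; not PLY's algorithm."""
    return {
        state: {"tok%d" % t: (state * 31 + t) % n_states for t in range(20)}
        for state in range(n_states)
    }


def load_table(path):
    """Load a previously cached table from disk."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "table.pickle"
    cache.write_bytes(pickle.dumps(build_table()))

    # Best of five single runs for each path.
    build_seconds = min(timeit.repeat(build_table, number=1, repeat=5))
    load_seconds = min(
        timeit.repeat(lambda: load_table(cache), number=1, repeat=5)
    )

print("build: %.4fs  load: %.4fs" % (build_seconds, load_seconds))
```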

@dabeaz

Owner

commented Aug 30, 2016

Ditching parsetab.py/lextab.py is not something I'm likely to do in the context of PLY. I have a more modern project in the works (a successor to PLY) that will ditch the generated files, however.

@LocutusOfBorg

commented Jan 23, 2017

Hello, FWIW I switched the Debian packaging to the new release 1.0.0 from https://github.com/viraptor/phply, which seems to be the one now published on PyPI.

https://pypi.debian.net/phply
https://pypi.python.org/pypi/phply

So, I consider this issue "fixed" for my packaging needs
