Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python performance issue while parsing ECMAScript #1243

Open
renatahodovan opened this issue Aug 8, 2016 · 5 comments
Open

Python performance issue while parsing ECMAScript #1243

renatahodovan opened this issue Aug 8, 2016 · 5 comments

Comments

@renatahodovan
Copy link
Contributor

I've tried to parse one of the githubs JS backend files with the "official" ECMAScript grammar. Surprisingly, the parse took only 2 minutes in Java but more than 4 hours with Python3 (even with the recently added performance fix too).
The test case is undoubtedly large and - as @ericvergnaud said earlier - Python is 20 - 30 times slower than Java. But in this case the slow-down factor is more than 120 so there might be something else in the background too.
I'm trying to figure out the reason but also reporting the issue hoping that someone else can solve it faster.

@parrt
Copy link
Member

parrt commented Nov 25, 2016

@renatahodovan could this be related to #994 ? Are you trying to parse very large expressions with a left recursive rule perhaps? Or maybe it is a hash code issue that JavaScript had #1408? @ericvergnaud any ideas?

@renatahodovan
Copy link
Contributor Author

@parrt Yes, it might be related to recursive rules - it was my first thought too -, although I don't know how to fix it. On the other hand, Python has its own hash function - unlike JavaScript - so I don't think that we could outperform that with a new implementation.
I've attached a slightly minimized version of my original test case. It's now ~20KB of valid ECMAScript code. The parsing takes ~50sec for me with the latest Python3 target and ~1sec with Java (not like 50 sec is unbearable but unfortunately this difference scales up and in case of a bigger real life test case 2 minutes vs. 4 hours is inconvenient).
The attachment also contains a cProfile entry which might help understanding what happens.

@parrt
Copy link
Member

parrt commented Nov 28, 2016

If Java target was fast on it, then it's not the new optimization; likely hash function issue?

@ghost
Copy link

ghost commented Aug 6, 2017

I tried parsing a 4 megabyte Javascript file with the python 3 backend on ANTLR 4.7 and its extremely slow. The python slimit Javascript parsing library (ply based) takes 15 seconds to parse the file but I hit a bug in it that I don't know how to fix, so I decided to try ANTLR. I haven't really seen anything else so things seem pretty hopeless, writing my own ECMAScript parser doesn't seem very feasible...

With respect to how much slower it is, I don't know because its still running but its taken 40 minutes of cpu time so far...

Thanks for any input.

Edit: I can't post the js file publicly because it's part of a game. I'm trying to mod it.

@ericvergnaud
Copy link
Contributor

ericvergnaud commented Aug 7, 2017 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants