Major speed regression for gron.awk in goawk 1.11.0+ #93
Oooh, this is good to know, thank you! And yes, the kind of thing gron does is basically the worst case for this (lots of string length / substr operations). I'm glad you can use the …
I've tested locally, and I guess we're essentially seeing "accidentally quadratic" behavior here (because individual string length operations are O(N) now, and there are N of them)?
Gawk probably determines the string length as it's splitting the input. We would have to do that in GoAWK to improve this. It might slow down other things, though, and wouldn't be trivial, which is why I didn't do that at first.
Thanks for the report. Maybe making …
I'm thinking of just reverting the change from bytes to unicode outright (and hoping not many people are depending on …
…ch (#95)
The reason is that the new O(N) behavior of these functions was just too problematic, and caused "accidentally quadratic" issues such as #93: a JSON-processing script that previously took 1 second now took over 8 minutes. That's not tenable. The expectation is clearly that these functions are O(1).
This commit reverts b7ec795, but leaves in the new bytes tests that were added.
Fixes #93
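To illustrate the "accidentally quadratic" effect described above, here is a minimal Go sketch. It is not GoAWK's actual code: `byteLen`, `charLen` and `scanChars` are hypothetical names, and the loop only mimics an AWK script that calls length() on every iteration.

```go
package main

import (
	"fmt"
	"strings"
	"time"
	"unicode/utf8"
)

// byteLen returns the length in bytes. Go stores this alongside the string,
// so the call is O(1).
func byteLen(s string) int { return len(s) }

// charLen returns the length in Unicode code points. It has to scan the
// whole string, so the call is O(N).
func charLen(s string) int { return utf8.RuneCountInString(s) }

// scanChars mimics a hypothetical AWK loop `for (i = 1; i <= length(s); i++)`:
// lengthFn is re-evaluated on every iteration. It returns the iteration count.
func scanChars(s string, lengthFn func(string) int) int {
	steps := 0
	for i := 1; i <= lengthFn(s); i++ {
		steps++
	}
	return steps
}

func main() {
	s := "h\u00e9llo" // "héllo": the é takes two bytes in UTF-8
	fmt.Println(byteLen(s), charLen(s)) // prints "6 5"

	big := strings.Repeat("a", 20000)

	start := time.Now()
	scanChars(big, byteLen) // N iterations, O(1) each
	fmt.Println("byte length:", time.Since(start))

	start = time.Now()
	scanChars(big, charLen) // N iterations, O(N) each: O(N^2) overall
	fmt.Println("char length:", time.Since(start))
}
```

The timings depend on the machine, but the second loop should grow roughly with the square of the input size, while the first grows linearly.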
I know that this is sort of expected as it changes the logic of string-related functions to be more POSIX-compliant.
And I can confirm that adding the -b flag makes it as fast as before.
BUT! What is very interesting: GAWK shows very similar (I would say, the same) speed with and without the -b switch!
Did they find a way to make string functions O(1), not O(N)?
If yes, shall we use the same approach for GoAWK?
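One possible way to get there, sketched in Go under stated assumptions: cache the character count the first time it is needed, and slice by byte offset when the string turns out to be one byte per character. This is not confirmed to be what GAWK does, and `cachedStr`, `Length` and `Substr` are hypothetical names, not GoAWK types. The substr semantics are simplified (integer arguments only).

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// cachedStr is a hypothetical wrapper that remembers the character count
// after the first request, so repeated length calls cost O(1).
type cachedStr struct {
	s     string
	chars int // cached code point count; -1 means "not computed yet"
}

func newCachedStr(s string) *cachedStr { return &cachedStr{s: s, chars: -1} }

// Length returns the number of code points, scanning the string at most once.
func (c *cachedStr) Length() int {
	if c.chars < 0 {
		c.chars = utf8.RuneCountInString(c.s)
	}
	return c.chars
}

// Substr returns up to n characters starting at 1-based character position pos.
func (c *cachedStr) Substr(pos, n int) string {
	total := c.Length()
	if pos < 1 { // AWK-style: characters before position 1 eat into n
		n += pos - 1
		pos = 1
	}
	if n <= 0 || pos > total {
		return ""
	}
	if pos+n-1 > total {
		n = total - pos + 1
	}
	if total == len(c.s) {
		// One byte per character: slice directly by byte offset.
		return c.s[pos-1 : pos-1+n]
	}
	// Multi-byte characters present: walk to find the byte offsets.
	start, end, i := len(c.s), len(c.s), 1
	for off := range c.s {
		if i == pos {
			start = off
		}
		if i == pos+n {
			end = off
			break
		}
		i++
	}
	return c.s[start:end]
}

func main() {
	c := newCachedStr("h\u00e9llo") // "héllo"
	fmt.Println(c.Length())         // prints "5"
	fmt.Println(c.Substr(2, 3))     // prints "éll"
	fmt.Println(newCachedStr("hello").Substr(2, 3)) // prints "ell"
}
```

The trade-off is the one mentioned earlier in the thread: every string value carries extra state, and non-ASCII input still pays a linear walk in Substr, so this might slow other things down.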
If need be I can provide more info and instructions to reproduce the results I've found.
The command I use for benchmark is
The files gron.awk and test_data/big.json can be found in https://github.com/xonixx/gron.awk repo.