Very inefficient gettext parser #596

digitalnature opened this Issue Jul 24, 2012 · 2 comments



The parser itself is OK, I guess, though the PO catalog parser is about 3 times faster than the MO (compiled catalog) parser. This is kind of awkward considering that the whole reason the binary format exists is to make reading faster for machines :)
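For context, the MO format itself is simple: a fixed seven-word header followed by two tables of (length, offset) pairs pointing at the original and translated strings. The sketch below is illustrative Python, not the library's PHP code; the layout follows the GNU gettext MO format, and `parse_mo` is a hypothetical name. In PHP, each of these reads goes through `unpack()`, whose per-call overhead is one plausible reason binary parsing is not a win there.

```python
import struct

def parse_mo(data: bytes) -> dict:
    """Parse an in-memory MO catalog into {msgid: msgstr}."""
    magic = struct.unpack_from("<I", data, 0)[0]
    # byte order is signalled by the magic number 0x950412de
    order = "<" if magic == 0x950412DE else ">"
    # header words: magic, revision, string count,
    # originals-table offset, translations-table offset
    _, _, count, o_off, t_off = struct.unpack_from(order + "5I", data, 0)
    catalog = {}
    for i in range(count):
        # each table entry is a (length, offset) pair of 32-bit words
        o_len, o_pos = struct.unpack_from(order + "2I", data, o_off + 8 * i)
        t_len, t_pos = struct.unpack_from(order + "2I", data, t_off + 8 * i)
        msgid = data[o_pos:o_pos + o_len].decode("utf-8")
        msgstr = data[t_pos:t_pos + t_len].decode("utf-8")
        catalog[msgid] = msgstr
    return catalog
```

Note that the work per entry is a couple of unpacks and two slices; the format itself is about as machine-friendly as it gets, so any slowdown relative to the PO parser has to come from the cost of those calls in the host language.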

But the real problem is the merging of messages, which slows things down by a factor of about 12.

For example, a catalog with 3000 lines gets parsed in 0.09s on my machine (the MO version of the same catalog in 0.26s).
The merging functions add another 1.2s to this process. Most of that time probably comes from function call overhead, incurred 3000 times over.

I think no methods should be called inside the for / fgets loop; the code should be moved into the loop body even if that means some duplication. Also, the resulting array contains some unnecessary fields like ids, id, and probably comments and occurrences, which just take up memory. The id is already present in the key, and the value should just be an array containing the translated messages...
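The restructuring suggested above can be sketched roughly as follows. This is illustrative Python, not the library's PHP code, and it handles only the simplest single-line PO entries; the point is the shape: all matching happens inline in the loop with no per-line method call, and the result maps each id directly to its translation with no extra fields.

```python
import re

# precompiled once, outside the loop
MSG_RE = re.compile(r'^(msgid|msgstr) "(.*)"$')

def parse_po_inlined(lines):
    """Parse a minimal PO stream into {msgid: msgstr}, doing the
    matching inline instead of calling a helper per line."""
    catalog, msgid = {}, None
    for line in lines:
        m = MSG_RE.match(line)  # inlined: no per-line method call
        if not m:
            continue
        kind, text = m.groups()
        if kind == "msgid":
            msgid = text
        else:
            # lean structure: the key is the id, the value is just
            # the translation, no ids/comments/occurrences fields
            catalog[msgid] = text
    return catalog
```

A real parser also has to handle multi-line strings, plurals, and comments, which is where the temptation to factor per-line logic into methods comes from; the trade-off here is duplicated code in exchange for avoiding that call overhead on every line.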

@ghost ghost assigned davidpersson Jan 10, 2014


davidpersson commented Jan 11, 2014

My results are (set: ~1600 entries):

```
Total (1000 iterations)
Took: 201.7883810997

Per iteration
Took: 0.2017883810997

Single sample
Took: 0.20334100723267
```

```
Total (1000 iterations)
Took: 201.46384692192

Per iteration
Took: 0.20146384692192

Single sample
Took: 0.24216485023499
```

davidpersson commented Sep 25, 2014

I'm closing this as (a) the real solution to this would be caching (see #1054), (b) "very inefficient" is, in my eyes, exaggerated, as it is one of the fastest PHP parsers I know, (c) yes, MO parsing should be faster, but this is PHP, which doesn't benefit much from parsing binary vs. text formats, and (d) memory efficiency is not a priority for the parser.
