Fix Netkan localization parser performance #2816
I'm trying to take that regex apart:
This is the locale name (e.g. de-de, en-us, pl...) part with potential white space before and after, right?
That's the content of the localization, right? Is there no chance that there are curly braces in the translation text?
This is not white-space only for potential comments?
Yes, I think that makes sense. Will push that change in a moment.
I don't know whether it's allowed in KSP cfg files syntactically. I'll try to find that out, but I'm pessimistic about matching such a format with a regex, and I'm trying to avoid implementing a full parser for this.
I don't think so; if someone does this, we don't want the overall
Just trying to make it permissive to avoid future problems with the syntax, but yes, comments are a good example. Thinking about the backtracking some more, I think we should repeat the
FYI, this here exists: https://github.com/WuphonsReach/KSP-ConfigParser
Yeah, I see that now.
👍
Sounds reasonable.
Interesting. I think that project treats { and } as not allowed in values; if it finds one anywhere on a line, it splits the line at that point:
If this is how KSP does it as well, then this PR should be OK.
Did you find anything (official)? My feeling is that curly braces are not allowed in value strings, so I would give this PR a go.
That's exactly what I found as well; nobody documents this aspect of the format, and MM doesn't do its own parsing.
Is it safe to unfreeze
At the beginning of each run, the bot gets a fresh copy of
Sigh, there's always one... Though that's essentially using the localization resources incorrectly. Those should be
EDIT: Those resources are not being used at all. They were added along with some C# changes in the middle of a PR with dozens of commits, and then later those C# changes were lost in the shuffle of a merge from another branch.
Problem
This netkan hangs for a long time, maybe indefinitely:
This is inhibiting the functioning of the bot.
Cause
I think we're experiencing catastrophic backtracking. The regex for parsing a Localization block has several instances of the same character potentially being matchable by multiple consecutive pieces of the regex.
A { could match the first or second part of this:
A letter could match the first or second part of this:
And so on. This results in a lot of retries when it reaches the end of the string without a match. In the case of PolishTranslation, I think the issue is simply that its Localization blocks are very, very long, so the inherent performance problems of this regex are multiplied.
Changes
The regex for Localization is now rewritten to be more specific about what is expected. It must start with Localization followed by optional whitespace and an open curly brace. Next come some non-brace characters, followed by one or more sequences of an open curly brace, non-brace characters, a close curly brace, and any number of non-brace characters. Last comes the final close brace. This regex no longer cares about the locale names; it leaves that to the localeRegex. This makes it much simpler to determine whether a cfg file's contents match what we're looking for, which avoids the excessive backtracking.
Netkan is now able to inflate PolishTranslation quickly, and other netkans that were the subject of previous fixes are still working.
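For readers who want to experiment with the shape described above, here is a rough sketch in Python. It is not the regex from this PR (that one is C# and is not reproduced here); it uses Python's re module as a stand-in, and the names LOCALIZATION_BLOCK, find_localization_blocks, and the sample text are made up for illustration. The point it tries to show is that every piece of the pattern is delimited by a brace, so no character can be claimed by two adjacent pieces.

```python
import re

# Hypothetical stand-in for the rewritten pattern, following the prose description:
#   "Localization", optional whitespace, an open brace,
#   some non-brace characters,
#   one or more of: open brace, non-brace characters, close brace, non-brace characters,
#   then the final close brace.
LOCALIZATION_BLOCK = re.compile(
    r"Localization\s*\{"        # block header and its opening brace
    r"[^{}]*"                   # text before the first inner block (e.g. a locale name)
    r"(?:\{[^{}]*\}[^{}]*)+"    # one or more brace-delimited inner blocks
    r"\}"                       # closing brace of the Localization block
)


def find_localization_blocks(cfg_text):
    """Return the text of every Localization block found in cfg_text."""
    return [m.group(0) for m in LOCALIZATION_BLOCK.finditer(cfg_text)]


# Made-up sample input, not taken from any real mod.
sample = """
Localization
{
    en-us
    {
        #LOC_title = Hello
    }
    pl
    {
        #LOC_title = Witaj
    }
}
"""

blocks = find_localization_blocks(sample)
print(len(blocks))
```

Because the character classes exclude both braces, an unterminated or very long block fails after roughly one pass over the text instead of retrying many alternative splits. The sketch assumes, as discussed in the conversation, that braces never appear inside translation values.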
Fixes #2814.