Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Codechange: Simplify FormatString #9416

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

@fsimonis
Copy link
Contributor

@fsimonis fsimonis commented Jul 7, 2021

Motivation / Problem

FormatString uses a stack to handle formatting of inlined strings.
This is overly complex and leads to unnecessary allocations in the drawing code of OpenTTD,

Description

Based on #9405, this PR simplifies FormatString by removing its stack and replacing it by recursion for the format codes SCC_NEWGRF_STRINL and SCC_NEWGRF_PRINT_WORD_STRING_ID.
This change removes the use of temporary allocations in this function.
With this change, the overall temporary allocations are down by ~90% for the test case mentioned in #9405.

I am not sure why, but SCC_NEWGRF_PRINT_WORD_STRING_ID got triggered repeatedly in world generation.

Limitations

I was unable to test the changes for the affected format code SCC_NEWGRF_STRINL as I couldn't find a NewGrf which uses it.

Checklist for review

Some things are not automated, and forgotten often. This list is a reminder for the reviewers.

  • The bug fix is important enough to be backported? (label: 'backport requested')
  • This PR affects the save game format? (label 'savegame upgrade')
  • This PR affects the GS/AI API? (label 'needs review: Script API')
    • ai_changelog.hpp, gs_changelog.hpp need updating.
    • The compatibility wrappers (compat_*.nut) need updating.
  • This PR affects the NewGRF API? (label 'needs review: NewGRF')
This removes the use of the stack and replaces its uses by recursion.
This also prevents unnecessary allocations in FormatString, which is
a hot function in the drawing code.
@glx22
Copy link
Contributor

@glx22 glx22 commented Jul 7, 2021

I am not sure why, but SCC_NEWGRF_PRINT_WORD_STRING_ID got triggered repeatedly in world generation.

It's because some industries have an integrated station, and that implies a call to get the station name.

Loading

src/strings.cpp Outdated
break;
}

case SCC_NEWGRF_PRINT_WORD_STRING_ID: {
StringID substr = args->GetInt32(SCC_NEWGRF_PRINT_WORD_STRING_ID);
str_stack.push(GetStringPtr(substr));
buff = FormatString(buff, GetStringPtr(substr), args, last, case_index, game_script, dry_run);
case_index = next_substr_case_index;
Copy link
Contributor

@rubidium42 rubidium42 Jul 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems wrong now as this case_index was what was used to format the string, and now it's using an old value. Other calls seems to be passing next_substr_case_index.

Loading

Copy link
Contributor

@rubidium42 rubidium42 Jul 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With these changes SCC_SET_CASE changing next_substr_case_index while processing the substring does not influence any later strings, although in the past it did.
I do not know enough about the NewGRF text stack to say anything useful the effects of that. Are NewGRFs depending on this behaviour? Or is the behaviour maybe just plain wrong as it is now? I do not know. I do know it gets changed, and that makes me feel uncomfortable.

Loading

Copy link
Contributor Author

@fsimonis fsimonis Jul 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering about that as well, so thanks for bringing it up.

The use-case here looks actually very similar to the one of the SCC_INDUSTRY_NAME, which passes next_substr_case_index to the call to FormatString and then resets it to 0.
I'll change this.

OpenTTD/src/strings.cpp

Lines 1372 to 1388 in e99d594

case SCC_INDUSTRY_NAME: { // {INDUSTRY}
const Industry *i = Industry::GetIfValid(args->GetInt32(SCC_INDUSTRY_NAME));
if (i == nullptr) break;
if (_scan_for_gender_data) {
/* Gender is defined by the industry type.
* STR_FORMAT_INDUSTRY_NAME may have the town first, so it would result in the gender of the town name */
StringParameters tmp_params(nullptr, 0, nullptr);
buff = FormatString(buff, GetStringPtr(GetIndustrySpec(i->type)->name), &tmp_params, last, next_substr_case_index);
} else {
/* First print the town name and the industry type name. */
int64 args_array[2] = {i->town->index, GetIndustrySpec(i->type)->name};
StringParameters tmp_params(args_array);
buff = FormatString(buff, GetStringPtr(STR_FORMAT_INDUSTRY_NAME), &tmp_params, last, next_substr_case_index);
}
next_substr_case_index = 0;

Loading

@@ -818,15 +817,10 @@ static char *FormatString(char *buff, const char *str_arg, StringParameters *arg
WChar b = '\0';
uint next_substr_case_index = 0;
char *buf_start = buff;
std::stack<const char *, std::vector<const char *>> str_stack;
Copy link
Contributor

@rubidium42 rubidium42 Jul 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about the dry-run FormatString calls that are now going to be added to the processing. You call it, rightfully, with dry_run... however that means that when we alread did the dry run after this has run for the main string, you will still call it for all the sub strings in the STRINL and PRINT_WORD_STRING_ID cases. Note that most other calls to FormatString in this function declare their own set of parameters to work on that do not have type information.
So the question is: are those allocations really that expensive that it warrants doing multiple dry runs for the same string?
It might be simplifying things code wise and seemingly allocation wise, but does that not potentially increase the computational complexity of this?

Loading

src/strings.cpp Outdated Show resolved Hide resolved
Loading
src/strings.cpp Outdated Show resolved Hide resolved
Loading
if (str_stack.empty()) break;
const char *&str = str_stack.top();
if ((b = Utf8Consume(&current_str_arg)) == '\0') break;
const char *&str = current_str_arg;
Copy link
Contributor

@rubidium42 rubidium42 Jul 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be some messing about so the value in the stack gets updated properly. However, now there is only one value current_str_arg, so can't const char *current_str_arg = str_arg become const char *str = str_arg, remove this line and pass str instead of current_str_arg to Utf8Consume? That seems to simplify this complicated construct a lot.

Loading

Copy link
Contributor Author

@fsimonis fsimonis Jul 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

str is used 26 times in the function, thus changing it will introduce a lot of noise in this PR.
I rather change this in a separate clean-up PR.

Loading

Copy link
Member

@LordAro LordAro Aug 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep it as a separate commit, and I'd have no issues with it being in this PR

Loading

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

4 participants