Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Always parse as UTF-8. #270

Merged
merged 2 commits into from
Nov 19, 2018
Merged

Always parse as UTF-8. #270

merged 2 commits into from
Nov 19, 2018

Conversation

dvander
Copy link
Member

@dvander dvander commented Nov 19, 2018

This is a ~13% compile time improvement on SourceMod plugins.

Mysteriously, it wasn't enough to parse three times: the compiler
also pre-scans the entire text of source/include files to determine
if each uses UTF-8. This is particularly mysterious as the result
of pre-scanning for UTF-8 is that it will allow parsing more UTF-8
characters. Why bother scanning in the first place?

One reason is to remove the byte order mark (BOM), which this patch
retains by scanning the first three bytes. Otherwise, it seems this code
can be removed. It's possible upstream had more dependent logic and we
removed it long ago.

@dvander dvander force-pushed the split-headers-p2 branch 2 times, most recently from ee26bb3 to 9caa869 Compare November 19, 2018 06:54
This is a ~13% compile time improvement on SourceMod plugins.

Mysteriously, it wasn't enough to parse three times: the compiler
also pre-scans the entire text of source/include files to determine
if each uses UTF-8. This is particularly mysterious as the result
of pre-scanning for UTF-8 is that it will allow parsing more UTF-8
characters. Why bother scanning in the first place?

One reason is to remove the byte order mark (BOM), which this patch
retains by scanning the first three bytes. Otherwise, it seems this code
can be removed. It's possible upstream had more dependent logic and we
removed it long ago.

This patch also fixes a long-standing bug where only include files had
BOM correctly skipped, and an additional buffer overflow when parsing
BOMs.
This argument doesn't really make any sense. Anything at the top of
source files intended to be ignored, and not a comment, would be
non-standard.
@dvander dvander merged commit 2afae87 into master Nov 19, 2018
@dvander dvander deleted the split-headers-p2 branch November 19, 2018 07:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant