[stdlib][SR-7556] Re-implement string-to-integer parsing #36623

Merged
merged 11 commits into from Apr 3, 2021

xwu
Collaborator

@xwu xwu commented Mar 28, 2021

This PR attempts to improve code size and performance for string-to-integer parsing without introducing any additions to the ABI. As noted in the relevant bug:

Constructing an Int from a String eventually involves calling _parseASCII, which is slow and bloated if not properly specialized.

A series of unfortunate events played out over time: this function was overly generic and so was marked @inline(__always) to be fast; then it was discovered that the function was about 20KB in size, and its callers were all marked with @inline(never) and various @_semantics attributes to disable specialization for size (and perhaps compilation time).

The end result was something bloated and slow that is nonetheless still emitted into the user module, increasing code size.

To confront this issue, this PR introduces a new _parseASCII. Instead of taking an argument of type T: IteratorProtocol where T.Element: UnsignedInteger, it takes a buffer of type UnsafeBufferPointer<UInt8>. This is made possible by calling the withContiguousStorageIfAvailable API on the UTF-8 view of a StringProtocol value.
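For illustration, here is a minimal sketch of what parsing directly from an UnsafeBufferPointer<UInt8> can look like. It is not the stdlib's _parseASCII: the name parseASCIISketch is hypothetical, and it is written as a free generic function over the result type. It accepts an optional leading sign followed by digits 0-9, a-z, and A-Z, and returns nil for empty input, a lone sign, a digit outside the radix, or overflow.

```swift
/// Hypothetical sketch (not the stdlib's `_parseASCII`): parses an integer of
/// type `T` from ASCII bytes in the given radix. Returns nil for empty input,
/// a lone sign, a byte that is not a valid digit in `radix`, or overflow.
func parseASCIISketch<T: FixedWidthInteger>(
  _ bytes: UnsafeBufferPointer<UInt8>, radix: Int
) -> T? {
  precondition((2...36).contains(radix), "Radix not in range 2...36")
  guard let first = bytes.first else { return nil }

  // Consume an optional leading sign; a sign with no digits is invalid.
  var index = bytes.startIndex
  let isNegative = first == UInt8(ascii: "-")
  if isNegative || first == UInt8(ascii: "+") {
    index += 1
    guard index < bytes.endIndex else { return nil }
  }

  let base = T(radix)
  var result: T = 0
  while index < bytes.endIndex {
    let byte = bytes[index]
    // Map the ASCII byte to its digit value: 0-9, then a/A = 10 ... z/Z = 35.
    let digit: Int
    switch byte {
    case UInt8(ascii: "0")...UInt8(ascii: "9"): digit = Int(byte - UInt8(ascii: "0"))
    case UInt8(ascii: "a")...UInt8(ascii: "z"): digit = Int(byte - UInt8(ascii: "a")) + 10
    case UInt8(ascii: "A")...UInt8(ascii: "Z"): digit = Int(byte - UInt8(ascii: "A")) + 10
    default: return nil
    }
    guard digit < radix else { return nil }

    // result = result * radix + digit (or - digit when negative), with overflow checks.
    let (product, mulOverflow) = result.multipliedReportingOverflow(by: base)
    let (next, addOverflow) = isNegative
      ? product.subtractingReportingOverflow(T(digit))
      : product.addingReportingOverflow(T(digit))
    guard !mulOverflow && !addOverflow else { return nil }
    result = next
    index += 1
  }
  return result
}

// Usage: String.withUTF8 exposes a String's UTF-8 bytes as an UnsafeBufferPointer<UInt8>.
var text = "-128"
let parsed: Int8? = text.withUTF8 { parseASCIISketch($0, radix: 10) }
```

Accumulating negative values by subtraction, rather than negating a positive result at the end, lets T.min (for example, -128 for Int8) parse without overflowing.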

If contiguous storage is not available, an @inline(never) fallback is called instead; it initializes a mutable String value and then uses the withUTF8 API. We confine this to the fallback path so that a mutable copy is created only when it cannot be avoided.
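The dispatch between the contiguous fast path and the copying fallback might be sketched as follows. Both function names are hypothetical, and the body closure stands in for a buffer-based parser; only withContiguousStorageIfAvailable and String.withUTF8 are real standard library APIs here.

```swift
/// Hypothetical helper (not a stdlib API): calls `body` with the UTF-8 bytes of
/// `text`, copying only when the bytes are not already stored contiguously.
@inline(__always)
func withUTF8BytesSketch<S: StringProtocol, R>(
  _ text: S, _ body: (UnsafeBufferPointer<UInt8>) -> R
) -> R {
  // Fast path: no copy when the UTF-8 view can expose contiguous storage.
  if let result = text.utf8.withContiguousStorageIfAvailable(body) {
    return result
  }
  return withUTF8BytesSlowPath(text, body)
}

/// Out-of-line slow path, so the copying code is not inlined at every call site.
@inline(never)
func withUTF8BytesSlowPath<S: StringProtocol, R>(
  _ text: S, _ body: (UnsafeBufferPointer<UInt8>) -> R
) -> R {
  // `withUTF8` is mutating (it may rewrite the string into native, contiguous
  // UTF-8 storage), which is why a mutable copy is needed here.
  var copy = String(text)
  return copy.withUTF8(body)
}

// Usage: both paths see the same three UTF-8 bytes of a Substring.
let digits = "12345".dropFirst(2)
let fastCount = withUTF8BytesSketch(digits) { $0.count }
let slowCount = withUTF8BytesSlowPath(digits) { $0.count }
```

The intent of marking the slow path @inline(never) is that only the small fast-path check gets inlined into callers.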

The extremely pared-down implementation shown here is the result of several iterative rounds of benchmarking and simplification, described below. It's tempting to consider further specializations for base 8, 10, or 16, but the possible wins appear to be negligible without a significantly more sophisticated implementation (such as that attempted in #30094, as pointed out in the conversation below). Those are deferred in order to prioritize some low-hanging fruit.

Resolves SR-7556.

Summary of findings

Baseline

As a baseline, I used #36625 to demonstrate what would occur if the existing implementation had its @inline(never) and @_semantics markings removed. This revealed sizable improvements in microbenchmark performance but a significant regression in code size (excerpted results below):

PERFORMANCE -O
Improvement
OLD NEW DELTA RATIO
ParseInt.IntSmall.Decimal 437 228 -47.8% 1.92x
StrToInt 1600 860 -46.2% 1.86x
ParseInt.IntSmall.UncommonRadix 481 263 -45.3% 1.83x
ParseInt.UInt64.Decimal 204 153 -25.0% 1.33x
ParseInt.UInt64.Hex 332 272 -18.1% 1.22x
ParseInt.UIntSmall.Binary 641 542 -15.4% 1.18x
CODE SIZE -O
Regression
OLD NEW DELTA RATIO
IntegerParsing.o ⚠️ 56677 88581 +56.3% 0.64x
RangeReplaceableCollectionPlusDefault.o 5442 6902 +26.8% 0.79x
CODE SIZE -O
Improvement
OLD NEW DELTA RATIO
StrToInt.o 4615 3657 -20.8% 1.26x
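The DELTA and RATIO columns are not defined on this page, but the rows above are consistent (up to rounding) with the definitions below, which are inferred from the numbers themselves rather than from the benchmark tooling. For example, ParseInt.IntSmall.Decimal going from 437 to 228 gives (228 − 437)/437 ≈ −47.8% and 437/228 ≈ 1.92x, and IntegerParsing.o going from 56677 to 88581 gives +56.3% and 0.64x.

```latex
\mathrm{DELTA} = \frac{\mathrm{NEW} - \mathrm{OLD}}{\mathrm{OLD}} \times 100\%,
\qquad
\mathrm{RATIO} = \frac{\mathrm{OLD}}{\mathrm{NEW}}
```

Under these definitions a RATIO above 1.00x is an improvement and one below 1.00x is a regression, for run time and code size alike.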

First attempts

The first attempts to improve upon the status quo yielded similar results to the above.

A manually specialized implementation of _parseASCII and a new _parseASCIIDigits were added, both taking an UnsafeBufferPointer<UInt8> argument as the source. Generic versions of both were also kept; with this setup, I attempted to mark the duplicated implementations with different attributes in the hope of fine-tuning code size and performance.

However, even after the generic fallback helper functions were marked @inline(never), microbenchmarks improved but code size regressed significantly (excerpted results below):

PERFORMANCE -O
Improvement
OLD NEW DELTA RATIO
ParseInt.IntSmall.Decimal 437 206 -52.9% 2.12x
ParseInt.IntSmall.UncommonRadix 481 229 -52.4% 2.10x
ParseInt.UInt64.Decimal 203 117 -42.4% 1.74x
StrToInt 1600 990 -38.1% 1.62x
ParseInt.UIntSmall.Binary 641 418 -34.8% 1.53x
ParseInt.UInt64.Hex 331 272 -17.8% 1.22x
CODE SIZE -O
Regression
OLD NEW DELTA RATIO
IntegerParsing.o ⚠️ 56677 90807 +60.2% 0.62x
StrToInt.o 4615 6065 +31.4% 0.76x
RangeReplaceableCollectionPlusDefault.o 5442 7070 +29.9% 0.77x
LuhnAlgoEager.o 11370 13372 +17.6% 0.85x
LuhnAlgoLazy.o 11370 13372 +17.6% 0.85x
DictionaryCompactMapValues.o 13853 15770 +13.8% 0.88x

Minimum code size

After removing all manually repeated code and further simplifying the implementation, I attempted to remove the @inline(__always) marking from FixedWidthInteger.init(_:radix:) and to test the effect of explicitly requiring partial specializations for S == String and for S == Substring using @_specialize(kind: partial, ...).

This decreased the compiled size of the standard library itself by ~1% and improved the code size of the IntegerParsing microbenchmarks. However, it wiped out most performance improvements at -O, except for StrToInt (excerpted results below):

PERFORMANCE -O
Regression
OLD NEW DELTA RATIO
ParseInt.UInt64.Hex 332 365 +9.9% 0.91x
PERFORMANCE -O
Improvement
OLD NEW DELTA RATIO
StrToInt 1600 950 -40.6% 1.68x
CODE SIZE -O
Regression
OLD NEW DELTA RATIO
RangeReplaceableCollectionPlusDefault.o 5442 6220 +14.3% 0.87x
LuhnAlgoEager.o 11370 12298 +8.2% 0.92x
LuhnAlgoLazy.o 11370 12298 +8.2% 0.92x
DictionaryCompactMapValues.o 13853 14714 +6.2% 0.94x
CODE SIZE -O
Improvement
OLD NEW DELTA RATIO
IntegerParsing.o ✅ 56677 55733 -1.7% 1.02x

CODE SIZE: -swiftlibs

Improvement OLD NEW DELTA RATIO
libswiftCore.dylib 3850240 3801088 -1.3% 1.01x

Inlined performance

The final form of this PR restores the @inline(__always) marking to FixedWidthInteger.init(_:radix:) (and simplifies the implementation further). Doing so produces a result where a sizable proportion of the performance benefit seen in the baseline benchmarks can be recovered with a very modest code size increase. As before, the compiled size of the standard library itself is decreased by ~1%.

This implementation relies on no @_semantics annotations and, perhaps relatedly, exhibits performance improvements at -O, -Osize, and -Onone (excerpted -O results below):

PERFORMANCE -O
Improvement
OLD NEW DELTA RATIO
StrToInt 1600 1000 -37.5% 1.60x
ParseInt.IntSmall.UncommonRadix 481 328 -31.8% 1.47x
ParseInt.IntSmall.Decimal 437 302 -30.9% 1.45x
ParseInt.UInt64.Decimal 204 166 -18.6% 1.23x
ParseInt.UIntSmall.Binary 641 585 -8.7% 1.10x
CODE SIZE -O
Regression
OLD NEW DELTA RATIO
LuhnAlgoEager.o 11370 12178 +7.1% 0.93x
LuhnAlgoLazy.o 11370 12178 +7.1% 0.93x
RangeReplaceableCollectionPlusDefault.o 5442 5804 +6.7% 0.94x
StrToInt.o 4615 4859 +5.3% 0.95x
DictionaryCompactMapValues.o 13853 14570 +5.2% 0.95x
IntegerParsing.o 🏁 56677 59557 +5.1% 0.95x

CODE SIZE: -swiftlibs

Improvement OLD NEW DELTA RATIO
libswiftCore.dylib 3850240 3801088 -1.3% 1.01x

(All versions of this PR show varying degrees of code size regressions in LuhnAlgoEager, LuhnAlgoLazy, RangeReplaceableCollectionPlusDefault, and DictionaryCompactMapValues. I have to presume that they are attributable to emitting this new implementation into the client; in the final iteration, these code size increases are the most modest yet.)

@benrimmington
Collaborator

There's also #30094 by @PatrickPijnappel

@xwu
Collaborator Author

xwu commented Mar 28, 2021

@benrimmington It's been long enough that I'd forgotten about that PR 🤦, and the bug doesn't make mention of it. If @PatrickPijnappel wants to finish that one up, happy to set this aside.

The solution presented here is significantly less involved, and I'm curious to see what the benchmarks show. If there's sufficient incremental improvement, this could be landed without blocking a subsequent more sophisticated implementation that makes use of SWAR as @PatrickPijnappel is doing.
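For readers unfamiliar with SWAR (SIMD within a register): the idea is to process several bytes at once with ordinary 64-bit integer arithmetic. The following is a generic sketch of a widely used eight-digits-at-a-time trick, not code taken from #30094; all names are hypothetical, and a real implementation would load the eight bytes with a single unaligned read rather than a loop.

```swift
/// Hypothetical helper: packs exactly eight bytes into a UInt64 with the first
/// byte in the lowest-order position, independent of the host's byte order.
func loadEightBytes(_ bytes: [UInt8]) -> UInt64 {
  precondition(bytes.count == 8, "expected exactly eight bytes")
  var chunk: UInt64 = 0
  for (i, byte) in bytes.enumerated() {
    chunk |= UInt64(byte) << (8 * i)
  }
  return chunk
}

/// True if every byte is an ASCII digit 0x30...0x39: each byte's high nibble
/// must be 3, and adding 6 must not carry it to 4 (which happens for 0x3A...0x3F).
func isEightASCIIDigits(_ chunk: UInt64) -> Bool {
  let highNibbles = chunk & 0xF0F0_F0F0_F0F0_F0F0
  let plusSix = (chunk &+ 0x0606_0606_0606_0606) & 0xF0F0_F0F0_F0F0_F0F0
  return (highNibbles | (plusSix >> 4)) == 0x3333_3333_3333_3333
}

/// Combines eight already-validated ASCII digits into their decimal value by
/// merging adjacent 1-, then 2-, then 4-digit groups with three multiplies.
func parseEightASCIIDigits(_ chunk: UInt64) -> UInt64 {
  var v = chunk & 0x0F0F_0F0F_0F0F_0F0F                // each byte -> digit 0...9
  v = ((v &* 2_561) >> 8) & 0x00FF_00FF_00FF_00FF       // 2561 = 10 * 2^8 + 1: pairs
  v = ((v &* 6_553_601) >> 16) & 0x0000_FFFF_0000_FFFF  // 6553601 = 100 * 2^16 + 1: quads
  return (v &* 42_949_672_960_001) >> 32                // 10_000 * 2^32 + 1: all eight
}

// Usage: validate first, then convert.
let chunk = loadEightBytes(Array("12345678".utf8))
let eightDigitValue: UInt64? = isEightASCIIDigits(chunk) ? parseEightASCIIDigits(chunk) : nil
```

Validation and conversion each take a handful of 64-bit operations instead of eight per-byte loop iterations; handling signs, other radixes, leftover bytes, and overflow on top of this is what makes a full SWAR parser more involved.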

@PatrickPijnappel
Collaborator

I feel bad I didn't get to finishing that PR for such a long time; my work situation changed. It was very close to being merged, just stuck on a final simplification that seemed to change retain/release behavior, wiping out the gains. If you're interested, I'm open to collaborating on that one somehow. I can prioritize some time.

Nevertheless, if this PR delivers significant gains it makes sense to merge it first, especially since it doesn't introduce anything that needs to be maintained from an ABI perspective.

@xwu
Collaborator Author

xwu commented Mar 28, 2021

@PatrickPijnappel I'm also not blessed with a large amount of time these days, sadly. I was more hoping that there was some low-hanging fruit here; if the benchmarks aren't really exciting, I think I'll have to leave this work in others' hands.

@xwu
Collaborator Author

xwu commented Mar 29, 2021

@swift-ci benchmark

@swift-ci
Collaborator

Performance: -O

Regression OLD NEW DELTA RATIO
NSStringConversion.UTF8 935 1041 +11.3% 0.90x (?)
ObjectiveCBridgeFromNSArrayAnyObjectToStringForced 30400 33000 +8.6% 0.92x (?)
ObjectiveCBridgeFromNSArrayAnyObjectForced 4420 4780 +8.1% 0.92x (?)
NSStringConversion.MutableCopy.LongUTF8 636 686 +7.9% 0.93x (?)
 
Improvement OLD NEW DELTA RATIO
ParseInt.IntSmall.Decimal 437 206 -52.9% 2.12x
ParseInt.IntSmall.UncommonRadix 481 229 -52.4% 2.10x
ParseInt.UInt64.Decimal 203 117 -42.4% 1.74x
StrToInt 1600 990 -38.1% 1.62x
ParseInt.UIntSmall.Binary 641 418 -34.8% 1.53x
StringFromLongWholeSubstring 5 4 -20.0% 1.25x
ParseInt.UInt64.Hex 331 272 -17.8% 1.22x
DictionaryCompactMapValuesOfCastValue 7452 6858 -8.0% 1.09x (?)
Data.hash.Medium 42 39 -7.1% 1.08x (?)
AngryPhonebook.Armenian.Small 877 818 -6.7% 1.07x (?)
String.replaceSubrange.ArrChar.Small 76 71 -6.6% 1.07x (?)

Code size: -O

Regression OLD NEW DELTA RATIO
IntegerParsing.o 56677 90807 +60.2% 0.62x
StrToInt.o 4615 6065 +31.4% 0.76x
RangeReplaceableCollectionPlusDefault.o 5442 7070 +29.9% 0.77x
LuhnAlgoEager.o 11370 13372 +17.6% 0.85x
LuhnAlgoLazy.o 11370 13372 +17.6% 0.85x
DictionaryCompactMapValues.o 13853 15770 +13.8% 0.88x
DriverUtils.o 129127 133481 +3.4% 0.97x

Performance: -Osize

Regression OLD NEW DELTA RATIO
RandomShuffleLCG2 416 448 +7.7% 0.93x
DictionaryKeysContainsNative 26 28 +7.7% 0.93x (?)
Array2D 6992 7520 +7.6% 0.93x (?)
 
Improvement OLD NEW DELTA RATIO
ParseInt.IntSmall.Decimal 512 202 -60.5% 2.53x
ParseInt.IntSmall.UncommonRadix 569 228 -59.9% 2.50x
StrToInt 1900 970 -48.9% 1.96x
ParseInt.UIntSmall.Binary 692 428 -38.2% 1.62x
ParseInt.UInt64.Decimal 223 141 -36.8% 1.58x
StringFromLongWholeSubstring 5 4 -20.0% 1.25x
DictionaryCompactMapValuesOfCastValue 7506 6858 -8.6% 1.09x
ParseInt.UInt64.Hex 325 301 -7.4% 1.08x (?)
AngryPhonebook.Armenian.Small 883 823 -6.8% 1.07x (?)

Code size: -Osize

Regression OLD NEW DELTA RATIO
IntegerParsing.o 51810 81283 +56.9% 0.64x
RangeReplaceableCollectionPlusDefault.o 4715 6411 +36.0% 0.74x
StrToInt.o 4404 5400 +22.6% 0.82x
LuhnAlgoEager.o 12327 13929 +13.0% 0.88x
LuhnAlgoLazy.o 12327 13929 +13.0% 0.88x
DictionaryCompactMapValues.o 12248 13804 +12.7% 0.89x
DriverUtils.o 122969 125759 +2.3% 0.98x

Performance: -Onone

Regression OLD NEW DELTA RATIO
ConvertFloatingPoint.MockFloat64ToInt64 49959 53781 +7.7% 0.93x (?)
 
Improvement OLD NEW DELTA RATIO
ParseInt.UIntSmall.Binary 22822 17027 -25.4% 1.34x
ParseInt.UInt64.Decimal 6356 4824 -24.1% 1.32x
ParseInt.UInt64.Hex 5656 4566 -19.3% 1.24x
StrToInt 42680 35060 -17.9% 1.22x
LuhnAlgoEager 4764 4263 -10.5% 1.12x (?)
DictionaryCompactMapValuesOfCastValue 55080 49410 -10.3% 1.11x
RangeReplaceableCollectionPlusDefault 6984 6364 -8.9% 1.10x (?)
ParseInt.IntSmall.UncommonRadix 11331 10336 -8.8% 1.10x (?)
String.replaceSubrange.Substring.Small 87 81 -6.9% 1.07x (?)
ParseInt.IntSmall.Decimal 10017 9359 -6.6% 1.07x (?)

Code size: -swiftlibs

How to read the data: The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB

@xwu xwu marked this pull request as draft March 29, 2021 13:04

@xwu
Collaborator Author

xwu commented Apr 1, 2021

@swift-ci test Linux platform

@xwu
Collaborator Author

xwu commented Apr 1, 2021

@swift-ci test macOS platform

@xwu
Collaborator Author

xwu commented Apr 1, 2021

@swift-ci benchmark

@xwu
Collaborator Author

xwu commented Apr 1, 2021

@swift-ci benchmark

@swift-ci
Collaborator

swift-ci commented Apr 1, 2021

Build failed
Swift Test OS X Platform
Git Sha - 92d492f

@swift-ci
Collaborator

swift-ci commented Apr 1, 2021

Performance: -O

Regression OLD NEW DELTA RATIO
String.data.Medium 95 115 +21.1% 0.83x (?)
FlattenListFlatMap 4763 5273 +10.7% 0.90x (?)
NSError 145 160 +10.3% 0.91x (?)
 
Improvement OLD NEW DELTA RATIO
StrToInt 1430 920 -35.7% 1.55x
ParseInt.IntSmall.Decimal 392 270 -31.1% 1.45x
ParseInt.IntSmall.UncommonRadix 432 299 -30.8% 1.44x
Breadcrumbs.MutatedUTF16ToIdx.ASCII 4 3 -25.0% 1.33x
Breadcrumbs.MutatedIdxToUTF16.ASCII 4 3 -25.0% 1.33x
ParseInt.UInt64.Decimal 184 145 -21.2% 1.27x
ParseInt.UIntSmall.Binary 575 517 -10.1% 1.11x
ParseInt.UInt64.Hex 299 273 -8.7% 1.10x
FindString.Loop1.Substring 455 424 -6.8% 1.07x (?)

Code size: -O

Regression OLD NEW DELTA RATIO
LuhnAlgoEager.o 11370 12162 +7.0% 0.93x
LuhnAlgoLazy.o 11370 12162 +7.0% 0.93x
RangeReplaceableCollectionPlusDefault.o 5442 5788 +6.4% 0.94x
DictionaryCompactMapValues.o 13853 14554 +5.1% 0.95x
StrToInt.o 4615 4843 +4.9% 0.95x
IntegerParsing.o 56677 59477 +4.9% 0.95x

Performance: -Osize

Regression OLD NEW DELTA RATIO
UTF8Decode_InitFromCustom_contiguous_ascii_as_ascii 346 406 +17.3% 0.85x (?)
FlattenListFlatMap 3446 3800 +10.3% 0.91x (?)
DropFirstAnyCollectionLazy 79438 86611 +9.0% 0.92x (?)
DropLastAnyCollectionLazy 26992 29307 +8.6% 0.92x (?)
SuffixAnyCollectionLazy 26343 28487 +8.1% 0.92x (?)
 
Improvement OLD NEW DELTA RATIO
StrToInt 1700 930 -45.3% 1.83x
ParseInt.IntSmall.UncommonRadix 510 307 -39.8% 1.66x
ParseInt.IntSmall.Decimal 459 291 -36.6% 1.58x
ParseInt.UInt64.Decimal 197 151 -23.4% 1.30x
ParseInt.UIntSmall.Binary 621 520 -16.3% 1.19x
DictionaryLiteral 3670 3310 -9.8% 1.11x (?)
DictionaryCompactMapValuesOfCastValue 6696 6210 -7.3% 1.08x (?)

Code size: -Osize

Regression OLD NEW DELTA RATIO
RangeReplaceableCollectionPlusDefault.o 4715 5511 +16.9% 0.86x
LuhnAlgoEager.o 12327 13079 +6.1% 0.94x
LuhnAlgoLazy.o 12327 13079 +6.1% 0.94x
DictionaryCompactMapValues.o 12248 12951 +5.7% 0.95x
IntegerParsing.o 51810 54751 +5.7% 0.95x
StrToInt.o 4404 4547 +3.2% 0.97x

Performance: -Onone

Regression OLD NEW DELTA RATIO
NSDictionaryCastToSwift 2490 3020 +21.3% 0.82x (?)
StringBuilderWithLongSubstring 3920 4710 +20.2% 0.83x (?)
RandomDoubleLCG 39156 42382 +8.2% 0.92x (?)
 
Improvement OLD NEW DELTA RATIO
ParseInt.UIntSmall.Binary 20519 9198 -55.2% 2.23x
ParseInt.UInt64.Decimal 5793 2622 -54.7% 2.21x
StrToInt 38590 18730 -51.5% 2.06x
ParseInt.UInt64.Hex 5133 2645 -48.5% 1.94x
ParseInt.IntSmall.UncommonRadix 10196 5620 -44.9% 1.81x
ParseInt.IntSmall.Decimal 9002 5291 -41.2% 1.70x
DictionaryCompactMapValuesOfCastValue 49410 37746 -23.6% 1.31x
LuhnAlgoEager 4304 3752 -12.8% 1.15x (?)
LuhnAlgoLazy 4218 3788 -10.2% 1.11x (?)
RangeReplaceableCollectionPlusDefault 5884 5356 -9.0% 1.10x (?)

Code size: -swiftlibs

Improvement OLD NEW DELTA RATIO
libswiftCore.dylib 3850240 3784704 -1.7% 1.02x

Hardware Overview
  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 8-Core Intel Xeon E5
  Processor Speed: 3 GHz
  Number of Processors: 1
  Total Number of Cores: 8
  L2 Cache (per Core): 256 KB
  L3 Cache: 25 MB
  Memory: 64 GB

@xwu
Collaborator Author

xwu commented Apr 1, 2021

@milseman I think this is ready.

(The failed macOS tests are also failing in other PRs, e.g., #36669, which suggests they're unrelated to this change.)

@xwu
Collaborator Author

xwu commented Apr 1, 2021

@swift-ci please smoke test windows

@xwu
Collaborator Author

xwu commented Apr 2, 2021

@swift-ci smoke test

@xwu
Collaborator Author

xwu commented Apr 2, 2021

Ugh, really?

@swift-ci smoke test macOS platform

@xwu
Collaborator Author

xwu commented Apr 2, 2021

@swift-ci test macOS platform

@xwu
Collaborator Author

xwu commented Apr 2, 2021

@swift-ci smoke test Windows platform

@xwu
Collaborator Author

xwu commented Apr 2, 2021

@milseman Ship it?

@xwu xwu requested a review from milseman April 3, 2021 00:08
Contributor

@milseman milseman left a comment

LGTM

7 participants