[string] String(decoding:as:) fast path for withContiguousStorageIfAvailable #30729

milseman · 2020-03-31T01:08:42Z

Switch String(decoding:as:) and other entry points to call
withContiguousStorageIfAvailable rather than use _HasContiguousBytes.
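Whether a call can take the new fast path depends on the argument's collection type. `withContiguousStorageIfAvailable` returns `nil` for a collection that cannot lend out a buffer. The helper below is hypothetical and not standard library code. It is a small sketch that reports which standard collections would qualify, assuming their usual conformances.

```swift
// Hypothetical helper (not part of the standard library): returns true when the
// collection can lend out its elements as an UnsafeBufferPointer, i.e. when it
// would qualify for a withContiguousStorageIfAvailable-based fast path.
func exposesContiguousStorage<C: Collection>(_ collection: C) -> Bool {
  collection.withContiguousStorageIfAvailable { _ in true } ?? false
}

let bytes: [UInt8] = [0x61, 0x62, 0x63]
print(exposesContiguousStorage(bytes))                                // true (Array)
print(exposesContiguousStorage(bytes[1...]))                          // true (ArraySlice)
print(exposesContiguousStorage(repeatElement(UInt8(0x61), count: 3))) // false (Repeated)
print(exposesContiguousStorage(bytes.lazy.map { $0 &+ 1 }))           // false (lazy map)
```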

rdar://problem/59148099, SR-12125

milseman · 2020-03-31T01:08:55Z

@swift-ci please benchmark

swift-ci · 2020-03-31T02:05:51Z

Performance: -O

Regression	OLD	NEW	DELTA	RATIO
ObjectiveCBridgeStubFromNSDate	6150	6630	+7.8%	0.93x (?)

Improvement	OLD	NEW	DELTA	RATIO
DataToStringEmpty	900	200	-77.8%	4.50x
DataToStringSmall	2800	2100	-25.0%	1.33x (?)
DataToStringMedium	5050	4250	-15.8%	1.19x (?)
DataToStringLargeUnicode	6850	5950	-13.1%	1.15x (?)
ObjectiveCBridgeStubToNSDate2	610	550	-9.8%	1.11x (?)
EqualStringSubstring	45	41	-8.9%	1.10x (?)
SortStringsUnicode	3110	2860	-8.0%	1.09x
EqualSubstringSubstring	44	41	-6.8%	1.07x (?)
LessSubstringSubstring	44	41	-6.8%	1.07x
EqualSubstringSubstringGenericEquatable	44	41	-6.8%	1.07x (?)
EqualSubstringString	44	41	-6.8%	1.07x
LessSubstringSubstringGenericComparable	44	41	-6.8%	1.07x

Code size: -O

Improvement	OLD	NEW	DELTA	RATIO
UTF8Decode.o	13436	13059	-2.8%	1.03x
DataBenchmarks.o	85647	84701	-1.1%	1.01x

Performance: -Osize

Improvement	OLD	NEW	DELTA	RATIO
DataToStringEmpty	900	150	-83.3%	6.00x
DataToStringSmall	2800	2100	-25.0%	1.33x
DataToStringMedium	5050	4200	-16.8%	1.20x (?)
DataToStringLargeUnicode	6950	5950	-14.4%	1.17x (?)
DataCountMedium	28	25	-10.7%	1.12x (?)
CharacterLiteralsLarge	111	100	-9.9%	1.11x (?)
ObjectiveCBridgeStubToNSStringRef	125	113	-9.6%	1.11x (?)
EqualSubstringSubstring	45	41	-8.9%	1.10x
LessSubstringSubstring	45	41	-8.9%	1.10x (?)
EqualStringSubstring	45	41	-8.9%	1.10x (?)
EqualSubstringSubstringGenericEquatable	45	41	-8.9%	1.10x (?)
LessSubstringSubstringGenericComparable	45	41	-8.9%	1.10x
SortStringsUnicode	3180	2930	-7.9%	1.09x (?)
EqualSubstringString	44	41	-6.8%	1.07x
CharacterLiteralsSmall	345	322	-6.7%	1.07x (?)

Code size: -Osize

Improvement	OLD	NEW	DELTA	RATIO
UTF8Decode.o	11551	11191	-3.1%	1.03x
DataBenchmarks.o	74343	73461	-1.2%	1.01x

Performance: -Onone

Improvement	OLD	NEW	DELTA	RATIO
DataToStringEmpty	1400	750	-46.4%	1.87x
UTF8Decode_InitDecoding_ascii	464	344	-25.9%	1.35x
DataToStringMedium	5950	4950	-16.8%	1.20x (?)
DataToStringLargeUnicode	7750	6600	-14.8%	1.17x (?)
EqualSubstringString	53	48	-9.4%	1.10x (?)
UTF8Decode_InitDecoding	289	264	-8.7%	1.09x (?)
EqualSubstringSubstringGenericEquatable	51	47	-7.8%	1.09x (?)
LessSubstringSubstringGenericComparable	51	47	-7.8%	1.09x (?)

Code size: -swiftlibs

How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview

  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB

Catfish-Man

Wonderful!

milseman · 2020-03-31T18:13:28Z

@swift-ci please benchmark

weissi

Awesome, thank you! @milseman shouldn't we add a test, though, that validates we're doing the right thing?

swift-ci · 2020-03-31T18:42:33Z

Performance: -O

Regression	OLD	NEW	DELTA	RATIO
Set.isSuperset.Seq.Empty.Int	42	50	+19.0%	0.84x
Set.isDisjoint.Seq.Int.Empty	43	51	+18.6%	0.84x
Set.isDisjoint.Box.Empty	87	102	+17.2%	0.85x (?)
Set.isDisjoint.Seq.Box.Empty	79	89	+12.7%	0.89x
Set.subtracting.Empty.Box	8	9	+12.5%	0.89x (?)
ArrayLiteral2	69	76	+10.1%	0.91x (?)
StringHasSuffixAscii	1320	1450	+9.8%	0.91x (?)
Set.isStrictSubset.Seq.Int.Empty	112	123	+9.8%	0.91x
Set.isSubset.Int.Empty	47	51	+8.5%	0.92x (?)
Set.isSubset.Seq.Int.Empty	114	123	+7.9%	0.93x (?)

Improvement	OLD	NEW	DELTA	RATIO
DataToStringEmpty	550	100	-81.8%	5.50x
DataToStringSmall	1900	1450	-23.7%	1.31x (?)
DataToStringMedium	3800	3150	-17.1%	1.21x
MapReduceClass2	22	19	-13.6%	1.16x (?)
DataToStringLargeUnicode	4700	4100	-12.8%	1.15x (?)
RemoveWhereMoveInts	19	17	-10.5%	1.12x (?)
RandomShuffleLCG2	464	416	-10.3%	1.12x
AngryPhonebook.ASCII.Small	12	11	-8.3%	1.09x
ArraySetElement	284	262	-7.7%	1.08x (?)
RemoveWhereFilterInts	28	26	-7.1%	1.08x (?)
PrefixWhileSequence	177	165	-6.8%	1.07x (?)

Code size: -O

Improvement	OLD	NEW	DELTA	RATIO
UTF8Decode.o	13436	13059	-2.8%	1.03x
DataBenchmarks.o	85647	84701	-1.1%	1.01x

Performance: -Osize

Regression	OLD	NEW	DELTA	RATIO
DataCountSmall	15	17	+13.3%	0.88x
Set.isSuperset.Seq.Empty.Int	47	52	+10.6%	0.90x (?)
DataCreateEmptyArray	1100	1200	+9.1%	0.92x (?)
ObjectiveCBridgeStubToNSDate2	330	360	+9.1%	0.92x (?)
Set.isStrictSubset.Seq.Int.Empty	120	130	+8.3%	0.92x (?)

Improvement	OLD	NEW	DELTA	RATIO
DataToStringEmpty	550	100	-81.8%	5.50x
DataCreateMedium	6300	4100	-34.9%	1.54x
DataToStringSmall	1900	1500	-21.1%	1.27x
DataToStringMedium	3700	3150	-14.9%	1.17x (?)
Array2D	4544	3984	-12.3%	1.14x (?)
DataToStringLargeUnicode	4550	4050	-11.0%	1.12x (?)
AngryPhonebook.ASCII.Small	12	11	-8.3%	1.09x (?)
ArrayPlusEqualFiveElementCollection	4810	4440	-7.7%	1.08x (?)
CharacterLiteralsSmall	226	209	-7.5%	1.08x (?)
ArraySetElement	283	262	-7.4%	1.08x (?)
CharacterLiteralsLarge	74	69	-6.8%	1.07x (?)

Code size: -Osize

Improvement	OLD	NEW	DELTA	RATIO
UTF8Decode.o	11551	11191	-3.1%	1.03x
DataBenchmarks.o	74343	73461	-1.2%	1.01x

Performance: -Onone

Regression	OLD	NEW	DELTA	RATIO
ParseInt.UInt32.Hex	6272	6972	+11.2%	0.90x

Improvement	OLD	NEW	DELTA	RATIO
DataToStringEmpty	850	400	-52.9%	2.12x
UTF8Decode_InitDecoding_ascii	281	223	-20.6%	1.26x
DataToStringSmall	2200	1800	-18.2%	1.22x
DataToStringMedium	4000	3500	-12.5%	1.14x (?)
DataToStringLargeUnicode	4850	4450	-8.2%	1.09x (?)
StringToDataEmpty	750	700	-6.7%	1.07x (?)

Code size: -swiftlibs

Hardware Overview

  Model Name: Mac mini
  Model Identifier: Macmini8,1
  Processor Name: 6-Core Intel Core i7
  Processor Speed: 3.2 GHz
  Number of Processors: 1
  Total Number of Cores: 6
  L2 Cache (per Core): 256 KB
  L3 Cache: 12 MB
  Memory: 64 GB

milseman · 2020-03-31T19:39:11Z

@weissi

shouldn't we add a test, though, that validates we're doing the right thing?

There shouldn't be observable behavior differences from this change, just performance. But I do agree we should add a benchmark. I'll work on that.
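One way to spot-check the "no observable behavior difference" claim is to decode the same bytes through an Array, which exposes contiguous storage, and through a lazily mapped view, which does not. Then compare the results. This sketch assumes that `lazy.map` never provides contiguous storage. `decodeBothWays` is a hypothetical helper. The invalid byte `0xFF` is included so that error repair (U+FFFD substitution) runs on both paths.

```swift
// Hypothetical helper: decode identical bytes via the contiguous path (Array)
// and via a collection without contiguous storage (a lazy map view).
func decodeBothWays(_ bytes: [UInt8]) -> (contiguous: String, noncontiguous: String) {
  let contiguous = String(decoding: bytes, as: UTF8.self)
  // The identity lazy map does not expose contiguous storage, so this decode
  // has to go through the fallback path.
  let noncontiguous = String(decoding: bytes.lazy.map { $0 }, as: UTF8.self)
  return (contiguous, noncontiguous)
}

let result = decodeBothWays([0x68, 0x69, 0xFF])  // "hi" followed by an invalid byte
print(result.contiguous == result.noncontiguous)   // true
```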

weissi · 2020-04-01T10:11:49Z

@milseman what about a LIT test like the ones I wrote when I last made sure more things are hitting the fast paths? https://github.com/apple/swift/pull/21743/files#diff-ba4d75f225736a97fc5a116d6ecf0821R1

milseman · 2020-04-01T18:40:35Z

@swift-ci please test

milseman · 2020-04-01T18:40:41Z

@swift-ci please benchmark

swift-ci · 2020-04-01T18:43:16Z

Build failed
Swift Test Linux Platform
Git Sha - 4192fef9ec1e34d4427f2a10de5981e5fe319525

swift-ci · 2020-04-01T18:44:25Z

Build failed
Swift Test OS X Platform
Git Sha - 4192fef9ec1e34d4427f2a10de5981e5fe319525

milseman · 2020-04-01T18:46:31Z

what about a LIT test like the ones I wrote when I last made sure more things are hitting the fast paths?

Doh, I think you're right. The benchmarks I added are probably still useful (especially the non-contiguous one and the contrast between the two), but a check on the SIL is nicer to enforce this.
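To make the contiguous/non-contiguous contrast concrete, here is a sketch of the two kinds of custom collection such benchmarks could measure. `WrappedContiguous` and `WrappedNoncontiguous` are hypothetical names, and the real benchmark types may be built differently. The two types differ in one respect only. The first forwards `withContiguousStorageIfAvailable` to its backing array. The second inherits the default implementation, which returns `nil`.

```swift
/// Hypothetical: wraps an array and forwards contiguous access to it.
struct WrappedContiguous: Collection {
  var storage: [UInt8]
  var startIndex: Int { storage.startIndex }
  var endIndex: Int { storage.endIndex }
  subscript(position: Int) -> UInt8 { storage[position] }
  func index(after i: Int) -> Int { storage.index(after: i) }

  // Lend out the backing array's buffer, so decoders can skip copying.
  func withContiguousStorageIfAvailable<R>(
    _ body: (UnsafeBufferPointer<UInt8>) throws -> R
  ) rethrows -> R? {
    try storage.withUnsafeBufferPointer { try body($0) }
  }
}

/// Hypothetical: same elements, but keeps the default
/// withContiguousStorageIfAvailable, which returns nil.
struct WrappedNoncontiguous: Collection {
  var storage: [UInt8]
  var startIndex: Int { storage.startIndex }
  var endIndex: Int { storage.endIndex }
  subscript(position: Int) -> UInt8 { storage[position] }
  func index(after i: Int) -> Int { storage.index(after: i) }
}

let bytes = [UInt8]("Hello, wörld".utf8)
let fast = String(decoding: WrappedContiguous(storage: bytes), as: UTF8.self)
let slow = String(decoding: WrappedNoncontiguous(storage: bytes), as: UTF8.self)
```

Both decodes must produce the same string. Only their cost should differ.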

swift-ci · 2020-04-01T19:11:02Z

Performance: -O

Regression	OLD	NEW	DELTA	RATIO
Set.subtracting.Empty.Box	7	9	+28.6%	0.78x
Set.isDisjoint.Seq.Int.Empty	43	52	+20.9%	0.83x
Set.isSuperset.Seq.Empty.Int	42	50	+19.0%	0.84x
ArrayLiteral2	67	79	+17.9%	0.85x (?)
Set.isDisjoint.Seq.Box.Empty	78	91	+16.7%	0.86x
Set.isDisjoint.Box.Empty	88	102	+15.9%	0.86x
DataSubscriptSmall	13	15	+15.4%	0.87x
Set.isSubset.Seq.Int.Empty	111	128	+15.3%	0.87x (?)
UTF8Decode_InitDecoding	145	167	+15.2%	0.87x
Set.isStrictSubset.Seq.Int.Empty	106	122	+15.1%	0.87x (?)
Set.isSubset.Int.Empty	47	51	+8.5%	0.92x (?)
Set.isStrictSuperset.Seq.Empty.Int	154	166	+7.8%	0.93x (?)

Improvement	OLD	NEW	DELTA	RATIO
DataToStringEmpty	550	100	-81.8%	5.50x
Data.append.Sequence.64kB.Count.RE.I	44	21	-52.3%	2.10x
Data.append.Sequence.64kB.Count.RE	44	21	-52.3%	2.10x (?)
Data.append.Sequence.809B.Count.RE	92	66	-28.3%	1.39x
Data.append.Sequence.809B.Count.RE.I	92	66	-28.3%	1.39x
DataAppendSequence	9100	6600	-27.5%	1.38x
DataToStringSmall	1900	1450	-23.7%	1.31x
Dictionary4	193	151	-21.8%	1.28x (?)
Dictionary4OfObjects	223	190	-14.8%	1.17x (?)
DataToStringMedium	3750	3200	-14.7%	1.17x (?)
DataCountSmall	15	13	-13.3%	1.15x
DataCountMedium	15	13	-13.3%	1.15x
RemoveWhereMoveInts	19	17	-10.5%	1.12x (?)
Data.append.Sequence.64kB.Count.I	29	26	-10.3%	1.12x (?)
Data.append.Sequence.64kB.Count	29	26	-10.3%	1.12x (?)
DataToStringLargeUnicode	4600	4150	-9.8%	1.11x (?)
AngryPhonebook.ASCII.Small	12	11	-8.3%	1.09x (?)
LazilyFilteredArrayContains	21200	19500	-8.0%	1.09x (?)
DataSetCountSmall	63	58	-7.9%	1.09x (?)
PrefixWhileAnySeqCRangeIter	193	178	-7.8%	1.08x (?)
ArraySetElement	284	262	-7.7%	1.08x (?)
FlattenListFlatMap	5967	5513	-7.6%	1.08x (?)
RandomShuffleLCG2	448	416	-7.1%	1.08x (?)
Data.append.Sequence.809B.Count	70	65	-7.1%	1.08x (?)
PrefixWhileSequence	180	168	-6.7%	1.07x (?)

Added	MIN	MAX	MEAN	MAX_RSS
UTF8Decode_InitFromCustom_contiguous	181	183	182	—
UTF8Decode_InitFromCustom_contiguous_ascii	212	216	214	—
UTF8Decode_InitFromCustom_contiguous_ascii_as_ascii	241	247	245	—
UTF8Decode_InitFromCustom_noncontiguous	392	397	394	—
UTF8Decode_InitFromCustom_noncontiguous_ascii	902	908	905	—
UTF8Decode_InitFromCustom_noncontiguous_ascii_as_ascii	582	591	586	—

Code size: -O

Regression	OLD	NEW	DELTA	RATIO
UTF8Decode.o	13468	25298	+87.8%	0.53x

Improvement	OLD	NEW	DELTA	RATIO
DataBenchmarks.o	69973	69027	-1.4%	1.01x

Performance: -Osize

Regression	OLD	NEW	DELTA	RATIO
Dictionary4	200	256	+28.0%	0.78x
Dictionary4OfObjects	262	324	+23.7%	0.81x
UTF8Decode_InitDecoding	144	167	+16.0%	0.86x
Set.isStrictSubset.Seq.Int.Empty	111	128	+15.3%	0.87x
Set.isDisjoint.Seq.Int.Empty	46	53	+15.2%	0.87x
Set.isSubset.Seq.Int.Empty	108	123	+13.9%	0.88x (?)
UTF8Decode_InitFromData	152	171	+12.5%	0.89x (?)
Set.subtracting.Empty.Box	8	9	+12.5%	0.89x (?)
Set.isStrictSuperset.Seq.Empty.Int	155	173	+11.6%	0.90x (?)
Set.isSuperset.Seq.Empty.Int	46	51	+10.9%	0.90x (?)
UTF8Decode	284	312	+9.9%	0.91x (?)
Set.isDisjoint.Int.Empty	53	58	+9.4%	0.91x (?)

Improvement	OLD	NEW	DELTA	RATIO
DataToStringEmpty	550	100	-81.8%	5.50x
DataToStringSmall	1900	1500	-21.1%	1.27x
DataToStringMedium	3700	3150	-14.9%	1.17x
Array2D	4544	4000	-12.0%	1.14x (?)
DataToStringLargeUnicode	4650	4100	-11.8%	1.13x (?)
RandomShuffleLCG2	496	448	-9.7%	1.11x (?)
ObjectiveCBridgeStringIsEqualAllSwift	53	48	-9.4%	1.10x (?)
UTF8Decode_InitDecoding_ascii_as_ascii	219	202	-7.8%	1.08x (?)
ArrayPlusEqualFiveElementCollection	4810	4440	-7.7%	1.08x (?)
PrefixWhileAnySeqCntRange	198	183	-7.6%	1.08x (?)

Added	MIN	MAX	MEAN	MAX_RSS
UTF8Decode_InitFromCustom_contiguous	180	186	183	—
UTF8Decode_InitFromCustom_contiguous_ascii	210	212	211	—
UTF8Decode_InitFromCustom_contiguous_ascii_as_ascii	244	250	246	—
UTF8Decode_InitFromCustom_noncontiguous	318	318	318	—
UTF8Decode_InitFromCustom_noncontiguous_ascii	672	680	675	—
UTF8Decode_InitFromCustom_noncontiguous_ascii_as_ascii	718	718	718	—

Code size: -Osize

Regression	OLD	NEW	DELTA	RATIO
UTF8Decode.o	11535	23698	+105.4%	0.49x

Improvement	OLD	NEW	DELTA	RATIO
DataBenchmarks.o	57526	56676	-1.5%	1.01x

Performance: -Onone

Improvement	OLD	NEW	DELTA	RATIO
DataToStringEmpty	850	400	-52.9%	2.12x
UTF8Decode_InitDecoding_ascii	287	213	-25.8%	1.35x
DataToStringSmall	2200	1750	-20.5%	1.26x
DataToStringMedium	4100	3550	-13.4%	1.15x (?)
DataToStringLargeUnicode	5050	4450	-11.9%	1.13x (?)
StringFromLongWholeSubstring	10	9	-10.0%	1.11x
ObjectiveCBridgeStringIsEqualAllSwift	53	49	-7.5%	1.08x (?)
ObjectiveCBridgeStringIsEqual2	194	180	-7.2%	1.08x (?)
ArrayAppendAsciiSubstring	33660	31320	-7.0%	1.07x (?)
ArrayAppendLatin1Substring	33984	31644	-6.9%	1.07x (?)
ArrayAppendUTF16Substring	33516	31320	-6.6%	1.07x (?)

Added	MIN	MAX	MEAN	MAX_RSS
UTF8Decode_InitFromCustom_contiguous	190	194	192	—
UTF8Decode_InitFromCustom_contiguous_ascii	266	275	271	—
UTF8Decode_InitFromCustom_contiguous_ascii_as_ascii	268	270	269	—
UTF8Decode_InitFromCustom_noncontiguous	31180	31336	31233	—
UTF8Decode_InitFromCustom_noncontiguous_ascii	95907	97210	96429	—
UTF8Decode_InitFromCustom_noncontiguous_ascii_as_ascii	97034	97365	97224	—

Code size: -swiftlibs

✅	Benchmark Check Report
⛔️🔤	`UTF8Decode_InitFromCustom_contiguous_ascii_as_ascii` name doesn't conform to benchmark naming convention. (See http://bit.ly/BenchmarkNaming)
⛔️🔤	`UTF8Decode_InitFromCustom_contiguous_ascii_as_ascii` name is 51 characters long. (Benchmark name should not be longer than 40 characters.)
⛔️🔤	`UTF8Decode_InitFromCustom_noncontiguous_ascii_as_ascii` name doesn't conform to benchmark naming convention. (See http://bit.ly/BenchmarkNaming)
⛔️🔤	`UTF8Decode_InitFromCustom_noncontiguous_ascii_as_ascii` name is 54 characters long. (Benchmark name should not be longer than 40 characters.)
⛔️🔤	`UTF8Decode_InitFromCustom_noncontiguous` name doesn't conform to benchmark naming convention. (See http://bit.ly/BenchmarkNaming)
⛔️🔤	`UTF8Decode_InitFromCustom_noncontiguous_ascii` name doesn't conform to benchmark naming convention. (See http://bit.ly/BenchmarkNaming)
⛔️🔤	`UTF8Decode_InitFromCustom_noncontiguous_ascii` name is 45 characters long. (Benchmark name should not be longer than 40 characters.)
⛔️🔤	`UTF8Decode_InitFromCustom_contiguous_ascii` name doesn't conform to benchmark naming convention. (See http://bit.ly/BenchmarkNaming)
⛔️🔤	`UTF8Decode_InitFromCustom_contiguous_ascii` name is 42 characters long. (Benchmark name should not be longer than 40 characters.)
⛔️🔤	`UTF8Decode_InitFromCustom_contiguous` name doesn't conform to benchmark naming convention. (See http://bit.ly/BenchmarkNaming)

Hardware Overview

  Model Name: Mac mini
  Model Identifier: Macmini8,1
  Processor Name: 6-Core Intel Core i7
  Processor Speed: 3.2 GHz
  Number of Processors: 1
  Total Number of Cores: 6
  L2 Cache (per Core): 256 KB
  L3 Cache: 12 MB
  Memory: 64 GB

milseman · 2020-04-01T23:11:46Z

That's a little all over the place.

@swift-ci please benchmark

swift-ci · 2020-04-01T23:35:00Z

Performance: -O

Regression	OLD	NEW	DELTA	RATIO
Set.subtracting.Empty.Box	7	9	+28.6%	0.78x
Set.isSuperset.Seq.Empty.Int	42	50	+19.0%	0.84x
Set.isDisjoint.Seq.Int.Empty	43	51	+18.6%	0.84x (?)
Set.isDisjoint.Seq.Box.Empty	78	91	+16.7%	0.86x (?)
Set.isStrictSubset.Seq.Int.Empty	106	123	+16.0%	0.86x
UTF8Decode_InitDecoding	144	167	+16.0%	0.86x
Set.isDisjoint.Box.Empty	88	102	+15.9%	0.86x
DataSubscriptSmall	13	15	+15.4%	0.87x
Set.isSubset.Seq.Int.Empty	111	128	+15.3%	0.87x (?)
ArrayLiteral2	67	77	+14.9%	0.87x (?)
Set.isStrictSubset.Int.Empty	46	52	+13.0%	0.88x (?)
Data.init.Sequence.2047B.Count.I	54	60	+11.1%	0.90x (?)
Set.isStrictSuperset.Seq.Empty.Int	152	167	+9.9%	0.91x (?)
Data.init.Sequence.2049B.Count.I	55	60	+9.1%	0.92x (?)
Set.isSubset.Int.Empty	47	51	+8.5%	0.92x (?)

Improvement	OLD	NEW	DELTA	RATIO
DataToStringEmpty	550	100	-81.8%	5.50x
Data.append.Sequence.64kB.Count.RE.I	44	21	-52.3%	2.10x
Data.append.Sequence.64kB.Count.RE	44	21	-52.3%	2.10x
Data.append.Sequence.809B.Count.RE.I	93	65	-30.1%	1.43x
Data.append.Sequence.809B.Count.RE	92	65	-29.3%	1.42x
DataAppendSequence	9200	6600	-28.3%	1.39x
FlattenListFlatMap	5928	4401	-25.8%	1.35x (?)
Dictionary4	192	151	-21.4%	1.27x (?)
DataToStringSmall	1900	1500	-21.1%	1.27x
DataCountSmall	16	13	-18.7%	1.23x
Dictionary4OfObjects	224	190	-15.2%	1.18x (?)
DataToStringMedium	3750	3200	-14.7%	1.17x
Data.append.Sequence.64kB.Count	29	25	-13.8%	1.16x
DataCountMedium	15	13	-13.3%	1.15x (?)
RemoveWhereMoveInts	19	17	-10.5%	1.12x (?)
Data.append.Sequence.64kB.Count.I	29	26	-10.3%	1.12x (?)
PrefixWhileSequence	180	163	-9.4%	1.10x (?)
PrefixWhileAnySeqCRangeIter	193	177	-8.3%	1.09x (?)
PrefixWhileAnySeqCntRange	195	179	-8.2%	1.09x (?)
ArraySetElement	284	261	-8.1%	1.09x (?)

Added	MIN	MAX	MEAN	MAX_RSS
UTF8Decode_InitFromCustom_contiguous	175	176	175	—
UTF8Decode_InitFromCustom_contiguous_ascii	200	203	202	—
UTF8Decode_InitFromCustom_contiguous_ascii_as_ascii	226	231	229	—
UTF8Decode_InitFromCustom_noncontiguous	379	383	380	—
UTF8Decode_InitFromCustom_noncontiguous_ascii	869	906	887	—
UTF8Decode_InitFromCustom_noncontiguous_ascii_as_ascii	562	566	565	—

Code size: -O

Regression	OLD	NEW	DELTA	RATIO
UTF8Decode.o	13468	25298	+87.8%	0.53x

Improvement	OLD	NEW	DELTA	RATIO
DataBenchmarks.o	69973	69027	-1.4%	1.01x

Performance: -Osize

Regression	OLD	NEW	DELTA	RATIO
Dictionary4	200	255	+27.5%	0.78x
Dictionary4OfObjects	261	325	+24.5%	0.80x
Set.isDisjoint.Seq.Int.Empty	46	53	+15.2%	0.87x (?)
UTF8Decode_InitDecoding	143	164	+14.7%	0.87x
Set.isStrictSubset.Seq.Int.Empty	111	126	+13.5%	0.88x
Set.isSubset.Seq.Int.Empty	108	122	+13.0%	0.89x
Set.isStrictSuperset.Seq.Empty.Int	155	174	+12.3%	0.89x
Set.isSuperset.Seq.Empty.Int	46	51	+10.9%	0.90x (?)
UTF8Decode	284	309	+8.8%	0.92x (?)

Improvement	OLD	NEW	DELTA	RATIO
DataToStringEmpty	550	100	-81.8%	5.50x
DataToStringSmall	1900	1450	-23.7%	1.31x
DataToStringMedium	3700	3150	-14.9%	1.17x
Array2D	4544	4000	-12.0%	1.14x (?)
UTF8Decode_InitDecoding_ascii_as_ascii	210	187	-11.0%	1.12x (?)
DataToStringLargeUnicode	4600	4100	-10.9%	1.12x
ArrayAppendLazyMap	4670	4210	-9.9%	1.11x (?)
MapReduceAnyCollection	237	216	-8.9%	1.10x (?)
MapReduce	239	218	-8.8%	1.10x (?)
StringUTF16Builder	230	210	-8.7%	1.10x (?)
AngryPhonebook.ASCII.Small	12	11	-8.3%	1.09x (?)
ArraySetElement	283	261	-7.8%	1.08x (?)
ArrayPlusEqualFiveElementCollection	4810	4440	-7.7%	1.08x (?)
ObjectiveCBridgeStringIsEqualAllSwift	53	49	-7.5%	1.08x (?)
Set.subtracting.Empty.Int	28	26	-7.1%	1.08x (?)
PrefixWhileAnySeqCntRange	197	184	-6.6%	1.07x (?)

Added	MIN	MAX	MEAN	MAX_RSS
UTF8Decode_InitFromCustom_contiguous	173	175	174	—
UTF8Decode_InitFromCustom_contiguous_ascii	197	201	198	—
UTF8Decode_InitFromCustom_contiguous_ascii_as_ascii	223	225	224	—
UTF8Decode_InitFromCustom_noncontiguous	303	307	306	—
UTF8Decode_InitFromCustom_noncontiguous_ascii	631	661	645	—
UTF8Decode_InitFromCustom_noncontiguous_ascii_as_ascii	688	694	691	—

Code size: -Osize

Regression	OLD	NEW	DELTA	RATIO
UTF8Decode.o	11535	23698	+105.4%	0.49x

Improvement	OLD	NEW	DELTA	RATIO
DataBenchmarks.o	57526	56676	-1.5%	1.01x

Performance: -Onone

Regression	OLD	NEW	DELTA	RATIO
Set.isDisjoint.Int.Empty	374	409	+9.4%	0.91x (?)
Set.subtracting.Seq.Empty.Box	369	403	+9.2%	0.92x (?)

Improvement	OLD	NEW	DELTA	RATIO
DataToStringEmpty	850	400	-52.9%	2.12x
UTF8Decode_InitDecoding_ascii	294	213	-27.6%	1.38x
DataToStringSmall	2250	1750	-22.2%	1.29x
DataToStringMedium	4050	3550	-12.3%	1.14x (?)
StringFromLongWholeSubstring	10	9	-10.0%	1.11x
DataToStringLargeUnicode	4850	4450	-8.2%	1.09x (?)
ArrayAppendLatin1Substring	33984	31644	-6.9%	1.07x (?)
ArrayAppendAsciiSubstring	33624	31320	-6.9%	1.07x (?)
ArrayAppendUTF16Substring	33516	31320	-6.6%	1.07x (?)

Added	MIN	MAX	MEAN	MAX_RSS
UTF8Decode_InitFromCustom_contiguous	188	191	189	—
UTF8Decode_InitFromCustom_contiguous_ascii	275	275	275	—
UTF8Decode_InitFromCustom_contiguous_ascii_as_ascii	253	260	256	—
UTF8Decode_InitFromCustom_noncontiguous	29447	31166	30155	—
UTF8Decode_InitFromCustom_noncontiguous_ascii	91890	98340	95259	—
UTF8Decode_InitFromCustom_noncontiguous_ascii_as_ascii	94629	98169	96737	—

Code size: -swiftlibs

✅	Benchmark Check Report
⛔️🔤	`UTF8Decode_InitFromCustom_contiguous_ascii_as_ascii` name doesn't conform to benchmark naming convention. (See http://bit.ly/BenchmarkNaming)
⛔️🔤	`UTF8Decode_InitFromCustom_contiguous_ascii_as_ascii` name is 51 characters long. (Benchmark name should not be longer than 40 characters.)
⛔️🔤	`UTF8Decode_InitFromCustom_noncontiguous_ascii_as_ascii` name doesn't conform to benchmark naming convention. (See http://bit.ly/BenchmarkNaming)
⛔️🔤	`UTF8Decode_InitFromCustom_noncontiguous_ascii_as_ascii` name is 54 characters long. (Benchmark name should not be longer than 40 characters.)
⛔️🔤	`UTF8Decode_InitFromCustom_noncontiguous` name doesn't conform to benchmark naming convention. (See http://bit.ly/BenchmarkNaming)
⛔️🔤	`UTF8Decode_InitFromCustom_noncontiguous_ascii` name doesn't conform to benchmark naming convention. (See http://bit.ly/BenchmarkNaming)
⛔️🔤	`UTF8Decode_InitFromCustom_noncontiguous_ascii` name is 45 characters long. (Benchmark name should not be longer than 40 characters.)
⛔️🔤	`UTF8Decode_InitFromCustom_contiguous_ascii` name doesn't conform to benchmark naming convention. (See http://bit.ly/BenchmarkNaming)
⛔️🔤	`UTF8Decode_InitFromCustom_contiguous_ascii` name is 42 characters long. (Benchmark name should not be longer than 40 characters.)
⛔️⏱	`UTF8Decode_InitFromCustom_contiguous_ascii` has setup overhead of 36 μs (17.9%). (Move initialization of benchmark data to the setUpFunction registered in BenchmarkInfo.)
⛔️🔤	`UTF8Decode_InitFromCustom_contiguous` name doesn't conform to benchmark naming convention. (See http://bit.ly/BenchmarkNaming)

Hardware Overview

  Model Name: Mac mini
  Model Identifier: Macmini8,1
  Processor Name: 6-Core Intel Core i7
  Processor Speed: 3.2 GHz
  Number of Processors: 1
  Total Number of Cores: 6
  L2 Cache (per Core): 256 KB
  L3 Cache: 12 MB
  Memory: 64 GB

milseman · 2020-04-02T00:05:18Z

Ugh, so with this change, release-no-assertions builds pass the test, but release-with-assertions builds have different IRGen behavior: calls are now marked as tail calls. Most vexingly,

define swiftcc { i64, %swift.bridge* } @"$s22utf8_decoding_fastpath16decodeUMBPAsUTF8ySSSrys5UInt8VGF"(i64 %0, i64 %1) #0 {
  %3 = tail call swiftcc { i64, %swift.bridge* } @"$s22utf8_decoding_fastpath15decodeUBPAsUTF8ySSSRys5UInt8VGF"(i64 %0, i64 %1) #0
  ret { i64, %swift.bridge* } %3
}

Meaning that in assertion builds, the UMBP version just calls the UBP version (which is actually fine, and a good idea; presumably an assertion is preventing some other optimization from messing this up).
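The mangled names in the IR demangle to `decodeUBPAsUTF8(_: UnsafeBufferPointer<UInt8>) -> String` and `decodeUMBPAsUTF8(_: UnsafeMutableBufferPointer<UInt8>) -> String`. The functions below are a plausible reconstruction of such test functions, not the actual test file. Both bodies reduce to the same operation on a (pointer, count) pair. That is consistent with LLVM merging them and leaving one as a tail call to the other.

```swift
// Reconstruction for illustration; the real test file may differ.
public func decodeUBPAsUTF8(_ ptr: UnsafeBufferPointer<UInt8>) -> String {
  return String(decoding: ptr, as: UTF8.self)
}

public func decodeUMBPAsUTF8(_ ptr: UnsafeMutableBufferPointer<UInt8>) -> String {
  // Same body shape as above. After optimization the two functions are likely
  // identical at the machine level, which makes them candidates for merging.
  return String(decoding: ptr, as: UTF8.self)
}
```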

milseman · 2020-04-02T00:30:40Z

... and the code-size regression for the benchmark file is not great, since it will be conflated with the new benchmarks I'm adding. I'm going to spin those off, because I think I have a change that will also improve code size.

atrick · 2020-04-02T03:50:20Z

@milseman you could try converting those tests to .sil to see if that helps. I'm not sure why they were .ll tests to begin with.

I personally don't think the benchmark code size numbers are meaningful. They might correlate with real code size or might just be random. SCK code size is relevant. I think "swift-ci compiler performance" or something like that gets you aggregate code size for the SCK. We don't have any way to get a per project breakdown without running it locally. This takes time, which is why I haven't checked in anything that significantly perturbs codegen in the past year!

weissi · 2020-04-02T12:27:04Z

@milseman awesome. Definitely agree that the benchmarks are super useful but SIL checks are just more stable. Regarding the tail calls to other functions: I've added -swiftmergefunc-threshold=0 to get around that in my tests, so that similar functions are no longer merged.

milseman · 2020-04-02T17:43:49Z

@atrick It's more that I saw that both the fast path and the path that makes an Array out of the code units were always-inline, and the Array initializer loops are probably also always-inline. I haven't checked the disassembly, but this would be tragic if so.

edit: That only matters if the check for whether the collection supports withContiguousStorageIfAvailable cannot be inlined. I still think it's a good idea to hand-outline the Array construction path, for compilation time at the very least.

atrick · 2020-04-02T20:16:31Z

@milseman I think it's a good idea to hand-outline the slow paths.

If you see problems where always-inline functions are inlined into slow paths, I think that should be fixed by adding an @inline(_optimize) attribute, which would not inline into slow paths but would only override size-based heuristics, and then migrating most of the stdlib functions to that annotation instead. That requires first adding parser support, then, once everyone has the new compiler, starting to use the attribute.

milseman · 2020-04-05T23:40:01Z

@swift-ci please benchmark

milseman · 2020-04-08T16:03:28Z

For those following along, migrating these test cases to SIL is pending @atrick's SIL optimizer changes. The optimizer is unable to constant-fold the metatype comparison, so the entirety of the code is present in SIL form (LLVM later optimizes this). It is important to fold that comparison at the SIL level to accurately represent the actual program being compiled. Since 99+% of the time these generics are known at compilation time, failing to eliminate the branch to the slow path inside this always-inline function grossly distorts any size-based heuristics.
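A metatype comparison here means a check such as `Encoding.self == Unicode.UTF8.self` inside generic code. The comment does not say which exact comparison is involved. The function below is only a hypothetical illustration of the shape, not the stdlib's code. Once it is specialized for a concrete encoding, the condition becomes a constant. Folding that constant is what lets the untaken branch disappear before any size heuristics look at the function.

```swift
// Hypothetical illustration of a generic function that branches on a metatype.
func decodingPath<Encoding: Unicode.Encoding>(for _: Encoding.Type) -> String {
  // After specialization for a concrete Encoding this comparison is a
  // compile-time constant; an optimizer that folds it can drop the other branch.
  if Encoding.self == Unicode.UTF8.self {
    return "utf8 fast path"
  }
  return "generic transcoding path"
}
```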

milseman · 2020-04-09T01:00:37Z

Update: UnsafeRawBufferPointer (and mutable, and slices) cannot implement withContiguousStorageIfAvailable as that would create a typed pointer from untyped memory.

The longer-term solution to avoid having two checks is to add a withContiguousRawStorageIfAvailable that runs a closure over UnsafeRawBufferPointer. It can have a default implementation that calls withContiguousStorageIfAvailable and constructs the raw pointer from that (so existing wCSIA clients get this new path for free). Then, we can run only over the new API.
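Here is a sketch of that layering, written as a protocol with a default implementation. `RawContiguousAccessible` is a hypothetical name used only for illustration. `withContiguousRawStorageIfAvailable` is the proposed API, not one that exists in the standard library. The default implementation reuses the typed `withContiguousStorageIfAvailable` and converts to a raw buffer, which is always safe. `UnsafeRawBufferPointer` answers directly and never forms a typed pointer.

```swift
// Hypothetical protocol; withContiguousRawStorageIfAvailable is the proposed name,
// not an existing standard library API.
protocol RawContiguousAccessible: Sequence {
  func withContiguousRawStorageIfAvailable<R>(
    _ body: (UnsafeRawBufferPointer) throws -> R
  ) rethrows -> R?
}

extension RawContiguousAccessible {
  // Default: reuse the typed fast path; typed-to-raw conversion is always safe.
  func withContiguousRawStorageIfAvailable<R>(
    _ body: (UnsafeRawBufferPointer) throws -> R
  ) rethrows -> R? {
    try withContiguousStorageIfAvailable { try body(UnsafeRawBufferPointer($0)) }
  }
}

// Raw buffers hand themselves over without ever binding memory to a type.
extension UnsafeRawBufferPointer: RawContiguousAccessible {
  func withContiguousRawStorageIfAvailable<R>(
    _ body: (UnsafeRawBufferPointer) throws -> R
  ) rethrows -> R? {
    try body(self)
  }
}

// Existing wCSIA implementers get the raw form for free.
extension Array: RawContiguousAccessible {}
```

The raw view is measured in bytes. For example, a two-element `[Int32]` reports a count of 8.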

For now, I'll have to keep the _HasContiguousBytes call and just add a wCSIA fast path. Luckily, with @atrick's SIL peephole optimizations, that can fold away at the SIL level. _HasContiguousBytes has a known, closed set of conformances, so it can (in theory) fold away too. The only downside is that a generic function calling String(decoding:as:) would carry code for both paths, but luckily that isn't large, and it is offset by my outlining of the slow Array-creating path.

karwa · 2020-04-09T12:13:21Z

@milseman Yes, that’s what I implemented in #22028

Unfortunately, basically the entire internal (ABI-exposed) String API works in terms of typed pointers. So supporting URBP would require duplicating all of those entry points.

Even though _HasContiguousBytes vends a raw pointer (so URBP can conform), the String initialiser makes a typed pointer using assumingMemoryBound, which might be an invalid binding. So the old code technically has UB, if I understand memory binding correctly (which I might not).

milseman · 2020-04-09T17:16:50Z

@karwa Doh, I forgot about that. I like most of the changes in that PR, but we'll want to do a little extra to preserve the ABI. For this PR I'll try to add the wCSIA fast-path.

Unfortunately, basically the entire internal (ABI-exposed) String API works in terms of typed pointers. So supporting URBP would require duplicating all of those entry points.

We can change that by just having them remove the typed-ness and call raw internal functions. It is always type safe to go from typed to untyped, it's the reverse that's tricky.
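Here is a minimal sketch of the "remove the typed-ness" direction, using the hypothetical names `_isASCII` and `isASCII`. The internal routine only ever sees an `UnsafeRawBufferPointer`. The typed entry point converts with `UnsafeRawBufferPointer(_:)`, which is always allowed. Reading bytes through a raw buffer loads them without assuming any type binding.

```swift
// Hypothetical internal routine that works purely on raw memory.
func _isASCII(_ raw: UnsafeRawBufferPointer) -> Bool {
  // Iterating a raw buffer loads each byte as UInt8 without binding the memory.
  raw.allSatisfy { $0 < 0x80 }
}

// Hypothetical typed entry point: drop the element type and forward.
func isASCII(_ typed: UnsafeBufferPointer<UInt8>) -> Bool {
  _isASCII(UnsafeRawBufferPointer(typed))
}
```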

Even though _HasContiguousBytes vends a raw pointer (so URBP can conform), the String initialiser makes a typed pointer using assumingMemoryBound, which might be an invalid binding. So the old code technically has UB, if I understand memory binding correctly (which I might not).

This would expose the potential for UB if there was an existing type binding for that memory, that is, if there were any typed pointers that weren't over UInt8. IIUC (please correct me @atrick), those typed pointers would additionally have to be non-trivial in this specific case, because the code we're running is self-contained (no aliasing potential).

So in this case, since the set of _HasContiguousBytes is closed, the one UB scenario would be: if a caller of String(decoding:as:) were to pass in a URBP that held non-trivially-bound memory. So we should probably fix this by changing our ABI entry points to be over raw memory. We should add a public withRawContiguousStorageIfAvailable (as you did privately in your PR) going through SE. We'll want to do this in an ABI preserving fashion.

Let me get this performance optimization in now, and we can try to get the right ABI in place.

atrick · 2020-04-09T17:23:35Z

TLDR: @milseman's plan is good

Ultimately, I think it would be better if the character decoding utilities did not require typed pointers. Ideally, there would be a different abstraction over a raw buffer and an encoding type.

However, that is not necessary in order to avoid undefined behavior. Internally, the String implementation can expose a typed pointer for the duration of a closure. As long as we know that arbitrary code does not access the same memory using a pointer with a different type, then it's safe. The String implementation can guarantee this because it does not execute any arbitrary user code during decoding. To make this 100% safe, we eventually need some compiler support (URBP.withMemoryRebound), but for now we can use assumingMemoryBound(to:) as a placeholder. It will be safe within reason here for various reasons. Eventually, I'll vet those uses of assumingMemoryBound(to:) and convert them into a URBP.withMemoryRebound.
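Here is a sketch of the end state described above. It assumes a toolchain recent enough to provide `UnsafeRawBufferPointer.withMemoryRebound(to:_:)`, which later Swift versions added. `decodeRawUTF8` is a hypothetical name. The bytes are viewed as `UInt8` only for the duration of the closure, and no arbitrary user code runs inside it.

```swift
// Hypothetical helper: decode UTF-8 from raw bytes by temporarily viewing them
// as UInt8, instead of using assumingMemoryBound(to:).
func decodeRawUTF8(_ raw: UnsafeRawBufferPointer) -> String {
  raw.withMemoryRebound(to: UInt8.self) { typed in
    // Only the decoder touches the typed view; no user code runs here.
    String(decoding: typed, as: UTF8.self)
  }
}
```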

Sorry, I don't have time to show example code, but hopefully you get the idea. The important thing is that we have multiple viable approaches to the String implementation. The real question is how to allow 3rd-party Collections to participate in fast character decoding, and in other fast paths that require contiguous storage.

There has never been any question in my mind that we need something like withContiguousRawStorageIfAvailable (or withBytesIfAvailable) as a lowest common denominator to handle any byte-buffer-like collection. Note: Any upcoming ByteBuffer APIs will need this. If we had to choose between the raw form and the existing wCSIA, we would need to choose the raw form. We can always create a typed view over raw bytes; we just can't use the strictly typed UnsafePointer/UBP to do that. On the other hand, any collection that already implements wCSIA will automatically provide the raw form via a default implementation.

Switch String(decoding:as) and other entry points to call withContiguousStorageIfAvailable rather than use _HasContiguousBytes.

Outline the cold, non-contiguous UTF-8 path from String(decoding:as:), saving ~40 bytes (33%) of code size (x86_64 and arm64) from every call site where the contiguity check cannot be constant folded away.
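A rough sketch of the shape of this change. It assumes a simple copy-then-decode fallback is acceptable for illustration. `decodeUTF8` and `decodeUTF8Slow` are hypothetical names, not the stdlib's internal functions. The contiguous case is handled inline via `withContiguousStorageIfAvailable`. The non-contiguous case is outlined with `@inline(never)`, so each call site only carries a call instead of the whole slow path:

```swift
// Cold path: kept out of line so callers only pay for a call instruction.
@inline(never)
func decodeUTF8Slow<C: Collection>(_ input: C) -> String
where C.Element == UInt8 {
  // Copy the non-contiguous bytes into contiguous storage, then decode.
  let copy = Array(input)
  return copy.withUnsafeBufferPointer { String(decoding: $0, as: UTF8.self) }
}

// Hot path: decode straight from contiguous storage when it is available.
func decodeUTF8<C: Collection>(_ input: C) -> String
where C.Element == UInt8 {
  if let result = input.withContiguousStorageIfAvailable({
    String(decoding: $0, as: UTF8.self)
  }) {
    return result
  }
  return decodeUTF8Slow(input)
}

print(decodeUTF8(Array("swift".utf8)))           // contiguous: prints swift
print(decodeUTF8("swift".utf8.lazy.map { $0 }))  // outlined path: prints swift
```

A lazy map has no contiguous storage, so the second call exercises the outlined path.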

UnsafeRawBufferPointer cannot implement withContiguousStorageIfAvailable because doing so would potentially create a typed pointer from untyped data.
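To illustrate the resulting dispatch order, here is a sketch that checks `withContiguousStorageIfAvailable` first and only then falls back to a raw-bytes protocol for untyped storage such as `UnsafeRawBufferPointer`. `HasContiguousBytes` and `withRawBytes` are hypothetical stand-ins for the stdlib-internal `_HasContiguousBytes`, whose real requirements may differ:

```swift
// Hypothetical stand-in for the stdlib-internal _HasContiguousBytes.
protocol HasContiguousBytes {
  func withRawBytes<R>(_ body: (UnsafeRawBufferPointer) throws -> R) rethrows -> R
}

// Untyped storage can always hand out its bytes without creating a typed pointer.
extension UnsafeRawBufferPointer: HasContiguousBytes {
  func withRawBytes<R>(_ body: (UnsafeRawBufferPointer) throws -> R) rethrows -> R {
    return try body(self)
  }
}

func decodeBytes<C: Collection>(_ input: C) -> String where C.Element == UInt8 {
  // 1. Typed contiguous storage (Array, ArraySlice, UnsafeBufferPointer, ...).
  if let s = input.withContiguousStorageIfAvailable({ String(decoding: $0, as: UTF8.self) }) {
    return s
  }
  // 2. Untyped contiguous storage, found via a dynamic cast.
  if let raw = input as? HasContiguousBytes {
    return raw.withRawBytes { String(decoding: $0, as: UTF8.self) }
  }
  // 3. Generic fallback for everything else.
  return String(decoding: Array(input), as: UTF8.self)
}

let bytes = Array("raw".utf8)
let fromArray = decodeBytes(bytes)                       // typed storage
let fromRaw = bytes.withUnsafeBytes { decodeBytes($0) }  // untyped storage
print(fromArray, fromRaw) // prints raw raw
```

Step 2 is a dynamic cast. That matches the thread's observation that the `_HasContiguousBytes` check is not always optimized away.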

milseman · 2020-04-09T20:39:54Z

@swift-ci please benchmark

swift-ci · 2020-04-09T21:36:08Z

Performance: -O

Regression	OLD	NEW	DELTA	RATIO
DictionaryBridgeToObjC_Access	920	1095	+19.0%	0.84x (?)
ArrayAppendLazyMap	6090	7180	+17.9%	0.85x
UTF8Decode_InitFromCustom_noncontiguous_ascii_as_ascii	1005	1174	+16.8%	0.86x (?)
LazilyFilteredArrayContains	36600	41500	+13.4%	0.88x
StringBuilderWithLongSubstring	1630	1790	+9.8%	0.91x (?)

Improvement	OLD	NEW	DELTA	RATIO
UTF8Decode_InitFromCustom_contiguous_ascii	981	293	-70.1%	3.35x
UTF8Decode_InitFromCustom_contiguous_ascii_as_ascii	1017	471	-53.7%	2.16x
UTF8Decode_InitFromCustom_contiguous	456	257	-43.6%	1.77x
FlattenListLoop	5286	4700	-11.1%	1.12x (?)
RemoveWhereMoveInts	36	33	-8.3%	1.09x (?)
RemoveWhereSwapInts	64	59	-7.8%	1.08x (?)
Array2D	7520	6944	-7.7%	1.08x (?)
MapReduce	371	343	-7.5%	1.08x
MapReduceClass2	40	37	-7.5%	1.08x (?)
ArrayPlusEqualFiveElementCollection	8436	7807	-7.5%	1.08x (?)
ArrayInClass	1625	1510	-7.1%	1.08x (?)
DistinctClassFieldAccesses	325	302	-7.1%	1.08x (?)
MapReduceAnyCollection	397	369	-7.1%	1.08x (?)
FlattenListFlatMap	9562	8895	-7.0%	1.07x (?)

Code size: -O

Performance: -Osize

Regression	OLD	NEW	DELTA	RATIO
String.data.Empty	38	46	+21.1%	0.83x (?)
String.data.Small	41	49	+19.5%	0.84x (?)
ArrayAppendLazyMap	6980	7900	+13.2%	0.88x (?)
DataToStringEmpty	850	950	+11.8%	0.89x (?)
FloatingPointPrinting_Float_description_small	5076	5616	+10.6%	0.90x

Improvement	OLD	NEW	DELTA	RATIO
UTF8Decode_InitFromCustom_contiguous_ascii	1005	282	-71.9%	3.56x
UTF8Decode_InitFromCustom_contiguous_ascii_as_ascii	1153	433	-62.4%	2.66x
UTF8Decode_InitFromCustom_contiguous	457	255	-44.2%	1.79x
FlattenListLoop	5184	3993	-23.0%	1.30x (?)
CharacterLiteralsLarge	111	100	-9.9%	1.11x (?)
RemoveWhereMoveInts	37	34	-8.1%	1.09x (?)
Array2D	7520	6944	-7.7%	1.08x (?)
ArrayPlusEqualFiveElementCollection	8325	7696	-7.6%	1.08x (?)
RemoveWhereSwapInts	67	62	-7.5%	1.08x
ArrayInClass	1625	1510	-7.1%	1.08x (?)
DistinctClassFieldAccesses	326	303	-7.1%	1.08x (?)
RandomTree.insert.Unmanaged.fast	207	193	-6.8%	1.07x (?)
MapReduceAnyCollection	433	404	-6.7%	1.07x
MapReduce	435	406	-6.7%	1.07x (?)
CharacterLiteralsSmall	345	322	-6.7%	1.07x (?)

Code size: -Osize

Regression	OLD	NEW	DELTA	RATIO
UTF8Decode.o	24154	24826	+2.8%	0.97x

Performance: -Onone

Regression	OLD	NEW	DELTA	RATIO
String.data.LargeUnicode	127	153	+20.5%	0.83x (?)
DataToStringEmpty	1400	1600	+14.3%	0.88x (?)
Diffing.Same	8	9	+12.5%	0.89x (?)
String.data.Empty	42	47	+11.9%	0.89x (?)
String.data.Small	45	49	+8.9%	0.92x (?)

Improvement	OLD	NEW	DELTA	RATIO
UTF8Decode_InitFromCustom_contiguous_ascii	127185	489	-99.6%	260.09x
UTF8Decode_InitFromCustom_contiguous_ascii_as_ascii	127032	503	-99.6%	252.55x
UTF8Decode_InitFromCustom_contiguous	40216	303	-99.2%	132.73x

Code size: -swiftlibs

How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview

  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB

milseman · 2020-04-09T23:18:30Z

Interesting that restoring the _HasContiguousBytes check dropped the Data benchmark gains.

@Catfish-Man, it seems highly likely that the check isn't getting eliminated, even at the LLVM IR level. That might be an interesting thing to explore for some easy Data->String wins.

milseman · 2020-04-09T23:34:51Z

The style of the benchmarks makes it frustratingly hard to crawl through the disassembly, but I found that the check is not removed:

_$s14DataBenchmarks6string_4fromySi_10Foundation0A0VtF
...
0000000000008f75         call       _$s10Foundation4DataV15_RepresentationOWOy  ; _$s10Foundation4DataV15_RepresentationOWOy
0000000000008f7a         lea        rdi, qword [_$ss19_HasContiguousBytes_pMD]  ; argument #1 for method ___swift_instantiateConcreteTypeFromMangledName, _$ss19_HasContiguousBytes_pMD
0000000000008f81         call       ___swift_instantiateConcreteTypeFromMangledName ; ___swift_instantiateConcreteTypeFromMangledName
0000000000008f86         mov        r8d, 0x6

atrick · 2020-04-10T00:39:25Z

@milseman I'm not sure what decodeUBPAsUTF8 is. I don't think any @_semantics calls are involved, so I can't think of a known pass-pipeline problem. If a function marked @inline(__always) is not being inlined, the optimizer should be fixed ASAP, because that's your last resort. You should file a bug, but if you need it fixed before merging this PR, we should find someone who has time to look at the -Xllvm -debug-only=sil-inliner output.

milseman · 2020-04-10T17:31:21Z

Currently working around the issue that Substring's wCSIA closure is not being inlined, despite the Builtin.onFastPath call.

milseman · 2020-04-10T18:40:41Z

@swift-ci please test

milseman · 2020-04-10T18:41:39Z

test/SILOptimizer/utf8_decoding_fastpath.swift

+//
+// NOTE: The SIL optimizer cannot currently fold away a (UTF16.self ==
+// UTF8.self) metatype comparison, so we have to disable the check-not for UTF-8
+// construction :-(


@atrick, should metatype inequality be foldable as well?

It should be easy; it just didn't fall out naturally from handling equality. You also have to check that there's no potential relationship between the type values. If you file a bug, I or someone else can circle back to it.
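For context, here is a sketch of the kind of generic branch this test cares about, using a hypothetical `describeEncoding` helper. After specialization, each call compares two concrete metatypes. The `UTF8.self` case can fold to `true`. The `UTF16.self == UTF8.self` case is the comparison the test note says the SIL optimizer could not yet fold to `false`:

```swift
// Hypothetical generic helper that branches on the encoding's metatype.
// Once specialized for a concrete encoding, the comparison is between two
// known types, so an optimizer can in principle drop the dead branch.
func describeEncoding<Encoding: Unicode.Encoding>(_: Encoding.Type) -> String {
  if Encoding.self == UTF8.self {
    return "UTF-8 fast path"
  }
  return "transcoding path"
}

print(describeEncoding(UTF8.self))   // prints UTF-8 fast path
print(describeEncoding(UTF16.self))  // prints transcoding path
```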

swift-ci · 2020-04-10T18:43:23Z

Build failed
Swift Test Linux Platform
Git Sha - f041354a830e3e32b415af91b35327124b2c823d

swift-ci · 2020-04-10T18:44:22Z

Build failed
Swift Test OS X Platform
Git Sha - f041354a830e3e32b415af91b35327124b2c823d

weissi · 2022-05-04T14:51:59Z

fixed.

milseman requested a review from Catfish-Man March 31, 2020 01:08

Catfish-Man approved these changes Mar 31, 2020

milseman force-pushed the merge_me_if_available branch from 4192fef to 97426c5 March 31, 2020 18:11

milseman changed the title ~~[string] Change String(decoding:as:) to use withContiguousStorageIfAv…~~ [string] _HasContiguousBytes -> withContiguousStorageIfAvailable Mar 31, 2020

milseman requested a review from weissi March 31, 2020 18:13

weissi approved these changes Mar 31, 2020

milseman force-pushed the merge_me_if_available branch from 97426c5 to b19e658 April 1, 2020 18:40

milseman requested a review from Catfish-Man April 1, 2020 18:41

Catfish-Man approved these changes Apr 1, 2020

atrick mentioned this pull request Apr 8, 2020

Add Builtin.is_same_metatype to SILCombine. #30892

Merged

milseman added 4 commits April 9, 2020 13:38

[gardening] Delete Trailing Whitespace

19b332c

[string] _HasContiguousBytes -> withContiguousStorageIfAvailable

c263100

Switch String(decoding:as) and other entry points to call withContiguousStorageIfAvailable rather than use _HasContiguousBytes.

[string] Outline cold path from initializer

e536ad2

Outline the cold, non-contiguous UTF-8 path from String(decoding:as:), saving ~40 bytes (33%) of code size (x86_64 and arm64) from every call site where the contiguity check cannot be constant folded away.

[string] Restore _HasContiguousBytes for untyped storage

ae224ca

UnsafeRawBufferPointer cannot implement withContiguousStorageIfAvailable because doing so would potentially create a typed pointer from untyped data.

milseman force-pushed the merge_me_if_available branch from f041354 to ae224ca April 9, 2020 20:39

milseman changed the title ~~[string] _HasContiguousBytes -> withContiguousStorageIfAvailable~~ [string] String(decoding:as:) fast path for withContiguousStorageIfAvailable Apr 10, 2020

milseman added 2 commits April 10, 2020 11:39

[string] Move wCSIA check higher than _HasContiguousBytes

d02f5bc

[string] Convert IR tests to SIL tests

38fce16

milseman commented Apr 10, 2020

View reviewed changes

milseman merged commit c1ab69e into apple:master Apr 10, 2020

weissi mentioned this pull request Jun 10, 2020

investigate ByteBufferView supporting fast String(decoding:as:) as a workaround for SR-12125 apple/swift-nio#1379

Closed

milseman mentioned this pull request Mar 31, 2020

[SR-12125] String(decoding:from:) doesn't try withContiguousStorageIfAvailable #54560

Closed

CodaFi mentioned this pull request Aug 1, 2020

[SR-12126] String(decoding:as:) allocates if passed an ArraySlice<UInt8> #54561

Closed

milseman deleted the merge_me_if_available branch May 4, 2022 16:56

[string] String(decoding:as:) fast path for withContiguousStorageIfAvailable #30729

Conversation

milseman commented Mar 31, 2020 • edited

milseman commented Mar 31, 2020

swift-ci commented Mar 31, 2020

Catfish-Man left a comment

milseman commented Mar 31, 2020

weissi left a comment

swift-ci commented Mar 31, 2020

milseman commented Mar 31, 2020

weissi commented Apr 1, 2020

milseman commented Apr 1, 2020

milseman commented Apr 1, 2020

swift-ci commented Apr 1, 2020

swift-ci commented Apr 1, 2020

milseman commented Apr 1, 2020

swift-ci commented Apr 1, 2020

milseman commented Apr 1, 2020

swift-ci commented Apr 1, 2020

milseman commented Apr 2, 2020

milseman commented Apr 2, 2020

atrick commented Apr 2, 2020

weissi commented Apr 2, 2020

milseman commented Apr 2, 2020 • edited

atrick commented Apr 2, 2020 • edited

milseman commented Apr 5, 2020

milseman commented Apr 8, 2020

milseman commented Apr 9, 2020

karwa commented Apr 9, 2020 • edited

milseman commented Apr 9, 2020

atrick commented Apr 9, 2020

milseman commented Apr 9, 2020

swift-ci commented Apr 9, 2020

milseman commented Apr 9, 2020

milseman commented Apr 9, 2020

atrick commented Apr 10, 2020

milseman commented Apr 10, 2020

milseman commented Apr 10, 2020

milseman Apr 10, 2020

atrick Apr 10, 2020

swift-ci commented Apr 10, 2020

swift-ci commented Apr 10, 2020

weissi commented May 4, 2022