UnicodeScalar operators. #2439

johnno1962 · 2024-01-23T18:49:41Z

Hi Apple,

This PR explores an idea from this thread on Swift Evolution: validating the use of a protocol extension to avoid having to write UInt8(ascii:) all the time in low-level code. TL;DR: this change tidies up the new Lexer code in Cursor.swift considerably, and I haven't been able to measure any performance regression in a Release build. For details of how it was benchmarked, see the rest of the character-literals branch.
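For readers less familiar with the lexer, the style the PR sets out to replace looks roughly like this. It is a minimal, hypothetical helper (the name isIdentifierStart is illustrative and not taken from Cursor.swift) that classifies a single UTF-8 byte; every character constant has to be wrapped in UInt8(ascii:).

```swift
/// Hypothetical lexer-style helper (not from Cursor.swift) written in the
/// conventional style: every character constant is spelled UInt8(ascii:).
func isIdentifierStart(_ byte: UInt8) -> Bool {
  switch byte {
  case UInt8(ascii: "a")...UInt8(ascii: "z"),
       UInt8(ascii: "A")...UInt8(ascii: "Z"),
       UInt8(ascii: "_"):
    return true
  default:
    return false
  }
}

// Usage: classify bytes of a UTF-8 buffer.
let source = Array("_foo1".utf8)
print(isIdentifierStart(source[0]))  // true ("_")
print(isIdentifierStart(source[4]))  // false ("1")
```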

Cheers

ahoppen · 2024-01-24T20:03:40Z

Just to make sure I understand this correctly (I haven’t looked at the code yet):

Is this a PR you would like to merge into swift-syntax as-is, or is it primarily intended as an example of what the code would look like if the code pitched in https://forums.swift.org/t/single-quoted-character-literals-why-yes-again/61898/88 were in the standard library?
To test runtime performance of swift-syntax with these changes, I suggest you build the SwiftParserCLI package and run swift-parser-cli performance-test.

johnno1962 · 2024-01-24T21:24:46Z

Thanks for this info @ahoppen,

At this stage I'm looking to validate the concept using swift-syntax, which is the perfect test project. Your benchmark is more exacting than mine; with the original code I see this as a baseline:

% swift build -c release --package-path SwiftParserCLI
% ./SwiftParserCLI/.build/release/swift-parser-cli performance-test --directory ../swift/test --iterations 10
Time: 742.353892326355ms
Instructions: 10026665060.6

And this for the code I suggested:

Time: 996.0793972015381ms
Instructions: 12108098163.4

I guess that's not good news for this approach. A compromise is to use this extension instead of the operators:

extension UInt8: ExpressibleByUnicodeScalarLiteral {
    /// Make UInt8 expressible by "c" (probably not worth it)
    @_transparent
    public init(unicodeScalarLiteral value: UnicodeScalar) {
        self.init(value.value)
    }
}

This yields something in between:

Time: 875.1207947731018ms
Instructions: 11375745753.9

I imagine it would be difficult to justify a one-time aesthetic code tidy-up which in any way degraded performance.

johnno1962 · 2024-01-24T22:35:14Z

An update: by changing @_transparent to @inline(__always) in my extensions, I get the following results:

Time: 745.8992958068848ms
Instructions: 10250689930.1

For the "ExpressibleBy" extension alternative I mentioned:

Time: 739.4865036010742ms
Instructions: 10204277326.7

i.e. no slowdown, which is much more encouraging!

johnno1962 · 2024-01-25T19:56:14Z

Hi @ahoppen, I've been verifying a few more things, for example that build time is not affected by the new code. Also, as I noted in the evolution post, performance in a Debug build is about 30% down, though this is relative to Debug builds already being far slower (roughly 80x per iteration in the figures here), so perhaps it would be less noticeable.

Debug, original code:
Time: 58800.06802082062ms
Instructions: 787493993112.0

Debug, using this PR:
Time: 86279.06596660614ms
Instructions: 1140467798820.0

If it's of interest to you, I'd like to present this PR for merging into swift-syntax now. My eventual aim is to make the extensions that allow direct comparisons between integers and strings (UnicodeScalars) available in the standard library, but it would be useful if they were thoroughly exercised in another project first, and this would also help make the case when I eventually pitch them for the stdlib. Waiting until they were available in the stdlib would only delay adoption of the new coding style by a number of years. I've checked that the PR builds with a toolchain whose stdlib also includes the new operators.

Over to you (unless you have any other changes you'd like me to make.)

ahoppen · 2024-01-26T03:21:11Z

My preference would be not to take this PR. It makes the code harder to read because it deviates from how UInt8 comparisons are done in any other Swift codebase. Apart from that preference, I also think that any kind of compile-time or build-time regression is not acceptable.

johnno1962 · 2024-01-26T11:27:37Z

Thanks @ahoppen, I quite understand the position: "Why would I make a change that makes the code more difficult to understand and, at the same time, makes it run slower while I'm debugging?"

I've pushed a final commit that uses only a simpler ExpressibleByUnicodeScalarLiteral extension (which introduces a warning with development versions of the compiler). With it, Release and Debug build times and runtime performance are almost identical before and after the PR. If you don't feel the refactor is clearer, however, I guess there isn't much I can do about that.

Debug before:
% time swift build -c debug --package-path SwiftParserCLI
Build complete! (21.25s)
swift build -c debug --package-path SwiftParserCLI 57.91s user 6.08s system 294% cpu 21.750 total
Build complete! (19.49s)
swift build -c debug --package-path SwiftParserCLI 59.07s user 5.73s system 326% cpu 19.867 total
% ./SwiftParserCLI/.build/debug/swift-parser-cli performance-test --directory ../swift/test --iterations 1
Time: 59025.82097053528ms
Instructions: 787613934024.0
Time: 60317.728996276855ms
Instructions: 787577146883.0
Time: 59705.91497421265ms
Instructions: 787433747372.0

Debug after PR:
% time swift build -c debug --package-path SwiftParserCLI
Build complete! (21.92s)
swift build -c debug --package-path SwiftParserCLI 59.02s user 7.42s system 295% cpu 22.466 total
Build complete! (21.61s)
swift build -c debug --package-path SwiftParserCLI 60.11s user 6.79s system 304% cpu 21.998 total
% ./SwiftParserCLI/.build/debug/swift-parser-cli performance-test --directory ../swift/test --iterations 1
Time: 58846.640944480896ms
Instructions: 791415934358.0
Time: 58897.018909454346ms
Instructions: 792114724719.0
Time: 58796.53799533844ms
Instructions: 792059332230.0

Release before:
% time swift build -c release --package-path SwiftParserCLI
Build complete! (171.27s)
swift build -c release --package-path SwiftParserCLI 249.60s user 24.43s system 159% cpu 2:51.81 total
Build complete! (163.93s)
swift build -c release --package-path SwiftParserCLI 248.68s user 19.04s system 162% cpu 2:44.37 total
% ./SwiftParserCLI/.build/release/swift-parser-cli performance-test --directory ../swift/test --iterations 10
Time: 741.7734026908875ms
Instructions: 10027644813.8
Time: 741.2122011184692ms
Instructions: 10030413083.4
Time: 739.5735025405884ms
Instructions: 10027683834.2

Release after PR:
% time swift build -c release --package-path SwiftParserCLI
Build complete! (160.12s)
swift build -c release --package-path SwiftParserCLI 248.15s user 15.97s system 164% cpu 2:40.65 total
Build complete! (159.21s)
swift build -c release --package-path SwiftParserCLI 247.30s user 16.22s system 165% cpu 2:39.65 total
% ./SwiftParserCLI/.build/release/swift-parser-cli performance-test --directory ../swift/test --iterations 10
Time: 745.3409910202026ms
Instructions: 10204463495.4
Time: 742.9486036300659ms
Instructions: 10198957963.7
Time: 742.7903056144714ms
Instructions: 10200009868.6

Unfortunately, using an ExpressibleBy extension could never be in the standard library as it is a bit of a loose cannon and would allow nonsense expressions such as "a" * "a" to be valid in an integer context. So, I guess the space is well and truly explored. Thanks for your time, the performance-test benchmark has been a great help.
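To illustrate the "loose cannon" concern: once UInt8 conforms to ExpressibleByUnicodeScalarLiteral, character literals become valid anywhere a UInt8 is expected, including in arithmetic. This minimal sketch repeats the conformance shown earlier in the thread (with @inline(__always), per the later comment) so it compiles on its own; the variable names are illustrative. It uses + rather than * because "a" * "a" (97 * 97) does not fit in a UInt8 and would overflow.

```swift
// The conformance from earlier in this thread, repeated so this block is
// self-contained. Newer compilers warn about retroactive conformances like this.
extension UInt8: ExpressibleByUnicodeScalarLiteral {
  @inline(__always)
  public init(unicodeScalarLiteral value: UnicodeScalar) {
    // Traps at runtime if the scalar does not fit in a byte (value > 255).
    self.init(value.value)
  }
}

// Intended use: compare a byte directly with a character literal.
let byte: UInt8 = 0x61
print(byte == "a")  // true

// Unintended use: arithmetic on character literals also type-checks.
let nonsense: UInt8 = "a" + "a"
print(nonsense)  // 194
```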

ahoppen · 2024-01-26T21:52:54Z

Oh, this looks a lot better now. And I just checked that

func testMyStuff(x: UInt8) {
    switch x {
    case "A", "B", "C",
      "D", "E", "F",
      "G", "H", "I",
      "J", "K", "L",
      "M", "N", "O",
      "P", "Q", "R",
      "S", "T", "U",
      "V", "W", "X",
      "Y", "Z",
      "a", "b", "c",
      "d", "e", "f",
      "g", "h", "i",
      "j", "k", "l",
      "m", "n", "o",
      "p", "q", "r",
      "s", "t", "u",
      "v", "w", "x",
      "y", "z",
      "_":
      print("x")
    default:
      break
    }
}

compiles down to the same IR as when using UInt8(ascii:), so there shouldn’t be any runtime performance regression.

johnno1962 · 2024-01-26T22:09:44Z

Yes, things are shaping up better now. Were you looking at the last commit where I was able to tune the operator approach? Looking at the code more closely it was the comparison operations on an optional that were the problem. If you stick to concrete types rather than protocols the speed regression disappears. The last commit may even be a percentage point or two faster than the baseline!

ahoppen · 2024-01-27T00:09:46Z

Yes, I looked at the last version of the PR that only adds the operator overloads.

I think we’re good to take this, but I would prefer a couple of minor changes:

- I would only define the ==, != and ~= operators. < and > seem to be used only once, and <=, >= and - seem not to be used at all. To avoid adding more symbols to operator lookup when not necessary, I would prefer to remove them.
- Could you reformat the switch statements? Each case can now hold a lot more than three characters while staying within the 160-column limit.
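For reference, here is a minimal sketch of what the three requested operators could look like. It is an illustration under stated assumptions, not the code merged in the PR: it assumes the operators are static members of an extension on UInt8, borrows the @_transparent annotation discussed in this thread, and implements each comparison via UInt8(ascii:), which traps if the scalar is not ASCII. isHexDigit is a hypothetical helper showing how the ~= overload lets scalar literals act as case patterns.

```swift
// Hypothetical sketch: not the file merged in the PR.
extension UInt8 {
  /// Returns true if this byte equals the ASCII value of `s`.
  /// UInt8(ascii:) traps if `s` is not an ASCII scalar.
  @_transparent
  static func == (i: UInt8, s: Unicode.Scalar) -> Bool {
    return i == UInt8(ascii: s)
  }

  /// Returns true if this byte differs from the ASCII value of `s`.
  @_transparent
  static func != (i: UInt8, s: Unicode.Scalar) -> Bool {
    return i != UInt8(ascii: s)
  }

  /// Pattern-matching operator: lets a scalar literal be used as a
  /// `case` pattern in a switch over a byte.
  @_transparent
  static func ~= (s: Unicode.Scalar, i: UInt8) -> Bool {
    return i == UInt8(ascii: s)
  }
}

/// Hypothetical helper: is `byte` an ASCII hexadecimal digit?
func isHexDigit(_ byte: UInt8) -> Bool {
  switch byte {
  case "0", "1", "2", "3", "4", "5", "6", "7", "8", "9",
       "a", "b", "c", "d", "e", "f", "A", "B", "C", "D", "E", "F":
    return true
  default:
    return false
  }
}

let bytes = Array("x7".utf8)
print(bytes[0] == "x", isHexDigit(bytes[1]))  // true true
```

Restricting the set to ==, != and ~= keeps the overloads to the ones a lexer actually needs for equality checks and switch statements.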

johnno1962 · 2024-01-27T00:41:07Z

Great! I've made the changes you asked for; let me know if there is anything else. I've been able to produce a toolchain with these operators, and with @_alwaysEmitIntoClient they seem to backport fine:

Release performance-test:
./SwiftParserCLI/.build/release/swift-parser-cli performance-test --directory ../swift/test --iterations 10
Time: 762.2112989425659ms
Instructions: 10422817218.5
Time: 761.7305994033813ms
Instructions: 10423392805.5
Time: 765.9548997879028ms
Instructions: 10415461468.1

Debug performance-test:
% ./SwiftParserCLI/.build/debug/swift-parser-cli performance-test --directory ../swift/test --iterations 1
Time: 56992.75600910187ms
Instructions: 781180707632.0
Time: 56739.298939704895ms
Instructions: 781062066976.0
Time: 56700.37305355072ms
Instructions: 781040261203.0

A good result: slightly slower for a release build using the toolchain, but perhaps that can be fine-tuned later. The version you're using should be faster. Thanks for your patience!

ahoppen · 2024-01-27T00:56:07Z

Oh, I only now spotted that there is a performance regression in release builds. I didn’t count the zeros correctly in 10027644813.8 and 10204463495.4 from your last measurement.

To me, a requirement for this PR is that it compiles to exactly the same code as before and doesn’t introduce any performance regression. We have jumped through bigger hoops to get a 2% performance improvement, and it would be a shame to lose it for something as minor and local as this.
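Counts of this magnitude are easy to misread by eye. A minimal sketch (percentChange is an illustrative name) that computes the relative change directly; applied to the two release instruction counts quoted above, it gives an increase of about 1.76%.

```swift
/// Relative change from `before` to `after`, in percent.
/// Positive means `after` is larger.
func percentChange(from before: Double, to after: Double) -> Double {
  return (after - before) / before * 100
}

// Release instruction counts from the Jan 26 measurements quoted above.
let instructionsBefore = 10_027_644_813.8
let instructionsAfter = 10_204_463_495.4
let change = percentChange(from: instructionsBefore, to: instructionsAfter)
// Round to two decimal places for display.
print((change * 100).rounded() / 100)  // 1.76
```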

johnno1962 · 2024-01-27T01:15:15Z

Fair enough; you may be good to go now. I don't know about the instruction count, but we're down into the 730s.

% ./SwiftParserCLI/.build/release/swift-parser-cli performance-test --directory ../swift/test --iterations 10
Time: 737.7013087272644ms
Instructions: 10257822390.4
Time: 738.1468057632446ms
Instructions: 10261760931.5
Time: 740.6180024147034ms
Instructions: 10257334543.6
Time: 739.7215962409973ms
Instructions: 10258337438.3
Time: 739.1029000282288ms
Instructions: 10258197234.8
Time: 740.1492953300476ms
Instructions: 10254890075.9

The toolchain regression seems to be related to @_alwaysEmitIntoClient and I can look at that later.

ahoppen · 2024-01-27T01:19:30Z

I don’t understand how @_alwaysEmitIntoClient would have any effect but 🤷🏽

Regarding instruction counts: I found that they are a very stable way of measuring performance and are usually what I use because there’s so much less noise. And especially because we are aiming to produce the same binary after the change, there’s no noise created by potential delays when waiting for memory (which might not increment the instruction count), so I think we should evaluate performance based on instruction count.

johnno1962 · 2024-01-27T01:42:44Z

OK, leave it with me, but if you spot anything obvious in the operator code, like my last change, let me know.

johnno1962 · 2024-01-29T11:14:04Z

Good Morning @ahoppen,

I took a long look at these rogue instruction counts over the weekend. I approached this by copying the main-branch version of Sources/SwiftParser/Lexer/Cursor.swift to one side, fetching my branch for this PR, and copying the saved Cursor.swift back into the repo. I then slowly worked through discarding the diffs one by one, progressively reinstating the proposed changes. What I found was that the instruction count built up slowly until, with two or three hunks remaining to revert, it suddenly jumped to the values you're seeing with even the smallest change. It seems that something non-linear (a fixed-size optimisation window, a page size or similar; choose your explanation) controls the instruction count. All the while, I had the impression that the elapsed run time was decreasing ever so slightly. So it seems possible for code to execute more instructions and yet run faster in real time, which has to be the measurement to keep an eye on, even if it is more variable.

So all I can do is gather statistics to document that this conclusion is valid. Looking first at build times (using the reps.* scripts I checked into my character-literals branch), I get the following results:

Release build time before:
Time: 165.696s Δ3.528 2.13%
Time: 164.148s Δ2.171 1.32%
Release build time after:
Time: 164.308s Δ2.959 1.80%
Time: 163.419s Δ4.524 2.77%

The Δ figure is the standard deviation across multiple runs, and the % figure is the standard deviation divided by the mean (the coefficient of variation), a normalised measure of variability. Given how variable build times are, there is no evidence of any significant difference.
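The same summary can be reproduced from raw samples. A minimal sketch, with illustrative names (Summary, summarize) and the assumption that the standard deviation uses the sample (n - 1) denominator; the reps.* scripts may compute it differently. As a consistency check, 3.528 / 165.696 is about 2.13%, matching the first "before" line above.

```swift
/// Summary statistics in the "mean Δstddev cv%" shape used above.
struct Summary {
  let mean: Double
  let standardDeviation: Double
  /// Standard deviation divided by the mean, as a percentage.
  var coefficientOfVariation: Double { standardDeviation / mean * 100 }
}

/// Assumes the sample (n - 1) standard deviation; the population (n) form
/// would give slightly smaller values for small sample counts.
func summarize(_ samples: [Double]) -> Summary {
  precondition(samples.count > 1, "need at least two samples")
  let n = Double(samples.count)
  let mean = samples.reduce(0, +) / n
  let sumOfSquares = samples.reduce(0) { $0 + ($1 - mean) * ($1 - mean) }
  return Summary(mean: mean, standardDeviation: (sumOfSquares / (n - 1)).squareRoot())
}

// Toy input values (illustrative only, not measurements).
let toy = summarize([162.0, 165.0, 168.0])
print(toy.mean, toy.standardDeviation)  // 165.0 3.0

// Check against the first "Release build time before" line above.
let reported = Summary(mean: 165.696, standardDeviation: 3.528)
print(reported.coefficientOfVariation)
```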

Turning to the run-time performance results I'm seeing the following:

Release runtime performance before:
Time: 742.507ms Δ4.979 0.67%
Instructions: 10028272804.338 Δ2477806.458 0.02%
Time: 741.837ms Δ3.593 0.48%
Instructions: 10028178965.391 Δ2867472.287 0.03%

Release runtime performance after:
Time: 738.265ms Δ3.064 0.41%
Instructions: 10259635879.697 Δ2625566.019 0.03%
Time: 737.977ms Δ3.025 0.41%
Instructions: 10258652211.847 Δ2532919.354 0.02%

Debug runtime performance before:
Time: 58729.210ms Δ67.024 0.11%
Instructions: 787565050168.200 Δ183473904.498 0.02%

Debug runtime performance after:
Time: 57993.601ms Δ85.047 0.15%
Instructions: 793125877908.400 Δ256072306.405 0.03%

You can see how much more variable the time measurements are and yet if you run enough repetitions (100 in this case) there does seem to be a detectable improvement in real life performance despite a ~2% increase in instruction counts.

As for @_alwaysEmitIntoClient, it wasn't making a difference in the end: the 20ms slowdown I was seeing seems to be inherent in building a toolchain from the Swift sources, compared with an actual Xcode release.

TBH, this PR is in better shape than I anticipated. I'd expected to have to argue for the refactor in the face of build- or run-time speed regressions (however small), but I've been unable to detect any evidence of either (if anything, quite the opposite). I'd like to think this is enough evidence for a high level of assurance that merging it wouldn't be a mistake.

ahoppen · 2024-01-29T16:57:37Z

That is interesting but I would trust the instruction counts here more than the time, because:

- Execution time is easily influenced by external factors like CPU temperature or whether other processes are running in the background. In my experience it’s quite easy to get systematic errors here.
- Based on the PR, I don’t see how it could speed up execution. In the best case it should compile down to the same assembly as the original code.
- My statistical knowledge is a little rusty, but the difference in execution time is within the standard deviation, while the difference in instruction count is well outside it. I believe this indicates that the difference in execution time might be due to statistical fluctuations, while the difference in instruction count is statistically significant.

I think what should be investigated here (and what would also be important if this became a language feature) is why it compiles down to different binary code, and what can be done to make sure it’s a transparent change as far as compilation is concerned.
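The third point turns on how to judge significance. For a difference between two means, the usual yardstick is the standard error of that difference, sqrt(s1²/n1 + s2²/n2), which shrinks as the number of runs grows, rather than the per-run standard deviation. A minimal sketch of Welch's t statistic computed from the summary figures, assuming each Δ is a sample standard deviation over the 100 repetitions mentioned in the Jan 29 comment (welchT is an illustrative name). With those inputs it comes out at roughly 7 for time and roughly -640 for instructions, so both differences are large relative to random run-to-run noise. It says nothing about systematic effects such as CPU temperature or background load, which the first point raises.

```swift
/// Welch's t statistic for the difference between two means, computed from
/// summary statistics: mean, sample standard deviation, and number of runs.
func welchT(mean1: Double, sd1: Double, n1: Double,
            mean2: Double, sd2: Double, n2: Double) -> Double {
  // Standard error of the difference in means: sqrt(sd1²/n1 + sd2²/n2).
  let standardError = (sd1 * sd1 / n1 + sd2 * sd2 / n2).squareRoot()
  return (mean1 - mean2) / standardError
}

// First "before" and "after" Release rows from the Jan 29 comment,
// assuming 100 repetitions each.
let tTime = welchT(mean1: 742.507, sd1: 4.979, n1: 100,
                   mean2: 738.265, sd2: 3.064, n2: 100)
let tInstructions = welchT(mean1: 10_028_272_804.338, sd1: 2_477_806.458, n1: 100,
                           mean2: 10_259_635_879.697, sd2: 2_625_566.019, n2: 100)
print(tTime, tInstructions)
```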

johnno1962 · 2024-01-30T13:20:19Z

More numbers for today, after reverting a couple of commits. The most important change was going back to the @_transparent annotation instead of @inline(__always) for the operators. I also reverted the only "optimisation" I had made to Cursor.swift, which checked for nil values separately from the switch statements. You should find the instruction counts are in line with your expectations now, while wall-clock performance remains slightly improved over the baseline.

Release runtime performance before PR:
Time: 742.507ms Δ4.979 0.67%
Instructions: 10028272804.338 Δ2477806.458 0.02%
Time: 741.837ms Δ3.593 0.48%
Instructions: 10028178965.391 Δ2867472.287 0.03%
Time: 742.815ms Δ4.806 0.65%
Instructions: 10027972755.874 Δ2491993.943 0.02%
Time: 741.211ms Δ3.782 0.51%
Instructions: 10027619991.396 Δ2432832.541 0.02%

Release runtime performance after PR:
Time: 739.074ms Δ4.193 0.57%
Instructions: 10021806841.520 Δ2597281.399 0.03%
Time: 737.766ms Δ1.611 0.22%
Instructions: 10020808717.379 Δ2679588.421 0.03%
Time: 738.697ms Δ3.448 0.47%
Instructions: 10021543757.248 Δ2826853.241 0.03%
Time: 738.266ms Δ1.921 0.26%
Instructions: 10021309207.203 Δ2554471.708 0.03%

Debug runtime performance before PR:
Time: 58729.210ms Δ67.024 0.11%
Instructions: 787565050168.200 Δ183473904.498 0.02%
Time: 59045.758ms Δ896.555 1.52%
Instructions: 787474677956.600 Δ144497624.230 0.02%

Debug runtime performance after PR:
Time: 57762.088ms Δ115.130 0.20%
Instructions: 793731368393.700 Δ201101277.253 0.03%
Time: 57961.811ms Δ228.974 0.40%
Instructions: 793822097838.000 Δ253472432.254 0.03%

ahoppen

Oh, nice that the instruction counts are the same now 🎉 One minor comment, otherwise looks good to me.

And could you squash your commits? Just makes for a nicer git history https://github.com/apple/swift-syntax/blob/main/CONTRIBUTING.md#authoring-commits

ahoppen · 2024-01-30T16:44:35Z

Sources/SwiftParser/Lexer/UnicodeScalarExtensions.swift

+    /// Basic equality operators
+    @_transparent
+    static func == (i: Self, s: Unicode.Scalar) -> Bool {


The doc comment applies only to the == function, so Basic equality operators doesn’t really make sense.

johnno1962 · 2024-01-30T17:16:57Z

Doc comment edited and duly squashed.

ahoppen · 2024-01-31T17:48:51Z

@swift-ci Please test

ahoppen · 2024-01-31T18:53:01Z

@swift-ci Please test Windows

johnno1962 · 2024-01-31T20:30:52Z

@ahoppen, I have a commit with the indentation fixed. Do you want me to --amend it onto this PR?

johnno1962 · 2024-01-31T23:06:40Z

I've force-pushed the indentation fix, if someone wants to @swift-ci Please test again.

ahoppen · 2024-01-31T23:20:32Z

@swift-ci Please test

And just in case you were wondering, only contributors with commit access can trigger CI https://github.com/apple/swift-syntax/blob/main/CONTRIBUTING.md#review-and-ci-testing

ahoppen · 2024-02-01T00:13:55Z

@swift-ci Please test

ahoppen · 2024-02-01T17:47:48Z

@swift-ci Please test

ahoppen · 2024-02-01T17:53:55Z

@swift-ci Please test Windows

johnno1962 · 2024-02-01T21:39:12Z

Excellent, thanks @ahoppen. I love it when a plan comes together.

johnno1962 requested review from ahoppen and bnbarham as code owners January 23, 2024 18:49

johnno1962 force-pushed the pr-back branch 2 times, most recently from ffc5dc1 to 74161fd January 27, 2024 15:27

johnno1962 force-pushed the pr-back branch from 7bc206f to e7fd73f January 30, 2024 16:28

ahoppen reviewed Jan 30, 2024

johnno1962 force-pushed the pr-back branch from 9dd1e69 to 058b294 January 30, 2024 17:15

ahoppen approved these changes Jan 31, 2024

johnno1962 force-pushed the pr-back branch from 058b294 to 921106d January 31, 2024 23:05

Commit 9f7ee9b: UnicodeScalar operators.

johnno1962 force-pushed the pr-back branch from 921106d to 9f7ee9b February 1, 2024 08:36

ahoppen enabled auto-merge February 1, 2024 17:53

ahoppen merged commit 114a6a1 into apple:main Feb 1, 2024
3 checks passed

This was referenced Feb 20, 2024:

Simple operators for character value comparisons. apple/swift#71749 (Open)

Operators for UInt8 comparisons to unicode scalars apple/swift-evolution#2329 (Open)
