let the stdlib enumerate the Sequence elements #391
Motivation:

Very unfortunately [SR-7602](https://bugs.swift.org/browse/SR-7602) Swift's String doesn't work very well on UTF-8 encoded data. Fortunately (but unfairly see point (3) in [SR-7602](https://bugs.swift.org/browse/SR-7602)) however there is now a contiguous String storage for ASCII-only Strings that _only the stdlib_ can access. To benefit from the stdlib's capabilities of enumerating the bytes faster than we can, we should use UnsafeMutableBufferPointer#initialize(from: Bytes).

Thanks @airspeedswift for pointing this out.

The performance gains on a recent Swift version are noticeable:

after this patch:

```
no-net_http1_10k_reqs_1_conn: 0.225782990455627, 0.228317022323608, 0.225206971168518, 0.22549307346344, 0.223728060722351, 0.225827932357788, 0.222437977790833, 0.221745014190674, 0.220004916191101, 0.222965002059937
http1_10k_reqs_1_conn: 0.688782095909119, 0.682757973670959, 0.689703941345215, 0.708670020103455, 0.691922068595886, 0.687381982803345, 0.681567072868347, 0.684866905212402, 0.683283090591431, 0.699701905250549
http1_10k_reqs_100_conns: 0.782313942909241, 0.779283046722412, 0.773630023002625, 0.776782989501953, 0.774596929550171, 0.775121927261353, 0.773631930351257, 0.784304022789001, 0.795854926109314, 0.786051988601685
```

before this patch:

```
no-net_http1_10k_reqs_1_conn: 0.232797980308533, 0.236240983009338, 0.237086057662964, 0.236070990562439, 0.236560940742493, 0.236254930496216, 0.234392046928406, 0.235393047332764, 0.234088897705078, 0.245710015296936
http1_10k_reqs_1_conn: 0.76228404045105, 0.788464903831482, 0.747326016426086, 0.753327012062073, 0.735787987709045, 0.727090001106262, 0.720335006713867, 0.721843004226685, 0.719871044158936, 0.728644967079163
http1_10k_reqs_100_conns: 0.841314911842346, 0.884468078613281, 0.854345917701721, 0.847344040870667, 0.847607016563416, 0.841101050376892, 0.814589977264404, 0.818693995475769, 0.832610964775085, 0.828818082809448
```

Modifications:

use UnsafeMutableBufferPointer#initialize(from: Bytes) to enumerate String's bytes.

Result:

faster but still not fast enough, we need SR-7602 to be addressed
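For readers unfamiliar with the pattern, here is a minimal, self-contained sketch of the approach described above. It is not the code from this patch: `copyBytes` and its doubling growth strategy are hypothetical and exist only for illustration. The idea is to let `UnsafeMutableBufferPointer.initialize(from:)` bulk-copy the first `underestimatedCount` bytes, then drain whatever the returned iterator still holds.

```swift
/// Hypothetical helper (not part of NIO): copies every byte of `bytes`
/// into manually managed memory and returns the result as an array.
func copyBytes<S: Sequence>(_ bytes: S) -> [UInt8] where S.Element == UInt8 {
    let underestimatedByteCount = bytes.underestimatedCount
    // Keep the allocation at least one byte large so doubling always grows it.
    var capacity = max(underestimatedByteCount, 1)
    var base = UnsafeMutablePointer<UInt8>.allocate(capacity: capacity)
    defer { base.deallocate() }

    // Bulk path: the stdlib enumerates the sequence and fills the buffer.
    var (iterator, idx) = UnsafeMutableBufferPointer(start: base, count: underestimatedByteCount)
        .initialize(from: bytes)
    assert(idx == underestimatedByteCount)

    // Slow path: only reached if the sequence underestimated its length.
    while let b = iterator.next() {
        if idx == capacity {
            // Assumed growth policy for this sketch: double the allocation.
            let newCapacity = capacity * 2
            let newBase = UnsafeMutablePointer<UInt8>.allocate(capacity: newCapacity)
            newBase.moveInitialize(from: base, count: idx)
            base.deallocate()
            base = newBase
            capacity = newCapacity
        }
        (base + idx).initialize(to: b)
        idx += 1
    }
    return Array(UnsafeBufferPointer(start: base, count: idx))
}
```

Calling `copyBytes("hello".utf8)` goes through the bulk path whenever the view reports its full length up front; a sequence that reports `0` falls entirely into the `while let` loop.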
    var (iterator, idx) = UnsafeMutableBufferPointer(start: base, count: underestimatedByteCount).initialize(from: bytes)
    assert(idx == underestimatedByteCount)
    while let b = iterator.next() {
        assert(S.self != String.UTF8View.self)
@milseman I'm assuming String.UTF8View never underestimates its count, right?
Right now `underestimatedCount` is equivalent to `count` because that's the default behavior and `String.UTF8View` doesn't override it. That's likely to change at some point (CC @moiseev). But, I think it can give a very good underestimate by just returning the number of code units.
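To make the distinction concrete, here is a small sketch of how `underestimatedCount` behaves for a plain `Sequence` compared with a collection view. `Countdown` is a hypothetical type invented for this example. The only guarantee relied on is the documented contract that `underestimatedCount` never exceeds the real number of elements; whether `String.UTF8View` reports exactly `count` is an implementation detail that, as noted above, may change.

```swift
// Hypothetical sequence for illustration; it relies on the default
// `underestimatedCount` that `Sequence` provides.
struct Countdown: Sequence, IteratorProtocol {
    var remaining: UInt8
    mutating func next() -> UInt8? {
        guard remaining > 0 else { return nil }
        defer { remaining -= 1 }
        return remaining
    }
}

let countdown = Countdown(remaining: 3)
let utf8 = "hello".utf8

// A plain Sequence defaults to 0, so a bulk copy sized by this value
// leaves every element to the iterator loop.
print(countdown.underestimatedCount)

// For any sequence the contract is only "no more than the real count".
print(utf8.underestimatedCount <= utf8.count)
```

This is why the `while let b = iterator.next()` loop in the diff cannot simply be deleted: it is the fallback for any sequence that reports fewer elements than it actually yields.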
Lukasa left a comment
No objection here.
CC @milseman / @airspeedswift / @tanner0101