[se-0405] Implement API additions #68419
@swift-ci please test
@swift-ci please smoke test linux platform
@swift-ci please test linux platform
@swift-ci please smoke test
I think this needs another look - particularly when it comes to the stack buffer.
stdlib/public/core/String.swift
This allocates an Array (which will likely require resizing as transcoded code-units are appended to it -- the `._validate` function used by the contiguous path estimates 3-4x as many bytes of UTF8 out as code-units in), then allocates separate String storage and writes the result to it.
Have you tried performing a dry-run of the transcoding and measuring the required capacity, then allocating String storage and writing directly to it?
Ohhh we can't do a dry run because this takes a Sequence, not a Collection 😔.
But it's pointless - we're not going to do this in a single pass; we're just going to copy internally anyway. This API should've taken a Collection, IMO.
We can dispatch based on whether or not the type is a collection, but the optimiser does a poor job specialising it (#62264). I'd still suggest we do that, though, and let the optimiser catch up. The existential wrapping and unwrapping overhead is likely less than allocating and copying everything to an array, and one day the compiler will just eliminate that overhead entirely.
There is quite a bit of improvement to be done to the internal transcoding machinery, and this slowest path will then be adapted to use that. It would be better to allocate a string buffer directly and resize that, but it is not possible at the moment.
My understanding is that it is fairly common in C APIs (e.g. `wcsrtombs`, `WideCharToMultiByte`, etc) to perform a dry run by specifying the output buffer as `NULL`, getting the length, and converting into an appropriately-sized buffer. That's why I suggested it.
Resizing is less than ideal because it's quadratic. Array mitigates this somewhat with an over-allocation strategy that scales geometrically, but that is also wasteful, and I don't think String employs the same strategy (?).
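To put rough numbers on the resizing cost, here is a small model (not a measurement of Array or String) that counts how many already-stored elements are copied while appending one element at a time. The function name `elementsCopied` and both growth policies are assumptions made for illustration only.

```swift
/// Counts how many already-stored elements get copied while appending `n`
/// elements one at a time, under a given capacity-growth policy.
/// This is a toy model; it does not inspect any real stdlib type.
func elementsCopied(appending n: Int, growth: (Int) -> Int) -> Int {
  var capacity = 0
  var count = 0
  var copied = 0
  for _ in 0..<n {
    if count == capacity {
      // A reallocation copies everything stored so far.
      copied += count
      // Always grow by at least one slot so the append can proceed.
      capacity = max(growth(capacity), count + 1)
    }
    count += 1
  }
  return copied
}

// Exact-fit growth reallocates on every append: quadratic total copying.
let exactFit = elementsCopied(appending: 1024) { $0 + 1 }
// Doubling reallocates only at powers of two: linear total copying.
let doubling = elementsCopied(appending: 1024) { $0 * 2 }
print(exactFit, doubling)
// Prints "523776 1023"
```

In this model the geometric policy copies far less, at the price of holding up to twice the needed capacity, which is the waste mentioned above.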
For Sequence, where we can only make one pass, we unfortunately have to transcode into some kind of resizing buffer. For Collection, we can just do two passes, guaranteeing no resizing and a lower memory high-water mark.
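A minimal sketch of the two-pass idea for a Collection, assuming the public `transcode(_:from:to:stoppingOnError:into:)` function and `String(unsafeUninitializedCapacity:initializingUTF8With:)` (Swift 5.3 or later runtime). The function name `transcodedString` is hypothetical, and this is not the code in the PR.

```swift
/// Hypothetical helper: transcodes `codeUnits` to a String in two passes.
/// Pass 1 is a dry run that only counts UTF-8 bytes; pass 2 writes them
/// directly into String storage of exactly that capacity.
func transcodedString<C: Collection, E: Unicode.Encoding>(
  from codeUnits: C, as encoding: E.Type
) -> String? where C.Element == E.CodeUnit {
  // Pass 1: count the output bytes and detect invalid input.
  var byteCount = 0
  let hadError = transcode(
    codeUnits.makeIterator(), from: encoding, to: UTF8.self,
    stoppingOnError: true
  ) { _ in byteCount += 1 }
  if hadError { return nil }

  // Pass 2: write into the string's own buffer, with a bounds check in
  // case the encoding produces more output than it did in the dry run.
  return String(unsafeUninitializedCapacity: byteCount) { buffer in
    var index = 0
    _ = transcode(
      codeUnits.makeIterator(), from: encoding, to: UTF8.self,
      stoppingOnError: true
    ) { byte in
      precondition(index < buffer.count, "transcoder overran its dry run")
      buffer[index] = byte
      index += 1
    }
    return index
  }
}

let utf16: [UInt16] = [0x43, 0x61, 0x66, 0xE9]  // "Café" in UTF-16
print(transcodedString(from: utf16, as: UTF16.self) ?? "nil")
```

The trade-off is decoding the input twice in exchange for a single exact-size allocation; whether that wins over a resizing buffer would need benchmarking.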
stdlib/public/core/String.swift
I wonder if it's worth checking this later (after writing). Checking whether contiguous UTF8 is ASCII is super-cheap.
Checking after writing would potentially mean re-loading parts of a large string buffer that has already been expunged from cache, and that wouldn't be cheap. A chunked transcoding API would be much better.
Well, modern processors have several megabytes of cache -- far larger than most UTF8 strings (and if your string is several megabytes, this isn't going to be significant either way). The ASCII check on contiguous data can process 8 bytes in a single instruction (more if we SIMDize it), so I figured it may (or may not; the only way to know is to test it) be better to keep the transcoding loop tighter and perform a separate pass for this analysis.
Chunked transcoding might be a good middle-ground, though. That's a fair point.
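For reference, the kind of separate ASCII pass described above can be sketched as follows. This is an illustration, not the stdlib's internal `_allASCII`; the name `isAllASCII` is an assumption, and `loadUnaligned` requires a Swift 5.7 or later toolchain.

```swift
/// Returns true if every byte has its high bit clear.
/// Checks 8 bytes per iteration by masking a UInt64 word, then finishes
/// the remaining tail bytes one at a time.
func isAllASCII(_ bytes: UnsafeRawBufferPointer) -> Bool {
  let highBits: UInt64 = 0x8080_8080_8080_8080
  var offset = 0
  while offset + 8 <= bytes.count {
    let word = bytes.loadUnaligned(fromByteOffset: offset, as: UInt64.self)
    // Byte order is irrelevant here: the mask tests every byte's top bit.
    if word & highBits != 0 { return false }
    offset += 8
  }
  while offset < bytes.count {
    if bytes[offset] & 0x80 != 0 { return false }
    offset += 1
  }
  return true
}

let plain = Array("Hello, world!".utf8)
let accented = Array("Hello, café!".utf8)
print(plain.withUnsafeBytes { isAllASCII($0) })
// Prints "true"
print(accented.withUnsafeBytes { isAllASCII($0) })
// Prints "false"
```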
stdlib/public/core/String.swift
This is difficult to read. Might I suggest:
    let contiguousResult: String?? = ...
    switch contiguousResult {
    case .some(.some(let newString)):
      self = newString
      return
    case .some(.none):
      return nil
    default:
      break // source is non-contiguous.
    }
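For readers unfamiliar with where a `String??` comes from: it is what `withContiguousStorageIfAvailable` returns when its closure itself returns `String?`. The self-contained sketch below assumes that is the origin of `contiguousResult` here; the function `decodeASCII` and its ASCII-only rule are stand-ins for illustration, not the PR's validation logic.

```swift
/// Hypothetical stand-in: builds a String from bytes only if all are ASCII.
func decodeASCII<S: Sequence>(_ bytes: S) -> String? where S.Element == UInt8 {
  // Outer optional: was contiguous storage available?
  // Inner optional: did validation succeed?
  let contiguousResult: String?? = bytes.withContiguousStorageIfAvailable {
    buffer -> String? in
    buffer.allSatisfy { $0 < 0x80 } ? String(decoding: buffer, as: UTF8.self) : nil
  }
  switch contiguousResult {
  case .some(.some(let newString)):
    return newString          // fast path succeeded
  case .some(.none):
    return nil                // fast path ran and rejected the input
  case .none:
    break                     // source is non-contiguous; fall through
  }
  // Slow path: a single pass over the sequence, validating as we go.
  var result = ""
  for byte in bytes {
    guard byte < 0x80 else { return nil }
    result.unicodeScalars.append(Unicode.Scalar(byte))
  }
  return result
}

print(decodeASCII([0x43, 0x61] as [UInt8]) ?? "nil")
print(decodeASCII(UInt8(0x41)...UInt8(0x43)) ?? "nil")
```

An Array takes the fast path, while a `ClosedRange<UInt8>` has no contiguous storage and falls through to the slow path.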
I don't find the switch alternative to be a particularly easy read either. I tried to improve readability by using fastidiously named bindings.
This is a generic function - transcoding will ultimately be performed by a user-defined type, and if that type were to produce more than 4 bytes of UTF8 for its custom code-units, this would over-write the buffer and corrupt the stack.
Toy example, showing that the `Unicode.Encoding` API allows this. This encoding is UTF8, except that every time the real UTF8 would parse a single scalar from its code-units, this encoding parses 5 repetitions of that scalar:
https://gist.github.com/karwa/ece9cdcf8c66613fdeea85bcd8b2cea8
Good point. We should bounds-check here. We should write directly into a string buffer, reallocating when appropriate.
This is the only part that benefits from contiguous storage, because `validateUTF8` and `_allASCII` require an unsafe buffer pointer. The second part of the function (the slow path, which transcodes) doesn't need contiguous storage at all, and is basically duplicated code from the initialiser.
I would suggest splitting into separate functions, and making the call to the contiguous UTF8/ASCII one directly from the initialiser. Since the initialiser is inlinable, I'd expect the compiler could specialise this to a direct call to the UTF8/ASCII fast-path in most cases.
The comment that I made above about performing a dry run and writing directly to string storage would then apply to the second function (the slow path), and we could remove the transcoding in the initialiser.
Sorry for the delay, I forgot to respond to this earlier. It is superficially true that the two slow paths are similar at this time, but we have intentionally made the `_validate` function non-inlinable in order to have room to improve it using API that is not yet public. In particular, the current public API doesn't allow for chunked decoding, which would be a significant speedup in a generic context, by significantly amortizing the function call overhead. Getting there from here will be in future work.
bbe0524 to db516fc
@swift-ci please test
Co-authored-by: Ben Rimmington <me@benrimmington.com>
db516fc to c36b79a
Rebased to pick up the new ABI checker
49d6ed2 to 7a60811
@swift-ci please test
d2244d7 to 80961df
@swift-ci please test
stdlib/public/core/String.swift
When `print`ing an optional, or when `debugPrint`ing, the `\0` character will be output:
"Ca\0fé"
All examples will have a compiler warning:
Expression implicitly coerced from 'String?' to 'Any'
The examples don't intend to `debugPrint`, but it slipped my mind that printing through `Optional` would do that. Perhaps I should change them to something like `print(valid ?? "nil")`?
print(valid == "Ca\0fé")
// Prints "true"
print(invalid == nil)
// Prints "true"
@swift-ci please smoke test
@swift-ci please test
Doesn't have to hold up this PR, but it would make sense to add some benchmarks so that if we can improve the code in the future (e.g. skipping an intermediary allocation) we'd see the impact.
At one point in time we were worried about the heap size of these objects. Does this change it? If so, would it make sense to put in `_countAndFlags`?
I didn't consider changing `_countAndFlags`, that is a good thought. As is, 1 byte is added to the class's stored properties, so it goes from 32 to 33 bytes on 64-bit platforms.
Would it make sense to allocate a normal `_StringStorage` instance of appropriate size and write into that?
We might want to do that as a next step. The `__StringStorage` class doesn't currently have anything resembling an appropriate initializer, though.
08b750f to ac47533
@swift-ci please test
API additions (and tests) from SE-0405, namely:
The API renaming is in #68423.
rdar://114999766