-
-
Notifications
You must be signed in to change notification settings - Fork 423
Fix Issue 18279 - rt.util.utf does not properly reserve buffer in toUTF16/toUTF16z #2053
Conversation
Thanks for your pull request, @n8sh! We are looking forward to reviewing it, and you should be hearing from a maintainer soon. Some tips to help speed things up:
Bear in mind that large or tricky changes may require multiple rounds of review and revision. Please see CONTRIBUTING.md for more information. Bugzilla references
|
Thanks for your PR. I guess a unittest is not really feasible, is it? |
src/rt/util/utf.d
Outdated
r.length = slen; | ||
r.length = 0; | ||
if (!__ctfe) | ||
r.reserve(slen); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reserve still does a lot if slen is zero. Maybe better to return early for that case. Same below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm... reserve probably shouldn't be expensive if request is 0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably can just return ""
in that case, or "".ptr
for z versions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reserve probably shouldn't be expensive if request is 0
reserve returns the allocated block size, so unfortunately, hard to avoid looking up block info.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay will add a check for slen == 0
@n8sh in the future, if you could post a link to the PR on bugzilla when you open it, so others don't duplicate your work |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix one issue and I think it’s good
src/rt/util/utf.d
Outdated
// Reserve still does a lot if slen is zero. | ||
// Return early for that case. | ||
if (0 == slen) | ||
return &""w[0];// Terminating null is auto-included at end of string literals. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn’t right. Return ””w.ptr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I do that I get a complaint about the code not being @safe
. I'll assume it's okay to mark toUTF16z
as @trusted
since toUTF16
already is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, it's not @safe
, as the intention is to return a null-terminated array, which is inherently unsafe. Not only that, but index 0 doesn't exist, so you potentially would have a RangeError
thrown (it should be what happens, since that is the point of making you index the first element in @safe
code).
Where is the code that is calling this, complaining that it's not @safe
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see, it was marked @safe
. It shouldn't have been. ugh...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think for now, even though it's not really @safe
, the best option may be to use:
return &"\0"w[0];
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, it's not @safe, as the intention is to return a null-terminated array, which is inherently unsafe.
I don't think there is a problem: a pointer to something known to exist (in the DATA segment or in GC memory) is fine in @safe code, as this compiles:
{
static const char a = 'a';
const char* p = &a;
}
You are not supposed to do any arithmetics on p and accessing *p should be fine. So "a".ptr could be accepted in @safe code, even "".ptr as literals are guaranteed to be zero-terminated. Am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's technically @safe
, but disingenuous. The point is to return a null-terminated string, not a pointer to a single char
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So "a".ptr could be accepted in @safe code
Just noticed that this was still accepted as deprecated in dmd 2.078, but no longer in master. I cannot find it in https://dlang.org/deprecate.html, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://dlang.org/changelog/pending.html#ptr-safe-end-of-deprecation
When I made dlang/dmd#7688, I didn't realize it wasn't on the deprecations page. We really need to have more structure in our deprecation process :/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's technically @safe, but disingenuous.
I agree. The syntax &a[0]
is not nice, especially with literals, but it makes range checks kind of explicit. Let's hope the compiler elides them if possible.
This one is pretty obvious, regardless of the safety debate, so I'm just going to merge it. |
The offending pattern:
The apparent intention is to preallocate the buffer but this idiom doesn't work as pointed out by Dmitry Olshansky: "On append the slice is not the tail so it will reallocate."