Skip to content
This repository has been archived by the owner on Oct 12, 2022. It is now read-only.

Fix Issue 18279 - rt.util.utf does not properly reserve buffer in toUTF16/toUTF16z #2053

Merged
merged 1 commit into from
Jan 30, 2018

Conversation

n8sh
Copy link
Member

@n8sh n8sh commented Jan 22, 2018

The offending pattern:

    wchar[] r;
    size_t slen = s.length;
    r.length = slen;
    r.length = 0;
    // ... followed by code that appends to r with ~=

The apparent intention is to preallocate the buffer but this idiom doesn't work as pointed out by Dmitry Olshansky: "On append the slice is not the tail so it will reallocate."

@dlang-bot
Copy link
Contributor

Thanks for your pull request, @n8sh! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.

Some tips to help speed things up:

  • smaller, focused PRs are easier to review than big ones

  • try not to mix up refactoring or style changes with bug fixes or feature enhancements

  • provide helpful commit messages explaining the rationale behind each change

Bear in mind that large or tricky changes may require multiple rounds of review and revision.

Please see CONTRIBUTING.md for more information.

Bugzilla references

Auto-close Bugzilla Description
18279 rt.util.utf does not properly reserve buffer in toUTF16/toUTF16z

@dlang-bot dlang-bot added the Bug Fix Include reference to corresponding bugzilla issue label Jan 22, 2018
@rainers
Copy link
Member

rainers commented Jan 27, 2018

Thanks for your PR. I guess a unittest is not really feasible, is it?

r.length = slen;
r.length = 0;
if (!__ctfe)
r.reserve(slen);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reserve still does a lot if slen is zero. Maybe better to return early for that case. Same below.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm... reserve probably shouldn't be expensive if request is 0.

Copy link
Member

@schveiguy schveiguy Jan 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably can just return "" in that case, or "".ptr for z versions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reserve probably shouldn't be expensive if request is 0

reserve returns the allocated block size, so unfortunately, hard to avoid looking up block info.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay will add a check for slen == 0

@schveiguy
Copy link
Member

@n8sh in the future, if you could post a link to the PR on bugzilla when you open it, so others don't duplicate your work

Copy link
Member

@schveiguy schveiguy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix one issue and I think it’s good

// Reserve still does a lot if slen is zero.
// Return early for that case.
if (0 == slen)
return &""w[0];// Terminating null is auto-included at end of string literals.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn’t right. Return ””w.ptr

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I do that I get a complaint about the code not being @safe. I'll assume it's okay to mark toUTF16z as @trusted since toUTF16 already is.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it's not @safe, as the intention is to return a null-terminated array, which is inherently unsafe. Not only that, but index 0 doesn't exist, so you potentially would have a RangeError thrown (it should be what happens, since that is the point of making you index the first element in @safe code).

Where is the code that is calling this, complaining that it's not @safe?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see, it was marked @safe. It shouldn't have been. ugh...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for now, even though it's not really @safe, the best option may be to use:

return &"\0"w[0];

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it's not @safe, as the intention is to return a null-terminated array, which is inherently unsafe.

I don't think there is a problem: a pointer to something known to exist (in the DATA segment or in GC memory) is fine in @safe code, as this compiles:

{
	static const char a = 'a';
	const char* p = &a;
}

You are not supposed to do any arithmetics on p and accessing *p should be fine. So "a".ptr could be accepted in @safe code, even "".ptr as literals are guaranteed to be zero-terminated. Am I missing something?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's technically @safe, but disingenuous. The point is to return a null-terminated string, not a pointer to a single char.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So "a".ptr could be accepted in @safe code

Just noticed that this was still accepted as deprecated in dmd 2.078, but no longer in master. I cannot find it in https://dlang.org/deprecate.html, though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://dlang.org/changelog/pending.html#ptr-safe-end-of-deprecation
When I made dlang/dmd#7688, I didn't realize it wasn't on the deprecations page. We really need to have more structure in our deprecation process :/

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's technically @safe, but disingenuous.

I agree. The syntax &a[0] is not nice, especially with literals, but it makes range checks kind of explicit. Let's hope the compiler elides them if possible.

@schveiguy
Copy link
Member

This one is pretty obvious, regardless of the safety debate, so I'm just going to merge it.

@dlang-bot dlang-bot merged commit 5063348 into dlang:master Jan 30, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Bug Fix Include reference to corresponding bugzilla issue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants