-
-
Notifications
You must be signed in to change notification settings - Fork 423
Make encode reusable in Phobos #2404
base: master
Are you sure you want to change the base?
Conversation
Thanks for your pull request and interest in making D better, @edi33416! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.
Please see CONTRIBUTING.md for more information. If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment. Bugzilla referencesYour PR doesn't reference any Bugzilla issue. If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog. Testing this PR locallyIf you don't have a local development environment setup, you can use Digger to test this PR: dub fetch digger
dub run digger -- build "master + druntime#2404" |
24dc693
to
c91b394
Compare
* Returns: | ||
* The length of the encoded character (a number between `1` and `4` for | ||
* `char[4]` buffers and a number between `1` and `2` for `wchar[2]` buffers) | ||
* or `0` in case of failure. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great idea. Let's define this function for all inputs (no contracts, no assert(isValidDChar)
) and have it return 0 upon bad dchar. Then higher-level functions can throw exceptions, use replacement char etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additionally, instead of returning a length, return a slice to the buffer, or null
on bad dchar
, because the next thing people will need to do is slice that buffer.
Returning null
and not an empty slice is important, because otherwise if (auto s = encode(...))
will always be true
.
Lastly, since this piece of code is highly performance sensitive, did you inspect the assembly to make sure out
does not get default-initialized on function entry ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't it better to pass buf
as a regular char[]
to push the allocation choice to the caller? If the caller wants to pass a static array, the array can be sliced to be able to pass it as char[]
.
} | ||
do | ||
{ | ||
char[4] buf; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
= void;
} | ||
do | ||
{ | ||
wchar[2] buf; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
= void;
* Encodes character c and appends it to array s[]. | ||
*/ | ||
nothrow pure @safe | ||
void encode(ref char[] s, dchar c) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a fan of the following functions because:
- They are an overload of the previous
encode
functions, but do something fundamentally different - They rely on the GC. We should be crafting functions that do not allocate, but instead send the characters to a sink or an output range.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly: #2404 (comment).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, is this the right order of arguments for UFCS? Would it be better to pass c
first?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The sink normally is the first argument, as in:
sink.put(c);
@edi33416 Are you willing to take this to the finish line? |
This PR adds two overloads to
encode
so those can be called fromstd.utf.encode
(see here)This PR depends on #2403