Skip to content
This repository has been archived by the owner on Oct 12, 2022. It is now read-only.

Make encode reusable in Phobos #2404

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

edi33416
Copy link
Contributor

This PR adds two overloads to encode so those can be called from std.utf.encode (see here)

This PR depends on #2403

@dlang-bot
Copy link
Contributor

Thanks for your pull request and interest in making D better, @edi33416! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.
Please verify that your PR follows this checklist:

  • My PR is fully covered with tests (you can see the annotated coverage diff directly on GitHub with CodeCov's browser extension
  • My PR is as minimal as possible (smaller, focused PRs are easier to review than big ones)
  • I have provided a detailed rationale explaining my changes
  • New or modified functions have Ddoc comments (with Params: and Returns:)

Please see CONTRIBUTING.md for more information.


If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment.

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub fetch digger
dub run digger -- build "master + druntime#2404"

@edi33416 edi33416 changed the title Make encode reusable in Phobos [DON'T PULL] Make encode reusable in Phobos Dec 11, 2018
@edi33416 edi33416 changed the title [DON'T PULL] Make encode reusable in Phobos Make encode reusable in Phobos Dec 18, 2018
* Returns:
* The length of the encoded character (a number between `1` and `4` for
* `char[4]` buffers and a number between `1` and `2` for `wchar[2]` buffers)
* or `0` in case of failure.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea. Let's define this function for all inputs (no contracts, no assert(isValidDChar)) and have it return 0 upon bad dchar. Then higher-level functions can throw exceptions, use replacement char etc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally, instead of returning a length, return a slice to the buffer, or null on bad dchar, because the next thing people will need to do is slice that buffer.
Returning null and not an empty slice is important, because otherwise if (auto s = encode(...)) will always be true.
Lastly, since this piece of code is highly performance sensitive, did you inspect the assembly to make sure out does not get default-initialized on function entry ?

Copy link
Contributor

@jacob-carlborg jacob-carlborg Jan 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it better to pass buf as a regular char[] to push the allocation choice to the caller? If the caller wants to pass a static array, the array can be sliced to be able to pass it as char[].

}
do
{
char[4] buf;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

= void;

}
do
{
wchar[2] buf;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

= void;

* Encodes character c and appends it to array s[].
*/
nothrow pure @safe
void encode(ref char[] s, dchar c)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a fan of the following functions because:

  1. They are an overload of the previous encode functions, but do something fundamentally different
  2. They rely on the GC. We should be crafting functions that do not allocate, but instead send the characters to a sink or an output range.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly: #2404 (comment).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, is this the right order of arguments for UFCS? Would it be better to pass c first?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sink normally is the first argument, as in:

sink.put(c);

@dlang-bot dlang-bot added Needs Rebase needs a `git rebase` performed stalled labels May 27, 2021
@RazvanN7
Copy link
Contributor

RazvanN7 commented Nov 5, 2021

@edi33416 Are you willing to take this to the finish line?

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Needs Rebase needs a `git rebase` performed stalled
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants