Support of byUTF for ubyte[] argument #7249
Thanks for your pull request and interest in making D better, @vporton! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.
Please see CONTRIBUTING.md for more information. If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment.
Bugzilla references
Your PR doesn't reference any Bugzilla issue. If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.
Testing this PR locally
If you don't have a local development environment setup, you can use Digger to test this PR:
dub fetch digger
dub run digger -- build "master + phobos#7249"
What happens for a range of |
@@ -4354,6 +4354,9 @@ if (isSomeChar!C)
// hellö as a range of `ubyte`s, which are UTF-8
assert((cast(ubyte[]) [0x68, 0x65, 0x6c, 0x6c, 0xC3, 0xB6]).byUTF!char().equal(['h', 'e', 'l', 'l', 0xC3, 0xB6]));

assertThrown((cast(ubyte[]) [0xC3, 0x28]).byUTF!char());
Needs an `import std.exception : assertThrown;`.
std/utf.d
Outdated
@@ -60,7 +60,7 @@ $(TR $(TD Miscellaneous) $(TD
+/
module std.utf;

import std.exception; // basicExceptionCtors
import std.exception; // basicExceptionCtors, assertThrown
This won't work because tests are extracted. Please import it inside the unittest block.
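A minimal sketch of the scoped-import pattern requested here, assuming "extracted" means the documented unittest is compiled on its own, outside `std/utf.d`, so module-level imports are not visible to it. `isValidUtf8` is a hypothetical helper for illustration, not part of Phobos; build with `-unittest` to run the test block.

```d
// Hypothetical helper, not part of Phobos: reports whether `s` is valid UTF-8.
bool isValidUtf8(const(char)[] s)
{
    // Scoped import: visible only inside this function.
    import std.utf : validate, UTFException;

    try
    {
        validate(s); // throws UTFException on malformed input
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

unittest
{
    // Everything the test needs is imported inside the block, so the
    // test still compiles if it is lifted out of its module.
    import std.exception : assertThrown;
    import std.utf : validate, UTFException;

    assert(isValidUtf8("hell\u00F6"));
    // 0xC3 opens a two-byte sequence, but 0x28 is not a continuation byte.
    assert(!isValidUtf8("\xC3\x28"));
    assertThrown!UTFException(validate("\xC3\x28"));
}

void main() {}
```

Keeping the imports local also documents exactly which symbols the example depends on.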
I read the discussion on the other PR and AIUI it called for something that took |
assert("𐐷".byUTF!dchar().equal([0x00010437]));

// hellö as a range of `ubyte`s, which are UTF-8
assert((cast(ubyte[]) [0x68, 0x65, 0x6c, 0x6c, 0xC3, 0xB6]).byUTF!char().equal(['h', 'e', 'l', 'l', 0xC3, 0xB6]));
Good, but I think you should test (directly) converting a `ubyte` range to UTF-16 and/or UTF-32.
And also dealing with invalid UTF-8 in the `ubyte` range, both with exceptions and replacement characters.
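A rough sketch of the checks requested in the two comments above. It does not assume the `ubyte[]` overload from this PR exists: the bytes are reinterpreted as `char` code units through a hypothetical `asUtf8` helper, and only the existing `byUTF` template is used. The exact output after an invalid sequence is deliberately not asserted beyond the first element.

```d
import std.algorithm.comparison : equal;
import std.array : array;
import std.exception : assertThrown;
import std.utf : byUTF, replacementDchar, UseReplacementDchar, UTFException;

// Hypothetical helper, not part of Phobos or this PR:
// view the bytes as UTF-8 code units without copying.
const(char)[] asUtf8(const(ubyte)[] bytes)
{
    return cast(const(char)[]) bytes;
}

void main()
{
    // "hellö" as UTF-8 bytes, converted directly to UTF-16 and UTF-32.
    const(ubyte)[] valid = [0x68, 0x65, 0x6c, 0x6c, 0xC3, 0xB6];
    assert(asUtf8(valid).byUTF!wchar.equal("hell\u00F6"w));
    assert(asUtf8(valid).byUTF!dchar.equal("hell\u00F6"d));

    // 0xC3 opens a two-byte sequence, but 0x28 is not a continuation byte.
    const(ubyte)[] invalid = [0xC3, 0x28];

    // Default behaviour: invalid input yields the replacement character.
    assert(asUtf8(invalid).byUTF!dchar.front == replacementDchar);

    // With replacement disabled, decoding throws. byUTF is lazy, so the
    // range has to be consumed (here via .array) for the error to surface.
    assertThrown!UTFException(
        asUtf8(invalid).byUTF!(dchar, UseReplacementDchar.no).array);
}
```

Because `byUTF` is lazy, an `assertThrown` that only constructs the range without reading from it may never trigger decoding.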
The added unittest explains it: You pass in a range of |
Perhaps this should also accept Of course, this can be implemented later on just as well. |
ping @vporton |
For the reasons outlined in the discussion of that pull request, we concluded that we need to be able to call `byUTF` on an argument of type `ubyte[]`. This PR implements exactly that.
As a reminder, this is necessary:
`char`s two times: when it is created and when converted by `byUTF` (this simplifies programming and improves efficiency).