replace UnicodeException with UnicodeError #1279
- first step to make invalid UTF encodings in strings a programming error (ensuring valid encoding has to be done when converting raw input data to string)
- attributes for rt.util.utf
- added staticError helper to throw @nogc Errors
The outcome of Issue 14519 ([Enh] foreach on strings should return replacementDchar rather than throwing) is that the programmer has to ensure that strings contain valid Unicode data (e.g. by validating raw data, as readText does), and any algorithm working on strings should assert that but not repeat the validation.
@@ -566,7 +565,7 @@ Checks to see if string is well formed or not. $(D S) can be an array
 of $(D char), $(D wchar), or $(D dchar). Throws a $(D UtfException)
 if it is not. Use to check all untrusted input for correctness.
 */
-void validate(S)(in S s)
+void validate(S)(in S s) pure nothrow @safe @nogc
readText calls validate, which used to throw UnicodeException, so won't this break validating text in readText?
You're right. It's a different validate, but this one shouldn't be nothrow, and we still need UTFException for validation.
BTW, the signature of validate should become inout(char)[] validate(inout(ubyte)[] str);
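The signature suggested above could be sketched roughly as follows. This is only an illustration, not code from the PR: the name `validated` is hypothetical (chosen to avoid clashing with `std.utf.validate`), the parameter is read as `inout(ubyte)[]`, and the check is delegated to Phobos' existing throwing `std.utf.validate`.

```d
import std.utf : validate, UTFException;

/// Hypothetical helper: checks raw bytes once at the input boundary and
/// returns the same memory typed as UTF-8 text. `inout` carries the
/// caller's qualifier (mutable, const, or immutable) over to the result.
inout(char)[] validated(inout(ubyte)[] raw) @trusted pure
{
    // std.utf.validate throws UTFException on malformed input.
    validate(cast(const(char)[]) raw);
    // Reinterpreting the bytes as char is only acceptable after the check.
    return cast(inout(char)[]) raw;
}

unittest
{
    import std.string : representation;

    // Immutable bytes in, immutable chars (a string) out.
    string s = validated("héllo".representation);
    assert(s == "héllo");

    // A truncated two-byte sequence is rejected.
    ubyte[] bad = [0xC3];
    bool threw = false;
    try { auto r = validated(bad); }
    catch (UTFException e) { threw = true; }
    assert(threw);
}
```

With a boundary function of this shape, code further down the line could treat invalid UTF in a `string` as a programming error, which is the direction the PR description argues for.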
I'm still unsure about the whole ubyte-for-ASCII idea. I think it's an interesting direction iff we improve Phobos and the language to better support it as well (e.g. allowing comparison of ubyte[] and string literals, making s.startsWith("foo") work with ubyte[] s, etc.)
What do you mean by "ubyte-for-ASCII"? ASCII is a subset of UTF-8; char is fine for it.
Sorry, I meant non-UTF 8-bit encodings.
Well, s.startsWith("foo".representation) would work.
You can't generally mix string with ubyte[] because one needs to use autodecoding, while the other can't.
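A minimal sketch of the `representation` workaround mentioned above, assuming Phobos' `std.string.representation` and `std.algorithm.searching.startsWith`; the helper name `hasPrefix` is made up for this example.

```d
import std.algorithm.searching : startsWith;
import std.string : representation;

/// Hypothetical helper: tests whether raw bytes begin with the code units
/// of `prefix`. `representation` reinterprets the string as
/// immutable(ubyte)[], so both sides are compared byte by byte and no
/// autodecoding takes place.
bool hasPrefix(const(ubyte)[] data, string prefix) @safe pure
{
    return data.startsWith(prefix.representation);
}

unittest
{
    ubyte[] s = [0x66, 0x6f, 0x6f, 0x62, 0x61, 0x72]; // "foobar" as raw bytes
    assert(hasPrefix(s, "foo"));
    assert(!hasPrefix(s, "bar"));
}
```

Since the comparison is done on code units, it also works for data in non-UTF 8-bit encodings, as long as the prefix bytes mean the same thing in that encoding.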
Useful in and of itself. Break it out?
Done, #1325.
Do we have nothrow @nogc validation?
Please rebase, #1325 was pulled.
Any updates on this? @MartinNowak
It's really the wrong approach to start this transition by replacing some exceptions in druntime.