to!string doesn't throw on invalid UTF sequence #9906

Open
dlangBugzillaToGithub opened this issue Jun 8, 2011 · 4 comments

Comments

@dlangBugzillaToGithub

andrej.mitrovich (@AndrejMitrovic) reported this on 2011-06-08T11:41:02Z

Transferred from https://issues.dlang.org/show_bug.cgi?id=6125

Description

I'm not sure whether this is a bug or intended behavior:
    auto x = to!string(cast(char)255);

That won't throw. But this will:
    auto x = to!string(cast(char)255);  // or try 128
    auto z = toUTF8(x);  // throws

I've had this example code translated from C:

    foreach (y; 0 .. 16)
    foreach (x; 0 .. 16)
    {
        auto buffer = to!string(cast(char)(16 * x + y));
        auto result = buffer.toUTF16z;  // call to utf16z for the winapi
    }

Essentially, the code builds a table of characters that it prints out. But it doesn't take into account that char values of 128 and above are not valid UTF-8 sequences on their own.

This leads me to another question: how does one iterate through valid UTF code points, starting from 0? Is there a Phobos function that does that?
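
One possible way to approach the question above is to loop over code point values as integers and encode each one, instead of casting raw values to char. This is only a sketch: `encodeCodePoint` is a hypothetical helper name, not a Phobos function. It assumes `std.utf.isValidDchar` to skip values that are not valid code points (such as surrogates) and `std.utf.encode` to produce the UTF-8 bytes.

```d
import std.stdio : writeln;
import std.utf : encode, isValidDchar, validate;

/// Hypothetical helper: returns the UTF-8 encoding of one code point,
/// or null if `cp` is not a valid Unicode code point.
string encodeCodePoint(uint cp)
{
    auto c = cast(dchar) cp;
    if (!isValidDchar(c))
        return null;

    char[4] buf;                     // UTF-8 needs at most 4 code units
    immutable len = encode(buf, c);  // number of code units written
    return buf[0 .. len].idup;
}

void main()
{
    // Same 0 .. 255 range as the table in the report, but treated as
    // code points rather than as single UTF-8 code units.
    foreach (uint cp; 0 .. 256)
    {
        auto s = encodeCodePoint(cp);
        if (s is null)
            continue;
        validate(s);                 // throws UTFException on malformed UTF-8
        writeln(cp, ": ", s.length, " code unit(s)");
    }
}
```

With this approach, values from 128 to 255 become two-byte sequences, so the resulting strings can be passed on to functions such as `toUTF16z` that validate their input.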
@dlangBugzillaToGithub

andrej.mitrovich (@AndrejMitrovic) commented on 2016-08-27T21:55:57Z

-----
import std.conv;
import std.stdio;

void main()
{
    auto x = to!string(cast(char)255);
    writeln(x);
}
-----

Outputs:
[Decode error - output not utf-8]

I think the to!() routines should be UTF-safe, so the call to to!string above should throw an exception. Is this right, Andrei?

@dlangBugzillaToGithub

andrei (@andralex) commented on 2016-10-14T16:55:25Z

Well since it doesn't throw we may as well make it nothrow :o) and use the replacement char, or add an overload. I'll bootcamp this.
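
For illustration, the replacement-character idea can already be approximated on the caller's side. The sketch below is not the proposed change to `to!string`; `decodeLossy` is a hypothetical helper name. It assumes that `std.utf.byDchar` substitutes `std.utf.replacementDchar` (U+FFFD) for malformed input instead of throwing.

```d
import std.array : array;
import std.conv : to;
import std.stdio : writefln;
import std.utf : byDchar, replacementDchar;

/// Hypothetical helper: decodes UTF-8 to code points without throwing.
/// Malformed input is decoded as U+FFFD.
dstring decodeLossy(string s)
{
    return s.byDchar.array.idup;
}

void main()
{
    // The string from the report: a single 0xFF code unit.
    auto bad = to!string(cast(char) 255);

    foreach (dchar c; decodeLossy(bad))
        writefln("U+%04X", cast(uint) c);
}
```

The trade-off is that the caller no longer learns that the input was malformed, which is why a throwing overload might still be useful alongside a nothrow one.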

@dlangBugzillaToGithub

lucia.mcojocaru commented on 2016-11-21T13:36:26Z

Is this a Windows-specific bug?

I tested the following on Linux 64:
    import std.conv;
    import std.stdio;
    import std.utf;

    void main()
    {
        auto x = to!string(cast(char)191);
        auto z = toUTF8(x);
        writeln(x);

        foreach (y; 0 .. 16)
            foreach (r; 0 .. 16)
            {
                auto buffer = to!string(cast(char)(16 * r + y));
                auto b = toUTF8(buffer);
                writeln(b);
    //            auto result = buffer.toUTF16z;  // call to utf16z for the winapi
            }
    }


Only the commented-out line throws (once uncommented):
core.exception.UnicodeException@src/rt/util/utf.d(292): invalid UTF-8 sequence

@dlangBugzillaToGithub

bugzilla (@WalterBright) commented on 2019-12-11T14:18:40Z

The original bug isn't Windows-specific. I don't know if the example from Lucia Cojocaru can be considered the same bug...

@LightBender removed the P3 label Dec 6, 2024