std.string.indexOf wrong result with bad unicode #9766

Open
dlangBugzillaToGithub opened this issue Nov 23, 2018 · 0 comments

dlang-bugzilla (@CyberShadow) reported this on 2018-11-23T22:39:35Z

Transferred from https://issues.dlang.org/show_bug.cgi?id=19428

Description

//////////////////// test.d ///////////////////
import std.algorithm.comparison;
import std.range;
import std.string;

void main()
{
    assert(indexOf(
            only('\uFFFD', '\uFFFD', '\uFFFD'),
            "\x83\x84\x85",
            CaseSensitive.yes) == -1);
}
///////////////////////////////////////////////

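Looks like indexOf is replacing invalid UTF-8 with replacement characters (U+FFFD) under the hood, so the invalid needle wrongly matches a haystack of genuine replacement characters and the assert above fails.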

This becomes worse when something causes the same thing to happen to the haystack, as in this unit test:

https://github.com/dlang/phobos/blob/9bfc82130c0e4af4d1dc95bb261570c6e4f6f5d8/std/string.d#L887-L903

Note that this unittest is incorrectly annotated as nothrow/@nogc. We can't use the kind of decoding that silently substitutes replacement characters for invalid sequences, as that will introduce bugs like these.
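
To make the distinction concrete, here is a minimal sketch contrasting the two decoding styles on the same invalid input. It uses std.utf.validate and std.utf.byDchar purely as illustrations; it does not claim to show what indexOf calls internally, and the variable name `bad` is made up for the example.

```d
import std.algorithm.comparison : equal;
import std.exception : assertThrown;
import std.range : only;
import std.utf : byDchar, validate, UTFException;

void main()
{
    // Not valid UTF-8: three continuation bytes with no lead byte.
    string bad = "\x83\x84\x85";

    // Strict validation rejects the input instead of rewriting it.
    assertThrown!UTFException(validate(bad));

    // Replacement-style decoding turns each bad byte into U+FFFD, so the
    // decoded range can no longer be told apart from text that really
    // contains replacement characters.
    assert(bad.byDchar.equal(only('\uFFFD', '\uFFFD', '\uFFFD')));
}
```

The second assertion is the same confusion the test case at the top exposes: once the invalid bytes have been substituted, a search cannot distinguish them from real U+FFFD code points.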

@LightBender removed the P3 label on Dec 6, 2024