Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix issue 7515 #478

Closed
wants to merge 1 commit into from
Closed

Fix issue 7515 #478

wants to merge 1 commit into from

Conversation

jmdavis
Copy link
Member

@jmdavis jmdavis commented Mar 6, 2012

I created an adjusted version of translate which is ASCII-only and
renamed maketrans to makeTrans with some minor changes. Instead of
deprecating the old translate and maketrans, maketrans is now a
deprecated alias of makeTrans, and the new ASCII-only translate should
be compatible with the old one.

The one major change which might break code is that I made it so that
it uses arrays of length 128 rather than 256, since all valid ASCII
characters have a 0 in their most significant bit, making 128 in all
rather than 256. As long as code is using maketrans (as it should be),
that's not an issue, since it'll just automatically use makeTrans, which
uses 128. But if anyone constructed one on their own for some reason, it
would now be the wrong length. I don't consider that to be a high enough
risk to really be a concern though.

I created an adjusted version of translate which is ASCII-only and
renamed maketrans to makeTrans with some minor changes. Instead of
deprecating the old translate and maketrans, maketrans is now a
deprecated alias of makeTrans, and the new ASCII-only translate should
be compatible with the old one.

The one major change which _might_ break code is that I made it so that
it uses arrays of length 128 rather than 256, since all valid ASCII
characters have a 0 in their most significant bit, making 128 in all
rather than 256. As long as code is using maketrans (as it should be),
that's not an issue, since it'll just automatically use makeTrans, which
uses 128. But if anyone constructed one on their own for some reason, it
would now be the wrong length. I don't consider that to be a high enough
risk to really be a concern though.
@jmdavis
Copy link
Member Author

jmdavis commented Mar 15, 2012

Anyone reviewing this should read the comments on the bug report. Bearophile seems to think that it's normal and expected that users would alter the array returned from makeTrans and that it should support Extended ASCII. Personally, I'm divided on the matter. Ideally, it really should just deal with ASCII, since it will be slightly more efficient (albeit not by much) and Extended ASCII is supported by nothing else in Phobos. It's all designed with the idea that strings should be valid unicode. But the previous implementation did use 256 element arrays and (on purpose or otherwise) ended up supporting Extended ASCII. So, for the sake of backwards compatibility if nothing else, maybe we should just make it use 256 elements.

On a related note, at some point, we should probably better sort out how to deal with alternate encodings in Phobos (possibly with some sort of wrapper range), since std.encoding doesn't really seem to do the job. But that's a separate (albeit related) issue.

@jmdavis
Copy link
Member Author

jmdavis commented May 28, 2012

After further consideration, I'm just going to go with the 256 element version, much as I don't like it. But at least it'll avoid the function blowing up when someone accidentally uses something other than ASCII with it. As ASCII characters aren't valid as other code units, it should actually work to use the ASCII version of translate on full-on UTF strings to just translate the ASCII characters.

@jmdavis jmdavis closed this May 28, 2012
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant