utf8-c8 is not reversible: encode/decode mismatch #5330
From @Tux:

Two issues. I use this to test:

    use v6;
    use Test;

    for ^32 {
        say "";
        my @data = ^20 .map({ 256.rand.Int }).list;
        #dd @data;
        my $b = Buf.new(@data);
        ok((my Str $u = $b.decode("utf8-c8")), "decode");
        my @back = $u.encode("utf8-c8").list;
        #dd @back;
        my $n = Buf.new(@back);
        is-deeply($n, $b, "Data");
    }

The first issue is that the round trip returns a buffer longer than the original (a \0 is added):

    # expected: Buf.new(61,29,61,200,30,99,107,150,71,11,253,134,110,27,35,227,88,140,180,158,209)
    # expected: Buf.new(61,2,71,91,58,252,6,247,88,58,121,32,124,129,191,126,36,222,185,109,213)

The second issue is more fun: pairs are swapped:

    # expected: Buf.new(61,147,135,8,82,78,208,66,205,164,204,162,140,97,175,37,108,194,27,192,119)
    205,164,204,162 => 204,162,205,164
From @Tux:

    # expected: Buf.new(61,10,0,56,143,36,56,119,182,81,88,70,88,139,28,119,142,151,108,12,215)

From @Tux:

    # expected: Buf.new(61,93,12,110,139,89,42,134,251,165,68,32,104,225,44,112,194,178,75,64,243)
From @jdv:

This change at or around line 281 of MoarVM/src/strings/utf8_c8.c:

    - } while (++last_accept_utf8 != utf8);

seems to clear up the additional trailing

From @jdv:

The other issue with swapped pairs seems to be the normalizer getting confused. I don't know enough to even suggest a fix but if one
From @Tux:

    # expected: Buf.new(61,185,242,97,170,122,52,182,62,236,186,222,213,63,189,203,241,176,1,149,233)
From @zoffixznet:

TODO-fudged tests added in Raku/roast@16bcd2693d
The RT System itself - Status changed from 'new' to 'open'
From @nwc10:

On Sat Jul 16 06:03:19 2016, cpan@zoffix.com wrote:

All 6 tests trigger ASAN aborts. For example:

    $ ./perl6-m -Ilib 128184
From @jnthn:

On Thu, 19 May 2016 05:32:10 -0700, hmbrand wrote:

I've done a total re-write of the UTF8-C8 decoder. The original approach turned out to be a lot too fragile, so I took a different approach. Along the way, I got it to properly handle the cases where normalization would re-order, fixing all of the examples above. It also fixes the various things that ASAN/Valgrind tripped over in the failing tests, and the tests - plus a number of new ones I've added - now come out clean under both.

So far as I'm aware, this deals with the outstanding issues in utf8-c8. The tests are unfudged, though I moved them to S32-str/utf8-c8.t to make fudging of the added test cases easier (we fudge the whole file for JVM at present).

/jnthn
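For readers wondering why the pair 205,164,204,162 came back as 204,162,205,164: the following is a minimal sketch, not code from the ticket, that reproduces the same reordering with the plain utf8 encoding. It assumes the swap is ordinary canonical reordering: bytes 205,164 are U+0364 (canonical combining class 230) and bytes 204,162 are U+0322 (class 202), and normalization puts the lower class first. The leading 66 ('B') is taken from the reported buffer; the variable names are illustrative only.

```raku
use v6;

# Five bytes modelled on the failing case in the report: 'B' (66) followed by
# U+0364 (bytes 205,164, combining class 230) and
# U+0322 (bytes 204,162, combining class 202).
my $original = Buf.new(66, 205, 164, 204, 162);

# Decoding builds a normalized Str, so the two combining marks are put into
# canonical order (lower combining class first).
my $str = $original.decode('utf8');

# Encoding again writes the marks out in their normalized order.
my $back = $str.encode('utf8');

put 'in:   ', $original.list.join(',');
put 'out:  ', $back.list.join(',');
put 'ords: ', $str.ords.map(*.fmt('U+%04X')).join(' ');
```

With plain utf8 this loss of byte order is expected. utf8-c8 is meant to round-trip, which is presumably why the rewritten decoder has to handle sequences like this one explicitly.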
@jnthn - Status changed from 'open' to 'resolved'
Migrated from rt.perl.org#128184 (status was 'resolved')
Searchable as RT128184$