Data::Dumper handles Unicode regex corner cases (GH #18614, GH #18764) #18793

nwc10 · 2021-05-13T10:39:55Z

My suggested plan is:

1. Smoke this.
2. Merge adb79bc into blead before RC2 (so version 2.179). This reverts the behaviour to how it was on v5.32.0.
3. Push a dev release of this branch to CPAN to see what the testers think.
4. If happy, ship 2.180 to CPAN.
5. v5.34.0 ships.

The fix for all the bugs feels a bit too risky to put into blead at this time, but we can iterate on CPAN releases of Data::Dumper much faster than RCs (or v5.34.1). The plan above ensures that:

- There is never a behaviour regression for the version downloaded from CPAN.
- There is a fix immediately available to adopters of v5.34.0, for anyone that needs it.

I don't think that we need to change the pure-Perl output for qr// with Unicode, as (I believe) it wasn't buggy. Hence I left it unchanged and adapted the tests.

This reverts the XS code change from March 2021 from commit c71f1f2 ("Make Data::Dumper mark regex output as UTF-8 if needed"). It retains the new tests, but skips them for Dumpxs.

The change fixed one bug, but introduced another (GH #18764). The fix for both seems a little too risky this late in the release cycle, so revert to the v5.32.0 behaviour for the v5.34.0 release itself. Both bugs will be fixed with a CPAN release very soon, which will likely also be in v5.34.1.

This approach (and this commit message) are based on Aaron Crane's original in GH #18771. However, we leave the pure-Perl Dump unchanged (which means changing the tests somewhat), and need to handle one more corner case (`\x{...}` escaping a Unicode character that follows a backslash).

The previous approach was to upgrade the output to the internal UTF-8 encoding when dumping a regex containing supra-Latin-1 characters. That has the disadvantage that nothing else generates wide characters in the output, or even knows that the output might be upgraded.

A better approach, and one that's more consistent with the one taken for string literals, is to use `\x{…}` notation where needed.

Closes #18764
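As a rough illustration of the `\x{…}` approach described in the commit message above, here is a minimal Python sketch. It is not the XS code from this branch. The function name `escape_wide_chars`, the lowercase hex format, and the choice to drop a backslash that directly precedes a wide character are all assumptions made for the example.

```python
def escape_wide_chars(pattern: str) -> str:
    """Rewrite code points above 0xFF in a regex source as \\x{...} escapes.

    Hypothetical helper for illustration only; it does not mirror the
    actual dump_regexp implementation in Data::Dumper's XS code.
    """
    out = []
    escaped = False  # True when the previous character is an unpaired backslash
    for ch in pattern:
        if ord(ch) > 0xFF:
            if escaped:
                # Corner case: a backslash followed by a wide character.
                # Emitting "\" + "\x{...}" would turn the pair into an
                # escaped backslash followed by the literal text "x{...}",
                # so (by assumption) the pending backslash is dropped.
                out.pop()
            out.append("\\x{%x}" % ord(ch))
            escaped = False
        else:
            out.append(ch)
            # A backslash starts an escape unless it is itself escaped.
            escaped = (ch == "\\") and not escaped
    return "".join(out)


if __name__ == "__main__":
    print(escape_wide_chars("a\u0100b"))    # plain wide character
    print(escape_wide_chars("\\\u0100"))    # backslash, then wide character
    print(escape_wide_chars("caf\u00e9"))   # Latin-1 only, left untouched
```

The point of the sketch is that the result never contains a code point above 0xFF, so nothing downstream needs to know about upgraded (wide-character) output.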

nwc10 · 2021-05-14T12:52:08Z

XS code for blead unchanged. $VERSION now set as 2.179_50

Tests and ppport fun fixed for older perl versions.

rjbs · 2021-05-15T01:47:40Z

@nwc10 I've applied the bottom two commits to blead as (I think!) we discussed.

nwc10 added 2 commits May 13, 2021 08:28

Update Changes file for Data::Dumper.

adb79bc

Seems that there hasn't been a CPAN release for 2.5 years.

nwc10 requested a review from Leont May 13, 2021 10:40

This was referenced May 13, 2021

Data::Dumper: rework Unicode-in-qr support #18771

Closed

Data::Dumper: Malformed UTF-8 character since 5.33.8 #18764

Closed

nwc10 added 7 commits May 14, 2021 12:03

ppport.h's utf8_to_uvchr_buf implementation misses a NULL check in one place.

66103fa

Current ppport.h forcibly overrides older buggy versions of utf8_to_uvchr_buf

69615fb

Hence we need to set NEED_utf8_to_uvchr_buf else we don't get *any* utf8_to_uvchr_buf. Oops. :-)

More regression tests for perl #58608 (quoting / in qr//).

13b732e

These somewhat duplicate the tests in t/qr.t. It's not clear if that file is actually redundant now, or whether it tests some failure modes that this file's &TEST setup can't.

More tests for Unicode in qr//.

36eb7c0

Adapted from Aaron's tests in GH #18771, with fixes for older Perl versions, and also skipped for Dumpxs for now.

Document the scanning logic in Data::Dumper's dump_regexp.

a546c17

Bump Data::Dumper's $VERSION and update Changes.

e0d5a14

nwc10 force-pushed the smoke-me/nicholas/data-dumper-5340 branch from 134a51f to e0d5a14 Compare May 14, 2021 12:46

github-actions bot added the hasConflicts label May 15, 2021

nwc10 closed this May 22, 2021

nwc10 deleted the smoke-me/nicholas/data-dumper-5340 branch May 22, 2021 08:35

jkeenan added the dist-Data-Dumper label (issues in the dual-life blead-first Data-Dumper distribution) Jul 5, 2021
