Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Data::Dumper handles Unicode regex corner cases (GH #18614, GH #18764) #18793

Closed
wants to merge 9 commits into from

Conversation

nwc10
Copy link
Contributor

@nwc10 nwc10 commented May 13, 2021

My suggested plan is

  1. smoke this
  2. merge adb79bc into blead before RC2 (so version 2.179). This reverts the behaviour to how it was on v5.32.0
  3. push a dev release of this branch to CPAN to see what the testers think
  4. if happy, ship 2.180 to CPAN
  5. v5.34.0 ships

The fix for all the bugs feels a bit too risky to put into blead at this time, but we can iterate CPAN release for Data::Dumper much faster than RCs (or v5.34.1). The plan above ensures that

  1. There is never a behaviour regression for the version downloaded from CPAN
  2. There is a fix immediately available to adopters of v5.34.0, for anyone that needs it

I don't think that we need to change the pure-perl output for qr// with Unicode, as (I believe) it wasn't buggy, hence i left it unchanged and adapted the tests.

nwc10 added 2 commits May 13, 2021 08:28
This reverts the XS code change from March 2021 from commit c71f1f2:
    Make Data::Dumper mark regex output as UTF-8 if needed

retains the new tests, but skips them for Dumpxs.

The change fixed one bug, but introduced another (GH #18764). The fix for
both seems a little too risky this late in the release cycle, so revert to
the v5.32.0 behaviour for the v5.34.0 release itself. Both bugs will be fix
with a CPAN release very soon, which likely will also be in v5.34.1
Seems that there hasn't been a CPAN release for 2.5 years.
nwc10 added 7 commits May 14, 2021 12:03
…vchr_buf

Hence we need to set NEED_utf8_to_uvchr_buf else we don't get *any*
utf8_to_uvchr_buf. Oops. :-)
These somewhat duplicate the tests in t/qr.t. It's not clear if that file is
actually redundant now, or whether it tests some failure modes that this
file's &TEST setup can't.
Adapted from Aaron's tests in GH #18771, with fixes for older Perl versions,
and also skipped for Dumpxs for now.
This approach (and this commit message) are based on Aaron Crane's original
in GH #18771. However, we leave the pure-Perl Dump unchanged (which means
changing the tests somewhat), and need to handle one more corner case
(\x{...} escaping a Unicode character that follows a backslash).

The previous approach was to upgrade the output to the internal UTF-8
encoding when dumping a regex containing supra-Latin-1 characters. That
has the disadvantage that nothing else generates wide characters in the
output, or even knows that the output might be upgraded.

A better approach, and one that's more consistent with the one taken for
string literals, is to use `\x{…}` notation where needed.

Closes #18764
@nwc10 nwc10 force-pushed the smoke-me/nicholas/data-dumper-5340 branch from 134a51f to e0d5a14 Compare May 14, 2021 12:46
@nwc10
Copy link
Contributor Author

nwc10 commented May 14, 2021

XS code for blead unchanged. $VERSION now set as 2.179_50

Tests and ppport fun fixed for older perl versions.

@rjbs
Copy link
Member

rjbs commented May 15, 2021

@nwc10 I've applied the bottom two commits to blead as (I think!) we discussed.

@nwc10 nwc10 closed this May 22, 2021
@nwc10 nwc10 deleted the smoke-me/nicholas/data-dumper-5340 branch May 22, 2021 08:35
@jkeenan jkeenan added the dist-Data-Dumper issues in the dual-life blead-first Data-Dumper distribution label Jul 5, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dist-Data-Dumper issues in the dual-life blead-first Data-Dumper distribution hasConflicts
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants