Data::Dumper handles Unicode regex corner cases (GH #18614, GH #18764) #18793
Closed
This reverts the XS code change from March 2021 (commit c71f1f2, "Make Data::Dumper mark regex output as UTF-8 if needed"). It retains the new tests, but skips them for Dumpxs. The change fixed one bug, but introduced another (GH #18764). The fix for both seems a little too risky this late in the release cycle, so revert to the v5.32.0 behaviour for the v5.34.0 release itself. Both bugs will be fixed with a CPAN release very soon, which likely will also be in v5.34.1.
Seems that there hasn't been a CPAN release for 2.5 years.
This was referenced May 13, 2021
…vchr_buf Hence we need to set NEED_utf8_to_uvchr_buf else we don't get *any* utf8_to_uvchr_buf. Oops. :-)
These somewhat duplicate the tests in t/qr.t. It's not clear if that file is actually redundant now, or whether it tests some failure modes that this file's &TEST setup can't.
Adapted from Aaron's tests in GH #18771, with fixes for older Perl versions, and also skipped for Dumpxs for now.
This approach (and this commit message) are based on Aaron Crane's original in GH #18771. However, we leave the pure-Perl Dump unchanged (which means changing the tests somewhat), and need to handle one more corner case (\x{...} escaping a Unicode character that follows a backslash).

The previous approach was to upgrade the output to the internal UTF-8 encoding when dumping a regex containing supra-Latin-1 characters. That has the disadvantage that nothing else generates wide characters in the output, or even knows that the output might be upgraded.

A better approach, and one that's more consistent with the one taken for string literals, is to use `\x{…}` notation where needed.

Closes #18764
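To make the `\x{…}` idea concrete, here is a minimal pure-Perl sketch of the escaping step. It is an illustration only, not the XS code from this PR: the names `escape_wide_in_pattern` and `_escape_pair` are hypothetical, and dropping the backslash in front of a wide character is an assumed way of handling the corner case mentioned above.

```perl
use strict;
use warnings;

# Hypothetical helper (not Data::Dumper's actual code): decide what to emit
# for one character and the optional backslash in front of it.
sub _escape_pair {
    my ($backslash, $char) = @_;
    # Supra-Latin-1 character: emit \x{...}. A preceding backslash is dropped
    # (an assumption of this sketch), because keeping it would produce
    # "\\x{...}", i.e. an escaped backslash followed by the literal "x{...}".
    return sprintf('\\x{%x}', ord $char) if ord($char) > 0xFF;
    return $backslash . $char;    # everything else passes through untouched
}

sub escape_wide_in_pattern {
    my ($pattern) = @_;
    # Walk the pattern left to right, always taking a backslash together
    # with the character it escapes, so "\\" pairs are never split.
    $pattern =~ s/(\\?)(.)/_escape_pair($1, $2)/gse;
    return $pattern;
}

my $smiley = "\x{263a}";
print escape_wide_in_pattern("a${smiley}b"), "\n";    # a\x{263a}b
print escape_wide_in_pattern("\\${smiley}"), "\n";    # \x{263a}
```

With this scheme the dumped pattern contains no characters above 0xFF, so the output never has to be upgraded to the internal UTF-8 encoding. The sketch assumes that a backslash in front of a supra-Latin-1 character has no special meaning in the pattern, so removing it does not change what the regex matches.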
nwc10 force-pushed the smoke-me/nicholas/data-dumper-5340 branch from 134a51f to e0d5a14 on May 14, 2021 12:46
XS code for blead unchanged. Tests and ppport fun fixed for older perl versions.
@nwc10 I've applied the bottom two commits to blead as (I think!) we discussed.
jkeenan added the dist-Data-Dumper label (issues in the dual-life blead-first Data-Dumper distribution) on Jul 5, 2021
My suggested plan is
The fix for all the bugs feels a bit too risky to put into blead at this time, but we can iterate on CPAN releases of Data::Dumper much faster than on RCs (or v5.34.1). The plan above ensures that
I don't think that we need to change the pure-perl output for `qr//` with Unicode, as (I believe) it wasn't buggy, hence I left it unchanged and adapted the tests.