Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BBC: Blead Breaks Cpanel::JSON::XS #19842

Closed
cjg-cguevara opened this issue Jun 8, 2022 · 17 comments
Closed

BBC: Blead Breaks Cpanel::JSON::XS #19842

cjg-cguevara opened this issue Jun 8, 2022 · 17 comments
Labels
BBC Blead Breaks CPAN - changes in blead broke a cpan module(s)

Comments

@cjg-cguevara
Copy link

This is a bug report for perl from "Carlos Guevara" carlos@carlosguevara.com,
generated with the help of perlbug 1.42 running under perl 5.37.1.


[Please describe your issue here]

BBC: Blead Breaks Cpanel::JSON::XS

Please see http://matrix.cpantesters.org/?dist=Cpanel::JSON::XS

[Please do not change anything below this line]


Flags:
category=core
severity=low

Site configuration information for perl 5.37.1:

Configured by cpan at Tue Jun 7 03:40:36 EDT 2022.

Summary of my perl5 (revision 5 version 37 subversion 1) configuration:
Commit id: c31ae9a
Platform:
osname=openbsd
osvers=7.1
archname=OpenBSD.amd64-openbsd
uname='openbsd cjg-openbsd7 7.1 generic#3 amd64 '
config_args='-des -Dprefix=/bin/perl-blead -Dscriptdir=/bin/perl-blead/bin -Dusedevel -Duse64bitall'
hint=recommended
useposix=true
d_sigaction=define
useithreads=undef
usemultiplicity=undef
use64bitint=define
use64bitall=define
uselongdouble=undef
usemymalloc=n
default_inc_excludes_dot=define
Compiler:
cc='cc'
ccflags ='-fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include'
optimize='-O2'
cppflags='-fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include'
ccversion=''
gccversion='OpenBSD Clang 13.0.0'
gccosandvers=''
intsize=4
longsize=8
ptrsize=8
doublesize=8
byteorder=12345678
doublekind=3
d_longlong=define
longlongsize=8
d_longdbl=define
longdblsize=16
longdblkind=3
ivtype='long'
ivsize=8
nvtype='double'
nvsize=8
Off_t='off_t'
lseeksize=8
alignbytes=8
prototype=define
Linker and Libraries:
ld='cc'
ldflags ='-Wl,-E -fstack-protector-strong -L/usr/local/lib'
libpth=/usr/lib/clang/13.0.0/lib /usr/lib /usr/local/lib
libs=-lpthread -lm -lutil -lc
perllibs=-lpthread -lm -lutil -lc
libc=/usr/lib/libc.so.96.1
so=so
useshrplib=false
libperl=libperl.a
gnulibc_version=''
Dynamic Linking:
dlsrc=dl_dlopen.xs
dlext=so
d_dlsymun=undef
ccdlflags=' '
cccdlflags='-DPIC -fPIC '
lddlflags='-shared -fPIC -L/usr/local/lib -fstack-protector-strong'


@inc for perl 5.37.1:
/home/cpan/bin/perl-blead/lib/site_perl/5.37.1/OpenBSD.amd64-openbsd
/home/cpan/bin/perl-blead/lib/site_perl/5.37.1
/home/cpan/bin/perl-blead/lib/5.37.1/OpenBSD.amd64-openbsd
/home/cpan/bin/perl-blead/lib/5.37.1


Environment for perl 5.37.1:
HOME=/home/cpan
LANG (unset)
LANGUAGE (unset)
LC_ALL=C
LD_LIBRARY_PATH (unset)
LOGDIR (unset)
PATH=/home/cpan/bin/perl-blead/bin:/home/cpan/bin:/home/cpan/bin:/bin:/sbin:/usr/bin:/usr/sbin:/usr/X11R6/bin:/usr/local/bin:/usr/local/sbin:/usr/games
PERL_BADLANG (unset)
SHELL=/usr/local/bin/bash

@hvds
Copy link
Contributor

hvds commented Jun 8, 2022

This appears to be a result of 7008caa, which removed the long-deprecated utf8n_to_uvuni:

t/00_load.t ................ 1/1 Can't load '/src/perl/Cpanel-JSON-XS-4.29/blib/arch/auto/Cpanel/JSON/XS/XS.so'
  for module Cpanel::JSON::XS: /src/perl/Cpanel-JSON-XS-4.29/blib/arch/auto/Cpanel/JSON/XS/XS.so:
  undefined symbol: utf8n_to_uvuni at lib/perl5/5.37.1/x86_64-linux/DynaLoader.pm line 206.
 at t/00_load.t line 3.
Compilation failed in require at t/00_load.t line 3.
BEGIN failed--compilation aborted at t/00_load.t line 3.
t/00_load.t ................ Dubious, test returned 255 (wstat 65280, 0xff00)

I don't know when or why that function was deprecated, or what replaces it, so I don't know whether it's just down to Reini to fix this or whether we need to reconsider the removal. @khwilliamson?

@jkeenan jkeenan added the BBC Blead Breaks CPAN - changes in blead broke a cpan module(s) label Jun 9, 2022
@khwilliamson
Copy link
Contributor

Reini should fix it.

It looks like its replacement was added in 5.7.1, and is ported via ppport.h to 5.6.1
Here's the pod for the deleted function:

/*
=for apidoc utf8n_to_uvuni

Instead use L</utf8_to_uvchr_buf>, or rarely, L</utf8n_to_uvchr>.

This function was useful for code that wanted to handle both EBCDIC and
ASCII platforms with Unicode properties, but starting in Perl v5.20, the
distinctions between the platforms have mostly been made invisible to most
code, so this function is quite unlikely to be what you want. If you do need
this precise functionality, use instead
C<L<NATIVE_TO_UNI(utf8_to_uvchr_buf(...))|/utf8_to_uvchr_buf>>
or C<L<NATIVE_TO_UNI(utf8n_to_uvchr(...))|/utf8n_to_uvchr>>.

=cut
*/

@leonerd
Copy link
Contributor

leonerd commented Jun 10, 2022

I have copypasted the relevant summaries above into an upstream bug report - rurban/Cpanel-JSON-XS#196

@jkeenan
Copy link
Contributor

jkeenan commented Jun 11, 2022

Likely also affected: http://matrix.cpantesters.org/?dist=JSON::DWIW http://matrix.cpantesters.org/?dist=JSON::XS http://matrix.cpantesters.org/?dist=Unicode::UTF8

If we're getting failures on multiple CPAN modules with different maintainers, then perhaps we should re-think the breaking commit? Or at least try to develop a patching strategy that these different maintainers could follow.

@Leont
Copy link
Contributor

Leont commented Jun 13, 2022

I don't understand the reason of the deprecation, let alone if it was correct.

Am I correct to conclude that:

  1. This distinction between utf8n_to_uvuni and utf8n_to_uvchr only matters on EBCDIC platforms, on ASCII platforms they will do exactly the same
  2. On EBCDIC utf8n_to_uvuni will decode UTF-EBCDIC, and utf8n_to_uvchr will decode UTF-8

What is entirely unclear to me is what changed in v5.20 that would matter for this function. perl5200delta didn't give me the answer either.

In either case I'm kind of surprised by this deprecation, given how little it's in the way of core maintenance.

@khwilliamson
Copy link
Contributor

khwilliamson commented Jun 18, 2022

You are right about number 1; wrong on 2. They both operate on UTF-EBCDIC on EBCDIC platforms, and both on UTF-8 on ASCIIish ones. The difference is that uvchr knows, for example, that 65 should map to 193. (ASCII 'A' to EBCDIC 'A'). uvuni does no mapping

My presumption has been that when uvchr was introduced in 5.7.1, that new code was supposed to use it, and uvuni was kept around for backwards compatibility and for specialist purposes, notably Encode. I presume it was me who deprecated uvuni, and I thought I was following along with the original intent. A module using uvuni is pretty much guaranteed to fail on EBCDIC boxes. It's no extra trouble to type 'chr' instead of 'uni' and remove that failure path.

Further, most code shouldn't be using either of these, but instead, utf8_touvchr_buf (By the time that was invented, I suspect, it was realized that the uvuni wasn't the way to go, so there is no 'utf8_to_uvuni_buf' in existence.) The thinking on what to do about malformed multibyte input has evolved over the years, from doing a panic at the most extreme, to now changing it to REPLACEMENT CHARACTERs and continuing on. The code I looked it appeared to be using uvuni so that it could handle malformed input without the core crashing (but I didn't look closely). Best practice now is to use utf8_to_uvchr_buf..

I don't believe that core maintenance is the only reason to deprecate and remove functions. Another is to maximize the chances that a module will work more portably and safely,. The industry has learned various lessons about security holes with UTF-8. (and hence UTF-EBCDIC). That is why I have backported those lessons in Devel::PPPort. Even if a release has a functioning (on well-formed input) version of these functions, ppport.h overrides them with something containing the latest security fixes.

Module writers shouldn't have to be burdened by the complexities of these kinds of things, as long as they follow a few principles. One of those is to use uvchr not uvuni

We may actually have to revert this removal because of the unavailability of some of the maintainers of high-in-the-CPAN-river modules. I went to submit a patch to Cpanel::JSON::XS, and found the source had already been fixed.

But I'd like to continue ithe removal from core for a development release or two. I believe we can afford to do so at this point in the development cycle, and it could be a wake up call for people who are actively maintaining their modules that not only is this out-moded code (since 5.7.1!), but utf8_to_uvchr_buf() is probably what you want to change to. I can write a paragraph or two summary of the issue to send to such authors.for their consideration

@andk
Copy link
Contributor

andk commented Jun 22, 2022

Also affected: SMUELLER/Data-Dumper-Limited-0.03.tar.gz
Fail report: http://www.cpantesters.org/cpan/report/589d1bea-f16a-11ec-b166-c683b201f4e1
At the time of this writing, the matrix http://matrix.cpantesters.org/?dist=Data-Dumper-Limited did not yet link to the fail report.
Hat tip to @tsee, you may want to take note.

@andk
Copy link
Contributor

andk commented Jun 22, 2022

utf8n_to_uvuni is not mentioned in perl5371delta

@andk
Copy link
Contributor

andk commented Jun 22, 2022

Also affected: DFARRELL/URI-Encode-XS-0.11.tar.gz
Fail report: http://www.cpantesters.org/cpan/report/130f58e8-f202-11ec-9014-eacfbebd8324
At the time of this writing, the matrix http://matrix.cpantesters.org/?dist=URI-Encode-XS+0.11 did not yet link to this fail report.
Greets to @dnmfarrell, you may want to take note.

@andk
Copy link
Contributor

andk commented Jun 22, 2022

Also affected: JKUTEJ/XML-Char-0.04.tar.gz
Fail report: http://www.cpantesters.org/cpan/report/be5af526-f26c-11ec-89c7-0668d637e479
At the time of this writing, the matrix http://matrix.cpantesters.org/?dist=XML-Char+0.04 did not yet link to this fail report.
Greets to @jozef , you may want to take note.

@andk
Copy link
Contributor

andk commented Jun 23, 2022

Also affected: SADAHIRO/Lingua-AR-MacArabic-0.20.tar.gz
Fail report: http://www.cpantesters.org/cpan/report/9c8dcfe2-f283-11ec-a091-62f6d737e479
Matrix: http://matrix.cpantesters.org/?dist=Lingua-AR-MacArabic
I'll open a ticket on RT to notify SADAHIRO [time passes] XRef: https://rt.cpan.org/Ticket/Display.html?id=143472

@andk
Copy link
Contributor

andk commented Jun 25, 2022

Also affected: SADAHIRO/Unicode-Transform-0.51.tar.gz
http://www.cpantesters.org/cpan/report/54089bbe-f411-11ec-862e-73fe3cc9ca3c

@andk
Copy link
Contributor

andk commented Jun 25, 2022

Also affected: SADAHIRO/Lingua-KO-MacKorean-0.20.tar.gz
http://www.cpantesters.org/cpan/report/49078db0-f443-11ec-9445-dae741c9ca3c

@khwilliamson
Copy link
Contributor

I have reverted this in f847c0b. Distributions should convert away from using this function, but the drop-in replacement remains cumbersome to use, and there really isn't yet a good API for this functionality. I will investigate and propose some replacements

@jkeenan
Copy link
Contributor

jkeenan commented Jun 28, 2022

The reversion in f847c0b cleared up most, but not all, of the BBC failures reported in this ticket.

While installing some other modules against blead, I noticed that there were still test failures in Data-Dumper-Limited and XML-Char. Since those failures were caused by the same commit, I have reported them both in #19897.

I think this ticket is closable now. Does anyone see a reason for keeping it open?

@jkeenan jkeenan added the Closable? We might be able to close this ticket, but we need to check with the reporter label Jun 28, 2022
@jkeenan
Copy link
Contributor

jkeenan commented Jul 10, 2022

The reversion in f847c0b cleared up most, but not all, of the BBC failures reported in this ticket.

While installing some other modules against blead, I noticed that there were still test failures in Data-Dumper-Limited and XML-Char. Since those failures were caused by the same commit, I have reported them both in #19897.

I think this ticket is closable now. Does anyone see a reason for keeping it open?

No one has argued for keeping this ticket open. Closing.

@jkeenan jkeenan closed this as completed Jul 10, 2022
@jkeenan jkeenan removed the Closable? We might be able to close this ticket, but we need to check with the reporter label Jul 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
BBC Blead Breaks CPAN - changes in blead broke a cpan module(s)
Projects
None yet
Development

No branches or pull requests

7 participants