perldoc in 5.16.0 required groff upgrade but now misdisplays asterisks #12142

Open
p5pRT opened this issue May 28, 2012 · 14 comments

@p5pRT p5pRT commented May 28, 2012

Migrated from rt.perl.org#113406 (status was 'open')

Searchable as RT113406$

@p5pRT p5pRT commented May 28, 2012

From @jkeenan

Created by @jkeenan

I upgraded from Perl 5.12 or Perl 5.14 to Perl 5.16.0 on three machines in the
last week: two Linux and one Darwin. As I began to use 'perldoc' on these
machines, I noticed that warnings were being generated before the POD was
formatted. These warnings reported that my version of 'groff' was
out-of-date. Accordingly, I got the 'grotty' package from gnu.org and
compiled, built and installed the code from source.

##########
$ groff --version
GNU groff version 1.21
Copyright (C) 2009 Free Software Foundation, Inc.
GNU groff comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of groff and its subprograms
under the terms of the GNU General Public License.
For more information about these matters, see the file named COPYING.

called subprograms​:

GNU grops (groff) version 1.21
GNU troff (groff) version 1.21
#########

Once I completed this upgrade, 'perldoc' worked without generating warnings.
However, I now observe a regression in perldoc's performance. I (and others)
have often written POD like this:

##########
=head2 Comparing Two Lists Held in Arrays

=over 4

=item *

Given two lists:

  @Llist = qw(abel abel baker camera delta edward fargo golfer);
  @Rlist = qw(baker camera delta delta edward fargo golfer hilton);

=item *

##########

That is, POD where the '=item' directive is followed by a wordspace and an
asterisk. Prior to 5.16.0 and the groff upgrade, perldoc always rendered this
simply as an asterisk.

Now, however, it's representing this as follows:

##########
Comparing Two Lists Held in Arrays
  <C2><B7> Given two lists:

  @Llist = qw(abel abel baker camera delta edward fargo golfer);
  @Rlist = qw(baker camera delta delta edward fargo golfer hilton);

  <C2><B7> Get those items which appear at least once in both lists
  (their intersection).
##########

That is, the '*' is being represented by 'perldoc' as '<C2><B7>'. To me, this
is a regression. What is wrong and how can it be fixed?
##########

Perl Info

Flags:
     category=docs
     severity=medium

Site configuration information for perl 5.16.0:

Configured by jimk at Sun May 20 20:01:26 EDT 2012.

Summary of my perl5 (revision 5 version 16 subversion 0) configuration:

   Platform:
     osname=darwin, osvers=8.11.0, archname=darwin-2level
     uname='darwin macintosh-8.local 8.11.0 darwin kernel version 8.11.0: wed oct 10 18:26:00 pdt 2007; root:xnu-792.24.17~1release_ppc power macintosh powerpc '
     config_args='-des'
     hint=recommended, useposix=true, d_sigaction=define
     useithreads=undef, usemultiplicity=undef
     useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
     use64bitint=undef, use64bitall=undef, uselongdouble=undef
     usemymalloc=n, bincompat5005=undef
   Compiler:
     cc='cc', ccflags ='-fno-common -DPERL_DARWIN -fno-strict-aliasing -pipe -I/usr/local/include -I/opt/local/include',
     optimize='-O3',
     cppflags='-fno-common -DPERL_DARWIN -fno-strict-aliasing -pipe -I/usr/local/include -I/opt/local/include'
     ccversion='', gccversion='4.0.1 (Apple Computer, Inc. build 5250)', gccosandvers=''
     intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=4321
     d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
     ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
     alignbytes=8, prototype=define
   Linker and Libraries:
     ld='env MACOSX_DEPLOYMENT_TARGET=10.3 cc', ldflags =' -L/usr/local/lib -L/opt/local/lib'
     libpth=/usr/local/lib /opt/local/lib /usr/lib
     libs=-ldbm -ldl -lm -lc
     perllibs=-ldl -lm -lc
     libc=, so=dylib, useshrplib=false, libperl=libperl.a
     gnulibc_version=''
   Dynamic Linking:
     dlsrc=dl_dlopen.xs, dlext=bundle, d_dlsymun=undef, ccdlflags=' '
     cccdlflags=' ', lddlflags=' -bundle -undefined dynamic_lookup -L/usr/local/lib -L/opt/local/lib'

Locally applied patches:



@INC for perl 5.16.0:
     /usr/local/lib/perl5/site_perl/5.16.0/darwin-2level
     /usr/local/lib/perl5/site_perl/5.16.0
     /usr/local/lib/perl5/5.16.0/darwin-2level
     /usr/local/lib/perl5/5.16.0
     /usr/local/lib/perl5/site_perl/5.14.2
     /usr/local/lib/perl5/site_perl/5.14.0
     /usr/local/lib/perl5/site_perl/5.12.0
     /usr/local/lib/perl5/site_perl/5.10.1
     /usr/local/lib/perl5/site_perl/5.10.0
     /usr/local/lib/perl5/site_perl
     .


Environment for perl 5.16.0:
     DYLD_LIBRARY_PATH=/Users/jimk/work/pseudoinstall/lib:/Users/jimk/gitwork/parrot/blib/lib
     HOME=/Users/jimk
     LANG (unset)
     LANGUAGE (unset)
     LD_LIBRARY_PATH (unset)
     LOGDIR (unset)
     PATH=/usr/local/bin:/opt/local/bin:/opt/local/sbin:/usr/local/bin:/opt/local/bin:/opt/local/sbin:/bin:/sbin:/usr/bin:/usr/sbin:/Users/jimk/bin:/Users/jimk/bin/perl:/Users/jimk/bin/c:/Users/jimk/bin/shell:/sw/lib:/sw/bin:/Users/jimk/bin:/Users/jimk/bin/perl:/Users/jimk/bin/c:/Users/jimk/bin/shell:/sw/lib:/sw/bin
     PERL_BADLANG (unset)
     SHELL=/bin/bash

@p5pRT p5pRT commented May 28, 2012

From tchrist@perl.com

James E Keenan (via RT) <perlbug-followup@​perl.org> wrote
  on Sun, 27 May 2012 20​:42​:21 PDT​:

Now, however, it's representing this as follows​:

Comparing Two Lists Held in Arrays
  <C2><B7> Given two lists:

  @Llist = qw(abel abel baker camera delta edward fargo golfer);
  @Rlist = qw(baker camera delta delta edward fargo golfer hilton);

  <C2><B7> Get those items which appear at least once in both lists
  (their intersection).

That is, the '*' is being represented by 'perldoc' as '<C2><B7>'. To
me, this is a regression. What is wrong and how can it be fixed?

"\xC2\xB7" is simply the UTF-8 representation of U+00B7 MIDDLE DOT, which
means it is just fine.

I suggest you have something else wrong.

But I don't know. I wouldn't use perldoc if you paid me. It works
just fine using the regular toolchain, meaning either pod2text
or pod2man|nroff -man.

--tom
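
The byte sequence discussed above can be checked from a shell. This is a minimal sketch, assuming a POSIX `printf`, `od` and `tr`; the function name `middle_dot_hex` is made up for illustration and is not part of perldoc or groff.

```shell
# Emit U+00B7 MIDDLE DOT as its UTF-8 bytes (octal 302 267) and dump them in hex.
# middle_dot_hex is a hypothetical helper name used only for this illustration.
middle_dot_hex() {
    # od prints the bytes as hex pairs; tr strips the padding and the newline.
    printf '\302\267' | od -An -tx1 | tr -d ' \n'
}

middle_dot_hex; echo
```

The two bytes printed are C2 and B7, which is consistent with the `<C2><B7>` in the report when the pager treats its input as single bytes rather than as UTF-8.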

@p5pRT p5pRT commented May 28, 2012

The RT System itself - Status changed from 'new' to 'open'

@p5pRT p5pRT commented May 28, 2012

From @cpansprout

On Sun May 27 20​:42​:21 2012, jkeen@​verizon.net wrote​:

That is, the '*' is being represented by 'perldoc' as '<C2><B7>'. To me, this
is a regression. What is wrong and how can it be fixed?

It sounds to me as though perldoc is assuming you have a UTF-8
environment when you don’t. So UTF-8 gets fed to less, and less sees it
as just bytes, because it hasn’t been told via the LANG env var to treat
its input otherwise. That sounds like a perldoc bug.

--

Father Chrysostomos

@p5pRT p5pRT commented May 28, 2012

From @jkeenan

On Sun May 27 21​:15​:14 2012, tom christiansen wrote​:

"\xC2\xB7" is simply the UTF-8 representation of U+00B7 MIDDLE DOT, which
means it is just fine.

I suggest you have something else wrong.

But I don't know. I wouldn't use perldoc if you paid me. It works
just fine using the regular toolchain, meaning either pod2text
or pod2man|nroff -man.

Neither of those solutions is really satisfactory.

pod2text displays the asterisks as asterisks, but, of course, at the
expense of (a) losing all the man-like formatting; and (b) requiring
that I pipe the content to 'less'​:

pod2text List/Compare/Functional.pm |less

######### (excerpt)
Comparing Two Lists Held in Arrays
  * Given two lists​:

  @​Llist = qw(abel abel baker camera delta edward fargo golfer);
  @​Rlist = qw(baker camera delta delta edward fargo golfer hilton);

  * Get those items which appear at least once in both lists (their
  intersection).

  @​intersection = get_intersection( [ \@​Llist, \@​Rlist ] );
##########

pod2man piped to nroff -man displays the asterisks as open circles,
which is acceptable. But now it gets =head2 wrong (on top of being many
more keystrokes)​:

pod2man List/Compare/Functional.pm |nroff -man |less

########## (excerpt)
ESC[1mComparing Two Lists Held in ArraysESC[0m
  o Given two lists​:

  @​Llist = qw(abel abel baker camera delta edward fargo golfer);
  @​Rlist = qw(baker camera delta delta edward fargo golfer hilton);

  o Get those items which appear at least once in both lists (their
  intersection).

  @​intersection = get_intersection( [ \@​Llist, \@​Rlist ] );
##########

Thank you very much.
Jim Keenan

@p5pRT p5pRT commented May 28, 2012

From @jkeenan

On Sun May 27 22​:56​:22 2012, sprout wrote​:

It sounds to me as though perldoc is assuming you have a UTF-8
environment when you don’t. So UTF-8 gets fed to less, and less sees it
as just bytes, because it hasn’t been told via the LANG env var to treat
its input otherwise. That sounds like a perldoc bug.

I have not made any recent changes to my 'locale' settings on these
machines. For instance, on Darwin I have​:

$ locale
LANG=
LC_COLLATE="C"
LC_CTYPE="C"
LC_MESSAGES="C"
LC_MONETARY="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_ALL="C"

... and on Linux I have​:

$ locale
LANG=
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=

Thank you very much.
Jim Keenan

@p5pRT p5pRT commented May 28, 2012

From @Leont

On Mon, May 28, 2012 at 3​:04 PM, James E Keenan via RT
<perlbug-followup@​perl.org> wrote​:

I have not made any recent changes to my 'locale' settings on these
machines.  For instance, on Darwin I have​:

$ locale
LANG=
LC_COLLATE="C"
LC_CTYPE="C"
LC_MESSAGES="C"
LC_MONETARY="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_ALL="C"

... and on Linux I have​:

$ locale
LANG=
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=

Both C and POSIX are ASCII locales, AFAIK, so it seems Father
Chrysostomos' diagnosis is right.

Leon
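
The reasoning that the `locale` output above describes a non-UTF-8 environment can be sketched in shell. This is only an illustration of POSIX locale precedence, under the assumption that a locale name ending in a UTF-8 codeset suffix is the signal of interest; the helper names `effective_ctype` and `is_utf8_locale` are hypothetical and do not come from perldoc.

```shell
# Both helper names below are hypothetical; they only model POSIX locale precedence.

# effective_ctype LC_ALL LC_CTYPE LANG
# Print the value that governs character handling: a non-empty LC_ALL wins,
# then LC_CTYPE, then LANG; if all are empty the default is the "C" locale.
effective_ctype() {
    if [ -n "$1" ]; then
        printf '%s\n' "$1"
    elif [ -n "$2" ]; then
        printf '%s\n' "$2"
    elif [ -n "$3" ]; then
        printf '%s\n' "$3"
    else
        printf 'C\n'
    fi
}

# is_utf8_locale NAME
# Succeed only when the locale name carries a UTF-8 codeset suffix,
# e.g. en_US.UTF-8 or en_US.utf8. Plain C and POSIX fail the check.
is_utf8_locale() {
    case "$1" in
        *.[Uu][Tt][Ff]-8*|*.[Uu][Tt][Ff]8*) return 0 ;;
        *) return 1 ;;
    esac
}

# Apply the check to the current environment.
ctype=$(effective_ctype "${LC_ALL:-}" "${LC_CTYPE:-}" "${LANG:-}")
if is_utf8_locale "$ctype"; then
    echo "$ctype: UTF-8 output is justified"
else
    echo "$ctype: stick to ASCII output"
fi
```

Matching on the locale name is a heuristic; where it is available, `locale charmap` reports the codeset directly and is likely the more reliable signal.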

@p5pRT p5pRT commented May 28, 2012

From @jkeenan

On Mon May 28 06​:25​:41 2012, LeonT wrote​:

Both C and POSIX are ASCII locales, AFAIK, so it seems Father
Chrysostomos' diagnosis is right.

Leon, thanks for your quick response.

It's safe to assume that there will be many other people who, when they
try out 'perldoc' in 5.16, obey the recommendation to upgrade groff and
who subsequently get unexpected, less attractive 'perldoc' output than
they did with 5.14 and earlier.

Since I both upgraded 'perldoc' (via upgrading Perl) *and* upgraded
groff, it seems that the first question to ask is, "What output would I
have gotten if I had ignored the warnings and *not* upgraded groff?"
If, indeed, an assumption of UTF-8 input is being made, where is it
being made​: in perldoc, groff or someplace else?

Thank you very much.
Jim Keenan
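
One way to narrow down where the non-ASCII bytes first appear is to count them at each stage of the formatting pipeline. A minimal sketch, assuming POSIX `tr` and `wc`; `count_non_ascii` is a hypothetical helper name, not an existing tool.

```shell
# count_non_ascii: read stdin, print how many bytes fall outside 0x00-0x7F.
# Hypothetical helper for illustration. LC_ALL=C makes tr work on single bytes.
count_non_ascii() {
    # Delete every ASCII byte, count what is left, strip wc's padding.
    LC_ALL=C tr -d '\000-\177' | wc -c | tr -d ' '
}

# Plain ASCII bullet: no high bytes.
printf 'o Given two lists\n' | count_non_ascii
# UTF-8 middle dot bullet (bytes C2 B7): two high bytes.
printf '\302\267 Given two lists\n' | count_non_ascii
```

Piping `pod2man List/Compare/Functional.pm` into the helper, and then `pod2man List/Compare/Functional.pm | nroff -man` into it, would show at which stage the count becomes non-zero.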

@p5pRT p5pRT commented May 28, 2012

From tchrist@perl.com

pod2man List/Compare/Functional.pm |nroff -man |less

Which is what the man program effectively does. Which is what I use.

--tom

@p5pRT p5pRT commented May 28, 2012

From @jkeenan

On Mon May 28 06​:02​:08 2012, jkeenan wrote​:

pod2man piped to nroff -man displays the asterisks as open circles,
which is acceptable. But now it gets =head2 wrong (on top of being many
more keystrokes)​:

pod2man List/Compare/Functional.pm |nroff -man |less

########## (excerpt)
ESC[1mComparing Two Lists Held in ArraysESC[0m
  o Given two lists:

  @Llist = qw(abel abel baker camera delta edward fargo golfer);
  @Rlist = qw(baker camera delta delta edward fargo golfer hilton);

  o Get those items which appear at least once in both lists (their
  intersection).

  @intersection = get_intersection( [ \@Llist, \@Rlist ] );
##########

This is also the output I get if I use the older, system 'perldoc',
which is presumably interacting with the upgraded 'groff':

##########
$ /usr/bin/perldoc -V
Perldoc v3.13, under perl v5.008006 for darwin

$ perldoc -V
Perldoc v3.17, under perl v5.016000 for darwin

$ perldoc -l List::Compare::Functional
/usr/local/lib/perl5/site_perl/5.10.0/List/Compare/Functional.pm

$ /usr/bin/perldoc `perldoc -l List::Compare::Functional`
##########
ESC[1mComparing Two Lists Held in ArraysESC[0m

  o Given two lists​:

  @​Llist = qw(abel abel baker camera delta edward fargo golfer);
  @​Rlist = qw(baker camera delta delta edward fargo golfer hilton);

  o Get those items which appear at least once in both lists (their
intersection).

  @​intersection = get_intersection( [ \@​Llist, \@​Rlist ] )
##########

@p5pRT p5pRT commented May 29, 2012

From @mrallen1

From​: Father Chrysostomos via RT <perlbug-followup@​perl.org>

On Sun May 27 20:42:21 2012, jkeen@verizon.net wrote:

That is, the '*' is being represented by 'perldoc' as '<C2><B7>'. To me, this
is a regression. What is wrong and how can it be fixed?

It sounds to me as though perldoc is assuming you have a UTF-8
environment when you don’t. So UTF-8 gets fed to less, and less sees it
as just bytes, because it hasn’t been told via the LANG env var to treat
its input otherwise. That sounds like a perldoc bug.

Yes perldoc makes an implicit assumption (since about November or so) that it is
going to process UTF8 data in and send UTF8 data out.

Like most assumptions it gets things wrong sometimes. There's already a ticket open in the
Pod-Perldoc RT queue for this issue. I am planning to release a new version of Pod-Perldoc
to address it.

Hopefully it can go into 5.16.1.

Thanks.

--Mark


via perlbug​:  queue​: perl5 status​: open
https://rt-archive.perl.org/perl5/Ticket/Display.html?id=113406

@p5pRT p5pRT commented Oct 20, 2012

From @jkeenan

On Mon May 28 18​:47​:46 2012, mallen wrote​:

Yes perldoc makes an implicit assumption (since about November or so)
that it is going to process UTF8 data in and send UTF8 data out.

Like most assumptions it gets things wrong sometimes. There's already
a ticket open in the Pod-Perldoc RT queue for this issue. I am planning
to release a new version of Pod-Perldoc to address it.

https://rt.cpan.org/Ticket/Display.html?id=77465 is still open. In that
ticket, Chris Nehren describes a workaround and suggests a solution.
But nothing appears to have been done on it.

Thank you very much.
Jim Keenan

@p5pRT p5pRT commented Jan 15, 2013

From @jkeenan

On Sat Oct 20 14​:56​:25 2012, jkeenan wrote​:

https://rt.cpan.org/Ticket/Display.html?id=77465 is still open. In that
ticket, Chris Nehren describes a workaround and suggests a solution.
But nothing appears to have been done on it.

I have again posted in the Pod-Perldoc bug queue about this issue.

@p5pRT p5pRT commented Dec 14, 2017

From zefram@fysh.org

[rt.cpan.org #77465] has been closed without resolving this issue.
It seems to have resolved a related issue.

I have opened [rt.cpan.org #123878] regarding this issue, formulated as
Pod::Perldoc::ToMan requesting UTF-8 output from groff when this is not
justified by the environment. Other perldoc formatters have equivalent
problems too; I haven't raised those issues yet.

Although the problem is located in a CPAN-upstream module, it's serious
enough that I think this core ticket should remain open to track it.
This issue directly impacts the user experience of requesting
documentation from the perl core distro.

-zefram
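
The behaviour requested above, asking groff for UTF-8 output only when the environment justifies it, can be sketched as a small mapping from the locale's charmap to a groff output device. This is an illustration under stated assumptions and not how Pod::Perldoc::ToMan is implemented; `groff_device_for_charmap` is a hypothetical helper name, while `ascii`, `latin1` and `utf8` are existing groff `-T` devices.

```shell
# groff_device_for_charmap CHARMAP
# Map a charmap name (as printed by `locale charmap`) to a groff output device.
# Hypothetical helper; this is not how Pod::Perldoc::ToMan is implemented.
groff_device_for_charmap() {
    case "$1" in
        UTF-8|utf-8|UTF8|utf8) printf 'utf8\n' ;;
        ISO-8859-1|iso-8859-1|ISO8859-1) printf 'latin1\n' ;;
        *) printf 'ascii\n' ;;   # C/POSIX and anything unrecognised
    esac
}

groff_device_for_charmap "UTF-8"
groff_device_for_charmap "ANSI_X3.4-1968"
```

With such a helper, a pipeline along the lines of `pod2man List/Compare/Functional.pm | groff -man -T"$(groff_device_for_charmap "$(locale charmap)")" | less` would fall back to ASCII bullets in a C or POSIX locale. Falling back to `ascii` for unknown charmaps is a conservative choice made for this sketch.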
