non-portable use of nroff #1153

Closed
g-branden-robinson opened this issue Nov 27, 2023 · 15 comments

@g-branden-robinson
Contributor

Regarding commit 5bf8b3d ...

Unfortunately nroff -c -Tascii is not very portable. Heirloom doctools nroff (and its ancestor DWB nroff) don't support either option. (Well, they support the -T option, but the option argument ascii is a mystery to them.)

This might not be a problem. If you're assuming groff anyway, then I would do this instead.

MAN2TXT = $(NHGREP) | nroff -man -Tascii -P -cbou

What this does is ensure that no SGR escape sequences are used (-c), and overstriking is not used for boldface (-b) or underlining/"italics" (-u), nor for character composition (-o).

If you want GNU nroff to produce plain Jane ASCII, this is the way to get it. I use it in many of groff's own regression tests.

You furthermore don't need to mess with col in that circumstance. grotty(1) explains:

... In contrast to the terminal
output drivers of some other roff implementations, grotty never
outputs reverse line feeds. There is therefore no need to filter
its output through col(1).

We've been aware of the adjustment parity issue for a while, but hadn't heard from any real-world users who seemed to have a problem with it, so I thank you for giving me a use case. See Savannah #57836. Of course that's only an idea at present, so it will be approximately forever before NetHack builds can rely upon the feature.

Let me know if/how I can be of assistance.

@g-branden-robinson
Contributor Author

g-branden-robinson commented Nov 28, 2023

> This assumes that the groff options are compatible between
> Linux and macOS implementations of groff.

Mac OS X stayed on groff 1.19.2 (2005) for over a decade (presumably due to groff 1.20 adopting GNU GPLv3), until finally dropping groff altogether for macOS Ventura (2022).

There has been an interface change in that time. The -P option I advised about is new to groff 1.23.0 (July 2023). (I would have mentioned that in my original report but forgot that I was the person who put it in--it was almost four years ago. Sorry. :( )

There is a significant number of groff users via Homebrew (enough that we hear from them occasionally via bug reports). Some of these have upgraded to 1.23.0 via that mechanism.

You could test for support for the -P option like this.

# Assume -P is supported, then clear the flag if nroff rejects it.
NROFF_HAS_P_OPTION=yes
echo | /usr/bin/nroff -P > /dev/null 2>&1 || NROFF_HAS_P_OPTION=

if [ -n "$NROFF_HAS_P_OPTION" ]
then
  : # do one thing
else
  : # do another
fi

nroff - is not necessary with any nroff known to me; like many other Bell Labs Unix programs, it reads from the standard input stream by default if not given any operands.

Again, please consider me a resource for *roff issues.

nhcopier pushed a commit that referenced this issue Nov 28, 2023
Following a commit for Issue #1153, g-branden-robinson commented:
> Mac OS X stayed on _groff_ 1.19.2 for over a decade (presumably due to
> _groff_ 1.20 adopting GNU GPLv3), until finally dropping _groff_
> altogether for macOS Ventura (2022).
>
> There _has_ been an interface change in that time.  The [`-P` option I
> advised about is new to _groff_ 1.23.0 (July 2023)]
> (https://git.savannah.gnu.org/cgit/groff.git/tree/NEWS?h=1.23.0#n86).
> [...]
>
> There is a significant number of _groff_ users via Homebrew (enough that
> we hear from them occasionally via bug reports).  Some of these have
> upgraded to 1.23.0 via that mechanism.
> [...]
>
> `nroff -` is not necessary with any _nroff_ known to me; like many other
> Bell Labs Unix programs, it reads from the standard input stream by default
> if not given any operands.

Action taken:

1. Remove the unnecessary ' -' from the nroff command in Makefile.doc.
2. In the misc.370 file containing make snippets to include, test whether
   groff >= 1.23, and only insert the -P option for 1.23 or greater.
@nhmall
Contributor

nhmall commented Nov 28, 2023

Follow-up commit c8f4ad9 changes it so that if the Makefile processing determines that nroff is actually groff, the additional options become:
-Tascii -P -cbou if groff version is 1.23 or greater.
or
-Tascii -cbou if groff version is less than 1.23.

That's assuming that hints/linux.370 are in use.
For example, on Linux:
sh sys/unix/setup.sh sys/unix/hints/linux.370
or
cd sys/unix ; sh setup.sh hints/linux.370 ; cd ../..

For example, on macOS:
sh sys/unix/setup.sh sys/unix/hints/macOS.370
or
cd sys/unix ; sh setup.sh hints/macOS.370 ; cd ../..
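
The version test described above could be sketched along these lines. This is a rough illustration, not the code in commit c8f4ad9: the function names `nroff_groff_version` and `groff_at_least_1_23` are made up, and the sketch assumes GNU nroff answers `--version` with a line of the form `GNU nroff (groff) version X.Y.Z`.

```sh
# Hypothetical helper: print the groff version reported by nroff, or
# nothing if nroff is missing or does not identify itself as GNU nroff.
nroff_groff_version() {
    nroff --version 2>/dev/null </dev/null |
        sed -n 's/^GNU nroff (groff) version \([0-9][0-9.]*\).*/\1/p'
}

# Hypothetical helper: succeed if a version string such as "1.22.4" or
# "1.23.0" is 1.23 or newer. Empty or non-numeric input counts as "no".
groff_at_least_1_23() {
    major=${1%%.*}      # text before the first dot
    rest=${1#*.}        # text after the first dot
    minor=${rest%%.*}   # text before the next dot
    case $major$minor in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge 23 ]; }
}
```

A caller might then write `if groff_at_least_1_23 "$(nroff_groff_version)"; then ...; fi` to decide whether `-P` may be passed.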

@g-branden-robinson
Contributor Author

g-branden-robinson commented Nov 28, 2023

Hi Michael,

It chagrins me to say so but this is going to have problems too.

$ /usr/bin/nroff --version
GNU nroff (groff) version 1.22.4
$ echo | /usr/bin/nroff -Tascii -man -cbou
/usr/bin/nroff: invalid option -cbou

What the -P option new to groff 1.23.0 does is pass itself and the next argument to groff(1), the formatter's front end. That, in turn, passes the -cbou option cluster to the postprocessor, grotty(1). (-P is meant to suggest "postprocessor".)

(And as it happens, GNU nroff does not support option clusters, but that's something I'd like to fix for groff 1.24.)

While you could support 3 different scenarios, (a) a non-groff nroff, (b) nroff from groff 1.22.4 and earlier, and (c) nroff from groff 1.23.0 (or later), it might be simpler to just treat case (b) like case (a) and continue using col.

I'm sorry to be the bearer of so much complex news.

Please advise if and how I can be of assistance.

@nhmall
Contributor

nhmall commented Nov 28, 2023

I've got a commit, at the ready, that I think will accomplish the following once the Makefile processing has been done:

  1. If groff version is 1.23 or greater
    nroff -man -Tascii -P -cbou .

  2. If groff version is less than 1.23 :
    nroff -man -Tascii -c | col

  3. If non-groff nroff:
    nroff -man | col

@g-branden-robinson: Does that seem like a reasonable result?

Edit: added the -c to result 2, since a contrived test was giving me escape sequences for color when I lied to the Makefile and had the outcome be treated like non-groff, even though it actually is still groff, for the test. It feels like we may be back rather close to where this all began. I'm striving for working commands for the three scenarios above. My only test environment is actually scenario 1.

@g-branden-robinson
Contributor Author

g-branden-robinson commented Nov 28, 2023

Hi Michael,

That looks good (after your edit), except that I would also add the -b option to the col calls for both formatters. That is what converts the overstriking sequences like b^Hbo^Hol^Hld^Hd (bold) and _^Hu_^Hn_^Hd_^He_^Hr_^Hl_^Hi_^Hn_^He (underline a.k.a. "italics") into human-readable text.
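
To make the overstriking sequences concrete, here is a small sketch that builds one with printf and strips it with sed. It is only a rough stand-in for what col -b does with simple character-backspace-character pairs, not a replacement for col; the function name `strip_overstrike` is made up for this illustration.

```sh
# A literal backspace character (^H), produced portably via printf.
BS=$(printf '\b')

# Hypothetical helper: delete every "character followed by backspace"
# pair, which leaves only the character printed last in each position.
strip_overstrike() {
    sed "s/.$BS//g"
}

# "bold" overstruck with itself, then "under" overstruck on underscores.
printf 'b\bbo\bol\bld\bd _\bu_\bn_\bd_\be_\br\n' | strip_overstrike
# prints: bold under
```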

Regarding your second update to the comment, the -c options to GNU nroff (or troff) mean the same thing (disable color); the -c option to grotty (the terminal output driver for GNU troff, which gets those options after -P) means more than that: it means to pretend that the terminal works like a typewriter. That includes color disablement, but other things, too. The grotty(1) man page attempts to cover these issues.

And I sympathize. Working around these compatibility issues is a significant pain in the butt. It's even worse because nroff output was neglected by AT&T in the 1980s. Many tbl features produce very ugly output on Documenter's Workbench (DWB; AT&T's commercial troff offering) and on its descendant, Heirloom Doctools.

Since these are man pages, I would be remiss if I didn't mention mandoc(1), which is actively maintained and renders man pages well. (Its maintainer, Ingo Schwarze, is also a groff contributor and we are conscientious about not breaking compatibility with each other.) Its downsides are that you can't rely upon it to be around, and it's no good for converting any other sort of *roff document (like the NetHack Guidebook) to text--its avowed mission is man pages only.

EDIT: I should add that you can simulate case 2 even on your groff 1.23.0 environment by just not passing the -P -cbou argument sequence to (GNU) nroff. What it spits out should look exactly like what groff 1.22.4 would produce with the same flags, maybe modulo some cosmetic changes to the groff man package's output and bug fixes to tbl(1) rendering.

@nhmall
Contributor

nhmall commented Nov 28, 2023

Too much good discussion and activity going on for this to remain a closed issue; re-opening.

If I've followed correctly, I think we've arrived here:

  1. If groff version is 1.23 or greater:
    nroff -man -Tascii -P -cbou

  2. If groff version is less than 1.23:
    nroff -man -Tascii -c | col -b

  3. If non-groff nroff:
    nroff -man | col -b
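
The three cases above could be captured in a small shell helper like the following. This is only a sketch: the function name `man2txt_pipeline` and the labels `groff-new` and `groff-old` are made up here, and how the caller decides which label applies is left out.

```sh
# Hypothetical helper: print the man-to-text pipeline for one of the
# three cases listed above.
#   groff-new  groff 1.23 or greater
#   groff-old  groff older than 1.23
#   other      anything else is treated as a non-groff nroff
man2txt_pipeline() {
    case $1 in
        groff-new) echo 'nroff -man -Tascii -P -cbou' ;;
        groff-old) echo 'nroff -man -Tascii -c | col -b' ;;
        *)         echo 'nroff -man | col -b' ;;
    esac
}
```

The printed string could be assigned to a make variable such as MAN2TXT when the Makefile is generated.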

@nhmall nhmall reopened this Nov 28, 2023
@g-branden-robinson
Contributor Author

Yes, this looks good to me. Just don't let that trailing . creep into the actual code in case 1. If you tell nroff to format a directory, it will silently ignore you. ;-)

@nhmall
Contributor

nhmall commented Nov 28, 2023

> Just don't let that trailing . creep into the actual code in case 1.

I just thought it was a speck on my screen :)
I've edited my previous comment (again) to correct.

@pat-rankin

Just so we don't lose sight of the goal: we want to include the lowest common denominator (plain text) in .txt files that are included in the source and binary distributions, but end users who build for themselves might want to produce versions of those files that include bold and/or color and so forth. The Makefiles probably ought to default to the latter but keep it as straightforward as possible to achieve the former since some end users might want to regenerate plain text themselves.

Just a comment about the justification padding "parity" issue.

There are two aspects: initial parity being different between formatter versions resulting in big diffs when different developers build and check in new copies of 'roff generated output. That's a pain when the same input produces different output but it could happen for any alternate tools. Automating their generation on one system was supposed to eliminate that. Updating that system has resulted in regression but that will work itself out sooner or later. (Sooner with your assistance. Thanks!)

The second aspect is that if a change adds or removes an odd number of lines, the parity for the rest of the file will be changed regardless of what it was at the start. Maybe it could be reset for each N paragraphs or some such to end up back in sync. Even that can't help with a small change provoking different page breaks and consequently big diffs for minor changes to long documents. I don't see how to improve that, other than just live with it. [We used to live with it by manually regenerating the .txt files sporadically rather than automatically for every revision. Dealing with the first aspect via automation resulted in bigger ramifications than anticipated.]

All in all, maybe not worth spending many cycles fretting over. [Something that might be more worthwhile: when inserting padding, rather than splitting the line into words and inserting spaces from left to right or right to left, make up to three passes after the split. First give priority to inserting after sentence ending punctuation, then if more padding is still needed, give priority after other punctuation, and if still needed, after arbitrary words. Each pass could operate right-to-left or vice versa or maybe alternate among passes rather than lines. Perhaps something like that has already been implemented; I'm using an old version and it clearly doesn't behave that way--which I found surprising when I noticed.]

@g-branden-robinson
Contributor Author

g-branden-robinson commented Nov 28, 2023

Hi Pat,

I'll postpone the adjustment parity discussion because it's a distinguishable topic from the matter of portable nroff invocation.

> Just so we don't lose sight of the goal: we want to include the lowest common denominator (plain text) in .txt files that are included in the source and binary distributions, but end users who build for themselves might want to produce versions of those files that include bold and/or color and so forth. The Makefiles probably ought to default to the latter

I'm not sure that's necessary. Both mandoc and man-db are capable of rendering a man page given only a file name (old school Unix man(1) commands were not so friendly). What I term Brouwer/Lucifredi man (since it seems to have no other name) stopped being shipped by Fedora/Red Hat and SuSE over 10 years ago IIRC, and was moribund well before that. For NetHack players, I would think that leaves what remains of proprietary/commercial Unix, and non-POSIX platforms that probably lack a man command altogether. (EDIT: FreeBSD man(1) is a shell script. I think macOS now uses this script, or a fork of it, as of Ventura (2022) but have no access to a box of sufficiently recent vintage to examine the situation myself.)

$ MANWIDTH=65 man doc/dlb.6 |less -R
DLB(6)                    Games Manual                   DLB(6)

NAME
     dlb - NetHack data librarian

SYNOPSIS
     dlb { xct } [ vfIC ] arguments...  [ files...  ]

DESCRIPTION
     Dlb is a file archiving tool in the spirit (and tradition)
...snip...
$ mandoc doc/dlb.6 | ul | head
DLB(6)                           Games Manual                           DLB(6)

NAME
       dlb - NetHack data librarian

SYNOPSIS
       dlb { xct } [ vfIC ] arguments...  [ files...  ]

DESCRIPTION
       Dlb is a file archiving tool in the spirit (and tradition) of tar for

On the other hand, it looks like some of the man pages, like this one, are not ready for rendering straight from the tree; I see some stuff involving string definitions that, judging by doc/dlb.txt, must be getting handled by the Makefile.

This suggests another possibility; ship some .6.in files for the man pages, have the Makefile generate .6 files from them, say by using sed(1) to replace @UNLIKELY_CHARACTER_SEQUENCES@ in the .in files, and then boom, upon any build you have man pages ready to read, in the tree or out of it.

And if you still want to preformat plain text versions, you still can. .txt would continue to depend on .6.
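
A minimal sketch of the sed approach, assuming placeholders named @GAME@ and @HACKDIR@ (both invented for this example; the actual strings the NetHack Makefile substitutes are not shown in this thread). The function name `expand_placeholders` is likewise hypothetical.

```sh
# Hypothetical helper: read a .6.in template on stdin and write the
# finished man page source to stdout. $1 and $2 supply the replacement
# text; values containing "|", "&", or backslashes would need escaping
# before being handed to sed.
expand_placeholders() {
    sed -e "s|@GAME@|$1|g" -e "s|@HACKDIR@|$2|g"
}

# Possible use from a Makefile rule:
#   expand_placeholders nethack /usr/games/lib/nethackdir < dlb.6.in > dlb.6
```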

groff does something similar. In one place it's a bit more heavyweight; I wanted to generate two man pages, groff_man(7) and groff_man_style(7), from a single maintained document (to avoid information desync). I turned to another old classic Unix tool, m4, which gave me an excuse to learn it at last. What I wanted to do would have looked insanely hairy with sed because I would need multi-line replacements. But m4 has its own hazards: you have to be careful to quote certain English words that are part of its own language. (The ones that reliably blew up in my face were define and include. This would silently corrupt the output, so what I did was write a Makefile target to catch my own blunders.) So I recommend sed over m4 unless you really need an elephant gun. It's not like sed is weak; it's Turing-complete, so pretty darned powerful, but I've seen its syntax discourage even hard-bitten C programmers.

> but keep it as straightforward as possible to achieve the former since some end users might want to regenerate plain text themselves.

I'm happy to help, but I have a somewhat dark suspicion that few people read NetHack's man pages in any format. Or the Guidebook, for that matter... :-/ (That doesn't mean I'm not happy to help improve them.)

@Rhialto

Rhialto commented Nov 29, 2023

Some versions of m4 have an option to make its keywords less likely to be confused with running text. But then you'd have the portability issues of different versions of m4...
This is from m4(1) on NetBSD 9.3:

     -P, --prefix-builtins
             Prefix all built-in macros with `m4_'.  For example, instead of
             writing define, use m4_define.

@g-branden-robinson
Contributor Author

g-branden-robinson commented Nov 30, 2023

> Some versions of m4 have an option to make its keywords less likely to be confused with running text. But then you'd have the portability issues of different versions of m4...

Hi Rhialto,

I'm aware of it. I didn't want to dig deeply into the issue because I didn't want to make groff's build dependencies tighter for the sake of a man page I was maintaining. Even though my approach requires one feature that wasn't in Seventh Edition Unix, the -D option, we haven't gotten any reports of breakage (and we should have, since an m4 not supporting it would crash the build upon failure to open a file named -D_groff_man_not_style).

Regards,
Branden

@g-branden-robinson
Contributor Author

Hi Pat,

I think all that remains of this is a discussion of adjustment parity. I have some comments but they're pretty off-topic for this bug, and have much more to do with *roffs past and present than anything directly to do with NetHack.

Do you mind if I add you to the CC list of the aforementioned Savannah #57836 bug?

Regards,
Branden

@g-branden-robinson
Contributor Author

@pat-rankin : Ping, re: adjustment parity and "Do you mind if I add you to the CC list of the aforementioned Savannah #57836 bug?"

@pat-rankin

I didn't intend to ignore the previous question, it just worked out that way. I don't think I have enough interest to become involved in how that bug gets resolved.

I hadn't thought about whether the parity should alternate as usual after a line that didn't need any padding. Keeping it the same makes sense. That should be generalized to "a line where every separation needed the same amount of padding."
