Dealing with utf-8 man pages in view/open #1539

Closed
mc-butler opened this issue Aug 16, 2009 · 24 comments
Labels: area: mcview (mcview, the built-in text editor); prio: low (Minor problem or easily worked around); ver: 4.7.0-pre1 (Reproducible in version 4.7.0-pre1)

Comments

@mc-butler

Important

This issue was migrated from Trac:

Origin https://midnight-commander.org/ticket/1539
Reporter dmartina (dhmartina@….es)
Mentions dmartina@….com, egmont@….com (@egmontkob)
Keywords utf8, man

Weird characters are displayed when viewing/opening man page files.

Note

Original attachments:

@mc-butler

Changed by dmartina (dhmartina@….es) on Aug 16, 2009 at 23:38 UTC (comment 1)

  • Cc set to dmartina@….com

nroff filter was run with -Tlatin1. "man" could do the job by itself:

... { zsoelim %f 2>/dev/null || cat %f; } | man -l -Tutf8 - ;; esac

(extensions file, tested in Ubuntu 8.04)

Changes in the autoconf scripts are needed, as this solution may not be portable to other systems.

@mc-butler

Changed by angel_il (@ilia-maslakov) on Sep 22, 2009 at 11:10 UTC (comment 2)

  • Milestone changed from 4.7.0-pre3 to 4.7.0-pre4

@mc-butler

Changed by slavazanko (@slavaz) on Oct 26, 2009 at 16:07 UTC (comment 3)

  • Milestone changed from 4.7.0-pre4 to 4.7

@mc-butler

Changed by andrew_b (@aborodin) on Oct 29, 2011 at 16:57 UTC (comment 4)

  • Milestone changed from 4.7 to Future Releases
  • Branch state set to no branch

@mc-butler

Changed by lemzwerg (@lemzwerg) on Nov 3, 2012 at 13:21 UTC (comment 5)

Ticket #2922 gives a better solution which seems to be portable.

@mc-butler

Changed by egmont (@egmontkob) on Aug 19, 2014 at 22:00 UTC (comment 6)

  • Cc changed from dmartina@….com to dmartina@….com, egmont@….com

Friendly ping :)

After 5 years, this is still an issue.

On Ubuntu 14.04, standing in <mc_source>/doc/man/ru, typing "man ./mc.1" brings up the manual correctly in less, but "mcview mc.1" (or F3 in mc) does something quite broken.

Since the ticket was created, UTF-8 has become far more widely adopted and is definitely the standard by now. Also, systems (at least Linux distributions) have upgraded their groff package to a version that properly supports UTF-8.

You can just type "man mc" or similar on the command line, and all the accents appear correctly, at least for those languages that are supported by all graphical terminal emulators nowadays: left-to-right scripts without combining characters (e.g. Latin, Cyrillic, Greek, CJK).

This should work equally well, out of the box, in mc in UTF-8 environments. (With other locales or legacy systems, it's a nice bonus if we can get them to work, but that is far less important than UTF-8 and is getting less important day by day.)

@mc-butler

Changed by egmont (@egmontkob) on Aug 20, 2014 at 10:41 UTC (comment 7)

See also ticket #3243 comment 1.

@mc-butler

Changed by egmont (@egmontkob) on Aug 20, 2014 at 13:35 UTC

Demo fix

@mc-butler

Changed by egmont (@egmontkob) on Aug 20, 2014 at 13:51 UTC (comment 8)

This is a demo fix that works for me and fixes the accents on Ubuntu Trusty (man-db 2.6.7.1), in UTF-8 environment when you press F3 on a manual page file.

The whole man-zsoelim-tbl-eqn-troff-nroff-idontknowwhat pipeline is terribly complicated (I don't understand it at all), and IMO it is one of the worst parts of the Unix system and should have died out decades ago. It didn't, so we have to live with this...

But trying to understand the pipeline and starting in the middle of it leads to something that probably no one understands and that has other subtle bugs (e.g. #2921).

So, in my opinion the best we can do is not to care about any of the internals, just use the most user-facing frontend: the "man" command. This is the command that knows how to take care of everything: invoking the correct filters, handling the charset correctly, etc.

Luckily "man" has an option ("-l") to take a local file rather than looking up the manpage along the standard manpath.

When the output is not a tty (which is the case here), "man" seems to ignore the pager and remove all formatting by default. The option "-P cat" is hence totally useless, but it's a nice safeguard against possible different man implementations, to make sure they don't mess up anything if they invoke the pager.

The environment variable MAN_KEEP_FORMATTING forces "man" to keep the formatting sequences for bold and underlined, even if the output is not a tty.

I don't know if all "man" implementations support the "-l" flag. If not, we need ugly conditions in configure. If yes, we should probably remove the check for nroff from configure, and remove manual invocations of nroff throughout the source (that is, change all the code following the current patch's spirit).

We should check if we should pass -D to man to make it more robust (ignore MANOPTS). Also, we should find the option that guarantees that it produces the old-fashioned codes for bold and underlined (as it does by default) rather than real ANSI color escape sequences (which it can somehow be configured to do, but for mc we should force it not to).
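
A minimal sketch of the invocation described in this comment, assuming man-db (the "man" shipped on Ubuntu Trusty). The helper name view_manpage is hypothetical and not part of mc; "-l", "-P cat" and MAN_KEEP_FORMATTING come from the comment itself, and other "man" implementations may not support them.

```shell
# Hypothetical helper (not part of mc): format a local man page file
# by delegating everything to "man" instead of calling nroff directly.
view_manpage() {
    # MAN_KEEP_FORMATTING=1: keep bold/underline sequences even though
    #                        stdout is a pipe rather than a tty.
    # -P cat: safeguard in case an implementation still starts a pager.
    # -l:     treat the argument as a local file, not a page name.
    MAN_KEEP_FORMATTING=1 man -P cat -l "$1"
}
```

For example, "view_manpage ./mc.1" run from <mc_source>/doc/man/ru would write the formatted page to stdout, where a viewer could pick it up.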

@mc-butler

Changed by egmont (@egmontkob) on Aug 20, 2014 at 14:02 UTC (comment 9)

Note that a very similar patch in #3243 causes the manpage to be formatted to match the terminal's width there, whereas in this ticket the manpage is formatted for 80 columns. I don't know why.

@mc-butler

Changed by lemzwerg (@lemzwerg) on Aug 20, 2014 at 15:00 UTC (comment 10)

Regarding the troff pipeline: This is the very reason why there exists the groff program: It constructs the necessary calls of the pipeline in the right order.

Basically, using man seems to be a good option. On the other hand, it's an additional dependency, but I guess that people who are going to look for man pages do have man installed...
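
For comparison, a sketch of the groff route mentioned here, assuming GNU groff recent enough to ship preconv. The helper name groff_manpage is hypothetical, and whether #2922 does it this way is not stated in this ticket.

```shell
# Hypothetical helper (not part of mc): let the groff driver assemble
# the preprocessor/formatter/postprocessor pipeline itself.
groff_manpage() {
    # -k       run preconv to handle the input encoding
    # -t -e    run the tbl and eqn preprocessors
    # -mandoc  load the macro package that selects man or mdoc macros
    # -Tutf8   produce UTF-8 terminal output
    # -P-c     pass -c to the postprocessor (legacy overstrike output)
    groff -k -t -e -mandoc -Tutf8 -P-c "$1"
}
```

Unlike "man -l", this sketch does not uncompress .gz pages; that would have to be done before calling it.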

@mc-butler

Changed by egmont (@egmontkob) on Aug 20, 2014 at 15:07 UTC (comment 10.11)

Replying to lemzwerg:

This is the very reason why there exists the groff program: It constructs the necessary calls of the pipeline in the right order.

I'm open to any solution that's better than mine :) If you could come up with a patch using groff rather than man, that would be great.

(This whole man pipeline has always been a mystery to me and I'm not planning to get any more familiar with it than absolutely necessary to find one working solution.)

@mc-butler

Changed by egmont (@egmontkob) on Aug 20, 2014 at 17:19 UTC (comment 9.12)

Replying to egmont:

[...] whereas in this ticket the manpage is formatted for 80 columns.

So, with my patch, pressing F3 on a compressed manpage formats it to 80 columns, pressing F3 on an uncompressed manpage formats it according to the terminal's width.

Seems that "man" tries to figure out the width by first looking at $COLUMNS; if that's not set, it queries its stdin's tty settings, and finally it defaults to 80.

The solution is either to modify my patch to uncompress to a temporary file and pass that file to man rather than feeding it on its stdin, or to modify mc to set $COLUMNS for its child processes.

Anyway, it's a really minor issue compared to the original bug.
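
The lookup order described above can be sketched as follows. This is not man's actual code, only an illustration of the fallback chain; the function name manpage_width is hypothetical, and "stty size" is assumed to be available (it is common but not required by POSIX).

```shell
# Hypothetical helper: pick a formatting width the way the comment
# describes: $COLUMNS first, then the tty on stdin, then 80.
manpage_width() {
    if [ -n "${COLUMNS:-}" ]; then
        printf '%s\n' "$COLUMNS"
    elif [ -t 0 ] && size=$(stty size 2>/dev/null) && [ -n "$size" ]; then
        # "stty size" prints "rows cols"; keep the part after the space.
        printf '%s\n' "${size#* }"
    else
        printf '80\n'
    fi
}
```

When the page is fed through a pipe, stdin is not a tty, so the sketch falls through to 80 unless COLUMNS is set, which matches the behaviour seen with compressed pages.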

@mc-butler

Changed by egmont (@egmontkob) on Aug 20, 2014 at 17:29 UTC (comment 13)

Actually, "man" can take care of uncompressing the given file. This leads to the simplest possible solution for the width discrepancy, see the updated patch.

@mc-butler

Changed by egmont (@egmontkob) on Aug 20, 2014 at 18:01 UTC

Demo fix v2

@mc-butler

Changed by egmont (@egmontkob) on Aug 20, 2014 at 18:03 UTC (comment 14)

Patch updated to make it work on Fedora 20 too. Unlike Ubuntu, Fedora's man uses the new-style ANSI color escape sequences for bold/underline rather than the backspace-overwrite sequence. To revert to the old-style backspace-overwrite sequence which is understood by mcview, a "-c" has to be passed to *roff.
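
To tell the two styles apart, a small sketch like the one below can help. The function name formatting_style is hypothetical; it only looks for a backspace (the overstrike style, e.g. "N", backspace, "N" for a bold N) or for an escape character followed by "[" (the ANSI style).

```shell
# Hypothetical helper: report which bold/underline encoding stdin uses.
formatting_style() {
    bs=$(printf '\b')      # backspace, used by the overstrike style
    esc=$(printf '\033')   # ESC, which starts ANSI escape sequences
    input=$(cat)
    case $input in
        *"$esc["*) echo sgr ;;
        *"$bs"*)   echo overstrike ;;
        *)         echo plain ;;
    esac
}
```

With MAN_KEEP_FORMATTING set, piping the output of "man -l ./mc.1" into formatting_style should show which style a given system produces. On man-db, the MANROFFOPT environment variable is one documented way to hand "-c" to the formatter; treat its use here as an assumption rather than a description of the attached patch.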

@mc-butler

Changed by slavazanko (@slavaz) on Sep 2, 2014 at 11:30 UTC (comment 15)

  • Status changed from new to accepted
  • Owner set to slavazanko

@mc-butler

Changed by slavazanko (@slavaz) on Sep 2, 2014 at 11:33 UTC (comment 16)

  • Branch state changed from no branch to on review

Created branch 1539_utf8_man
initial [6229a775353a2e0bfca8fcc402dbf8d2630df459].

@mc-butler

Changed by slavazanko (@slavaz) on Sep 2, 2014 at 11:33 UTC (comment 17)

  • Branch state changed from on review to approved
  • Votes set to slavazanko

@mc-butler

Changed by slavazanko (@slavaz) on Sep 2, 2014 at 11:36 UTC (comment 18)

  • Status changed from accepted to testing
  • Resolution set to fixed
  • Branch state changed from approved to merged
  • Votes changed from slavazanko to committed-master

Merged to master. Merge [903c5c9]

@mc-butler

Changed by slavazanko (@slavaz) on Sep 2, 2014 at 11:37 UTC (comment 19)

  • Milestone changed from Future Releases to 4.8.13
  • Status changed from testing to closed

News: https://www.midnight-commander.org/wiki/NEWS-4.8.13?action=diff&version=18&old_version=17

@mc-butler

Changed by egmont (@egmontkob) on Sep 2, 2014 at 11:38 UTC (comment 20)

Hi Slava,

Could you please also take care of #3243? It's a very similar problem, with an identical fix to this one.

There's also some configure check that verifies if nroff supports -c, I haven't paid attention to that. Maybe the hardcoded -c could be replaced by some @NROFF_WHATEVER@. Unfortunately I can't verify my patch on systems other than Ubuntu and Redhat (especially non-Linuxes).

@mc-butler

Changed by andrew_b (@aborodin) on Sep 3, 2014 at 9:43 UTC (comment 20.21)

Replying to egmont:

There's also some configure check that verifies if nroff supports -c, I haven't paid attention to that. Maybe the hardcoded -c could be replaced by some @NROFF_WHATEVER@.

We already have a check for nroff and its flags in configure.ac (lines 62..110).

@mc-butler

Changed by egmont (@egmontkob) on Sep 3, 2014 at 10:39 UTC (comment 21.22)

Replying to andrew_b:

We already have a check for nroff and its flags in configure.ac (lines 62..110).

Yup, but I'm not using its result in my patch :( I haven't completed those bits, sorry.

That's why I think "-c" should be replaced by some placeholder in that patch. I'm not sure, I'm not an autoconf/automake magician.

mc-butler marked this as a duplicate of #2922 on Feb 28, 2025