Dealing with utf-8 man pages in view/open #1539
Comments
nroff filter was run with -Tlatin1. "man" could do the job by itself:
(extensions file, tested in Ubuntu 8.04)
Changes in autoconf scripts are needed as this solution may not be portable to other systems |
Friendly ping :)
After 5 years, this is still an issue.
On Ubuntu 14.04, standing in <mc_source>/doc/man/ru, typing "man ./mc.1" brings up the manual correctly in less, but "mcview mc.1" (or F3 in mc) does something quite broken.
Since creating the ticket, UTF-8 became way more adopted and is definitely the standard by now. Also, systems (at least Linuxes) have upgraded their groff package to a new version that properly supports UTF-8.
You can just type "man mc" or similar on the command line, and all the accents appear correctly, at least for those languages that are supported by all graphical terminal emulators nowadays: left-to-right languages without combining characters (e.g. Latin, Cyrillic, Greek, CJK scripts).
This should work equally well, out of the box, in mc in UTF-8 environments. (With other locales or legacy systems, it's a nice bonus if we can get them to work, but that is far less important than UTF-8 and is becoming less important day by day.) |
See also ticket #3243 comment 1. |
Demo fix |
This is a demo fix that works for me and fixes the accents on Ubuntu Trusty (man-db 2.6.7.1), in a UTF-8 environment, when you press F3 on a manual page file.
The whole man-zsoelim-tbl-eqn-troff-nroff-idontknowwhat pipeline is terribly complicated (I don't understand it at all), and IMO it is one of the worst parts of the Unix system and should have died out decades ago. It didn't, so we have to live with this...
But understanding the pipeline and starting in the middle leads to something that probably no one understands and that has other subtle bugs (e.g. #2921).
So, in my opinion the best we can do is not to care about any of the internals, just use the most user-facing frontend: the "man" command. This is the command that knows how to take care of everything: invoking the correct filters, handling the charset correctly, etc.
Luckily "man" has an option ("-l") to take a local file rather than looking up the manpage along the standard manpath.
When the output is not a tty (which is the case here), "man" seems to ignore the pager and remove all formatting by default. The option "-P cat" is hence totally useless, but it's a nice safeguard against possible different man implementations, to make sure they don't mess up anything if they invoke the pager.
The environment variable MAN_KEEP_FORMATTING forces "man" to keep the formatting sequences for bold and underlined, even if the output is not a tty.
I don't know if all "man" implementations support the "-l" flag. If not, we need ugly conditions in configure. If yes, we should probably remove the check for nroff from configure, and remove manual invocations of nroff throughout the source (that is, change all the code following the current patch's spirit).
We should check whether to pass -D to man to make it more robust (ignore MANOPT). Also, we should find the option that guarantees that it produces the old-fashioned codes for bold and underlined (as it does by default) rather than real ANSI color escape sequences (which it can somehow be configured to do -- but for mc we should force it not to). |
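For illustration, the invocation described in this comment ("-l", "-P cat" and MAN_KEEP_FORMATTING) could be assembled roughly as below. This is only a sketch in Python, not the patch itself; the function names `build_man_command` and `format_man_page` are made up for the example, and the open questions above ("-D", forcing old-style bold/underline codes) are deliberately left out.

```python
import os
import subprocess


def build_man_command(path, base_env=None):
    """Return (argv, env) for formatting a local man page file with man.

    Hypothetical helper: the name and signature are illustrative only.
    """
    env = dict(os.environ if base_env is None else base_env)
    # Keep the bold/underline sequences even though stdout is not a tty.
    env["MAN_KEEP_FORMATTING"] = "1"
    # -P cat: safeguard so that no interactive pager gets started.
    # -l: treat the argument as a local file instead of searching the manpath.
    argv = ["man", "-P", "cat", "-l", path]
    return argv, env


def format_man_page(path):
    """Run man on the given file and return the formatted output as bytes."""
    argv, env = build_man_command(path)
    result = subprocess.run(argv, env=env, stdout=subprocess.PIPE, check=True)
    return result.stdout
```

Calling `format_man_page("./mc.1")` needs a "man" that supports "-l", which is exactly the portability question raised above.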
Note that a very similar patch in #3243 causes the manpage to be formatted to match the terminal's width there, whereas in this ticket the manpage is formatted for 80 column. I don't know why. |
Regarding the troff pipeline: This is the very reason why there exists the groff program: It constructs the necessary calls of the pipeline in the right order.
Basically, using man seems to be a good option. On the other hand, it's an additional dependency, but I guess that people who are going to look for man pages do have man installed... |
Replying to lemzwerg:
I'm open to any solution that's better than mine :) If you could come up with a patch using groff rather than man, that would be great.
(This whole man pipeline has always been a mystery to me and I'm not planning to get any more familiar with it than absolutely necessary to find one working solution.) |
Replying to egmont:
So, with my patch, pressing F3 on a compressed manpage formats it to 80 columns, pressing F3 on an uncompressed manpage formats it according to the terminal's width.
It seems that "man" tries to figure out the width by first looking at $COLUMNS; if that is not set, it queries its stdin's tty settings, and finally it defaults to 80.
The solution is either to modify my patch to uncompress to a temporary file and pass that file to man rather than feeding it on its stdin, or to modify mc to set $COLUMNS for its child processes.
Anyway, it's a really minor issue compared to the original bug. |
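The lookup order guessed at above (environment variable, then the tty, then 80) can be sketched like this. It is a Python illustration of the described behaviour, not man's actual code; `guess_width` is a hypothetical name, and the real "man" may consult further settings that are not covered here.

```python
import os


def guess_width(env, fd=0, default=80):
    """Pick a line width: $COLUMNS first, then the tty on fd, then default."""
    value = env.get("COLUMNS", "")
    if value.isdigit() and int(value) > 0:
        return int(value)
    try:
        # Raises OSError when fd is not a terminal, e.g. a pipe.
        columns = os.get_terminal_size(fd).columns
    except (OSError, ValueError):
        return default
    return columns if columns > 0 else default
```

This matches the symptom: when the compressed page is fed through a pipe, stdin is not a tty, so without $COLUMNS the width falls back to 80.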
Actually, "man" can take care of uncompressing the given file. This leads to the simplest possible solution for the width discrepancy, see the updated patch. |
Demo fix v2 |
Patch updated to make it work on Fedora 20 too. Unlike Ubuntu, Fedora's man uses the new-style ANSI color escape sequences for bold/underline rather than the backspace-overwrite sequence. To revert to the old-style backspace-overwrite sequence which is understood by mcview, a "-c" has to be passed to *roff. |
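To make the difference concrete: in the old-style output a bold character is written as the character, a backspace, and the same character again, while an underlined one is an underscore, a backspace, and the character. A viewer has to fold these back together. The following Python sketch shows the idea only; it is not mcview's actual parser, and `decode_overstrike` is a name invented for this example.

```python
def decode_overstrike(text):
    """Turn backspace-overwrite sequences into (char, style) pairs."""
    out = []
    i = 0
    while i < len(text):
        # An overstrike is: first char, backspace, second char.
        if i + 2 < len(text) and text[i + 1] == "\b":
            first, second = text[i], text[i + 2]
            if first == second:
                # X\bX is bold (an underscore over itself is ambiguous;
                # this sketch treats it as bold).
                out.append((second, "bold"))
            elif first == "_":
                # _\bX is underlined.
                out.append((second, "underline"))
            else:
                # Unknown overstrike: keep the character printed last.
                out.append((second, "plain"))
            i += 3
        else:
            out.append((text[i], "plain"))
            i += 1
    return out
```

Output that uses ANSI escape sequences instead contains no backspaces at all, so a parser of this kind would show the escape codes as literal text, which is why the "-c" is needed.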
Created branch 1539_utf8_man |
News: https://www.midnight-commander.org/wiki/NEWS-4.8.13?action=diff&version=18&old_version=17 |
Hi Slava,
Could you please also take care of #3243? It's a very similar problem, with identical fix to this one.
There's also some configure check that verifies whether nroff supports -c; I haven't paid attention to that. Maybe the hardcoded -c could be replaced by some @NROFF_WHATEVER@. Unfortunately I can't verify my patch on systems other than Ubuntu and Red Hat (especially non-Linuxes). |
Replying to egmont:
We already have a check for nroff and its flags in configure.ac (lines 62..110). |
Replying to andrew_b:
Yup, but I'm not using its result in my patch :( I haven't completed those bits, sorry.
That's why I think "-c" should be replaced by some placeholder in that patch. I'm not sure, I'm not an autoconf/automake magician. |
Important
This issue was migrated from Trac:
dmartina (dhmartina@….es)
dmartina@….com, egmont@….com (@egmontkob)
utf8, man
Weird characters are displayed when viewing/opening man page files.
Note
Original attachments:
egmont (@egmontkob) on Aug 20, 2014 at 13:35 UTC
egmont (@egmontkob) on Aug 20, 2014 at 18:01 UTC