Malformed UTF-8 character #15

Closed
brobr opened this issue Oct 22, 2019 · 4 comments

brobr commented Oct 22, 2019

Hi, when I try to compile gri on a recent slackware64-current, the build fails with this error:

utf8 "\xF3" does not map to Unicode at /usr/share/texinfo/Texinfo/ParserNonXS.pm line 1796, line 19280.

Malformed UTF-8 character: \xf3\x70\x65\x7a (unexpected non-continuation byte 0x70, immediately after start byte 0xf3; need 4 bytes, got 1) in pattern match (m//) at /usr/share/texinfo/Texinfo/ParserNonXS.pm line 3364.
Malformed UTF-8 character (fatal) at /usr/share/texinfo/Texinfo/ParserNonXS.pm

This system now has:
texinfo-6.7-x86_64-1

The error occurs in this function:

    # This combines several regular expressions used in '_parse_texi' to
    # look at what is next on the remaining part of the line.
    # NOTE - this sub has an XS override
    sub _parse_texi_regex {
      my ($line) = @_;

      my ($at_command, $open_brace, $asterisk, $single_letter_command,
          $separator_match, $misc_text)
        = ($line =~ /^\@([[:alnum:]][[:alnum:]-]*)
                    |^(\{)
                    |^(\*)
                    |^\@(["'~\@&\}\{,\.!\? \t\n\*\-\^`=:\|\/\\])
                    |^([{}@,:\t.\f])
                    |^([^{}@,:\t.\n\f]+)
                    /x);

      if ($open_brace) {
        $separator_match = $open_brace;
      } elsif ($asterisk) {
        ($misc_text) = ($line =~ /^([^{}@,:\t.\n\f]+)/);
      }

      return ($at_command, $open_brace, $asterisk, $single_letter_command,
        $separator_match, $misc_text);
    }

Would this be a gri or a texinfo problem?

(I have also sent this to bug-texinfo@gnu.org)
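
For anyone hitting the same message, here is a rough sketch of how the offending input lines could be located. The error reports the bytes \xf3\x70\x65\x7a, where 0xf3 is not followed by a UTF-8 continuation byte. The helper name `find_non_utf8_lines` and the sample text (including the leading "L") are made up for illustration; only the four bytes come from the error output.

```python
def find_non_utf8_lines(data: bytes):
    """Return (line_number, raw_line) pairs for lines that are not valid UTF-8."""
    bad = []
    for number, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            bad.append((number, line))
    return bad


# Hypothetical sample: the bytes from the error message inside a made-up line.
sample = b"plain ASCII line\nL\xf3\x70\x65\x7a\nanother line\n"
bad_lines = find_non_utf8_lines(sample)

for number, line in bad_lines:
    # ISO-8859-1 maps every byte to a character, so this decode cannot fail.
    print(number, line.decode("iso-8859-1"))
```

To check a real file, one could pass in the result of reading it in binary mode, for example `open(path, "rb").read()`.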

@dankelley
Owner

My guess is that it's a texinfo problem, since the gri code has not changed in a long while, and has been building cleanly for years. As I recall, there have been some changes in the tex system within linux over the past year or two, but I do not have a linux box so I'm not clear on the details.

@brobr
Author

brobr commented Oct 22, 2019

Hi,

I got this answer from texinfo (Gavin Smith):

It is because Texinfo 6.7 changed the default input encoding to UTF-8
and the input is in ISO-8859-1. The easiest way to fix this would be to
convert the input file to UTF-8, but you could also add
"@documentencoding ISO-8859-1" to the file.

Where or how do I set this for the gri-source? (I can then submit a patchfile with the source that would enable the build)
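
The first option in the quoted reply, converting the input file to UTF-8, might look roughly like the sketch below. It assumes the whole file really is ISO-8859-1; running it on a file that is already UTF-8 would double-encode any non-ASCII characters. The function name and the temporary sample file are illustrative only and are not part of gri or texinfo.

```python
import tempfile
from pathlib import Path


def convert_latin1_to_utf8(source: Path, target: Path) -> None:
    """Re-encode a file from ISO-8859-1 to UTF-8, leaving line endings untouched."""
    raw = source.read_bytes()
    text = raw.decode("iso-8859-1")  # every byte is valid in ISO-8859-1
    target.write_bytes(text.encode("utf-8"))


# Demonstration on a throwaway file (hypothetical content, not the real gri.texi).
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "sample.texi"
    dst = Path(tmp) / "sample-utf8.texi"
    src.write_bytes(b"L\xf3pez\n")
    convert_latin1_to_utf8(src, dst)
    converted = dst.read_bytes()

print(converted)
```

Writing to a separate target file keeps the original intact, so the result can be inspected before replacing anything in the source tree.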

@dankelley
Owner

My guess is that this ought to go into doc/gri.texi but I don't know of any rules as to where things go. Perhaps near the top. I no longer have a linux machine on which to test such things.

@brobr
Author

brobr commented Oct 22, 2019

OK, with "@documentencoding ISO-8859-1" pasted at the top of the doc/gri.texi file, the program compiled.

Thanks for the advice.

BTW, a patch was worked out independently at LinuxQuestions as well, see:
https://www.linuxquestions.org/questions/slackware-14/sbo-scripts-not-building-on-current-read-1st-post-pls-4175561999/page142.html#post6049595
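
For packaging scripts, the fix described above (adding "@documentencoding ISO-8859-1" near the top of doc/gri.texi) could be automated along these lines. This is only a sketch: the function name is hypothetical, the example header is invented rather than copied from gri.texi, and keeping a leading `\input texinfo` line first is an assumption made so the file still starts the way TeX expects.

```python
DIRECTIVE = "@documentencoding ISO-8859-1"


def add_document_encoding(texi_source: str, directive: str = DIRECTIVE) -> str:
    """Insert an encoding directive near the top unless one is already present."""
    lines = texi_source.split("\n")
    if any(line.startswith("@documentencoding") for line in lines):
        return texi_source  # leave files that already declare an encoding alone
    # Assumption: if the file starts with "\input texinfo", keep that line first.
    insert_at = 1 if lines and lines[0].startswith("\\input texinfo") else 0
    lines.insert(insert_at, directive)
    return "\n".join(lines)


# Invented header for illustration; the real doc/gri.texi may differ.
example = "\\input texinfo\n@setfilename gri.info\n@settitle Gri\n"
patched = add_document_encoding(example)
print(patched)
```

When applying this to the real file, it would need to be read and written as ISO-8859-1 (for example with `open(path, encoding="iso-8859-1")`), since that is the encoding the directive declares.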
