when reading a gedcom file with invalid char's in it, llines trashes the file by replacing the chars with ? #421

stevedum · 2020-08-18T21:57:56Z

If you have
NewDbProps=codeset=UTF-8
in your .llinesrc file, then
when llines reads a GEDCOM file it determines the source character set from the 1 CHAR line in the
GEDCOM header and translates the data from that character set to UTF-8. If it finds untranslatable
characters, they are silently replaced with ?, destroying any hint of what the characters were.
For example, if the user thought he had an ASCII GEDCOM file,
0 HEAD
1 CHAR ASCII
0 @i1@ INDI
1 NAME John/Sm�th
0 TRLR
where the vowel in Smith is hex EF, which happens to be i with a diaeresis (ï) in ISO-8859-1, llines silently
changes Smith into Sm?th. If llines is going to corrupt a file while reading it in, it would be nice if it at
least emitted a message like "Warning illegal ASCII char found on line 4".

Currently there is no way for the lower-level translate routines to pass this information back up to the GEDCOM reading routines. One simple fix would be a global counter that the translate routine increments whenever it replaces a character; the upper-level routine could then check the counter and emit a message. It seems iconv_trans already counts bad chars, but the count is discarded when the routine exits.
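
A minimal sketch of the counter idea, in C. Everything here is hypothetical and for illustration only: `replaced_count`, `translate_ascii`, and `read_lines` are made-up names, and `translate_ascii` is a simple stand-in that treats any byte above 0x7F as untranslatable. It is not the real iconv_trans or the actual llines reading code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical global counter: total bytes replaced with '?' so far. */
static unsigned long replaced_count = 0;

/* Stand-in for the low-level translate routine (NOT the real iconv_trans).
 * Copies src to dst, replacing every byte outside 7-bit ASCII with '?'
 * and incrementing the global counter once per replacement. */
static void translate_ascii(const char *src, char *dst, size_t dstsize)
{
    size_t i = 0;
    assert(src != NULL && dst != NULL);
    if (dstsize == 0)
        return;
    for (; src[i] != '\0' && i + 1 < dstsize; i++) {
        unsigned char c = (unsigned char)src[i];
        if (c > 0x7F) {
            dst[i] = '?';
            replaced_count++;
        } else {
            dst[i] = (char)c;
        }
    }
    dst[i] = '\0';
}

/* Stand-in for the upper-level GEDCOM reading loop. It compares the counter
 * before and after translating each line and writes a warning to `log` for
 * any line where a character was replaced.
 * Returns the number of lines that had replacements. */
static int read_lines(const char *const *lines, size_t nlines, FILE *log)
{
    char buf[256];
    int bad_lines = 0;
    for (size_t n = 0; n < nlines; n++) {
        unsigned long before = replaced_count;
        translate_ascii(lines[n], buf, sizeof buf);
        if (replaced_count != before) {
            bad_lines++;
            fprintf(log,
                    "Warning: %lu illegal ASCII char(s) found on line %zu\n",
                    replaced_count - before, n + 1);
        }
    }
    return bad_lines;
}
```

Called on the five-line example above with `stderr` as the log, this sketch would warn about line 4 only. A global counter keeps the change small because no function signatures need to change; the trade-off is that the reader has to snapshot the counter around each call to attribute replacements to a specific line.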

memmerto added the Area:NLS label (Issues with the LifeLines support for NLS: UTF-8, codesets, etc.) on Sep 19, 2021
