When reading a GEDCOM file with invalid chars in it, llines corrupts the data by silently replacing those chars with ? #421
Labels: Area:NLS (Issues with the LifeLines support for NLS: UTF-8, codesets, etc.)
If you have

    NewDbProps=codeset=UTF-8

in your .llinesrc file, then when llines reads a GEDCOM file it determines the source codeset from the GEDCOM header (the 1 CHAR line) and translates the data from that codeset to UTF-8. If it finds characters that cannot be translated, they are silently replaced with ?, destroying any hint of what the original characters were.
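To make the mechanism concrete, here is a minimal sketch of an iconv(3)-based translation loop that substitutes '?' for each byte it cannot convert. It is not the LifeLines code: the function name translate_count and its signature are hypothetical. The only intended difference from the behavior described above is that it reports how many bytes it replaced through *bad instead of discarding that number.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not the LifeLines API): translate the NUL-terminated
 * string `in` from codeset `from` to UTF-8, writing a NUL-terminated result
 * into `out` (outsize must be at least 1). Every input byte iconv rejects is
 * replaced by '?', mirroring the silent substitution, but the number of
 * replacements is stored in *bad. Returns 0 on success, -1 if the codeset is
 * unknown or `out` is too small.
 */
int translate_count(const char *from, const char *in, char *out,
                    size_t outsize, int *bad)
{
    iconv_t cd = iconv_open("UTF-8", from);
    if (cd == (iconv_t)-1)
        return -1;

    char *inp = (char *)in;          /* iconv() takes char **, not const */
    size_t inleft = strlen(in);
    char *outp = out;
    size_t outleft = outsize - 1;    /* reserve room for the final NUL */
    int rc = 0;

    *bad = 0;
    while (inleft > 0) {
        if (iconv(cd, &inp, &inleft, &outp, &outleft) != (size_t)-1)
            break;                   /* the rest of the input converted */
        if (errno == EILSEQ || errno == EINVAL) {
            /* Untranslatable or truncated input: emit '?', skip one byte. */
            if (outleft == 0) { rc = -1; break; }
            *outp++ = '?';
            outleft--;
            inp++;
            inleft--;
            (*bad)++;
        } else {                     /* E2BIG: output buffer too small */
            rc = -1;
            break;
        }
    }
    *outp = '\0';
    iconv_close(cd);
    return rc;
}
```

With glibc's iconv, translating the NAME line from the example that follows ("1 NAME John/Sm" + byte 0xEF + "th") from "ASCII" should yield 1 NAME John/Sm?th with *bad set to 1, whereas declaring "ISO-8859-1" converts the same byte to ï with no replacements.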
For example, suppose the user believes he has an ASCII GEDCOM file:

    0 HEAD
    1 CHAR ASCII
    0 @i1@ INDI
    1 NAME John/Smïth
    0 TRLR
Here the vowel in Smith is the byte 0xEF, which happens to be i with diaeresis (ï) in ISO-8859-1. Because 0xEF is not a valid ASCII character, llines silently turns the name into Sm?th. If llines has to alter the data while reading it in, it would be nice if it at least emitted a message such as "Warning: illegal ASCII char found on line 4".

Currently there is no way for the lower-level translate routines to pass this information back up to the GEDCOM reading routines. All it would take is a global counter that the translate routine increments whenever it replaces a character; the upper-level routine could then check the counter and emit a message. iconv_trans already seems to count bad characters, but the count is discarded when the routine exits.
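The proposed counter could look roughly like the following sketch. Everything in it is hypothetical: trans_bad_count, translate_ascii and read_translated_line are illustrative names, not LifeLines functions, and translate_ascii is a simplified stand-in for the real translate routine (it handles only ASCII input instead of calling iconv). The reader snapshots the counter before translating each line, so no reset is needed between lines.

```c
#include <stdio.h>

/* Hypothetical global counter: bumped by the low-level translator every
 * time it substitutes '?' for an untranslatable character. */
static int trans_bad_count = 0;

/* Stand-in for the low-level translate routine (not the real LifeLines
 * code): copies 7-bit ASCII through unchanged and replaces any other byte
 * with '?'. `out` must have room for strlen(in) + 1 bytes. */
static void translate_ascii(const char *in, char *out)
{
    for (; *in; in++, out++) {
        if ((unsigned char)*in > 0x7F) {
            *out = '?';
            trans_bad_count++;      /* record the loss instead of hiding it */
        } else {
            *out = *in;
        }
    }
    *out = '\0';
}

/* Upper-level reader step: translate one GEDCOM line and, if the translator
 * had to replace anything, write a warning to `log` (when non-NULL).
 * Returns the number of replacements made on this line. */
int read_translated_line(const char *raw, char *out, int lineno, FILE *log)
{
    int before = trans_bad_count;
    translate_ascii(raw, out);
    int replaced = trans_bad_count - before;
    if (replaced > 0 && log)
        fprintf(log, "Warning: %d illegal ASCII char(s) found on line %d\n",
                replaced, lineno);
    return replaced;
}
```

Called on line 4 of the example with log set to stderr, this would print "Warning: 1 illegal ASCII char(s) found on line 4" and still leave Sm?th in out. A global keeps the change small, as suggested; passing the count back through a return value or out-parameter would avoid shared state at the cost of changing the translate routines' signatures.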