When reading a GEDCOM file with invalid characters in it, llines trashes the file by replacing the characters with ? #421

Open
stevedum opened this issue Aug 18, 2020 · 0 comments
Labels
Area:NLS Issues with the LifeLines support for NLS (UTF-8, codesets, etc)

Comments

@stevedum
Collaborator

stevedum commented Aug 18, 2020

If you have
NewDbProps=codeset=UTF-8
in your .llinesrc file,
then when llines reads a GEDCOM file it determines the source codeset from the GEDCOM header and translates
the data from the codeset named on the 1 CHAR line in the header to UTF-8. If it finds untranslatable characters,
they are silently replaced with ?, destroying any hint of what the characters were.
For example, suppose the user thought he had an ASCII GEDCOM file:
0 HEAD
1 CHAR ASCII
0 @i1@ INDI
1 NAME John/Sm�th
0 TRLR
where the vowel in Smith is hex EF, which happens to be i with a diaeresis (ï) in ISO-8859-1. llines silently changes Smith into
Sm?th. If llines is going to corrupt a file while reading it in, it would be nice if it at least emitted a
message like "Warning illegal ASCII char found on line 4". Currently there is no way for the lower-level translate routines to pass this information back up to the GEDCOM reading routines. One fix would be a global counter that the translate routine increments whenever it replaces a character; the upper-level routine could then check the counter and emit a message. It seems iconv_trans already counts bad chars, but the count is discarded when the routine exits.
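A minimal sketch of the counter idea, in C. This is not the actual LifeLines code: the names g_replaced_chars, translate_ascii and read_gedcom_line are hypothetical stand-ins for the real translate and GEDCOM reading routines, and the sketch handles only the ASCII case from the example rather than going through iconv.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical global counter: the low-level translate routine bumps it
 * each time it substitutes '?' for an untranslatable byte. */
static unsigned long g_replaced_chars = 0;

/* Stand-in for the low-level translate routine (not the real LifeLines
 * function): copies src to dst, replacing every byte outside 7-bit ASCII
 * with '?'. Input longer than the buffer is truncated. */
static void translate_ascii(const char *src, char *dst, size_t dstsize)
{
    size_t i = 0;
    if (dstsize == 0)
        return;
    for (; src[i] != '\0' && i + 1 < dstsize; i++) {
        unsigned char c = (unsigned char)src[i];
        if (c > 0x7F) {
            dst[i] = '?';
            g_replaced_chars++;   /* record the replacement */
        } else {
            dst[i] = (char)c;
        }
    }
    dst[i] = '\0';
}

/* Stand-in for the upper-level GEDCOM reading routine: translates one
 * line, compares the counter before and after, and warns on stderr if
 * anything was replaced. Returns the number of replaced bytes. */
static unsigned long read_gedcom_line(const char *raw, unsigned long lineno,
                                      char *out, size_t outsize)
{
    unsigned long before = g_replaced_chars;
    unsigned long bad;

    translate_ascii(raw, out, outsize);
    bad = g_replaced_chars - before;
    if (bad > 0) {
        fprintf(stderr,
                "Warning: %lu illegal ASCII char(s) found on line %lu\n",
                bad, lineno);
    }
    return bad;
}
```

Comparing the counter before and after each line, rather than resetting it, lets the reader report a per-line count while the global still holds the total for the whole import.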

@memmerto memmerto added the Area:NLS Issues with the LifeLines support for NLS (UTF-8, codesets, etc) label Sep 19, 2021