Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix TextManager.GetLocalizedTextList returning spurious ^M characters #2514

Conversation

petchema
Copy link
Collaborator

I noticed lines like that in Player.log when french localization is installed (notice the ^Ms):

Added Commoners questor: ns=233508 bk=196879 name=Vannanna Copperford building=The Toad and^M Wolf^M factionId=518

After some investigation, it comes from Internal_String.csv / TavernsA and TavernsB that are split on \n
https://github.com/Interkarma/daggerfall-unity/blob/master/Assets/Scripts/Game/TextManager.cs#L600

That issue may be specific to non-Windows operating systems that interpret \n in different ways?

Fix source:
https://ervinkosch.info/c-split-string-on-crlf-cr-or-lf/

I noticed lines like that in Player.log when french localization is
installed (notice the ^Ms):

Added Commoners questor: ns=233508 bk=196879 name=Vannanna
Copperford building=The Toad and^M Wolf^M factionId=518

After some investigation, it comes from Internal_String.csv / TavernsA
and TavernsB that are split on \n
https://github.com/Interkarma/daggerfall-unity/blob/master/Assets/Scripts/Game/TextManager.cs#L600

That issue may be specific to non-Windows operating systems that
interpret \n in different ways?

Source:
https://ervinkosch.info/c-split-string-on-crlf-cr-or-lf/
@petchema petchema added bug mac Something specific to Mac, that doesn't affect other platforms. linux Something specific to Linux, that doesn't affect other platforms. labels Apr 27, 2023
@Interkarma
Copy link
Owner

Interesting. There are many places where a list is split this way. Does it only happen with building names, or on other lists as well?

And does it happen with master CSV files also, or only with input CSV files?

@petchema
Copy link
Collaborator Author

petchema commented Apr 27, 2023

Aha, it doesn't happen with master CSV files, because while their lines are generally CRLF-terminated, line breaks inside list values are single LFs.

0107540       A   p   p   r   e   n   t   i   c   e  \n   T   h   e    
0107560   W   a   r   r   i   o   r  \n   T   h   e       L   a   d   y
0107600  \n   T   h   e       T   o   w   e   r  \n   T   h   e       A
0107620   t   r   o   n   a   c   h  \n   T   h   e       T   h   i   e
0107640   f   "  \r  \n   s   e   a   s   o   n   N   a   m   e   s   ,
0107660   "   F   a   l   l  \n   S   p   r   i   n   g  \n   S   u   m
0107700   m   e   r  \n   W   i   n   t   e   r   "  \r  \n   l   o   n
0107720   g   D   a   t   e   T   i   m   e   F   o   r   m   a   t   S
0107740   t   r   i   n   g   ,   "   {   0   :   0   0   }   :   {   1
0107760   :   0   0   }   :   {   2   :   0   0   }       o   n       {
0110000   3   }   ,       {   4   }   {   5   }       o   f       {   6
0110020   :   0   0   }   ,       3   E   {   7   }   "  \r  \n   d   a
0110040   t   e   T   i   m   e   F   o   r   m   a   t   S   t   r   i

Some text editors may not preserve that subtle property though.

@KABoissonneault
Copy link
Collaborator

Note that Git on Windows has a tendency to automatically replace CRLF with LF when you push, and LF with CRLF when you pull. This can be turned off, but is normally turned on by default nowadays. This could be one source of "corrupting" the files?

@petchema
Copy link
Collaborator Author

petchema commented Apr 27, 2023

Master CSV files are versioned in a ZIP archive, so I assume line endings are genuine.

French CSV files are exchanged using GitHub, so it's possible that this modified them compared to how they've been generated. That's a question for @Daneel53

But anyway, depending on those files having this exact mix of line endings is too brittle. How code expectations should be relaxed is the question.

@Interkarma
Copy link
Owner

Thank for you the fix! :) I've tested your change and it handles even a mix of CRLF and LF within the same list. It doesn't introduce any regressions I can find. I'm happy to merge this unless you have some additional concerns you want to look at first?

@petchema
Copy link
Collaborator Author

petchema commented Apr 28, 2023

No, I don't have anything to change in this PR.
I was just wondering if that was the right place to handle this line ending issue, or if it could/should have been handled earlier. But if you're okay with the current fix, you can merge it.

On a related note, it would help to remove the two extra ^M in master CSVs that may not be compatible with all spreadsheet programs:
https://forums.dfworkshop.net/viewtopic.php?p=65451#p65451
PS In fact that's necessary otherwise WeaponStoresB becomes an 11 items list instead of 10, and The Odd Blades becomes The Odd Blacksmith!

@Interkarma
Copy link
Owner

On a related note, it would help to remove the two extra ^M in master CSVs that may not be compatible with all spreadsheet programs.

Thank you for this. It appears the CSV export from Unity will sometimes add the closing quote on a newline - no idea why. I might have to sanitize this every time the CSV is exported.

@Interkarma
Copy link
Owner

I'll merge this one so it's being used in testing.

@Interkarma Interkarma merged commit aabc31e into Interkarma:master Jun 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug linux Something specific to Linux, that doesn't affect other platforms. mac Something specific to Mac, that doesn't affect other platforms.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants