Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unicode (and [autoexec]) improvements #2422

Merged
merged 3 commits into from Apr 23, 2023
Merged

Unicode (and [autoexec]) improvements #2422

merged 3 commits into from Apr 23, 2023

Conversation

FeralChild64
Copy link
Collaborator

@FeralChild64 FeralChild64 commented Apr 19, 2023

Since my previous [autoexec]-related PR was just merged, here is another one :) As the 1st commit is just a file formatting cleanup, and the 2nd one just adds Unicode-related definitions, I suggest to review this PR commit by commit.


Unicode related improvement:

  • the engine can now also convert strings from DOS code page to UTF8
  • the engine can now handle non-normalized Unicode input

[autoexec] section improvements:

  • CONFIG -axadd <line> command now converts the line from current DOS code page to UTF-8
  • CONFIG -axtype now converts the [autoexec] section to currently set DOS code page

Remarks:

  • the easiest way to test non-normalized Unicode support is to alter encode.sh or encode.batscript (change nfc to nfkd), and execute it to generate incorrectly normalized translation files; such translations should now work
  • the Z:\AUTOEXEC.BAT is not translated from UTF-8 to DOS code page yet (only CONFIG -axtype works with UTF-8); implementing this correctly needs some significant effort - my branch https://github.com/dosbox-staging/dosbox-staging/tree/fc/autoexec-1 mostly does it, but there are still some edge cases which are not resolved yet
  • to test the CONFIG -axtype, try the following sequence of command:
KEYB pl 667
CONFIG -axadd ąćęłńóśż
CONFIG -axtype
KEYB pl 668
CONFIG -axtype
KEYB us 437
CONFIG -axtype

To type accented letters above - in keyboard layout pl press Right ALT+letter key combination. Explanation: both 667 and 668 are code pages for Polish language, but mutually incompatible. The 437 does not support Polish, most needed accented characters are absent.

@FeralChild64 FeralChild64 self-assigned this Apr 19, 2023
@FeralChild64 FeralChild64 added the enhancement New feature or enhancement of existing features label Apr 19, 2023
@FeralChild64 FeralChild64 marked this pull request as ready for review April 19, 2023 20:27
@FeralChild64 FeralChild64 changed the title Unicode and autoexec improvements Unicode (and [autoexec]) improvements Apr 19, 2023
@ThomasEricB
Copy link
Contributor

This is pretty good @FeralChild64 !

src/misc/messages.cpp Outdated Show resolved Hide resolved
src/misc/messages.cpp Outdated Show resolved Hide resolved
src/misc/programs.cpp Outdated Show resolved Hide resolved
src/misc/unicode.cpp Outdated Show resolved Hide resolved
src/misc/unicode.cpp Outdated Show resolved Hide resolved
src/misc/unicode.cpp Outdated Show resolved Hide resolved
src/misc/unicode.cpp Outdated Show resolved Hide resolved
src/misc/unicode.cpp Outdated Show resolved Hide resolved
src/misc/unicode.cpp Outdated Show resolved Hide resolved
src/misc/unicode.cpp Outdated Show resolved Hide resolved
src/misc/unicode.cpp Outdated Show resolved Hide resolved
src/misc/unicode.cpp Outdated Show resolved Hide resolved
src/misc/unicode.cpp Outdated Show resolved Hide resolved
src/misc/unicode.cpp Outdated Show resolved Hide resolved
src/misc/unicode.cpp Outdated Show resolved Hide resolved
src/misc/unicode.cpp Outdated Show resolved Hide resolved
src/misc/unicode.cpp Outdated Show resolved Hide resolved
src/misc/unicode.cpp Outdated Show resolved Hide resolved
@kcgen kcgen self-requested a review April 21, 2023 19:06
@kcgen
Copy link
Member

kcgen commented Apr 21, 2023

Even though all this Unicode management is a sophisticated task, the code is extremely natural to read with very clear variable and function names, bool names, and extensive self-documenting lambas (along with all the error logging in the labdas to avoid repetition).

This is quite a masterpiece of Unicode handling @FeralChild64!

All of my comments are just minor simplifications or suggestions.

I'm not sure how to type the Polish characters (as I routinely don't use the `Alt+combos), however at least testing w/ the sanitizers and passing in batch files is OK:

2023-04-21_12-11

If there are any copy-and-paste (or bat files) I can help test with, happy to run them through if you can attach any.

@FeralChild64
Copy link
Collaborator Author

@kcgen

I'm not sure how to type the Polish characters

Right ALT + letter (has to be the right one, left ALT won't work).

Copy link
Member

@kcgen kcgen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Time to merge; all comments addressed and passing tests. 🚀

@kcgen kcgen merged commit 608ca8f into main Apr 23, 2023
58 checks passed
@FeralChild64 FeralChild64 deleted the fc/unicode-1 branch April 30, 2023 04:22
@johnnovak johnnovak added the localisation Issues related to localisation and internationalisation label Dec 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or enhancement of existing features localisation Issues related to localisation and internationalisation
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

None yet

4 participants