Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix/utf 8 #3935

Merged
merged 2 commits into from
Aug 3, 2023
Merged

Fix/utf 8 #3935

merged 2 commits into from
Aug 3, 2023

Conversation

progweb
Copy link
Contributor

@progweb progweb commented Jul 31, 2023

Text rendering doesn't work with utf-8...

I can't render chinese, arabic... or special chars as éèûîà...

It seems that the string function to convert utf-8 to unicode is quiet simple.

Ref: #3933

[Note]: OpenImageIO is used in the gpx2video project.

@progweb
Copy link
Contributor Author

progweb commented Jul 31, 2023

Before:

orig

@progweb
Copy link
Contributor Author

progweb commented Jul 31, 2023

After:

patch

@lgritz
Copy link
Collaborator

lgritz commented Jul 31, 2023

Can you please base the PR on the current master so you aren't comparing across branches and seeing hundreds of files in the diff? Thanks.

@lgritz
Copy link
Collaborator

lgritz commented Jul 31, 2023

Also, would you mind please making sure to sign your commits to invoke the DCO? Thanks.

@progweb progweb force-pushed the fix/utf-8 branch 2 times, most recently from a0115a2 to 76f527f Compare July 31, 2023 19:51
@jessey-git
Copy link
Contributor

This is somewhat problematic as large parts of <codecvt> including wstring_convert and codecvt_utf8 have been deprecated since c++17 [1]. They didn't have a long life due to a variety of reasons. Once we move to c++20 you'll get warnings/errors in many environments [2].

[1] https://en.cppreference.com/w/cpp/locale/wstring_convert
[2] https://godbolt.org/z/PMn4hMK18

@apetrynet
Copy link
Contributor

I like the intent of this PR. It would be great to burn artists actual names in a slate for instance.

@lgritz
Copy link
Collaborator

lgritz commented Jul 31, 2023

@jessey-git I'm not sure exactly why it seems to pass CI for us (including the test where we do use C++20), but Godbolt certainly makes a compelling case that this is problematic.

What's the preferred approach these days?

Can the implementation we are using in Strutil::utf8_to_unicode be repaired to fix whatever shortcoming inspired this PR?

@jessey-git
Copy link
Contributor

For the windows builder it might be because it's using -DCMAKE_CXX_STANDARD=11 during the build?

For the linux builders I think it's a combination of not using a warning level that would show the warnings and/or not using -stdlib=libc++ as I did in my example. Really weird that base gcc and its std library aren't showing the deprecations...

Unfortunately I don't quite know what's state-of-the-art in terms of unicode in c++ nowadays.

@lgritz
Copy link
Collaborator

lgritz commented Jul 31, 2023

It was not me who did it, but I have a feeling that the unreliability of wstring_convert (not existing prior to C++11, and perhaps also knowing that a deprecation was on the way for C++17), is what inspired the unicode decoding from http://bjoern.hoehrmann.de/utf-8/decoder/dfa/ being added to Strutil. The nice thing about that it that it's small and independent of the std or any other dependencies. The down side, of course, is that there appear to be some sequences it doesn't handle properly, which had never been reported prior to this PR. I still think that if this code can be repaired (or substituted with a different implementation with a license acceptable to us), that would be better than relying on std::wstring_convert. But I definitely am in favor of some kind of fix; @progweb is correct that we should render those codes properly.

@lgritz
Copy link
Collaborator

lgritz commented Jul 31, 2023

@progweb Can you supply any test case that we can try out other solutions (such as whatever made your example above)?

I noticed that the code in Strutil is not the only formulation on that web page it came from; maybe it is the only buggy one and the others work properly? But I don't have a good test.

@progweb
Copy link
Contributor Author

progweb commented Aug 1, 2023

I'm going to try to fix it with an other implementation...

Signed-off-by: Nicolas VIVIEN <nicolas@vivien.fr>
@progweb
Copy link
Contributor Author

progweb commented Aug 1, 2023

Work as expected.

STR:      t   e   s   t   :        é      è      ú      ç      à      ï      ö        €

UTF8:    %74 %65 %73 %74 %3A %20 %C3%A9 %C3%A8 %C3%BA %C3%A7 %C3%A0 %C3%AF %C3%B6 %E2%82%AC
UNICODE: %74 %65 %73 %74 %3A %20   %E9    %E8    %FA    %E7    %E0    %EF    %F6    %20AC

Signed-off-by: Larry Gritz <lg@larrygritz.com>
@lgritz
Copy link
Collaborator

lgritz commented Aug 3, 2023

I took the liberty of adding a test for this and pushing it to your tree. Once we see that it passes CI, I'll do the merge.

@lgritz lgritz merged commit 1bb1ca5 into AcademySoftwareFoundation:master Aug 3, 2023
23 checks passed
@lgritz
Copy link
Collaborator

lgritz commented Aug 3, 2023

Merged, and I'll backport to the next 2.4 release.

Note that I disable the test I added on Windows, because I just couldn't figure out how to embed the unicode on the Windows command line. Seems to work fine for Linux & MacOS. If anybody knows the right way to fix, please suggest and we can re-enable that aspect of the test for Windows as well.

lgritz added a commit to lgritz/OpenImageIO that referenced this pull request Aug 11, 2023
Text rendering doesn't work with utf-8... 

I can't render chinese, arabic... or special chars as éèûîà...

It seems that the string function to convert utf-8 to unicode is quiet
simple.

Fixes AcademySoftwareFoundation#3933 

[Note]: OpenImageIO is used in the [gpx2video
project](https://github.com/progweb/gpx2video).

---------

Signed-off-by: Nicolas VIVIEN <nicolas@vivien.fr>
Signed-off-by: Larry Gritz <lg@larrygritz.com>
Co-authored-by: Larry Gritz <lg@larrygritz.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants