ICU for Unicode handling? #10

Closed
shermp opened this issue Aug 18, 2018 · 9 comments

@shermp
Contributor

shermp commented Aug 18, 2018

Hi @NiLuJe

I'm mulling over the idea of adding basic freetype2 support. While looking through the FBInk codebase to figure out how to do that, I noticed your rant about Kobo's broken libc with regard to Unicode support.

I notice that the Kobo firmware appears to include the ICU library (libicu*.so, vers. 4.6). Have you looked into using this library for dealing with strings in FBInk?

The API documentation for ICU 4.6.1 is here

@NiLuJe
Owner

NiLuJe commented Aug 18, 2018

I ended up skirting the issue with libu8, and, provided no-one tries to feed us hopelessly broken encoding, that does the job just fine without having to massively rework how strings are handled ;).

(ICU is a very very large hammer to take care of the Unicode issue, and the fact that wchar_t is just hopelessly broken on Kobo probably doesn't help. Plus, the fact that some of our target devices either don't ship it, or ship wildly different versions is another thing against it, because bundling it is not an option: besides the fact that it's C++, and takes forever to build, libicudata is over 25MB in ICU 60.2 ;)).

@shermp
Contributor Author

shermp commented Aug 18, 2018

Ah, fair enough. Carry on...

/me keeps forgetting kindles exist 😈

I read a blog post a while back, where the author advocated using UTF-8 internally, and therefore sticking with the standard char * data type. The author argued that many of the most common string operations only care about bytes, and not characters. Also, UTF-8 is a sequence of bytes, so endianness doesn't matter. I found it a rather fascinating read.
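To illustrate the bytes-versus-characters point, here is a minimal sketch in C. It assumes the input is already valid UTF-8, and the function name `utf8_codepoints` is made up for this example; it is not part of FBInk or any library mentioned in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Count code points in a NUL-terminated string that is ASSUMED to be valid
 * UTF-8. Continuation bytes always look like 10xxxxxx, so every byte that
 * does not match that pattern starts a new code point. */
size_t utf8_codepoints(const char *s)
{
    assert(s != NULL);
    size_t count = 0;
    for (const unsigned char *p = (const unsigned char *) s; *p != '\0'; p++) {
        if ((*p & 0xC0) != 0x80) {
            count++;
        }
    }
    return count;
}
```

For the string `"h\xC3\xA9llo"` ("héllo"), `strlen` reports 6 bytes while `utf8_codepoints` reports 5 code points. Copying, concatenating, or searching for an ASCII delimiter only needs the byte view, which is why plain `char *` goes a long way.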

@NiLuJe
Owner

NiLuJe commented Aug 18, 2018

That's essentially what I ended up going with ;).

I think I may have read that very same article (if it mentioned doing sanitization/conversions at I/O boundaries, that's the one). But with the hobbled libc, I can't really do the sanitization/conversion bit, since any libc-based locale/multibyte/widechar stuff is basically borked ;).
So I'm just skipping that, and hoping really hard no-one will feed us KOI8-R or something xD.
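For reference, a check at the I/O boundary does not strictly need the libc locale machinery. The sketch below is a hand-rolled UTF-8 validator in plain C; `utf8_is_valid` is a hypothetical name, and this is not what FBInk does (per the comment above, FBInk skips this step). Note that it can only reject input that is not well-formed UTF-8; it cannot tell which legacy encoding the input actually was, nor convert it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if buf[0..len) is well-formed UTF-8.
 * Rejects stray continuation bytes, truncated sequences, overlong forms,
 * UTF-16 surrogates, and code points above U+10FFFF. */
bool utf8_is_valid(const char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    const unsigned char *s = (const unsigned char *) buf;
    size_t i = 0;

    while (i < len) {
        unsigned char c = s[i];
        size_t need;        /* continuation bytes expected after the lead byte */
        unsigned long cp;   /* code point being decoded */
        unsigned long min;  /* smallest code point allowed for this length */

        if (c < 0x80) {
            i++;            /* plain ASCII */
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            need = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {
            need = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {
            need = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return false;   /* stray continuation byte or invalid lead byte */
        }

        if (len - i - 1 < need) {
            return false;   /* sequence is cut off at the end of the buffer */
        }
        for (size_t k = 1; k <= need; k++) {
            unsigned char cc = s[i + k];
            if ((cc & 0xC0) != 0x80) {
                return false;
            }
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            return false;   /* overlong, out of range, or surrogate */
        }
        i += need + 1;
    }
    return true;
}
```

Typical KOI8-R text fails this check quickly, because its high bytes rarely line up as valid lead/continuation pairs; for example the KOI8-R bytes `F0 D2 C9 D7 C5 D4` are rejected at the second byte.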

@shermp
Contributor Author

shermp commented Aug 18, 2018

It probably was the same article :p

I've been looking into this area a bit lately, because I'm trying to see if I can add differential support to my VHD library, and file path strings there are encoded as UTF-16BE.
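As an illustration of converting at the boundary, here is a rough sketch of turning a UTF-16BE byte buffer into UTF-8 in plain C, so the rest of the code can keep working on `char *`. The function name `utf16be_to_utf8` and its error convention are assumptions made for this example; nothing here comes from the VHD library itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Convert in[0..in_len) from UTF-16BE to a NUL-terminated UTF-8 string.
 * Returns the number of bytes written (excluding the NUL), or (size_t) -1
 * on malformed input or if out is too small. */
size_t utf16be_to_utf8(const uint8_t *in, size_t in_len, char *out, size_t out_cap)
{
    assert(in != NULL && out != NULL);
    if (in_len % 2 != 0 || out_cap == 0) {
        return (size_t) -1;
    }
    unsigned char *u = (unsigned char *) out;
    size_t o = 0;

    for (size_t i = 0; i < in_len; i += 2) {
        /* Big-endian: the high byte comes first. */
        uint32_t cp = ((uint32_t) in[i] << 8) | in[i + 1];

        if (cp >= 0xD800 && cp <= 0xDBFF) {
            /* High surrogate: must be followed by a low surrogate. */
            if (in_len - i < 4) {
                return (size_t) -1;
            }
            uint32_t lo = ((uint32_t) in[i + 2] << 8) | in[i + 3];
            if (lo < 0xDC00 || lo > 0xDFFF) {
                return (size_t) -1;
            }
            cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
            i += 2; /* consume the low surrogate as well */
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return (size_t) -1; /* lone low surrogate */
        }

        size_t need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (out_cap - o <= need) {
            return (size_t) -1; /* always keep one byte for the NUL */
        }
        if (need == 1) {
            u[o++] = (unsigned char) cp;
        } else if (need == 2) {
            u[o++] = (unsigned char) (0xC0 | (cp >> 6));
            u[o++] = (unsigned char) (0x80 | (cp & 0x3F));
        } else if (need == 3) {
            u[o++] = (unsigned char) (0xE0 | (cp >> 12));
            u[o++] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
            u[o++] = (unsigned char) (0x80 | (cp & 0x3F));
        } else {
            u[o++] = (unsigned char) (0xF0 | (cp >> 18));
            u[o++] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
            u[o++] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
            u[o++] = (unsigned char) (0x80 | (cp & 0x3F));
        }
    }
    u[o] = '\0';
    return o;
}
```

Because the input is read byte by byte, the host's own endianness never enters the picture.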

Incidentally, do you know of any good cross-platform C file path libraries?

@NiLuJe
Owner

NiLuJe commented Aug 18, 2018

Not really, the only thing that comes to mind is C++ (namely, boost) :/.

@NiLuJe
Owner

NiLuJe commented Aug 18, 2018

And I really don't want to say glib on the C side of things, because glib's weird, and I'm not even sure it'd do what you need ;).

@NiLuJe
Owner

NiLuJe commented Aug 18, 2018

You might also find something interesting either in stb or some other small libs like that ;).

@shermp
Contributor Author

shermp commented Aug 18, 2018

Thanks for the suggestions. I didn't see anything that really struck me as being suitable for my requirements (simple though they may be; path joining and normalization).
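For a sense of scale, path joining plus purely lexical normalization fits in a few dozen lines of C. The sketch below is a hypothetical helper (`path_join_normalize` is not from stb, glib, or boost) and makes several simplifying assumptions: '/' is the only separator, there are no drive letters or UNC prefixes, symlinks are ignored, and `b` is always appended to `a` even if it starts with '/'. It works on UTF-8 bytes unchanged, since '/' never occurs inside a multi-byte sequence.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Append one component to out, inserting a '/' if needed. */
static int append_component(char *out, size_t cap, size_t *o, const char *comp, size_t n)
{
    size_t sep = (*o > 0 && out[*o - 1] != '/') ? 1 : 0;
    if (*o + sep + n + 1 > cap) {
        return -1; /* no room left for component plus NUL */
    }
    if (sep) {
        out[(*o)++] = '/';
    }
    memcpy(out + *o, comp, n);
    *o += n;
    return 0;
}

/* Feed every component of path into out; root is 1 for absolute paths. */
static int push_path(char *out, size_t cap, size_t *o, size_t root, const char *path)
{
    const char *p = path;
    while (*p != '\0') {
        while (*p == '/') p++;                      /* collapse repeated '/' */
        const char *start = p;
        while (*p != '\0' && *p != '/') p++;
        size_t n = (size_t) (p - start);
        if (n == 0 || (n == 1 && start[0] == '.')) {
            continue;                               /* skip empty and "." */
        }
        if (n == 2 && start[0] == '.' && start[1] == '.') {
            size_t last = *o;                       /* start of last component */
            while (last > root && out[last - 1] != '/') last--;
            size_t last_len = *o - last;
            bool is_dotdot = last_len == 2 && out[last] == '.' && out[last + 1] == '.';
            if (last_len > 0 && !is_dotdot) {
                *o = (last > root) ? last - 1 : root; /* pop it */
                continue;
            }
            if (root == 1) {
                continue;                           /* ".." at "/" stays at "/" */
            }
            /* Relative path with nothing to pop: keep the ".." */
        }
        if (append_component(out, cap, o, start, n) != 0) {
            return -1;
        }
    }
    return 0;
}

/* Join a and b, normalize the result into out. Returns 0, or -1 if out is too small. */
int path_join_normalize(const char *a, const char *b, char *out, size_t cap)
{
    assert(a != NULL && b != NULL && out != NULL);
    if (cap < 2) {
        return -1;
    }
    const char *first = (a[0] != '\0') ? a : b;
    size_t root = (first[0] == '/') ? 1 : 0;
    size_t o = 0;
    if (root) {
        out[o++] = '/';
    }
    if (push_path(out, cap, &o, root, a) != 0 || push_path(out, cap, &o, root, b) != 0) {
        return -1;
    }
    if (o == 0) {
        out[o++] = '.';                             /* empty result means "here" */
    }
    out[o] = '\0';
    return 0;
}
```

With these rules, joining `"/usr/lib"` and `"../share//fonts/./x.ttf"` yields `"/usr/share/fonts/x.ttf"`, and joining `"a"` and `"../.."` yields `".."`. Handling Windows-style paths would need extra prefix logic on top of this.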

@shermp
Contributor Author

shermp commented Aug 24, 2018

I had another look at that STB link, and noticed I had missed the stb.h file the first time around.

Oh my... that looks just about perfect :)

@shermp shermp closed this as completed Sep 11, 2018

2 participants