ICU for Unicode handling? #10

shermp · 2018-08-18T00:48:45Z

I'm mulling over the idea of adding basic freetype2 support, and while looking through the FBInk codebase to figure out how to add it, I noticed your rant on Kobo's broken libc with regard to Unicode support.

I notice that the Kobo firmware appears to include the ICU library (libicu*.so, vers. 4.6). Have you looked into using this library for dealing with strings in FBInk?

The API documentation for ICU 4.6.1 is here

NiLuJe · 2018-08-18T04:55:16Z

I ended up skirting the issue with libu8, and, provided no-one tries to feed us hopelessly broken encoding, that does the job just fine without having to massively rework how strings are handled ;).

(ICU is a very very large hammer to take care of the Unicode issue, and the fact that wchar_t is just hopelessly broken on Kobo probably doesn't help. Plus, the fact that some of our target devices either don't ship it, or ship wildly different versions is another thing against it, because bundling it is not an option: besides the fact that it's C++, and takes forever to build, libicudata is over 25MB in ICU 60.2 ;)).

shermp · 2018-08-18T05:20:52Z

Ah, fair enough. Carry on...

/me keeps forgetting kindles exist 😈

I read a blog post a while back, where the author advocated using UTF-8 internally, and therefore sticking with the standard char * data type. The author argued that many of the most common string operations only care about bytes, and not characters. Also, UTF-8 is a sequence of bytes, so endianness doesn't matter. I found it a rather fascinating read.
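To illustrate the bytes-versus-characters point, here is a minimal sketch in C. The helper name utf8_codepoints is hypothetical (it is not from FBInk or the article), and it assumes the input is already valid UTF-8; it does no validation. It counts code points by skipping continuation bytes, while strlen keeps counting bytes as usual.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count code points in a NUL-terminated UTF-8 string.
 * Every code point starts with a byte that is NOT of the form 10xxxxxx,
 * so counting the non-continuation bytes gives the code point count.
 * Assumption: the input is valid UTF-8; nothing is validated here. */
static size_t utf8_codepoints(const char *s)
{
    assert(s != NULL);
    size_t count = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++) {
        if ((*p & 0xC0) != 0x80) {
            count++;
        }
    }
    return count;
}

/* Byte-oriented operations need no Unicode awareness at all:
 * strlen() gives the storage size, which is what copying,
 * concatenating and comparing for equality actually need. */
static size_t utf8_bytes(const char *s)
{
    assert(s != NULL);
    return strlen(s);
}
```

For a string with one two-byte character among four ASCII letters, utf8_bytes reports six bytes and utf8_codepoints reports five code points. Only code that cares about what ends up on screen needs the second number.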

NiLuJe · 2018-08-18T05:49:53Z

That's essentially what I ended up going with ;).

I think I may have read that very same article (if it mentioned doing sanitization/conversions at I/O boundaries, that's the one). But with the hobbled libc, I can't really do the sanitization/conversion bit, since any libc-based locale/multibyte/widechar stuff is basically borked ;).
So I'm just skipping that, and hoping really hard no-one will feed us KOI8-R or something xD.

shermp · 2018-08-18T06:01:14Z

It probably was the same article :p

I've been looking into this area a bit lately, because I'm trying to see if I can add differential support to my VHD library, and filepath strings there are encoded as UTF-16BE.

Incidentally, do you know of any good cross-platform C file path library?
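For reference, converting UTF-16BE to UTF-8 at the I/O boundary can be done without any libc locale or wchar_t support. The following is only a sketch under stated assumptions: the function name utf16be_to_utf8 and its signature are hypothetical, not taken from any VHD library or from FBInk, and it rejects malformed input rather than substituting replacement characters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert in_len bytes of UTF-16BE into a
 * NUL-terminated UTF-8 string in out (capacity out_cap bytes).
 * Returns the number of UTF-8 bytes written, excluding the NUL,
 * or (size_t)-1 on malformed input or insufficient space. */
static size_t utf16be_to_utf8(const uint8_t *in, size_t in_len,
                              char *out, size_t out_cap)
{
    assert(in != NULL && out != NULL);
    if (in_len % 2 != 0 || out_cap == 0) {
        return (size_t)-1;
    }
    size_t o = 0;
    for (size_t i = 0; i < in_len; i += 2) {
        /* Big-endian: the high byte comes first. */
        uint32_t cp = ((uint32_t)in[i] << 8) | in[i + 1];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            /* High surrogate: a low surrogate must follow. */
            if (i + 3 >= in_len) {
                return (size_t)-1;
            }
            uint32_t lo = ((uint32_t)in[i + 2] << 8) | in[i + 3];
            if (lo < 0xDC00 || lo > 0xDFFF) {
                return (size_t)-1;
            }
            cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
            i += 2; /* consume the low surrogate as well */
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return (size_t)-1; /* unpaired low surrogate */
        }
        size_t need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (o + need >= out_cap) {
            return (size_t)-1; /* always keep room for the NUL */
        }
        if (need == 1) {
            out[o++] = (char)cp;
        } else if (need == 2) {
            out[o++] = (char)(0xC0 | (cp >> 6));
            out[o++] = (char)(0x80 | (cp & 0x3F));
        } else if (need == 3) {
            out[o++] = (char)(0xE0 | (cp >> 12));
            out[o++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (char)(0x80 | (cp & 0x3F));
        } else {
            out[o++] = (char)(0xF0 | (cp >> 18));
            out[o++] = (char)(0x80 | ((cp >> 12) & 0x3F));
            out[o++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (char)(0x80 | (cp & 0x3F));
        }
    }
    out[o] = '\0';
    return o;
}
```

Because the byte order is handled explicitly with shifts, the same code behaves identically on little-endian and big-endian hosts.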

NiLuJe · 2018-08-18T06:10:46Z

Not really, the only thing that comes to mind is C++ (namely, boost) :/.

NiLuJe · 2018-08-18T06:12:10Z

And I really don't want to say glib on the C side of things, because glib's weird, and I'm not even sure it'd do what you need ;).

NiLuJe · 2018-08-18T07:17:29Z

You might also find something interesting either in stb or some other small libs like that ;).

shermp · 2018-08-18T12:37:42Z

Thanks for the suggestions. I didn't see anything that really struck me as being suitable for my requirements (simple though they may be; path joining and normalization).

shermp · 2018-08-24T03:10:28Z

I had another look at that STB link, and noticed I had missed the stb.h file the first time around.

Oh my... that looks just about perfect :)

shermp closed this as completed Sep 11, 2018
