Closes #7656, fixes #7582.
utf8proc -> libmojibake (closes #7656, fixes #7582)
I can't afford to tag a lower milestone. ;)
I admit it feels a bit sketchy, and others might be against it, but I'd kind of like to merge this. I see it as a bug fix.
I think it's a bugfix that's got an awful lot of new functionality in it. I'm -1 to merging; I'd like to just do what we can with the serialization stuff and then see if no other bugs fall out so we can release.
Actually, the only new functionality (to the extent that this can be considered new) is in the updated Unicode 7.0.0 code point assignments. (JuliaLang/utf8proc#1) The character width stuff hasn't been put in yet.
I am also -1 to merging now. This could be done in 0.3.1.
Yeah, -1 from me too. I agree with Viral, let's get 0.3 out the door, then we can merge this and if it holds up we can have a minor release of 0.3 that includes this.
Ok, I figured most people would feel that way.
From a packaging POV this would be relatively painful to get in 0.3 or even in 0.3.1. I'd rather get to use libmojibake only in the development version for some time, hopefully upstream will eventually respond so that the fixes can be merged back into utf8proc.
@nalimilan, what's the packaging difficulty? libmojibake.a (just like libutf8proc.a) is linked directly into julia, so it does not need to be distributed as a separate library file.
(I don't think it's reasonable to wait for an upstream merge, from the sound of things.)
This is kind of related to the discussion on LLVM 3.3 vs 3.5 taken place in #6757. If I understand it correctly, LLVM is dynamically linked in the linux distribution packages. I don't know how this is handled with utf8proc and libuv.
We can link LLVM statically too. In general, distros do not like packages linking static versions of libraries that are available as packages. Giving upstream some more time and/or time for distro maintainers to package libmojibake is also a good reason to make this part of 0.3.1 or later.
libmojibake is a tiny library, so it's not nearly as bad as shipping our own LLVM.
If we wait for libmojibake to be merged upstream and/or packaged by distros, then we could easily be looking a 6-18 months before users see Unicode 6/7 support in Julia.
Woops, I thought I had replied but apparently my comment did not get posted. Here it is (now you have replied again, I must say that I agree we shouldn't wait for too long if upstream does not react):
@stevengj The problem is not technical, it's more about distro policies. To use a library in Fedora, you either need to create a new package (and get it reviewed, which takes some time, and does not really make sense for libmojibake), or to get a bundling exception granted. I've obtained temporary exceptions for libuv and Rmath, but Fedora people may start thinking Julia is not yet stable enough if I ask for another exception.
I hope upstream will acknowledge the fact that they are not maintaining utf8proc anymore and let you work on it instead. I packaged utf8proc in Fedora only to use it for Julia, I'd rather not maintain two separate packages just because upstream does not reply...
If you are the maintainer of the utf8proc package for Fedora, one option would be to merge the updated data tables from libmojibake as a bugfix patch.
@stevengj Yeah, I considered that too. But then I need to carry a patch against Julia to keep using utf8proc (although very simple, it will need rebasing from time to time).
@nalimilan Well until now this PR is merged to master and not to release-0.3
Upstream of utf8proc is not an individual, but a non-profit organization, probably with bylaws that make it difficult for them to officially relinquish control over utf8proc even if they wanted to. For example, they require copyright assignments for patches to even be considered. However, they have generally been supportive of us, and have linked libmojibake as a "development fork" from the utf8proc web page.
as discussed by @nalimilan in #7917, make it easier to link -lutf8pro…
…c if desired by the packager
@nalimilan, I just pushed a patch so that you can make USE_SYSTEM_UTF8PROC=1 if you want to continue linking to -lutf8proc.
NEWS for #7917
@stevengj Thanks, this is very helpful!