Unac #15306

wants to merge 4 commits into
54 Library/Formula/unac.rb
@@ -0,0 +1,54 @@
+# -*- coding: utf-8 -*-
smgoller Oct 4, 2012

Emacs added the coding line. Should I remove it, even though it indicates there's UTF-8 in this formula?

mistydemeo Oct 4, 2012

No, definitely keep it.

jacknagel Oct 5, 2012

Would it make more sense to try and handle this globally, i.e. by adding -KU to our shebang lines and the places where we invoke the interpreter directly?

mistydemeo Oct 5, 2012

I thought so, but I can't seem to find the right magic invocation. -KU according to the manpage is for kanji and in practice I was still getting encoding errors:

invalid multibyte escape: /^\037\213/ (SyntaxError)
invalid multibyte escape: /^\037\235/
invalid multibyte escape: /^\xFD7zXZ\x00/

Which is weird. Similar story with -Eutf-8:utf-8.

jacknagel Oct 5, 2012

Well, those sequences aren't utf-8 (the last one even has an embedded null!). I suppose we have to compile them with /n to disable multibyte interpretation under 1.9.

mistydemeo Oct 5, 2012

Hm, but if they're not valid utf-8 why does it work with the utf-8 magic comment?

jacknagel Oct 5, 2012

Because we are talking about two different encodings here. The first is the source encoding, i.e. the encoding of the actual bytes that make up the file. This is what the magic comment and command-line switch address.

The second is the encoding used by the Regexp engine when compiling /^\037\213/, etc., which under 1.9 is utf-8. The errors you see are raised when those literals are compiled, after the file's bytes have already been read as utf-8.

IOW, the source text "/^\037\213/" is perfectly valid utf-8 (it's also valid ASCII), so the file can be read as utf-8. But when those escapes are interpreted later, they produce the bytes 0x1F 0x8B, which is not a valid utf-8 sequence (0x8B is a continuation byte with no lead byte before it). Under 1.9 the regexp engine is not encoding-agnostic, so it needs to be told to compile such patterns without an encoding, e.g.

[irb(main)]$ /^\037\213/
SyntaxError: (irb):4: invalid multibyte escape: /^\037\213/
    from /usr/local/opt/ruby/bin/irb:12:in `<main>'
[irb(main)]$ /^\037\213/n
 ====> /^\037\213/n
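
The same distinction can be checked outside irb. What follows is only a minimal sketch, assuming Ruby 2.0 or newer (where source files default to utf-8 and String#b is available). The variable names are illustrative and are not part of the formula or of Homebrew itself.

```ruby
# The source text of the pattern is plain ASCII, so it is also valid utf-8.
# (pattern_source is a hypothetical name used only for this illustration.)
pattern_source = '/^\037\213/'
p pattern_source.ascii_only?

# The octal escapes \037 and \213 stand for the bytes 0x1F and 0x8B.
magic = "\037\213".b # String#b returns a binary (ASCII-8BIT) copy
p magic.bytes.map { |byte| format("0x%02X", byte) }

# Read as utf-8, those two bytes are not a valid sequence:
# 0x8B is a continuation byte with no lead byte in front of it.
p magic.dup.force_encoding("UTF-8").valid_encoding?

# Compiling the literal without /n is expected to fail. Going through eval
# keeps that failure out of this file's own parse so it can be rescued.
error = begin
  eval(pattern_source)
  nil
rescue SyntaxError, RegexpError => e
  e.message
end
p error

# With /n the pattern is compiled as raw bytes and can match binary data.
gzip_magic = /^\037\213/n
p gzip_magic.encoding
matched_at = gzip_magic =~ "\x1F\x8B\x08\x00".b
p matched_at
```

If this holds, it supports the suggestion above: add the n flag to the affected patterns rather than changing the interpreter switches.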
+require 'formula'
+class Unac < Formula
+ homepage 'http://savannah.nongnu.org/projects/unac'
+ url 'http://ftp.de.debian.org/debian/pool/main/u/unac/unac_1.8.0.orig.tar.gz'
+ sha1 '3e779bb7f3b505880ac4f43b48ee2f935ef8aa36'
+ depends_on 'gettext' => :build
+ depends_on :autoconf => :build
+# requires a newer automake than 1.10
+ depends_on :automake => :build
+ depends_on :libtool => :build
+ def patches
+ {
+ :p0 => ["http://bugs.debian.org/cgi-bin/bugreport.cgi?msg=5;filename=patch-libunac1.txt;att=1;bug=623340",
+ "http://bugs.debian.org/cgi-bin/bugreport.cgi?msg=10;filename=patch-unaccent.c.txt;att=1;bug=623340"],
+ :p1 => ["http://ftp.de.debian.org/debian/pool/main/u/unac/unac_1.8.0-6.diff.gz",
+ }
+ end
+ def install
+ system "chmod","+x","./configure"
+ touch "config.rpath"
+ inreplace "autogen.sh", "libtool", "glibtool"
+ system "./autogen.sh"
+ system "./configure", "--disable-debug", "--disable-dependency-tracking",
+ "--prefix=#{prefix}"
+ system "make install"
+ end
+ def test
+ `#{bin}/unaccent utf-8 fóó`.chomp == 'foo'
+ end
+# configure.ac doesn't properly detect Mac OS's iconv library. This patch fixes that.
+diff --git a/configure.ac b/configure.ac
+index 4a4eab6..9f25d50 100644
+--- a/configure.ac
++++ b/configure.ac
+@@ -49,6 +49,7 @@ AM_MAINTAINER_MODE
++LIBS="$LIBS -liconv"
+ iconv_open not found try to install replacement from
+ http://www.gnu.org/software/libiconv/