Unac #15306

Closed
wants to merge 4 commits into from

4 participants

@smgoller

This formula is for unac, a C library for removing accents from a string. I'm submitting this formula because it's required for flactag, the program I really want to get into homebrew. Its home page is at http://flactag.sourceforge.net/ . If/when this formula gets accepted, I'll be submitting one for flactag.

unac is a very stable project (it hasn't been modified for quite a while), and at this point it seems to be maintained by Debian. This formula pulls the latest source and patches from Debian. The local patches are needed to get things to build properly on Mac OS.

This also exists in macports.

smgoller added some commits
@smgoller smgoller Add formula for unac.
unac is a C library and command that removes accents from a string. For instance, the string été will become ete. It provides a command-line interface (the unaccent command) that removes accents from an input stream or from a string given as an argument.

This package is what I would consider to be extremely stable. Even though the project was somewhat abandoned in the 2002-2004 timeframe, Debian has continued to ensure the package works. The formula gets the code and patches from them, then makes local patches in order to compile properly under Mac OS.
0a0cad3
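For readers unfamiliar with what "removing accents" means in practice, here is a rough Ruby sketch of the same idea. It does not use unac at all: it assumes Ruby 2.2 or later and swaps in Unicode NFD decomposition via `String#unicode_normalize`, then strips the combining marks. The `strip_accents` helper is hypothetical.

```ruby
# Hypothetical helper: approximates unac's behaviour with Unicode
# normalization (Ruby >= 2.2). This is not how unac itself works.
def strip_accents(str)
  # NFD splits "é" into "e" followed by a combining acute accent (U+0301);
  # \p{Mn} matches those nonspacing combining marks so gsub can drop them.
  str.unicode_normalize(:nfd).gsub(/\p{Mn}/, "")
end

puts strip_accents("été") # => ete
puts strip_accents("fóó") # => foo
```

Characters with no canonical decomposition, such as "ø" or "ß", pass through this sketch unchanged, so its results may differ from unac's for such input.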
@smgoller smgoller Add homepage for project. c6ab842
@adamv

autoconf and automake should use :symbol syntax too I think?

@adamv

indentation

@adamv

Does this...break ruby?

Does it? I got the test from mistym, who knows way more about ruby than I do. I didn't write it.

Ruby 1.8 doesn't care because all chars are just bytes; it has no concept of encoding. Ruby 1.9 would throw a fit though.

Add this line as the very first line of the file and it works on Ruby 1.9 too:

# -*- coding: utf-8 -*-
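The 1.8 vs 1.9 difference comes down to whether a string is treated as bytes or as characters. A minimal sketch, assuming Ruby 1.9 or later (where `String#encoding`, `bytesize` and `force_encoding` exist); the variable names are arbitrary:

```ruby
# -*- coding: utf-8 -*-
# On Ruby >= 1.9 a string literal is tagged with the source file's encoding,
# which the magic comment above sets to UTF-8 (the default since Ruby 2.0).
s = "fóó"

puts s.encoding # => UTF-8
puts s.length   # => 3 (characters)
puts s.bytesize # => 5 ("ó" takes two bytes in UTF-8)

# Reinterpreting the same bytes as binary approximates the Ruby 1.8 view,
# where each byte counts as one "character".
raw = s.dup.force_encoding(Encoding::BINARY)
puts raw.length # => 5
```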

@adamv

Please document the DATA patch.

@smgoller smgoller commented on the diff
Library/Formula/unac.rb
@@ -0,0 +1,53 @@
+# -*- coding: utf-8 -*-
@smgoller
smgoller added a note

Emacs added the coding line. Should I remove it, even though it indicates there's UTF-8 in this formula?

@mistydemeo Owner

No, definitely keep it.

@jacknagel Owner

Would it make more sense to try and handle this globally, i.e. by adding -KU to our shebang lines and the places where we invoke the interpreter directly?

@mistydemeo Owner

I thought so, but I can't seem to find the right magic invocation. According to the manpage, -KU is for kanji, and in practice I was still getting encoding errors:

invalid multibyte escape: /^\037\213/ (SyntaxError)
invalid multibyte escape: /^\037\235/
invalid multibyte escape: /^\xFD7zXZ\x00/

Which is weird. Similar story with -Eutf-8:utf-8.

@jacknagel Owner

Well, those sequences aren't utf-8 (the last one even has an embedded null!). I suppose we have to compile them with /n to disable multibyte interpretation under 1.9.

@mistydemeo Owner

Hm, but if they're not valid utf-8 why does it work with the utf-8 magic comment?

@jacknagel Owner

Because we are talking about two different encodings here. The first is the source encoding, i.e. the encoding of the actual bytes that make up the file. This is what the magic comment and command-line switch address.

The second is the encoding used by the Regexp engine when compiling /^\037\213/, etc., which under 1.9 is utf-8. The warnings you see are generated after the source file is loaded.

IOW, the sequence "/^\037\213/" is perfectly valid utf-8 (obviously it's also valid ASCII). The file can thus be read as utf-8. But when those escapes are interpreted later, they produce the bytes 0x1F 0x8B, which do not form a valid utf-8 sequence, and under 1.9 the regexp engine is not encoding agnostic and needs to be told to compile them without an encoding, e.g.

[irb(main)]$ /^\037\213/
SyntaxError: (irb):4: invalid multibyte escape: /^\037\213/
    from /usr/local/opt/ruby/bin/irb:12:in `<main>'
[irb(main)]$ /^\037\213/n
 ====> /^\037\213/n
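To make the byte-level point concrete: the three patterns in those errors are the leading "magic" bytes of gzip (1F 8B), Unix compress (1F 9D) and xz (FD 37 7A 58 5A 00) files. The sketch below, assuming Ruby 2.0 or later for `String#b`, compiles them with /n and matches them against binary data. The `compression_type` helper and `MAGIC_NUMBERS` table are hypothetical illustrations, not Homebrew code.

```ruby
# Hypothetical helper, not Homebrew's actual code. The /n flag compiles each
# pattern as ASCII-8BIT (binary), so escapes such as \213 (0x8B) are taken as
# raw bytes instead of being rejected as an "invalid multibyte escape".
MAGIC_NUMBERS = {
  gzip:     /\A\037\213/n,    # 1F 8B
  compress: /\A\037\235/n,    # 1F 9D
  xz:       /\A\xFD7zXZ\x00/n # FD 37 7A 58 5A 00
}.freeze

# Returns :gzip, :compress, :xz, or nil based on the first bytes of the data.
def compression_type(data)
  bytes = data.b # binary copy, so matching never hits an encoding mismatch
  MAGIC_NUMBERS.each { |type, pattern| return type if bytes =~ pattern }
  nil
end

p compression_type("\x1F\x8B\x08\x00") # => :gzip
p compression_type("\xFD7zXZ\x00\x00") # => :xz
p compression_type("plain text")       # => nil
```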
@mistydemeo
Owner

It doesn't work with the system automake on Xcode 3.2.6, but can use the system versions of autoconf and libtool.

@adamv
Owner

Should note the autotools version requirements.

@smgoller

To be honest, I'm not sure what the autotools version requirements are. I'm running Mountain Lion with the latest Xcode.

@mistydemeo
Owner

@smgoller Just needs a note about why any hard dependencies are required - in this case automake requires some version newer than the one which used to ship with Xcode.
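The constraint in the formula's comment ("requires a newer automake than 1.10") can be expressed as a simple version comparison. A hedged sketch: the `automake_new_enough?` helper is hypothetical, assumes the first line of `automake --version` output ends in a version number (e.g. `automake (GNU automake) 1.10`), and uses RubyGems' `Gem::Version` for the comparison. It is not how Homebrew's `depends_on :automake` decides anything.

```ruby
require "rubygems" # Gem::Version; already loaded by default on Ruby >= 1.9

# Hypothetical helper: true if the version at the end of the first line of
# `automake --version` output is newer than +oldest_rejected+ (1.10 here).
def automake_new_enough?(version_output, oldest_rejected = "1.10")
  first_line = version_output.lines.first.to_s
  match = first_line.match(/(\d+(?:\.\d+)+)\s*\z/)
  return false unless match
  Gem::Version.new(match[1]) > Gem::Version.new(oldest_rejected)
end

p automake_new_enough?("automake (GNU automake) 1.10\n")   # => false
p automake_new_enough?("automake (GNU automake) 1.12.4\n") # => true
```

In practice the argument would be the captured output of `` `automake --version` ``; `Gem::Version` compares segments numerically, so 1.12 correctly sorts after 1.10.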

@adamv adamv closed this pull request from a commit
@smgoller smgoller unac 1.8.0
unac is a C library and command that removes accents from a string.

Closes #15306.

Signed-off-by: Adam Vandenberg <flangy@gmail.com>
da8f135
@adamv adamv closed this in da8f135
@adamv
Owner

Sorry for the delay in pulling this; thanks for the submission.

@smgoller
@yourabi yourabi referenced this pull request from a commit
@smgoller smgoller unac 1.8.0
unac is a C library and command that removes accents from a string.

Closes #15306.

Signed-off-by: Adam Vandenberg <flangy@gmail.com>
2f4526c
@dholm dholm referenced this pull request from a commit in dholm/homebrew
@smgoller smgoller unac 1.8.0
unac is a C library and command that removes accents from a string.

Closes #15306.

Signed-off-by: Adam Vandenberg <flangy@gmail.com>
9c7364b
Commits on Oct 4, 2012
  1. @smgoller

    Add formula for unac.

    smgoller authored
    unac is a C library and command that removes accents from a string. For instance, the string été will become ete. It provides a command-line interface (the unaccent command) that removes accents from an input stream or from a string given as an argument.
    
    This package is what I would consider to be extremely stable. Even though the project was somewhat abandoned in the 2002-2004 timeframe, Debian has continued to ensure the package works. The formula gets the code and patches from them, then makes local patches in order to compile properly under Mac OS.
  2. @smgoller

    Add homepage for project.

    smgoller authored
  3. @smgoller
Commits on Oct 9, 2012
  1. @smgoller
Showing with 54 additions and 0 deletions.
  1. +54 −0 Library/Formula/unac.rb
54 Library/Formula/unac.rb
@@ -0,0 +1,54 @@
+# -*- coding: utf-8 -*-
+require 'formula'
+
+class Unac < Formula
+  homepage 'http://savannah.nongnu.org/projects/unac'
+  url 'http://ftp.de.debian.org/debian/pool/main/u/unac/unac_1.8.0.orig.tar.gz'
+  sha1 '3e779bb7f3b505880ac4f43b48ee2f935ef8aa36'
+
+  depends_on 'gettext' => :build
+  depends_on :autoconf => :build
+  # requires a newer automake than 1.10
+  depends_on :automake => :build
+  depends_on :libtool => :build
+
+  def patches
+    {
+      :p0 => ["http://bugs.debian.org/cgi-bin/bugreport.cgi?msg=5;filename=patch-libunac1.txt;att=1;bug=623340",
+              "http://bugs.debian.org/cgi-bin/bugreport.cgi?msg=10;filename=patch-unaccent.c.txt;att=1;bug=623340"],
+      :p1 => ["http://ftp.de.debian.org/debian/pool/main/u/unac/unac_1.8.0-6.diff.gz",
+              DATA]
+    }
+  end
+
+  def install
+    system "chmod", "+x", "./configure"
+    touch "config.rpath"
+    inreplace "autogen.sh", "libtool", "glibtool"
+    system "./autogen.sh"
+    system "./configure", "--disable-debug", "--disable-dependency-tracking",
+                          "--prefix=#{prefix}"
+    system "make install"
+  end
+
+  def test
+    `#{bin}/unaccent utf-8 fóó`.chomp == 'foo'
+  end
+end
+
+#
+# configure.ac doesn't properly detect Mac OS's iconv library. This patch fixes that.
+#
+__END__
+diff --git a/configure.ac b/configure.ac
+index 4a4eab6..9f25d50 100644
+--- a/configure.ac
++++ b/configure.ac
+@@ -49,6 +49,7 @@ AM_MAINTAINER_MODE
+
+ AM_ICONV
+
++LIBS="$LIBS -liconv"
+ AC_CHECK_FUNCS(iconv_open,,AC_MSG_ERROR([
+ iconv_open not found try to install replacement from
+ http://www.gnu.org/software/libiconv/
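The install step uses Homebrew's `inreplace` to rewrite `autogen.sh` so that it calls `glibtool`, the `g`-prefixed name under which Homebrew installs GNU libtool on OS X (where `/usr/bin/libtool` is Apple's unrelated tool). Because the replacement is a plain substring match, any `libtoolize` also becomes `glibtoolize`. Below is a rough sketch of that kind of in-place substitution, assuming Ruby 2.1 or later for `Tempfile.create` with a block; `simple_inreplace` is a hypothetical stand-in, not Homebrew's implementation.

```ruby
require "tempfile"

# Hypothetical, simplified stand-in for Homebrew's inreplace (not its real
# implementation): replace every occurrence of +before+ with +after+ in the
# file at +path+, and raise if nothing matched so a stale pattern is noticed.
def simple_inreplace(path, before, after)
  contents = File.read(path)
  updated = contents.gsub(before, after)
  raise "#{path}: no occurrences of #{before.inspect}" if updated == contents
  File.write(path, updated)
end

# Demo on a throwaway file standing in for autogen.sh; Tempfile.create
# returns the block's value and deletes the file afterwards.
result = Tempfile.create("autogen") do |f|
  f.write("libtoolize --force\nlibtool --version\n")
  f.flush
  simple_inreplace(f.path, "libtool", "glibtool")
  File.read(f.path)
end

puts result
```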