diff --git a/Gemfile.default b/Gemfile.default
index 2cc79b7..e45e65f 100644
--- a/Gemfile.default
+++ b/Gemfile.default
@@ -1,3 +1,2 @@
 source :rubygems
-gem "rbench"
 gemspec
diff --git a/README.md b/README.md
index d3c584c..be1f564 100644
--- a/README.md
+++ b/README.md
@@ -1,8 +1,13 @@
 # Babosa
 
-Babosa is a library for creating slugs. It is an extraction and improvement of
-the string code from [FriendlyId](http://github.com/norman/friendly_id),
-intended to help developers create similar libraries and plugins.
+Babosa is a library for creating human-friendly identifiers. Its primary
+intended purpose is for creating URL slugs, but it can also be useful for
+normalizing and sanitizing data.
+
+It is an extraction and improvement of the string code from
+[FriendlyId](http://github.com/norman/friendly_id). I have released this as a
+separate library to help developers who want to create libraries similar to
+FriendlyId.
 
 ## Features / Usage
 
@@ -15,8 +20,8 @@ intended to help developers create similar libraries and plugins.
     "Jürgen Müller".to_slug.approximate_ascii.to_s #=> "Jurgen Muller"
     "Jürgen Müller".to_slug.approximate_ascii(:german).to_s #=> "Juergen Mueller"
 
-Currently, only German, Spanish and Serbian are supported. I'll gladly accept
-contributions and support more languages.
+Supported languages currently include Danish, German, Serbian and Spanish. I'll
+gladly accept contributions and support more languages.
 
 ### Non-ASCII removal
 
@@ -41,17 +46,47 @@ whose length is limited by bytes rather than UTF-8 characters.
 
     "Gölcük, Turkey".to_slug.normalize.to_s #=> "golcuk-turkey"
 
+### Other stuff
+
+Babosa can also generate strings for Ruby method names. (Yes, Ruby 1.9 can use UTF-8 chars
+in method names, but you may not want to):
+
+
+    "this is a method".to_slug.to_ruby_method! #=> this_is_a_method
+    "über cool stuff!".to_slug.to_ruby_method! #=> uber_cool_stuff!
+
+    # You can also disallow trailing punctuation chars
+    "über cool stuff!".to_slug.to_ruby_method(false) #=> uber_cool_stuff
+
+
+You can add not only transliterations, but expansions for some characters if you want:
+
+    Babosa::Characters.add_approximations(:user, {
+      "0" => "oh",
+      "1" => "one",
+      "2" => "two",
+      "3" => "three",
+      "." => " dot "
+    })
+    "Web 2.0".to_slug.normalize!(:transliterations => :user) #=> "web-two-dot-oh"
 
 ### UTF-8 support
 
 Babosa has no hard dependencies, but if you have either the Unicode or
 ActiveSupport gems installed and required prior to requiring "babosa", these
 will be used to perform upcasing and downcasing on UTF-8 strings. On JRuby 1.5
-and above, Java's native Unicode support will be used.
+and above, Java's native Unicode support will be used instead. Unless you're on
+JRuby, which already has excellent support for Unicode via Java's Standard
+Library, I recommend using the Unicode gem because it's the fastest Ruby
+Unicode library available.
 
 If none of these libraries are available, Babosa falls back to a simple module
-which only supports Latin characters. I recommend using the Unicode gem where
-possible since it's a C extension and is very fast.
+which only supports Latin characters.
+
+This default module is fast and can do very naive Unicode composition to ensure
+that, for example, "é" will always be composed to a single codepoint rather
+than an "e" and a "´" - making it safe to use as a hash key. But seriously -
+save yourself the headache and install a real Unicode library.
 
 
 ### Rails 3
@@ -59,10 +94,11 @@ possible since it's a C extension and is very fast.
 Most of Babosa's functionality is already present in Active Support/Rails 3.
 Babosa exists primarily to support non-Rails applications, and Rails apps prior
 to 3.0. Most of the code here was originally written for FriendlyId. Several
-things, like tidy_bytes and ASCII transliteration, were later added to Rails and I18N.
+things, like `tidy_bytes` and ASCII transliteration, were later added to Rails
+and I18N.
 
 Babosa differs from ActiveSupport primarily in that it supports non-Latin
-strings by default, and has per-locale transliterations already baked-in. If
+strings by default, and has per-locale ASCII transliterations already baked-in. If
 you are considering using Babosa with Rails 3, you should first take a look at
 Active Support's
 [transliterate](http://edgeapi.rubyonrails.org/classes/ActiveSupport/Inflector.html#M000565)
@@ -82,8 +118,8 @@ Babosa can be installed via Rubygems:
 You can get the source code from its [Github
 repository](http://github.com/norman/babosa).
 
-Babosa is tested to be compatible with Ruby 1.8.6-1.9.2, JRuby 1.4-1.5,
-Rubinius 1.0, and is probably compatible with other Rubies as well.
+Babosa is tested to be compatible with Ruby 1.8.6-1.9.2, JRuby 1.4-1.5, and
+Rubinius 1.0.x. It's probably compatible with other Rubies as well.
 
 ## Reporting bugs
 
@@ -100,11 +136,13 @@ Please use Babosa's [Github issue tracker](http://github.com/norman/babosa/issue
 
 ## Contributors
 
+* [Molte Emil Strange Andersen](http://github.com/molte) - Danish support
 * [Milan Dobrota](http://github.com/milandobrota) - Serbian support
 
 
 ## Changelog
 
+* 0.2.0 - Added support for Danish. Added method to generate Ruby identifiers. Improved performance.
 * 0.1.1 - Added support for Serbian.
 * 0.1.0 - Initial extraction from FriendlyId.
 
@@ -112,12 +150,12 @@ Please use Babosa's [Github issue tracker](http://github.com/norman/babosa/issue
 
 Copyright (c) 2010 Norman Clarke
 
-Permission is hereby granted, free of charge, to any person obtaining a copy
-of this software and associated documentation files (the "Software"), to deal
-in the Software without restriction, including without limitation the rights
-to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
-copies of the Software, and to permit persons to whom the Software is
-furnished to do so, subject to the following conditions:
+Permission is hereby granted, free of charge, to any person obtaining a copy of
+this software and associated documentation files (the "Software"), to deal in
+the Software without restriction, including without limitation the rights to
+use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
+of the Software, and to permit persons to whom the Software is furnished to do
+so, subject to the following conditions:
 
 The above copyright notice and this permission notice shall be included in all
 copies or substantial portions of the Software.
diff --git a/Rakefile b/Rakefile
index 8be69cb..b261fc0 100644
--- a/Rakefile
+++ b/Rakefile
@@ -1,3 +1,4 @@
+require "rubygems"
 require "rake/testtask"
 require "rake/clean"
 require "rake/gempackagetask"
diff --git a/lib/babosa/identifier.rb b/lib/babosa/identifier.rb
index 8220ee4..08f88c4 100644
--- a/lib/babosa/identifier.rb
+++ b/lib/babosa/identifier.rb
@@ -137,8 +137,21 @@ def normalize!(options = nil)
     end
 
     # Normalize a string so that it can safely be used as a Ruby method name.
-    def to_ruby_method!
-      normalize!(:to_ascii => true, :separator => "_")
+    def to_ruby_method!(allow_bangs = true)
+      leader, trailer = @wrapped_string.strip.scan(/\A(.+)(.)\z/).flatten
+      if allow_bangs
+        trailer.downcase.gsub!(/[^a-z0-9!=\\\\?]/, '')
+      else
+        trailer.downcase.gsub!(/[^a-z0-9]/, '')
+      end
+      id = leader.to_identifier
+      id.transliterate!
+      id.to_ascii!
+      id.clean!
+      id.word_chars!
+      id.clean!
+      @wrapped_string = id.to_s + trailer
+      with_separators!("_")
     end
 
     # Delete any non-ascii characters.
diff --git a/lib/babosa/utf8/java_proxy.rb b/lib/babosa/utf8/java_proxy.rb
index e3c807d..7b48569 100644
--- a/lib/babosa/utf8/java_proxy.rb
+++ b/lib/babosa/utf8/java_proxy.rb
@@ -6,7 +6,7 @@ module UTF8
     module JavaProxy
       extend UTF8Proxy
       extend self
-      import java.text.Normalizer
+      java_import java.text.Normalizer
 
       def downcase(string)
         string.to_java.to_lower_case.to_s
diff --git a/test/babosa_test.rb b/test/babosa_test.rb
index ef855a1..cba2321 100644
--- a/test/babosa_test.rb
+++ b/test/babosa_test.rb
@@ -181,7 +181,9 @@ class BabosaTest < Test::Unit::TestCase
   end
 
   test "should get a string suitable for use as a ruby method" do
-    ss = "カタカナ: katakana is über cool".to_identifier
-    assert_equal "katakana_is_uber_cool", ss.to_ruby_method!
+    assert_equal "hello_world?", "¿¿¿hello... world???".to_slug.to_ruby_method!
+    assert_equal "katakana_is_uber_cool", "カタカナ: katakana is über cool".to_slug.to_ruby_method!
+    assert_equal "katakana_is_uber_cool!", "カタカナ: katakana is über cool!".to_slug.to_ruby_method!
+    assert_equal "katakana_is_uber_cool", "カタカナ: katakana is über cool".to_slug.to_ruby_method!(false)
   end
 end
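
Review note on the trailer handling in `to_ruby_method!`: `String#downcase` returns a new string, so chaining `gsub!` onto it appears to mutate only that temporary copy and leave `trailer` untouched. If that reading is right, the `allow_bangs = false` branch would not strip a trailing `!`, and none of the new tests would catch it because their `false` case ends in a letter. The sketch below is plain Ruby with no Babosa dependency. `sanitize_trailer` is a hypothetical helper written for this note, not part of the library, and its character class is simplified to `!=?` as an assumption about the intent of the escaped pattern in the patch.

```ruby
# Hypothetical standalone helper (not part of Babosa): it isolates the
# trailer handling so the behaviour can be checked without the gem.
def sanitize_trailer(trailer, allow_bangs = true)
  # Assumed intent: keep "!", "=" and "?" only when allow_bangs is true.
  disallowed = allow_bangs ? /[^a-z0-9!=?]/ : /[^a-z0-9]/
  # downcase returns a new String, so the non-bang gsub is used and its
  # result is returned rather than relying on in-place mutation.
  trailer.downcase.gsub(disallowed, "")
end

# Same split as in the patch: everything but the last character, then the
# last character on its own.
leader, trailer = "über cool stuff!".strip.scan(/\A(.+)(.)\z/).flatten

# The pattern used in the patch: gsub! mutates the copy made by downcase.
before = trailer.dup
trailer.downcase.gsub!(/[^a-z0-9]/, "")
receiver_unchanged = (trailer == before)

kept    = sanitize_trailer(trailer, true)   # trailing bang allowed
dropped = sanitize_trailer(trailer, false)  # trailing bang removed
```

Assigning the sanitized value back (for example `trailer = sanitize_trailer(trailer, allow_bangs)`) before building `@wrapped_string` would be one way to address this, along with a test for `"...cool!"` passed through `to_ruby_method!(false)`.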