This repository has been archived by the owner on Oct 20, 2019. It is now read-only.

merged pdg/master into develop
Dominik Liebler committed Feb 2, 2013
2 parents 8b13a08 + 9c974ac commit c505818
Showing 3 changed files with 26 additions and 3 deletions.
1 change: 1 addition & 0 deletions History.txt
@@ -1,6 +1,7 @@
 == 0.6.0 / 2013-02-02

 * added per-language support for black- and whitelists (thanks to bobjflong)
+* fixed tokenization for UTF-8 strings (this is broken in Ruby 1.8.x!) (thanks to pdg)

 == 0.5.2 / 2012-02-25

6 changes: 5 additions & 1 deletion lib/highscore/content.rb
@@ -33,9 +33,13 @@ def initialize(content, wordlist = nil)
       :consonants => 0,
       :ignore_short_words => true,
       :ignore_case => false,
-      :word_pattern => /\w+/,
+      :word_pattern => /\p{Word}+/u,
       :stemming => false
     }
+
+    if RUBY_VERSION =~ /^1\.8/
+      @emphasis[:word_pattern] = /\w+/
+    end
   end

   # configure ranking
22 changes: 20 additions & 2 deletions test/highscore/test_content.rb
@@ -1,4 +1,8 @@
-require File.dirname(__FILE__) + '/../test_highscore'
+# encoding: utf-8
+$:.unshift(File.join(File.dirname(__FILE__), %w{.. .. lib highscore}))
+require "content"
+require "test/unit"
+require 'rubygems'

 class TestContent < Highscore::TestCase
   def setup
@@ -29,6 +33,19 @@ def test_keywords_fixnum
     assert_equal 1, content.keywords.length
   end

+  def test_keywords_utf8
+    content = 'Schöne Grüße, caractères, русский'
+
+    content = Highscore::Content.new content
+
+    if RUBY_VERSION =~ /^1\.8/
+      # Ruby 1.8 doesn't support correct tokenization
+      assert_equal 3, content.keywords.length
+    else
+      assert_equal 4, content.keywords.length
+    end
+  end
+
   def test_vowels_and_consonants
     keywords = 'foobar RubyGems'.keywords do
       set :vowels, 2
@@ -109,4 +126,5 @@ def test_language_english
   def test_language_german
     assert_equal :german, Highscore::Content.new("Das ist sicherlich ein deutscher Text!").language
   end
-end
+end
+
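
The effect of the word_pattern change in lib/highscore/content.rb can be illustrated with a minimal sketch, assuming Ruby 1.9 or later. The helper names `word_pattern_for` and `tokens_for` are hypothetical and are not part of the highscore gem; they only mirror the version check and the two patterns that appear in the diff.

```ruby
# encoding: utf-8

# Hypothetical helper (not part of highscore): choose the word pattern
# the same way the commit does, based on a Ruby version string.
def word_pattern_for(ruby_version)
  if ruby_version =~ /^1\.8/
    /\w+/          # fallback kept for Ruby 1.8.x
  else
    /\p{Word}+/u   # Unicode-aware word characters on Ruby 1.9+
  end
end

# Hypothetical helper: split a text into tokens using a given pattern.
def tokens_for(text, pattern)
  text.scan(pattern)
end

text = 'Schöne Grüße, caractères, русский'

# On Ruby 1.9+, \w only matches ASCII letters, digits and underscore,
# so words are cut at every non-ASCII letter.
ascii_tokens = tokens_for(text, /\w+/)

# \p{Word} also matches non-ASCII letters, so each word stays whole.
unicode_tokens = tokens_for(text, word_pattern_for(RUBY_VERSION))

puts ascii_tokens.inspect
puts unicode_tokens.inspect
```

With the Unicode-aware pattern the sample string yields four tokens, which is the count the new test_keywords_utf8 expects on Ruby 1.9 and later.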
