Replace check method to see if a string is numeric
The old code used the very correct Float(word) method to see if a string was numeric. This works reliably with all sorts of edge-case data, but it is very slow. Since we have already parsed out a lot of possibilities during word atomisation (e.g. decimal numbers like 123.45 have already been split into "123" and "45"), we do not need this level of edge-case surety. Therefore we can just do a simple regex check to see whether the string is all numerals. In tests on 1000 emails (single-threaded) the run time was reduced from 2.4 seconds to 1.4 seconds. Since we have traded edge-case reliability for speed, we can no longer leave this as a String class monkey-patch, so it moves into a method that will only be called by Ankusa itself.
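The trade-off described in the commit message can be sketched roughly as follows. This is an illustration only: the method names `numeric_via_float?` and `numeric_via_regex?` are hypothetical and are not the names used in lib/ankusa/hasher.rb, and the exact regex is an assumption about what "all numerals" means.

```ruby
# Hypothetical helpers for illustration; not the code from this commit.

# Old style: let Kernel#Float decide. It accepts forms such as "1e5" or
# "-3.2", but every non-numeric word costs a raised and rescued exception.
def numeric_via_float?(word)
  Float(word)
  true
rescue ArgumentError, TypeError
  false
end

# New style (assumed pattern): numeric means "ASCII digits only".
# This is cheaper, but it rejects signs, exponents and decimal points,
# which is acceptable only because atomisation has already split those.
def numeric_via_regex?(word)
  !!(word =~ /\A[0-9]+\z/)
end

%w[21675 00000 robot14 1e5 mother].each do |word|
  puts "#{word}: float=#{numeric_via_float?(word)} regex=#{numeric_via_regex?(word)}"
end
```

The two checks disagree on inputs like "1e5", which is why the looser check would be unsafe as a general-purpose String extension.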
Showing 3 changed files with 27 additions and 15 deletions.
- +0 −4 lib/ankusa/extensions.rb
- +12 −3 lib/ankusa/hasher.rb
- +15 −8 test/hasher_test.rb
@@ -1,25 +1,32 @@
 require File.join File.dirname(__FILE__), 'helper'
 
 class HasherTest < Test::Unit::TestCase
-  def setup
+
+  def test_stemming
     string = "Words word a the at fish fishing fishes? /^/ The at a of! @#$!"
     @text_hash = Ankusa::TextHash.new string
     @array = Ankusa::TextHash.new [string]
-  end
 
-  def test_stemming
     assert_equal @text_hash.length, 2
     assert_equal @text_hash.word_count, 5
 
     assert_equal @array.length, 2
     assert_equal @array.word_count, 5
   end
 
+  def test_atomization
+    string = "Hello 123,45 My-name! is Robot14 123.45 @#$!"
+    @array = Ankusa::TextHash.atomize string
+
+    assert_equal %w{hello 123 45 my name is robot14 123 45}, @array
+  end
+
   def test_valid_word
-    assert (not Ankusa::TextHash.valid_word? "accordingly")
-    assert (not Ankusa::TextHash.valid_word? "appropriate")
-    assert Ankusa::TextHash.valid_word? "^*&@"
-    assert Ankusa::TextHash.valid_word? "mother"
-    assert (not Ankusa::TextHash.valid_word? "21675")
+    assert !Ankusa::TextHash.valid_word?("accordingly")
+    assert !Ankusa::TextHash.valid_word?("appropriate")
+    assert Ankusa::TextHash.valid_word?("^*&@")
+    assert Ankusa::TextHash.valid_word?("mother")
+    assert !Ankusa::TextHash.valid_word?("21675")
+    assert !Ankusa::TextHash.valid_word?("00000")
   end
 end