Added support for per-country requests, bumped version
Brian Muller committed Jul 25, 2012
1 parent 9f766ac commit 530127f
Showing 4 changed files with 62 additions and 19 deletions.
2 changes: 1 addition & 1 deletion Gemfile.lock
@@ -1,7 +1,7 @@
PATH
remote: .
specs:
-sexmachine (0.0.2)
+sexmachine (0.0.4)

GEM
remote: http://rubygems.org/
11 changes: 10 additions & 1 deletion README.rdoc
@@ -8,13 +8,22 @@ This gem uses the underlying data from the program "gender" by Jorg Michael (des
>> d.get_gender("Sally")
:female
>> d.get_gender("Pauley") # should be androgynous
:andy
:andy

The result will be one of andy (androgynous), male, female, mostly_male, or mostly_female. Any unknown names are considered andies.

I18N is fully supported:

>> d.get_gender("Álfrún")
:female

Additionally, you can give preference to specific countries:

>> d.get_gender("Jamie")
=> :female
>> d.get_gender("Jamie", :great_britain)
=> :mostly_male

If you have an alternative data file, you can pass that in as an optional argument to the Detector.

Try to avoid creating many Detectors, as each creation means reading in the data file.
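
One way to follow the advice above is to build a single detector and reuse it. The sketch below is only an illustration: `FakeDetector` and the `Genders` module are hypothetical stand-ins (not part of the gem) so that the example runs without the data file; in real use the memoized object would presumably be a `SexMachine::Detector`.

```ruby
# Hypothetical stand-in for SexMachine::Detector so the sketch runs without
# the gem or its data file. It only counts how often it is constructed.
class FakeDetector
  @loads = 0
  class << self
    attr_accessor :loads
  end

  def initialize
    # The real class would read and parse the data file at this point.
    self.class.loads += 1
  end

  def get_gender(name, country = nil)
    :andy
  end
end

# Hypothetical helper: create the detector on first use, reuse it afterwards.
module Genders
  def self.detector
    @detector ||= FakeDetector.new
  end
end

3.times { Genders.detector.get_gender("Sally") }
load_count = FakeDetector.loads
```

With this pattern the constructor runs once, however many lookups are made through `Genders.detector`.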
66 changes: 50 additions & 16 deletions lib/sexmachine/detector.rb
@@ -1,6 +1,12 @@
module SexMachine

class Detector
COUNTRIES = [ :great_britain, :ireland, :usa, :italy, :malta, :portugal, :spain, :france, :belgium, :luxembourg, :the_netherlands, :east_frisia,
:germany, :austria, :swiss, :iceland, :denmark, :norway, :sweden, :finland, :estonia, :latvia, :lithuania, :poland, :czech_republic,
:slovakia, :hungary, :romania, :bulgaria, :bosniaand, :croatia, :kosovo, :macedonia, :montenegro, :serbia, :slovenia, :albania,
:greece, :russia, :belarus, :moldova, :ukraine, :armenia, :azerbaijan, :georgia, :the_stans, :turkey, :arabia, :israel, :china,
:india, :japan, :korea, :vietnam, :other_countries ]

def initialize(fname=nil)
fname ||= File.expand_path('../data/nam_dict.txt', __FILE__)
parse fname
@@ -10,40 +16,68 @@ def parse(fname)
@names = {}
open(fname, "r:iso8859-1:utf-8") { |f|
f.each_line { |line|
eatNameLine line
eat_name_line line
}
}
end

def get_gender(name)
@names.fetch(name, :andy)
def get_gender(name, country = nil)
if not @names.has_key?(name)
:andy
elsif country.nil?
most_popular_gender(name) { |country_values|
country_values.split("").select { |l| l.strip != "" }.length
}
elsif COUNTRIES.include?(country)
index = COUNTRIES.index(country)
most_popular_gender(name) { |country_values|
country_values[index].ord
}
else
raise "No such country: #{country}"
end
end

private
def eatNameLine(line)
def eat_name_line(line)
return if line.start_with?("#") or line.start_with?("=")

parts = line.split(" ").select { |p| p.strip != "" }

if parts[0].include? "F"
set parts[1], :female
elsif parts[0].include? "M"
set parts[1], :male
else
set parts[1], :andy
country_values = line.slice(30, line.length)

case parts[0]
when "M" then set(parts[1], :male, country_values)
when "1M", "?M" then set(parts[1], :mostly_male, country_values)
when "F" then set(parts[1], :female, country_values)
when "1F", "?F" then set(parts[1], :mostly_female, country_values)
when "?" then set(parts[1], :andy, country_values)
else raise "Not sure what to do with a sex of #{parts[0]}"
end
end

def set(name, gender)
# go w/ first option, don't reset
return if @names.has_key? name
def most_popular_gender(name)
return :andy unless @names.has_key?(name)

max = 0
best = @names[name].keys.first
@names[name].each { |gender, country_values|
count = yield country_values
if count > max
max = count
best = gender
end
}
best
end

def set(name, gender, country_values)
if name.include? "+"
[ '', '-', ' ' ].each { |replacement|
set name.gsub("+", replacement), gender
set name.gsub("+", replacement), gender, country_values
}
else
@names[name] = gender
@names[name] ||= {}
@names[name][gender] = country_values
end
end
end
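
The selection logic in this diff can be sketched on its own, outside the class. This is a minimal illustration, not the gem's code: `COLUMNS`, `ENTRIES`, and `pick_gender` are hypothetical names, and the frequency strings are made up. It assumes, as the diff does, that each gender recorded for a name carries a string with one character per country column.

```ruby
# Hypothetical, shortened column list. The real COUNTRIES constant is longer.
COLUMNS = [:great_britain, :ireland, :usa].freeze

# Stand-in for @names[name]: gender => per-country frequency string.
# The values are invented for illustration.
ENTRIES = {
  female: "1 6",
  mostly_male: "5  "
}.freeze

# Same shape as most_popular_gender in the diff: score every gender with the
# block and keep the highest scorer. On a tie the earlier entry wins.
def pick_gender(entries)
  best = entries.keys.first
  max = 0
  entries.each do |gender, country_values|
    score = yield country_values
    if score > max
      max = score
      best = gender
    end
  end
  best
end

# No country given: score by the number of non-blank columns.
overall = pick_gender(ENTRIES) { |values| values.delete(" ").length }

# Country given: score by the character code in that country's column.
index = COLUMNS.index(:great_britain)
british = pick_gender(ENTRIES) { |values| values[index].ord }
```

Here `overall` comes out as `:female` (two populated columns against one) and `british` as `:mostly_male` (`"5"` outranks `"1"` in the first column). Comparing with `ord` means a blank column scores 32, which is lower than any digit or letter but still above the initial `max` of 0.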
2 changes: 1 addition & 1 deletion lib/sexmachine/version.rb
@@ -1,3 +1,3 @@
module SexMachine
-VERSION = "0.0.3"
+VERSION = "0.0.4"
end
