Tool that tries to guess a person's gender based on their name and location
This branch is 11 commits ahead, 5 commits behind tue-mdse:master.
0717-182
nameLists
python-nameparser @ 9f11f41
unidecode @ 9b6eb6f
.gitignore
.gitmodules
dictUtils.py
env.env
filters.py
genderComputer.py
genderc_python.py
nameUtils.py
readme.md
test.py
testSuites.py
unicodeMagic.py


Gender computer
===

Python tool that tries to infer a person's gender from their name (mostly the first name) and location (country). For example, Andrea is a first name typically used by men in Italy and by women in Germany, while Bogdan is typically used by men irrespective of the country. Similarly, a Russian person called Anna Akhmatova is very likely a woman because of the feminine surname suffix -ova.
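To make the idea concrete, here is a minimal sketch of a country-dependent first-name lookup with a surname-suffix fallback. It is not the tool's implementation: the tables NAMES_BY_COUNTRY, NAMES_ANY_COUNTRY and RUSSIAN_SURNAME_SUFFIXES and the function guess_gender are hypothetical, and their entries are toy data taken only from the examples in the paragraph above.

```python
# Hypothetical toy tables for illustration only; the real tool loads its
# data from the files in nameLists/.
NAMES_BY_COUNTRY = {
    "Italy": {"andrea": "male"},
    "Germany": {"andrea": "female"},
}
# Names assumed to be used by one gender regardless of country.
NAMES_ANY_COUNTRY = {"bogdan": "male"}
# Simplified Russian surname endings (an assumption, not the tool's rule set).
RUSSIAN_SURNAME_SUFFIXES = {"ova": "female", "eva": "female",
                            "ov": "male", "ev": "male"}


def guess_gender(first_name, last_name, country):
    """Return 'male', 'female', or None if nothing matches."""
    first = first_name.lower()
    # 1. Country-specific first-name lookup.
    by_country = NAMES_BY_COUNTRY.get(country, {})
    if first in by_country:
        return by_country[first]
    # 2. Country-independent first-name lookup.
    if first in NAMES_ANY_COUNTRY:
        return NAMES_ANY_COUNTRY[first]
    # 3. Surname-suffix heuristic for Russian names; longer suffixes are
    #    checked first so that "-ova" wins over "-ov".
    if country == "Russia" and last_name:
        last = last_name.lower()
        for suffix in sorted(RUSSIAN_SURNAME_SUFFIXES, key=len, reverse=True):
            if last.endswith(suffix):
                return RUSSIAN_SURNAME_SUFFIXES[suffix]
    return None


print(guess_gender("Andrea", "Rossi", "Italy"))      # male
print(guess_gender("Andrea", "Schmidt", "Germany"))  # female
print(guess_gender("Anna", "Akhmatova", "Russia"))   # female
```

Checking the country-specific table before the country-independent one is what lets Andrea resolve differently in Italy and Germany, while Bogdan resolves the same way everywhere.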

Quick example

git clone git@github.com:joshterrell805-historic/genderComputer.git
cd genderComputer
git submodule init
git submodule update
cd python-nameparser
python2 setup.py build
cd ../unidecode
python2 setup.py build
cd ..
source env.env
# general form:
python2 test.py <username1> <username2> ... <usernameN>
# example:
python2 test.py josh ruth marie mike sam kate betty robin

Data provenance

The tool uses lists of male and female first names for different countries. Whenever available, the data came from national statistics institutes and was accompanied by frequency information. See this list for details about the source of data for each country.
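When frequency information is available, a name's gender can be decided by how strongly one sex dominates the counts. The sketch below is a hypothetical illustration of that idea, not the rule genderComputer.py applies: the function gender_from_counts and its 0.9 threshold are assumptions, and it returns None for ambiguous or unseen names to match the outputs shown under Usage. The counts in the example calls are made up.

```python
def gender_from_counts(male_count, female_count, min_ratio=0.9):
    """Pick a gender from how often a name was registered for each sex.

    Returns 'male' or 'female' when one sex accounts for at least
    `min_ratio` of all occurrences, and None when the name is ambiguous
    or never occurs. The default threshold is an arbitrary illustration.
    """
    total = male_count + female_count
    if total == 0:
        return None
    # float() keeps the division exact under Python 2 as well.
    if male_count / float(total) >= min_ratio:
        return "male"
    if female_count / float(total) >= min_ratio:
        return "female"
    return None


print(gender_from_counts(950, 50))   # male
print(gender_from_counts(600, 400))  # None
```

Raising min_ratio makes the sketch more conservative: fewer names get a gender, but those that do are less likely to be wrong.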

The tool also uses the worldwide first-name database distributed with gender.c, an open-source C program for name-based gender inference (http://www.heise.de/ct/ftp/07/17/182/). The genderc_python.py script converts this database (the nam_dict.txt file shipped in the gender.c archive) into a Python dictionary.
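genderc_python.py performs the actual conversion. As a rough, hypothetical illustration of what such a conversion involves, the sketch below assumes that each data line of nam_dict.txt begins with a gender code followed by the name, separated by whitespace, and that comment lines start with '#'; check these assumptions against the file itself. The function parse_name_dict is invented for this example, keeps only the codes M and F, and ignores any columns after the name.

```python
# Codes handled by this sketch; lines with any other code are skipped.
CODE_TO_GENDER = {"M": "male", "F": "female"}


def parse_name_dict(lines):
    """Build {lowercased name: gender} from nam_dict.txt-style lines.

    Assumes '<code> <name> ...' per data line and '#' comment lines.
    A name that appears on several lines keeps its last entry.
    """
    result = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 2:
            continue  # not enough fields for code + name
        code, name = fields[0], fields[1]
        if code in CODE_TO_GENDER:
            result[name.lower()] = CODE_TO_GENDER[code]
    return result


sample = ["# comment", "M  Bogdan", "F  Anna", ""]
print(parse_name_dict(sample) == {"bogdan": "male", "anna": "female"})  # True
```

Because a repeated name keeps only its last entry, this flat dictionary loses exactly the country-dependent information (such as Andrea in Italy versus Germany) that a real conversion would need to preserve.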

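Dependencies

The tool depends on python-nameparser and unidecode, which are included as git submodules and built as shown in the Quick example. The commands and examples in this README use Python 2.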

Usage

To use the tool, create a GenderComputer object and call its resolveGender method with a name and a country (or None when the country is unknown):

import os
from genderComputer import GenderComputer
gc = GenderComputer(os.path.abspath('./nameLists'))

print gc.resolveGender('Alexei Matrosov', 'Russia')
> male

print gc.resolveGender('Matrosov Alexei', 'Russia')
> male

print gc.resolveGender('Bogdan', None)
> male

print gc.resolveGender('w35l3y', 'Brazil')
> male

print gc.resolveGender('Ashley Maher', 'Australia')
> female
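The examples above resolve both "Alexei Matrosov" and "Matrosov Alexei", and even the leetspeak handle "w35l3y". We have not checked how genderComputer.py achieves this; the following is only a hypothetical sketch of one way to do it: map digits to look-alike letters, then try each token as a first name. The names LEET_MAP, FIRST_NAMES, normalize and resolve_any_order are invented for this example, and FIRST_NAMES is toy data.

```python
# Hypothetical digit-to-letter map for leetspeak user names.
LEET_MAP = {"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "7": "t"}

# Toy first-name table for illustration only.
FIRST_NAMES = {"alexei": "male", "wesley": "male", "ashley": "female"}


def normalize(token):
    """Lowercase a token and replace look-alike digits with letters."""
    return "".join(LEET_MAP.get(ch, ch) for ch in token.lower())


def resolve_any_order(full_name, first_names=FIRST_NAMES):
    """Try every whitespace-separated token as a first name, left to right."""
    for token in full_name.split():
        gender = first_names.get(normalize(token))
        if gender is not None:
            return gender
    return None


print(resolve_any_order("Matrosov Alexei"))  # male
print(resolve_any_order("w35l3y"))           # male
```

Trying tokens left to right means that a surname which is also a known first name wins over the real first name that follows it; a real implementation would need a tie-breaking rule for that case.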

The tool works well for clean names, but noisy input such as concatenated user names or short handles may produce unexpected results:

print gc.resolveGender('jasondavis', 'USA')
> None

print gc.resolveGender('aix', None)
> female
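Because resolveGender can return None, code that processes many names should handle that case explicitly. A minimal sketch, assuming only the resolveGender(name, country) call shown above: resolve_all and StubComputer are hypothetical names, and the stub stands in for GenderComputer so the example runs without the name lists.

```python
def resolve_all(gc, people, unknown="unknown"):
    """Resolve (name, country) pairs, replacing a None result with `unknown`.

    `gc` can be any object with a resolveGender(name, country) method,
    such as a GenderComputer instance.
    """
    results = []
    for name, country in people:
        gender = gc.resolveGender(name, country)
        results.append((name, gender if gender is not None else unknown))
    return results


class StubComputer(object):
    """Stand-in for GenderComputer so the sketch runs without data files."""

    def resolveGender(self, name, country):
        return {"Bogdan": "male"}.get(name)


print(resolve_all(StubComputer(), [("Bogdan", None), ("jasondavis", "USA")]))
# [('Bogdan', 'male'), ('jasondavis', 'unknown')]
```

This only covers the unresolved case: as the aix example shows, a non-None result can still be wrong, so noisy inputs may need filtering before they reach resolveGender.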

Reporting bugs

Please use the Issue Tracker for reporting bugs and feature requests.

Patches, bug fixes, etc. are welcome. Please fork the repository and open a pull request once your fix or feature is ready.

Licenses
