# Evaluate `genderize.io` 

In [2]:
from genderize import Genderize
import pandas as pd
from evaluators import GenderizeIoEvaluator

### Can it handle surnames?

In [2]:
results = Genderize().get(['Hans Joachim Schmidt', 'Anna Meier'])

In [3]:
print(results)

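[{'name': 'Hans Joachim Schmidt', 'gender': None}, {'name': 'Anna Meier', 'gender': None}]

Neither full name is resolved (`'gender': None`), so the API does not appear to strip surnames from its input.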


### Double names (where the order matters)

In [4]:
results = Genderize().get(['Hans Joachim', 'Hans-Joachim', 'Maria-José', 'José Maria', 'Jose Maria', 
                           'José-Maria', 'Josémaria', 'theo c. m'])

In [5]:
for r in results:
    print(r)

{'name': 'Hans Joachim', 'gender': None}
{'name': 'Hans-Joachim', 'probability': 1.0, 'gender': 'male', 'count': 1}
{'name': 'Maria-José', 'probability': 1.0, 'gender': 'female', 'count': 2}
{'name': 'José Maria', 'probability': 1.0, 'gender': 'male', 'count': 3}
{'name': 'Jose Maria', 'probability': 0.99, 'gender': 'male', 'count': 125}
{'name': 'José-Maria', 'gender': None}
{'name': 'Josémaria', 'gender': None}
{'name': 'theo c. m', 'gender': None}


The examples show that the API:

* accepts double names
* is sensitive to separator characters such as `-` or a space (cf. `Hans Joachim`, which returns no gender, and `Hans-Joachim`, which returns `male`)
* handles non-ASCII characters (e.g. `é`)
* is sensitive to accents (cf. `José Maria` with a count of 3 and `Jose Maria` with a count of 125)
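
Because separators and accents change the response, one possible workaround is to generate several spellings of a name and query them in turn. The following is a minimal sketch of that idea; `strip_accents` and `name_variants` are hypothetical helpers, not part of the `genderize` package, and the order of the variants is an arbitrary choice.

```python
import unicodedata


def strip_accents(text):
    """Remove combining marks, e.g. 'José' -> 'Jose'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def name_variants(name):
    """Return the spellings to try, in order and without duplicates (hypothetical helper)."""
    candidates = [
        name,
        name.replace(" ", "-"),  # 'Hans Joachim' -> 'Hans-Joachim'
        name.replace("-", " "),  # 'José-Maria'   -> 'José Maria'
    ]
    candidates += [strip_accents(c) for c in candidates]
    # dict.fromkeys drops duplicates while keeping first-seen order
    return list(dict.fromkeys(candidates))


print(name_variants("José-Maria"))
```

Each variant could then be passed to `Genderize().get(...)` until one returns a gender. Whether a match on a variant is acceptable depends on the use case, since the variants are not guaranteed to refer to the same name.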

### Names with different gender depending on ethnicity

In [6]:
results = Genderize().get(['Nicola', 'Andrea', 'Alex', 'Mika', 'Addison', 'Ash', 'Dakota'])

In [7]:
for r in results:
    print(r)

{'name': 'Nicola', 'probability': 0.71, 'gender': 'female', 'count': 1226}
{'name': 'Andrea', 'probability': 0.79, 'gender': 'female', 'count': 5794}
{'name': 'Alex', 'probability': 0.87, 'gender': 'male', 'count': 5856}
{'name': 'Mika', 'probability': 0.51, 'gender': 'male', 'count': 182}
{'name': 'Addison', 'probability': 0.64, 'gender': 'male', 'count': 11}
{'name': 'Ash', 'probability': 0.56, 'gender': 'male', 'count': 243}
{'name': 'Dakota', 'probability': 0.75, 'gender': 'male', 'count': 139}


These examples show that:

* names such as `Andrea` or `Nicola`, whose gender is highly country-specific, receive a higher probability (0.79 and 0.71) than common unisex names such as `Mika` (0.51) or `Ash` (0.56)
* `Alex` is a nickname for either Alexander or Alexandra and counts among the most evenly divided gender-neutral names, yet its probability here is quite high at 0.87
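
Since `probability` and `count` vary this much, a caller may want to accept a gender only when both values are high enough. The sketch below applies such a filter to responses copied from the cells above; `confident_gender` is a hypothetical helper and the thresholds of 0.9 and 100 are illustrative assumptions, not values recommended by the API.

```python
def confident_gender(result, min_probability=0.9, min_count=100):
    """Return the gender only if the response clears both thresholds, else None.

    Hypothetical helper; the default thresholds are illustrative assumptions.
    """
    if result.get("gender") is None:
        return None
    if result.get("probability", 0.0) < min_probability:
        return None
    if result.get("count", 0) < min_count:
        return None
    return result["gender"]


# Responses copied from the notebook outputs above
responses = [
    {'name': 'Alex', 'probability': 0.87, 'gender': 'male', 'count': 5856},
    {'name': 'Mika', 'probability': 0.51, 'gender': 'male', 'count': 182},
    {'name': 'Addison', 'probability': 0.64, 'gender': 'male', 'count': 11},
    {'name': 'Jose Maria', 'probability': 0.99, 'gender': 'male', 'count': 125},
]

for r in responses:
    print(r['name'], confident_gender(r))
```

With these thresholds only `Jose Maria` keeps its assignment; lowering `min_probability` trades more assignments for more errors.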

### Check for nonsense words

In [8]:
results = Genderize().get(['the', 'a', 'with', 'an', 'I', 'my'])

In [9]:
for r in results:
    print(r)

{'name': 'the', 'probability': 1.0, 'gender': 'female', 'count': 1}
{'name': 'a', 'probability': 0.59, 'gender': 'male', 'count': 56}
{'name': 'with', 'gender': None}
{'name': 'an', 'probability': 0.83, 'gender': 'female', 'count': 170}
{'name': 'I', 'gender': None}
{'name': 'my', 'probability': 0.73, 'gender': 'female', 'count': 44}


Not every word that is assigned a gender is a name. Such words sometimes appear in social media user names, which are the data the API is based on.
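
One way to avoid such spurious assignments is to drop obvious non-names before calling the API. This is only a sketch: `STOP_WORDS` and `drop_stop_words` are hypothetical, and the word list simply reuses the six words tested above rather than a complete stop-word list.

```python
# Illustrative stop-word list taken from the words tested above (an assumption,
# not an exhaustive list); entries are lower-case for case-insensitive matching.
STOP_WORDS = {"the", "a", "with", "an", "i", "my"}


def drop_stop_words(names):
    """Keep only the entries that are not in STOP_WORDS (hypothetical helper)."""
    return [n for n in names if n.strip().lower() not in STOP_WORDS]


print(drop_stop_words(['the', 'Anna', 'I', 'Pierre', 'my']))
```

The filtered list can then be passed to `Genderize().get(...)` as in the other cells.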

### Capital letters

In [2]:
results = Genderize().get(['pierre', 'Pierre'])

In [3]:
for r in results:
    print(r)

{'count': 852, 'name': 'pierre', 'probability': 0.99, 'gender': 'male'}
{'count': 852, 'name': 'Pierre', 'probability': 0.99, 'gender': 'male'}

Both spellings return identical results, so the lookup appears to be case-insensitive.
