# Evaluate Namsor

http://www.namsor.com/

Determine the gender of a personal name on a -1 (male) to +1 (female) scale. NamSor automatically recognizes which culture to apply when assessing the gender of a personal name. Some examples: “Andrea Rossini” is most likely an Italian name and a male name, whereas “Andrea Parker” is most likely an Anglo-Saxon name and a female name; 声涛周 is most likely male; “O. Sokolova” is most likely female.

In [1]:
import requests
import pandas as pd

In [2]:
from hammock import Hammock as NamsorAPI

In [61]:
def fetch_from_namsor(name):
    """
    Query the NamSor gender API for a (forename, surname) tuple and return the parsed JSON.

    API key page: https://api.namsor.com/namsor/faces/viewapikey.xhtml
    """
    # NamSor expects names that are already split into forename and surname
    if not isinstance(name, tuple):
        raise TypeError('When calling NamSor, name must be a (forename, surname) tuple')
    forename, surname = name

    # Hammock appends the call arguments as URL path segments: .../gender/<forename>/<surname>
    namsor = NamsorAPI("http://api.namsor.com/onomastics/api/json/gender")
    resp = namsor(forename, surname).GET()

    return resp.json()
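
Hammock turns the two call arguments into path segments of the request URL. As a rough sketch of the same URL built by hand, the helper below percent-encodes each name part so that spaces and accented characters are safe in the path. `build_gender_url` is a hypothetical helper, not part of NamSor or Hammock, and the endpoint is simply the one used in this notebook, which may no longer be live.

```python
from urllib.parse import quote

# Endpoint copied from this notebook; assumed, not verified to still be available.
NAMSOR_GENDER_URL = "http://api.namsor.com/onomastics/api/json/gender"


def build_gender_url(forename, surname, base_url=NAMSOR_GENDER_URL):
    """Return <base_url>/<forename>/<surname> with both parts percent-encoded."""
    # safe='' also encodes '/', so a slash inside a name cannot add a path segment
    parts = [quote(part, safe='') for part in (forename, surname)]
    return "/".join([base_url] + parts)


print(build_gender_url('José Maria', 'García'))
```

The resulting URL could then be fetched with `requests.get(url, timeout=10).json()`; the timeout value is an arbitrary choice.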

In [80]:
first_name = 'jose'
middle_name = 'maria'
surname = 'garcia'
connectors = ['', ' ', '-']
names = [first_name + c + middle_name for c in connectors]
api_resp = [fetch_from_namsor((name, surname)) for name in names]
api_resp_genders = {r['gender'] for r in api_resp}

In [82]:
api_resp

[{'firstName': 'josemaria',
  'gender': 'male',
  'id': '1513784211732',
  'lastName': 'garcia',
  'scale': -0.71},
 {'firstName': 'jose maria',
  'gender': 'male',
  'id': '1513784212705',
  'lastName': 'garcia',
  'scale': -0.43},
 {'firstName': 'jose-maria',
  'gender': 'male',
  'id': '1513784213639',
  'lastName': 'garcia',
  'scale': -0.71}]

In [91]:
sorted(api_resp, key=lambda x: abs(x['scale']), reverse=True)

[{'firstName': 'josemaria',
  'gender': 'male',
  'id': '1513784211732',
  'lastName': 'garcia',
  'scale': -0.71},
 {'firstName': 'jose-maria',
  'gender': 'male',
  'id': '1513784213639',
  'lastName': 'garcia',
  'scale': -0.71},
 {'firstName': 'jose maria',
  'gender': 'male',
  'id': '1513784212705',
  'lastName': 'garcia',
  'scale': -0.43}]

In [93]:
max(api_resp, key=lambda x: abs(x['scale']))

{'firstName': 'josemaria',
 'gender': 'male',
 'id': '1513784211732',
 'lastName': 'garcia',
 'scale': -0.71}

In [None]:

if 'male' not in api_resp_genders and 'female' not in api_resp_genders:
    # If no gender is found using both names, fall back to the first name only
    best_resp = fetch_from_namsor((first_name, surname))
else:
    # If using the middle name yields female or male, take the response with the highest confidence;
    # confidence in NamSor is the absolute value of scale
    best_resp = max(api_resp, key=lambda x: abs(x['scale']))
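
The selection rule described in the comments above can be sketched as a small function that works on already-fetched responses, so it runs without calling the API. `pick_best_response` is a hypothetical name; the sample data are the three responses shown earlier in this notebook.

```python
def pick_best_response(responses):
    """Return the 'male'/'female' response with the largest |scale|, or None."""
    gendered = [r for r in responses if r['gender'] in ('male', 'female')]
    if not gendered:
        # Caller can fall back to querying with the first name only
        return None
    # NamSor confidence is taken here as the absolute value of scale
    return max(gendered, key=lambda r: abs(r['scale']))


# Responses recorded earlier in this notebook (ids omitted)
recorded = [
    {'firstName': 'josemaria', 'gender': 'male', 'lastName': 'garcia', 'scale': -0.71},
    {'firstName': 'jose maria', 'gender': 'male', 'lastName': 'garcia', 'scale': -0.43},
    {'firstName': 'jose-maria', 'gender': 'male', 'lastName': 'garcia', 'scale': -0.71},
]
print(pick_best_response(recorded)['firstName'])
```

On ties, `max` keeps the first candidate it sees, so the order of `connectors` decides between `josemaria` and `jose-maria`.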

### Can it handle surnames?

In [63]:
print(fetch_from_namsor(('Samir', 'Amin')))

{'scale': -1.0, 'gender': 'male', 'firstName': 'Samir', 'lastName': 'Amin', 'id': '1513777009279'}


In [64]:
print(fetch_from_namsor(('Samir Amin', 'dummy')))

{'scale': -0.99, 'gender': 'male', 'firstName': 'Samir Amin', 'lastName': 'dummy', 'id': '1513777042735'}


By design, NamSor takes names split into forename and surname. If the full name is sent in place of the forename, it still resolves the gender, though that is not the intended use of the API.
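
Since the API expects a `(forename, surname)` tuple, a full-name string has to be split first. The sketch below uses a deliberately naive rule, treating the last whitespace-separated token as the surname; that is an assumption that fails for many naming conventions (for example two-part Spanish surnames). `split_full_name` and its `placeholder` argument are hypothetical; the placeholder mirrors the `'dummy'` surname used in this notebook.

```python
def split_full_name(full_name, placeholder='dummy'):
    """Split 'Forename(s) Surname' into a (forename, surname) tuple."""
    tokens = full_name.split()
    if not tokens:
        raise ValueError('full_name must contain at least one name')
    if len(tokens) == 1:
        # No surname available: use the placeholder, as the tests in this notebook do
        return tokens[0], placeholder
    # Naive assumption: the last token is the surname, everything before it the forename
    return ' '.join(tokens[:-1]), tokens[-1]


print(split_full_name('Samir Amin'))
print(split_full_name('Hans Joachim Schmidt'))
```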

### Does it know about the geographic origin of names?

In [66]:
print(fetch_from_namsor(('Andrea', 'Schmidt')))

{'scale': 1.0, 'gender': 'female', 'firstName': 'Andrea', 'lastName': 'Schmidt', 'id': '1513777146662'}


In [67]:
print(fetch_from_namsor(('Andrea', 'Bocelli')))

{'scale': -0.5, 'gender': 'male', 'firstName': 'Andrea', 'lastName': 'Bocelli', 'id': '1513777164050'}


In [68]:
print(fetch_from_namsor(('Rosario', 'Giordano')))

{'scale': -0.5, 'gender': 'male', 'firstName': 'Rosario', 'lastName': 'Giordano', 'id': '1513777178010'}


In [69]:
print(fetch_from_namsor(('Rosario', 'González')))

{'scale': 1.0, 'gender': 'female', 'firstName': 'Rosario', 'lastName': 'González', 'id': '1513777197339'}


Yes: it uses the surname in the gender assignment, which brings in information about the origin of the name.

### Double names (where the order matters)

In [70]:
names = ['Hans Joachim', 'Hans-Joachim', 'Maria-José', 'José Maria', 'Jose Maria', 'José-Maria', 'Josémaria', 
         'theo c. m']

In [71]:
for n in names:
    print(n)
    print(fetch_from_namsor((n, 'dummy')))

Hans Joachim
{'scale': -1.0, 'gender': 'male', 'firstName': 'Hans Joachim', 'lastName': 'dummy', 'id': '1513777262865'}
Hans-Joachim
{'scale': -1.0, 'gender': 'male', 'firstName': 'Hans-Joachim', 'lastName': 'dummy', 'id': '1513777263892'}
Maria-José
{'scale': 1.0, 'gender': 'female', 'firstName': 'Maria-José', 'lastName': 'dummy', 'id': '1513777264915'}
José Maria
{'scale': -0.46, 'gender': 'male', 'firstName': 'José Maria', 'lastName': 'dummy', 'id': '1513777265888'}
Jose Maria
{'scale': -0.46, 'gender': 'male', 'firstName': 'Jose Maria', 'lastName': 'dummy', 'id': '1513777266864'}
José-Maria
{'scale': -0.5, 'gender': 'male', 'firstName': 'José-Maria', 'lastName': 'dummy', 'id': '1513777267765'}
Josémaria
{'scale': -0.5, 'gender': 'male', 'firstName': 'Josémaria', 'lastName': 'dummy', 'id': '1513777268685'}
theo c. m
{'scale': -0.98, 'gender': 'male', 'firstName': 'theo c. m', 'lastName': 'dummy', 'id': '1513777269730'}


All cases are guessed correctly, but with lower confidence for ambiguous double names such as José Maria.

### Names with different gender depending on ethnicity

In [72]:
names = ['Nicola', 'Andrea', 'Alex', 'Mika', 'Addison', 'Ash', 'Dakota']

In [73]:
for n in names:
    print(n)
    print(fetch_from_namsor((n, 'dummy')))

Nicola
{'scale': -0.46, 'gender': 'male', 'firstName': 'Nicola', 'lastName': 'dummy', 'id': '1513777979586'}
Andrea
{'scale': -0.1, 'gender': 'unknown', 'firstName': 'Andrea', 'lastName': 'dummy', 'id': '1513777980505'}
Alex
{'scale': -0.99, 'gender': 'male', 'firstName': 'Alex', 'lastName': 'dummy', 'id': '1513777981424'}
Mika
{'scale': 0.73, 'gender': 'female', 'firstName': 'Mika', 'lastName': 'dummy', 'id': '1513777982455'}
Addison
{'scale': 0.37, 'gender': 'female', 'firstName': 'Addison', 'lastName': 'dummy', 'id': '1513777983445'}
Ash
{'scale': -0.96, 'gender': 'male', 'firstName': 'Ash', 'lastName': 'dummy', 'id': '1513777984434'}
Dakota
{'scale': -0.26, 'gender': 'male', 'firstName': 'Dakota', 'lastName': 'dummy', 'id': '1513777985428'}


* Nicola is classified as male with low confidence.
* Andrea is marked as unknown; with a real surname it would be assigned a gender, as in the Schmidt and Bocelli examples above.
* Addison and Dakota are closer to 0, which means they are close to neutral.
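
One way to act on these observations is to flag every result whose |scale| falls below a cut-off for manual review. A minimal sketch on the scales recorded above; the 0.5 threshold and the name `low_confidence_names` are assumptions chosen for illustration, not NamSor recommendations.

```python
# (first name, scale) pairs recorded in the output above
recorded_scales = [
    ('Nicola', -0.46), ('Andrea', -0.1), ('Alex', -0.99), ('Mika', 0.73),
    ('Addison', 0.37), ('Ash', -0.96), ('Dakota', -0.26),
]


def low_confidence_names(results, threshold=0.5):
    """Return the names whose |scale| is strictly below the threshold."""
    return [name for name, scale in results if abs(scale) < threshold]


print(low_confidence_names(recorded_scales))
```

With this cut-off the flagged names are exactly the four discussed in the list above.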

### Check for nonsense words

In [75]:
names = ['the', 'a', 'with', 'an', 'I', 'my']

In [76]:
for n in names:
    print(n)
    print(fetch_from_namsor((n, 'dummy')))

the
{'scale': -0.51, 'gender': 'male', 'firstName': 'the', 'lastName': 'dummy', 'id': '1513778206469'}
a
{'scale': 0.0, 'gender': 'unknown', 'firstName': 'a', 'lastName': 'dummy', 'id': '1513778207381'}
with
{'scale': -0.5, 'gender': 'male', 'firstName': 'with', 'lastName': 'dummy', 'id': '1513778208461'}
an
{'scale': 0.2, 'gender': 'female', 'firstName': 'an', 'lastName': 'dummy', 'id': '1513778209484'}
I
{'scale': 0.0, 'gender': 'unknown', 'firstName': 'I', 'lastName': 'dummy', 'id': '1513778210399'}
my
{'scale': 0.67, 'gender': 'female', 'firstName': 'my', 'lastName': 'dummy', 'id': '1513778211426'}


Some nonsense words are recognised as names, but with lower confidence (scale closer to 0).
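
Because words such as 'my' (0.67) still score fairly high, a scale threshold alone would not screen them out. One possible safeguard, sketched under the assumption that the input may contain ordinary English words, is to drop known function words before calling the API. The word list and the name `is_plausible_forename` are illustrative only.

```python
# Illustrative list: just the words tested above, not a complete stopword list
NON_NAME_WORDS = {'the', 'a', 'with', 'an', 'i', 'my'}


def is_plausible_forename(token):
    """Return False for empty strings and for words on the non-name list."""
    cleaned = token.strip().lower()
    return bool(cleaned) and cleaned not in NON_NAME_WORDS


candidates = ['the', 'Pierre', 'my', 'Andrea', 'I']
print([c for c in candidates if is_plausible_forename(c)])
```

A filter like this also discards genuine names that happen to coincide with these words, so it trades some coverage for fewer false positives.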

### Capital letters

In [77]:
names = ['pierre', 'Pierre', 'paul', 'Paul']

In [78]:
for n in names:
    print(n)
    print(fetch_from_namsor((n, 'dummy')))

pierre
{'scale': -1.0, 'gender': 'male', 'firstName': 'pierre', 'lastName': 'dummy', 'id': '1513778258896'}
Pierre
{'scale': -1.0, 'gender': 'male', 'firstName': 'Pierre', 'lastName': 'dummy', 'id': '1513778259916'}
paul
{'scale': -1.0, 'gender': 'male', 'firstName': 'paul', 'lastName': 'dummy', 'id': '1513778260831'}
Paul
{'scale': -1.0, 'gender': 'male', 'firstName': 'Paul', 'lastName': 'dummy', 'id': '1513778261759'}


Capitalization has no influence, as expected.