# Parsing gender from names with an API

Genderize.io predicts the gender associated with a first name. The API is useful for analytics, ad targeting, user segmentation, and similar tasks. It draws on large datasets of user profiles from major social networks and exposes the results through its API. Each response also includes a certainty factor (`probability`).

In [1]:
import pandas as pd

In [2]:
# %load ../parse_gender.py
import requests


api_url_base = 'https://api.genderize.io/'

def get_gender(firstname):
    """Look up a first name on Genderize.io; return the parsed JSON dict, or None on error."""
    # Pass the name via `params` so requests URL-encodes it (accents, spaces, '&', ...)
    response = requests.get(api_url_base, params={'name': firstname}, timeout=10)

    if response.status_code == 200:
        return response.json()
    else:
        print('[!] HTTP {0} looking up name [{1}]'.format(response.status_code, firstname))
        return None
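
Because each response carries a certainty factor, one option is to accept a prediction only when it clears a chosen threshold. The sketch below is a minimal illustration, assuming the response dictionary has `gender` and `probability` keys (the columns that appear later in this notebook). The helper name `confident_gender` and the 0.9 cutoff are hypothetical choices, and the sample dictionaries are made up for illustration rather than taken from the API.

```python
def confident_gender(result, min_probability=0.9):
    """Return result['gender'] if the prediction is confident enough, else None.

    `result` is assumed to be a dict like the one returned by get_gender(),
    or None when the lookup failed.
    """
    if not result:
        return None
    gender = result.get('gender')
    if gender is None:
        return None
    # probability may arrive as a number or a numeric string; normalise to float
    try:
        probability = float(result.get('probability', 0))
    except (TypeError, ValueError):
        return None
    return gender if probability >= min_probability else None


# Made-up sample responses, for illustration only
print(confident_gender({'name': 'leah', 'gender': 'female', 'probability': 1.0}))
print(confident_gender({'name': 'sam', 'gender': 'male', 'probability': 0.6}))
print(confident_gender(None))
```

The right cutoff depends on how costly a wrong label is for the analysis at hand, so it is left as a parameter.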

### Loading
The loading step below still needs to be abstracted so that it can load CSV files from a user-supplied list of names. Otherwise, this notebook is ready to be turned into a script.

In [4]:
df = pd.read_csv('../recursecenter.csv')

In [7]:
df.columns = ['repo', 'username', 'contributions', 'avatar_url', 'profile_url', 'real_name']

In [8]:
df.shape

(160, 6)

In [9]:
## Split first name from real_name, look up gender

gender_list = list()

for name in df['real_name'].unique():
    if isinstance(name, str): # skip NaN values (pandas stores missing values as float NaN)
        first_name = name.split(' ')[0]
        gender_result = get_gender(first_name)
        if gender_result is not None: # get_gender returns None on HTTP errors
            gender_result['real_name'] = name # add real name back to dictionary
            gender_list.append(gender_result)
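
Several people can share a first name, so the loop above may query the API more than once for the same name. One way to avoid that, sketched minimally here, is to cache results per first name. `lookup_genders` is a hypothetical helper, and `lookup` stands in for any function with the same contract as `get_gender` (returns a dict, or `None` on failure). The stub used in the demonstration is made up so the example runs without network access.

```python
def lookup_genders(real_names, lookup):
    """Look up each distinct first name once and return one dict per real name."""
    cache = {}    # first name -> lookup result (dict or None)
    results = []
    for name in real_names:
        if not isinstance(name, str) or not name.strip():
            continue  # skip NaN / empty values
        first_name = name.split()[0]
        if first_name not in cache:
            cache[first_name] = lookup(first_name)
        result = cache[first_name]
        if result is not None:
            # copy so every real name gets its own dictionary
            results.append(dict(result, real_name=name))
    return results


# Stub lookup with made-up values, used only to show the call pattern
calls = []

def fake_lookup(first_name):
    calls.append(first_name)
    return {'name': first_name, 'gender': 'unknown', 'probability': 0.0}

rows = lookup_genders(['David Albert', 'David Smith', float('nan')], fake_lookup)
print(len(rows), calls)
```

In this notebook the call would look like `lookup_genders(df['real_name'].unique(), get_gender)`, and the returned list can be passed to `pd.DataFrame` as before.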

In [10]:
gender_df = pd.DataFrame(gender_list)

In [11]:
# Sanity check for data size
print(gender_df.shape)
print(df.shape)

(143, 5)
(160, 6)


In [21]:
df = recurse

In [22]:
df2 = pd.merge(df, gender_df[['real_name', 'gender', 'probability']], 
               on='real_name', how='outer')
# important: the outer join keeps rows whose real_name is NaN and so has no match in `gender_df`

In [23]:
df2.shape

(160, 8)

In [24]:
df2

Unnamed: 0,repo,username,contributions,avatar_url,profile_url,real_name,gender,probability
0,hs-cli,davidbalbert,2,https://avatars2.githubusercontent.com/u/12335...,https://api.github.com/users/davidbalbert,David Albert,male,1.00
1,blaggregator,davidbalbert,12,https://avatars2.githubusercontent.com/u/12335...,https://api.github.com/users/davidbalbert,David Albert,male,1.00
2,community,davidbalbert,335,https://avatars2.githubusercontent.com/u/12335...,https://api.github.com/users/davidbalbert,David Albert,male,1.00
3,proxy,davidbalbert,58,https://avatars2.githubusercontent.com/u/12335...,https://api.github.com/users/davidbalbert,David Albert,male,1.00
4,RSVPBot,davidbalbert,84,https://avatars2.githubusercontent.com/u/12335...,https://api.github.com/users/davidbalbert,David Albert,male,1.00
5,ca-tools,davidbalbert,11,https://avatars2.githubusercontent.com/u/12335...,https://api.github.com/users/davidbalbert,David Albert,male,1.00
6,webstack.jl,danielmendel,63,https://avatars3.githubusercontent.com/u/30420...,https://api.github.com/users/danielmendel,Daniel Espeset,male,1.00
7,webstack.jl,astrieanna,36,https://avatars3.githubusercontent.com/u/12053...,https://api.github.com/users/astrieanna,Leah Hanson,female,1.00
8,webstack.jl,zachallaun,19,https://avatars0.githubusercontent.com/u/50393...,https://api.github.com/users/zachallaun,Zach Allaun,male,0.99
9,blaggregator,zachallaun,1,https://avatars0.githubusercontent.com/u/50393...,https://api.github.com/users/zachallaun,Zach Allaun,male,0.99
