# Authors' Genders

Return to the [index](https://github.com/Nkluge-correa/worldwide_AI-ethics).

**Use this notebook to infer the gender of every author from their first name with the [Genderize.io API](https://genderize.io/). As of February 25, 2023, no API key is required for fewer than 1,000 requests per day.**
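A minimal sketch of a single Genderize.io query, assuming the reply is a JSON object with a `gender` key (null for unknown names) and a `probability` key. `build_query` and `parse_reply` are hypothetical helpers, and the reply strings below are made up for illustration, not real API responses. Encoding the parameters with `urllib.parse.urlencode` keeps names that contain spaces valid in the URL:

```python
import json
from urllib.parse import urlencode

GENDERIZE_URL = "https://api.genderize.io"


def build_query(name, country_id=None):
    """Return a Genderize.io request URL for one first name.

    urlencode percent-encodes the values, so a name that still contains
    a space (e.g. "jean pierre") becomes "jean+pierre" in the URL.
    """
    params = {"name": name.strip()}
    if country_id:
        params["country_id"] = country_id
    return f"{GENDERIZE_URL}?{urlencode(params)}"


def parse_reply(body):
    """Extract (gender, probability) from a JSON reply body.

    Assumes the keys "gender" and "probability"; missing keys yield None.
    """
    data = json.loads(body)
    return data.get("gender"), data.get("probability")


print(build_query("jean pierre", "BR"))
# https://api.genderize.io?name=jean+pierre&country_id=BR

# Made-up reply, only to show the parsing step
print(parse_reply('{"name": "lara", "gender": "female", "probability": 0.98}'))
# ('female', 0.98)
```

In the notebook, `urlopen(build_query(name, code))` would fetch a real reply; every such call counts toward the daily request limit.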

In [1]:
import pandas as pd

df = pd.read_parquet('data/authors_names')

display(df)

| | name | country | url | code | authors |
|---|---|---|---|---|---|
| 0 | An Open Letter to the Global South: Bring the ... | Brazil | https://en.airespucrs.org/c%C3%B3pia-carta-aberta | BR | Nicholas,Diogo,Lara,Carolina,Guilherme,Camila,... |
| 1 | Intel’s AI Privacy Policy White Paper: Protect... | United States | https://www.intel.com/content/dam/www/public/u... | US | David,Riccardo |
| 2 | Everyday Ethics for Artificial Intelligence | United States | https://www.ibm.com/watson/assets/duo/pdf/ever... | US | Francesca,Noah,Almas |
| 3 | Responsible AI: A Global Policy Framework | United States | https://www.itechlaw.org/ResponsibleAI2021 | US | Charles,John,Susan,Christian,Michael,Khalid,Ri... |
| 4 | Responsible AI: A Global Policy Framework | India | https://www.itechlaw.org/ResponsibleAI2021 | IN | Nikhil,Smriti |
| ... | ... | ... | ... | ... | ... |
| 117 | "ARCC": An Ethical Framework for Artificial In... | China | https://www.tisi.org/13747 | CN | Pony |
| 118 | Unified Ethical Frame for Big Data Analysis | United States | https://bigdata.fpf.org/wp-content/uploads/201... | US | Martin,Paula,Jennifer,Lynn,Barbara,Miranda,Art... |
| 119 | Human rights in the robot age: Challenges aris... | Netherlands | https://www.rathenau.nl/sites/default/files/20... | NL | Joost |
| 120 | Human rights in the robot age: Challenges aris... | United States | https://www.rathenau.nl/sites/default/files/20... | US | Linda |
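The `authors` column stores each document's first names as one comma-separated string, and the next cell issues one Genderize.io request per name in that string. A minimal stdlib sketch of that flattening step, using two rows copied from the table above; `flatten_authors` is a hypothetical helper, not part of the notebook:

```python
def flatten_authors(rows):
    """Turn (document, country_code, "A,B,C") rows into one
    (document, country_code, name) tuple per author.

    Surrounding whitespace is stripped and empty pieces
    (e.g. from a trailing comma) are skipped.
    """
    flat = []
    for document, code, authors in rows:
        for name in authors.split(","):
            name = name.strip()
            if name:
                flat.append((document, code, name))
    return flat


sample = [
    ("Everyday Ethics for Artificial Intelligence", "US", "Francesca,Noah,Almas"),
    ("Responsible AI: A Global Policy Framework", "IN", "Nikhil,Smriti"),
]
pairs = flatten_authors(sample)
print(len(pairs))  # 5
```

The length of the flattened list equals the number of API calls the request loop makes, which is what has to stay under the daily limit.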


In [6]:
import re
import json
from urllib.parse import urlencode
from urllib.request import urlopen

import pandas as pd
from unidecode import unidecode

def clean_names(input_data):
    # Lowercase, drop HTML line breaks, and turn punctuation into spaces
    clean_text = input_data.lower().replace("<br />", " ")
    clean_text = re.sub(r"[-()\"#/@;:<>{}=~|.?]", ' ', clean_text)
    # Collapse runs of spaces, then transliterate accented characters to ASCII
    clean_text = re.sub(' +', ' ', clean_text)
    return unidecode(clean_text)

df.authors = df.authors.apply(clean_names)

documents = []
authors_names = []
authors_gender = []
inferred_nationality = []

# One Genderize.io request per author of each document; the document's country
# code is sent as country_id and also recorded as the author's nationality.
for document, code, authors in zip(df.name, df.code, df.authors):
    for name in authors.split(','):
        name = name.strip()
        # urlencode escapes spaces left by clean_names (e.g. "jean pierre")
        query = urlencode({'name': name, 'country_id': code})
        with urlopen(f"https://api.genderize.io?{query}") as response:
            result = json.loads(response.read().decode('utf-8'))

        authors_names.append(name)
        authors_gender.append(result['gender'])  # None when the name is unknown
        documents.append(document)
        inferred_nationality.append(code)

gender_df = pd.DataFrame({'nationality': inferred_nationality,
                          'authors': authors_names,
                          'gender': authors_gender,
                          'document': documents})

# Save under data/ so the next cell can read it back
gender_df.to_parquet('data/authors_gender_df', compression='gzip')

display(gender_df)

print(f'Unique names found: {gender_df.authors.nunique()}')
print(f'Unique nationalities inferred: {gender_df.nationality.nunique()}')

| | nationality | authors | gender | document |
|---|---|---|---|---|
| 0 | BR | nicholas | male | An Open Letter to the Global South: Bring the ... |
| 1 | BR | diogo | male | An Open Letter to the Global South: Bring the ... |
| 2 | BR | lara | female | An Open Letter to the Global South: Bring the ... |
| 3 | BR | carolina | female | An Open Letter to the Global South: Bring the ... |
| 4 | BR | guilherme | male | An Open Letter to the Global South: Bring the ... |
| ... | ... | ... | ... | ... |
| 825 | US | susan | female | Unified Ethical Frame for Big Data Analysis |
| 826 | US | nick | male | Unified Ethical Frame for Big Data Analysis |
| 827 | NL | joost | male | Human rights in the robot age: Challenges aris... |
| 828 | US | linda | female | Human rights in the robot age: Challenges aris... |


Unique names found: 558
Unique nationalities inferred: 36
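The cell above sends one request per row (829 rows) even though it found only 558 unique names, so repeated names are looked up more than once. A minimal sketch of memoizing the lookup with `functools.lru_cache`, assuming a hypothetical `fetch(name, country_id)` callable that wraps the HTTP request; the cache key includes the country code because the query passes `country_id`, so the actual savings depend on the number of unique (name, country) pairs, which the notebook does not report:

```python
from functools import lru_cache


def make_cached_lookup(fetch):
    """Wrap fetch(name, country_id) -> gender so that each
    (name, country_id) pair triggers at most one call to fetch."""
    @lru_cache(maxsize=None)
    def lookup(name, country_id):
        return fetch(name, country_id)
    return lookup


calls = []


def fake_fetch(name, country_id):
    # Hypothetical stand-in for the HTTP request: records each call
    # and returns a fixed label instead of contacting Genderize.io.
    calls.append((name, country_id))
    return "female" if name == "linda" else "male"


lookup = make_cached_lookup(fake_fetch)
for name, code in [("linda", "US"), ("joost", "NL"), ("linda", "US")]:
    lookup(name, code)
print(len(calls))  # 2: the repeated ("linda", "US") is served from the cache
```

A cache like this also makes the loop safe to re-run within one session without spending extra requests.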


In [2]:
import pandas as pd

gender_df = pd.read_parquet('data/authors_gender_df')

documents = []
male_authors = []
female_authors = []

# Count male and female labels per document title (rows sharing a title are
# counted together); unknown (None) labels are ignored by value_counts
for document in gender_df.document.unique():
    counts = gender_df.loc[gender_df['document'] == document, 'gender'].value_counts()
    male_authors.append(int(counts.get('male', 0)))
    female_authors.append(int(counts.get('female', 0)))
    documents.append(document)

final_gender_df = pd.DataFrame({'document': documents,
                                'male_authors': male_authors,
                                'female_authors': female_authors})

final_gender_df.to_parquet('data/final_gender_df', compression='gzip')
display(final_gender_df)

print(f'Male Authors: {final_gender_df.male_authors.sum()}')
print(f'Female Authors: {final_gender_df.female_authors.sum()}')

| | document | male_authors | female_authors |
|---|---|---|---|
| 0 | An Open Letter to the Global South: Bring the ... | 13 | 5 |
| 1 | Intel’s AI Privacy Policy White Paper: Protect... | 2 | 0 |
| 2 | Everyday Ethics for Artificial Intelligence | 1 | 2 |
| 3 | Responsible AI: A Global Policy Framework | 32 | 18 |
| 4 | KI Seal of Approval | 9 | 1 |
| ... | ... | ... | ... |
| 63 | Trustworthy AI in Aotearoa: AI Principles | 1 | 1 |
| 64 | Harmonious Artificial Intelligence Principles | 1 | 0 |
| 65 | "ARCC": An Ethical Framework for Artificial In... | 2 | 0 |
| 66 | Unified Ethical Frame for Big Data Analysis | 4 | 7 |


Male Authors: 549
Female Authors: 281
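The per-document tally above can also be expressed without pandas. A minimal stdlib sketch using `collections.Counter`, assuming the input is an iterable of (document, gender) pairs where gender may be None; `tally_by_document` and `female_share` are hypothetical helpers. Documents keep their first-seen order, as `gender_df.document.unique()` does:

```python
from collections import Counter


def tally_by_document(records):
    """Return [(document, male_count, female_count), ...].

    None labels (no prediction) are skipped but the document is kept,
    so it still appears with zero counts.
    """
    counts = {}  # dicts preserve insertion (first-seen) order
    for document, gender in records:
        c = counts.setdefault(document, Counter())
        if gender is not None:
            c[gender] += 1
    # Counter returns 0 for labels that never occurred
    return [(doc, c["male"], c["female"]) for doc, c in counts.items()]


def female_share(male, female):
    """Fraction of gender-labelled entries that are female."""
    return female / (male + female)


print(tally_by_document([("A", "male"), ("A", "female"), ("B", None)]))
# [('A', 1, 1), ('B', 0, 0)]
print(round(female_share(549, 281), 3))  # 0.339
```

With the totals printed above, female authors account for 281 / (549 + 281) ≈ 33.9% of the labelled entries. These are per-document entries, so an author listed on several documents is counted once per document.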

