Gender Asymmetries in Wikipedia
This folder contains Jupyter Notebooks (using Python 3.4) that perform notability, lexical, and network analysis on biographies present in DBpedia and Wikidata, according to gender.
Install the required libraries listed in requirements.txt (you can use pip).
You also need:
- DBpedia Utils to iterate over DBpedia data (this is mandatory).
- matta to generate word clouds of lexical analysis results (optional).
Running the Notebooks
First, you need to edit the dbpedia_config.py file. This is the current content of the file:
    # The DBpedia editions we will consider
    MAIN_LANGUAGE = 'en'
    LANGUAGES = 'en|bg|ca|cs|de|es|eu|fr|hu|id|it|ja|ko|nl|pl|pt|ru|tr|ar|el'.split('|')

    # Where are we going to download the data files
    DATA_FOLDER = '/home/egraells/resources/dbpedia'

    # Folder to store analysis results
    TARGET_FOLDER = '/home/egraells/phd/notebooks/pajaritos/person_results'

    # This is used when crawling WikiData.
    QUERY_WIKIDATA_GENDER = False
    YOUR_EMAIL = 'email@example.com'
Its content determines which data the notebooks download and analyze. The notability notebooks take care of downloading and consolidating the data.
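Before running the notebooks, it may help to check the edited settings for obvious mistakes. The following is a minimal sketch, not part of the repository: the helper name check_config and the specific checks are assumptions made for illustration. It only relies on the setting names shown in dbpedia_config.py and avoids syntax newer than Python 3.4.

```python
import os


def check_config(main_language, languages, data_folder, target_folder):
    """Return a list of human-readable problems found in the settings.

    Hypothetical helper: the notebooks do not ship with this function.
    """
    problems = []

    # The main edition should be one of the editions being analyzed.
    if main_language not in languages:
        problems.append('MAIN_LANGUAGE %r is not listed in LANGUAGES' % main_language)

    # A repeated language code is most likely a typo.
    if len(set(languages)) != len(languages):
        problems.append('LANGUAGES contains duplicate codes')

    # Both folders must exist before data can be read from or written to them.
    for name, folder in (('DATA_FOLDER', data_folder),
                         ('TARGET_FOLDER', target_folder)):
        if not os.path.isdir(folder):
            problems.append('%s does not exist: %s' % (name, folder))

    return problems


# Example usage, assuming dbpedia_config.py is importable from the
# current working directory:
#
#   import dbpedia_config as cfg
#   for problem in check_config(cfg.MAIN_LANGUAGE, cfg.LANGUAGES,
#                               cfg.DATA_FOLDER, cfg.TARGET_FOLDER):
#       print(problem)
```

An empty list means none of these checks failed. It does not guarantee that the DBpedia files themselves are present or complete.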
You should start with the notebooks prefixed with Notability. Optionally, you can run the PreProcess notebooks after downloading the source files from DBpedia.
The notebook code in this folder was written by Eduardo Graells-Garrido. The notability analysis is original to the paper "Women Through the Glass-Ceiling: Gender Asymmetries in Wikipedia", written with Claudia Wagner, David García and Filippo Menczer. Parts of the lexical and network analyses come from the earlier paper "First Women, Second Sex: Gender Bias on Wikipedia", written with Mounia Lalmas and Fil Menczer, and were extended in these notebooks for the former paper.
The DBpedia version currently used in these files is 2015-10. Note that the version used in the paper is DBpedia 2014.