Lexical Analysis - 01 - Process English Overviews.ipynb
Lexical Analysis - 02 PMI of Common Vocabulary.ipynb
Lexical Analysis - 03 Annotated PMI keywords with Categories.ipynb
Network - 01 - Biography Connectivity.ipynb
Notability - 01 - Download DBPedia and Gender Data.ipynb
Notability - 02 - Generate Person Data.ipynb
Notability - 03 - Consolidate Person Data.ipynb
Notability - 04 - Is there a glass ceiling?.ipynb
PreProcess - Countries per Location.ipynb


Gender Asymmetries in Wikipedia

This folder contains Jupyter Notebooks (using Python 3.4) that analyze the biographies in DBpedia and Wikidata by gender. They cover notability, lexical, and network analysis.


Install the libraries listed in requirements.txt (for example, with pip). You also need:

  • DBpedia Utils to iterate over DBpedia data (this is mandatory).
  • matta to generate word clouds of lexical analysis results (optional).

Running the Notebooks

First, you need to edit the dbpedia_config.py file. This is the current content of the file:

# The DBpedia editions we will consider
LANGUAGES = 'en|bg|ca|cs|de|es|eu|fr|hu|id|it|ja|ko|nl|pl|pt|ru|tr|ar|el'.split('|')

# Where are we going to download the data files
DATA_FOLDER = '/home/egraells/resources/dbpedia'

# Folder to store analysis results
TARGET_FOLDER = '/home/egraells/phd/notebooks/pajaritos/person_results'

# This is used when crawling WikiData.
YOUR_EMAIL = 'mail@example.com'

These settings determine which data the notebooks download and analyze. The Notability notebooks take care of downloading and consolidating the data.
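
Before running anything, it may help to check the edited settings. The following is a minimal sketch and not part of the repository: check_config is a hypothetical helper, and it assumes that both folders should already exist and that the placeholder e-mail should be replaced. The notebooks themselves may create the folders for you.

```python
import os


def check_config(languages, data_folder, target_folder, email):
    """Return a list of problems found in the given settings.

    Hypothetical helper: the names and checks are assumptions,
    not something the notebooks require.
    """
    problems = []
    if not languages:
        problems.append('LANGUAGES is empty')
    # Assumption: both folders are expected to exist beforehand.
    for name, folder in (('DATA_FOLDER', data_folder),
                         ('TARGET_FOLDER', target_folder)):
        if not os.path.isdir(folder):
            problems.append('{0} does not exist: {1}'.format(name, folder))
    # The value shipped in dbpedia_config.py is only a placeholder.
    if email == 'mail@example.com':
        problems.append('YOUR_EMAIL still has the placeholder value')
    return problems
```

With dbpedia_config.py on the import path, it could be called as check_config(cfg.LANGUAGES, cfg.DATA_FOLDER, cfg.TARGET_FOLDER, cfg.YOUR_EMAIL) after import dbpedia_config as cfg. An empty list means none of these checks failed.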

You should start with the notebooks prefixed with Notability. Optionally, you can run the PreProcess notebooks after downloading the source files from DBpedia.


The notebook code in this folder was written by Eduardo Graells-Garrido. The notability analysis was originally developed for the paper "Women Through the Glass-Ceiling: Gender Asymmetries in Wikipedia", with Claudia Wagner, David García and Filippo Menczer. Parts of the lexical and network analyses come from the paper "First Women, Second Sex: Gender Bias on Wikipedia", with Mounia Lalmas and Fil Menczer, and were extended in these notebooks for the Glass-Ceiling paper.

These files currently use DBpedia 2015-10. Note that the paper used DBpedia 2014.