
alyssafrazee/github_analysis


gender of GitHub repo owners


This repository contains the code and data used to analyze the gender breakdown of owners of public GitHub repositories. I wrote a blog post about what I found.

To reproduce the analysis, run the scripts in the following order:

  • get_github_info_byday.py: uses the GitHub API to scrape repository data (note: this takes roughly 60 hours to run).
  • merge_files.sh: puts all scraped data together in a big text file
  • make_database.R: dumps the scraped data into a SQLite database
  • analyze_data.py: processes the data
  • bargraph.js: JavaScript/D3 code used to make the graphic showing the results. Alex Wilson made major contributions to this code.
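If you want to chain the command-line steps, the sketch below runs them in the order listed and stops at the first failure. It is a hypothetical helper, not part of the repository: it assumes each script runs with no arguments and that `python`, `bash` and `Rscript` are on your PATH. bargraph.js is left out because it is D3 code for the browser, not a command-line step. By default it only returns the commands (a dry run).

```python
import subprocess
import sys

# Hypothetical pipeline: script names come from the README list; the
# interpreters and the lack of command-line arguments are assumptions.
PIPELINE = [
    ["python", "get_github_info_byday.py"],  # scrape repository data (~60 hours)
    ["bash", "merge_files.sh"],              # combine scraped output into one text file
    ["Rscript", "make_database.R"],          # load the merged data into SQLite
    ["python", "analyze_data.py"],           # process the data
]


def run_pipeline(steps=PIPELINE, dry_run=True):
    """Run each step in order and return the commands, as strings.

    With dry_run=True nothing is executed; the commands are only listed.
    """
    executed = []
    for cmd in steps:
        executed.append(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit,
            # so later steps never run on incomplete input.
            subprocess.run(cmd, check=True)
    return executed


if __name__ == "__main__":
    # Pass --run to actually execute; otherwise just print the plan.
    for line in run_pipeline(dry_run="--run" not in sys.argv):
        print(line)
```

Because the scrape step alone takes days, you may prefer to run it separately and start the helper from merge_files.sh by passing a shorter `steps` list.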

data

The data I scraped in get_github_info_byday.py and processed with merge_files.sh and make_database.R is available in a .db file here. I removed all repo owner last names.
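The README doesn't list the tables inside the .db file, so a safe first step is to ask SQLite itself. The sketch below uses only Python's standard `sqlite3` module to list every table and its row count. The filename in the usage note (`github.db`) is a placeholder; substitute the name of the file you downloaded.

```python
import os
import sqlite3


def summarize_db(path):
    """Return {table_name: row_count} for every table in a SQLite file."""
    # sqlite3.connect would silently create an empty database for a bad path,
    # so check that the file exists first.
    if not os.path.exists(path):
        raise FileNotFoundError(path)
    conn = sqlite3.connect(path)
    try:
        names = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        counts = {}
        for name in names:
            # Escape embedded double quotes so the identifier is quoted safely.
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
        return counts
    finally:
        conn.close()
```

For example, `summarize_db("github.db")` prints nothing by itself but returns a dict you can inspect. Once you know a table's name, `pandas.read_sql_query` can load it into a DataFrame for the kind of processing analyze_data.py does.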

dependencies

Python libraries: PyGithub, Unidecode, Pandas, SexMachine, Matplotlib. Make sure these are installed before running the scripts. See requirements.txt for a more detailed specification of Python dependencies, including versions.
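A quick way to confirm the Python libraries are importable before starting a multi-day scrape is sketched below. The mapping from package name to import name (for example, PyGithub is imported as `github`) reflects each package's usual import path; this helper only checks that the modules can be found, not that their versions match requirements.txt.

```python
import importlib.util

# Package name (as listed in the README) -> top-level module it is imported as.
REQUIRED = {
    "PyGithub": "github",
    "Unidecode": "unidecode",
    "Pandas": "pandas",
    "SexMachine": "sexmachine",
    "Matplotlib": "matplotlib",
}


def missing_packages(required=REQUIRED):
    """Return the package names whose top-level module cannot be found."""
    return [pkg for pkg, module in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("missing:", ", ".join(missing) if missing else "none")
```

If anything is reported missing, installing from requirements.txt with pip also pins the versions the analysis was written against.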

R packages: devtools, proto, DBI, chron, RSQLite, and RSQLite.extfuns. All can be installed from CRAN with R's install.packages function.

About

Code for analysis of GitHub repo ownership and gender.
