Gender / Novels
Analysis of Gender and Gender Relations in English-Language Novels, 1770-1922
This research project examines the depiction of gender in historical English-language novels, exploring how authors of various backgrounds and experiences described gender in their works.
So far, we have analyzed a corpus of over 4,200 books from Project Gutenberg, an online book repository, using analysis tools we developed. Our findings include the ratio of male pronouns to female pronouns, the most common words following male and female pronouns, and the distance between repetitions of male and female pronouns.
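To give a feel for these three measurements, here is a minimal sketch in plain Python. It is not the project's own code: the pronoun sets, the tokenizer, and every function name below are assumptions made for illustration, and the real tools in this repo may define and count pronouns differently.

```python
import re
from collections import Counter

# Assumed pronoun sets for illustration; the project may use different lists.
MALE_PRONOUNS = {"he", "him", "his", "himself"}
FEMALE_PRONOUNS = {"she", "her", "hers", "herself"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def pronoun_counts(text):
    """Return (male_count, female_count) for the text."""
    tokens = tokenize(text)
    male = sum(1 for t in tokens if t in MALE_PRONOUNS)
    female = sum(1 for t in tokens if t in FEMALE_PRONOUNS)
    return male, female


def words_after(text, pronouns):
    """Count the words that immediately follow any of the given pronouns."""
    tokens = tokenize(text)
    return Counter(nxt for cur, nxt in zip(tokens, tokens[1:]) if cur in pronouns)


def pronoun_distances(text, pronouns):
    """Return the gaps, in tokens, between consecutive pronoun occurrences."""
    tokens = tokenize(text)
    positions = [i for i, t in enumerate(tokens) if t in pronouns]
    return [b - a for a, b in zip(positions, positions[1:])]


sample = "She said he was late. She laughed, and he left."
male, female = pronoun_counts(sample)
ratio = male / female if female else float("inf")
print(ratio, words_after(sample, FEMALE_PRONOUNS), pronoun_distances(sample, MALE_PRONOUNS))
```

Running the same functions over every book in a corpus and aggregating the results is one plausible way to arrive at corpus-wide figures like those described above.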
As of Summer 2019, the work on this project has been forked into two repos:
- The website presenting our research: https://github.com/dhmit/gender_novels_site
- The Gender Analysis Toolkit: https://github.com/dhmit/gender_analysis
If you would like to contribute to this project, please check out one of those follow-on projects!
To use our tools or contribute to the project, please view our guide to contributing, CONTRIBUTING.md. It includes information on how to install the tools we used as well as style guidelines for adding code. We are open to contributions and would love to see other people’s ideas, thoughts, and additions to this project, so feel free to leave comments or make a pull request!
Navigating Gender / Novels
For anybody who wants to use our code, here’s a little outline of where everything is.
In the gender_novels/gender_novels folder, there are six folders:
- analysis: programming files focused on textual analysis and research write-ups, including data visualizations and conclusions
- corpora: metadata on each book (author, title, publication year, etc.), along with sample data sets and instructions for generating a Gutenberg mirror
- deployment: code for the original Gender/Novels website. It has since been forked and replaced by https://github.com/dhmit/gender_novels_site; we keep the code here only for historical reasons.
- pickle_data: pickled data for various analyses, stored so that time-consuming computations do not have to be rerun
- testing: files for code tests
- tutorials: tutorials used by the lab to learn about various technical subjects needed to complete this project
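The idea behind the pickle_data folder, saving the result of a slow analysis so it can be reloaded instead of recomputed, can be sketched as follows. This is an illustrative pattern only: load_or_compute, slow_analysis, and the file name are hypothetical and are not taken from this repo.

```python
import pickle
import tempfile
from pathlib import Path


def load_or_compute(cache_path, compute):
    """Return the pickled result at cache_path, computing and saving it if absent."""
    cache_path = Path(cache_path)
    if cache_path.exists():
        with cache_path.open("rb") as f:
            return pickle.load(f)
    result = compute()
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    with cache_path.open("wb") as f:
        pickle.dump(result, f)
    return result


calls = []


def slow_analysis():
    """Stand-in for an expensive corpus-wide computation (hypothetical)."""
    calls.append(1)
    return {"male": 2, "female": 2}


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "pickle_data" / "pronoun_counts.pickle"
    first = load_or_compute(path, slow_analysis)   # computes and writes the cache
    second = load_or_compute(path, slow_analysis)  # reads the cache, no recomputation
```

Only unpickle files you trust, since loading a pickle can execute arbitrary code.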
If you need readily available methods for analyzing documents, the file you’ll most likely want is novel.py, which includes methods for loading and analyzing texts from the corpora. If you’d like to generate your own corpus rather than use the one provided in the repo, you’ll want to use corpus_gen.py. If you only need a specific part of our corpus, the method get_subcorpus() may be useful.
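As a rough illustration of what selecting part of a corpus by metadata can look like, here is a small standalone sketch. It does not call or reproduce get_subcorpus(), whose actual signature is documented in the code itself; the filter_by_metadata helper and the record keys (author, title, date) are assumptions chosen for this example.

```python
def filter_by_metadata(records, **criteria):
    """Return the records whose metadata matches every given key/value pair."""
    return [
        record
        for record in records
        if all(record.get(key) == value for key, value in criteria.items())
    ]


# Hypothetical metadata records; the repo's own metadata format may differ.
books = [
    {"author": "Jane Austen", "title": "Pride and Prejudice", "date": 1813},
    {"author": "Jane Austen", "title": "Emma", "date": 1815},
    {"author": "Mary Shelley", "title": "Frankenstein", "date": 1818},
    {"author": "Herman Melville", "title": "Moby-Dick", "date": 1851},
]

austen = filter_by_metadata(books, author="Jane Austen")
titles = [book["title"] for book in austen]
print(titles)
```

Passing several keyword arguments narrows the selection further, since a record must match all of them.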
This document was prepared by the MIT Digital Humanities Lab.