ecmonsen/gendered_words

Gendered Words Dataset

This dataset tags English words with the natural gender of the person or type of person the word refers to.

License

The dataset is licensed under Creative Commons Attribution 3.0, and WordNet content is licensed under the WordNet License.

Contents and Format

The dataset is a JSON file containing a single list of JSON objects, one per word sense (that is, a word paired with one of its definitions). A word with several definitions can therefore appear several times in the dataset. To disambiguate such entries, each JSON object contains a WordNet sense identifier and a definition (most definitions are taken from WordNet).
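Because one word can have several senses, it is often handy to group entries by word. This is a minimal sketch that assumes the file has already been parsed into a Python list of dicts (for example with json.load). The helper name senses_by_word and the placeholder word example_word are hypothetical and not part of the dataset.

```python
from collections import defaultdict


def senses_by_word(entries):
    """Group entries by word, since one word may appear once per sense."""
    grouped = defaultdict(list)
    for entry in entries:
        grouped[entry["word"]].append(entry["wordnet_senseno"])
    return dict(grouped)


# Hypothetical placeholder entries, for illustration only.
sample = [
    {"word": "example_word", "wordnet_senseno": "example_word.n.01"},
    {"word": "example_word", "wordnet_senseno": "example_word.n.02"},
    {"word": "artilleryman", "wordnet_senseno": "artilleryman.n.01"},
]
print(senses_by_word(sample))
# {'example_word': ['example_word.n.01', 'example_word.n.02'], 'artilleryman': ['artilleryman.n.01']}
```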

JSON format

  • word is the word itself.
  • wordnet_senseno is the WordNet sense identifier (for example artilleryman.n.01). It can be used to look the word up in WordNet, for example to find synonyms and related words.
  • gender is the natural gender of the person or people the word refers to. It currently takes one of four values: n (gender-neutral), m (male or masculine), f (female or feminine) or o (other).
  • gender_map contains mappings to equivalent words for other genders, if such words exist.
    • Keys are gender codes. Mappings currently exist for f (female/feminine), m (male/masculine) and n (gender-neutral). Other genders may be added in the future or by pull request.
    • Each value is a list of objects with a word field (the equivalent word) and a parts_of_speech field, which is either * for all parts of speech or a tag from the Penn Treebank POS tagset used by Python NLTK and many other NLP libraries. Examples are NN (noun) and NNP (proper noun).
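The field descriptions above can be turned into a rough sanity check for a single entry. This is a sketch, not a validator shipped with the repository: the function names are hypothetical, and treating only word and gender as mandatory is an assumption (manually added words such as pronouns may not map cleanly to a WordNet sense).

```python
# Gender codes as described in the field list; "o" is used for only a few words.
GENDER_CODES = {"n", "m", "f", "o"}


def parse_senseno(senseno):
    """Split an identifier such as 'artilleryman.n.01' into (lemma, pos, number)."""
    # Split from the right so that any dots inside the lemma itself are kept.
    lemma, pos, number = senseno.rsplit(".", 2)
    return lemma, pos, int(number)


def check_entry(entry):
    """Return a list of problems found in one entry; an empty list means none."""
    problems = []
    # Assumption: only word and gender are treated as mandatory here.
    for key in ("word", "gender"):
        if key not in entry:
            problems.append(f"missing key: {key}")
    if "gender" in entry and entry["gender"] not in GENDER_CODES:
        problems.append(f"unknown gender code: {entry['gender']!r}")
    for code, targets in entry.get("gender_map", {}).items():
        if code not in GENDER_CODES:
            problems.append(f"unknown gender_map key: {code!r}")
        for target in targets:
            # Each target object should name a word and its parts_of_speech tag.
            if not {"parts_of_speech", "word"}.issubset(target):
                problems.append(f"incomplete gender_map target under {code!r}")
    return problems


entry = {
    "word": "artilleryman",
    "wordnet_senseno": "artilleryman.n.01",
    "gender": "m",
    "gender_map": {
        "f": [{"parts_of_speech": "*", "word": "artillerywoman"}],
        "n": [{"parts_of_speech": "*", "word": "artilleryperson"}],
    },
}
print(parse_senseno(entry["wordnet_senseno"]))  # ('artilleryman', 'n', 1)
print(check_entry(entry))                       # []
```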

Example

[
	{
		"word": "artilleryman",
		"wordnet_senseno": "artilleryman.n.01",
		"gender": "m",
		"gender_map": {
			"f": [{"parts_of_speech": "*", "word": "artillerywoman"}],
			"n": [{"parts_of_speech": "*", "word": "artilleryperson"}]
		}
	},
	...
]
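A typical use of gender_map is looking up the counterpart of a word for another gender. The sketch below assumes an entry shaped like the example above; counterparts is a hypothetical helper, and passing pos=None to mean "any part of speech" is a choice made here, not a dataset convention.

```python
def counterparts(entry, target_gender, pos=None):
    """Return the words gender_map lists for target_gender.

    A target tagged '*' applies to every part of speech; any other tag is kept
    only when it equals pos (or when pos is None, meaning no filtering).
    """
    results = []
    for target in entry.get("gender_map", {}).get(target_gender, []):
        tag = target["parts_of_speech"]
        if pos is None or tag == "*" or tag == pos:
            results.append(target["word"])
    return results


entry = {
    "word": "artilleryman",
    "wordnet_senseno": "artilleryman.n.01",
    "gender": "m",
    "gender_map": {
        "f": [{"parts_of_speech": "*", "word": "artillerywoman"}],
        "n": [{"parts_of_speech": "*", "word": "artilleryperson"}],
    },
}
print(counterparts(entry, "f"))        # ['artillerywoman']
print(counterparts(entry, "n", "NN"))  # ['artilleryperson']
print(counterparts(entry, "m"))        # []
```

Returning a list rather than a single word mirrors the dataset, where each gender maps to a list of candidates.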

Contributing

  1. Fork the master repo and clone your fork.
  2. Using your favorite editor or IDE, make your changes.
  3. Do not reformat the JSON. Preserve the layout of one entry (one JSON object) per line.
  4. Add new words at the end of the file.
  5. Commit and push changes to your fork.
  6. Create a pull request.
  7. Maintainers of the master repo will review the pull request and accept or request changes.
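Steps 3 and 4 above can be checked locally before opening a pull request. This sketch assumes a file layout that the README does not spell out: an opening [ and a closing ] on their own lines, with every line in between holding exactly one object, optionally followed by a comma. The function name is hypothetical.

```python
import json


def check_one_object_per_line(text):
    """Return 1-based line numbers that do not hold exactly one JSON object.

    Assumes '[' and ']' sit on their own lines and every other non-blank line
    is a single object, optionally followed by a trailing comma.
    """
    bad = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if stripped in ("[", "]", ""):
            continue
        try:
            obj = json.loads(stripped.rstrip(","))
        except json.JSONDecodeError:
            bad.append(lineno)
            continue
        if not isinstance(obj, dict):
            bad.append(lineno)
    return bad


good = '[\n{"word": "a", "gender": "n"},\n{"word": "b", "gender": "m"}\n]\n'
print(check_one_object_per_line(good))  # []
```

This only checks the per-line layout; parsing the whole file with json.load is a complementary check that the list as a whole is still valid JSON.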

How the dataset was created

This dataset was initially created by dumping all of the hyponyms of the WordNet synset for "person". Each word was then manually tagged as gender-neutral (n), male or masculine (m), female or feminine (f) or other (o). Only a few words, such as "hermaphrodite", are tagged as "other".
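"All of the hyponyms" means the transitive closure of the hyponym relation, not just the direct children of "person". The sketch below computes that closure breadth-first over any child-lookup function, using a tiny hypothetical hierarchy in place of WordNet. With NLTK's WordNet reader, something like wn.synset("person.n.01").closure(lambda s: s.hyponyms()) should give the analogous set of synsets, assuming person.n.01 is the synset meant; the README does not say which tool was used.

```python
from collections import deque


def all_hyponyms(root, hyponyms):
    """Breadth-first transitive closure of a hyponym relation, excluding root.

    hyponyms is any function mapping a node to its direct hyponyms.
    """
    seen = {root}
    order = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in hyponyms(node):
            if child not in seen:  # skip nodes reachable by several paths
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


# Hypothetical toy hierarchy standing in for WordNet; not real WordNet data.
toy = {
    "person": ["serviceman", "adult"],
    "serviceman": ["artilleryman"],
    "adult": ["man", "woman"],
}
print(all_hyponyms("person", lambda n: toy.get(n, [])))
# ['serviceman', 'adult', 'artilleryman', 'man', 'woman']
```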

A few additional words that are not WordNet hyponyms of "person" were also added manually, including personal pronouns (he, she, him, her), possessive pronouns (his, hers, her) and a few adjectives such as male and female.

The dataset will be occasionally updated, and pull requests are welcome.

Natural Gender Definition

According to Wikipedia (as of 2019-10-31), "The natural gender of a noun, pronoun or noun phrase is a gender to which it would be expected to belong based on relevant attributes of its referent. This usually means masculine or feminine, depending on the referent's sex (or gender in the sociological sense)."

About

Dictionary of English words tagged with their natural gender.