A macroetymology engine that uses the Etymological Wordnet.


The Macro-Etymological Analyzer, v2.0.0

This is a rewrite of The Macro-Etymological Analyzer, a tool for etymological text analysis originally written as a web app on the LAMP stack.

New Features in v2.0.0

  • The web interface has been replaced with a command-line interface, which makes the MEA scriptable and its output machine-readable. A web front end to the command-line interface may be added in a future version.
  • It is now possible to analyze and compare multiple texts at a time.
  • Users can filter for only those language families they care about.


You can install this program with git and pip:

git clone https://github.com/JonathanReeve/macro-etym
cd macro-etym
pip install .

If you experience errors, you could try installing with pip3 instead:

pip3 install .


To compare the macro-etymologies of Moby Dick and Pride and Prejudice, first download the texts to your current working directory, then run:

macroetym moby-dick.txt pride-and-prejudice.txt

To see that data represented in a chart (experimental), append --chart. You may get better results, though, by outputting the data as a CSV (with --csv) and making your own chart in spreadsheet software.
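If you would rather drive the tool from a script than from a spreadsheet, a minimal Python sketch along the following lines may help. It only assembles and runs the command using flags listed in the help screen (--csv, --showfamilies, --lang). The helper names build_command and run_macroetym are hypothetical and not part of macroetym, and because the CSV column layout is not described here, the sketch returns the raw output text rather than parsing it.

```python
import subprocess


def build_command(filenames, csv=True, families=None, lang=None):
    """Assemble a macroetym argument list (hypothetical helper).

    Only flags shown by `macroetym --help` are used.
    """
    if not filenames:
        raise ValueError("at least one filename is required")
    cmd = ["macroetym"]
    if csv:
        cmd.append("--csv")
    if families:
        # --showfamilies takes a comma-separated list, e.g. Latinate,Germanic
        cmd += ["--showfamilies", ",".join(families)]
    if lang:
        # --lang takes an ISO 639-3 three-letter code
        cmd += ["--lang", lang]
    # Options come first, then the files, matching the usage line.
    return cmd + list(filenames)


def run_macroetym(filenames, **options):
    """Run macroetym (must be installed and on PATH) and return its stdout."""
    result = subprocess.run(
        build_command(filenames, **options),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if macroetym exits non-zero
    )
    return result.stdout


# Example call, assuming both texts are in the current working directory:
# print(run_macroetym(["moby-dick.txt", "pride-and-prejudice.txt"],
#                     families=["Latinate", "Germanic"]))
```

Keeping command construction separate from execution makes it possible to inspect the exact command before running it on a large batch of texts.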

To see a list of options, run:

macroetym --help

That should show you this screen:

Usage: macroetym [OPTIONS] FILENAMES...

  Analyzes a text(s) for the etymologies of its words, and tallies the
  words by origin language, and origin language family.

  --allstats           Get all etymological statistics about the file(s).
  --lang TEXT          Specify the language of the texts. Use ISO639-3 three-
                       letter language code. Default is English.
  --showfamilies TEXT  A comma-separated list of language families to show,
                       e.g. Latinate,Germanic
  --affixes            Don't ignore affixes. Default is to ignore them.
  --current            Don't ignore current language and its middle variants.
                       Default is to ignore them.
  -c, --csv            Print a machine-readable CSV instead of a pretty
  --chart              Make a pretty graph of the results. For one text, a
                       pie; for multiple, a bar.
  --verbose            Show debugging messages.
  --help               Show this message and exit.