A list of ~40K Classical Latin proper names
Switch branches/tags
Nothing to show
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
.gitignore
LICENSE
README.md
build_proper_names_list.py
proper_names.txt

README.md

About

The file proper_names.txt contains a newline-delimited file which contains all of the words in the PHI5 which are likely proper names (persons, places, etc.). The value of this list is that, since everything is a noun, it may be used as a default POS tagger for these unusual words.

build_proper_names_list.py shows how this file was made. proper_names.txt contains 40,683 unique, alphabetized words.

Important notes:

  • This list contains some words that are not proper nouns, and is currently being hand-checked to remove these. It is currently hand-checked to EOF.
  • Some processing artifacts remain in the text, esp forms w/ a trailing _ (underscore) character. These will be removed later via automatic processing.
  • Similarly, there are a number of doublets as a result of a lexeme + underscore + additional lexeme; e.g., 'Alexandro' vs 'Alexandro_erat'.
  • A certain number of forms with attached clitics (e.g., -que, -ve) are present in the corpus; the host lexemes of these clitics are often doublets of non-cliticized lexemes.
  • A number of apparent abbreviations have been left intact; e.g., 'Achil'.
  • There is a certain amount of orthographic doubling as the result of u/v or i/j spellings; e.g., 'Achivis' vs. 'Achiuis', or '-que' vs '-qve'. Similarly, in Greek words there are a number of doublets from variant y/u spellings; e.g., 'Amphitruone' vs. 'Amphitryone'.
  • Roman numeral notation has also been removed.

License

Copyright (c) 2014 Kyle P. Johnson, under the MIT License. See 'LICENSE' for details.