I have this big list of links to text stuff that I like, so I thought I'd make it into a repository.

Resources

UNIX

Python + programming

Tools

Text

  • Prepared example texts that I reference frequently in class
  • Project Gutenberg
  • Pywikipediabot, a Python library for jacking text from MediaWiki sites (like Wikipedia)
  • Common Crawl, "a repository of web crawl data that is openly accessible to everyone"
  • Corpus of Contemporary American English: search for frequencies and contexts of words and phrases in "the largest freely-available corpus of English." (Provides no API, unfortunately.)
  • Wordnik, a dictionary. The Wordnik API "lets you request definitions, example sentences, spelling suggestions, related words like synonyms and antonyms, phrases containing a given word, word autocompletion, random words, words of the day, and much more."
  • Corpus resources
  • Corpora, "a collection of small corpuses of interesting data for the creation of bots and similar stuff." By Darius Kazemi and various contributors.
  • WordNet is, most simply stated, a computer-readable thesaurus of the English language; if you're interested in non-English equivalents, see A Complete Multilingual WordNet List by Language

Poetics and Text Analysis