I have this big list of links to text stuff that I like, so I thought I'd make it into a repository.

Resources

UNIX

Python + programming

Tools

Text

  • Prepared example texts that I reference frequently in class
  • Project Gutenberg
  • Pywikipediabot, a Python library for jacking text from Mediawikis (like Wikipedia)
  • Common Crawl, "a repository of web crawl data that is openly accessible to everyone"
  • Corpus of Contemporary American English: search for frequencies and contexts of words and phrases in "the largest freely-available corpus of English." (Provides no API, unfortunately.)
  • Wordnik, a dictionary. The Wordnik API "lets you request definitions, example sentences, spelling suggestions, related words like synonyms and antonyms, phrases containing a given word, word autocompletion, random words, words of the day, and much more."
  • Corpus resources
  • Corpora, "a collection of small corpuses of interesting data for the creation of bots and similar stuff." By Darius Kazemi and various contributors.
  • WordNet is, most simply stated, a computer-readable thesaurus of the English language; if you're interested in non-English equivalents, see A Complete Multilingual WordNet List by Language

Poetics and Text Analysis
