An introductory workshop on generative text.

Machines Are Poets Too: An Introduction To Generative Text

By Brent Bailey. Accompanying slides can be viewed here.

This repo contains resources for learning about a small set of the myriad available methods for creating generative text with code. Below are a few code samples in the p5.js editor to toy with, plus a longer list of resources to explore - this is just a small taste of the wide world of text generation tools.

For some of the code samples (especially the ml5 stuff), it’s easiest if you have Python installed because running them in the p5 web editor gets tricky. Don’t worry, you don’t have to learn Python for this workshop! We’ll just be using it to run a local server. If you do have Python installed, run python -m SimpleHTTPServer 8000 (for Python 2) or python -m http.server 8000 (for Python 3) from the repo folder. From there, navigate to localhost:8000 in your browser (the samples all work best in Chrome).

Basic JavaScript Text Generation

A simple way of generating random text strings with p5 and a text corpus taken from the works of James Baldwin: p5 sketch.
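If you’d like to see the general idea without opening the sketch, here is a minimal plain JavaScript version (no p5 required). It is only a sketch of the technique, not the code from the linked sketch: the tiny corpus string and the function names tokenize, choice and randomLine are made up for illustration.

```javascript
// Tiny stand-in corpus (hypothetical; swap in any longer text you like).
const corpus = "the quick brown fox jumps over the lazy dog while the cat sleeps";

// Split on whitespace to get a list of words, dropping empty strings.
function tokenize(text) {
  return text.split(/\s+/).filter((w) => w.length > 0);
}

// Pick a random element; `rand` defaults to Math.random but can be swapped out.
function choice(items, rand = Math.random) {
  return items[Math.floor(rand() * items.length)];
}

// Build a line made of `length` randomly chosen words.
function randomLine(words, length, rand = Math.random) {
  const picked = [];
  for (let i = 0; i < length; i++) {
    picked.push(choice(words, rand));
  }
  return picked.join(" ");
}

const words = tokenize(corpus);
console.log(randomLine(words, 6));
```

Each run prints a different six-word line. Words that appear more often in the corpus get picked more often, which is why even this simple approach picks up a little of the source’s flavor.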

Tracery

Tracery is a JavaScript library used to create “grammars”: basically a top-level sentence structure plus sets of words that can fill each slot in that structure. We’ll use Allison Parrish’s p5 example here.
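To make the idea of a grammar concrete, here is a small sketch that borrows Tracery’s #symbol# notation but does not use the Tracery library itself. The expand function and the example rules are hypothetical, and real Tracery features like modifiers are not handled here.

```javascript
// A hypothetical grammar: each key is a symbol, each value is a list of options.
const grammar = {
  origin: ["The #adjective# #noun# #verb# the #noun#."],
  adjective: ["quiet", "electric", "hungry"],
  noun: ["poet", "machine", "river"],
  verb: ["remembers", "rewrites", "follows"],
};

// Replace every #symbol# with a randomly chosen option, recursively,
// so that options can themselves contain more #symbols#.
function expand(rules, text, rand = Math.random) {
  return text.replace(/#(\w+)#/g, (match, symbol) => {
    const known = Object.prototype.hasOwnProperty.call(rules, symbol);
    if (!known) return match; // leave unknown symbols untouched
    const options = rules[symbol];
    const pick = options[Math.floor(rand() * options.length)];
    return expand(rules, pick, rand);
  });
}

console.log(expand(grammar, "#origin#"));
```

The top-level sentence structure lives in origin, and everything else is a word list. Adding more options to any list makes the output more varied without touching the code.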

Word vectors with ml5

Word vectors are basically a way of representing words as lists of numbers, so that math can be used to measure how similar different words are. ml5 has a simple-to-use model built on top of TensorFlow.js that we’ll use here. The code sample is located in the word2vec folder.
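One common way to measure that similarity is cosine similarity, sketched below in plain JavaScript. This is not the ml5 API: the three-number vectors are invented for illustration (real models use hundreds of dimensions), and the helper names are hypothetical.

```javascript
// Made-up 3-dimensional vectors, for illustration only.
const vectors = {
  cat: [0.9, 0.8, 0.1],
  dog: [0.8, 0.9, 0.2],
  car: [0.1, 0.2, 0.9],
};

// Sum of the element-wise products of two equal-length vectors.
function dot(a, b) {
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

// Length of a vector.
function magnitude(a) {
  return Math.sqrt(dot(a, a));
}

// 1 means "pointing the same way", 0 means "unrelated directions".
function cosineSimilarity(a, b) {
  return dot(a, b) / (magnitude(a) * magnitude(b));
}

// Rank every other word in the table by similarity to `word`.
function nearest(word, table) {
  return Object.keys(table)
    .filter((other) => other !== word)
    .map((other) => ({
      word: other,
      score: cosineSimilarity(table[word], table[other]),
    }))
    .sort((x, y) => y.score - x.score);
}

console.log(nearest("cat", vectors));
```

With these toy numbers, dog ranks above car as the nearest neighbor of cat, because the cat and dog vectors point in almost the same direction.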

Rita.js

Rita.js is an incredible tool for any kind of computational work with text, but we’ll be focusing specifically on some potential generative applications of it.

If you want to mess around with the examples, you may find its documentation useful, as well as its list of Penn part-of-speech tags.

These examples are made with Rita’s “full” lexicon - if you get into doing more serious work with it, you may want a smaller version.

Code samples are located in the rita folder. A few of them are also online: intro showing how to extract parts of speech, rhyme generator, Markov chains.
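For a feel of what a Markov chain is doing under the hood, here is a word-level sketch in plain JavaScript. It does not use RiTa’s own Markov tools; buildChain, generate and the sample sentence are hypothetical names and data chosen for illustration.

```javascript
// Build a table mapping each word to the list of words that follow it.
function buildChain(text) {
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const chain = new Map();
  for (let i = 0; i < words.length - 1; i++) {
    if (!chain.has(words[i])) chain.set(words[i], []);
    chain.get(words[i]).push(words[i + 1]);
  }
  return chain;
}

// Walk the table: start at `start`, then repeatedly pick a random follower.
// Stops at `maxWords` or when the current word has no recorded followers.
function generate(chain, start, maxWords, rand = Math.random) {
  const output = [start];
  let current = start;
  while (output.length < maxWords && chain.has(current)) {
    const followers = chain.get(current);
    current = followers[Math.floor(rand() * followers.length)];
    output.push(current);
  }
  return output.join(" ");
}

const sample = "the sea is calm and the sea is wide and the sky is calm";
const chain = buildChain(sample);
console.log(generate(chain, "the", 10));
```

Because followers are stored with repeats, a word that follows “the” twice in the source is twice as likely to be picked. A bigger corpus gives the chain more paths to wander down.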

LSTM (CharRNN)

CharRNN is an LSTM (Long Short-Term Memory) neural network available in the ml5 library. RNNs are, uh, hard to explain, but there’s more information in the slides and resources. You can find code samples in the char-rnn folder, or play with a model I trained on James Baldwin here.
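One piece that is easy to show is the last step: a character-level model produces a score for each possible next character, and the generator samples one of them. The sketch below covers only that sampling step in plain JavaScript. There is no neural network here, the scores are invented, and softmax and sampleIndex are hypothetical helper names rather than part of ml5.

```javascript
// Turn raw scores into probabilities that sum to 1.
// A lower temperature sharpens the distribution toward the top score.
function softmax(scores, temperature = 1) {
  const scaled = scores.map((s) => s / temperature);
  const max = Math.max(...scaled); // subtract the max for numerical stability
  const exps = scaled.map((s) => Math.exp(s - max));
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / total);
}

// Pick an index at random, weighted by the given probabilities.
function sampleIndex(probabilities, rand = Math.random) {
  const r = rand();
  let cumulative = 0;
  for (let i = 0; i < probabilities.length; i++) {
    cumulative += probabilities[i];
    if (r < cumulative) return i;
  }
  return probabilities.length - 1; // guard against rounding error
}

// Hypothetical scores a trained model might assign to each next character.
const alphabet = ["a", "b", "c"];
const scores = [2.0, 1.0, 0.1];
const probs = softmax(scores, 0.5);
console.log(alphabet[sampleIndex(probs)]);
```

Repeating this step, feeding each chosen character back in as input, is how text gets generated one character at a time.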

Resources

“Simple” Stuff

How To Make A Dadaist Poem

Botnik - a predictive keyboard generator.

Dadaist NLP text from in-class exercise

ConceptNet, a semantic network

Cheap Bots Done Quick - quick bot creation with Tracery.

Free AIML Bots

Runway ML - basically Photoshop for AI - plug and play machine learning with minimal setup. Currently in free and open beta!

Markov Chains

A visual explanation of Markov chains

Towards Data Science on Markov chains - honestly just start reading Towards Data Science if you’re into ML/AI.

Machine Learning

Allison’s Understanding Word Vectors - it’s in Python, but the principles are the same.

Publicly available NLP datasets - if you don’t want to deal with making your own, there are tons out there!

Recurrent Neural Networks Tutorial - the most beginner-friendly explanation of RNNs I’ve seen

The Unreasonable Effectiveness Of Recurrent Neural Networks - really useful intro to RNNs

Better Language Models and Their Implications (GPT-2) - scary but cool!

GPT-2 - source code for GPT-2

Can A Machine Write For The New Yorker? - article on GPT-2 and the future of writing.

Data Gathering And Prep

You can use the Python script clean_text.py to quickly remove non-alphanumeric characters from a file. Simply type python clean_text.py {your_file_name} in your working directory.
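If you’d rather stay in JavaScript, the same kind of cleanup can be sketched with a regular expression. This is not a port of clean_text.py: cleanText is a hypothetical name, and keeping whitespace while dropping everything else that isn’t an ASCII letter or digit is an assumption about what you’d want.

```javascript
// Remove anything that is not an ASCII letter, digit, or whitespace,
// then collapse runs of spaces and tabs and trim the ends.
function cleanText(text) {
  return text
    .replace(/[^A-Za-z0-9\s]/g, "")
    .replace(/[ \t]+/g, " ")
    .trim();
}

console.log(cleanText("Hello, world! It's 2019..."));
// prints "Hello world Its 2019"
```

Note that this also strips accented letters and apostrophes, so check the result before feeding it to a model.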

Project Gutenberg - hella books.

Libgen - The Pirate Bay for books. I’m not linking here because they change URLs constantly, but look hard and you’ll find it. There’s a scraper for it in the post linked below.

My quick and dirty web scraping resources - contains links to a couple potentially useful scrapers I’ve written, plus a guide to cracking DRM protection if you’d like to get a protected corpus (fOr ReSeArcH pUrpOseS OnLy).

Effectively Pre-Processing Text Data

Other Fun Stuff

NaNoGenMo - National Novel Generation Month. Every November. Generate a novel!
