Quick-and-dirty scripts for performing a wavelet transform in word-vector space, used for NaNoGenMo 2017 on Virginia Woolf's The Waves

danuep/nanogenmo2017

Scripts to generate THE WAVELETS

A National Novel Generation Month 2017 entry

The idea of this project was to take the word-vector-space projection of a novel and perform a wavelet transform on it. The results of wavelet transforms on images can produce an effect of ghostly, echoing expansion, and I thought it would be worth seeing what that might look like on text.

Dependencies

Steps

  1. Preprocess
    • Save the text as waves.txt in the same directory as the scripts.
    • The text of The Waves at the above link has special characters (smart quotes and em dashes) which are not represented in novel-vectors-word2vec. I did a lazy search-and-replace to convert those to ', '', and --.
  2. groovy tokenize.groovy
  3. groovy encode-transform.groovy
  4. python3 vecs2text.py

Output will be in the-wavelets.txt.
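
The search-and-replace from the preprocessing step could be scripted along these lines. This is only a sketch: `normalize_punctuation` is a hypothetical helper that is not part of this repo, and the exact replacement characters (in particular, a straight double quote for curly double quotes) are an assumption that should be adjusted to whatever the word-vector vocabulary actually contains.

```python
# Hypothetical preprocessing helper; not part of the original scripts.
# The mapping below is an assumption based on the description above.
REPLACEMENTS = {
    "\u2018": "'",   # left single smart quote
    "\u2019": "'",   # right single smart quote
    "\u201c": '"',   # left double smart quote
    "\u201d": '"',   # right double smart quote
    "\u2014": "--",  # em dash
}


def normalize_punctuation(text):
    """Replace smart quotes and em dashes with plain ASCII equivalents."""
    for fancy, plain in REPLACEMENTS.items():
        text = text.replace(fancy, plain)
    return text


if __name__ == "__main__":
    # Rewrites waves.txt in place; keep a backup of the original text.
    with open("waves.txt", encoding="utf-8") as f:
        cleaned = normalize_punctuation(f.read())
    with open("waves.txt", "w", encoding="utf-8") as f:
        f.write(cleaned)
```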

Results

The results are slightly entertaining, but iffy as a legible text. It's reassuring to see that the average value for both texts (the first word in the transformed text) roughly aligns with the mood of the text. Subsequent words represent the oscillations between sections of text, which I think is harder to intuit than pixel values representing oscillations in sections of an image.

Specifically, the second through fourth words of The Wavelets indicate that the second half of The Waves is more Grandmother than the first half, the second quarter (and to a lesser extent the last quarter) is more Mas'r than the first, and the last quarter (and to a lesser extent the second quarter) is more Meadows than the third.
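
To make that ordering concrete, here is a minimal sketch of a Haar-style transform over a sequence of vectors. It assumes a Haar wavelet with an averaging convention and a "later minus earlier" sign for the differences; neither is confirmed by the scripts, and `haar_transform` is an illustrative name rather than a function in this repo. With this layout, the first output vector is the overall average, the second compares the two halves, and the third and fourth compare the quarters within each half.

```python
def haar_transform(vectors):
    """Averaging Haar transform of a list of equal-length vectors.

    Assumes len(vectors) is a power of two. Returns the overall average
    first, followed by difference vectors ordered from coarsest to finest.
    """
    n = len(vectors)
    if n == 0 or n & (n - 1):
        raise ValueError("number of vectors must be a power of two")

    current = [list(v) for v in vectors]
    details = []  # difference vectors, coarsest level first
    while len(current) > 1:
        averages, diffs = [], []
        for i in range(0, len(current), 2):
            a, b = current[i], current[i + 1]
            averages.append([(x + y) / 2 for x, y in zip(a, b)])
            # Sign convention (an assumption): later section minus earlier.
            diffs.append([(y - x) / 2 for x, y in zip(a, b)])
        details = diffs + details  # coarser levels go in front
        current = averages
    return current + details


# Toy 2-dimensional "word vectors" standing in for a four-token text.
toy = [[1.0, 0.0], [3.0, 0.0], [5.0, 2.0], [7.0, 2.0]]
coeffs = haar_transform(toy)
# coeffs[0] is the mean of all four vectors,
# coeffs[1] is half the difference between the second and first halves,
# coeffs[2] and coeffs[3] compare the quarters within each half.
```

Mapping each output vector back to its nearest word in the vocabulary would then yield the transformed text.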

Future work

  • Accept command line arguments for file names
  • Smarter tokenization
  • Low-pass filtered output for "summarization"
  • Merging two works by averaging in the frequency domain and wavelet reconstruction
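
The last two items could be prototyped roughly as follows. This is a sketch under assumptions, not the planned implementation: it assumes Haar coefficients stored as an overall average followed by half-differences (later minus earlier) from coarsest to finest, and the names `inverse_haar`, `low_pass`, and `merge` are hypothetical.

```python
def inverse_haar(coeffs):
    """Rebuild the vector sequence from averaging-Haar coefficients.

    Assumes coeffs[0] is the overall average and the remaining entries are
    half-differences (later minus earlier), ordered coarsest to finest.
    """
    n = len(coeffs)
    if n == 0 or n & (n - 1):
        raise ValueError("number of coefficients must be a power of two")

    current = [list(coeffs[0])]
    pos = 1
    while len(current) < n:
        details = coeffs[pos:pos + len(current)]
        pos += len(current)
        expanded = []
        for avg, diff in zip(current, details):
            expanded.append([a - d for a, d in zip(avg, diff)])  # earlier part
            expanded.append([a + d for a, d in zip(avg, diff)])  # later part
        current = expanded
    return current


def low_pass(coeffs, keep):
    """Zero every coefficient after the first `keep`, dropping fine detail."""
    return [list(c) if i < keep else [0.0] * len(c)
            for i, c in enumerate(coeffs)]


def merge(coeffs_a, coeffs_b):
    """Average two coefficient sequences of the same shape."""
    return [[(x + y) / 2 for x, y in zip(a, b)]
            for a, b in zip(coeffs_a, coeffs_b)]


# Keeping only the two coarsest coefficients flattens each half of the
# sequence to that half's average.
coeffs = [[4.0], [2.0], [1.0], [1.0]]
summary = inverse_haar(low_pass(coeffs, 2))
```

Because the transform is linear, a plain average of coefficients is equivalent to averaging the two vector sequences position by position; weighting coarse and fine coefficients differently for each work would be needed for the merge to differ from that.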
