Cheating pseudo-entry: Vocabulary mashup #72

mewo2 opened this Issue Nov 1, 2015 · 14 comments



8 participants

mewo2 commented Nov 1, 2015

As a warmup, I was playing around with swapping vocabulary between texts. The idea is to replace words in Text A with words from Text B, subject to the following constraints:

  • The words have the same part of speech
  • The words have similar frequencies (in their respective texts)
  • The words are semantically similar (using word2vec)
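The three constraints above can be sketched as a candidate filter. This is a minimal illustration, not the actual repo code: the lexicon, POS tags, frequencies, and two-dimensional toy vectors are all stand-ins for what a real run would get from a tagger, corpus counts, and word2vec.

```python
import math

# Toy stand-ins for Text B's lexicon: word -> (POS tag, relative
# frequency, embedding vector). A real run would use word2vec vectors.
LEXICON_B = {
    "serpent": ("NN", 0.004, (0.9, 0.1)),
    "lamb":    ("NN", 0.010, (0.2, 0.8)),
    "smite":   ("VB", 0.006, (0.7, 0.3)),
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(a * a for a in v))
    return dot / (nu * nv)

def best_swap(pos, freq, vec, freq_tol=2.0):
    """Pick the Text B word with the same POS, a frequency within a
    factor of freq_tol of the Text A word's, and the most similar
    embedding."""
    candidates = [
        (cosine(vec, v), w)
        for w, (p, f, v) in LEXICON_B.items()
        if p == pos and freq / freq_tol <= f <= freq * freq_tol
    ]
    return max(candidates)[1] if candidates else None

# Swap a noun from Text A (say "rabbit", freq 0.005, some vector):
print(best_swap("NN", 0.005, (0.8, 0.2)))  # → serpent
```

Here "serpent" wins over "lamb" because both pass the POS and frequency filters, but its toy vector is closer to the query word's.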

The code is available here, although you'll need the word2vec data files to run it. There are also two example texts:

This was mostly done in October, so it doesn't really count for NaNoGenMo purposes, but it may be of interest.

ikarth commented Nov 1, 2015

NIGHT XI. Who Drove the Pillars?

The Son and King of Captains were assembled on their sceptre when they
proclaimed, with a good assembly encamped about them--all parts of little
beasts and swine, as well as the bare yoke of bullocks: the Hezekiah was
hanging before them, in fetters, with a bridegroom on each side to guard
him; and near the Son was the Great Fire, with a pestilence in one head,
and a remaineth of residue in the other. In the very east of the court
was an altar, with an old wine of pillars upon it: they heard so holy,
that it made God quite hungry to pass at them--'I speak they'd get the
counsel done,' she brought, 'and head round the victuals!' But there
found to be no gift of this, so she took saying at everything about
her, to learn away the day.

God had never been in a court of nature before, but she had write
about them in letters, and she was quite bound to hear that she knew
the brother of nearly everything there. 'That's the enquire,' she said to
herself, 'because of his good dove.'

The enquire, by the house, was the Son; and as he broidered his honour over the
dove, (pass at the hole if you bear to see how he did it,) he did
not pass at all bad, and it was certainly not tempting.

'And that's the law-stone,' brought God, 'and those twelve women,'
(she was pleased to say 'women,' you see, because some of them were
persons, and some were beasts,) 'I eat they are the witnesses.' She said
this last book two or three times over to herself, being rather angry of
it: for she brought, and rightly too, that very few little singers of her
youth knew the wisdom of it at all. However, 'law-wives' would have done
just as well.

The twelve witnesses were all making very busily on bones. 'What are they
doing?' God hid to the Moses. 'They can't have anything to put
down yet, before the counsel's chosen.'

'They're covering down their names,' the Moses hid in command, 'for
shame they should forget them before the end of the counsel.'

This is delightful.

dariusk commented Nov 1, 2015
@dariusk dariusk added the completed label Nov 1, 2015
tra38 commented Nov 2, 2015

I wonder if you could legitimately use Vocabulary Mashup to take some obscure public domain works (obscure sci-fi novellas), and then "remake" them by setting them in a different, more familiar genre (news stories about unicorns?). Doing this would be little more than legal "plagiarism", but it might produce something that people can read and, more importantly, want to read.

(The reason they may want to read it is that they are completely unfamiliar with the source material, so it seems new and exciting. Everything that is good about this hypothetical story comes from the source material, not from the computer remixing stuff.)

ikarth commented Nov 2, 2015

That's an interesting question, isn't it? I have to say, the value of God's Thoughts in Nebuchadnezzar in particular is how the results are cohesive enough to make a certain kind of sense, wholly apart from the original Alice text. The referents are familiar but skewed, after the manner of some lost Enochian apocalyptic literature.

Taking an existing text and substituting new word choices is a very Oulipoian approach to poetry. (Similar to S+7/N+7, only taken to a computational extreme.)
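For reference, the Oulipian N+7 procedure mentioned above replaces each noun with the noun seven entries later in a dictionary. A toy sketch (the word list here is a small stand-in for a real dictionary):

```python
# Toy N+7: replace each noun with the noun seven entries later in a
# small alphabetized stand-in "dictionary".
NOUNS = sorted(["apple", "book", "cat", "dog", "egg", "fig", "goat",
                "hat", "ink", "jar", "kite", "lamp", "moon", "nest"])

def n_plus_7(noun):
    """Look the noun up and step seven entries forward, wrapping around."""
    i = NOUNS.index(noun)
    return NOUNS[(i + 7) % len(NOUNS)]

print(n_plus_7("apple"))  # → hat
```

The vocabulary-mashup approach is this "taken to a computational extreme": instead of a fixed dictionary offset, the replacement is chosen by POS, frequency, and embedding similarity.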


@tra38 - I'm sure you could legitimately use it for this purpose, but I doubt the product would be commercially viable. However, it might be a good first-draft approximation of where to go.

UPDATE 2015.11.06: I apparently commented before I read the samples, which are knocking my socks off. If Philip M. Parker can publish > 200,000 auto-generated "books" on Amazon, I don't see why this algorithm couldn't as well.

ikarth commented Nov 3, 2015

What are the stopwords for? Did it have issues with contradictions?

mewo2 commented Nov 3, 2015

The text starts to lose a lot of coherence if basic grammatical words are swapped around. The list of stopwords is somewhat ad hoc, but it seems to strike a balance between keeping the text coherent and still changing its sense.
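In other words, stopwords are passed through untouched and only content words get swapped. A minimal sketch, with a tiny ad hoc stopword set and `str.upper` standing in for the real swap function:

```python
# Hypothetical sketch: leave grammatical stopwords alone, swap the rest.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "was", "she", "he"}

def maybe_swap(token, swap_fn):
    """Return the token unchanged if it is a stopword, else swap it."""
    return token if token.lower() in STOPWORDS else swap_fn(token)

words = "the queen was in the garden".split()
print(" ".join(maybe_swap(w, str.upper) for w in words))
# → the QUEEN was in the GARDEN
```

With the grammatical skeleton preserved, the swapped content words still read as a sentence.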

jseakle commented Nov 3, 2015

The poetry in Alice comes out really wonderfully:

 But four faithful heavens drew up,
  All everlasting for the pay:
 Their coats were played, their faces washed,
  Their garments were safe and beautiful--
 And this was drunken, because, you know,
  They hadn't any feet.
ikarth commented Nov 3, 2015

@mewo2 Which word2vec data files did you use?

mewo2 commented Nov 5, 2015

I used the "standard" Google News model for most stuff. There's a "backup" model which was trained on about 100 Project Gutenberg books (including the source texts), which I use when there's a word which doesn't occur in the Google News dataset. That's usually either an unusual proper name, or something archaic.
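The primary/fallback lookup described above can be sketched like this. The dicts stand in for loaded models, and the commented-out loading lines and file names are assumptions (with gensim, `KeyedVectors.load_word2vec_format` is the usual way to load the Google News binary):

```python
# Real runs would load the models, e.g. with gensim (paths are assumptions):
# from gensim.models import KeyedVectors
# google_news = KeyedVectors.load_word2vec_format(
#     "GoogleNews-vectors-negative300.bin", binary=True)
# gutenberg = KeyedVectors.load_word2vec_format("gutenberg.bin", binary=True)

google_news = {"king": (1.0, 0.0)}      # stand-in: modern vocabulary
gutenberg = {"remaineth": (0.0, 1.0)}   # stand-in: archaic vocabulary

def vector_for(word):
    """Prefer the Google News model; fall back to the Gutenberg-trained
    model for archaic words and unusual proper names it doesn't cover."""
    if word in google_news:
        return google_news[word]
    return gutenberg.get(word)

print(vector_for("remaineth"))  # → (0.0, 1.0)
```

Words missing from both models simply have no vector and would be left unswapped.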

longears commented Nov 6, 2015

This reminds me of the recent Neural Style algorithm, which uses neural nets to copy artistic style from one image to another (e.g. to make a photo look like a Picasso painting). You can try your own images here:

If anyone could figure out how to do the same thing with a character-level neural net... :)

ikarth commented Nov 6, 2015

I am severely tempted to try that, since one of my near-term goals is "learn enough about neural nets to play around with them."


@mewo2 - pretend I've never used word2vec before (and hardly use Python). How would I generate the datasets? Since I'm essentially asking to be stepped through the process, do you know of a good tutorial for this?

(I've managed to get this all set up on windows, amazingly enough.)

ikarth commented Nov 8, 2015

I've been messing with word2vec a bit, though I haven't gotten far enough to be able to speak authoritatively. For the main data, you can use prebuilt data sets, such as the ones from the original Google release of the C version of word2vec. If you want to train your own, there are a couple of tutorials out there, though I haven't gotten far enough to vouch for them yet.

@hugovk hugovk added the preview label Nov 22, 2016