Sentence-level Markov model (or, reconstructing Moby-Dick using a neural network) #99
Comments
The premise reminds me of [my phrase-chain project](https://github.com/enkiv2/misc/tree/master/phrasechain), which I used in NaNoGenMo 2016. It seems like your version probably has better results since it takes more words into account?
Thanks for sharing this! The two approaches could possibly be combined by running the neural network on phrases rather than sentences. The training script should work without modification on any linguistic unit (phrases, clauses, paragraphs, etc.)—the corpus just has to be prepared differently. Doing it at the phrase level might make the strangeness more immediate because you would have to read fewer words on average before getting to something that differs from the original text. I'm not sure how well the particular model I used would do at assembling phrases into syntactically correct sentences, though.
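For illustration, re-cutting the corpus at the phrase level might look something like the sketch below. The choice of delimiters and the regex are assumptions, not anything taken from the existing training script.

```python
# Hypothetical phrase-level corpus preparation: split each sentence at
# commas, semicolons, and colons rather than stopping at sentence
# boundaries. The delimiters chosen here are an assumption.
import re

def to_phrases(sentence):
    """Split a sentence into comma/semicolon/colon-delimited phrases."""
    parts = re.split(r"(?<=[,;:])\s+", sentence)
    return [p.strip() for p in parts if p.strip()]

print(to_phrases("It is a way I have of driving off the spleen, and regulating the circulation."))
# ['It is a way I have of driving off the spleen,', 'and regulating the circulation.']
```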
I split each chapter of *Moby-Dick* into sentences, then used a neural network to try to guess what order the sentences should appear in. I call the result [*Mboy-Dcki*](https://raw.githubusercontent.com/jeffbinder/sentence-level-markov/master/mboydcki.txt).
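A minimal sketch of that splitting step might look like this. The filename, the crude chapter split, and the use of NLTK's Punkt tokenizer are all assumptions rather than the repo's actual preprocessing.

```python
# Hypothetical preprocessing: split a plain-text copy of the novel into
# chapters, then each chapter into sentences with NLTK's Punkt tokenizer.
import nltk

nltk.download("punkt", quiet=True)
nltk.download("punkt_tab", quiet=True)  # older/newer NLTK use different resource names

with open("mobydick.txt", encoding="utf-8") as f:  # hypothetical filename
    raw = f.read()

# Crude chapter split on the "CHAPTER " headings used in the Gutenberg text.
chapters = [c.strip() for c in raw.split("CHAPTER ") if c.strip()]
sentences_by_chapter = [nltk.sent_tokenize(ch) for ch in chapters]
```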
This is essentially a Markov chain model that works at the level of sentences rather than words or tokens. Such a model cannot be trained directly, so I created an encoder-decoder-type recurrent neural network that takes in the last 25 characters of a sentence and tries to guess what the first 25 characters of the next sentence will be. I then used this network to compute the probabilities for each pair of sentences.
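A rough sketch of that idea (not the repo's actual architecture) is below: a character-level encoder-decoder trained on end-of-sentence/start-of-next-sentence windows, plus a scoring function for sentence pairs. The layer sizes, one-hot encoding, space padding, and training settings are all assumptions for illustration.

```python
# Sketch of a character-level encoder-decoder: read the last 25 characters
# of one sentence, predict a distribution over the first 25 characters of
# the next. Everything here (sizes, padding, epochs) is illustrative.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

WINDOW = 25
all_text = "".join(s for ch in sentences_by_chapter for s in ch)
chars = sorted(set(all_text) | {" "})
char_to_idx = {c: i for i, c in enumerate(chars)}
n_chars = len(chars)

def encode(s, from_end):
    """One-hot encode a fixed-length window from the start or end of a sentence."""
    window = s[-WINDOW:].rjust(WINDOW) if from_end else s[:WINDOW].ljust(WINDOW)
    x = np.zeros((WINDOW, n_chars), dtype="float32")
    for i, c in enumerate(window):
        x[i, char_to_idx[c]] = 1.0
    return x

# Training pairs: (end of sentence i) -> (start of sentence i+1), within each chapter.
pairs = [(a, b) for ch in sentences_by_chapter for a, b in zip(ch, ch[1:])]
X = np.stack([encode(a, from_end=True) for a, _ in pairs])
Y = np.stack([encode(b, from_end=False) for _, b in pairs])

model = keras.Sequential([
    keras.Input(shape=(WINDOW, n_chars)),
    layers.LSTM(256),                     # encoder: summarize the sentence ending
    layers.RepeatVector(WINDOW),          # feed that summary to every decoder step
    layers.LSTM(256, return_sequences=True),
    layers.TimeDistributed(layers.Dense(n_chars, activation="softmax")),
])
model.compile(optimizer="adam", loss="categorical_crossentropy")
model.fit(X, Y, batch_size=64, epochs=20)

def pair_log_prob(sent_a, sent_b):
    """Log-probability the model assigns to sent_b's opening, given sent_a's ending."""
    probs = model.predict(encode(sent_a, from_end=True)[None], verbose=0)[0]
    target = encode(sent_b, from_end=False)
    return float(np.sum(target * np.log(probs + 1e-9)))
```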
It actually sort of works—at the very least, it picks the right sentence a little more often than chance would dictate. But the point, of course, is in the interesting ways it fails.
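One simple way to turn those pairwise scores into a reordered chapter is a greedy chain: start from the chapter's opening sentence and repeatedly append the unused sentence the model rates most likely to come next. This greedy assembly is just one plausible reading of the approach; the actual generation procedure is in the repo.

```python
# Greedy assembly sketch (an assumption about the generation step): start
# from the first sentence and repeatedly pick the highest-scoring unused
# successor according to pair_log_prob from the sketch above.
def reorder(sentences):
    remaining = list(range(1, len(sentences)))
    order = [0]
    while remaining:
        prev = sentences[order[-1]]
        best = max(remaining, key=lambda j: pair_log_prob(prev, sentences[j]))
        order.append(best)
        remaining.remove(best)
    return [sentences[i] for i in order]

rearranged_chapter = " ".join(reorder(sentences_by_chapter[0]))
```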
Code and a more detailed explanation are [here](https://github.com/jeffbinder/sentence-level-markov).