Compiler pipeline + writers' techniques = a "proper novel" ::blink:: #11

Open
cpressey opened this Issue Oct 26, 2015 · 89 comments

@cpressey

Novel: A Time for Destiny: The Illustrious Career of Serenity Starlight Warhammer O'James during her First Three Years in the Space Fighters
Code: on Bitbucket
Write-ups: Overview of a "Story Compiler"


Observation: It is very difficult for the average person to read a typical NaNoGenMo-generated novel in its entirety, from beginning to end.

It's because the brain begins to tire, right? It gets all "I see what you did there" and balks at facing yet more unpredictable stuff.

Goal: To write a generator that generates a novel that does not succumb to this effect.

You still might not be able to read the resulting novel to the end, but, if you stop reading after the first 2 chapters, it should be because the novel is just plain bad, not because its aura of generativeness is burning a hole in your attention span.

Downloading an existing novel from Project Gutenberg, or similarly trivial approaches, don't count.

This is, of course, a completely unrealistic goal. But one must have some goal, mustn't one?

@TheCommieDuck

That pretty much sums up my thoughts/goals too.

Is it really so unrealistic?

@cpressey

Well, I guess we'll see, but yes I think it's incredibly unrealistic.

@ikarth
ikarth commented Oct 26, 2015

Perfect! And definitely a worthy goal.

@cpressey

I should maybe qualify those statements a bit.

I do think the goal I stated is highly unrealistic, certainly with the techniques that I'm personally prepared to use. But the space of possible techniques is vast, so who knows?

What I'm sort of getting at by choosing that goal is this:

In 2013, I tried generating a "proper novel". Last year, I did a bunch of experiments closer to the so-called "conceptual writing" side of things. This year, I'm returning to the "proper novel", however quixotic any such attempt might be.

Given that I've stated a goal that I admit is unrealistic, I suppose I do not expect myself to actually achieve it. But it will be interesting to see how I fail.

At the same time, one need not have only one goal, so...

After last NaNoGenMo, around January of this year, I started thinking a lot about how people write stories. I did a lot of research (if you can call reading article after article on TVTropes research) and I came to the conclusion that there are certainly some story-writing techniques that can be approximated with algorithms.

So, one of my secondary goals is: To implement one or more story-writing techniques that human writers use.

This is a much more realistic goal, I think.

Heck, even The Swallows had a MacGuffin, but it wasn't really developed. I'd like to go a bit beyond that.

I'll probably continue to expand on these thoughts in future posts to this issue.

@MichaelPaulukonis

In 2013, I tried generating a "proper novel".

::blink blink::

[updated as I had not pasted what I wanted to have pasted]

@brianfay

I can imagine a computer-generated book being easier to read than something like Naked Lunch or Finnegans Wake.

@cpressey

@YottaSecond

I can imagine a computer-generated book being easier to read than something like Naked Lunch or Finnegans Wake.

Mmmaybe...

But I wager that if someone stops reading Finnegans Wake after chapter 2 it's almost certainly not because their brain went all "I see what you did there."

@dariusk dariusk changed the title from There is no way I'll be able to build this in four weeks, oh wait, that's kind of the point isn't it to A fully end-to-end readable novel Oct 27, 2015
@dariusk
Owner
dariusk commented Oct 27, 2015

Hi, I'm going through and updating the titles on issues to make them more specific. Feel free to edit my edit if it's not to your liking. This is to make browsing issues a lot more pleasant.

@cpressey cpressey changed the title from A fully end-to-end readable novel to Compiler pipeline + writers' techniques = a "proper novel" ::blink:: Oct 28, 2015
@cpressey

While there will certainly be similarities, my third goal is to not just end up re-writing The Swallows. I was looking through that code yesterday, seeing how much of it could be re-used. Very little, I think.

My background is programming languages, so I have a hard time not seeing a story generator as a kind of compiler.

A typical compiler is structured as a pipeline with a number of phases. The process for writing a story is much messier, but in a broad sense it too is a "pipeline", from idea to outline to draft to finished work.

In fact a story-writing pipeline is in some ways the inverse of a compiler pipeline.

A compiler takes a readable text and turns it into an incoherent blob. A writer takes an incoherent blob and turns it into a readable text.

One of the first things a compiler often does is strip comments from the source code and throw them away, because they're not crucial to the result. One of the last things a writer might do is add commentary that's not crucial to the story.

One of the last things a compiler does is optimize the generated code to make it shorter and more efficient. One of the first things a writer might do is complicate the plot to make it longer and more interesting.

Somewhere in the middle of the compiler, it might check that the program does not contain certain errors, like assigning a string value to an integer variable. Somewhere in the middle of writing a story, a writer might check that the characters are not doing something that, in that scene, would not be possible.

And so forth. The similarities really are rather remarkable.
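Continuing the analogy, the inverted pipeline might be sketched like this. It is a toy with invented phase names and content, not the actual generator: an anti-optimizer that lengthens the plot, a continuity "type check", and a final code-generation step that emits prose.

```python
# A toy story "compiler": each phase rewrites the plot, mirroring
# the phases of a real compiler (but inverted, as described above).

def seed_plot():
    # The barest possible outline: a conflict and its resolution.
    return [("conflict", "hero vs rival"), ("resolution", "hero wins")]

def complicate(plot):
    # Inverse of an optimizer: make the plot longer, not shorter.
    return plot[:1] + [("setback", "rival gains the upper hand")] + plot[1:]

def check_continuity(plot):
    # Analogue of type checking: every conflict must get a resolution.
    kinds = [kind for kind, _ in plot]
    assert "resolution" in kinds, "unresolved conflict"
    return plot

def render(plot):
    # Final "code generation": plot beats become sentences.
    return " ".join(f"Then came the {kind}: {what}." for kind, what in plot)

def compile_story():
    plot = seed_plot()
    for phase in (complicate, check_continuity):
        plot = phase(plot)
    return render(plot)

print(compile_story())
```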

@enkiv2
enkiv2 commented Oct 28, 2015

Of course, since most of these operations are generative, if a single pass fluffs out a summary and checks continuity, you could just run the same pass over and over on an arbitrarily small summary until you had 50k words ;-)

@cpressey

Sure, except (continuing with the compiler analogy) most compilers aren't designed to take as input that which they generate as output.

I certainly wasn't planning on building anything that could read a novel!

@cpressey

(Excuse my "designblogging" but it helps give me something to do to stop myself from jumping the gun and starting too early! Am chomping at the bit, can you tell? Trying to keep each post reasonably short.)

If the "novel compiler" doesn't take a written text as input, I suppose that raises the question of what it does take as input.

One answer could be "nothing, it's just a generator, you just run it," which might be literally true, but it doesn't really answer the question.

A more satisfying answer would be that it takes an outline of a plot, in some kind of data format, as input, even if that outline is hardcoded or randomly generated in the compiler itself.

It then refines that plot by iteratively rewriting it, stepwise, into increasingly detailed plots. Once it has a detailed enough plot, it rewrites that into a series of events, and in the end rewrites those into sentences. I suppose this is a top-down, plot-driven approach, as @TheCommieDuck described it.

About these plots... (kind of thinking out loud here...)

The "seed plot" that the compiler starts with could be as skeletal as The Hero's Journey.

Or maybe even more basic, like, the "null story":

Once upon a time, they lived happily ever after. The end.

From there, you just keep inserting subplots into it. I'm still weighing ideas about exactly how to accomplish this process. I might write more about it later.
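The subplot-insertion idea can be sketched as follows. The subplot templates and the splice rule here are placeholders, not the actual design: start from the null story and repeatedly splice a subplot between existing beats until the story is long enough.

```python
import random

# Grow the "null story" by repeatedly inserting subplots into it.
NULL_STORY = ["Once upon a time", "they lived happily ever after", "The end"]

SUBPLOTS = [
    ["a stranger arrived", "trouble followed", "the trouble was overcome"],
    ["something precious was lost", "a search began", "it was found at last"],
]

def insert_subplot(story, rng):
    # Pick a point strictly inside the story so the framing beats
    # ("Once upon a time" / "The end") stay at the edges.
    i = rng.randrange(1, len(story) - 1)
    return story[:i] + rng.choice(SUBPLOTS) + story[i:]

def grow(story, beats, rng=None):
    rng = rng or random.Random(0)
    while len(story) < beats:
        story = insert_subplot(story, rng)
    return story

print(". ".join(grow(NULL_STORY, 10)) + ".")
```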

@ikarth
ikarth commented Oct 29, 2015

One thing I was playing with in past projects was embedding metadata about the generation in the outputted text, and then performing a last cleanup phase before the actual final output. So there would be a bunch of bracketed tags scattered around marking things that could potentially be expanded. And the last step stripped the bracketed text out or reduced it to its default.

I never fully implemented the idea, but it might be useful for your novel compiler.

@mewo2
mewo2 commented Oct 29, 2015

One solution to the problem of passes being able to read their own output would be to take a leaf from LLVM and have a single intermediate representation (e.g. a list of events), which most passes use for both input and output. You can munge this repeatedly until your novel is complex enough, then run a single final pass which converts to prose.
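A minimal sketch of this single-IR scheme, with invented pass names: every pass takes a list of event dicts and returns a list of event dicts, and only the last step lowers the IR to prose.

```python
# Single intermediate representation: a list of event dicts.
# Every pass is IR -> IR; only to_prose() leaves the IR.

def add_reaction(events):
    # Insert a reaction event after each action, LLVM-pass style.
    out = []
    for e in events:
        out.append(e)
        if e["kind"] == "action":
            out.append({"kind": "reaction", "who": e["who"],
                        "what": "regretted that he " + e["what"]})
    return out

def add_detail(events):
    # Another IR -> IR pass: attach a sensory detail to each action.
    return [dict(e, what=e["what"] + " in the dark")
            if e["kind"] == "action" else e
            for e in events]

def to_prose(events):
    return " ".join(e["who"].title() + " " + e["what"] + "." for e in events)

ir = [{"kind": "action", "who": "joe", "what": "entered the cave"}]
for pipeline_pass in (add_reaction, add_detail):
    ir = pipeline_pass(ir)      # munge the IR repeatedly...
print(to_prose(ir))             # ...then convert to prose exactly once
```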

@cpressey

@ikarth My understanding is that this (embedding structured data inside unstructured text) was one of the original use cases for XML, though it's probably under-used these days.

I don't currently foresee myself having a huge need for this, but if it becomes desirable, I'll keep that idea in mind, thanks.

@mewo2 I'm currently thinking of the individual passes as purely internal rewriting operations on whatever data structures happen to be convenient at that point in the pipeline. But if the whole novel-model becomes too much to hold in memory, I suppose I will have to think about reading and writing intermediate representations, yeah.

@MichaelPaulukonis

I note that many of the classic Narrative generators generated their world + stories, and had another independent system that "translated" them into more natural language. For example, TALE-SPIN:

[Screenshot: excerpt from Inside Computer Understanding: Five Programs Plus Miniatures, R. C. Schank]
(source)

It's a lot more complicated than this, but I can't find an example/citation right now. Multiple sentences about the current world-state would be combined (JOE WAS IN THE CAVE. JOE KNEW HE WAS IN THE CAVE. THE CAVE WAS DARK. THE CAVE HAD AN EXIT. JOE KNEW THE CAVE WAS DARK. JOE WANTED TO BE IN THE LIGHT. JOE KNEW THE CAVE HAD AN EXIT. => Joe wanted to get out of the cave and into the light.)
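That combining step might be sketched as a toy rule-based "translator"; the single hand-written rule below is invented for illustration and is far simpler than TALE-SPIN's actual machinery.

```python
# Atomic world-state assertions folded into one natural sentence
# by a hand-written combination rule.
FACTS = [
    ("joe", "in", "cave"),
    ("cave", "is", "dark"),
    ("joe", "wants", "light"),
    ("cave", "has", "exit"),
]

def naturalize(facts):
    facts = set(facts)
    # One combination rule: wanting light while in a dark place
    # with an exit collapses into a single goal sentence.
    if {("joe", "in", "cave"), ("cave", "is", "dark"),
        ("joe", "wants", "light"), ("cave", "has", "exit")} <= facts:
        return "Joe wanted to get out of the cave and into the light."
    # Fallback: emit the raw assertions, TALE-SPIN-transcript style.
    return " ".join(f"{s.upper()} {v.upper()} {o.upper()}."
                    for s, v, o in facts)

print(naturalize(FACTS))
```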

@cpressey
cpressey commented Nov 2, 2015

@MichaelPaulukonis the 2nd version is more pleasant to read, but the 1st is just that much closer to 50,000 words, isn't it?

Participating, even reading all the issues for this year's edition, is clearly going to cut into what little time I already have. I'll keep these updates short and infrequent.

I suppose I have a goal number 4, which is: don't use any libraries or corpuses or APIs except the bare minimum. Well, that's not a goal so much as a predilection. I enjoy writing code. I don't enjoy learning and futzing with the idiosyncrasies of Yet Another Dependency. But this gives you an indication of what the final result will be like here.

I'm not planning on releasing any previews or code until the end, or at least until the result reaches a certain minimum quality (but I don't expect that to happen in November, so, like, until the end).

@MichaelPaulukonis

I'm not planning on releasing any previews or code until the end

DRAT! There goes my plan!

As usual, I'm hoping to play with a bunch of different dependencies, and then see if anything sticks. Each to our own.

@cpressey
cpressey commented Nov 4, 2015

Update: it generates a story. It is terrible. I do hope Goal 1 didn't get anyone's hopes up. I did call it "unrealistic" and "incredibly unrealistic" in almost immediate succession...

Actually, suppose we reframe Goal 1 slightly, with gradation instead of as a yes-or-no thing. How many words of the average NaNoGenMo text is the average reader willing to read, on average, before they give up? By "read" I of course mean, try to make sense of the words, not just look at them.

For texts that are complete word salad, the number is probably well below 100. (and then you start skimming forward, maybe, looking for interesting nonsense.) For others, maybe higher. A couple of hundred, at a guess. Hard to say, without going to the ridiculous length of actually conducting experiments on it.

Anecdotes welcome, though!

@enkiv2
enkiv2 commented Nov 4, 2015

One of the reasons I did generative erotica is that people will, on average, be more entertained with less coherent erotica -- the subject matter is either prurient or funny. As a result, a fairly simple and low-quality grammar produces a result that I was willing to read several pages of. I suspect that there are other tricks with regard to style or subject matter that work similarly to increase the readability of content irrespective of novelty or quality. (For instance, vague yet evocative sentences like those used in Montfort's 1k generators would be great if you could consistently generate them!)

@cpressey
cpressey commented Nov 4, 2015

Several pages, meaning, what, about 1500 words?

@enkiv2
enkiv2 commented Nov 4, 2015

Yeah, something like that. (I have a fairly high tolerance for this stuff, though. You can analyse it for yourself: https://github.com/enkiv2/NaNoGenMo-2015/blob/master/orgasmotron.md )

@tra38
tra38 commented Nov 4, 2015

For non-simulations, my guess is that the attention span starts drifting at 3×(template length), where template length is the number of words in the template in question. That's enough for the reader to get bored, because he grasps the pattern. So if you have a template that is 500 words long, that would probably give you your 1500 words.

(and then you start skimming forward, maybe, looking for interesting nonsense.)

It seems bots are excellent at generating text, but it's the humans who have to sift through the resulting nonsense to find actual meaning and worth. There has to be a mathematical formula that can be used to measure the 'fitness' of a text, allowing the bots to engage in filtering for how interesting* it is. This way, you can have the bots generate a bunch of words and then engage in automatic curation.

*We can define 'interesting' perhaps by sentiment analysis or how well it matches one of Vonnegut's plot curves. Or maybe, pull in machine learning. You rate a passage the computer generates on a scale of 1-10, and with enough data, eventually the computer will find a pattern.

@hugovk
Collaborator
hugovk commented Nov 4, 2015

There has to be a mathematical formula that can be used to measure the 'fitness' of a text, allowing the bots to engage in filtering for how interesting* it is.

Brings to mind genetic algorithms. Has anyone tried that approach?
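A toy genetic-algorithm loop over generated word strings might look like the following. The fitness function here (lexical variety) is a deliberately crude stand-in, since a real "interestingness" measure is exactly the unsolved part; everything below is illustrative.

```python
import random

# Evolve word strings toward higher "fitness" (here: vocabulary
# variety, a stand-in for a real interestingness measure).
WORDS = ["the", "hero", "villain", "castle", "stormed", "wept", "alone"]

def random_text(rng, n=12):
    return [rng.choice(WORDS) for _ in range(n)]

def fitness(text):
    return len(set(text)) / len(text)   # stand-in: lexical variety

def mutate(text, rng):
    text = list(text)
    text[rng.randrange(len(text))] = rng.choice(WORDS)
    return text

def evolve(generations=50, pop_size=20, seed=0):
    rng = random.Random(seed)
    pop = [random_text(rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]
        pop = survivors + [mutate(rng.choice(survivors), rng)
                           for _ in range(pop_size - len(survivors))]
    return max(pop, key=fitness)

best = evolve()
print(" ".join(best), fitness(best))
```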

@enkiv2
enkiv2 commented Nov 4, 2015

With regard to the 'mathematical formula', I suspect you could use Shannon's information entropy formula with the prior of the reader's mind :P. After all, humans accept only a narrow band of novelty, and what counts as novel depends upon what the reader has seen before.
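For concreteness, the entropy formula H = -sum(p * log2(p)) over a text's word distribution can be computed like this. It is only a crude novelty proxy; modeling "the prior of the reader's mind" would require far more, e.g. a language model.

```python
import math
from collections import Counter

# Shannon entropy of a text's word distribution: higher means the
# word choices are less predictable (a crude novelty score).
def word_entropy(text):
    words = text.lower().split()
    counts = Counter(words)
    total = len(words)
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values())

flat = "the cat sat on the mat the cat sat on the mat"
varied = "the cat sat on a mat beside one drowsy spotted dog today"
print(word_entropy(flat), word_entropy(varied))
```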

@ikarth
ikarth commented Nov 4, 2015

Humans figuring out patterns seems to be part of the interestingness metric. It seems to work on multiple scales: Grokking the central conceit in Aggressive Passive or Redwreath and Goldstar Have Traveled to Deathsgate takes a few minutes at most, which will give you the sense of the overall plot without reading it. (And then figuring out the puzzle of which question goes with which answer can take a lifetime.)

Something like #72 or Alice's Adventures in the Whale takes a bit longer, because once you've grasped the pattern, the pleasure is in seeing the changes that were made in a familiar text.

I suspect that simulations play by slightly different rules. Dwarf Fortress has certainly generated a lot of stories, though I'm not sure how many of them are interesting precisely because they were interactive. (Not to mention, most renditions are a retelling of the events, rather than a direct output.) I'm going to be watching this year's simulation results with interest.

One pleasure that most generative works lack is a sense that an author intended them to happen this way. Not that you can't get a degree of intention-sense. I suspect that's why high-concept things like Aggressive Passive work so well: we can read the higher authorial intent, and that makes it easier to get closure and catharsis.

@ikarth
ikarth commented Nov 4, 2015

@hugovk @enkiv2 Maybe generate a large corpus of generated results (with the metadata for the generator settings), use a crowdsourced interestingness vote, and then use that as the fitness criteria for an RNN?

@enkiv2
enkiv2 commented Nov 4, 2015

Honestly, I'd love to do that (and there are some similar systems out there). But, it sort of requires exposure (or mechanical turk!), so it might be hard to do this at novel length in a month, since it requires some large number of people to read more than a novel-length quantity of generated content in a month.

(I mean, you could do this with a markov chain rather than an RNN too. Do monte carlo and generate a handful of 'next sentences' and have people vote on which one is best, then feed the winner back in as training data or increment its connections.)
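The vote-and-reinforce loop might be sketched like so; the markov table is tiny and invented, and the "vote" is a stand-in for actual reader input.

```python
import random

# A tiny markov transition table: word -> {next word: weight}.
CHAIN = {
    "joe": {"ran": 1, "slept": 1},
    "ran": {"home": 1, "away": 1},
}

def candidates(word, k, rng):
    # Monte carlo: draw a handful of candidate next words.
    opts = list(CHAIN.get(word, {}))
    return [rng.choice(opts) for _ in range(k)] if opts else []

def reinforce(word, choice):
    # The winner becomes more likely next time.
    CHAIN[word][choice] += 1

rng = random.Random(1)
cands = candidates("joe", 3, rng)
winner = cands[0]                 # stand-in for a reader vote
reinforce("joe", winner)
print(cands, winner, CHAIN["joe"])
```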

@MichaelPaulukonis

@hugovk - didn't we talk about this in the GenerativeText list? The trouble is the fitness algorithm -- if you've got one, well -- you've solved the problem. Otherwise, we're talking about using human readers via Amazon's Mechanical Turk or something.

Nothing that we little people could handle (:money:), but maybe in a few years somebody can think of a sneaky way to grab eyeballs with something like ReCaptcha, or Facebook will heave its vast and labyrinthine bulk in its direction.

@enkiv2 - For similar reasons new directors often work with low-budget horror movies. Witness Sam Raimi - he did Evil Dead not for any particular love of the genre, but for the most likely return on investment (time+money) (a source). The audience tends to eat it up no matter how low the quality. Witness the large numbers of self-published zombie books on Amazon. Or romance novels of any of the vast, arcane genotypes of romance.

So - generated horror fiction. SplatterGenPunk. Note to self -- add this to #14

@enkiv2
enkiv2 commented Nov 4, 2015

It's not as though there aren't people who will do that for free (see crowdsound, darwintunes, basically every quote DB, and that one project where a novelist is having randoms vote on his plot elements, along with twitch plays anything). But, getting that audience isn't guaranteed and it takes a while. If we didn't care about November, we could start a thing like that and then just let people discover and play with it as they will.

@ikarth
ikarth commented Nov 4, 2015

Alice in Wonderland and Zombies?

@MichaelPaulukonis

@ikarth - I first got this idea while brain-storming ways of getting texts from an extended Swallows of Summer engine. More people, more locations, flock/avoidance algorithms, object transference (in this case, the plague vector, whatever it is), focused event-replay as pseudo-plot, etc.

(Lightly Edited) Extracts from some emails:

And what would happen to Swallows with sufficient complexity to allow for emergent behavior?

A larger environment, more people, and more behaviors for the people, including flocking, avoidance, eating, sleeping, etc.

We could easily go to a zombie scenario -- some sort of infectious behavior, infected flock, uninfected flock, uninfected avoid, etc etc etc. But need for food increases probability of movement, so...

I'm curious as to what flocking etc. could do in a narrative situation -- BORING, probably. But with other behaviors, and flocking not set to a herd-level, clusters of people, cliques, mean-girls, etc etc. competitiveness, objects, things.

"zombie plague" was just an example of transference. In "Tale-Spin" "Gravity" was an invisible character who pulled people down (until Gravity ended up drowning in a number of stories) or pushed people down. "Infection" would be given, without being lost as an item, from one character to another. Could be a cold, flu virus, ear-worm pop-song, or whatever. Tribbles.

If the system just played everything out [generating events, not text], and then AT THE END - where the end is defined as all people having been infected - took the events and narrated them from the standpoint of the last uninfected character, you'd have a ... I dunno, not a tragedy or thriller, but it would appear to have a plot.

And that's from basic rules of transfer/flock/avoid - no added narrative rules to direct plot.

That intrigues me.
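That replay-from-the-last-survivor scheme can be sketched in a few lines; the names, world, and infection rule below are all invented for illustration.

```python
import random

# Infection is "given without being lost": when an infected character
# meets an uninfected one, the latter becomes infected too. The run
# ends when everyone is infected; the log is then narrated from the
# viewpoint of the last character to succumb.
def simulate(names, seed=0):
    rng = random.Random(seed)
    infected = {names[0]}
    log = []
    while len(infected) < len(names):
        a, b = rng.sample(names, 2)
        log.append((a, b))
        if a in infected and b not in infected:
            infected.add(b)
            last = b
    return log, last

def narrate_from(log, hero):
    lines = [f"{a} met {b}." for a, b in log if hero in (a, b)]
    lines.append(f"In the end, {hero} too was infected.")
    return " ".join(lines)

log, last = simulate(["Alice", "Bob", "Carol", "Dave"])
print(narrate_from(log, last))
```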

@enkiv2
enkiv2 commented Nov 4, 2015

Simulation is definitely a way to produce 'plot' in the sense of events that follow each other with internal consistency, causality, and logic, but it doesn't in any way filter for what would be interesting to a reader. Most things that happen in reality (or any procedural simulation of a subset of reality) are not very interesting from the perspective of a reader of fiction, even things that are potentially very interesting to watch (it's one thing to watch a kung-fu movie, but reading a pokemon-style transcript of the same film's fight scenes would be incredibly boring).

My position is that an interesting style is necessary to make a simulation readable, and that furthermore, accurate simulation does not necessarily add to readability when an interesting style is already involved. (Lots of very interesting books have very dull plots, and lots of very dull books have very interesting plots; wonderful books are not necessarily internally consistent, and some famous books have a connection to any kind of physical reality that is tenuous at best and instead swim quite deeply in wordplay and aggressive subjectivity.) If you need style either way, a focus on style makes sense.

@cpressey
cpressey commented Nov 4, 2015

I had forgotten about Aggressive Passive. I tried reading it again, and managed to read the first 2 chapters before it got distinctly samey. I could've forced myself to go on, maybe, but found the urge to skim pretty strong so I consciously decided to stop there. wc says the first 2 chapters comprise 1333 words.

So I guess, for this project, I'm aiming for around 1500 readable words, at minimum, although if I could get 2000 or more, that would make me very happy.

I'll probably generate two versions at the end: a short but readable version, and a 50K-word "NaNoGenMo version" in which, as I've mentioned before, the plot will probably begin to suffer partway through.

@cpressey
cpressey commented Nov 4, 2015

On the subject of simulation: it's a fertile area, but it's a bit complicated how it interacts with narrative. There's this tension between the "hierarchical" plot structure and the "linear" action-reaction stream... No doubt a writer thinks about how a character will react to something happening, and how other characters will react to that reaction, etc. But a writer also thinks about how they want things to work out in the story, and they will often move characters towards a certain conclusion, to advance the plot. When they do it well, the reader hardly notices.

Which leads me to Goal 2...

Oh, darn, here I am posting frequently when I was just saying I wouldn't be posting frequently. Sorry. I'll just shut up now and try to get this trainwreck-generator really working.

@MichaelPaulukonis

Since your rules are thus invalidated, please post code and works-in-progress.

@MichaelPaulukonis

"interesting point-of-view" was one of the reasons I floated the "records all actions in the system, and replay from perspective of last survivor".

However, additional points-of-interest could be determined algorithmically -- situation where person is hiding and herd approaches, only to veer off when distracted by another character.

Will this be as good as a human? OF COURSE NOT. I'm just floating some ideas to extend it beyond "people reading for sex and/or violence".

@dariusk
Owner
dariusk commented Nov 4, 2015

I would like to remind everyone that there is no requirement to post code until the very end of the month and there is no requirement at all to post works-in-progress.

@MichaelPaulukonis if that request was meant to be kind encouragement, I request you try and be kinder in the future.

@cpressey
cpressey commented Nov 5, 2015

@dariusk No worries, I assume it was deadpan humour.

@MichaelPaulukonis My response can only be this.

@cpressey
cpressey commented Nov 6, 2015

I might not be releasing code or previews YET but I'm happy to talk about what I'm doing, time permitting. In fact I'd really like to talk about Goal 2, since that is The Interesting Goal here. But first perhaps I should put Goal 3 to rest.

As I expected, there is certainly some similarity between this thing and The Swallows. I didn't use an external corpus for either of them, so they're both "written in my hand" so to speak. The event model is also not dissimilar, and they're both narrated in 3rd-person past tense...

But significantly, where The Swallows had basically only Brownian motion to work with, this writes around an actual plot. Conflicts get resolved and the story has an end and everything. There are other differences — the diction is sometimes better (or at least different) and the setting is wildly different and the internal architecture is a lot more intentionally-designed and compiler-like — but the plot thing should be enough.

So, yeah, I think I'm happy with Goal 3 at this point.

@cpressey
cpressey commented Nov 6, 2015

@MichaelPaulukonis I do like the idea of generating more (much more) than you need and discarding most of it. Back in January I considered simulating a whole city, with each individual going about their daily life, going to work, planning crimes, etc., and then taking some kind of cross-section of that. Somehow.

The problem* is that identifying interesting situations is probably as hard or harder than fabricating them.

*not an actual problem, because this is NaNoGenMo and trying different things is what we do

@hugovk
Collaborator
hugovk commented Nov 6, 2015

If you could somehow evaluate each person's day in the city, maybe find a baseline, average, boring day, then pick the most exciting one.
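A toy sketch of that selection step, assuming each character's day has already been reduced to a list of per-event interest scores (all names, scores, and the median-as-baseline choice are invented for illustration):

```python
import statistics

# Hypothetical sketch: each "day" is a list of per-event interest scores for
# one character. Find the city-wide baseline, then pick the biggest outlier.
def most_exciting_day(days):
    """days: dict mapping character name -> list of event scores."""
    totals = {name: sum(scores) for name, scores in days.items()}
    baseline = statistics.median(totals.values())  # the "average, boring day"
    # The most exciting day is the one furthest above the baseline.
    return max(totals, key=lambda name: totals[name] - baseline)

days = {
    "baker": [1, 1, 2, 1],   # routine day
    "clerk": [1, 2, 1, 1],   # routine day
    "spy":   [1, 9, 7, 3],   # something happened
}
print(most_exciting_day(days))  # -> spy
```

Of course, assigning those per-event scores in the first place is the hard part being discussed here.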

@rngwrldngnr

It might simplify the problem to stack the deck and give everyone in the city hidden goals and dark secrets. Something along the lines of Neil Gaiman's City of Spies. The interest curve would probably be flatter, but instead of explicitly looking for the most interesting person, you could run queries about what kind of story you want, like: "Two people who never meet, but indirectly destroy each other's lives", and let the huge number of people who could potentially be living lives that fit that plan work for you.

It could even be a pilot program to a more general, realistic city, since you would essentially only need to write one kind of person, to start.

@enkiv2
enkiv2 commented Nov 6, 2015

Exciting might be more difficult to estimate and less useful as a rubric for interestingness than unusual. If you simulate the whole city, you will get some overlap between behaviors, and so you can essentially eliminate from consideration any sequence of actions that is identical to another character's sequence and any sequence of actions that's identical to a previously described sequence.
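A minimal sketch of that deduplication, assuming each character's day has been reduced to a tuple of actions (all characters and actions here are invented):

```python
# Sketch of the deduplication idea above: keep only characters whose
# action sequence is unique city-wide; identical routines are discarded.
def novel_sequences(characters):
    """characters: dict of name -> tuple of actions."""
    seen = {}
    for name, actions in characters.items():
        seen.setdefault(actions, []).append(name)
    # Keep only sequences performed by exactly one character.
    return [names[0] for names in seen.values() if len(names) == 1]

city = {
    "alice": ("wake", "work", "eat", "sleep"),
    "bob":   ("wake", "work", "eat", "sleep"),   # duplicate of alice: dropped
    "carol": ("wake", "rob_bank", "flee", "hide"),
}
print(novel_sequences(city))  # -> ['carol']
```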


@MichaelPaulukonis

The problem* is that identifying interesting situations is probably as hard or harder than fabricating them.

Well, if there is a central crime, that is a thread to follow.

Then, everybody else who is within n proximity of some part of the crime is another thread.

Different ways to combine -- the first part of the story is the crime thread, beginning to end. Subsequent portions are those within proximity. Since people have (hopefully!) read the first part, the subsequent sections are notable for their similarities to the crime, etc. Even though the crime itself would never be mentioned (I'm specifically excluding people that are directly impacted by the crime/planning/etc.). So - proximity := > 1 < n

@enkiv2
enkiv2 commented Nov 6, 2015

I dunno. Crime isn't always interesting, and in order to distinguish between criminal and noncriminal behavior you need to implement laws in the city and produce complex incentive structures surrounding them that make characters mostly follow the law, break the law occasionally, and break the law in varied and inconsistent ways. In other words, you'd need to program your simulation in order to ensure that criminal activity is narratively interesting.

Meanwhile, if you use unusual behavior (i.e., behavior rare in the statistical sample of the whole city) then you will pick up stories about bugs in your simulation, along with stories wherein sequences of highly unusual things happen. Strange coincidence stories will be generated, along with stories about characters forced by circumstance into unusual sets of actions. (And, if someone inside your simulation invents a simulation, you'll get the script of World on a Wire in your simula-3 prototype ;-)


@cpressey
cpressey commented Nov 6, 2015

These are all interesting angles to try. I literally just thought of another one. There's a nursery rhyme that goes:

Half a pound of tuppenny rice,
Half a pound of treacle,
That’s the way the money goes,
Pop goes the weasel!
Up and down the city road,
In and out the Eagle,
That’s the way the money goes,
Pop goes the weasel!

In North America this is of course known as "Pop Goes the Weasel" and has mostly-different lyrics, but in the UK it's known as "Half a pound of tuppenny rice".

One interpretation of this is that it's recounting the journey of the two-penny piece (or, more generally, cash) as it changes hands between people in the town. Someone spends it on rice at the store, then the storekeeper spends it at the pub (the Eagle), and so on.

You could of course apply this to the city-simulation. The individuals carry money, and they spend it (or drop it, or have it stolen)... and you write the story by choosing a coin and following the path it takes. (In practice you'd probably try many coins and pick the one that has the most interesting story. So it doesn't remove that hurdle of recognizing an interesting story, but there's a possibility it would be simpler in the case of a coin... merely counting the number of times it changes hands would be a good start.)

I think this is a fun idea, and if anyone wants to try this, please go ahead and do so!
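Here's a rough sketch of the coin-following part, under invented names and a made-up spend probability; scoring is just hand-change counting, as suggested above:

```python
import random

# Sketch of the "Pop Goes the Weasel" idea: hand a coin around a toy town
# and score each coin by how many times it changes hands. The people, the
# spend probability, and the trial count are all invented for illustration.
def trace_coin(people, spend_chance, steps, rng):
    holder = rng.choice(people)
    history = [holder]
    for _ in range(steps):
        if rng.random() < spend_chance:
            # The coin passes to someone else in town.
            holder = rng.choice([p for p in people if p != holder])
            history.append(holder)
    return history

rng = random.Random(9889)  # fixed seed, in the SEEDBANK_SEED spirit
people = ["shopper", "storekeeper", "publican", "fiddler"]
# Try many coins and keep the one with the most hand-changes.
best = max((trace_coin(people, 0.5, 20, rng) for _ in range(10)), key=len)
print(len(best) - 1, "hand-changes:", " -> ".join(best))
```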

@tra38
tra38 commented Nov 6, 2015

Creating a simulation of an entire city just to trace the history of a coin seems absurd. This sounds like a task where a state machine would be better suited. Bob randomly picks from a list where to spend his money, picks "buy rice from the shopkeeper", and the shopkeeper then consults his list to find out where to spend the money and decides to spend it at the pub. Then the person who owns the pub checks his list, and decides to spend it on Bob's rice-cakes. No need for simulation, just picking randomly from a shopping list (and maybe deleting from the list when the character spends his money: Bob doesn't need to buy rice again).

EDIT: Consider also the possibility that a simulation is useful only for generating data that can then be transformed into text. It may be possible to just generate the data outright, without needing a simulation to slowly produce it.
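A minimal sketch of that state-machine version (the shopping lists and names are invented, loosely following the rhyme):

```python
import random

# Sketch of the point above: no simulation, just a state machine where each
# holder picks the next purchase from a shrinking shopping list.
shopping_lists = {
    "Bob":        ["buy rice from the shopkeeper"],
    "shopkeeper": ["buy a pint at the pub"],
    "publican":   ["buy rice-cakes from Bob"],
}
# Who ends up holding the money after each purchase.
next_holder = {
    "buy rice from the shopkeeper": "shopkeeper",
    "buy a pint at the pub": "publican",
    "buy rice-cakes from Bob": "Bob",
}

rng = random.Random(0)
holder, story = "Bob", []
while shopping_lists[holder]:
    purchase = rng.choice(shopping_lists[holder])
    shopping_lists[holder].remove(purchase)  # Bob needn't buy rice again
    story.append(f"{holder} decided to {purchase}.")
    holder = next_holder[purchase]
print(" ".join(story))
```

The loop ends naturally when the money reaches someone with an empty shopping list.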

@TheCommieDuck

The issue I'm currently running into wrt simulation: it's a scarily deep rabbit hole. People should desire things like food. They should then be able to know whether they have food, and if not, they should be able to find or buy some. They need to know how to buy and sell, and what they're allowed to take or not take. They need to know whether something is available for them to buy. Where does it stop?

Then you also need to make the people who sell, where they sell, where the items are, what things are worth, etc. I've just spent about 3 days doing things entirely unrelated (it seems) to having a story.

Also I've never heard that nursery rhyme called 'half a pound of tuppenny rice'. Odd.

@ikarth
ikarth commented Nov 6, 2015

There are, I think, at least two separate challenges in simulating a novel, which we've kind of been dancing around in this discussion. The first is that building the simulation itself is tricky. In theory, emergent complexity should make a sufficiently broad simulation deep because of combinatorial complexity, but that requires quite a lot of content, and it can be hard to judge which content actually contributes. Second, the simulation gets you a plot, and possibly even some stage business and details, but that's mostly confined to the fabula. The syuzhet and the rendition of the text are another matter.

It occurs to me that it might be easiest to present such a simulation in a relatively avant-garde or hypertextual form, like katierosepipkin's procedurally generated newspapers of small towns.

The more textual approaches such as markov chains, word2vec and the like are, in contrast, heavy on the style but have no conception of plot. And at this point I am not sure which is harder for a machine. I do think the most interesting approaches, to me at least, combine them.

@ikarth
ikarth commented Nov 6, 2015

@TheCommieDuck I suspect that the art of designing a narrative simulation is to work out what you can leave out. Your simulated people will be unlikely to be complete psychological models, but perhaps there are bits of Maslow's hierarchy that can be elided for the particular effect you are attempting in one particular work.

That said, I don't know where that line is. Dwarf Fortress takes the kitchen-sink approach, which at least dazzles with its specificity. Though it also uses stereotypes of its fantasy races (or newly invented ones) to simplify some aspects and turn their simplicity into virtues - all dwarves drink, all elves would rather commit cannibalism than harm a tree, etc. It strikes me that choosing the specific aspects to simulate or ignore can itself be a powerful rhetorical statement.

@enkiv2
enkiv2 commented Nov 6, 2015

The extreme kitchen-sink kind of simulation, combined with absurd attributes, can be more entertaining than more realistic descriptions. I would rather read a dry description of a day in the life of a Dwarf Fortress dwarf than a day in the life of a character in The Sims, because The Sims is too realistic and reality is boring. In this sense, simulation bugs are literally features.


@tra38
tra38 commented Nov 7, 2015

@cpressey So I went ahead and actually implemented a program that will show the history of a McGuffin as it is being passed from person to person. Note that this program is not a simulation, but the output would be virtually indistinguishable from a simulation.

Source Code: https://gist.github.com/tra38/0064c227a298d7f51c11
Story: https://gist.github.com/tra38/f96a02f65f816a61ebfc

Not going to be an entry for NaNoGenMo because I don't really like the code. I wrote it, though, to make the point that simulations may seem neat, but there are other ways of accomplishing the same task.

@ikarth
ikarth commented Nov 7, 2015

I sort of disagree, and sort of agree. I think it (perhaps appropriately for NaNoGenMo) comes down to the Tale-Spin effect - simulations that don't properly explain enough of what is going on under the hood create the illusion that the simulation is simpler than it actually is.

@MichaelPaulukonis

Once you have the simulation, you can tweak the language-generator to explain better what is going on. That's the impression I got of the Tale-Spin effect in Expressive Processing (which I'm still reading). Wardrip-Fruin seemed somewhat annoyed that there was so much richness in the simulation and "just one day" given to the text-generator side of the application, and that the app (and, by extension, computational narratology as a whole) was thus given short shrift by many.

@ikarth
ikarth commented Nov 7, 2015

That matches my impression. With the addendum from my (artistic application) perspective that it can also be aided by choosing a design where the surface matches the underlying process.

I was inspired by his discussion of the SimCity effect (among other things) to try to formulate a better definition of the relationship between agency and ergodic cybertext. Slightly off topic here, but I think the general principle applies: one of the things we seek when reading a novel is to understand the patterns the author placed there, either consciously or unconsciously, which can only happen if enough of the hidden pattern of the idea can be detected in the material presentation.

@greg-kennedy

I'd like to shamelessly plug my "real" attempt for the year, which does exactly what was talked about midway up this thread: generate an insanely large amount of events, then pass the whole log to an editor that selects interesting portions to form chapters. Certain events may be considered to have a higher "gravity" than others - e.g. looking at something is less interesting than converting a Character to a Dead Body using the Revolver - and I'd have to manually assign such weights. The editor simply needs to follow an item, character or place for the duration of a chapter in which some interesting event occurs.

I also want my characters to have overarching aspirations and sub-goals, which help direct them to accomplish things. The resulting "story" will probably be constructed so those aspirations end in a compelling narrative.

In theory, anyway. I may get 500 words in and then pad the rest out with "meow".
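That editor's selection step might look roughly like this (the gravity weights and event shapes are invented for illustration, not the actual generator's code):

```python
# Sketch of the "editor" idea above: assign a hand-tuned gravity to each
# event type, then find the entity whose log contains the weightiest event,
# i.e. whom this chapter should follow.
GRAVITY = {"look": 1, "talk": 2, "shoot": 9}  # shooting outweighs looking

def pick_chapter_focus(log):
    """log: list of (entity, action) pairs. Return the entity owning the
    single highest-gravity event in the log."""
    entity, action = max(log, key=lambda e: GRAVITY.get(e[1], 0))
    return entity

log = [("maid", "look"), ("butler", "talk"), ("colonel", "shoot")]
print(pick_chapter_focus(log))  # -> colonel
```

The chapter editor would then extract this entity's events for the duration of the chapter.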

@cpressey
cpressey commented Nov 9, 2015

Update: I was lucky to get a fair chunk of time to spend on my generator over the weekend.

For Goal 1, it can generate a semi-coherent story of 800 words. My milestone this week is to double that number.

For Goal 2, I implemented 2 writers' techniques, one of which was a little trickier (or at least, more tedious) than I thought it would be, and the other -- the one I really wanted to implement, since like months ago -- was a little easier than I thought it would be. My milestone for this week is to either add a third, or do the one I really wanted to do in a more sophisticated way.

I'd write more about it, but apparently what people want to talk about on this issue is simulations (rather than writers' techniques or compiler architecture, as the title of the issue might reasonably suggest.) Fine, OK. I'm not using simulation techniques here, so I'll just keep quiet about my generator until I've made more progress.

@cpressey
cpressey commented Nov 9, 2015

@tra38

Creating a simulation of an entire city just to trace the history of a coin seems absurd.

Indeed, I basically stand by what I said here: identifying interesting situations (inside a simulation) is probably as hard or harder than fabricating them (instead of simulating them). That's one reason I didn't actually follow up on my idea back in January to simulate a city. That, and, narrative methods are much more unexplored territory for me, and interesting for that reason.

Actually I think the snippet you posted would be a fine NaNoGenMo entry (with more situations and a few paragraph breaks, perhaps) -- does it really matter if the code is a bit distasteful? Well, maybe it does, depends on your goals I suppose. And I could understand if you don't want to spend any more time going down that avenue.

@rngwrldngnr

Regarding story compiler architecture, I think the first-draft-to-finished-story stage you mentioned in one of your earlier posts could be very fruitful. I can think of several tasks that would be relatively simple to run on a draft, where text (or a near-text intermediate representation) would be both input and output.

You could count word frequencies and replace a portion of anything way above the Zipf curve with synonyms. You could combine or split apart sentences to get more variety in length. You could manage pronoun ambiguity by making sure that a person's real name is given every few sentences and at least once per paragraph. You could switch passive voice to active voice (or the reverse, if that's what you want stylistically).

Most of these could probably be built straight into the earlier stages of the story compiler, and some of them might even be easier to implement there, but I think that would depend heavily on your implementation, whereas operating on plain text makes the layers machine-independent. It also allows you to switch the order of any of the layers and toggle them on and off as required.
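A toy sketch of that text-in/text-out layering, with two passes in the spirit of the suggestions above (the mini-thesaurus, threshold, and both passes are invented examples):

```python
import re
from collections import Counter

# Each revision pass takes plain text and returns plain text, so the layers
# stay independent and can be reordered or toggled at will.
SYNONYMS = {"said": "remarked"}  # invented mini-thesaurus

def vary_overused_words(text, threshold=2):
    """Replace occurrences after the first of any overused known word."""
    counts = Counter(re.findall(r"[a-z]+", text))
    for word, count in counts.items():
        if count > threshold and word in SYNONYMS:
            first, *rest = text.split(word)
            text = first + word + SYNONYMS[word].join(rest)
    return text

def squash_double_spaces(text):
    return re.sub(r"  +", " ", text)

PASSES = [vary_overused_words, squash_double_spaces]

draft = "he said hello. she said hi.  then he said goodbye."
for revise in PASSES:
    draft = revise(draft)
print(draft)  # -> he said hello. she remarked hi. then he remarked goodbye.
```

Reordering `PASSES`, or dropping a pass, requires no changes to the others.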

I'm not sure, but I feel like the draft-revision layers also give you the possibility to harness techniques that real writers use to get around some of the problems that often plague them and story generators. If you look at output from BRUTUS1 or AUTHOR, they look a lot closer to human-written stories than most story generators' output, but they can still feel off, and iirc a couple of contests suggested that Brutus in particular, even when mistaken for human, came off as low quality. I know from participating in peer review of stories that first drafts often give off this feel, and I wonder if, by trying to internalize all of the ideas about how stories should look and feel, what good English sentences look like, and so on, we might be missing that it really is just easier to apply a lot of the changes to the 'final' text.

This approach could also be generalized to other stages in the story compiler. If you did create intermediate representations, there could, at least in theory, be a similar architectural layer at each stage that contains many minor revisions that get applied cumulatively to that intermediate representation before the transformative step that takes you to the next intermediate representation. I'm tempted to refer to these as story optimizations, as I think those are the only steps a compiler takes without switching representation, but I don't think that makes much sense as compiler terminology.

@cpressey
cpressey commented Nov 9, 2015

@rngwrldngnr What do you have in mind for a "near text intermediate representation"? My position is that parsing arbitrary English text is difficult and painful, and if it's non-arbitrary it might as well be a data structure.

I don't have any pipeline stages that take text as input right now, and I don't plan on having any this month, because that's a can of worms I'd like to avoid opening if possible.

The pipeline I do have right now works on what are basically trees with lists of properties on each node (until the very end, when it spits out text). And yes, most of the stages in the pipeline try to do one job and one job only, and there are many stages... at a guess, about a dozen right now. It would not be wildly inaccurate to call the stages at the end story optimizations, since they do things like (attempt to) choose pronouns to use instead of nouns when the referent is apparent.
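For illustration only, a sketch of that shape -- property-list trees plus one small "story optimization" stage; the node fields and pronoun rule are invented, not the actual generator's code:

```python
# Sketch of a property-list tree with one pipeline stage that swaps in a
# pronoun when the referent is apparent (i.e., was just mentioned).
class Node:
    def __init__(self, kind, props=None, children=None):
        self.kind = kind
        self.props = props or {}
        self.children = children or []

def pronominalize(node, last_referent=None):
    """One 'story optimization' stage: annotate actor nodes with how they
    should render, using a pronoun for a repeated referent."""
    for child in node.children:
        if child.kind == "actor":
            if child.props.get("name") == last_referent:
                child.props["render_as"] = child.props.get("pronoun", "they")
            else:
                child.props["render_as"] = child.props["name"]
                last_referent = child.props["name"]
        last_referent = pronominalize(child, last_referent)
    return last_referent

story = Node("story", children=[
    Node("event", children=[Node("actor", {"name": "Serenity", "pronoun": "she"})]),
    Node("event", children=[Node("actor", {"name": "Serenity", "pronoun": "she"})]),
])
pronominalize(story)
renders = [e.children[0].props["render_as"] for e in story.children]
print(renders)  # -> ['Serenity', 'she']
```

Because every stage reads and writes the same tree shape, stages can be kept small and single-purpose.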

@greg-kennedy

I'd write more about it, but apparently what people want to talk about on this issue is simulations (rather than writers' techniques or compiler architecture, as the title of the issue might reasonably suggest.) Fine, OK.

Well SORRY for dragging our dirty simulator feet on your otherwise pristine issue :P

@cpressey

I apologise if my eye-rolling yesterday irritated anyone.

NaNoGenMo's supposed to be fun, and I'm having a lot of fun writing my generator, but I'm finding the discussions this year not much fun at all. It feels like there's a lot more talking than there is listening, and that it's really heavy on the jargon and reference-dropping, and it's tiring.

Consider how many participants are completely new to programming, or NLP, or generative art (or, or, or...) If it's tiring for me, I'm sure they're mostly not even bothering to read it.

If the only way to get away from it is to not discuss then I'll just not discuss, that's all.

My generator can make a semi-coherent story of about 1000 words in length now. Watch this space. If you care. Or don't, if you don't.

@ikarth
ikarth commented Nov 10, 2015

Fair enough. And I'm afraid I'm as guilty as anyone of leaning on the jargon. I'd certainly prefer to introduce new people to the fun, though after the third go-round I'm maybe slightly out of touch with the needs of a beginner.

1000 semi-coherent words isn't anything to sneeze at. What approaches are you using in your current generator? What does a story compiler look like?

@cpressey

What does a story compiler look like?

First, I should note: the reason I haven't released any code or samples is because I feel they would be kind of a spoiler. I think this will be most entertaining if I get it to a good point and release it all at once; everything after that might be a bit anti-climactic.

But knowing the basics of the underlying architecture is probably not much of a spoiler, and since it has solidified* now, I can probably write up something on it. Maybe as a blog post instead of a comment here, if it gets long. I'll see what I can do, hopefully in the next day or two.


*This is a lie. It would be more fair to say it's gotten risky to make any more significant changes to it. The last thing I want to do is introduce an obscure bug that will take a long time to track down. So I'll just manage with it as it is.

@MichaelPaulukonis

The last thing I want to do is introduce an obscure bug that will take a long time to track down.

wouldn't your unit-tests catch that pretty quickly? he asked, innocently.

@hugovk
Collaborator
hugovk commented Nov 11, 2015

The last thing I want to do is introduce an obscure bug that will take a long time to track down.

wouldn't your unit-tests catch that pretty quickly? he asked, innocently.

And it's easy to roll back from source control, naturally.

@greg-kennedy

The last thing I want to do is introduce an obscure bug that will take a long time to track down.

wouldn't your unit-tests catch that pretty quickly? he asked, innocently.

And it's easy to roll back from source control, naturally.

Especially since you left such helpful comments on everything you did.

@cpressey

The last thing I want to do is introduce an obscure bug that will take a long time to track down.

wouldn't your unit-tests catch that pretty quickly? he asked, innocently.

And it's easy to roll back from source control, naturally.

Especially since you left such helpful comments on everything you did.

And at the very least, the programming language will stop me from doing anything too stupid, right?

Python 2.7.6 (default, Jun 22 2015, 17:58:13) 
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> True = False
>>> print True
False
>>> print False
False
>>> print (True == False)
True
>>> 

YAY SOFTWARE

@cpressey

What does a story compiler look like?

OK, I have written this thing. It's a bit long (maybe 10 or 15 minutes to read?), so I put it in a gist.

https://gist.github.com/cpressey/6324fff6ef0dfdf69b96

I don't know how well I've succeeded, but I've tried to write it for a general intermediate-programmer audience, not assuming any knowledge of compilers or any advanced programming knowledge. You can also skip over the first and last sections without really missing anything.

@ikarth
ikarth commented Nov 12, 2015

The story compiler approach seems amenable to incorporating some of what you might call low-level plotting techniques: writers' approaches to how individual scenes get constructed.

Like the bit about structuring the story from conflict here, or Jim Butcher's Scenes and Sequels technique. Or, on a larger level, things like this note card technique to structure the novel's chapters.

@cpressey

The story compiler approach seems amenable to incorporating some of what you might call low-level plotting techniques: writers' approaches to how individual scenes get constructed.

Yes! Which is exactly what I'm doing with Goal 2 -- trying to automate some of those techniques. Progress in that area has only been modest so far though.

Thanks for the links, it will be interesting to compare the methods they describe with what I've got so far.

@cpressey

The peals of the half-time gong echoing in the distance, the month trundles along into its third week!

Update: not so lucky with the time allocation this past weekend.

Goal 1: semi-coherent story length is about 1500 words, with caveats.

I also have a few ideas about how one could make the novel more readable (or rather, less unreadable) at 50K-word scale. They're gimmicky cheap ideas and I don't necessarily like them, but having stated Goal 1 the way I did, I guess I'm obligated to pursue them.

Needing to choose my battles, I deem Goal 2 completed. I implemented the one writers' technique that I really wanted to implement, even if I didn't end up applying it in a particularly good way.

@cpressey

OK well I could keep tweaking this and tweaking this and tweaking this and inching closer and closer to Goal 1 but honestly I think it has reached the point of diminishing returns and/or the minimum quality level I referred to earlier so - here it is!!!

A Time for Destiny: The Illustrious Career of Serenity Starlight Warhammer O'James during her First Three Years in the Space Fighters

Generator is here, has name in ALLCAPS in great tradition of names of story generators. Will mirror the code on GitHub in near future, time permitting. To reproduce this novel, run the generator with SEEDBANK_SEED=9889.

@enkiv2
enkiv2 commented Nov 20, 2015

This is honestly pretty good. Like, I'm having a hard time distinguishing it from slightly-higher-than-average-quality fanfiction, reading through the first two chapters.


@greg-kennedy

I read three full chapters before skimming. After the first two I was preparing to skip forward, but then Nebulon showed up, and things got interesting again!

...but after that I wondered what other plot events might happen, so I started scrolling...

so I guess that's 2770 words read. Not bad.

@enkiv2
enkiv2 commented Nov 20, 2015

The space-opera setting tempts me to swap out some texture in characters.py and costume.py and produce something resembling 'leather goddesses of phobos', only more twee.


@MichaelPaulukonis

I think I know who Serenity O'James is....

As usual, your code leaves me gnashing my teeth, wishing mine were as robust. And handsome. You have such robust, handsome code. I love to run my keyboard over it....

@cpressey

Oh! Well! Glad it's well-received (increasingly creepy vibe I'm getting from these responses notwithstanding...)

I think I know who Serenity O'James is....

Well, you're probably wrong, as the character is little more than a synthesis of a selection of these traits. I'll probably try to write something about that, and the implementation of writer's techniques, etc, over the weekend.

@greg-kennedy

Don't act like you've never heard of LGoP before...

@ikarth
ikarth commented Nov 20, 2015

I'm looking forward to the write-up.

The generator does a pretty good job with creating individual scenes that are at least nominally readable. Which is high praise in NaNoGenMo.

The rather inane resolution of most of the plots is a weakness, of course. Though I suppose it's in-genre for your source material. And I suspect handling that better would be a major project in itself.

@tra38
tra38 commented Nov 21, 2015

May I request that this generator be put under some kind of license (even the Unlicense), in case other people wish to use or expand on the MARY SUE program?

One thing that I did find interesting in your original writeup and in your source code is that the actual starting plot is pretty basic: characters get introduced, then the characters meet up and laugh. That's it. Literally everything else is just the generator adding additional sub-plots and complications to increase the word count. The system works, of course, but it seems incredibly foreign to how a human writer would write plot (come up with a central core plot first and then flesh/pad it out). Maybe this has to do with fundamental differences between man and machine: humans are trying to convey some underlying message or meaning to other humans, while bots don't really care at all what they write.

@hugovk hugovk added the completed label Nov 21, 2015
@enkiv2
enkiv2 commented Nov 23, 2015

I'm not convinced that it's alien to the way human writers plot -- particularly authors of this particular style of mary-sue-OC fanfiction that the generator mimics. After all, a lot of OC/self-insert fic involves inserting an OC into emotionally satisfying but more or less arbitrary social situations with more established characters or settings, and one can make the argument that this is exactly what this generator does (with the exception that the generic parody-space-opera setting and generic parody-space-opera characters aren't clearly attached to an existing franchise -- but they certainly would fit in neatly with, say, TOS-era Star Trek along with a whole host of golden age space operas of the Zapp Brannigan variety).

It's not how a professional author plans out plots. But it's definitely how some starry-eyed amateurs do.


@cpressey

I'm looking forward to the write-up.

Didn't even have enough time over the weekend to do that. I might not have enough time to write up a single coherent thing, so I might work in chunks. To wit, I can start with this:

What I most wanted to implement was a story generator that could do Chekhov's Gun, i.e. foreshadow an object which does not start out important in the story, but later on becomes important.

Implementing Chekhov's Gun is actually really easy:

  • write the story
  • if there are any scenes that need a Gun, just write the Gun in at that part.
  • when finished, go back over the story, and collect a list of Guns that need to be foreshadowed
  • for each Gun, insert some description of it near the beginning of the story

This is basically what is described in this article that @ikarth shared earlier -- see the "XXXX ADD GUN EARLIER XXXX" part.

Not difficult, but notably also not something you can do with a simulation alone, because you can't just output events as they happen. You have to go back, examine them, and edit them selectively, so you have to keep the events around in some kind of data structure. This is what led to the compiler-like architecture.
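The steps above can be sketched as a pass over the finished story. This is only an illustration (in Python) of the technique as described, not code from the actual generator; the scene list and the "props" field are hypothetical names invented for the sketch.

```python
# Sketch of the Chekhov's Gun pass described above. The whole story
# is kept as a data structure, so after generation we can walk it
# and edit it -- the compiler-like step a pure simulation can't do.
# The scene/"props" representation is a hypothetical example, not
# the actual MARY SUE generator's internals.

def foreshadow_guns(scenes):
    # Collect the list of Guns that need to be foreshadowed:
    # every prop any scene wrote in, in order of first appearance.
    guns = []
    for scene in scenes:
        for prop in scene.get("props", []):
            if prop not in guns:
                guns.append(prop)

    # For each Gun, insert some description of it near the
    # beginning of the story...
    intro = [{"text": "On a shelf sat %s." % gun} for gun in guns]
    # ...and, as a bonus, a reminding mention at the end.
    outro = [{"text": "She glanced once more at %s." % gun} for gun in guns]
    return intro + scenes + outro

story = [
    {"text": "Serenity reported for duty."},
    {"text": "Serenity grabbed the ray-gun.", "props": ["a ray-gun"]},
]
for scene in foreshadow_guns(story):
    print(scene["text"])
```

Run on the two-scene example, this prints a foreshadowing line before the original scenes and a reminder after them.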

In fact it is applied very clumsily in the novel, but I don't really care; the goal was to implement it. As a sort of bonus, I also made it add a mention of the object at the end, as a sort of reminder (I don't know if there's a name for this).

@cpressey

May I request that this generator be under some kind of license (even the Unlicense), in case other people wish to use or expand on the MARY SUE program?

This will probably happen eventually (eventually meaning, sometime after November is over). Because I feel strongly about open-source and all that, and over the past umpteen years, I've put nearly everything I've done under a permissive license.

But honestly, I have to wonder if it even means anything anymore in a world where appropriation artists sell printouts of screenshots of strangers' web pages for tens of thousands of dollars.

@enkiv2
enkiv2 commented Nov 23, 2015

I would dare to suggest that the existence of appropriation artists depends upon the existence of non-permissive licenses. There is nothing subversive about 'appropriating' free content.


@cpressey

I would dare to suggest that the existence of appropriation artists depends upon the existence of non-permissive licenses. There is nothing subversive about 'appropriating' free content.

So... when Georg Baselitz, in The Painter's Equipment (1987), claims to have composed Fidelio (1814) when he was 6 years old - you would say there is nothing subversive about that? Interesting. I think that that is much more subversive than the conceptual parlour tricks that so many so-called artists produce these days.

Not that being subversive is the only way to get attention. Not that getting attention is the purpose of art. Unless that's all that's left of art now, I guess.

@cpressey

Update: the code is now on GitHub, and in the public domain.

Have been too ill in the past week to write anything up further. Really, it mostly comes down to this:

The point of NaNoGenMo is to generate a novel. Even if you take that literally (as is my wont), there's nothing saying it has to be a good novel. So, what's a rich source of bad writing? Fanfiction. And what's the worst kind of writing in fanfiction? Reports vary, but it's hard to go wrong with a Mary Sue. This relieves the writer of many burdens (such as writing scenes in which the protagonist does not appear, giving the protagonist character flaws, making the other characters more than 2-dimensional, etc.) and provides much opportunity for padding (the salvation of every writer with a word quota), in this case in the form of poorly thought-through similes and waxing poetic about costume. (In fact, even having a Chekhov's Gun in a Mary Sue story is a bit out of place - it's too sophisticated - you'd almost expect things to be pulled out of the air without any foreshadowing instead. Well, this is how it ended up, anyway.)

I read three full chapters before skimming.

Oi! You exceeded the dosage guidelines! There was a warning label! I can't be held responsible...

Yeah, OK, I'm just pointing out, the warning label is the gimmicky cheap idea I referred to earlier. The idea being, you could probably slog through any NaNoGenMo novel, if you did it in small enough pieces and gave yourself enough time between pieces.

Also, there's a sense in which I should've spent the final week adding as many new subplot choices as I could, so that maybe you'd see "4. Stranded in Space" after reading the 3rd chapter, and maybe you'd read even more words, and... yeah, I probably should've, in order to strictly pursue Goal 1. But I was so tired of reading the escapades of Serenity Starlight by the time I released this (I'm sure I've read 50,000 words of it myself), and being sick didn't exactly help matters either. I will be content with it being a proof-of-concept for this approach to that goal.

Lastly, I just want to emphasize that it's not just an anthology of short stories. (Because that wouldn't be a proper novel, would it?) There is a story arc... sort of.

Thoughts on plot... I might write about... at some future point... maybe.

@tra38
tra38 commented Dec 10, 2015

An anthology of short stories can have a story arc too (see "The Martian Chronicles", which is a bunch of short stories that still have an overarching plot). If I understand correctly, there are two central plots in this novel:

  1. Serenity's careerist rise in rank, only hampered by Serenity's own refusal of promotions
  2. Serenity's love affair with Commander Joe

The problem is that unless you really paid attention to the novel itself, watching how events change within each individual chapter, you would not really notice the story arcs in question. So while there is an arc, the reader may very well not notice it. (This problem does seem solvable, though. If you had added more subplots to hold the reader's attention, maybe the reader would have been invested in the story long enough to detect the trend.)

EDIT: I also don't think the novel text itself is bad. I really get more of a "Saturday Morning Cartoon" feel than that of an annoying Mary Sue flaunting her status. I notice Serenity never seems to get kidnapped herself, though.
