The Other and the Same (Algebraic properties of Markov chains) #9
OK, a premise: In NaNoGenMo 2019 I implemented a method that I described as "taking the intersection of a Markov chain and a finite automaton". Recently I wondered: can this be generalized? Can one take the intersection of two Markov chains? I believe the answer is yes. There are a few technical hitches that need to be accounted for, but roughly speaking, if A and B are Markov chains, a sentence which is likely in A ∩ B is one which is both likely in A and likely in B. But I don't know what the result looks like in practice. That's something I'd like to try to find out this month.

Also, is there a corresponding notion of union of Markov chains? Again, I believe there is, and moreover I think it's nearly the same as building a Markov chain from multiple texts. That's also something I'd like to try to confirm this month.

Finally, can one go further with these two operators, intersection and union? There seem to be snags: they're somewhat limited in their algebraic properties. Other choices of operators might have nicer properties, but also be further removed from the usual notions of probability theory. I'm unsure how interesting exploring these alternatives might be; it might only be interesting from an algebraic point of view, and even then, not terribly so. If I have time I might try to fill in some of the details here.
Now that I've had a chance to prototype this and play with it, I've realized that the aforementioned "technical hitch" has a significant influence on this idea, so I'll try to explain what it is.

Going back to probability theory, we'd like to regard each transition in a Markov chain as an outcome of an event -- the event of moving from one state to the next. The 2nd of Kolmogorov's 3 axioms of probability theory states that the probabilities of the set of outcomes of an event must sum to 1. This makes a lot of sense. But it precludes having an event that has an empty set of outcomes. And in a Markov chain, sometimes you don't have any possible next state. If you build a Markov chain from a corpus, and the last word in that corpus is "Manitoba", and "Manitoba" doesn't appear elsewhere in that corpus, then, well, you have a state labelled "Manitoba" that has no next state. That's just how it goes.

Intersection exacerbates this, because if A and B are disjoint then A ∩ B = ∅. To give a more concrete example, if one Markov chain has
and the other has
then the intersection will be
That is to say, the state "been" still exists, but it has no next states. If we want to regard state transitions as events with probability distributions, we'll need to patch that somehow. I'm sure we can, but I want to understand the implications of it a little better before writing more about it.
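To make the hitch concrete in code -- a minimal sketch, not the actual Chainscape code, assuming a first-order chain is represented as a dict mapping each state to the set of states that may follow it (the toy chains here are hypothetical):

```python
# Hypothetical toy chains; the real ones would be built from corpora.
chain_a = {"has": {"been"}, "been": {"seen"}, "seen": {"here"}}
chain_b = {"has": {"been"}, "been": {"heard"}, "heard": {"there"}}

def intersect(a, b):
    """Keep only the states and transitions present in both chains."""
    return {state: a[state] & b[state] for state in a.keys() & b.keys()}

both = intersect(chain_a, chain_b)
# "has" -> "been" survives, because both chains contain that transition...
assert both["has"] == {"been"}
# ...but "been" survives as a state with an empty set of next states.
assert both["been"] == set()
```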
Would it work to treat "no outcome" as its own state? Or to handle the empty set in some other way? It doesn't quite address the problem of the union of two chains, but it strikes me that there are some states that could be included as tokens but aren't in the text, such as "end of sequence". Though this is maybe getting too far away from the intersection of Markov chains to fit your idea.
Seeing this made me think that the following would be interesting:
from the following possible chains in A:
B:
Don't know what it means, or whether it's a workable definition of
Moving to a special "exception" state when there are no available outcomes is what I was thinking too. And then, from that "exception" state, you can move to any other state in the chain with equal probability. This would help, in practice, to keep the generation interesting. The trick would be in explaining this behaviour in a way that's compatible with the underlying theory and not horribly contrived.

I don't know that you couldn't still have something sensible if you loosened the 2nd axiom and said that the sum of a probability distribution must be either 1 or 0. But creating an entire alternate theory of probability just for NaNoGenMo seems a little drastic...

So, we can say that the objects in a Markov chain aren't probability distributions per se; they're objects from which we can derive probability distributions, and which have operations that correspond closely with operations on probability distributions. From an empty such object we derive a probability distribution where the probability of moving to the "exception" state is 1. And as we walk a Markov chain, we're repeatedly deriving probability distributions from these objects, and picking from those distributions.

This is maybe a little contrived, but it's not horribly contrived, so that's good. And in our definition of these objects, we might even be able to give them other properties, ones that are useful for our greater purpose. Well... maybe.
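A sketch of that derivation step (the names `derive_distribution` and `EXCEPTION` are mine, not Chainscape's): the successor-count object is not itself a distribution, but we derive one from it at each step, and the empty object derives the distribution that moves to the exception state with probability 1.

```python
import random

EXCEPTION = "<exception>"

def derive_distribution(successor_counts):
    """Derive a probability distribution from a successor-count object.
    An empty object yields: move to the exception state with probability 1."""
    if not successor_counts:
        return {EXCEPTION: 1.0}
    total = sum(successor_counts.values())
    return {s: n / total for s, n in successor_counts.items()}

def step(state, chain):
    """Walk one step, deriving a fresh distribution at the current state."""
    if state == EXCEPTION:
        # From the exception state, any state in the chain is equally likely.
        return random.choice(sorted(chain))
    dist = derive_distribution(chain.get(state, {}))
    states = list(dist)
    return random.choices(states, weights=[dist[s] for s in states])[0]

chain = {"Manitoba": {}}   # a dead-end state, as in the corpus example above
assert step("Manitoba", chain) == EXCEPTION
```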
Anyway, assuming we've satisfactorily swept ∅ under the rug, the rest is straightforward. Probability theory tells us that
and these can be confirmed by going through a number of thought experiments involving urns and marbles. In our case, if

And that's it really. I've written some code (which I'm not entirely happy with, so I may want to revise it, but it verily does the thing) and, since "what does this look like in practice" was my first goal, I'll post some example output tomorrow or maybe later today.
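Here is how I'd guess those identities cash out for chains, as a hedged sketch (`chain_intersection` and `chain_union` are my names, not Chainscape's, and the exact renormalization is an assumption): intersection multiplies the two transition probabilities for each transition and renormalizes; union adds them and renormalizes.

```python
def normalize(dist):
    """Scale a dict of non-negative weights so they sum to 1 (if possible)."""
    total = sum(dist.values())
    return {k: v / total for k, v in dist.items()} if total else {}

def chain_intersection(a, b):
    """Per state: P(next=t) proportional to P_a(next=t) * P_b(next=t)."""
    return {s: normalize({t: p * b[s].get(t, 0.0) for t, p in a[s].items()})
            for s in a.keys() & b.keys()}

def chain_union(a, b):
    """Per state: P(next=t) proportional to P_a(next=t) + P_b(next=t)."""
    out = {}
    for s in a.keys() | b.keys():
        ts = set(a.get(s, {})) | set(b.get(s, {}))
        out[s] = normalize({t: a.get(s, {}).get(t, 0.0) + b.get(s, {}).get(t, 0.0)
                            for t in ts})
    return out

a = {"we": {"have": 0.5, "are": 0.5}}
b = {"we": {"have": 1.0}}
assert chain_intersection(a, b)["we"]["have"] == 1.0   # "are" is ruled out
assert abs(chain_union(a, b)["we"]["have"] - 0.75) < 1e-9
```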
So OK, answering my first question, "what does this look like in practice?" I built a first-order Markov chain from this work and some output from walking that chain was:
And then I built another first-order Markov chain, from this work this time, and some output from walking it was:
And I'm sure you'll agree, these two writers wrote on very different subjects, in very different styles. So I took the intersection of these two Markov chains, and generated some output from the resulting chain, to see what it would look like. It looked like this:
The answer to my 2nd question is "no". The underlying reason is that the transition matrix that we collect when processing a corpus, and the Markov chain we derive from that transition matrix, are two different objects with different algebraic properties. (Looking at implementations, this has been a little difficult for me to see, because most applications aren't concerned with the algebraic properties of these things, so there's no reason to make a distinction between them when writing code for them.) The transition matrix is a matter of magnitude (you can always find one more occurrence of a transition and count it), while the Markov chain is only a matter of proportion.

For related reasons, I'm also not very hopeful that the 3rd question has a positive answer. Happy to unpack those statements with some examples on request, but this is possibly further afield than NaNoGenMo needs to go, because even if I had answers I have no idea how I'd use them in generating a novel.
Going to try picking up the pieces of this, even if it is heavily on the mathematical side. You can just skip it if you're not too interested in the mathematics.

So we have transition matrices (counts of what-state-comes-after-what-state) and we have Markov chains (probabilities of what-state-comes-after-what-state). And these have different algebraic properties. What is the relationship between these two things? Markov chains are quotients of transition matrices. Every transition matrix gives rise to only one Markov chain, but every Markov chain can be given rise to by an infinite number of transition matrices.

Markov chains support a form of union and intersection (as already discussed). But these operators don't have some of the properties we expect from the analogous operators in set theory (e.g. absorption).

Do transition matrices support union and intersection? After a fashion, yes. Given two matrices, we can create a new matrix where the count of each transition is the minimum (for intersection) or the maximum (for union) of its counts in the two matrices.

How closely do these operators correspond to union and intersection in Markov chains? Well, if the count of a transition is zero in either matrix, its minimum is zero too, so "min" discards exactly the transitions that intersection discards; but the resulting probabilities are not in general the same.

In practice, what does taking "min" and "max" of transition matrices look like? (Or rather, what does output from walking a Markov chain based on such a transition matrix look like?) I'll try to explore that in a future update.

Do union and intersection on transition matrices have nicer algebraic properties than on Markov chains? Yes; the absorption law holds, for example. Meaning, we can consider a lattice of transition matrices. In fact I think it might even be distributive. The prospect of getting a lattice structure around this is very interesting to me for some reason, even though I have no clear idea if it has any value from a generative perspective.

Revisiting my 2nd question: we now know that, despite some similarities, building a Markov chain from multiple corpora is not the same as union of Markov chains.
But is building a transition matrix from multiple corpora the same as max of transition matrices? Again, there are some similarities (different ones this time), but the answer is still no.

Okay, is there any operation on transition matrices that corresponds to building from multiple corpora? Yes! That operation is addition. You're just adding more counts to the matrix, right?

If addition is a reasonable operation on transition matrices, is subtraction too? Well, it's certainly not clear what kind of Markov chain a transition matrix with negative counts would give rise to. But if you could work out some sensible meaning for them, then maybe yes; and maybe, with + and -, you would have another way in which transition matrices could be regarded algebraically.
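The three matrix operations above can be sketched like so (my own toy representation, dicts of dicts of counts, with absent entries read as 0 -- not the Chainscape implementation):

```python
def combine(a, b, op):
    """Apply op elementwise to two count matrices; drop zero counts and rows."""
    out = {}
    for s in a.keys() | b.keys():
        ts = set(a.get(s, {})) | set(b.get(s, {}))
        row = {t: op(a.get(s, {}).get(t, 0), b.get(s, {}).get(t, 0)) for t in ts}
        row = {t: n for t, n in row.items() if n}
        if row:
            out[s] = row
    return out

def meet(a, b): return combine(a, b, min)                 # "intersection"
def join(a, b): return combine(a, b, max)                 # "union"
def add(a, b):  return combine(a, b, lambda x, y: x + y)  # multiple corpora

a = {"we": {"have": 2, "been": 1}}
b = {"we": {"have": 1}, "have": {"been": 3}}

# The absorption laws hold, so these operators give a lattice:
assert meet(a, join(a, b)) == a
assert join(a, meet(a, b)) == a
```

Note that `add(a, b)` is essentially what you get by counting transitions in the concatenation of the two corpora, apart from the seam where one corpus ends and the other begins.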
I said I'd demonstrate what taking the "min" and "max" of transition matrices looks like in practice.
Now here's an example of walking the Markov chain obtained from the min of the two transition matrices:
Like intersection, the output is restricted to transitions that occur in both sources. But it's less harsh at the margins. Probably because the intersection of a 1% chance and a 1% chance is a 0.01% chance, but the min of two small counts is still a count of the same order as the smaller one.

One thing that has troubled me a little is that I had doubts that

I think there is not much more I have to say about this. I need to publish the code for this and run it to 50,000 words sometime in the next week.
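The margin effect is plain arithmetic (with the caveat that "min" really operates on counts, which are only then renormalized, rather than directly on probabilities):

```python
p_a, p_b = 0.01, 0.01
compounded = p_a * p_b          # product: two rare events compound to 0.01%
not_compounded = min(p_a, p_b)  # min: still on the order of 1%
assert compounded < not_compounded
```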
Have you ever wondered if there is a common thread running through the great works of English literature? And have you ever tried to find this supposed thread by taking 10 lauded works of English literature, deriving Markov Chains from them, taking the intersection of those Markov Chains, and generating a new work of English literature by randomly walking the resulting Markov Chain?

Well, I have. And the result is a novel called "The Other and the Same". Which is, apparently, a series of monologues, delivered by one or more unidentified Newfoundlanders, reminiscing on... various topics, with particular attention paid to... aspects of difference and equality.

This novel was generated as part of what I eventually decided to call Chainscape, which consists primarily of a tool to derive transition matrices and/or Markov Chains from text files, apply operations to those objects, and walk the resulting Markov Chains to produce a new text. Specific instructions to generate The Other and the Same are also included in the Chainscape repository.
"NaNoGenMo again, huh? You must have a keen interest in computational narratology!"
"No, I have a keen interest in events with absurd premises."
(edit Nov 30 2023)
Have you ever wondered if there is a common thread running through the great works of English literature?
And have you ever tried to find this supposed thread by taking 10 lauded works of English literature, deriving Markov Chains from them, taking the intersection of those Markov Chains, and generating a new work of English literature by randomly walking the resulting Markov Chain?
Well, I have. And the result is a novel called "The Other and the Same". Which is, apparently, a series of monologues, delivered by one or more unidentified Newfoundlanders, reminiscing on... various topics, with particular attention paid to... aspects of difference and equality.
This novel was generated as part of Chainscape, a project to explore the algebraic properties of Markov Chains during NaNoGenMo 2023. More information can be found in its repo and in this GitHub issue thread.