# A Socratic Dialogue Generator

This generator analyzes each speaker's lines in a Platonic dialogue encoded in TEI XML, then generates new speech in that speaker's style using Markov chains. Here's a demonstration from _Phaedrus_.

In [99]:
Dialogue('phaedrus.xml', 'Socrates', 'Phaedrus').generate(4)


**Socrates**: 

 But undertakes to try to escape detection than to the young yet, that in a musician would be more powerful nature? Do, reproaches him of those say at whatever point and beautiful has knowledge of my foot.


**Phaedrus**: 

 Yes, unless you say who pretend to walk on you really had stumbled upon some similar inventions of which he has been said?


**Socrates**: 

 Oh, before we not happen even the light, but I know very well to the mind of painting; but to bear fruit, but to which is in the ancients; and all the passion of division is a superhuman wonder as soon as our friend, and if you purposely exposed me.


**Phaedrus**: 

 So as you easily make up at Epicrates' house, Socrates, and my walk along the Well.
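
The idea behind output like this can be sketched on a toy corpus. The following is a minimal illustration rather than the notebook's actual classes: the names `build_successors` and `generate_sentence` and the sample sentences are made up for this example. It records which token follows which, then walks those records at random until it reaches sentence-ending punctuation.

```python
import random
from collections import defaultdict

def build_successors(tokens):
    """Map each lowercased token to the list of tokens that follow it."""
    table = defaultdict(list)
    for current, following in zip(tokens, tokens[1:]):
        table[current.lower()].append(following)
    return dict(table)

def generate_sentence(table, first_word, rng, max_tokens=50):
    """Walk the table from first_word until sentence-ending punctuation."""
    out = [first_word]
    while out[-1] not in {".", "?", "!"} and len(out) < max_tokens:
        followers = table.get(out[-1].lower())
        if not followers:
            # Dead end: this token only occurs at the very end of the corpus.
            break
        out.append(rng.choice(followers))
    return out

# Toy corpus, already tokenized (punctuation marks are tokens too).
tokens = "the soul is immortal . the soul is in motion . the body is mortal .".split()
table = build_successors(tokens)
sentence = generate_sentence(table, "the", random.Random(0))
print(" ".join(sentence))
```

Because a follower is stored once per occurrence, a uniform random pick from the list reproduces the observed frequencies: here "soul" is twice as likely as "body" to follow "the".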

In [95]:
# Library for parsing XML
from lxml import etree

# We'll mostly use NLTK for tokenizing. 
import nltk

# Randomly choose things. 
from random import choice as pick

# Display things nicely. 
from IPython.display import display, Markdown

In [96]:
class Character(): 
    """
    This class analyzes and generates speech for a single character. 
    """
    def __init__(self, tree, name):
        """
        Gets a character's speech from the TEI XML, and breaks it up
        by utterance, sentence, and word.
        """
        self.name = name
        # Get dialogue by speaker from the TEI. 
        self.xpath = ".//sp[speaker='%s']/p" % name
        self.element = tree.findall(self.xpath) 
        if len(self.element) == 0: 
            # Something's wrong. Let's try the other format. 
            self.xpath = ".//said[@who='#%s']" % name
            self.element = tree.findall(self.xpath)
        if len(self.element) == 0: 
            raise ValueError("Can't find any dialogue for %s!" % name)
        # Use itertext() so that text after nested elements isn't lost and
        # an empty .text (None) can't break the tokenizer. 
        self.lines = [''.join(line.itertext()) for line in self.element]
        self.lineWords = [nltk.word_tokenize(line) for line in self.lines]
        self.lineLens = [len(line) for line in self.lineWords]
        self.text = '\n'.join(self.lines)
        self.sents = nltk.sent_tokenize(self.text)
        # Tokenize each sentence separately so we can find sentence-initial words. 
        self.sentWords = [nltk.word_tokenize(sent) for sent in self.sents]
        self.words = nltk.word_tokenize(self.text)
        self.wordsLower = [w.lower() for w in self.words]
        self.uniquewords = list(set(self.words)) 
        self.firstWords = [s[0] for s in self.sentWords if s]
        self.makeProbs()
        
    def makeProbs(self): 
        """ 
        Builds a dictionary that maps each lowercased word to the list of
        words that follow it. A follower appears once per occurrence, so a
        random pick from the list reproduces the observed frequencies.
        Some words are actually punctuation marks. 
        """
        table = {}
        # Pair every word with its successor in a single pass. zip() stops
        # before the last word, so we can't fall off the edge of the list. 
        for lword, nextWord in zip(self.wordsLower, self.words[1:]): 
            table.setdefault(lword, []).append(nextWord)
        self.probs = table
        
    def chain(self, n): 
        """
        Chains together at least n words according to the "probs"
        dictionary, then keeps going until the end of the sentence.
        """
        chain = [] 
        # Pick first word
        word = pick(self.firstWords)
        chain.append(word)
        # Get n subsequent words, then keep going until the end of the sentence. 
        while len(chain) <= n or chain[-1] not in ['?', '.', '!']: 
            followers = self.probs.get(word.lower())
            if not followers: 
                # Dead end: this word only occurs at the very end of the text. 
                break
            word = pick(followers)
            chain.append(word)
        chain = self.untokenize(chain)
        display(Markdown(chain))

    def untokenize(self, chain): 
        """
        Stitches sentences back together. 
        """
        out = ""
        for word in chain: 
            if word in ["(", ")"]: 
                # Just skip parentheses, since they rarely end up
                # balanced in generated text. 
                continue
            # Words (including hyphenated ones) and opening quotes get a
            # leading space; punctuation and clitics like "n't" attach
            # directly to the previous token. 
            if word.replace("-", "").isalpha() or word[0] == "“": 
                out = out + ' ' + word
            else: 
                out = out + word
        return out
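
The two lookups that `Character.__init__` tries can be exercised on small inline snippets. This sketch uses the standard library's `xml.etree.ElementTree` as a stand-in for `lxml` (both accept these path expressions in `findall`), and it assumes, as the notebook's paths do, that the elements carry no XML namespace. The helper `find_lines` and the sample markup are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Layout 1: each speech is an <sp> with a <speaker> child and <p> paragraphs.
SP_STYLE = """<text><body>
  <sp><speaker>Socrates</speaker><p>First sample line.</p></sp>
  <sp><speaker>Phaedrus</speaker><p>Second sample line.</p></sp>
  <sp><speaker>Socrates</speaker><p>Third <emph>sample</emph> line.</p></sp>
</body></text>"""

# Layout 2: each speech is a <said> element with a who="#Name" attribute.
SAID_STYLE = """<text><body>
  <said who="#Socrates">First sample line.</said>
  <said who="#Phaedrus">Second sample line.</said>
</body></text>"""

def find_lines(root, name):
    """Try the <sp>/<speaker> layout first, then fall back to <said who=...>."""
    found = root.findall(".//sp[speaker='%s']/p" % name)
    if not found:
        found = root.findall(".//said[@who='#%s']" % name)
    # itertext() also collects text that sits inside or after child elements.
    return ["".join(el.itertext()).strip() for el in found]

print(find_lines(ET.fromstring(SP_STYLE), "Socrates"))
print(find_lines(ET.fromstring(SAID_STYLE), "Phaedrus"))
```

If a file does declare a namespace, unprefixed paths like these match nothing, and the lookup would end in the "can't find any dialogue" error.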

In [97]:
class Dialogue():
    """
    This class analyzes and generates dialogue-level speech. 
    It chooses an amount of text that is appropriate for the character,
    given the lengths of that character's lines in the source text. 
    """
    def __init__(self, filename, char1, char2): 
        tree = etree.parse(filename)
        self.c1 = Character(tree, char1)
        self.c2 = Character(tree, char2) 

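    def generate(self, n):
        """
        Generates n lines of dialogue, alternating between the two
        characters. An odd n is rounded down to the nearest even number.
        """
        for i in range(n//2):
            for char in [self.c1, self.c2]: 
                self.makeDialogue(char)

    def makeDialogue(self, c): 
        """
        Displays the character's name, then a generated line whose minimum
        length is drawn from the lengths of that character's real lines.
        """
        display(Markdown('\n**' + c.name + "**: "))
        lineLen = pick(c.lineLens)
        c.chain(lineLen)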

In [98]:
Dialogue('timaeus.xml', 'Socrates', 'Timaeus').generate(4)


**Socrates**: 

 For your prelude; in gymnastic, is of words our proposals? And live together among themselves stern in requital for all the class which is competent for this in case anyone from those whose duty it seems, they were to see added? And Hermocrates, we said, seeing that we said, my dear Timaeus!


**Timaeus**: 

 That they were a change of fire or rather than a division within soul and ministers to experience with soul be taken that in that it is the Recipient to declare that One single sensation; but it uselessly, or the truth, for this. For whom the unintelligent, by his own structure the Other is termed “the sacreddisease.” boils up the Errant Cause of its own proper position. and the substance from the present life by us to attain becoming a result of corn fall in its own experiences which indicates that, and some Cause them food, so that were sufficient for out of the matrix or in the whole of pain.


**Socrates**: 

 And children and grandparents, and sisters, both the very feeling is a most willingly, so here, and the polity I in gymnastic, indeed, and may we said that they would I, and good were willing, therefore, and do you most cordially accepted your prelude; not begin by this?


**Timaeus**: 

 And as to necessary, being produced. For the bright and the vision, but now at one firestream is thus we previously described. The quicker but formerly the whole kind of the most fair and keen and of their nature allows, since it onlyhalf-solid is thus allowing their principles. Therefore let down into flesh also was porous body; and which passes the same and to judge are most part of another the inner fire and a disease of the mortal things that which is that it is to try to the Same and boundary for, become, and with hair, those who works very firmly based on the term we should allow any other Kinds should ever uniformly existent; for these maladies previously drew since no envy He had been constructed as follows.