# Markov Chain Text Generation

A brute-force solution: Shannon proposed a scheme for generating text according to a Markov model of order 1, using nothing more than a book and a random choice of pages.

To construct [an order 1 model], for example:
1. Open a book at random, select a letter at random on the page, and record it.
2. Open the book to another page and read until this letter is encountered. Record the succeeding letter.
3. Turn to another page, search for this second letter, and record the letter that follows it, and so on.

It would be interesting if further approximations could be constructed, but the labor involved becomes enormous at the next stage.
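
The three steps above can be simulated directly on a string. The following is a minimal sketch, not Shannon's own formulation: the function name `shannon_order1_letters`, the `rng` parameter, and the decision to wrap around at the end of the book are assumptions made for illustration.

```python
import random


def shannon_order1_letters(book: str, length: int, rng: random.Random) -> str:
    """Mimic the book procedure at the character level (illustrative sketch)."""
    n = len(book)
    # Step 1: pick a letter at a random position and record it.
    current = book[rng.randrange(n)]
    out = [current]
    while len(out) < length:
        # Steps 2 and 3: "open the book to another page" by jumping to a
        # random position, then read forward until the current letter appears.
        start = rng.randrange(n)
        idx = book.find(current, start)
        if idx == -1:
            # Assumption: wrap around to the start of the book. The letter
            # was taken from the book, so this search always succeeds.
            idx = book.find(current)
        # Record the succeeding letter (wrapping at the very end of the book).
        current = book[(idx + 1) % n]
        out.append(current)
    return ''.join(out)


book = "the quick brown fox jumps over the lazy dog"
print(shannon_order1_letters(book, 30, random.Random(0)))
```

Every adjacent pair of letters in the output also occurs as an adjacent pair somewhere in the book, which is what makes this an order 1 model.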

In [16]:
import numpy as np
from typing import Dict, List
from collections import defaultdict
import author

def read_text(path: str) -> List[str]:
    '''Read a text file and return its lowercased words, split on whitespace'''
    with open(path) as f:
        words = f.read().strip().lower().split()
    return words

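def word_frequencies(text: List[str]) -> Dict[str, List[str]]:
    '''Create a dictionary defining an order 1 model of language'''
    
    # Map each word to the list of words that follow it in the text.
    # Repeated successors are kept, so frequent pairs are sampled more often.
    word_freqs = defaultdict(list)
    
    for first_w, second_w in zip(text, text[1:]):
        word_freqs[first_w].append(second_w)
    return word_freqs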

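def create_sentence(text: List[str], word_freqs: Dict[str, List[str]]) -> str:
    '''Walk the model from a random start word until a word containing a period'''
    
    first_word = np.random.choice(text)
    
    chain = [first_word]

    while '.' not in chain[-1]:
        successors = word_freqs.get(chain[-1])
        # Stop at a dead end, e.g. the final word of the text has no successor
        if not successors:
            break
        chain.append(np.random.choice(successors))
        
    return ' '.join(chain)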

def paragraph(text: List[str], sentence_num: int=3) -> str:
    '''Generate sentence_num sentences, separated by blank lines'''
    
    word_freqs = word_frequencies(text)
    sentences = []
    
    for _ in range(sentence_num):
        sentences.append(create_sentence(text, word_freqs))

    return '\n\n'.join(sentences)
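
Because each successor list keeps duplicates, drawing uniformly from the list is the same as drawing each next word with its empirical transition probability. The sketch below makes those probabilities explicit; the helper name `transition_probabilities` and the toy sentence are assumptions for illustration and are not part of the notebook.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def transition_probabilities(words: List[str]) -> Dict[str, Dict[str, float]]:
    """Turn successor lists into per-word probability tables (illustrative)."""
    successors = defaultdict(list)
    for first_w, second_w in zip(words, words[1:]):
        successors[first_w].append(second_w)

    probs = {}
    for word, nexts in successors.items():
        counts = Counter(nexts)
        # Share of times each successor follows `word` in the text.
        probs[word] = {w: c / len(nexts) for w, c in counts.items()}
    return probs


words = "to the store to draw to the park".split()
probs = transition_probabilities(words)
print(probs["to"])
```

In this toy text, `to` is followed by `the` twice and by `draw` once, so `the` gets two thirds of the probability mass.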

In [None]:
text = read_text('data/dracula.txt')

print(paragraph(text, 10))

In [38]:
%load_ext autoreload
%autoreload 2

In [247]:
import author

shelly = author.Author()

shelly.read_text(new_text='I am going to the store. Find some beef for me.')

# shelly.create_word_frequencies()

shelly.create_sentence()

'going to the store.'
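
The `author` module itself is not shown in this notebook. The following is a hypothetical sketch of a class that would support the calls used here (`read_text`, `add_text`, `create_sentence`, `create_paragraph`, and the `word_frequencies` attribute); the `seed` and `max_words` parameters and the `words` and `_rng` attributes are assumptions, and the real module may differ.

```python
import random
from collections import defaultdict
from typing import DefaultDict, List, Optional


class Author:
    """Hypothetical stand-in for author.Author: an order 1 word model."""

    def __init__(self, seed: Optional[int] = None) -> None:
        self.words: List[str] = []
        self.word_frequencies: DefaultDict[str, List[str]] = defaultdict(lambda: [])
        self._rng = random.Random(seed)  # assumption: seedable for repeatability

    def read_text(self, new_text: str) -> None:
        # Assumption: read_text starts from scratch, add_text accumulates.
        self.words = []
        self.word_frequencies.clear()
        self.add_text(new_text)

    def add_text(self, new_text: str) -> None:
        new_words = new_text.strip().lower().split()
        # Record each (word, successor) pair within the new text only;
        # the last old word is not linked to the first new word.
        for first_w, second_w in zip(new_words, new_words[1:]):
            self.word_frequencies[first_w].append(second_w)
        self.words.extend(new_words)

    def create_sentence(self, max_words: int = 50) -> str:
        chain = [self._rng.choice(self.words)]
        while '.' not in chain[-1] and len(chain) < max_words:
            # .get avoids creating empty entries for words with no successor.
            successors = self.word_frequencies.get(chain[-1])
            if not successors:
                break
            chain.append(self._rng.choice(successors))
        return ' '.join(chain)

    def create_paragraph(self, sentence_num: int = 3) -> str:
        return '\n\n'.join(self.create_sentence() for _ in range(sentence_num))


shelly = Author(seed=0)
shelly.read_text(new_text='I am going to the store. Find some beef for me.')
print(shelly.create_sentence())
```

With this two-sentence input every word has at most one successor, so each generated sentence is a suffix of one of the original sentences, as in the output above.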

In [256]:
print(shelly.create_paragraph(sentence_num=3))

this data.

more of the functionality of dictionaries.

plotting data with a bar chart can be used to draw inferences about our data, and to make sense of the functionality of the plotly library.


In [249]:
shelly.add_text(new_text='''As you've seen in recent lessons, data science leans on data visualizations to draw inferences about our data, and to make sense of the math we use in making sense of this data. We saw how plotting data with a bar chart can be used to show the relationship between x and y variables.

In this lesson, we'll explore more of the functionality of the Plotly library. As we do so, pay careful attention to the data type that our methods require: whether they are dictionaries or lists, or lists of dictionaries. Ok, let's go!''')

In [254]:
shelly.word_frequencies

defaultdict(<function author.Author.__init__.<locals>.<lambda>()>,
            {'i': ['am'],
             'am': ['going'],
             'going': ['to'],
             'to': ['the', 'draw', 'make', 'show', 'the'],
             'the': ['store.',
              'math',
              'relationship',
              'functionality',
              'plotly',
              'data'],
             'store.': ['find'],
             'find': ['some'],
             'some': ['beef'],
             'beef': ['for'],
             'for': ['me.'],
             'as': ["you've", 'we'],
             "you've": ['seen'],
             'seen': ['in'],
             'in': ['recent', 'making', 'this'],
             'recent': ['lessons,'],
             'lessons,': ['data'],
             'data': ['science', 'visualizations', 'with', 'type'],
             'science': ['leans'],
             'leans': ['on'],
             'on': ['data'],
             'visualizations': ['to'],
             'draw': ['inferences'],
             'i