# Markov Chain Text Generation

A brute-force solution: Shannon proposed a scheme for generating text according to a Markov model of order 1, in which each symbol depends only on the one before it.

To construct [an order 1 model], for example:
1. Open a book at random, select a letter at random on the page, and record the letter.
2. The book is then opened to another page and one reads until this letter is encountered. The succeeding letter is then recorded.
3. Turning to another page, this second letter is searched for and the succeeding letter recorded, etc.

It would be interesting if further approximations could be constructed, but the labor involved becomes enormous at the next stage.
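
Shannon's manual procedure can be sketched in code at the letter level. This is only an approximation under stated assumptions: the "book" is a single string, "opening to another page" is modelled as jumping to a random position and reading forward with wraparound, and the function name `shannon_letter_chain` and its parameters are hypothetical, not taken from Shannon or from this notebook.

```python
import random
from typing import Optional


def shannon_letter_chain(text: str, length: int = 20, seed: Optional[int] = None) -> str:
    """Generate `length` letters by imitating the book-and-page procedure."""
    if len(text) < 2:
        raise ValueError("text must contain at least two characters")
    if length < 1:
        raise ValueError("length must be at least 1")

    rng = random.Random(seed)
    n = len(text)

    # Step 1: pick a letter at random and record it.
    current = text[rng.randrange(n)]
    recorded = [current]

    for _ in range(length - 1):
        # Steps 2 and 3: jump to a random position ("another page") and
        # read forward, wrapping around, until the current letter appears.
        start = rng.randrange(n)
        for offset in range(n):
            i = (start + offset) % n
            if text[i] == current:
                # Record the letter that follows it.
                current = text[(i + 1) % n]
                break
        recorded.append(current)

    return "".join(recorded)


sample = "the quick brown fox jumps over the lazy dog"
print(shannon_letter_chain(sample, length=30, seed=0))
```

Every adjacent pair of letters in the output also occurs somewhere in the source string, which is what makes the result an order 1 approximation.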

In [16]:
import numpy as np
from typing import Dict, List
from collections import defaultdict

def read_text(path: str) -> List[str]:
    '''Read a text file and return its lowercased, whitespace-separated words'''
    with open(path) as f:
        words = f.read().strip().lower().split()
    return words

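def word_frequencies(text: List[str]) -> Dict[str, List[str]]:
    '''Create a dictionary defining an order 1 model of language.

    Maps each word to the list of words that follow it in the text.
    Repeats are kept, so a uniform draw from the list is frequency-weighted.
    '''
    word_freqs = defaultdict(list)

    # Pair every word with its successor
    for first_w, second_w in zip(text, text[1:]):
        word_freqs[first_w].append(second_w)
    return word_freqs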

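def create_sentence(text: List[str], word_freqs: Dict[str, List[str]],
                    max_words: int = 200) -> str:
    '''Walk the model from a random word until a word contains a period'''
    first_word = np.random.choice(text)

    chain = [first_word]

    # max_words (an arbitrary cap) guards against never reaching a period
    while '.' not in chain[-1] and len(chain) < max_words:
        successors = word_freqs.get(chain[-1])
        if not successors:  # e.g. the final word of the text, if it occurs nowhere else
            break
        chain.append(np.random.choice(successors))

    return ' '.join(chain)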

def paragraph(text: List[str], sentence_num: int = 3) -> str:
    '''Generate sentence_num sentences separated by blank lines'''
    word_freqs = word_frequencies(text)
    sentences = []

    for _ in range(sentence_num):
        sentences.append(create_sentence(text, word_freqs))

    return '\n\n'.join(sentences)

In [None]:
text = read_text('data/dracula.txt')

print(paragraph(text, 10))
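
To see what the successor table built by `word_frequencies` looks like, a toy corpus helps. The sketch below rebuilds the same kind of table on a made-up nine-word text (the corpus and the name `successors` are illustrative assumptions, not part of the notebook's data).

```python
from collections import defaultdict

# A made-up corpus, already lowercased and split into words
words = "the cat sat. the dog sat. the cat ran.".split()

# Map each word to every word that follows it, keeping repeats
successors = defaultdict(list)
for first_w, second_w in zip(words, words[1:]):
    successors[first_w].append(second_w)

print(successors["the"])   # ['cat', 'dog', 'cat']
print(successors["cat"])   # ['sat.', 'ran.']
print("ran." in successors)  # False: the final word has no successor
```

Because repeats are kept, a uniform random choice from `successors["the"]` picks "cat" twice as often as "dog", so the list itself encodes the transition frequencies. The last line also shows why a sentence generator needs to handle words that have no recorded successor.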

In [38]:
%load_ext autoreload
%autoreload 2

In [281]:
import author

shelly = author.Author()

shelly.read_text(path='data/pure_reason.txt')

# shelly.create_word_frequencies()

shelly.create_sentence()

'follow that conception, which does not so many different interests of construction, the simplicity is the object of things that is, on the history of modality being as much of thought.'

In [282]:
print(shelly.create_paragraph(sentence_num=3))

empirical laws which a series—its parts do not sooner suggest itself exist only as a synthetical propositions is also no means of all the world is, of virtue, who acknowledges the singular proposition.

example, a discipline can i have a doctrine of the conception of experience.

of the respective strength from each other spheres in general, to a free action.
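
The `author` module used above is not shown in this notebook. As a rough guide only, here is a minimal sketch of what such a class might look like, assuming it simply wraps the word-level functions defined earlier. The method names come from the calls above; everything else (the attributes, the `seed` and `max_words` parameters, and building the table inside `read_text`) is a hypothetical choice, not the actual implementation.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional


class Author:
    """Hypothetical reconstruction; the real author.Author may differ."""

    def __init__(self, seed: Optional[int] = None) -> None:
        self.words: List[str] = []
        self.word_freqs: Dict[str, List[str]] = {}
        self._rng = random.Random(seed)  # assumption: a seedable generator

    def read_text(self, path: str) -> None:
        with open(path) as f:
            self.words = f.read().strip().lower().split()
        # Assumption: the table is built on load, which would explain why
        # the explicit create_word_frequencies() call can stay commented out.
        self.create_word_frequencies()

    def create_word_frequencies(self) -> None:
        freqs = defaultdict(list)
        for first_w, second_w in zip(self.words, self.words[1:]):
            freqs[first_w].append(second_w)
        self.word_freqs = dict(freqs)

    def create_sentence(self, max_words: int = 200) -> str:
        chain = [self._rng.choice(self.words)]
        # Stop at a period, at a dead end, or at the (arbitrary) word cap
        while "." not in chain[-1] and len(chain) < max_words:
            followers = self.word_freqs.get(chain[-1])
            if not followers:
                break
            chain.append(self._rng.choice(followers))
        return " ".join(chain)

    def create_paragraph(self, sentence_num: int = 3) -> str:
        return "\n\n".join(self.create_sentence() for _ in range(sentence_num))
```

Keeping the word list and the successor table on the instance means the table is built once per text rather than on every call to `create_paragraph`.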


In [1]:
a = [(1, 2), (2, 1), (3, 0)]

In [2]:
# Sort in place by the second element of each tuple
a.sort(key=lambda x: x[1])

In [5]:
def first(x):
    return x[0]

sorted(a, key=first)

[(1, 2), (2, 1), (3, 0)]

In [None]:
a = [1, 2, 3, 4, 5]


