# Markov Chains

---
In class today we will be implementing a Markov chain to process sentences

---
## Learning Objectives

1. Students will be able to explain the Markov Chain process
1. Implement a Markov Chain


---
## Background

Markov Chains represent a series of events following the Markov Property: future states are memory-less in that they depend only on the current state. This can be expanded to the idea of variable order Markov models where there is a variable-length memory (eg. 1st order Markov Model). Markov models consist of fully observable states. 

> A common example of this is in predicting the weather: We can clearly see the current weather and would like to predict tomorrow's weather. This is also applicable to biology with one case being CpG islands. 

Our goal today will be to implement a Markov model built from words. For our example text, we will use the classic example of Dr. Seuss because of the repetitive nature of the text.

---
## Train Markov model

For our initial implementation of the Markov Model, we will use the simple example of Dr. Seuss: "One fish two fish red fish blue fish."



In [12]:
def build_markov_model(markov_model, new_text):
    '''
    Function to build or add to a 1st order Markov model given a string of text
    We will store the markov model as a dictionary of dictionaries
    The key in the outer dictionary represents the current state
    and the inner dictionary represents the next state with their contents containing
    the transition probabilities.
    Note: This would be easier to read if we were to build a class representation
           of the model rather than a dictionary of dictionaries, but for simplicitiy
           our implementation will just use this structure.
    
    Args: 
        markov_model (dict of dicts): a dictionary of word:(next_word:frequency pairs)
        new_text (str): a string to build or add to the moarkov_model

    Returns:
        markov_model (dict of dicts): an updated markov_model
        
    Pseudocode:
        Add artificial states for start and end
        For each word in text:
            Increment markov_model[word][next_word]
        
    '''
    def add_pair(first_pair,second_pair):
        if(first_pair in markov_model):
            markov_model[first_pair][second_pair] = 1
        else:
            markov_model[first_pair]={second_pair: 1}

    text = new_text.split(" ")
    for count in range(0,len(text)):
        if(count == 0):
            add_pair('*S*',text[count])
            add_pair(text[count],text[count+1])
        elif(count == len(text)-1):
            add_pair(text[count],'*E*')
        else:
            add_pair(text[count],text[count+1])
    return markov_model


In [13]:
markov_model = dict()
text = "one fish two fish red fish blue fish"
markov_model = build_markov_model(markov_model, text)
print (markov_model)

{'*S*': {'one': 1}, 'one': {'fish': 1}, 'fish': {'two': 1, 'red': 1, 'blue': 1, '*E*': 1}, 'two': {'fish': 1}, 'red': {'fish': 1}, 'blue': {'fish': 1}}


###  Nth order Markov chain
In the above model, each event or word is output from only the previous state with no memory of any prior states. While this is useful in some cases, typical biological applications of Markov chains require higher-order models to accurately capture what we know about a system. For instance, in attempting to identify coding regions of a genome, we know that open reading frames (ORFs) contain codon triplets, and so a third or sixth order Markov chain would better describe these regions. Here you will implement a generalized form of our previous Markov Chain to allow for Nth order chains.


In [31]:
def build_markov_model(markov_model, text, order=1):
    '''
    Function to build or add to a Nth order Markov model given a string of text

    Args: 
        markov_model (dict of dicts): a dictionary of word:(next_word:frequency pairs)
            or None if a new model is being built
        new_text (str): a string to build or add to the moarkov_model
        order (int): the number of previous states to consider for the model
        
    Returns:
        markov_model (dict of dicts): an updated/new markov_model
    '''
    def add_pair(first_pair,second_pair):
        if(first_pair in markov_model):
            markov_model[first_pair][second_pair] = 1
        else:
            markov_model[first_pair]={second_pair: 1}

    this_text = text.split(" ")
    current_tuple = ('*S*',)*order
    for count in range(0,len(this_text)+1):
        if(count == 0):
            add_pair(current_tuple,this_text[count])
        elif(count == len(this_text)):
            current_tuple = current_tuple[1:]+(this_text[count-1],)
            add_pair(current_tuple,'*E*')
        else:
            current_tuple = current_tuple[1:]+(this_text[count-1],)
            add_pair(current_tuple,this_text[count])
    return markov_model

In [28]:
markov_model = dict()
text = "one fish two fish red fish blue fish"
markov_model = build_markov_model(markov_model, text, order=2)
markov_model

{('*S*', '*S*'): {'one': 1},
 ('*S*', 'one'): {'fish': 1},
 ('one', 'fish'): {'two': 1},
 ('fish', 'two'): {'fish': 1},
 ('two', 'fish'): {'red': 1},
 ('fish', 'red'): {'fish': 1},
 ('red', 'fish'): {'blue': 1},
 ('fish', 'blue'): {'fish': 1},
 ('blue', 'fish'): {'*E*': 1}}

## Generate text from Markov Model

Markov models are "generative models". That is, the probability states in the model can be used to generate output following the conditional probabilities in the model.

We will now generate a sequence of text from the Markov model. For this section, I recommend using np.random.choice, which allows for you to provide a probability distribution for drawing the next edge in the chain.

In [29]:
import numpy as np


def get_next_word(current_word, markov_model, seed=42):
    '''
    Function to randomly move a valid next state given a markov model
    and a current state (word)

    Args:
        current_word (tuple): a word that exists in our model
        markov_model (dict of dicts): a dictionary of word:(next_word:frequency pairs)

    Returns:
        next_word (str): a randomly selected next word based on transition probabilies

    Pseudocode:
        Calculate transition probilities for all next states from a given state (counts/sum)
        Randomly draw from these to generate the next state

    '''
    word_list = []
    freq_list = []
    np.random.seed(seed)
    for i in markov_model:

        if current_word == i or current_word in i:  # handles both tuple or singular words
            word_list.extend(markov_model[i].keys())
            freq_list.extend(markov_model[i].values())
    prob = [x / sum(freq_list) for x in freq_list]
    random_word = np.random.choice(word_list, p=prob)
    return random_word


def generate_random_text(markov_model, seed=42):
    '''
    Function to generate text given a markov model

    Args:
        markov_model (dict of dicts): a dictionary of word:(next_word:frequency pairs)

    Returns:
        sentence (str): a randomly generated sequence given the model

    Pseudocode:
        Initialize sentence at start state
        Until End State:
            append get_next_word(current_word, markov_model)
        Return sentence

    '''
    scentence = []
    np.random.seed(seed)

    order = 0
    start_state = '*S*'
    for i in markov_model:
        if start_state in i:
            order = len(i)
            break

    init_start_state = ('*S*',) * order

    for _ in range(len(markov_model)):  # number of iteration found in the markov model
        words_to_connect = (get_next_word(init_start_state, markov_model, seed))
        if words_to_connect == '*E*':
            break

        scentence.append(words_to_connect)
        init_start_state = (*init_start_state[1:], words_to_connect)
    return ' '.join(scentence)


---

## All the Fish
Up till now, you have only been working with a line or two of the Dr. Seuss' _One Fish, Two Fish_. Now, I want you to build a model using the whole book and try different orders of Markov models.

In [41]:
# Now just add some more training data to the markov model. You can find it under data/one_fish_two_fish.txt

markov_model = dict()
# Read in the whole book
# An example of a more complex text that we can use to generate more complex output

file = open("data/one_fish_two_fish.txt", "r")
fishies = ""
for line in file:
    line = line.strip()
    fishies = fishies + ' ' + line
markov_model = build_markov_model(markov_model, fishies, order=6)

print (generate_random_text(markov_model,seed=7))

 One fish, Two fish, Red fish, Blue fish, Black fish, Blue fish, Old fish, New fish. This one has a little car. This one has a little star. Say! What a lot of fish there are. Yes. Some are red, and some are blue. Some are old and some are new. Some are sad, and some are glad, And some are very, very bad. Why are they sad and glad and bad? I do not know, go ask your dad. Some are thin, and some are fat. The fat one has a yellow hat. From there to here, From here to there, Funny things are everywhere. Here are some who like to run. They run for fun in the hot, hot sun. Oh me! Oh my! Oh me! oh my! What a lot of funny things go by. Some have two feet and some have four. Some have six feet and some have more. Where do they come from? I can't say. But I bet they have come a long, long way. we see them come, we see them go. Some are fast. Some are slow. Some are high. Some are low. Not one of them is like another. Don't ask us why, go ask your mother. Say! Look at his fingers! One, two, three

---
## Shakespeare

Now, let's play around with some Shakespeare.

In [40]:
# An example of a more complex text that we can use to generate more complex output
sonet_markov_model = dict()
file = open("data/sonnets.txt", "r")
sonet = ""
for line in file:
    line = line.strip()
    if line == "":
        # Empty line so build model
        sonet_markov_model = build_markov_model(sonet_markov_model, sonet, order=2)
        sonet = ""
    else:
        sonet = sonet + ' ' + line
 
print (generate_random_text(sonet_markov_model,seed=7))

 Then let not winter's ragged hand deface, In thee thy summer, ere thou be distill'd: Make sweet some vial; treasure thou some place With beauty's treasure ere it be self-kill'd. That use is not forbidden usury, Which happies those that pay the willing loan; That's for thy self thy foe, to thy sweet self too cruel: Thou that art now the world's fresh ornament, And only herald to the very same And that unfair which fairly doth excel; For never-resting time leads summer on To hideous winter, and confounds him there; Sap checked with frost, and lusty leaves quite gone, Beauty o'er-snowed and bareness every where: Then were not summer's distillation left, A liquid prisoner pent in walls of glass, Beauty's effect with beauty were bereft, Nor it, nor no remembrance what it was: But flowers distill'd, though they with winter meet, Leese but their show; their substance still lives sweet.
