# The Markov property
A sequence of random variables has the Markov property if the probability distribution of the next value (state) depends only on the current state: once the current state is known, earlier states carry no additional information.
$$
P( x_{i+1}\, \vert \, x_i)=P( x_{i+1}\, \vert \, x_j, \forall \, j\le i )
$$
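The definition can be checked empirically. The sketch below simulates a hypothetical 3-state chain (the probability table `NEXT_STATE_PROBS` and the helpers `simulate` and `conditional` are illustrative names, not from the original) whose next state is drawn using only the current state. It then estimates $P(x_{i+1} \mid x_i = 1)$ and compares it with $P(x_{i+1} \mid x_i = 1, x_{i-1} = p)$ for every earlier state $p$. Under the Markov property these should agree up to sampling noise.

```python
import numpy as np

# Hypothetical 3-state chain: row s gives the distribution of the next state
# when the current state is s. Earlier states are never consulted.
NEXT_STATE_PROBS = np.array([
    [0.9, 0.1, 0.0],
    [0.2, 0.5, 0.3],
    [0.0, 0.4, 0.6],
])

def simulate(probs, x0, steps, seed=0):
    """Sample a state sequence in which each step depends only on the current state."""
    rng = np.random.default_rng(seed)
    cdf = np.cumsum(probs, axis=1)
    cdf[:, -1] = 1.0  # make each row end exactly at 1 despite rounding
    states = [x0]
    for u in rng.random(steps):
        # Inverse-transform sampling from the current state's row only.
        states.append(int(np.searchsorted(cdf[states[-1]], u, side="right")))
    return np.array(states)

def conditional(states, current, previous=None):
    """Empirical P(x_{i+1} | x_i = current), optionally also given x_{i-1} = previous."""
    prev, cur, nxt = states[:-2], states[1:-1], states[2:]
    mask = cur == current
    if previous is not None:
        mask &= prev == previous
    return np.bincount(nxt[mask], minlength=3) / mask.sum()

states = simulate(NEXT_STATE_PROBS, x0=0, steps=100_000)
print("given x_i = 1:", conditional(states, 1).round(3))
for p in range(3):
    print(f"given x_i = 1, x_(i-1) = {p}:", conditional(states, 1, p).round(3))
```

All four printed rows should be close to the row `[0.2, 0.5, 0.3]`, because the extra conditioning on $x_{i-1}$ adds no information.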
When there is a finite number of states, the transition probabilities can be represented by a transition matrix $A(i,j)=P( x_{t}=j\, \vert \, x_{t-1}=i)$. With large state spaces, the training data may not contain every possible transition, and a purely count-based estimate assigns probability zero to every transition that was never observed. Smoothing avoids these zeros. Without smoothing, the maximum-likelihood estimate is
$$
A(i,j)= \frac{count(i \rightarrow j)}{count(i)}
$$
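As a sketch of the unsmoothed estimate, the code below counts transitions in a short, made-up state sequence and divides each row by its total. The function names `count_transitions` and `estimate_transitions` are illustrative. Here $count(i)$ is taken to be the number of transitions out of state $i$, which is the row sum of the count matrix.

```python
import numpy as np

def count_transitions(seq, n_states):
    """counts[i, j] = number of times state i is immediately followed by state j."""
    counts = np.zeros((n_states, n_states))
    for prev, nxt in zip(seq[:-1], seq[1:]):
        counts[prev, nxt] += 1
    return counts

def estimate_transitions(seq, n_states):
    """Unsmoothed estimate A[i, j] = count(i -> j) / count(i)."""
    counts = count_transitions(seq, n_states)
    # count(i) is the number of transitions out of state i, i.e. the row sum.
    row_totals = counts.sum(axis=1, keepdims=True)
    # A state never seen as a source would divide by zero; leave its row at 0.
    return np.divide(counts, row_totals, out=np.zeros_like(counts), where=row_totals > 0)

seq = [0, 0, 1, 2, 1, 0, 0, 1]  # hypothetical observed state sequence
A = estimate_transitions(seq, n_states=3)
print(A)
```

The transitions 0 → 2, 1 → 1, 2 → 0 and 2 → 2 never occur in `seq`, so their entries in `A` are exactly zero, which is the problem smoothing addresses.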
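With smoothing, a small pseudo-count $\epsilon$ is added to every transition:
$$
A(i,j)= \frac{count(i \rightarrow j) +\epsilon}{count(i)+\epsilon N}
$$
where $N$ is the number of states and $count(i)$ is the number of transitions out of state $i$. Each row receives $\epsilon N$ extra counts in total, so it still sums to one, and a transition never seen in training gets the small probability $\epsilon/(count(i)+\epsilon N)$ instead of zero.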
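A minimal sketch of this add-$\epsilon$ smoothing, applied to a hypothetical count matrix in which state 2 was never observed as a source (the function name `smoothed_transitions` is illustrative):

```python
import numpy as np

def smoothed_transitions(counts, eps):
    """A[i, j] = (count(i -> j) + eps) / (count(i) + eps * N) for an N x N count matrix."""
    n_states = counts.shape[0]
    row_totals = counts.sum(axis=1, keepdims=True)  # count(i)
    return (counts + eps) / (row_totals + eps * n_states)

# Hypothetical counts; state 2 was never observed as a source.
counts = np.array([
    [2.0, 2.0, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 0.0, 0.0],
])
A = smoothed_transitions(counts, eps=0.1)
print(A.round(3))
print(A.sum(axis=1))  # every row still sums to 1
```

Every entry is now positive, and the row for the unseen state becomes uniform ($1/N$). As $\epsilon \to 0$ the observed rows approach the unsmoothed estimate; larger $\epsilon$ pulls every row toward uniform.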


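As an example, consider a second-order model for generating phrases, trained on Robert Frost's poetry. The model predicts each word from the two words that precede it:
$$
P(w_{t+1} \, \vert \, w_t, w_{t-1})
$$
Because the first two words of a line have fewer than two predecessors, the code below keeps separate distributions for them: the first word is drawn from the distribution of line-initial words, the second word from a first-order model conditioned on the first, and an `END` token marks where a line stops.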
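Stripped of the line-start and line-end handling, the second-order estimate is a ratio of trigram counts to context counts, $P(w_{t+1} \mid w_{t-1}, w_t) = count(w_{t-1}, w_t, w_{t+1}) / count(w_{t-1}, w_t)$. The sketch below computes it on a made-up toy sentence; `second_order_model` is an illustrative name, not part of the original notebook.

```python
from collections import Counter, defaultdict

def second_order_model(tokens):
    """Estimate P(w_{t+1} | w_{t-1}, w_t) from trigram counts."""
    trigram_counts = defaultdict(Counter)
    for prev, cur, nxt in zip(tokens, tokens[1:], tokens[2:]):
        trigram_counts[(prev, cur)][nxt] += 1
    model = {}
    for context, nexts in trigram_counts.items():
        total = sum(nexts.values())  # count(w_{t-1}, w_t) as a context
        model[context] = {w: c / total for w, c in nexts.items()}
    return model

tokens = "the cat saw the cat run".split()  # toy data, not Frost
model = second_order_model(tokens)
print(model[("the", "cat")])  # {'saw': 0.5, 'run': 0.5}
```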


```python
import numpy as np
import string

# Remove punctuation from every token in a list of tokens.
def remove_pn(tokens):
    translator = str.maketrans('', '', string.punctuation)
    return [tok.translate(translator) for tok in tokens]

# Append value v to the list stored under key k, creating the list if needed.
def add2dict(d, k, v):
    if k not in d:
        d[k] = []
    d[k].append(v)
```

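```python
# Quick test: remove_pn expects a list of tokens, not a single string
# (passing a string would iterate over its characters instead).
remove_pn(["all,", "talk"])
```
```
['all', 'talk']
```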

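```python
# Dictionaries for the first word, the second word, and all later words of each line.
init = {}         # first word -> count (normalized to a probability below)
second = {}       # first word -> list of words seen in second position
transitions = {}  # (word_{t-1}, word_t) -> list of next words, including 'END'

with open('robert_frost.txt', encoding='utf-8') as f:
    for line in f:
        tokens = remove_pn(line.rstrip().lower().split())
        # Drop tokens that were pure punctuation, e.g. a lone "-".
        tokens = [t for t in tokens if t]
        N = len(tokens)
        # Skip empty and one-word lines: generate() always samples a second word.
        if N < 2:
            continue
        for i in range(N):
            t = tokens[i]
            if i == 0:
                init[t] = init.get(t, 0) + 1
            else:
                t1 = tokens[i - 1]
                if i == N - 1:
                    # The last pair of words in a line transitions to 'END'.
                    add2dict(transitions, (t1, t), 'END')
                if i == 1:
                    add2dict(second, t1, t)
                else:
                    t2 = tokens[i - 2]
                    add2dict(transitions, (t2, t1), t)

# Normalize the first-word counts into a distribution.
init_total = sum(init.values())
for t, c in init.items():
    init[t] = c / init_total

def list2prob(ts):
    """Turn a list of observed outcomes into an {outcome: probability} dict."""
    d = {}
    n = len(ts)
    for t in ts:
        d[t] = d.get(t, 0) + 1
    for t, c in d.items():
        d[t] = c / n
    return d

for t1, ts in second.items():
    second[t1] = list2prob(ts)

for t, ts in transitions.items():
    transitions[t] = list2prob(ts)

def sample(d):
    """Draw one key from an {outcome: probability} dict by inverse-transform sampling."""
    p0 = np.random.random()
    cum = 0
    for t, p in d.items():
        cum += p
        if p0 < cum:
            return t
    # Floating-point rounding can leave cum slightly below 1; fall back to the last key.
    return t

def generate(n_lines=4):
    """Generate and print n_lines lines of text."""
    for _ in range(n_lines):
        sent = []
        w0 = sample(init)
        sent.append(w0)
        w1 = sample(second[w0])
        sent.append(w1)
        while True:
            w2 = sample(transitions[(w0, w1)])
            if w2 == 'END':
                break
            sent.append(w2)
            w0, w1 = w1, w2
        # Print inside the loop so that every generated line is shown.
        print(' '.join(sent))
```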

```python
generate()
```
One line of sample output (the text is random and differs from run to run):
```
now lefts no bigger than a harness gall
```
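The `sample` helper draws from a discrete distribution by walking its cumulative sum (inverse-transform sampling), using NumPy's global random state. An alternative, sketched below under the assumption that distributions are stored as `{outcome: probability}` dicts like `init`, `second` and `transitions`, delegates the draw to NumPy's `Generator.choice` with an explicit generator. `sample_with_rng` and the toy `dist` are illustrative, not part of the original notebook.

```python
import numpy as np

def sample_with_rng(dist, rng):
    """Draw one key from an {outcome: probability} dict using a NumPy Generator."""
    outcomes = list(dist.keys())
    probs = np.array(list(dist.values()), dtype=float)
    probs /= probs.sum()  # guard against small rounding drift
    idx = rng.choice(len(outcomes), p=probs)
    return outcomes[idx]

rng = np.random.default_rng(42)
dist = {"the": 0.5, "woods": 0.3, "END": 0.2}  # toy distribution
draws = [sample_with_rng(dist, rng) for _ in range(10_000)]
print({w: draws.count(w) / len(draws) for w in dist})
```

The empirical frequencies should be close to 0.5, 0.3 and 0.2. Creating the generator with a fixed seed, as here, makes the sequence of draws, and therefore any generated lines, reproducible from run to run.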


# Hidden Markov Models
