## Markov Chain
- Probabilistic Model for Text/Natural Language Generation
- Simple and effective way of generating new text
    - Text
    - Lyrics
    - Story/Novel
    - Code

In [2]:
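def generateTable(data,k):
    # T: transition table mapping each k-character context X
    # to a dict {next character Y: number of times Y followed X}
    T = {}
    for i in range(len(data)-k):
        X = data[i:i+k]  # context: k consecutive characters
        Y = data[i+k]    # the character that follows the context

        if T.get(X) is None:
            T[X] = {}
            T[X][Y] = 1
        else:
            if T[X].get(Y) is None:
                T[X][Y] = 1
            else:
                T[X][Y] += 1

    return T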
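
To make the table structure concrete, here is a minimal sketch that builds the same kind of count table on a toy string. The helper name `build_table` and the input `"hello help"` are illustrative assumptions, not part of the notebook. It uses `collections.defaultdict` and `Counter` in place of the nested `if` checks.

```python
from collections import Counter, defaultdict

def build_table(data, k):
    """Count which character follows each k-character context (hypothetical helper)."""
    table = defaultdict(Counter)
    for i in range(len(data) - k):
        context = data[i:i + k]
        next_char = data[i + k]
        table[context][next_char] += 1
    # Convert to plain dicts so the result prints like the notebook's table.
    return {ctx: dict(counts) for ctx, counts in table.items()}

toy_table = build_table("hello help", 3)
print(toy_table["hel"])  # the context "hel" is followed once by "l" and once by "p"
```

A context that appears more than once with different continuations, such as `"hel"` here, is what gives the chain a real choice at generation time.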

In [4]:
def convertFreqIntoProb(T):
    # Normalise each row of counts so it sums to 1 (modifies T in place)
    for kx in T.keys():
        s = float(sum(T[kx].values()))
        for k in T[kx].keys():
            T[kx][k] = T[kx][k]/s

    return T
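
The normalisation step can be sketched on a tiny hand-written table. The function `to_probabilities` is a hypothetical variant that returns a new table instead of modifying its input, and the counts below are made up for illustration; they are not taken from the real corpus.

```python
def to_probabilities(counts_table):
    """Return a new table whose rows sum to 1 (hypothetical, non-mutating helper)."""
    prob_table = {}
    for context, counts in counts_table.items():
        total = sum(counts.values())
        prob_table[context] = {ch: n / total for ch, n in counts.items()}
    return prob_table

# Illustrative counts, not taken from the real corpus.
counts = {"ase ": {"f": 2, "a": 1, "g": 1}, "hel": {"l": 1, "p": 1}}
probs = to_probabilities(counts)
print(probs["ase "])
```

Leaving the counts untouched can be convenient if the table needs to be updated with more text later, at the cost of holding two tables in memory.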

In [6]:
text_path = "sample_text.txt"
def load_text(filename):
    with open(filename,encoding='utf8') as f:
        return f.read().lower()
    
text = load_text(text_path)

In [7]:
print(text[:1000])

greetings cho,
please find attached details

client name : hsbc	
account id: 103285	
legal entity: citibank hongkong	
currency: cad	
payment type: receive	
paid amount: 56327540	
payment date: 16-10-2019	
payment status: processing	
pending amount: 2564636	

thanks.
hi tom,

this is to inform you that bny mellon has fully paid 80517212 usd to account id 104276 on 15-01-2020. 

thanks.
payment of 471862128 cad to account id 101165 has been made on 19/02/2020 and is in progress. please acknowledge. thanks.
joe, 

please find attached details of failed payment and guide further. 

deutsche bank	
103838	
citibank singapore	
eur	
receive	
7325436	
16-01-2020	
rejected	
9934360	

thanks.
hi tom,

i would like to inform you that bny mellon partially paid 44866916 gbp to account id 101498 on 20-04-2020

thanks.
partial payment of 51065250 gbp to account id 100216 has been made on 17/09/2019 and pending amount will be paid later. 
regards
ken adams
partial payment of 51065250 gbp to account id 

## Train our Markov Chain

In [8]:
def trainMarkovChain(text,k=4):
    
    T = generateTable(text,k)
    T = convertFreqIntoProb(T)
    
    return T
    

In [9]:
model = trainMarkovChain(text, 4)  # k=4, matching the 4-character contexts used below

In [10]:
print(model)

{'gree': {'t': 1.0}, 'reet': {'i': 1.0}, 'eeti': {'n': 1.0}, 'etin': {'g': 1.0}, 'ting': {'s': 1.0}, 'ings': {' ': 1.0}, 'ngs ': {'c': 1.0}, 'gs c': {'h': 1.0}, 's ch': {'o': 1.0}, ' cho': {',': 1.0}, 'cho,': {'\n': 1.0}, 'ho,\n': {'p': 1.0}, 'o,\np': {'l': 1.0}, ',\npl': {'e': 1.0}, '\nple': {'a': 1.0}, 'plea': {'s': 1.0}, 'leas': {'e': 1.0}, 'ease': {' ': 1.0}, 'ase ': {'f': 0.5, 'a': 0.25, 'g': 0.25}, 'se f': {'i': 1.0}, 'e fi': {'n': 1.0}, ' fin': {'d': 1.0}, 'find': {' ': 1.0}, 'ind ': {'a': 1.0}, 'nd a': {'t': 1.0}, 'd at': {'t': 1.0}, ' att': {'a': 1.0}, 'atta': {'c': 1.0}, 'ttac': {'h': 1.0}, 'tach': {'e': 1.0}, 'ache': {'d': 1.0}, 'ched': {' ': 1.0}, 'hed ': {'d': 1.0}, 'ed d': {'e': 1.0}, 'd de': {'t': 1.0}, ' det': {'a': 1.0}, 'deta': {'i': 1.0}, 'etai': {'l': 1.0}, 'tail': {'s': 1.0}, 'ails': {'\n': 0.5, ' ': 0.5}, 'ils\n': {'\n': 1.0}, 'ls\n\n': {'c': 1.0}, 's\n\nc': {'l': 1.0}, '\n\ncl': {'i': 1.0}, '\ncli': {'e': 1.0}, 'clie': {'n': 1.0}, 'lien': {'t': 1.0}, 'ient': {' '

## Generate Text at Test Time!


In [11]:
import numpy as np

In [12]:
# random sampling !
fruits = ["apple","banana","mango"]
prob = [0.8,0.1,0.1]  # probabilities must be numbers that sum to 1
for i in range(10):
    # sample according to the probability distribution `prob`
    print(np.random.choice(fruits,p=prob))
    # print(np.random.choice(fruits)) would instead pick each fruit with equal probability


banana
apple
mango
apple
apple
apple
apple
apple
apple
apple
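
Ten draws are too few to see the distribution clearly. As a rough check, the sketch below draws many samples and compares the empirical frequencies with the requested probabilities. The seed `0`, the sample size, and the use of `np.random.default_rng` are assumptions made for this illustration.

```python
from collections import Counter
import numpy as np

rng = np.random.default_rng(0)  # fixed seed chosen only for reproducibility
fruits = ["apple", "banana", "mango"]
prob = [0.8, 0.1, 0.1]

n_draws = 10_000
draws = rng.choice(fruits, size=n_draws, p=prob)
freq = {fruit: count / n_draws for fruit, count in Counter(draws.tolist()).items()}
print(freq)  # empirical frequencies should be close to 0.8, 0.1, 0.1
```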


In [13]:
def sample_next(ctx,T,k):  #ctx: past sequence
    ctx = ctx[-k:]  # only the last k characters determine the next one
    if T.get(ctx) is None:
        return " "  # context never seen in training: fall back to a space
    possible_Chars = list(T[ctx].keys())
    possible_values = list(T[ctx].values())

    # draw one character according to the learned probabilities
    return np.random.choice(possible_Chars,p=possible_values)
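
A possible alternative, sketched here under assumptions, is to sample with the standard library's `random.choices`, which accepts weights that do not sum to 1 and so also works on raw counts. The name `sample_next_weighted` and its `rng` parameter are hypothetical and not part of the notebook.

```python
import random

def sample_next_weighted(ctx, table, k, rng=random):
    """Sample the next character; weights need not sum to 1 (hypothetical variant)."""
    ctx = ctx[-k:]
    row = table.get(ctx)
    if row is None:
        return " "  # same fallback as sample_next for unseen contexts
    chars = list(row.keys())
    weights = list(row.values())
    return rng.choices(chars, weights=weights, k=1)[0]

toy = {"gree": {"t": 1.0}, "ase ": {"f": 2, "a": 1, "g": 1}}
print(repr(sample_next_weighted("gree", toy, 4)))  # only one option for this context
print(repr(sample_next_weighted("zzzz", toy, 4)))  # unseen context falls back to a space
```

Passing a `random.Random(seed)` instance as `rng` would make the draws repeatable without touching global state.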

In [29]:
sample_next("gree",model,4)

't'

In [16]:
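def generateText(starting_sent,k,maxLen=1000):
    # Note: reads the global `model` trained above; k must match the k used in training

    sentence = starting_sent
    ctx = starting_sent[-k:] #last k chars

    for ix in range(maxLen):
        next_prediction = sample_next(ctx,model,k)
        sentence += next_prediction
        ctx = sentence[-k:]  # slide the context window forward by one character
    return sentence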
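
The generation loop can also be written so that the table and the random generator are passed in explicitly, which makes runs repeatable. The following is a minimal end-to-end sketch under assumptions: the helper names `train` and `generate`, the toy corpus, and the seed `11` are all illustrative and not part of the notebook.

```python
import numpy as np

def train(text, k):
    """Build a k-character transition table with probabilities (hypothetical helper)."""
    table = {}
    for i in range(len(text) - k):
        row = table.setdefault(text[i:i + k], {})
        row[text[i + k]] = row.get(text[i + k], 0) + 1
    for row in table.values():
        total = sum(row.values())
        for ch in row:
            row[ch] /= total  # turn counts into probabilities
    return table

def generate(seed_text, table, k, max_len, rng):
    """Extend seed_text by max_len characters, one sampled character at a time."""
    out = seed_text
    for _ in range(max_len):
        row = table.get(out[-k:])
        if row is None:
            out += " "  # unseen context: same fallback as in sample_next
            continue
        out += str(rng.choice(list(row.keys()), p=list(row.values())))
    return out

corpus = "payment of 10 usd to account id 1. payment of 20 eur to account id 2. "
table = train(corpus, 4)
first = generate("paym", table, 4, 60, np.random.default_rng(11))
second = generate("paym", table, 4, 60, np.random.default_rng(11))
print(first)
```

Because both calls start from a generator seeded the same way, `first` and `second` are identical; using a different seed gives a different sample from the same table.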

In [30]:
#np.random.seed(11)
# note: this rebinds `text`, replacing the training corpus loaded earlier
text = generateText("gree",k=4,maxLen=2000)
print(text)

greetings cho,
please find attached details

client of 96826290 eur to be taken.
regards
ken adams
payment of 51065250 gbp to account will be paid later. 

deutsche bank	
103838	
citibank	
103838	
citibank hongkong	
pending amount id 100216 has been made on 17/09/2019	
payment dated on 20-04-2020. 
kindly reply with the further. 
regards
mary anne
payment of 96826290 eur to account: 56327540	
payment of 59369451 eur to account id 101877 is completed on 11/05/2019. 
kindly acknowledge. 
regards,
ava miller
fully paid later. 
regards
ken adams
payment of 51065250 gbp to inform you that are to be paid later. 
regards
ken adams
payment of 51065250 gbp to account id 101165 has been rejected on 20-04-2020. 

this in processing	
pending amount will be paid 44866916 gbp to inform you that bny mellon partial payment of 59369451 eur to account id: 103285	
legal entity: cad	
paid later. 
regards
mary anne
payment and pending amount id 107449 has fully paid 80517212 usd to account id 100216 has fu

In [31]:
len(text)

2004

In [32]:
with open("output.txt",'a',encoding='utf8') as f:
    # 'a' appends, so repeated runs accumulate generated text in the file
    f.write(text)
