## Markov Chain
- Probabilistic model for text/natural language generation
- Simple and effective way of generating new text
    - Text
    - Lyrics
    - Story/Novel
    - Code

In [1]:
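def generateTable(data,k):
    """Count, for every k-character context in data, which character follows it."""
    #T: transition table, context -> {next char: count}
    T = {}
    for i in range(len(data)-k):
        X = data[i:i+k]  # context: k consecutive characters
        Y = data[i+k]    # the character that follows the context

        if T.get(X) is None:
            T[X] = {} #dict
            T[X][Y] = 1
        else:
            if T[X].get(Y) is None:
                T[X][Y] = 1
            else:
                T[X][Y] += 1

    return T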

In [2]:
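def convertFreqIntoProb(T):
    # Normalizes each row in place: counts become probabilities that sum to 1.
    for kx in T.keys():
        s = float(sum(T[kx].values()))  # total count for this context
        for k in T[kx].keys():
            T[kx][k] = T[kx][k]/s

    return T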
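
`convertFreqIntoProb` overwrites the counts in `T` as it goes. If the raw counts are still needed afterwards, one option is to build a new table instead. The following is a minimal sketch under that assumption; `normalize_rows` and the toy table are hypothetical names and data, not part of the notebook.

```python
def normalize_rows(counts):
    """Return a new table of probabilities; the input counts are left untouched."""
    probs = {}
    for context, followers in counts.items():
        total = sum(followers.values())  # total count for this context
        probs[context] = {ch: n / total for ch, n in followers.items()}
    return probs

# Hypothetical toy counts: "ab" was followed by "c" three times and "d" once.
toy_counts = {"ab": {"c": 3, "d": 1}, "bc": {"a": 2}}
toy_probs = normalize_rows(toy_counts)
print(toy_probs)  # {'ab': {'c': 0.75, 'd': 0.25}, 'bc': {'a': 1.0}}
```

Each inner dictionary sums to 1, which is what a sampler expects when it is given a probability vector.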

In [3]:
text_path = "sample_text.txt"
def load_text(filename):
    with open(filename,encoding='utf8') as f:
        return f.read().lower()
    
text = load_text(text_path)

In [4]:
print(text[:1000])

greetings cho,
please find attached details

client name : hsbc	
account id: 103285	
legal entity: citibank hongkong	
currency: cad	
payment type: receive	
paid amount: 56327540	
payment date: 16-10-2019	
payment status: processing	
pending amount: 2564636	

thanks.

hi tom,
this is to inform you that bny mellon has fully paid 80517212 usd to account id 104276 on 15-01-2020. 

thanks.

payment of 471862128 cad to account id 101165 has been made on 19/02/2020 and is in progress. please acknowledge. thanks,
joe.

please find attached details of failed payment and guide further. 

deutsche bank	
103838	
citibank singapore	
eur	
receive	
7325436	
16-01-2020	
rejected	
9934360	

thanks.

hi,
i would like to inform you that bny mellon partially paid 44866916 gbp to account id 101498 on 20-04-2020.

thanks.

partial payment of 51065250 gbp to account id 100216 has been made on 17/09/2019 and pending amount will be paid later. 
regards
ken adams

payment of 45218021 usd to account id 100545 ha

## Train our Markov Chain

In [5]:
def trainMarkovChain(text,k=4):
    
    T = generateTable(text,k)
    T = convertFreqIntoProb(T)
    
    return T
    

In [6]:
model = trainMarkovChain(text, 5)

In [7]:
print(model)

{'greet': {'i': 1.0}, 'reeti': {'n': 1.0}, 'eetin': {'g': 1.0}, 'eting': {'s': 1.0}, 'tings': {' ': 1.0}, 'ings ': {'c': 1.0}, 'ngs c': {'h': 1.0}, 'gs ch': {'o': 1.0}, 's cho': {',': 1.0}, ' cho,': {'\n': 1.0}, 'cho,\n': {'p': 1.0}, 'ho,\np': {'l': 1.0}, 'o,\npl': {'e': 1.0}, ',\nple': {'a': 1.0}, '\nplea': {'s': 1.0}, 'pleas': {'e': 1.0}, 'lease': {' ': 1.0}, 'ease ': {'f': 0.5, 'a': 0.25, 'g': 0.25}, 'ase f': {'i': 1.0}, 'se fi': {'n': 1.0}, 'e fin': {'d': 1.0}, ' find': {' ': 1.0}, 'find ': {'a': 1.0}, 'ind a': {'t': 1.0}, 'nd at': {'t': 1.0}, 'd att': {'a': 1.0}, ' atta': {'c': 1.0}, 'attac': {'h': 1.0}, 'ttach': {'e': 1.0}, 'tache': {'d': 1.0}, 'ached': {' ': 1.0}, 'ched ': {'d': 1.0}, 'hed d': {'e': 1.0}, 'ed de': {'t': 1.0}, 'd det': {'a': 1.0}, ' deta': {'i': 1.0}, 'detai': {'l': 1.0}, 'etail': {'s': 1.0}, 'tails': {'\n': 0.5, ' ': 0.5}, 'ails\n': {'\n': 1.0}, 'ils\n\n': {'c': 1.0}, 'ls\n\nc': {'l': 1.0}, 's\n\ncl': {'i': 1.0}, '\n\ncli': {'e': 1.0}, '\nclie': {'n': 1.0}, 'cli

## Generate Text at Test Time!


In [8]:
import numpy as np

In [9]:
# random sampling!
fruits = ["apple","banana","mango"]
prob = [0.8,0.1,0.1]  # one probability per fruit; must sum to 1
for i in range(10):
    #sampling according to a probability distribution
    print(np.random.choice(fruits,p=prob))
    #print(np.random.choice(fruits)) without p, every fruit is equally likely


apple
apple
apple
banana
apple
apple
apple
apple
apple
banana
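
Ten draws are too few to judge whether the sampler follows the requested distribution. A rough check is to draw many samples and compare the observed frequencies with the weights. The sketch below uses the standard library's `random.choices` in place of `np.random.choice` so that it runs without NumPy; the seed, the sample size, and the variable names are illustrative assumptions.

```python
import random
from collections import Counter

rng = random.Random(0)  # fixed seed so the run is repeatable
fruits = ["apple", "banana", "mango"]
weights = [0.8, 0.1, 0.1]

# Draw many samples at once and tally how often each fruit appears.
draws = rng.choices(fruits, weights=weights, k=10_000)
freq = Counter(draws)

for fruit in fruits:
    print(fruit, freq[fruit] / len(draws))
```

With this many draws the observed frequencies should land close to 0.8, 0.1, and 0.1.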


In [10]:
def sample_next(ctx,T,k):  #ctx: past sequence
    ctx = ctx[-k:]  # only the last k characters form the context
    if T.get(ctx) is None:
        return " "  # unseen context: fall back to a space
    possible_Chars = list(T[ctx].keys())
    possible_values = list(T[ctx].values())  # probabilities, same order as possible_Chars

    return np.random.choice(possible_Chars,p=possible_values)

In [11]:
sample_next("greet",model,5)

'i'

In [12]:
def generateText(starting_sent,k,maxLen=1000):
    # Note: samples from the global `model` trained above.
    sentence = starting_sent
    ctx = starting_sent[-k:] #last k chars

    for ix in range(maxLen):
        next_prediction = sample_next(ctx,model,k)
        sentence += next_prediction
        ctx = sentence[-k:]  # slide the context window forward by one character
    return sentence
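
The generation loop can also be written so that the table is passed in explicitly and the random source is seeded. The following is a self-contained sketch of that idea on a toy corpus, not the notebook's implementation: `build_model`, `generate`, and the corpus are hypothetical. It samples with `random.choices` on raw counts, and it stops at an unseen context rather than emitting a space.

```python
import random

def build_model(corpus, k):
    """Map each k-character context to {next char: count}."""
    counts = {}
    for i in range(len(corpus) - k):
        ctx, nxt = corpus[i:i + k], corpus[i + k]
        counts.setdefault(ctx, {})
        counts[ctx][nxt] = counts[ctx].get(nxt, 0) + 1
    return counts

def generate(seed_text, table, k, max_len, rng):
    """Extend seed_text one character at a time; stop at an unseen context."""
    out = seed_text
    for _ in range(max_len):
        followers = table.get(out[-k:])
        if followers is None:
            break  # dead end: this context never appeared in the corpus
        chars = list(followers)
        weights = list(followers.values())  # raw counts work as weights
        out += rng.choices(chars, weights=weights, k=1)[0]
    return out

toy_corpus = "the cat sat on the mat. the cat ate."
toy_model = build_model(toy_corpus, 3)
sample = generate("the", toy_model, 3, 40, random.Random(0))
print(sample)
```

Because every appended character was seen after its context in the corpus, every window of `k + 1` characters in the output also occurs in the corpus. This is why the generated text looks locally plausible even when it is globally incoherent.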

In [22]:
#np.random.seed(11)
text = generateText("greet",k=5,maxLen=2500)
print(text)

greetings cho,
please find attached details of failed on 22/08/2020.

thanks,
joe.

please acknowledge. 
regards,
ava miller

full payment of 45218021 usd to account id 101165 has fully paid amount: 2564636	

thanks.

payment dated on 13/03/2020 and is in progress as date: 16-10-2019	
payment type: receive	
paid amount id 102229 is completed as dated on 17/09/2019. 
kindly reply with the further steps that bny mellon has fully paid 44866916 gbp to account id: 103285	
legal entity: citibank hongkong	
currency: cad	
payment of 51065250 gbp to account id: 103285	
legal entity: citibank hongkong	
currency: cad	
payment of 96826290 eur to account id 102229 is completed as date: 16-10-2019	
payment of 45218021 usd to account id: 103285	
legal entity: citibank singapore	
eur	
receive	
paid 80517212 usd to account id 104276 on 19/02/2020 and is in progress as dated on 11/05/2019 and is in progress. please guide me through the further actions that are to be taken.
regards
ken adams

payment of 

In [23]:
len(text)

2505

In [24]:
with open("output.txt",'a',encoding='utf8') as f:  # 'a' appends to any existing file
    f.write(text)

![](modi.gif)