## Markov Chain
- Probabilistic Model for Text/Natural Language Generation
- Simple and effective way of generating new text
    - Text
    - Lyrics
    - Story/Novel
    - Code

In [None]:
text = "the man was ....they...then.... the ... the  "

# X is a sequence of K = 3 characters and Y is the predicted next character, i.e. the (K+1)-th character

X      Y     Freq
the    " "    4
the    "n"    2
the    "y"    1
the    "i"    1
man    "_"    1

In [17]:
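def generateTable(data,k=4):
    """Count how often each character follows every k-character context."""
    T = {}
    for i in range(len(data)-k):
        X = data[i:i+k]      # context: k consecutive characters
        Y = data[i+k]        # the character that follows the context
        #print("X  %s and Y %s  "%(X,Y))

        if T.get(X) is None:
            T[X] = {}
        # increment the count of Y after X, starting from 0 if unseen
        T[X][Y] = T[X].get(Y, 0) + 1

    return T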

In [19]:
T = generateTable("hello hello helli")
print(T)

{'hell': {'o': 2, 'i': 1}, 'ello': {' ': 2}, 'llo ': {'h': 2}, 'lo h': {'e': 2}, 'o he': {'l': 2}, ' hel': {'l': 2}}
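
The same counting step can be written more compactly with the standard library. The following is a minimal sketch, not the notebook's own code: `build_table` is a hypothetical name, and it assumes `collections.defaultdict` and `collections.Counter` for the nested counts.

```python
from collections import Counter, defaultdict

def build_table(data, k=4):
    """Count how often each character follows every k-character context."""
    table = defaultdict(Counter)
    for i in range(len(data) - k):
        context = data[i:i + k]
        next_char = data[i + k]
        table[context][next_char] += 1
    # Convert to plain dicts so the result prints like the table above.
    return {ctx: dict(counts) for ctx, counts in table.items()}

table = build_table("hello hello helli")
print(table["hell"])  # {'o': 2, 'i': 1}
```

The context `'hell'` appears three times in the sample string, followed twice by `'o'` and once by `'i'`, which matches the first entry of the printed table.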


In [22]:
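def convertFreqIntoProb(T):
    """Normalize each context's counts into probabilities (modifies T in place)."""
    for kx in T.keys():
        s = float(sum(T[kx].values()))   # total count for this context
        for k in T[kx].keys():
            T[kx][k] = T[kx][k]/s

    return T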

In [24]:
T = convertFreqIntoProb(T)
print(T)

{'hell': {'o': 0.6666666666666666, 'i': 0.3333333333333333}, 'ello': {' ': 1.0}, 'llo ': {'h': 1.0}, 'lo h': {'e': 1.0}, 'o he': {'l': 1.0}, ' hel': {'l': 1.0}}
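
Note that `convertFreqIntoProb` overwrites the counts in `T`. If the raw counts are still needed, a non-mutating variant is one option. The sketch below is an illustration under that assumption; `normalize` is a hypothetical name. It also checks that every row sums to 1, which any valid transition table should satisfy.

```python
import math

def normalize(counts_table):
    """Return a new table of probabilities; the input counts are left unchanged."""
    probs = {}
    for context, counts in counts_table.items():
        total = sum(counts.values())
        probs[context] = {ch: n / total for ch, n in counts.items()}
    return probs

counts = {"hell": {"o": 2, "i": 1}, "ello": {" ": 2}}
probs = normalize(counts)

# Every row of a valid transition table should sum to 1.
for context, dist in probs.items():
    assert math.isclose(sum(dist.values()), 1.0)

print(counts["hell"])  # the original counts are untouched: {'o': 2, 'i': 1}
```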


In [78]:
text_path = "english_speech_2.txt"
def load_text(filename):
    with open(filename,encoding='utf8') as f:
        return f.read().lower()
    
text = load_text(text_path)
#text = load_text("sample_code.txt")

In [79]:
print(text[:1000])

my dear countrymen,

many of you wish many-many good wishes of the holy festival of independence.

today the country is full of confidence. the country is crossing the new heights by plowing the resolve of dreams with hard work. today's sunrise has brought a new consciousness, new excitement, new excitement, new energy.

our lovely countrymen, once in 12 years, flowers of nilakurinya grow in our country. this year, on the hills of nilgiris in the south, it is like our nilkurinji flower like the ashok chakra of the tricolor flag, in the festival of freedom of the country.

my dear countrymen, we are celebrating this festival of independence, when our daughters uttarakhand, himachal, manipur, telangana, andhra pradesh - our daughters of these states crossed seven seas and coloring the seven seas with a color of tricolor came back

my dear countrymen, we are celebrating the festival of independence at that time, when everest triumphs were so many, many of our heroes, many of our daughters

## Train our Markov Chain

In [80]:
def trainMarkovChain(text,k=4):
    
    T = generateTable(text,k)
    T = convertFreqIntoProb(T)
    
    return T
    

In [81]:
model = trainMarkovChain(text)

In [82]:
print(model)

{'my d': {'e': 1.0}, 'y de': {'a': 0.8333333333333334, 'v': 0.16666666666666666}, ' dea': {'r': 1.0}, 'dear': {' ': 1.0}, 'ear ': {'c': 1.0}, 'ar c': {'o': 1.0}, 'r co': {'u': 1.0}, ' cou': {'n': 1.0}, 'coun': {'t': 1.0}, 'ount': {'r': 1.0}, 'untr': {'y': 1.0}, 'ntry': {'m': 0.3181818181818182, ' ': 0.2727272727272727, '.': 0.13636363636363635, ',': 0.18181818181818182, "'": 0.09090909090909091}, 'trym': {'e': 1.0}, 'ryme': {'n': 1.0}, 'ymen': {',': 0.8571428571428571, ' ': 0.14285714285714285}, 'men,': {'\n': 0.14285714285714285, ' ': 0.8571428571428571}, 'en,\n': {'\n': 1.0}, 'n,\n\n': {'m': 1.0}, ',\n\nm': {'a': 1.0}, '\n\nma': {'n': 1.0}, '\nman': {'y': 1.0}, 'many': {' ': 0.7142857142857143, '-': 0.14285714285714285, ',': 0.14285714285714285}, 'any ': {'o': 0.5, 'g': 0.16666666666666666, 'c': 0.16666666666666666, 'r': 0.16666666666666666}, 'ny o': {'f': 1.0}, 'y of': {' ': 1.0}, ' of ': {'y': 0.02040816326530612, 't': 0.4489795918367347, 'i': 0.12244897959183673, 'c': 0.0204081632

## Generate Text at Test Time!


In [83]:
import numpy as np

In [84]:
# sampling!
fruits = ["apple","banana","mango"]
prob = [0.8,0.1,0.1]   # probabilities as floats; they must sum to 1
for i in range(10):
    # sample according to a probability distribution
    print(np.random.choice(fruits,p=prob))


apple
banana
apple
apple
mango
apple
apple
apple
apple
apple
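
Ten draws only hint at the distribution. With many more draws, the observed frequencies should approach the given probabilities. The sketch below illustrates this; the seed value and the sample size of 10,000 are arbitrary choices made for the example, and it uses NumPy's `default_rng` generator rather than the global `np.random.choice`.

```python
from collections import Counter
import numpy as np

rng = np.random.default_rng(0)  # fixed seed (an arbitrary choice) for repeatable draws
fruits = ["apple", "banana", "mango"]
prob = [0.8, 0.1, 0.1]

# Draw many samples at once and compare observed frequencies with prob.
draws = rng.choice(fruits, size=10_000, p=prob)
counts = Counter(draws.tolist())
for fruit in fruits:
    print(fruit, counts[fruit] / len(draws))
```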


In [85]:
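def sample_next(ctx,T,k):
    """Sample the next character given the last k characters of ctx."""
    ctx = ctx[-k:]
    if T.get(ctx) is None:
        # unseen context: fall back to a space
        return " "
    possible_Chars = list(T[ctx].keys())
    possible_values = list(T[ctx].values())
    
    #print(possible_Chars)
    #print(possible_values)
    
    # draw one character according to the learned probabilities
    return np.random.choice(possible_Chars,p=possible_values)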

In [86]:
sample_next("comm",model,4)

'i'

In [87]:
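def generateText(starting_sent,k=4,maxLen=1000):
    """Extend starting_sent by maxLen sampled characters (uses the global `model`)."""
    sentence = starting_sent
    ctx = starting_sent[-k:]
    
    for _ in range(maxLen):
        next_prediction = sample_next(ctx,model,k)
        sentence += next_prediction
        ctx = sentence[-k:]   # slide the context window forward
    return sentence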

In [88]:

# use a separate name so the training corpus in `text` is not overwritten
generated_text = generateText("dear",k=4,maxLen=2000)
print(generated_text)

dear country, along. this time parliament, new energy.

in our daughters of this year, on those who have lost the country, our soldiers of the country's in a sense them a lot of nilkurinji flowers of oppressions of the country.

my dear country.

my dear countrymen, the leadership of the tricolor flag today is going and giving a consciousness, for that time paramilitary great men hanging from difficulties, our daught a new energy.

our loved its name back

my dear country and happy and give environment, new energy.

in ordinary people of massacrifice and happy and color flag on the social justice. to the country, along had betrayed life, the country, many of parliament, among with who have lost the sake our countrymen, once to the country.

in order the countrymen,

many good reports are celebrating forces spendence this years, for their live everest andhra pradesh - our country, with flood rajya bagh. how long to that corners of pujya sabha have lost the world's sixth largest economy.
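
Each run of the generator produces different text because sampling is random. For repeatable output, one option is to pass both the model and a seeded random generator explicitly. The following is a self-contained sketch on a toy corpus, not the notebook's code: `train`, `generate`, and the seed value 42 are hypothetical names and choices made for illustration.

```python
import numpy as np

def train(text, k):
    """Build {context: {next_char: probability}} from k-character contexts."""
    table = {}
    for i in range(len(text) - k):
        ctx, nxt = text[i:i + k], text[i + k]
        row = table.setdefault(ctx, {})
        row[nxt] = row.get(nxt, 0) + 1
    # Normalize each row of counts into probabilities.
    for row in table.values():
        total = sum(row.values())
        for ch in row:
            row[ch] /= total
    return table

def generate(model, seed_text, k, max_len, rng):
    """Extend seed_text one sampled character at a time."""
    out = seed_text
    for _ in range(max_len):
        row = model.get(out[-k:])
        if row is None:  # unseen context: fall back to a space, as sample_next does
            out += " "
            continue
        chars = list(row.keys())
        out += str(rng.choice(chars, p=list(row.values())))
    return out

toy_model = train("hello hello helli", k=4)
a = generate(toy_model, "hell", k=4, max_len=20, rng=np.random.default_rng(42))
b = generate(toy_model, "hell", k=4, max_len=20, rng=np.random.default_rng(42))
print(a)
print(a == b)  # True: the same seed gives the same text
```

Passing the model as an argument also removes the dependence on a global variable, which makes it easier to compare models trained with different values of `k`.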

## Congrats, you have learnt how to build your own text generator!
## How about a Rap/Song Lyrics Generator or WhatsApp Autocomplete as an assignment?

![](modi.gif)