# Setting up Markov Chains as a Supervised Learning Problem

In [1]:
text = "The man was ... they ... then ... the .. the ... the ... their ... "

In [4]:
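def generateTable(data, k=4):
    """Count how often each character follows each k-character context."""
    T = {}
    for i in range(len(data) - k):
        X = data[i:i+k]  # context: k consecutive characters
        Y = data[i+k]    # label: the character that follows the context
        # Create the inner dict on first sight of X, then bump the count for Y.
        T.setdefault(X, {})
        T[X][Y] = T[X].get(Y, 0) + 1
    return T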

In [5]:
generateTable("the man ")

{'the ': {'m': 1}, 'he m': {'a': 1}, 'e ma': {'n': 1}, ' man': {' ': 1}}
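
The table above can also be read as a supervised dataset: each k-character window is an input X and the character that follows is its label Y. The sketch below lists those pairs explicitly. `make_training_pairs` is a hypothetical helper introduced only for illustration; it is not part of the notebook.

```python
def make_training_pairs(data, k=4):
    """Return (context, next_char) pairs from a sliding window of width k."""
    return [(data[i:i + k], data[i + k]) for i in range(len(data) - k)]


pairs = make_training_pairs("the man ")
for X, Y in pairs:
    # Each line shows one training example: k characters in, one character out.
    print(repr(X), "->", repr(Y))
```

Counting how often each Y appears for a given X gives exactly the nested dictionary that `generateTable` returns.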

In [9]:
T = generateTable("hello hello helli")

In [10]:
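def convertFreqIntoProb(T):
    """Normalize each context's counts into probabilities (modifies T in place)."""
    for kx in T.keys():
        s = float(sum(T[kx].values()))  # total count observed for this context
        for k in T[kx].keys():
            T[kx][k] = T[kx][k]/s
    return T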

In [11]:
T = convertFreqIntoProb(T)
print(T)

{'hell': {'o': 0.6666666666666666, 'i': 0.3333333333333333}, 'ello': {' ': 1.0}, 'llo ': {'h': 1.0}, 'lo h': {'e': 1.0}, 'o he': {'l': 1.0}, ' hel': {'l': 1.0}}
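
After normalization, the probabilities for each context should sum to 1. A minimal sanity check could look like the sketch below. `is_normalized` and `T_example` are illustrative names that do not appear in the notebook, and `T_example` copies only two entries from the output above.

```python
import math


def is_normalized(T, tol=1e-9):
    """Return True if every context's probabilities sum to 1 within tol."""
    return all(
        math.isclose(sum(dist.values()), 1.0, abs_tol=tol)
        for dist in T.values()
    )


# Two entries taken from the "hello hello helli" table shown above.
T_example = {"hell": {"o": 2 / 3, "i": 1 / 3}, "ello": {" ": 1.0}}
print(is_normalized(T_example))
```

A check like this matters because weighted samplers such as `np.random.choice` expect the `p` argument to sum to 1.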


In [12]:
text_path = "jkrowling_speech.txt"

In [16]:
def load_text(filename):
    with open(filename, encoding='utf-8') as f:
        return f.read().lower()

In [17]:
text = load_text(text_path)

In [18]:
print(text)

president faust, members of the harvard corporation and the board of overseers, members of the faculty, proud parents, and, above all, graduates.

the first thing i would like to say is ‘thank you.’ not only has harvard given me an extraordinary honour, but the weeks of fear and nausea i have endured at the thought of giving this commencement address have made me lose weight. a win-win situation! now all i have to do is take deep breaths, squint at the red banners and convince myself that i am at the world’s largest gryffindor reunion.

delivering a commencement address is a great responsibility; or so i thought until i cast my mind back to my own graduation. the commencement speaker that day was the distinguished british philosopher baroness mary warnock. reflecting on her speech has helped me enormously in writing this one, because it turns out that i can’t remember a single word she said. this liberating discovery enables me to proceed without any fear that i might inadvertently inf

In [19]:
# Lowercasing the text makes "The" and "the" produce the same contexts.

In [20]:
print(text[:1000])

president faust, members of the harvard corporation and the board of overseers, members of the faculty, proud parents, and, above all, graduates.

the first thing i would like to say is ‘thank you.’ not only has harvard given me an extraordinary honour, but the weeks of fear and nausea i have endured at the thought of giving this commencement address have made me lose weight. a win-win situation! now all i have to do is take deep breaths, squint at the red banners and convince myself that i am at the world’s largest gryffindor reunion.

delivering a commencement address is a great responsibility; or so i thought until i cast my mind back to my own graduation. the commencement speaker that day was the distinguished british philosopher baroness mary warnock. reflecting on her speech has helped me enormously in writing this one, because it turns out that i can’t remember a single word she said. this liberating discovery enables me to proceed without any fear that i might inadvertently inf

### Training Our Markov Chain

In [21]:
def trainMarkovChain(text, k=4):
    T = generateTable(text, k)
    T = convertFreqIntoProb(T)
    return T

In [22]:
model = trainMarkovChain(text)

In [23]:
print(model)

{'pres': {'i': 0.14285714285714285, 's': 0.5714285714285714, 'e': 0.2857142857142857}, 'resi': {'d': 1.0}, 'esid': {'e': 1.0}, 'side': {'n': 0.2, ' ': 0.8}, 'iden': {'t': 0.5, 'c': 0.5}, 'dent': {' ': 0.5, 'i': 0.5}, 'ent ': {'f': 0.13333333333333333, 'a': 0.3333333333333333, 's': 0.13333333333333333, 'u': 0.06666666666666667, 'y': 0.06666666666666667, 'i': 0.06666666666666667, 't': 0.13333333333333333, 'w': 0.06666666666666667}, 'nt f': {'a': 1.0}, 't fa': {'u': 0.14285714285714285, 'i': 0.7142857142857143, 'r': 0.14285714285714285}, ' fau': {'s': 1.0}, 'faus': {'t': 1.0}, 'aust': {',': 1.0}, 'ust,': {' ': 1.0}, 'st, ': {'m': 0.5, 't': 0.5}, 't, m': {'e': 1.0}, ', me': {'m': 1.0}, ' mem': {'b': 1.0}, 'memb': {'e': 1.0}, 'embe': {'r': 1.0}, 'mber': {'s': 0.3333333333333333, ' ': 0.6666666666666666}, 'bers': {' ': 1.0}, 'ers ': {'o': 0.25, 'a': 0.125, 'i': 0.25, 's': 0.125, 'w': 0.125, 't': 0.125}, 'rs o': {'f': 1.0}, 's of': {' ': 1.0}, ' of ': {'t': 0.2441860465116279, 'o': 0.04651162
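
The printed model is hard to read in raw form. One way to inspect it is to list the most likely next characters for a single context, as in the sketch below. `top_successors` is a hypothetical helper, and `model_excerpt` copies just the first entry of the output above.

```python
def top_successors(T, ctx, n=3):
    """Return up to n (char, probability) pairs for ctx, most likely first."""
    dist = T.get(ctx, {})  # unseen contexts yield an empty list
    return sorted(dist.items(), key=lambda kv: kv[1], reverse=True)[:n]


# First entry of the trained model as printed above.
model_excerpt = {
    "pres": {"i": 0.14285714285714285, "s": 0.5714285714285714, "e": 0.2857142857142857}
}
for char, p in top_successors(model_excerpt, "pres"):
    print(repr(char), round(p, 3))
```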

### Generating Text at Test Time!

In [24]:
import numpy as np

In [25]:
fruits = ["apple", "mango", "banana"]

In [30]:
prob = [0.8, 0.05, 0.15]  # one probability per fruit, in order; must sum to 1

In [31]:
for i in range(10):
    print(np.random.choice(fruits, p=prob))

apple
apple
apple
apple
apple
banana
apple
apple
apple
apple

In [34]:
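def sample_next(ctx, T, k):
    """Sample the next character given the last k characters of ctx."""
    ctx = ctx[-k:]
    if T.get(ctx) is None:
        return " "  # context never seen in training: fall back to a space
    possible_chars = list(T[ctx].keys())
    possible_values = list(T[ctx].values())
    # Draw one character, weighted by its learned probability.
    return np.random.choice(possible_chars, p=possible_values)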

In [42]:
sample_next("pres", model, 4)

'e'
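
Because sampling is random, the call above can return a different character on each run. If reproducible output is wanted, one option is to pass a seeded generator into the sampler. The sketch below swaps in the standard library's `random.Random` in place of `np.random.choice`. `sample_next_seeded`, `draw`, and `toy_model` are illustrative names, not part of the notebook.

```python
import random


def sample_next_seeded(ctx, T, k, rng):
    """Sample the next character for the last k characters of ctx using rng."""
    ctx = ctx[-k:]
    dist = T.get(ctx)
    if dist is None:
        return " "  # same fallback as sample_next for unseen contexts
    chars = list(dist.keys())
    weights = list(dist.values())
    return rng.choices(chars, weights=weights)[0]


# One entry from the "hello hello helli" table built earlier.
toy_model = {"hell": {"o": 2 / 3, "i": 1 / 3}}


def draw(seed, n=5):
    """Draw n characters for the context 'hell' from a freshly seeded generator."""
    rng = random.Random(seed)
    return [sample_next_seeded("hell", toy_model, 4, rng) for _ in range(n)]


print(draw(0) == draw(0))  # True: the same seed reproduces the same draws
```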

### Text Generation

In [44]:
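def generateText(starting_sent, k=4, maxLen=1000):
    """Extend starting_sent by maxLen sampled characters.

    Uses the global `model` trained above; k must match the k used in training.
    """
    sentence = starting_sent
    ctx = starting_sent[-k:]
    for ix in range(maxLen):
        next_prediction = sample_next(ctx, model, k)
        sentence = sentence + next_prediction
        ctx = sentence[-k:]  # slide the context window forward
    return sentence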
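
`generateText` reads the trained table from the global variable `model`. A variant that takes the table as an argument is easier to test and to reuse with several models. The sketch below is one possible shape under that assumption. `generate_text_with_model` and `toy_model` are hypothetical names, and sampling uses the standard library's `random` module rather than NumPy so the example runs without extra packages.

```python
import random


def generate_text_with_model(starting_sent, T, k=4, max_len=1000, rng=None):
    """Append max_len sampled characters to starting_sent using table T."""
    if rng is None:
        rng = random.Random()
    sentence = starting_sent
    for _ in range(max_len):
        dist = T.get(sentence[-k:])  # look up the last k characters
        if dist is None:
            sentence += " "  # unseen context: same fallback as sample_next
            continue
        chars = list(dist.keys())
        weights = list(dist.values())
        sentence += rng.choices(chars, weights=weights)[0]
    return sentence


# A toy table in which every context has exactly one successor,
# so the generated text is fully determined.
toy_model = {"ab": {"c": 1.0}, "bc": {"a": 1.0}, "ca": {"b": 1.0}}
print(generate_text_with_model("ab", toy_model, k=2, max_len=4))  # abcabc
```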

In [52]:
# Use a new name so the training corpus stored in `text` is not overwritten.
generated_text = generateText("pres", k=4, maxLen=1000)
print(generated_text)

pression; the trembled uncontentment, and security security to celebrate fact their fellow humbling i would like that would like my enable resolution we all notherefore discovered this mary pottered execution day jobles me future. a win-win seized at gryffindor realises that would no idea those to say to won, save lifetime, his live, and it in they has happening stories what britained myself those classics corridor.

i am at all invention, and, we all – in my overty. than to hear of the prefer not, and i had from the trembled door, and magine benefits victims and with whose whom came to amnesty internation on they worth me. hard of one might use it turner, my parents had betters. form the solid for you about of such as poor, and than i have never, second your own great free, but the turn in you are old the above all. the keys to useful day, it way you all paradoxical convince. povernment. you univerished to sue me to the biggest to giving many of ancient address, about my earned at unt