<a href="https://colab.research.google.com/github/dksifoua/Language-Modeling---Text-Generation-with-Markov-Chain-and-LSTM/blob/master/Language_Modeling_Text_Generation_with_Markov_Chain_and_LSTM.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [0]:
import collections
import numpy as np


In [5]:
!wget http://cs.stanford.edu/people/karpathy/char-rnn/shakespeare_input.txt

--2020-03-09 20:48:10--  http://cs.stanford.edu/people/karpathy/char-rnn/shakespeare_input.txt
Resolving cs.stanford.edu (cs.stanford.edu)... 171.64.64.64
Connecting to cs.stanford.edu (cs.stanford.edu)|171.64.64.64|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://cs.stanford.edu/people/karpathy/char-rnn/shakespeare_input.txt [following]
--2020-03-09 20:48:10--  https://cs.stanford.edu/people/karpathy/char-rnn/shakespeare_input.txt
Connecting to cs.stanford.edu (cs.stanford.edu)|171.64.64.64|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4573338 (4.4M) [text/plain]
Saving to: ‘shakespeare_input.txt’


2020-03-09 20:48:12 (4.47 MB/s) - ‘shakespeare_input.txt’ saved [4573338/4573338]



# Markov Chain Modeling
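The model below is a character-level Markov chain of order $n$ (`n_gram`): the next character is assumed to depend only on the previous $n$ characters. `fit` estimates each conditional distribution by relative-frequency counting, which is the maximum-likelihood estimate:

$$\hat{P}(c \mid h) = \frac{\operatorname{count}(h, c)}{\sum_{c'} \operatorname{count}(h, c')}$$

Here $h$ is a history of $n$ characters and $\operatorname{count}(h, c)$ is the number of times character $c$ directly follows $h$ in the training text, which is left-padded with $n$ `~` characters. `generate` then samples one character at a time from $\hat{P}(\cdot \mid h)$, using the last $n$ characters of the text so far as the next $h$.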

In [0]:
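class MarkovLangModel:
    """Character-level Markov chain language model of order `n_gram`."""

    def __init__(self, filepath, n_gram):
        self.filepath = filepath
        self.n_gram = n_gram
        # history (string of n_gram chars) -> {next char: probability}; filled by fit()
        self.model = {}

    @staticmethod
    def normalize(counter):
        """Turn next-character counts into a probability distribution."""
        z = sum(counter.values())
        return {char: count / z for char, count in counter.items()}

    def fit(self):
        # Read the corpus and close the file afterwards
        with open(self.filepath, encoding='utf-8') as f:
            data = f.read()

        # Padding: prepend n_gram '~' so the first characters also have a full history
        data = '~' * self.n_gram + data

        # Estimating probabilities: count how often each char follows each history
        counts = collections.defaultdict(collections.Counter)
        for idx in range(len(data) - self.n_gram):
            history, char = data[idx:idx + self.n_gram], data[idx + self.n_gram]
            counts[history][char] += 1

        # Normalizing: counts -> P(char | history)
        self.model = {history: self.normalize(char_count)
                      for history, char_count in counts.items()}

        return self

    def generate(self, input_='', max_len=1000):
        # Use the same padding as fit(), then append the optional prompt
        doc = '~' * self.n_gram + input_
        for _ in range(max_len):
            history = doc[-self.n_gram:]
            dist = self.model.get(history)
            if dist is None:
                # History never followed by anything in the training text: stop early
                break
            # Sample the next character from P(char | history)
            char = np.random.choice(list(dist.keys()), p=list(dist.values()))
            doc += char

        return doc[self.n_gram:]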
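To make concrete what `fit` computes, here is a minimal standalone sketch of the same counting and normalization, run on a toy string instead of the Shakespeare file. `ngram_distributions` is a hypothetical helper written for illustration; it is not part of the notebook.

```python
import collections

def ngram_distributions(text, n_gram, pad='~'):
    """Hypothetical helper: P(next char | previous n_gram chars), estimated by counting."""
    # Left-pad so the first characters of the text also have a full-length history
    data = pad * n_gram + text
    counts = collections.defaultdict(collections.Counter)
    for idx in range(len(data) - n_gram):
        history, char = data[idx:idx + n_gram], data[idx + n_gram]
        counts[history][char] += 1
    # Divide each history's counts by their total so each distribution sums to 1
    probs = {}
    for history, counter in counts.items():
        z = sum(counter.values())
        probs[history] = {char: count / z for char, count in counter.items()}
    return probs

probs = ngram_distributions('abracadabra', n_gram=1)
print(probs['a'])  # {'b': 0.5, 'c': 0.25, 'd': 0.25}
print(probs['b'])  # {'r': 1.0}
```

In "abracadabra", `a` is followed by `b` twice and by `c` and `d` once each, so the estimated distribution after `a` is 2/4, 1/4, 1/4. The final `a` has no successor and contributes nothing, which is why the denominator is the sum of successor counts rather than the raw number of occurrences of the history.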

In [55]:
print(MarkovLangModel('shakespeare_input.txt',
                      n_gram=4).fit().generate())

First noble her back
From of Jove, when to! I am I am forth and so import worse, but be trius.

This the stroken poor were me not me elded oak.

FORD:
No, I go
tongue,
And womansion! Ay, by my mine Narcius Cassion this night
Hath in.

QUEEN MARGARET:
See hitherwisen Aumerless the ring son with two the Lucians, and even-penny,
To herefore shouldst aughts;
I'll be obtain: none?

First.

DUKE VINCE HENRY V:
For no less pret.

BRUTUS:
Are you follow, for it not versario,
Or one,
Would 'scuss unto army mouth
this blows so loset,
We for actises
Put heavy eyebrow,
stand blest have away. Tell you second Witch:
'Tis warlic-eates; is't or of a sworn;
Fly the place of pause and middless bend his sent, sirrah.

HORATIO:
Calmly, it moved bark, have that it.

OLIVIA:
I am not blow to few I yet I believe himself took that looks fool! you more I this late host choly wrong,
Women?
There;
Vice's stuff absent offer'd at learnest peace
you'll brevitatio. What we breason, fare thout wit me threat that have
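Because `generate` draws every character with `np.random.choice`, the sample above changes on every run. If repeatable samples are wanted, one option (an assumption, not something the notebook does) is to sample from a seeded `numpy.random.Generator`. The sketch reuses the sampling loop of `generate` on a hypothetical hand-written order-1 model, `toy_model`; `sample_markov` is likewise a hypothetical helper.

```python
import numpy as np

def sample_markov(model, n_gram, max_len, seed, pad='~'):
    """Hypothetical helper: the sampling loop of generate(), driven by a seeded Generator."""
    rng = np.random.default_rng(seed)
    doc = pad * n_gram
    for _ in range(max_len):
        dist = model.get(doc[-n_gram:])
        if dist is None:  # unseen history: nothing to sample from
            break
        # Draw the next character from P(char | history)
        doc += str(rng.choice(list(dist.keys()), p=list(dist.values())))
    return doc[n_gram:]

# Hypothetical order-1 model: after 'a' comes 'b' or 'c' with equal probability,
# after 'b' or 'c' always 'a'; '~' is the start-of-text padding.
toy_model = {
    '~': {'a': 1.0},
    'a': {'b': 0.5, 'c': 0.5},
    'b': {'a': 1.0},
    'c': {'a': 1.0},
}

print(sample_markov(toy_model, n_gram=1, max_len=20, seed=0))
```

Calling it twice with the same seed returns the same string. For the notebook's own model, a simpler alternative is to call `np.random.seed(...)` once before `generate`, at the cost of touching NumPy's global random state.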