### How text is generated using Markov chains

Markov chains are mathematical systems that move from one state to another. Two rules follow from this broad statement: the next state depends only on the current state, and the next state is chosen on a probabilistic basis.

To put this into the context of a Meaningful Random Headlines generator, we used a dataset containing ABC news headlines. The dataset contains a large number of words, many of which are probably used multiple times. For each word in the dataset, the words that directly follow it are grouped together, and followers that occur more often are weighted more heavily. To generate text, a random starting word is chosen; the next word is then drawn at random from that word's list of followers, and this repeats until the desired word count is reached. The more often a word appears after another, the higher the probability that it will be selected by the text generator.
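The procedure described above can be sketched in plain Python. This is only an illustration of the idea, not how markovify is implemented; the function names `build_chain` and `generate` and the three sample headlines are made up for this example.

```python
import random
from collections import defaultdict


def build_chain(headlines):
    """Map each word to the list of words that directly follow it."""
    chain = defaultdict(list)
    for headline in headlines:
        words = headline.split()
        for current, following in zip(words, words[1:]):
            # Duplicates are kept on purpose: a follower seen more often
            # appears more often in the list, so it is drawn more often.
            chain[current].append(following)
    return chain


def generate(chain, word_count, rng=random):
    """Walk the chain from a random start word for up to word_count words."""
    word = rng.choice(sorted(chain))
    output = [word]
    # Stop early if the current word never had a follower in the data.
    while len(output) < word_count and chain.get(word):
        word = rng.choice(chain[word])
        output.append(word)
    return " ".join(output)


# Hypothetical sample data, standing in for the real headline dataset.
sample_headlines = [
    "council approves new water plan",
    "council rejects new budget",
    "police approve new water restrictions",
]
chain = build_chain(sample_headlines)
print(generate(chain, word_count=6))
```

In this toy data, "water" follows "new" twice and "budget" once, so "water" is twice as likely to be drawn after "new".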

While the concept is simple, the hardest part of creating a generator using Markov Chains is to ensure you have enough text in your dataset so the text you generate doesn’t end up being the same words over and over.

However, to generate meaningful headlines effectively, you will need to provide a dataset that contains headlines of similar categories and add weights to the probabilities of the words selected.

### Markovify

The Python module we use here is markovify.

Markovify is a simple, extensible Markov chain generator. Right now, its main use is for building Markov models of large corpora of text and generating random sentences from them. In theory, though, it could be used for other applications. It is good at generating sentences that fit the model. In particular, the markovify.Text model gives you sentences that start with a capital letter and end with punctuation, and by default it rejects generated sentences that overlap too heavily with the original text.

This module can be installed using pip:

pip install markovify

### Dataset

This includes the entire corpus of articles published by the ABC website in the given time range. With a volume of 200 articles per day and a good focus on international news, we can be fairly certain that every event of significance has been captured here. This dataset can be downloaded from [Kaggle Datasets](https://www.kaggle.com/therohk/million-headlines/data).



In [None]:
!pip install markovify

### Loading Required Packages

In [None]:

import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import markovify #Markov Chain Generator

### Reading Input Text File

In [None]:
inp = pd.read_csv('../input/abcnews-date-text.csv')

inp.head(5)


### Sample Headlines
Randomly select 10 headlines from the first 100 headlines of the dataset.

In [None]:
import random
inp.headline_text[random.sample(range(100), 10)]

### Building the text model with Markov Chain
Here we build the model using the NewlineText class of the package.

In [None]:
text_model = markovify.NewlineText(inp.headline_text, state_size = 2)

### Time for some fun with Autogenerated Headlines



In [None]:
# Print ten randomly-generated sentences using the built model
for i in range(10):
    print(text_model.make_sentence())

### Markov Chain Model with Different State Size 

The state_size argument of NewlineText defines how many preceding words the probability of the next word depends on.
The higher the state size, the more closely the generated sentences resemble the original headlines.
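To see why a larger state size pulls the output toward the original headlines, here is a small plain-Python sketch. It is a simplified illustration, not markovify's implementation (which, among other things, also tracks sentence beginnings and endings). The names `build_ngram_chain` and `average_branching` and the sample headlines are assumptions made for this example.

```python
from collections import defaultdict


def build_ngram_chain(headlines, state_size):
    """Map each run of state_size consecutive words to the words that follow it."""
    chain = defaultdict(list)
    for headline in headlines:
        words = headline.split()
        for i in range(len(words) - state_size):
            state = tuple(words[i:i + state_size])
            chain[state].append(words[i + state_size])
    return chain


def average_branching(chain):
    """Average number of distinct next words available per state."""
    return sum(len(set(followers)) for followers in chain.values()) / len(chain)


# Hypothetical sample data, standing in for the real headline dataset.
sample_headlines = [
    "council approves new water plan",
    "council rejects new water budget",
    "police approve new water restrictions",
]

for size in (1, 2, 3):
    chain = build_ngram_chain(sample_headlines, size)
    print(size, round(average_branching(chain), 2))
```

As the state grows, each state has fewer distinct continuations. Once every state has exactly one follower, the model can only replay fragments of the headlines it was built from.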

In [None]:
text_model1 = markovify.NewlineText(inp.headline_text, state_size = 3)
text_model2 = markovify.NewlineText(inp.headline_text, state_size = 4)


In [None]:
# Print five randomly generated sentences using the state_size = 3 model
for i in range(5):
    print(text_model1.make_sentence())

In [None]:
# Print up to ten randomly generated sentences using the state_size = 4 model
for i in range(10):
    temp = text_model2.make_sentence()
    if temp is not None:  # skip None: models with a higher state size return None more often
        print(temp)
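Because the loop above skips `None` results, it can print fewer than ten headlines. One possible way around this is a small retry helper that keeps calling the generator until enough sentences have been collected. The helper `collect_sentences` is a hypothetical name introduced here, and the demo uses a stand-in generator rather than a real markovify model.

```python
def collect_sentences(make_sentence, wanted, max_attempts=100):
    """Call make_sentence until `wanted` non-None results are collected.

    Gives up after max_attempts calls, so it may return fewer than `wanted`.
    """
    sentences = []
    for _ in range(max_attempts):
        if len(sentences) == wanted:
            break
        sentence = make_sentence()
        if sentence is not None:
            sentences.append(sentence)
    return sentences


# Stand-in generator that sometimes fails, mimicking a model that returns None.
fake_outputs = iter([None, "first headline", None, None, "second headline"])
print(collect_sentences(lambda: next(fake_outputs), wanted=2))
```

With the model from this notebook, the call would look like `collect_sentences(text_model2.make_sentence, wanted=10)`.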

### Ensembling Markov Chain Models

Here we use the combine function to combine different models and assign weights that control how much emphasis is placed on each model.
The call below combines text_model11 and text_model12, while placing 50% more weight on the connections from text_model11.

In [None]:
text_model11 = markovify.NewlineText(inp.headline_text, state_size = 2)
text_model12 = markovify.NewlineText(inp.headline_text, state_size = 2)
model_combo = markovify.combine([ text_model11, text_model12 ], [ 1.5, 1 ])
# Print five randomly generated sentences using the combined model
for i in range(5):
    print(model_combo.make_sentence())
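The effect of the weights is easiest to see when the two models come from different text. The sketch below shows the general idea of weighted combination using plain transition counts. It is an assumption about the concept, not markovify's actual code, and the names `count_transitions` and `combine_counts` and the sample headlines are invented for this example.

```python
from collections import Counter, defaultdict


def count_transitions(headlines):
    """Count how often each word is directly followed by each other word."""
    counts = defaultdict(Counter)
    for headline in headlines:
        words = headline.split()
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1
    return counts


def combine_counts(models, weights):
    """Sum the transition counts of several models, scaling each by its weight."""
    combined = defaultdict(Counter)
    for model, weight in zip(models, weights):
        for current, followers in model.items():
            for following, count in followers.items():
                combined[current][following] += count * weight
    return combined


# Two hypothetical models built from different headline sets.
model_a = count_transitions(["council approves budget", "council approves plan"])
model_b = count_transitions(["council rejects stadium"])

# Transitions from model_a count 1.5 times as much as those from model_b.
combo = combine_counts([model_a, model_b], [1.5, 1])
total = sum(combo["council"].values())
for following, weight in combo["council"].items():
    print(following, weight / total)
```

Note that in the notebook cell above both models are built from the same headlines with the same state size, so the weights have little visible effect there. Combining models trained on different categories of headlines is where the weighting matters.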

### Potential Applications 

Now, this text could become input for a Twitter Bot, Slack Bot or even a Parody Blog. And that's the point.

References to more Markovify examples: [https://github.com/jsvine/markovify#markovify-in-the-wild](https://github.com/jsvine/markovify#markovify-in-the-wild)