# Markov Chains and Text Generation

## Markov Chain Simple Example

Markov chains are a way of representing how systems change over time. The main idea behind Markov chains is that they are memoryless: the next state of a process depends only on the current state, not on the states that came before it.

<img src="markov_chain_wiki.png" alt="Drawing" align="left" style="width: 350px;"/>

Here is how to read the Markov chain diagram above (from [Wikipedia](https://commons.wikimedia.org/w/index.php?curid=25300524)):
* If I am currently in the sunny state, there is a 10% chance I will go to the rainy state and a 90% chance I will remain in the sunny state.
* If I am currently in the rainy state, there is a 50% chance I will go to the sunny state and a 50% chance I will remain in the rainy state.

### Markov Chains as Transition Matrices

This is what our **transition matrix** will look like for the Markov chain diagram above. Take a minute to interpret the rows and columns of this matrix.

In [1]:
import numpy as np

P = np.asarray([.9, .1, .5, .5]).reshape(2,2)
P

array([[0.9, 0.1],
       [0.5, 0.5]])
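One quick sanity check you can run while interpreting the matrix: in a transition matrix each row is a probability distribution over the next state, so every row should sum to 1. The sketch below assumes the state order [sunny, rainy] used when building `P`.

```python
import numpy as np

# Same matrix as above; state order assumed to be [sunny, rainy].
P = np.asarray([.9, .1, .5, .5]).reshape(2, 2)

# Each row lists the probabilities of moving to every possible next state,
# so the entries in a row must add up to 1.
row_sums = P.sum(axis=1)
print(row_sums)
assert np.allclose(row_sums, 1.0)
```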

### Predict Tomorrow's Weather

Let's say it's sunny today. We can represent that as:

`today = [1, 0]`

**Predict tomorrow's weather using what you know about today and the transition matrix.**

In [None]:
# Input code here
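One possible approach, sketched under the assumption that the states are ordered [sunny, rainy] as in `P` above: treat today's weather as a row vector and multiply it by the transition matrix.

```python
import numpy as np

P = np.asarray([.9, .1, .5, .5]).reshape(2, 2)  # rows/columns ordered [sunny, rainy]
today = np.array([1, 0])                         # certainly sunny today

# A row vector times the transition matrix gives the next day's distribution.
tomorrow = today @ P
print(tomorrow)  # [0.9 0.1]
```

Because today is certainly sunny, tomorrow's distribution is simply the "sunny" row of `P`.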

**Predict the day after tomorrow's weather.**

In [None]:
# Input code here
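A sketch of one way to look two days ahead: apply the transition matrix twice, which is the same as multiplying by the two-step matrix `P @ P` (computed here with `np.linalg.matrix_power`). State order [sunny, rainy] is assumed, as before.

```python
import numpy as np

P = np.asarray([.9, .1, .5, .5]).reshape(2, 2)  # rows/columns ordered [sunny, rainy]
today = np.array([1, 0])

# Two steps ahead: apply the transition matrix twice...
day_after = today @ P @ P
# ...which is equivalent to using the two-step transition matrix P^2.
day_after_alt = today @ np.linalg.matrix_power(P, 2)

print(day_after)  # approximately [0.86 0.14]
```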

**What is the steady state of the weather? In other words, in the long run, what fraction of days are sunny?** (Hint: Remember Monte Carlo simulations?)

In [None]:
# Input code here
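Following the Monte Carlo hint, one way to estimate the steady state is to simulate a long run of days and count how often it is sunny. The helper name `simulate_weather`, the 20,000-day run length, and the fixed seed are illustrative choices, not part of the original exercise. As a cross-check, every row of a high power of `P` converges to the same steady-state distribution.

```python
import numpy as np

P = np.asarray([.9, .1, .5, .5]).reshape(2, 2)  # states: 0 = sunny, 1 = rainy
rng = np.random.default_rng(seed=0)             # seeded so the run is reproducible

def simulate_weather(P, n_days, rng, start=0):
    """Hypothetical helper: simulate n_days of weather states starting from `start`."""
    states = np.empty(n_days, dtype=int)
    state = start
    for day in range(n_days):
        # Draw the next state using the current state's row of P as probabilities.
        state = rng.choice(len(P), p=P[state])
        states[day] = state
    return states

days = simulate_weather(P, 20_000, rng)
sunny_fraction = np.mean(days == 0)
print(f"Simulated fraction of sunny days: {sunny_fraction:.3f}")

# Cross-check: the rows of P^n converge to the steady state as n grows.
steady = np.linalg.matrix_power(P, 50)[0]
print(steady)  # approximately [5/6, 1/6]
```

Solving πP = π by hand gives the same answer: in the steady state the flow from sunny to rainy equals the flow back, so π_sunny · 0.1 = π_rainy · 0.5, which gives π_sunny = 5/6 ≈ 0.833.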

## Markov Chains for Text Generation

Markov chains can also be used for very basic text generation. Think of every word in a corpus as a state. We can then make the simplifying assumption that the next word depends only on the current word, which is exactly the memoryless assumption behind a Markov chain.

Markov chains don't generate text nearly as well as deep learning models, but they're a good (and fun!) start.

### Read in some text to imitate

We are going to generate some text in the style of inspirational quotes, so let's first read in the data.

In [6]:
filename = 'inspiration_quotes.txt'
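# The text contains curly quotes (“ ”); read it as UTF-8, assuming that is the file's encoding.
with open(filename, "r", encoding="utf-8") as file: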
    quotes = file.read()
quotes

'“Healing comes from taking responsibility: to realize that it is you - and no one else - that creates your thoughts, your feelings, and your actions.” —Peter Shepherd\n\n“Life is a journey and if you fall in love with the journey you will be in love forever.” —Peter Hagerty\n\n“When you return to your old hometown, you find it wasn’t the town you missed, but your childhood.” —Earl Wilson\n\n“As we grow old, the beauty steals inward.” —Ralph Waldo Emerson\n\n“Life begins as a quest of the child for the man, and ends as a journey by the man to rediscover the child.” —Sam Ewing\n\nHappiness\n“Ultimately your greatest teacher is to live with an open heart.” —Emmanuel (Pat Rodegast)\n\n“Doing what you like is freedom. Liking what you do is happiness.” —Frank Tyger\n\n“We forge the chains we wear in life.” —Charles Dickens\n\nhappiness quote\n“If you look to others for fulfillment, you will never be fulfilled. If your happiness depends on money, you will never be happy with yourself. Be con

### Clean up the text data

1. Remove all line breaks (`\n`)
1. Only keep text within quotes (`“”`). The output of this step should look like this:

```
'Healing comes from taking responsibility: to realize that it is you - and no one else - that creates your thoughts, your feelings, and your actions. Life is a journey and if you fall in love with the journey you will be in love forever. When you return to your old hometown, you find it wasn’t the town you missed, but your childhood.
...
```

In [None]:
# 1. Remove all line breaks
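A minimal sketch of step 1. So that the cell runs on its own, it uses a hypothetical stand-in for `quotes` made of two quotes copied from the file contents above; in the notebook you would apply the same line to the full `quotes` string.

```python
# Hypothetical stand-in for `quotes`: two quotes copied from the file shown above.
quotes = ("“Life is a journey and if you fall in love with the journey you will be in love forever.” "
          "—Peter Hagerty\n\n“As we grow old, the beauty steals inward.” —Ralph Waldo Emerson")

# Replace each line break with a space rather than deleting it, so that words on
# either side of a break inside a quote are not glued together.
quotes = quotes.replace("\n", " ")
print(quotes)
```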

In [None]:
# 2. Only keep text within quotes - the regex tokenizer can help you out here
from nltk import RegexpTokenizer
tokenizer = RegexpTokenizer("[“”]", gaps=True)

# ...keep on coding
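If you stick with the `RegexpTokenizer` above, note that splitting on the quote marks returns quote text and attributions interleaved, so you still have to pick out the quote pieces. The sketch below swaps in the standard-library `re` module instead: a non-greedy pattern captures only the text between each opening `“` and the next closing `”`. The input string is a hypothetical stand-in for the output of step 1.

```python
import re

# Hypothetical stand-in for `quotes` after step 1 (line breaks already replaced).
quotes = ("“Life is a journey and if you fall in love with the journey you will be in love forever.” "
          "—Peter Hagerty  “As we grow old, the beauty steals inward.” —Ralph Waldo Emerson")

# `.*?` is non-greedy, so each match stops at the first closing quote mark
# instead of running on to the last one in the text.
quote_texts = re.findall(r"“(.*?)”", quotes)

# Join the quotes into one string, matching the expected output above.
quote_text = " ".join(quote_texts)
print(quote_text)
```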

### Build a simple Markov chain function

Build a simple Markov chain function that creates a dictionary:
* The keys should be all of the words in the corpus
* The values should be a list of the words that follow the keys

In [None]:
# Input code here
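A minimal sketch of such a function, under the assumption that words are split on whitespace (so punctuation stays attached, as in the expected output below). The function name `markov_chain` is hypothetical. The example input is one quote from the file, "Doing what you like is freedom. Liking what you do is happiness."

```python
from collections import defaultdict

def markov_chain(text):
    """Hypothetical helper: map each word to the list of words that follow it."""
    words = text.split()  # whitespace split; punctuation stays attached to words
    chain = defaultdict(list)
    # Pair every word with its successor: (w0, w1), (w1, w2), ...
    for current_word, next_word in zip(words, words[1:]):
        chain[current_word].append(next_word)
    return dict(chain)

chain = markov_chain("Doing what you like is freedom. Liking what you do is happiness.")
print(chain["what"])  # ['you', 'you']
print(chain["is"])    # ['freedom.', 'happiness.']
```

Keeping duplicates in each list is deliberate: a word that follows another twice is twice as likely to be picked later. Note that the very last word of the corpus (`'happiness.'` here) has no successor, so it only becomes a key if it also appears somewhere earlier.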

Apply the function to the quotes. Your final output should look like this:
    
```
{'Healing': ['comes'],
 'comes': ['from', 'from'],
 'from': ['taking', 'aesthetic', 'a',
...
```

### Create a text generator

Create a function that randomly generates sentences. It should take two things as inputs:
* The dictionary you just created
* The number of words you want generated

Here are some examples of generated sentences:

>'Enlarge the universe - it is who can turn a journey you have. Gratitude unlocks.'

>'Something you are sad. All this world is freedom. Liking what you have to order.'

In [None]:
# Input code here
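One possible generator, sketched as a random walk over the dictionary. The names `generate_sentence`, `count`, and `seed`, the choice to start from a capitalized word, and the restart on dead ends are all assumptions for illustration. The small `chain` below is what the dictionary-building step produces for the quote "Doing what you like is freedom. Liking what you do is happiness."

```python
import random

def generate_sentence(chain, count=15, seed=None):
    """Hypothetical helper: random-walk `chain` to produce `count` words."""
    rng = random.Random(seed)
    # Prefer capitalized words as starting points so output looks like a sentence start;
    # fall back to any key if none are capitalized.
    starts = [w for w in chain if w[:1].isupper()] or list(chain)
    word = rng.choice(starts)
    sentence = [word]
    while len(sentence) < count:
        followers = chain.get(word)
        if not followers:
            # Dead end (the word never has a successor): restart from a start word.
            word = rng.choice(starts)
        else:
            # Duplicates in the list make frequent successors more likely.
            word = rng.choice(followers)
        sentence.append(word)
    return " ".join(sentence)

chain = {'Doing': ['what'], 'what': ['you', 'you'], 'you': ['like', 'do'],
         'like': ['is'], 'is': ['freedom.', 'happiness.'], 'freedom.': ['Liking'],
         'Liking': ['what'], 'do': ['is']}
print(generate_sentence(chain, count=8, seed=1))
```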