# N-grams and Markov chains

By [Allison Parrish](http://www.decontextualize.com/)
and re-used, edited by Defne Onen for CATN Fall 2019 Final Project.

Markov chain text generation is [one of the oldest](https://elmcip.net/creative-work/travesty) strategies for predictive text generation. This notebook takes you through the basics of implementing a simple and concise Markov chain text generation procedure in Python.

## N-grams

The first kind of text analysis that we’ll look at today is an n-gram model. An n-gram is simply a sequence of units drawn from a longer sequence; in the case of text, the unit in question is usually a character or a word. For convenience, we'll call the unit of the n-gram is called its *level*; the length of the n-gram is called its *order*.

N-grams are used frequently in natural language processing and are a basic tool text analysis. Their applications range from programs that correct spelling to creative visualizations to compression algorithms to stylometrics to generative text. They can be used as the basis of a Markov chain algorithm—and, in fact, that’s one of the applications we’ll be using them for later in this lesson.

### Finding and counting word pairs

So how would we go about writing Python code to find n-grams? We'll start with a simple task: finding *word pairs* in a text. A word pair is essentially a word-level order-2 n-gram; once we have code to find word pairs, we’ll generalize it to handle n-grams of any order.

Open Runway - im2txt generated text below to create word pairs.

In [127]:
text = open("textInspiration.txt").read()
words = text.split()

In [128]:
pairs = [(words[i], words[i+1]) for i in range(len(words)-1)]

(Why `len(words) - 1`? Because the final element of the list can only be the *second* element of a pair. Otherwise we'd be trying to access an element beyond the end of the list.)

The corresponding way to write this with a `for` loop:

In [129]:
pairs = []
for i in range(len(words)-1):
    this_pair = (words[i], words[i+1])
    pairs.append(this_pair)

In either case, the list of n-grams ends up looking like this. (I'm only showing the first 25 for the sake of brevity; remove `[:25]` to see the whole list.)

In [130]:
pairs[:25]

[('a', 'black'),
 ('black', 'and'),
 ('and', 'white'),
 ('white', 'photo'),
 ('photo', 'of'),
 ('of', 'a'),
 ('a', 'person'),
 ('person', 'holding'),
 ('holding', 'a'),
 ('a', 'cell'),
 ('cell', 'phone'),
 ('phone', 'a'),
 ('a', 'close'),
 ('close', 'up'),
 ('up', 'of'),
 ('of', 'a'),
 ('a', 'pair'),
 ('pair', 'of'),
 ('of', 'scissors'),
 ('scissors', 'on'),
 ('on', 'a'),
 ('a', 'table'),
 ('table', 'a'),
 ('a', 'couple'),
 ('couple', 'of')]

Now that we have a list of word pairs, we can count them using a `Counter` object.

In [131]:
from collections import Counter

In [132]:
pair_counts = Counter(pairs)

The `.most_common()` method of the `Counter` shows us the items in our list that occur most frequently:

In [133]:
pair_counts.most_common(10)

[(('of', 'a'), 514),
 (('a', 'close'), 213),
 (('close', 'up'), 213),
 (('up', 'of'), 213),
 (('on', 'a'), 174),
 (('with', 'a'), 172),
 (('a', 'clock'), 106),
 (('table', 'a'), 105),
 (('a', 'table'), 100),
 (('sitting', 'on'), 100)]

So the phrase "af a" occurs 514 times, by far the most common word pair in the text. In fact, "of a" comprises about 5% of all word pairs found in the text:

In [134]:
pair_counts[("of", "a")] / sum(pair_counts.values())

0.05937391706133765

You can do the same calculation with character-level pairs with pretty much exactly the same code, owing to the fact that strings and lists can be indexed using the same syntax:

In [135]:
char_pairs = [(text[i], text[i+1]) for i in range(len(text)-1)]

The variable `char_pairs` now has a list of all pairs of *characters* in the text. Using `Counter` again, we can find the most common pairs of characters:

In [136]:
char_pair_counts = Counter(char_pairs)
char_pair_counts.most_common(10)

[(('a', ' '), 2125),
 ((' ', 'a'), 1489),
 ((' ', 'o'), 1179),
 (('n', ' '), 970),
 (('e', ' '), 958),
 (('\n', 'a'), 901),
 (('i', 'n'), 876),
 (('f', ' '), 787),
 (('o', 'f'), 780),
 ((' ', 's'), 749)]

### N-grams of arbitrary lengths

The step from pairs to n-grams of arbitrary lengths is a only a matter of using slice indexes to get a slice of length `n`, where `n` is the length of the desired n-gram. For example, to get all of the word-level order 7 n-grams from the list of words in `textInspiration.txt`:

In [137]:
seven_grams = [tuple(words[i:i+7]) for i in range(len(words)-6)]

In [138]:
seven_grams[:20]

[('a', 'black', 'and', 'white', 'photo', 'of', 'a'),
 ('black', 'and', 'white', 'photo', 'of', 'a', 'person'),
 ('and', 'white', 'photo', 'of', 'a', 'person', 'holding'),
 ('white', 'photo', 'of', 'a', 'person', 'holding', 'a'),
 ('photo', 'of', 'a', 'person', 'holding', 'a', 'cell'),
 ('of', 'a', 'person', 'holding', 'a', 'cell', 'phone'),
 ('a', 'person', 'holding', 'a', 'cell', 'phone', 'a'),
 ('person', 'holding', 'a', 'cell', 'phone', 'a', 'close'),
 ('holding', 'a', 'cell', 'phone', 'a', 'close', 'up'),
 ('a', 'cell', 'phone', 'a', 'close', 'up', 'of'),
 ('cell', 'phone', 'a', 'close', 'up', 'of', 'a'),
 ('phone', 'a', 'close', 'up', 'of', 'a', 'pair'),
 ('a', 'close', 'up', 'of', 'a', 'pair', 'of'),
 ('close', 'up', 'of', 'a', 'pair', 'of', 'scissors'),
 ('up', 'of', 'a', 'pair', 'of', 'scissors', 'on'),
 ('of', 'a', 'pair', 'of', 'scissors', 'on', 'a'),
 ('a', 'pair', 'of', 'scissors', 'on', 'a', 'table'),
 ('pair', 'of', 'scissors', 'on', 'a', 'table', 'a'),
 ('of', 'scissors'

Two tricky things in this expression: in `tuple(words[i:i+7])`, I call `tuple()` to convert the list slice (`words[i:i+7]`) into a tuple. In `range(len(words)-6)`, the `6` is there because it's one fewer than the length of the n-gram. Just as with the pairs above, we need to stop counting before we reach the end of the list with enough room to make sure we're always grabbing slices of the desired length.

For the sake of convenience, here's a function that will return n-grams of a desired length from any sequence, whether list or string:

In [139]:
def ngrams_for_sequence(n, seq):
    return [tuple(seq[i:i+n]) for i in range(len(seq)-n+1)]

Using this function, here are random character-level n-grams of order 9 from `textInspiration.txt`:

In [140]:
import random
genesis_9grams = ngrams_for_sequence(9, open("textInspiration.txt").read())
random.sample(genesis_9grams, 10)

[('n', ' ', 'f', 'r', 'o', 'n', 't', ' ', 'o'),
 ('e', 'r', 'e', 'd', ' ', 'm', 'o', 'u', 'n'),
 ('k', 'y', '\n', 'a', ' ', 'r', 'o', 'w', ' '),
 ('s', 't', 'a', 'n', 'd', 'i', 'n', 'g', ' '),
 ('r', 's', 'o', 'n', ' ', 'o', 'n', ' ', 'a'),
 ('e', 'x', 't', ' ', 't', 'o', ' ', 'a', ' '),
 (' ', 'a', ' ', 'p', 'a', 'i', 'r', ' ', 'o'),
 (' ', 't', 'h', 'e', ' ', 'd', 'a', 'r', 'k'),
 ('\n', 'a', ' ', 'r', 'e', 'd', ' ', 'u', 'm'),
 ('o', 's', 'e', ' ', 'u', 'p', ' ', 'o', 'f')]

Or all the word-level 5-grams from `textInspiration.txt`:

In [141]:
textInspiration_word_5grams = ngrams_for_sequence(5, open("textInspiration.txt").read().split())
textInspiration_word_5grams

[('a', 'black', 'and', 'white', 'photo'),
 ('black', 'and', 'white', 'photo', 'of'),
 ('and', 'white', 'photo', 'of', 'a'),
 ('white', 'photo', 'of', 'a', 'person'),
 ('photo', 'of', 'a', 'person', 'holding'),
 ('of', 'a', 'person', 'holding', 'a'),
 ('a', 'person', 'holding', 'a', 'cell'),
 ('person', 'holding', 'a', 'cell', 'phone'),
 ('holding', 'a', 'cell', 'phone', 'a'),
 ('a', 'cell', 'phone', 'a', 'close'),
 ('cell', 'phone', 'a', 'close', 'up'),
 ('phone', 'a', 'close', 'up', 'of'),
 ('a', 'close', 'up', 'of', 'a'),
 ('close', 'up', 'of', 'a', 'pair'),
 ('up', 'of', 'a', 'pair', 'of'),
 ('of', 'a', 'pair', 'of', 'scissors'),
 ('a', 'pair', 'of', 'scissors', 'on'),
 ('pair', 'of', 'scissors', 'on', 'a'),
 ('of', 'scissors', 'on', 'a', 'table'),
 ('scissors', 'on', 'a', 'table', 'a'),
 ('on', 'a', 'table', 'a', 'couple'),
 ('a', 'table', 'a', 'couple', 'of'),
 ('table', 'a', 'couple', 'of', 'animals'),
 ('a', 'couple', 'of', 'animals', 'that'),
 ('couple', 'of', 'animals', 'that'

And of course we can use it in conjunction with a `Counter` to find the most common n-grams in a text:

In [142]:
Counter(ngrams_for_sequence(3, open("textInspiration.txt").read())).most_common(20)

[((' ', 'a', ' '), 1204),
 (('\n', 'a', ' '), 900),
 ((' ', 'o', 'f'), 778),
 (('o', 'f', ' '), 778),
 (('i', 'n', 'g'), 546),
 (('f', ' ', 'a'), 518),
 (('n', 'g', ' '), 466),
 (('o', 'n', ' '), 452),
 (('a', ' ', 'c'), 406),
 ((' ', 'o', 'n'), 369),
 (('p', ' ', 'o'), 351),
 ((' ', 'c', 'l'), 343),
 (('c', 'l', 'o'), 343),
 (('a', ' ', 'b'), 339),
 ((' ', 's', 'i'), 308),
 (('a', 'n', 'd'), 301),
 (('d', 'i', 'n'), 297),
 (('n', ' ', 'a'), 281),
 ((' ', 'w', 'i'), 271),
 (('u', 'p', ' '), 270)]

## Markov models: what comes next?

Now that we have the ability to find and record the n-grams in a text, it’s time to take our analysis one step further. The next question we’re going to try to answer is this: Given a particular n-gram in a text, what is most likely to come next?

We can imagine the kind of algorithm we’ll need to extract this information from the text. It will look very similar to the code to find n-grams above, but it will need to keep track not just of the n-grams but also a list of all units (word, character, whatever) that *follow* those n-grams.

Let’s do a quick example by hand. This is the same character-level order-2 n-gram analysis of the (very brief) text “condescendences” as above, but this time keeping track of all characters that follow each n-gram:

| n-grams |	next? |
| ------- | ----- |
|co| n|
|on| d|
|nd| e, e|
|de| s, n|
|es| c, (end of text)|
|sc| e|
|ce| n, s|
|en| d, c|
|nc| e|

From this table, we can determine that while the n-gram `co` is followed by n 100% of the time, and while the n-gram `on` is followed by `d` 100% of the time, the n-gram `de` is followed by `s` 50% of the time, and `n` the rest of the time. Likewise, the n-gram `es` is followed by `c` 50% of the time, and followed by the end of the text the other 50% of the time.

The easiest way to represent this model is with a dictionary whose keys are the n-grams and whose values are all of the possible "nexts." Here's what the Python code looks like to construct this model from a string. We'll use the special token `$` to represent the notion of the "end of text" in the table above.

The functions in the cell below generalize this to n-grams of arbitrary length (and use the special Python value `None` to indicate the end of a sequence). The `markov_model()` function creates an empty dictionary and takes an n-gram length and a sequence (which can be a string or a list) and calls the `add_to_model()` function on that sequence. The `add_to_model()` function does the same thing as the code above: iterates over every index of the sequence and grabs an n-gram of the desired length, adding keys and values to the dictionary as necessary.

In [145]:
def add_to_model(model, n, seq):
    # make a copy of seq and append None to the end
    seq = list(seq[:]) + [None]
    for i in range(len(seq)-n):
        # tuple because we're using it as a dict key!
        gram = tuple(seq[i:i+n])
        next_item = seq[i+n]            
        if gram not in model:
            model[gram] = []
        model[gram].append(next_item)

def markov_model(n, seq):
    model = {}
    add_to_model(model, n, seq)
    return model

Or an order 3 word-level Markov model of `textInspiration.txt`:

In [153]:
textInspiration_markov_model = markov_model(3, open("textInspiration.txt").read().split())

In [154]:
textInspiration_markov_model

{('a', 'black', 'and'): ['white',
  'white',
  'white',
  'white',
  'white',
  'white',
  'white',
  'white',
  'white',
  'white',
  'white',
  'white',
  'white',
  'white',
  'white',
  'white',
  'white',
  'white',
  'white',
  'white',
  'white',
  'white',
  'white',
  'white',
  'white',
  'white',
  'white',
  'white',
  'white',
  'white',
  'white',
  'white',
  'white',
  'white',
  'white',
  'white',
  'white',
  'white',
  'white',
  'white',
  'white',
  'white',
  'white',
  'white',
  'white',
  'white',
  'white',
  'white',
  'white',
  'white',
  'white',
  'white',
  'white',
  'white',
  'white',
  'white'],
 ('black', 'and', 'white'): ['photo',
  'photo',
  'photo',
  'photo',
  'photo',
  'photo',
  'photo',
  'photo',
  'image',
  'image',
  'photo',
  'clock',
  'image',
  'photo',
  'photo',
  'photo',
  'background',
  'photo',
  'photo',
  'photo',
  'photo',
  'photo',
  'photo',
  'photo',
  'photo',
  'background',
  'background',
  'photo',
  'photo',

We can now use the Markov model to make *predictions*. Given the information in the Markov model of `textInspiration.txt`, what words are likely to follow the sequence of words `white photo of`? We can find out simply by getting the value for the key for that sequence:

In [155]:
textInspiration_markov_model[('white', 'photo', 'of')]

['a',
 'a',
 'a',
 'a',
 'a',
 'a',
 'a',
 'a',
 'a',
 'a',
 'a',
 'a',
 'a',
 'a',
 'a',
 'a',
 'a',
 'an',
 'a',
 'a',
 'a',
 'a',
 'a',
 'a',
 'a',
 'a',
 'a',
 'a',
 'a',
 'a',
 'a',
 'a',
 'a',
 'a',
 'a',
 'a',
 'a',
 'a',
 'a',
 'a',
 'a']

This tells us that the sequence `white photo of` is followed by `a` 90% of the time, `an,` 10% of the time.

### Markov chains: Generating text from a Markov model

The Markov models we created above don't just give us interesting statistical probabilities. It also allows us generate a *new* text with those probabilities by *chaining together predictions*. Here’s how we’ll do it, starting with the order 2 character-level Markov model of `condescendences`: (1) start with the initial n-gram (`co`)—those are the first two characters of our output. (2) Now, look at the last *n* characters of output, where *n* is the order of the n-grams in our table, and find those characters in the “n-grams” column. (3) Choose randomly among the possibilities in the corresponding “next” column, and append that letter to the output. (Sometimes, as with `co`, there’s only one possibility). (4) If you chose “end of text,” then the algorithm is over. Otherwise, repeat the process starting with (2). 

Why `range(100)`? No reason, really—I just picked 100 as a reasonable number for the maximum number of times the Markov chain should produce attempt to append to the output. Because there's a loop in this particular model (`nd` -> `e`, `de` -> `n`, `en` -> `d`), any time you generate text from this Markov chain, it could potentially go on infinitely. Limiting the number to `100` makes sure that it doesn't ever actually do that. You should adjust the number based on what you need the Markov chain to do.

### A function to generate from a Markov model

The `gen_from_model()` function below is a more general version of the code that we just wrote that works with lists and strings and n-grams of any length:

In [156]:
import random
def gen_from_model(n, model, start=None, max_gen=100):
    if start is None:
        start = random.choice(list(model.keys()))
    output = list(start)
    for i in range(max_gen):
        start = tuple(output[-n:])
        next_item = random.choice(model[start])
        if next_item is None:
            break
        else:
            output.append(next_item)
    return output

The `gen_from_model()` function's first parameter is the length of n-gram; the second parameter is a Markov model, as returned from `markov_model()` defined above, and the third parameter is the "seed" n-gram to start the generation from. The `gen_from_model()` function always returns a list.

So if you're working with a character-level Markov chain, you'll want to glue the list back together into a string:

If you leave out the "seed," this function will just pick a random n-gram to start with:

### Advanced Markov style: Generating lines

You can use the `gen_from_model()` function to generate word-level Markov chains as well:

In [157]:
genesis_word_model = markov_model(3, open("textInspiration.txt").read().split())

In [159]:
generated_words = gen_from_model(3, genesis_word_model, ('photo', 'of', 'a' ))
print(' '.join(generated_words))

photo of a pair of scissors on a table a building with a clock on a wall a white and black microwave a black and white photo of a person flying a kite a person riding a skate board on a street a view of a traffic light a traffic light sitting on the side of it a picture of a person holding a tennis racket on a court a large building with a clock on it a tennis racket on a court a large building with a clock on the top a bunch of green and yellow flowers in a tree a


This looks good! But there's a problem: the generation of the text just sorta... keeps going. Actually it goes on for exactly 100 words, which is also the maximum number of iterations specified in the function. We can make it go even longer by supplying a fourth parameter to the function:

In [160]:
generated_words = gen_from_model(3, genesis_word_model, ('photo', 'of', 'a'), 500)
print(' '.join(generated_words))

photo of a red and white toothbrush a bunch of umbrellas that are in the air a close up of a nintendo wii controller a red and white fire hydrant a close up of a tennis racket a person holding a pair of scissors a close up of a black and white background a close up of a person holding an open umbrella a close up of a person holding a pair of scissors a group of people standing around a luggage carousel a picture of a wall with a wall mounted to the side of a building a group of people standing next to each other a large window with a building in the background a black and white photo of a man in a suit and tie walking down a street a woman in a black dress and a white shirt and black shorts holding a tennis racket a close up of a pair of scissors on a table a bunch of trees that are in the sky a close up of a person holding an umbrella a black and white photo of a window with a picture of a dog on it a group of people sitting on top of a pole a picture of a fence in a house a group of 

The reason for this is that unless the Markov chain generator reaches the "end of text" token, it'll just keep going on forever. And the longer the text, the less likely it is that the "end of text" token will be reached.

Maybe this is okay, but the underlying text actually has some structure in it: each line of the file is actually a verse. If you want to generate individual *verses*, you need to treat each line separately, producing an end-of-text token for each line. The following function does just this by creating a model, adding each item from a list to the model as a separate item, and returning the combined model:

In [161]:
def markov_model_from_sequences(n, sequences):
    model = {}
    for item in sequences:
        add_to_model(model, n, item)
    return model

This function expects to receive a list of sequences (the sequences can be either lists or strings, depending on if you want a word-level model or a character-level model). So, for example:

In [163]:
textInspiration_lines = open("textInspiration.txt").readlines() # all of the lines from the file
# textInspiration_lines_words will be a list of lists of words in each line
textInspiration_lines_words = [line.strip().split() for line in textInspiration_lines] # strip whitespace and split into words
textInspiration_lines_model = markov_model_from_sequences(2, textInspiration_lines_words)

The `textInspiration_lines_model` variable now contains a Markov model with end-of-text tokens where they should be, at the end of each line. Generating from this model, we get:

In [164]:
for i in range(10):
    print("verse", i, "-", ' '.join(gen_from_model(2, textInspiration_lines_model)))

verse 0 - meter on the side of a snow covered mountain with a clock on the wall of a person "s" reflection in a suit
verse 1 - and chairs
verse 2 - , table and chairs
verse 3 - of boats that are sitting on the side of a nintendo wii controller
verse 4 - it sitting on the wall
verse 5 - oven in it
verse 6 - clock mounted to the side of a window
verse 7 - brushing his teeth in the sky
verse 8 - and white kite flying in the air
verse 9 - sitting under a tree


Better—the verses are ending at appropriate places—but still not quite right, since we're generating from random keys in the Markov model! To make this absolutely correct, we'd want to *start* each line with an n-gram that also occurred at the start of each line in the original text file. To do this, we'll work in two passes. First, get the list of lists of words:

In [165]:
textInspiration_lines = open("textInspiration.txt").readlines() # all of the lines from the file
# textInspiration_lines_words will be a list of lists of words in each line
textInspiration_lines_words = [line.strip().split() for line in textInspiration_lines] # strip whitespace and split into words

Now, get the n-grams at the start of each line:

In [166]:
textInspiration_starts = [item[:2] for item in textInspiration_lines_words if len(item) >= 2]

Now create the Markov model:

In [167]:
textInspiration_lines_model = markov_model_from_sequences(2, textInspiration_lines_words)

And generate from it, picking a random "start" for each line:

In [168]:
for i in range(10):
    start = random.choice(textInspiration_starts)
    generated = gen_from_model(2, textInspiration_lines_model, random.choice(textInspiration_starts))
    print("verse", i, "-", ' '.join(generated))

verse 0 - a picture of a black and white image of a table
verse 1 - a red and white photo of a red stop sign
verse 2 - a bed with a building
verse 3 - a close up of a person wearing a tie
verse 4 - a young boy riding a skateboard up the side of a blue umbrella sitting on top of a roof
verse 5 - a group of people standing next to each other
verse 6 - a close up of a person holding a pair of scissors on a table
verse 7 - a woman is taking a picture of a building
verse 8 - a close up of a road
verse 9 - a close up of a person holding an apple


### Putting it together

The `markov_generate_from_sequences()` function below wraps up everything above into one function that takes an n-gram length, a list of sequences (e.g., a list of lists of words for a word-level Markov model, or a list of strings for a character-level Markov model), and a number of lines to generate, and returns that many generated lines, starting the generation only with n-grams that begin lines in the source file:

In [169]:
def markov_generate_from_sequences(n, sequences, count, max_gen=100):
    starts = [item[:n] for item in sequences if len(item) >= n]
    model = markov_model_from_sequences(n, sequences)
    return [gen_from_model(n, model, random.choice(starts), max_gen)
           for i in range(count)]

Here's how to use this function to generate from a character-level Markov model of `textInspiration.txt`:

In [187]:
textInspiration_lines = [line.strip() for line in open("textInspiration.txt").readlines()]
for item in markov_generate_from_sequences(6, textInspiration_lines, 20):
    print(''.join(item))

a bench sitting on a bedroom with scissors sitting on top of a city street sign that says
a large crowd of people standing next to a window in a mirror
a close up of a pair of scissors on a wall
a picture of a train on a wall
a close up of people standing in front of a man standing in front of a road
a close up of a mirror
a bathroom with a clock on it
a close up of a person holding holding with a clock on the wall
a bathroom sink
a close up of a toothbrush
a close up of people standing in a suit and tie holding with flowers in it
a picture of a stop sign that is flying in the middle of men standing on a beach
a picture of a window
a row of urinals mounted to it
a close up of a table
a close up of a clock on it
a couple of white and white photo of a clock
a pair of scissors on a sidewalk holding around a lamp
a bunch of vases on a table
a blue and cabinets


And from a word-level Markov model of `textInspiration.txt`:

In [192]:
textInspiration_words = [line.strip().split() for line in open("textInspiration.txt").readlines()]
for item in markov_generate_from_sequences(5, textInspiration_words, 14):
    print(' '.join(item))

a red stop sign sitting on the side of a building
a bed sitting in a bedroom next to a window
a man is holding a bunch of bananas
a street light with a street light hanging from it
a close up of a person holding a kite
a white refrigerator freezer sitting inside of a kitchen
a red umbrella sitting on top of a rock
a person cutting a piece of paper with scissors
a group of boats that are sitting in the water
a large building with a clock on it
a bunch of surfboards are lined up on a rack
a room with a bed and a large window
a flock of birds flying over a large body of water next to a bridge
a close up of a stuffed animal on a table


A fun thing to do is combine *two* source texts and make a Markov model from the combination. So for example, read in the lines of both `textInspiration.txt` and `textoriginal-caption.txt` and put them into the same list:

In [193]:
frost_lines = [line.strip() for line in open("textInspiration.txt").readlines()]
genesis_lines = [line.strip() for line in open("textoriginal-caption.txt").readlines()]
both_lines = frost_lines + genesis_lines
for item in markov_generate_from_sequences(5, both_lines, 14, max_gen=150):
    print(''.join(item))

a couple of a blue and a field
a pair of scissors
a close up of a kitchen with a bench
a clock on a chair
a red suitcases on a table
a table
a man standing with snowy mountain in a room
a building on a wall
a pair of scissors sitting next to each other
a black and a fire hydrant sitting on top of a building
a close up of a street filled with a stop sign on a wall
a man walking down a surfboard
a bowl of a person holding
a group of a table


The resulting text has properties of both of the underlying source texts!

### Putting it all *even more together*

If you're really super lazy, the `markov_generate_from_lines_in_file()` function below does allll the work for you. It takes an n-gram length, an open filehandle to read from, the number of lines to generate, and the string `char` for a character-level Markov model and `word` for a word-level model. It returns the requested number of lines generated from a Markov model of the desired order and level.

In [194]:
def markov_generate_from_lines_in_file(n, filehandle, count, level='word', max_gen=100):
    if level == 'word':
        glue = ''
        sequences = [item.strip() for item in filehandle.readlines()]
    elif level == 'char':
        glue = ' '
        sequences = [item.strip().split() for item in filehandle.readlines()]
    generated = markov_generate_from_sequences(n, sequences, count, max_gen)
    return [glue.join(item) for item in generated]

So, for example, to generate twenty lines from an order-3 model of `textInspiration.txt`:

In [200]:
for item in markov_generate_from_lines_in_file(7, open("textInspiration.txt"), 20, 'char'):
    print(item)

a statue of a bear in the middle of a forest
a sign that says on the side of a building
a building that has a bunch of signs on it
a close up of a pair of scissors on a table
a close up of a clock on a wall
a man in a suit and tie
a bench on a beach near a body of water
a large window in a room with a window
a window that has a clock on it
a woman standing in front of a window holding an umbrella
a blue and white bus driving down a street
a close up of a vandalized street sign
a room filled with lots of wooden furniture
a close up of a pair of scissors on a table
a pair of scissors sitting on top of a table
a clock on a building with a sky background
a bed sitting in a bedroom next to a window
a red stop sign sitting on the side of a road
a close up of a traffic light on a pole
a red and white umbrella with a blue background


Or an order-3 word-level model of `textInspiration.txt`:

In [204]:
for item in markov_generate_from_lines_in_file(6, open("textInspiration.txt"), 5, 'word'):
    print(item)
    print("")

a stop sign with a street signs on it

a pair of scissors sitting next to a bridge

a close up of people standing next to a window

a black and white photo of a wooden floor

a group of a person holding a cell phone



In [209]:
for item in markov_generate_from_lines_in_file(12, open("textInspiration.txt"), 15, 'word'):
    print(item)
    print("")

a small boat in a body of water

a plane flying in the air

a clock mounted to the side of a ramp

a red stop sign

a close up of a person holding a nintendo wii controller

a woman walking down a street

a pair of skis sitting on top of a table

a room with a window

a close up of a pair of scissors

a group of people standing next to a surfboard

a picture of a person holding a tennis racket on a court

a clock hanging from the ceiling of a building

a woman holding a pair of scissors sitting on top of a wooden table

a red and white fire hydrant

a man riding a wave on top of a table

