# Text Generation

## Introduction

Markov chains can be used for very basic text generation. Think of every word in a corpus as a state. We can make the simplifying assumption that the next word depends only on the previous word, which is the basic assumption of a Markov chain.

Markov chains don't generate text as well as deep learning models do. In a later project, RNNs or LSTMs may prove more fruitful for building a better text generator.

## Select Text to Imitate

The corpus holds one transcript per special, so text can be generated for each special separately.

In [35]:
# Read in the corpus, including punctuation!
import pandas as pd

data = pd.read_pickle('corpus.pkl')
data

| | transcript | titles |
|---|---|---|
| age_spin | This is Dave. He tells dirty jokes for a livin... | age_spin |
| deep_texas | [Morgan Freeman] He’s in the trance. He isn’t ... | deep_texas |
| equanimity | “Equanimity” was shot in Washington, D.C., and... | equanimity |
| killin_softly | Wooo! Ya’ll gone make me lose my mind. Up in h... | killin_softly |
| stick_stones | Sticks & Stones is Dave Chappelle’s fifth Netf... | stick_stones |
| the_bird_revelation | Recorded at the Comedy Store in Los Angeles in... | the_bird_revelation |
| worth | Why’d you pick San Francisco to shoot your spe... | worth |


In [45]:
# Extract the transcript of a single special (Sticks & Stones)
chapelle_text = data.transcript.loc['stick_stones']
chapelle_text[:200]

'Sticks & Stones is Dave Chappelle’s fifth Netflix special.\nIn the promotional trailer Morgan Freeman narrates as Chappelle swaggers across a salt flat in leather pants, aviator shades and a remarkably'

## Build a Markov Chain Function

We are going to build a simple Markov chain function that creates a dictionary:
* The keys should be all of the words in the corpus
* The values should be a list of the words that follow the keys
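
To make the target structure concrete, here is a minimal sketch on a made-up sentence (the sentence and the names `toy_text`, `toy_words` and `toy_chain` are illustrative assumptions, not part of the corpus). Repeated followers are kept on purpose, so a transition that occurs more often appears more often in the list.

```python
from collections import defaultdict

# Toy sentence, invented purely for illustration
toy_text = "the cat sat on the mat and the cat slept"
toy_words = toy_text.split()

# Map each word to the list of words that directly follow it
toy_chain = defaultdict(list)
for current_word, next_word in zip(toy_words, toy_words[1:]):
    toy_chain[current_word].append(next_word)

print(dict(toy_chain))
```

The final word of the text (`slept` here) never becomes a key unless it also appears earlier, because nothing follows it.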

In [46]:
from collections import defaultdict

def markov_chain(text):
    '''The input is a string of text and the output will be a dictionary with each word as
       a key and each value as the list of words that come after the key in the text.'''
    
    # Tokenize on any whitespace (spaces and newlines), keeping punctuation attached to words
    words = text.split()
    
    # Initialize a default dictionary to hold all of the words and next words
    m_dict = defaultdict(list)
    
    # Create a zipped list of all of the word pairs and put them in word: list of next words format
    for current_word, next_word in zip(words[0:-1], words[1:]):
        m_dict[current_word].append(next_word)

    # Convert the default dict back into a dictionary
    m_dict = dict(m_dict)
    return m_dict
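
Whether the tokenizer splits on a single space or on any whitespace matters for this corpus, since the transcript contains newline characters (visible as `\n` in the preview of `chapelle_text`). A small comparison on a sample string, shortened from that preview, might look like this; the variable names are assumptions for illustration.

```python
# Sample string with a newline, modeled on the transcript preview
sample = "fifth Netflix special.\nIn the promotional trailer"

# Splitting on a single space leaves the newline inside one token
by_space = sample.split(' ')

# Splitting with no argument treats any run of whitespace as a separator
by_whitespace = sample.split()

print(by_space)
print(by_whitespace)
```

With the single-space split, `'special.\nIn'` ends up as one dictionary key, which hides both `'special.'` and `'In'` from the chain.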

In [47]:
# Create the dictionary for Chappelle's routine, take a look at it
chapelle_dict = markov_chain(chapelle_text)
chapelle_dict

{'Sticks': ['&', '&'],
 '&': ['Stones', 'Stones'],
 'Stones': ['is', 'streamed'],
 'is': ['Dave',
  'Dave.',
  'this?',
  'the',
  '45',
  'perfect.',
  'my',
  'my',
  'I’m',
  'the',
  'the',
  'awkward',
  'to',
  'different.',
  'very',
  'an',
  'the',
  'the',
  'damn',
  'precisely',
  'Atlanta.',
  'many',
  'the',
  'it…',
  'it',
  'we',
  'that',
  'this,',
  'gay.',
  'the',
  'it’s',
  'not',
  'that,',
  'how',
  'killing',
  'not',
  'serious.”',
  'the',
  'the',
  'shame',
  'masturbating',
  'your',
  'theirs.',
  'their',
  'fair.',
  'an',
  'duck.”',
  'school',
  'training',
  'terrifying.',
  'real',
  'looking',
  'raising',
  'that',
  'a',
  'African',
  'your',
  'incumbent',
  'a',
  'an',
  'a',
  'this',
  'in',
  'buckshot.',
  'a',
  'not',
  'it?',
  'your',
  'an',
  'black,',
  'that',
  'carrying',
  'the',
  'funnier',
  'MAGA',
  'that',
  'the',
  'a',
  'my',
  'when',
  'on',
  'protected'],
 'Dave': ['Chappelle’s', 'get', 'Chappelle', ...
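
Because followers are stored with repetition, drawing uniformly from a list is equivalent to sampling each next word in proportion to how often it followed the key. A small sketch with `collections.Counter` makes those implied transition probabilities explicit. The `followers` list below is a shortened, hypothetical stand-in, not the real value stored under `'is'`.

```python
from collections import Counter

# Hypothetical follower list, standing in for one value of the dictionary
followers = ['the', 'my', 'the', 'a', 'the', 'not', 'a', 'my']

# Count each follower and divide by the list length to get a probability
counts = Counter(followers)
total = len(followers)
probs = {word: n / total for word, n in counts.most_common()}

print(probs)
```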

## Create a Text Generator

We're going to create a function that generates sentences. It will take two things as inputs:
* The dictionary you just created
* The number of words you want generated

Here are some examples of generated sentences:

>'Wife wakes me and I told me. I’d never forget… that I’m not funny? And if you’re proposing. And I.'

>'Dude trespassing on a lot of us is different. I say. It’s mine. I don’t know what you just know.'

In [48]:
import random

def generate_sentence(chain, count=20):
    '''Input a dictionary in the format of key = current word, value = list of next words
       along with the number of words you would like to see in your generated sentence.'''

    # Pick a random starting word and capitalize it
    word1 = random.choice(list(chain.keys()))
    sentence = word1.capitalize()

    # Pick the next word from the current word's value list, make it the current word, and repeat
    for _ in range(count - 1):
        followers = chain.get(word1)
        # A word with no followers (e.g. the last word of the corpus) is a dead end,
        # so fall back to a random key instead of raising a KeyError
        word2 = random.choice(followers) if followers else random.choice(list(chain.keys()))
        word1 = word2
        sentence += ' ' + word2

    # End it with a period
    sentence += '.'
    return sentence

In [49]:
generate_sentence(chapelle_dict)

'Good. It’s… My mind’s telling jokes in their lives. But you decide to overcome one thing as they don’t know.'
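
Since the generator relies on the `random` module, each call gives a different sentence. If reproducible output is wanted, one option is to draw from a seeded `random.Random` instance. The sketch below uses a tiny hypothetical chain and a hypothetical helper `walk` rather than the notebook's `generate_sentence`, so that it runs on its own.

```python
import random

# Tiny hypothetical chain, invented for illustration
toy_chain = {
    'I': ['am', 'was'],
    'am': ['here', 'not'],
    'was': ['here', 'not'],
    'not': ['here'],
}

def walk(chain, start, count, rng):
    """Random walk over the chain; stops early at a word with no followers."""
    words = [start]
    for _ in range(count - 1):
        followers = chain.get(words[-1])
        if not followers:
            break
        words.append(rng.choice(followers))
    return ' '.join(words)

# Two generators created with the same seed produce the same walk
first = walk(toy_chain, 'I', 5, random.Random(42))
second = walk(toy_chain, 'I', 5, random.Random(42))
print(first == second)
```

Passing the generator in as an argument keeps the seeding local to the call, instead of changing the global state of the `random` module.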