# Text Generation

## Introduction

Markov chains can be used for very basic text generation. Think of every word in a corpus as a state. We make the simplifying assumption that the next word depends only on the current word, which is the defining assumption of a first-order Markov chain.
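
In symbols, with $w_t$ standing for the word at position $t$ (notation introduced here only for illustration), the assumption reads:

$$P(w_t \mid w_1, \dots, w_{t-1}) = P(w_t \mid w_{t-1})$$

Everything before the most recent word is ignored when choosing the next one.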

Markov chains don't generate text as well as deep learning models, but they're a good (and fun!) start.

## Select Text to Imitate

In this notebook, we're going to generate text in the style of Ali Wong, so as a first step, let's extract the text of her comedy routine from the corpus.

In [2]:
# Read in the corpus, including punctuation!
import pandas as pd

data = pd.read_pickle('../Assignment2/corpus.pkl')
data

| | transcript | full_name |
|---|---|---|
| ali | Ladies and gentlemen, please welcome to the st... | Ali Wong |
| anthony | Thank you. Thank you. Thank you, San Francisco... | Anthony Jeselnik |
| bill | [cheers and applause] All right, thank you! Th... | Bill Burr |
| bo | Bo What? Old MacDonald had a farm E I E I O An... | Bo Burnham |
| dave | This is Dave. He tells dirty jokes for a livin... | Dave Chappelle |
| hasan | [theme music: orchestral hip-hop] [crowd roars... | Hasan Minhaj |
| jim | [Car horn honks] [Audience cheering] [Announce... | Jim Jefferies |
| joe | [rock music playing] [audience cheering] [anno... | Joe Rogan |
| john | All right, Petunia. Wish me luck out there. Yo... | John Mulaney |
| louis | Intro\nFade the music out. Let’s roll. Hold th... | Louis C.K. |


In [3]:
# Extract only Ali Wong's text
ali_text = data.transcript.loc['ali']
ali_text[:200]

'Ladies and gentlemen, please welcome to the stage: Ali Wong! Hi. Hello! Welcome! Thank you! Thank you for coming. Hello! Hello. We are gonna have to get this shit over with, ’cause I have to pee in, l'

## Build a Markov Chain Function

We are going to build a simple Markov chain function that creates a dictionary:
* The keys should be all of the words in the corpus
* The values should be a list of the words that follow the keys
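
To make the target structure concrete, here is a minimal sketch on a toy sentence. The sentence and the names `toy_text`, `toy_words`, and `toy_dict` are illustrative assumptions, not part of the corpus or the notebook's own code.

```python
# Toy sentence (not from the corpus), used only to show the target structure.
toy_text = "the cat sat on the mat"
toy_words = toy_text.split(' ')

toy_dict = {}
for current_word, next_word in zip(toy_words[:-1], toy_words[1:]):
    # setdefault creates the empty list the first time a key is seen
    toy_dict.setdefault(current_word, []).append(next_word)

print(toy_dict)
# {'the': ['cat', 'mat'], 'cat': ['sat'], 'sat': ['on'], 'on': ['the']}
```

Note that `'mat'` never becomes a key, because nothing follows it in the toy sentence.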

In [8]:
from collections import defaultdict

def markov_chain(text):
    '''The input is a string of text and the output will be a dictionary with each word as
       a key and each value as the list of words that come after the key in the text.'''
    
    # Tokenize the text by word, keeping punctuation attached to each word
    words = text.split(' ')
    
    # Initialize a default dictionary to hold all of the words and next words
    m_dict = defaultdict(list)
    
    # Pair each word with the word that follows it, then append the follower to that word's list
    for current_word, next_word in zip(words[0:-1], words[1:]):
        m_dict[current_word].append(next_word)

    # Convert the default dict back into a dictionary
    m_dict = dict(m_dict)
    return m_dict
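
Because repeated followers are kept in each list, a uniform random pick from a list selects each next word in proportion to how often it followed the key in the text. The sketch below makes those implied probabilities explicit for a single, made-up follower list; `followers`, `counts`, and `probs` are hypothetical names used only for illustration.

```python
from collections import Counter

# Hypothetical follower list, shaped like one value of the dictionary above.
followers = ['I', 'then', 'I', 'she', 'I', 'then']

counts = Counter(followers)
total = len(followers)

# Empirical probability of each next word given the current word
probs = {word: n / total for word, n in counts.items()}
print(probs)
```

Here `'I'` makes up three of the six entries, so it would be chosen about half of the time.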

In [9]:
# Create the dictionary for Ali's routine, take a look at it
ali_dict = markov_chain(ali_text)
ali_dict

{'Ladies': ['and'],
 'and': ['gentlemen,',
  'foremost,',
  'then',
  'have',
  'there’s',
  'resentment',
  'get',
  'get',
  'says,',
  'my',
  'she',
  'snatch',
  'running',
  'fighting',
  'yelling',
  'it',
  'she',
  'I',
  'I',
  'I',
  'we',
  'watched',
  'I',
  'have',
  'that',
  'recycling,',
  'disturbing',
  'it’s',
  'all',
  'just…',
  'be',
  'half-Vietnamese.',
  'his',
  'slide.',
  'your',
  'inflamed',
  'you’re',
  'I',
  'half-Japanese',
  'I’m',
  'half-Vietnamese.',
  'playing',
  'rugby.',
  'foremost,',
  'a',
  'emotionally',
  'I',
  '20',
  'neither',
  'I',
  'I–',
  'then',
  'it’s',
  'find',
  'start',
  'just',
  'caves',
  'gets',
  'is',
  'very',
  'for',
  'I',
  'she',
  'rise',
  'be',
  'eat',
  'watch',
  'be',
  'now',
  'most',
  'in',
  'then',
  'digitally',
  'then',
  'then',
  'then',
  'steady',
  'brings',
  'let',
  'reverberate',
  'say,',
  'my',
  'he',
  'when',
  'I’m',
  'sicker,',
  'sicker.',
  'sicker,',
  'sicker,',
  'pos

## Create a Text Generator

We're going to create a function that generates sentences. It will take two things as inputs:
* The dictionary you just created
* The number of words you want generated

Here are some examples of generated sentences:

>'Shape right turn– I also takes so that she’s got women all know that snail-trail.'

>'Optimum level of early retirement, and be sure all the following Tuesday… because it’s too.'
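
Generation is a random walk over the dictionary: pick a starting key, then repeatedly pick one of the current word's followers. Below is a minimal sketch of that walk on a toy chain. The chain, the seed, and the names `toy_chain`, `rng`, and `walk` are assumptions made for illustration, and the walk stops early if it reaches a word with no recorded follower.

```python
import random

# Toy chain (hypothetical, not built from the corpus)
toy_chain = {'the': ['cat', 'mat'], 'cat': ['sat'], 'sat': ['on'], 'on': ['the']}

rng = random.Random(0)  # seeded only so the walk is repeatable

word = rng.choice(list(toy_chain))
walk = [word]
for _ in range(7):
    followers = toy_chain.get(word)
    if not followers:  # dead end: e.g. 'mat' has no recorded follower
        break
    word = rng.choice(followers)
    walk.append(word)

print(' '.join(walk))
```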

In [10]:
import random

def generate_sentence(chain, count=15):
    '''Input a dictionary in the format of key = current word, value = list of next words
       along with the number of words you would like to see in your generated sentence.'''

    # Pick a random starting word and capitalize it
    word1 = random.choice(list(chain.keys()))
    sentence = word1.capitalize()

    # Pick the next word from the current word's list, make it the current word, and repeat
    for _ in range(count - 1):
        # Stop early at a dead end: the final word of the corpus may not be a key
        if word1 not in chain:
            break
        word2 = random.choice(chain[word1])
        word1 = word2
        sentence += ' ' + word2

    # End it with a period
    sentence += '.'
    return sentence

In [11]:
generate_sentence(ali_dict)

'Pace, so little to achieve my Asian-American men as leverage and he had to tell.'

### Assignment:
1. Generate sentences for the other comedians as well.
2. Try making the `generate_sentence` function better. Maybe allow it to end with a random punctuation mark, or have it stop as soon as it reaches a word that already ends with a punctuation mark.
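
For the "random punctuation mark" idea, one possible approach is a small post-processing step on an already generated sentence. This is only a sketch: `finish_sentence` and its parameters are hypothetical names, not part of the notebook's code.

```python
import random

def finish_sentence(sentence, marks='.!?', rng=random):
    '''Hypothetical helper: strip trailing punctuation, then add one random mark.'''
    return sentence.rstrip('.!?,;:… ') + rng.choice(marks)

print(finish_sentence('Could just see Beyoncé'))
```

Stripping first avoids doubled endings such as `too..` when the last generated word already carries punctuation.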

In [32]:
comedians = ['ali', 'anthony', 'bill', 'dave', 'jim', 'joe', 'john', 'louis', 'mike', 'ricky']
comedians_dict = {}

In [33]:
for name in comedians:
    text = data.transcript.loc[name]
    comedians_dict[name] = markov_chain(text)
comedians_dict

{'ali': {'Ladies': ['and'],
  'and': ['gentlemen,',
   'foremost,',
   'then',
   'have',
   'there’s',
   'resentment',
   'get',
   'get',
   'says,',
   'my',
   'she',
   'snatch',
   'running',
   'fighting',
   'yelling',
   'it',
   'she',
   'I',
   'I',
   'I',
   'we',
   'watched',
   'I',
   'have',
   'that',
   'recycling,',
   'disturbing',
   'it’s',
   'all',
   'just…',
   'be',
   'half-Vietnamese.',
   'his',
   'slide.',
   'your',
   'inflamed',
   'you’re',
   'I',
   'half-Japanese',
   'I’m',
   'half-Vietnamese.',
   'playing',
   'rugby.',
   'foremost,',
   'a',
   'emotionally',
   'I',
   '20',
   'neither',
   'I',
   'I–',
   'then',
   'it’s',
   'find',
   'start',
   'just',
   'caves',
   'gets',
   'is',
   'very',
   'for',
   'I',
   'she',
   'rise',
   'be',
   'eat',
   'watch',
   'be',
   'now',
   'most',
   'in',
   'then',
   'digitally',
   'then',
   'then',
   'then',
   'steady',
   'brings',
   'let',
   'reverberate',
   'say,',
   '

In [34]:
for i in comedians:
    print(f"------{i}------")
    print(generate_sentence(comedians_dict[i]))

------ali------
Gotta… …ten times a slip and I gave me that it backfired ’cause you know?
Another.
------anthony------
What’s your time, every morning, and using the water to make fun of. They’re like.
------bill------
Ugly-ass dog, nah.” Right? Now you this, lady! Why don’t live too long. You could.
------dave------
Hold so well.” I get in Brooklyn– hard way. I was going on?” This woman.
------jim------
“you need a mental home, standing in a fireman.” And this his scrotum over at.
------joe------
Corn-fed dude with people have to lie to sleep.” And you livin’ in Texas, too..
------john------
House.” I was very lazy. He’s my mom saved the apartment, walked past her to.
------louis------
“fine,” And then really have to think that looks like a bad day. You burn.
------mike------
Minutes. I think that’s the accident report had another city, and I remember it alive..
------ricky------
Really funny, whatever… She mended the plane about it. It’s obvious. The worst thing kept.


In [35]:
import random

def generate_sentence_punct(chain, count=15):
    '''Input a dictionary where key = current word, value = list of next words,
       and generate a sentence with up to `count` words, ending at a punctuation mark if possible.'''

    word1 = random.choice(list(chain.keys()))
    
    while not word1:
        word1 = random.choice(list(chain.keys()))
    
    sentence = word1.capitalize()

    for _ in range(count - 1):
        if word1 not in chain or not chain[word1]:
            break

        word2 = random.choice(chain[word1])

        if not word2:
            break

        sentence += ' ' + word2
        word1 = word2

        if word2[-1] in '.!?':  
            break

    if sentence and sentence[-1] not in '.!?':
        sentence += '.'

    return sentence
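
The end-of-sentence check above looks only at the last character of a word, so a word such as `Mmm.”`, where a closing quote follows the period, is not treated as a sentence ending. One possible refinement, sketched with a hypothetical helper `ends_sentence`, ignores trailing quotes and brackets before checking. The set of characters in `CLOSERS` is an assumption.

```python
# Characters that may trail the real sentence-ending mark (an assumed set)
CLOSERS = '”"’\')'

def ends_sentence(word):
    '''Hypothetical helper: True if `word` ends a sentence, ignoring trailing quotes/brackets.'''
    core = word.rstrip(CLOSERS)
    return bool(core) and core[-1] in '.!?'

print(ends_sentence('Mmm.”'), ends_sentence('in,'))
```

With this helper, the check `word2[-1] in '.!?'` could be swapped for `ends_sentence(word2)`.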


In [36]:
for i in comedians:
    print(f"------{i}------")
    print(generate_sentence_punct(comedians_dict[i]))

------ali------
“all right, I’ve picked boogers larger than me, I do get big– they become this.
------anthony------
Marathon? What’s funny about the face as I know what that’s too soon.
------bill------
Friendlieth… for the morning, you’re cutting into a good man, I’m not wearing a family,.
------dave------
Can you that a radioactive rat on Netflix, Making a cat would shoot love onto.
------jim------
Myself for “ass to me there you meet couples who take it getting fucked it.
------joe------
Yards in a dark moment for a girl get real confident.
------john------
Criminals. So, my wife!” That’s the true or not allowed to jail for a time.
------louis------
You’re never do that, but they’re making breakfast.
------mike------
Winter in a relationship was just don’t really long, difficult to 1, you just kind.
------ricky------
What? Show with them, I told it.


In [37]:
for _ in range(29):
    print(generate_sentence_punct(ali_dict))

Ladies and watched him to freak me about giving childbirth, though.
Seat. They try to get that… y-you get older.
Ceremonies. We had this shit around?
Mmm.” Every time I discretely scratch yourself, all your thumb up don’t wanna lean in,.
Smooth, just gonna have any real miracle of your first home in between each other.
How did that means, OK?
Could just see Beyoncé.
Keep it in between each other.
In. Well, I heard one of mine before I was so in the chatter of.
Thank you, everybody, so that and 20 percent of her, there and watched him to.
Juicy. You know, they just blow ass into the stall.
Emotional. We had to the promise of your children with my husband, half-Filipino, half-Japanese.
Writing staff here is not even accidentally let… two homeless people were discouraging me and.
Comics… don’t gotta try to get older.
He’d become a great dad.
Off?” I first dates, he leaves me, he’s worthy of your 20s, OK?
God! He was like, “Oh, no!” “I did you when it backfired ’cause you know,.
Boba Fett 