# Text Generation

# Introduction

Markov chains can be used for very basic text generation. Think of every word in a corpus as a state. We make the simplifying assumption that the next word depends only on the previous word, which is the basic assumption (the Markov property) of a Markov chain.
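Written as a formula, with $w_t$ standing for the $t$-th word of the text, the assumption says that the full history adds nothing beyond the previous word:

$$P(w_t \mid w_1, \dots, w_{t-1}) = P(w_t \mid w_{t-1})$$

Estimated from a corpus, the transition probability from word $a$ to word $b$ is the fraction of times $a$ is followed by $b$:

$$\hat{P}(w_t = b \mid w_{t-1} = a) = \frac{\operatorname{count}(a, b)}{\sum_{b'} \operatorname{count}(a, b')}$$

Picking uniformly at random from a list that holds every observed next word of $a$, duplicates included, samples from exactly this estimate.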

Markov chains don't generate text as well as deep learning models, but they're a good (and fun!) place to start.

# Select Text to Imitate

In this notebook, we're going to generate song lyrics in the style of three genres (rock, pop, and hip hop), so as a first step, let's load the lyrics corpus and split it by genre.

In [33]:
# Read in the lyrics corpus (a pickled DataFrame with Lyric and Genre columns)
import pandas as pd

data = pd.read_pickle('corpus.pkl')
data.reset_index()

| | index | Lyric | Genre |
|---|---|---|---|
| 0 | 100778 | the sun goes down and the moon comes up my hea... | Rock |
| 1 | 100779 | s moy w stevenson duet with rod stewart produc... | Rock |
| 2 | 100780 | babys got sunshine cause she dances through th... | Rock |
| 3 | 100781 | darling yes tine its time to get next to me ho... | Rock |
| 4 | 100782 | when the feeling is ended there aint no use pr... | Rock |
| ... | ... | ... | ... |
| 295 | 100973 | hold up goddamnit this thing alright wait hold... | Hip Hop |
| 296 | 100974 | lighten up shine like the sun | Hip Hop |
| 297 | 100975 | well back to the beat yall down with the sound... | Hip Hop |
| 298 | 100976 | i know where you got to be going but you dont ... | Hip Hop |


In [34]:
rock_df = pd.DataFrame(data.iloc[0:100,0])
pop_df = pd.DataFrame(data.iloc[100:200,0])
hiphop_df = pd.DataFrame(data.iloc[200:300,0])
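The positional slices above rely on the pickle storing the songs in blocks of 100 in Rock, Pop, Hip Hop order. A label-based alternative is to filter on the `Genre` column that appears in the `data.reset_index()` output. A minimal sketch, where the toy DataFrame is a made-up stand-in for `corpus.pkl`:

```python
import pandas as pd

# Hypothetical toy rows standing in for corpus.pkl (same 'Lyric' and 'Genre' columns)
data = pd.DataFrame({
    'Lyric': ['rock song one', 'pop song one', 'rock song two', 'hip hop song one'],
    'Genre': ['Rock', 'Pop', 'Rock', 'Hip Hop'],
})

# Select rows by genre label instead of by position, so row order does not matter;
# the [['Lyric']] selector keeps a one-column DataFrame like the slices above
rock_df = data.loc[data['Genre'] == 'Rock', ['Lyric']]
pop_df = data.loc[data['Genre'] == 'Pop', ['Lyric']]
hiphop_df = data.loc[data['Genre'] == 'Hip Hop', ['Lyric']]

print(list(rock_df.Lyric))  # ['rock song one', 'rock song two']
```

This also keeps working if the number of songs per genre changes.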

In [6]:
rock_df.reset_index()

| | index | Lyric |
|---|---|---|
| 0 | 100778 | the sun goes down and the moon comes up my hea... |
| 1 | 100779 | s moy w stevenson duet with rod stewart produc... |
| 2 | 100780 | babys got sunshine cause she dances through th... |
| 3 | 100781 | darling yes tine its time to get next to me ho... |
| 4 | 100782 | when the feeling is ended there aint no use pr... |
| ... | ... | ... |
| 95 | 100873 | her face is a map of the world is a map of the... |
| 96 | 100874 | i cant believe the news today i cant close my ... |
| 97 | 100875 | i was so high i did not recognize the fire bur... |
| 98 | 100876 | well sometimes i go out by myself and i look a... |


In [36]:
pop_df

| | Lyric |
|---|---|
| 0 | hey slow it down what do you want from me what... |
| 1 | died last night in my dreams walking the stree... |
| 2 | ive paid my dues time after time ive done my s... |
| 3 | so i got my boots on got the right amount of l... |
| 4 | i wish that this night would never be over the... |
| ... | ... |
| 95 | i got drive so ill survive in hollywood where ... |
| 96 | left right step up to the spotlight why you ac... |
| 97 | empty spaces what are we living for abandoned... |
| 98 | love is a burning thing and it makes a firery ... |


In [39]:
hiphop_df.reset_index()

| | index | Lyric |
|---|---|---|
| 0 | 100878 | i cant stand it i know you planned it im a set... |
| 1 | 100879 | intergalactic planetary planetary intergalacti... |
| 2 | 100880 | kick it verse you wake up late for school man... |
| 3 | 100881 | you cant you wont and you dont stop mike d com... |
| 4 | 100882 | chorus no sleep til brooklyn foot on the ped... |
| ... | ... | ... |
| 95 | 100973 | hold up goddamnit this thing alright wait hold... |
| 96 | 100974 | lighten up shine like the sun |
| 97 | 100975 | well back to the beat yall down with the sound... |
| 98 | 100976 | i know where you got to be going but you dont ... |


In [23]:
rock_lyric_list = list(rock_df.Lyric)

In [25]:
rock_lyric_list

['the sun goes down and the moon comes up my heart is pumping for you and a mad thing starts never in your wildest dreams did you ever get this feeling never in your wildest dreams never in your wildest dreams could it ever be this easy never in your wildest dreams the night is hot outside your window i hear people walking people talking i smell your skin i feel you breathing dont let me go not yet not yet not yet not yet never in your wildest dreams did you ever get this feeling never in your wildest dreams never in your wildest dreams did it ever get this easy never in your wildest dreams the world is slowly turning as it turns i see your face touch your eyes your lips space weve arrived at the place where they open hearts and fill them up with love filled with love filled with love this ones pumping for you as a mad thing starts never in your wildest dreams did you ever get this feeling never in your wildest dreams never in your wildest dreams did it ever feel this easy never in you

In [41]:
pop_lyric_list = list(pop_df.Lyric)

In [42]:
pop_lyric_list 

['hey slow it down what do you want from me what do you want from me yeah im afraid what do you want from me what do you want from me there might have been a time i would give myself away ooh once upon a time i didnt give a damn but now here we are so what do you want from me what do you want from me just dont give up im workin it out please dont give in i wont let you down it messed me up need a second to breathe just keep coming around hey what do you want from me what do you want from me yeah its plain to see that baby youre beautiful and its nothing wrong with you its me – im a freak but thanks for lovin me cause youre doing it perfectly there might have been a time when i would let you step away i wouldnt even try but i think you could save my life just dont give up im workin it out please dont give in i wont let you down it messed me up need a second to breathe just keep coming around hey what do you want from me what do you want from me just dont give up on me i wont let you dow

In [43]:
hiphop_lyric_list = list(hiphop_df.Lyric)

In [44]:
hiphop_lyric_list

['i cant stand it i know you planned it im a set straight this watergate i cant stand rocking when im in here cause your crystal ball aint so crystal clear so while you sit back and wonder why i got this fucking thorn in my side oh my god its a mirage im tellin yall its sabotage so listen up cause you cant say nothin youll shut me down with a push of your button but im out and im gone ill tell you now i keep it on and on cause what you see you might not get and we can bet so dont you get souped yet youre scheming on a thing thats a mirage im trying to tell you now its sabotage why our backs are now against the wall listen all of yall its a sabotage  i cant stand it i know you planned it im a set straight this watergate i cant stand rockin when im in this place because i feel disgraced because youre all in my face but make no mistakes and switch up my channel im buddy rich when i fly off the handle what could it be its a mirage youre scheming on a thing thats sabotage',
 'intergalactic 

In [27]:
listToStr_rock = ' '.join([str(elem) for elem in rock_lyric_list]) 

In [28]:
listToStr_rock



In [45]:
listToStr_pop = ' '.join([str(elem) for elem in pop_lyric_list]) 

In [46]:
listToStr_pop 



In [47]:
listToStr_hiphop = ' '.join([str(elem) for elem in hiphop_lyric_list]) 

In [48]:
listToStr_hiphop

'i cant stand it i know you planned it im a set straight this watergate i cant stand rocking when im in here cause your crystal ball aint so crystal clear so while you sit back and wonder why i got this fucking thorn in my side oh my god its a mirage im tellin yall its sabotage so listen up cause you cant say nothin youll shut me down with a push of your button but im out and im gone ill tell you now i keep it on and on cause what you see you might not get and we can bet so dont you get souped yet youre scheming on a thing thats a mirage im trying to tell you now its sabotage why our backs are now against the wall listen all of yall its a sabotage  i cant stand it i know you planned it im a set straight this watergate i cant stand rockin when im in this place because i feel disgraced because youre all in my face but make no mistakes and switch up my channel im buddy rich when i fly off the handle what could it be its a mirage youre scheming on a thing thats sabotage intergalactic plane
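The joined hip hop string contains a double space (`its a sabotage  i cant stand it`). Splitting on a single space turns each extra space into an empty-string token, which would then show up as a key and as a "next word" in the chain. A quick check of the difference, using a fragment copied from the output above; `str.split()` with no argument splits on runs of whitespace instead:

```python
# Fragment from the hip hop lyrics, which contains two consecutive spaces
text = 'its a sabotage  i cant stand it'

print(text.split(' '))
# ['its', 'a', 'sabotage', '', 'i', 'cant', 'stand', 'it']

print(text.split())
# ['its', 'a', 'sabotage', 'i', 'cant', 'stand', 'it']
```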

# Build a Markov Chain Function

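We are going to build a simple Markov chain function that creates a dictionary:
* The keys are the words in the corpus (every word that is followed by at least one other word)
* The values are lists of the words that follow each key, with repeats kept, so a word that follows the key more often is more likely to be picked during generation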
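For example, on a six-word toy sentence (made up for illustration, not taken from `corpus.pkl`) the dictionary looks like this. The repeated pair `i want` is stored twice, and the final word `it` gets no key because nothing follows it:

```python
from collections import defaultdict

# Tiny illustrative corpus
text = 'i want you i want it'
words = text.split()

# Record every (current word, next word) pair
chain = defaultdict(list)
for current_word, next_word in zip(words[:-1], words[1:]):
    chain[current_word].append(next_word)

print(dict(chain))
# {'i': ['want', 'want'], 'want': ['you', 'it'], 'you': ['i']}
```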

In [9]:
from collections import defaultdict

def markov_chain(text):
    '''The input is a string of text and the output will be a dictionary with each word as
       a key and each value as the list of words that come after the key in the text.'''
    
    # Tokenize the text by word; split() with no argument splits on any run of whitespace,
    # so double spaces in the lyrics don't produce empty-string tokens
    words = text.split()
    
    # Initialize a default dictionary to hold all of the words and next words
    m_dict = defaultdict(list)
    
    # Create a zipped list of all of the word pairs and put them in word: list of next words format
    for current_word, next_word in zip(words[0:-1], words[1:]):
        m_dict[current_word].append(next_word)

    # Convert the default dict back into a dictionary
    m_dict = dict(m_dict)
    return m_dict
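Because each genre's lyrics are joined into one long string before calling `markov_chain`, the last word of every song is recorded as being followed by the first word of the next song. One way to avoid those cross-song pairs is to build the transitions song by song from the list of lyrics. A minimal sketch; `markov_chain_per_song` is a hypothetical name, and the two toy songs are illustrative:

```python
from collections import defaultdict

def markov_chain_per_song(lyrics):
    '''Hypothetical variant of markov_chain: takes a list of lyric strings and
       only records word pairs that occur inside the same song.'''
    m_dict = defaultdict(list)
    for lyric in lyrics:
        words = str(lyric).split()
        # Pairs are formed within this song only, never across a song boundary
        for current_word, next_word in zip(words[:-1], words[1:]):
            m_dict[current_word].append(next_word)
    return dict(m_dict)

songs = ['lighten up shine like the sun', 'the night is hot']
chain = markov_chain_per_song(songs)

print(chain['the'])    # ['sun', 'night']
print('sun' in chain)  # False: 'sun' ends the first song, so it has no follower
```

With the notebook's variables this would be called as `markov_chain_per_song(rock_lyric_list)`. The trade-off is a few more dead-end words (song endings with no follower), which the generator then has to handle.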

In [29]:
rock_dict = markov_chain(listToStr_rock)
rock_dict

{'the': ['sun',
  'moon',
  'night',
  'world',
  'place',
  'best',
  'collected',
  'dark',
  'light',
  'pain',
  'movies',
  'movies',
  'moonlight',
  'rain',
  'pain',
  'window',
  'sun',
  'stars',
  'morning',
  'sun',
  'stars',
  'ground',
  'sun',
  'stars',
  'very',
  'preacher',
  'preacher',
  'love',
  'feelin',
  'good',
  'feelin',
  'feeling',
  'aggravation',
  'stage',
  'boy',
  'pain',
  'page',
  'boys',
  'only',
  'only',
  'way',
  'way',
  'way',
  'alligator',
  'watusi',
  'watusi',
  'jerk',
  'watusi',
  'load',
  'same',
  'shell',
  'things',
  'guy',
  'next',
  'morning',
  'next',
  'morning',
  'rules',
  'only',
  'night',
  'night',
  'night',
  'night',
  'night',
  'night',
  'night',
  'night',
  'night',
  'night',
  'night',
  'night',
  'night',
  'one',
  'one',
  'twist',
  'twist',
  'blues',
  'song',
  'radio',
  'crowd',
  'tears',
  'while',
  'day',
  'price',
  'cause',
  'memory',
  'words',
  'heart',
  'heart',
  'heart',
  'he
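The lists in `rock_dict` keep every repeat, which is what makes frequent transitions more likely when `random.choice` picks from them (for example, `night` follows `the` many times in the output above). To look at those probabilities directly, one option is to count a list with `collections.Counter`. A minimal sketch on a hand-made list; `transition_probabilities` is a hypothetical helper name:

```python
from collections import Counter

def transition_probabilities(next_words):
    '''Turn a list of observed next words into {word: probability}.'''
    counts = Counter(next_words)
    total = len(next_words)
    return {word: n / total for word, n in counts.items()}

probs = transition_probabilities(['night', 'sun', 'night', 'stars'])
print(probs)  # {'night': 0.5, 'sun': 0.25, 'stars': 0.25}
```

Applied to the notebook's data, this would be `transition_probabilities(rock_dict['the'])`.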

In [49]:
pop_dict = markov_chain(listToStr_pop)
pop_dict

{'hey': ['slow',
  'what',
  'what',
  'what',
  'whataya',
  'i',
  'i',
  'tears',
  'tears',
  'lets',
  'hey',
  'oh',
  'hey',
  'gotta',
  'i',
  'i',
  'you',
  'you',
  'i',
  'i',
  'with',
  'with',
  'im',
  'tears',
  'tears'],
 'slow': ['it',
  'me',
  'me',
  'me',
  'me',
  'me',
  'me',
  'me',
  'me',
  'me',
  'me',
  'me',
  'me',
  'me',
  'me',
  'me',
  'high',
  'high',
  'sexual'],
 'it': ['down',
  'out',
  'messed',
  'perfectly',
  'out',
  'messed',
  'out',
  'messed',
  'out',
  'messed',
  'i',
  'a',
  'dont',
  'chorus',
  'up',
  'chorus',
  'might',
  'might',
  'dont',
  'to',
  'only',
  'to',
  'seems',
  'into',
  'out',
  'gonna',
  'outta',
  'used',
  'we',
  'we',
  'time',
  'coming',
  'hear',
  'feel',
  'show',
  'shine',
  'shine',
  'shine',
  'shine',
  'out',
  'for',
  'out',
  'stays',
  'stays',
  'out',
  'straight',
  'now',
  'straight',
  'now',
  'straight',
  'now',
  'straight',
  'now',
  'straight',
  'now',
  'straight',
 

In [50]:
hiphop_dict = markov_chain(listToStr_hiphop)
hiphop_dict

{'i': ['cant',
  'know',
  'cant',
  'got',
  'keep',
  'cant',
  'know',
  'cant',
  'feel',
  'fly',
  'said',
  'hear',
  'run',
  'will',
  'like',
  'will',
  'am',
  'wrote',
  'start',
  'kick',
  'do',
  'keep',
  'go',
  'strap',
  'kojak',
  'was',
  'set',
  'come',
  'want',
  'want',
  'wanna',
  'keep',
  'use',
  'do',
  'do',
  'sip',
  'get',
  'was',
  'get',
  'get',
  'just',
  'get',
  'like',
  'get',
  'vex',
  'said',
  'think',
  'said',
  'think',
  'talk',
  'walk',
  'realize',
  'want',
  'said',
  'said',
  'reach',
  'said',
  'said',
  'have',
  'got',
  'use',
  'rock',
  'give',
  'shine',
  'write',
  'can',
  'keep',
  'wont',
  'do',
  'wont',
  'didnt',
  'get',
  'respect',
  'think',
  'say',
  'rap',
  'guess',
  'said',
  'didnt',
  'stole',
  'work',
  'add',
  'wear',
  'dont',
  'go',
  'dont',
  'bring',
  'am',
  've',
  'flow',
  'drink',
  'rock',
  'got',
  'dwell',
  'drink',
  'pour',
  'reached',
  'put',
  'offered',
  'smack',
  'g

# Create a Text Generator

We're going to create a function that generates sentences. It will take two things as inputs:
* The dictionary you just created
* The number of words you want generated

Here are some examples of sentences generated this way from a different, punctuated corpus:

>'Shape right turn– I also takes so that she’s got women all know that snail-trail.'

>'Optimum level of early retirement, and be sure all the following Tuesday… because it’s too.'

In [51]:
import random

def generate_lyric(chain, count=15):
    '''Input a dictionary in the format of key = current word, value = list of next words
       along with the number of words you would like to see in your generated sentence.'''

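    # Pick a random starting word and capitalize it
    word1 = random.choice(list(chain.keys()))
    sentence = word1.capitalize()

    # Pick the next word from word1's list of followers, make it the new current word, and repeat
    for _ in range(count - 1):
        # A word with no recorded followers (e.g. one that only appears at the very end
        # of the corpus) is a dead end, so restart from a random key instead of raising KeyError
        next_words = chain.get(word1)
        word2 = random.choice(next_words) if next_words else random.choice(list(chain.keys()))
        word1 = word2
        sentence += ' ' + word2

    # End it with a period
    sentence += '.'
    return sentence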
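Each call to `generate_lyric` draws from Python's global random generator, so the output changes on every run. For reproducible lyrics, one option is a private `random.Random(seed)` instance. A minimal sketch; the name `generate_lyric_seeded` and its `seed` parameter are assumptions, and the toy chain is illustrative. It also restarts from a random key when a word has no recorded followers:

```python
import random

def generate_lyric_seeded(chain, count=15, seed=None):
    '''Hypothetical variant of generate_lyric with a seedable random generator.'''
    rng = random.Random(seed)  # same seed -> same sequence of choices
    keys = list(chain.keys())

    word = rng.choice(keys)
    words = [word]
    for _ in range(count - 1):
        next_words = chain.get(word)
        # Dead end (no followers recorded): jump to a random key instead
        word = rng.choice(next_words) if next_words else rng.choice(keys)
        words.append(word)

    # Capitalize only the first word and end with a period
    return ' '.join([words[0].capitalize()] + words[1:]) + '.'

# 'it' has no entry here, so it exercises the dead-end branch
toy_chain = {'i': ['want', 'want'], 'want': ['you', 'it'], 'you': ['i']}
print(generate_lyric_seeded(toy_chain, count=6, seed=1))
```

With the notebook's dictionaries, `generate_lyric_seeded(rock_dict, seed=42)` should return the same line each time it is run in the same Python version.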

In [52]:
generate_lyric(rock_dict)

'Friend in the edge crying the deed i need im smiling baby let go i.'

In [53]:
generate_lyric(pop_dict)

'Messes that every step you like the night i kiss and roll oh yeah from.'

In [54]:
generate_lyric(hiphop_dict)

'Bag and then we be sour too sweet yall i feel put that you never.'