
# Text Generation

## Introduction

Markov chains can be used for very basic text generation. Think of every word in a corpus as a state. We make the simplifying assumption that the next word depends only on the previous word, which is the defining assumption of a (first-order) Markov chain.
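
Stated a little more formally (a sketch in standard notation that is not part of the original notebook, with $w_t$ assumed to denote the word at position $t$), the assumption is:

$$P(w_t \mid w_1, \dots, w_{t-1}) = P(w_t \mid w_{t-1})$$

One common way to estimate the right-hand side from a corpus is by counting how often word $b$ directly follows word $a$:

$$\hat{P}(w_t = b \mid w_{t-1} = a) = \frac{\mathrm{count}(a, b)}{\sum_{b'} \mathrm{count}(a, b')}$$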

Markov chains don't generate text as well as deep learning models, but they're a good (and fun!) start.

## Select Text to Imitate

In this notebook, we're specifically going to generate text in the style of Ali Wong, so as a first step, let's extract the text from her comedy routine.

In [2]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [3]:
# Read in the corpus, including punctuation!
import pandas as pd

data = pd.read_pickle('/content/drive/MyDrive/NLP in Python/pickle/corpus.pkl')
data

| | transcript | full_name |
|---|---|---|
| ali | Ladies and gentlemen, please welcome to the st... | Ali Wong |
| anthony | Thank you. Thank you. Thank you, San Francisco... | Anthony Jeselnik |
| bill | [cheers and applause] All right, thank you! Th... | Bill Burr |
| bo | Bo What? Old MacDonald had a farm E I E I O An... | Bo Burnham |
| dave | This is Dave. He tells dirty jokes for a livin... | Dave Chappelle |
| hasan | [theme music: orchestral hip-hop] [crowd roars... | Hasan Minhaj |
| jim | [Car horn honks] [Audience cheering] [Announce... | Jim Jefferies |
| joe | [rock music playing] [audience cheering] [anno... | Joe Rogan |
| john | All right, Petunia. Wish me luck out there. Yo... | John Mulaney |
| louis | Intro\nFade the music out. Let’s roll. Hold th... | Louis C.K. |


In [4]:
# Extract only Ali Wong's text
ali_text = data.transcript.loc['ali']
ali_text[:200]

'Ladies and gentlemen, please welcome to the stage: Ali Wong! Hi. Hello! Welcome! Thank you! Thank you for coming. Hello! Hello. We are gonna have to get this shit over with, ’cause I have to pee in, l'

## Build a Markov Chain Function

We are going to build a simple Markov chain function that creates a dictionary:
* The keys should be all of the words in the corpus
* The values should be a list of the words that follow the keys
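
To make the target structure concrete, here is a minimal sketch on a made-up sentence. The names `toy_text` and `toy_chain` are illustrative assumptions and do not appear in the notebook.

```python
from collections import defaultdict

# Hypothetical toy corpus, used only to illustrate the dictionary shape
toy_text = "the cat sat on the mat and the cat ran"
words = toy_text.split()

# Map each word to the list of words that directly follow it
toy_chain = defaultdict(list)
for current_word, next_word in zip(words, words[1:]):
    toy_chain[current_word].append(next_word)
toy_chain = dict(toy_chain)

print(toy_chain['the'])  # ['cat', 'mat', 'cat']
print(toy_chain['cat'])  # ['sat', 'ran']
```

Note that `'ran'` never becomes a key, because nothing follows the last word of the text.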

In [5]:
from collections import defaultdict

def markov_chain(text):
    '''The input is a string of text and the output will be a dictionary with each word as
       a key and each value as the list of words that come after the key in the text.'''
    
    # Tokenize on any whitespace (spaces, newlines, repeated spaces);
    # punctuation stays attached to each word
    words = text.split()
    
    # Initialize a default dictionary to hold all of the words and next words
    m_dict = defaultdict(list)
    
    # Pair each word with the word that follows it, and record the follower under the current word
    for current_word, next_word in zip(words, words[1:]):
        m_dict[current_word].append(next_word)

    # Convert the default dict back into a dictionary
    m_dict = dict(m_dict)
    return m_dict

In [6]:
# Create the dictionary for Ali's routine and take a look at it
ali_dict = markov_chain(ali_text)
ali_dict

{'Ladies': ['and'],
 'and': ['gentlemen,',
  'foremost,',
  'then',
  'have',
  'there’s',
  'resentment',
  'get',
  'get',
  'says,',
  'my',
  'she',
  'snatch',
  'running',
  'fighting',
  'yelling',
  'it',
  'she',
  'I',
  'I',
  'I',
  'we',
  'watched',
  'I',
  'have',
  'that',
  'recycling,',
  'disturbing',
  'it’s',
  'all',
  'just…',
  'be',
  'half-Vietnamese.',
  'his',
  'slide.',
  'your',
  'inflamed',
  'you’re',
  'I',
  'half-Japanese',
  'I’m',
  'half-Vietnamese.',
  'playing',
  'rugby.',
  'foremost,',
  'a',
  'emotionally',
  'I',
  '20',
  'neither',
  'I',
  'I–',
  'then',
  'it’s',
  'find',
  'start',
  'just',
  'caves',
  'gets',
  'is',
  'very',
  'for',
  'I',
  'she',
  'rise',
  'be',
  'eat',
  'watch',
  'be',
  'now',
  'most',
  'in',
  'then',
  'digitally',
  'then',
  'then',
  'then',
  'steady',
  'brings',
  'let',
  'reverberate',
  'say,',
  'my',
  'he',
  'when',
  'I’m',
  'sicker,',
  'sicker.',
  'sicker,',
  'sicker,',
  'pos
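
Repeated entries in a value list (for example, the many occurrences of `'I'` after `'and'`) act as weights: a word that appears more often in the list is more likely to be drawn. A small sketch of turning such a list into explicit probabilities follows. The `followers` list here is a made-up example, not taken from the transcript.

```python
from collections import Counter

# Hypothetical follower list, shaped like one value of the chain dictionary
followers = ['I', 'then', 'I', 'she', 'I', 'then']

# Count each follower and normalize the counts into probabilities
counts = Counter(followers)
total = sum(counts.values())
probabilities = {word: n / total for word, n in counts.items()}

print(probabilities)
```

Drawing uniformly at random from the raw list gives each word this same probability, which is why the dictionary can keep duplicates instead of storing counts.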

## Create a Text Generator

We're going to create a function that generates sentences. It will take two things as inputs:
* The dictionary you just created
* The number of words you want generated

Here are some examples of generated sentences:

>'Shape right turn– I also takes so that she’s got women all know that snail-trail.'

>'Optimum level of early retirement, and be sure all the following Tuesday… because it’s too.'

In [7]:
import random

def generate_sentence(chain, count=15):
    '''Input a dictionary in the format of key = current word, value = list of next words
       along with the number of words you would like to see in your generated sentence.'''

    # Pick a random starting word and capitalize it
    word1 = random.choice(list(chain.keys()))
    sentence = word1.capitalize()

    # Pick the next word from the current word's value list, make it the current word, and repeat
    for _ in range(count - 1):
        # The last word of the corpus may not be a key; restart from a random key in that case
        followers = chain.get(word1) or list(chain.keys())
        word2 = random.choice(followers)
        word1 = word2
        sentence += ' ' + word2

    # End it with a period
    sentence += '.'
    return sentence

In [8]:
generate_sentence(ali_dict)

'Freak me from a dictator’s gonna before him, and do fancy Asians are never throw.'

### Assignment:
1. Generate sentences for the other comedians as well.
2. Try making the generate_sentence function better. For example, let it end with a random punctuation mark, or stop as soon as it reaches a word that already ends with a punctuation mark.

In [9]:
bill_text = data.transcript.loc['bill']
bill_text[:200]

'[cheers and applause] All right, thank you! Thank you very much! Thank you. Thank you. Thank you. How are you? What’s going on? Thank you. It’s a pleasure to be here in the greater Atlanta, Georgia, a'

In [10]:
dave_text = data.transcript.loc['dave']
dave_text[:200]

'This is Dave. He tells dirty jokes for a living. That stare is where most of his hard work happens. It signifies a profound train of thought, the alchemist’s fire that transforms fear and tragedy into'

In [11]:
bill_dict = markov_chain(bill_text)
bill_dict

{'[cheers': ['and', 'and', 'and', 'and', 'and', 'and', 'and', 'and'],
 'and': ['applause]',
  'all',
  'when',
  'causes',
  'say',
  'just',
  'applause]',
  'age.',
  'complaining',
  'your',
  'complaining.',
  'watch',
  'the',
  'a',
  'applause]',
  'some',
  'the',
  'hit',
  'asked',
  'all',
  'listen',
  'listen',
  'a',
  'it',
  'a',
  'living.',
  'then',
  'it’s',
  'it',
  'you',
  'white',
  'when',
  'now',
  'pound',
  'I’m',
  'people',
  'applause]',
  'make',
  'the',
  'he',
  'lepers,',
  'he’s',
  'the',
  'applause]',
  'everybody',
  'they',
  'they’re',
  'they',
  'they',
  'where',
  'what',
  'the',
  'came',
  'that',
  'I.',
  'other',
  'I',
  'applause]',
  'all',
  'applause]',
  'I',
  'over',
  'what',
  'that’s',
  'then',
  'you',
  'now',
  'it',
  'help',
  'friends',
  'the',
  'it',
  'down,',
  'down,',
  'everything',
  'everybody',
  'see',
  'be',
  'turn',
  'make',
  'applause]',
  'this',
  'the',
  'higher',
  'that’s',
  'to',
  'when

In [12]:
dave_dict = markov_chain(dave_text)
dave_dict

{'This': ['is',
  'happens',
  'woman',
  'is',
  'guy',
  'was',
  'is',
  'is',
  'was',
  'was',
  'country',
  'guy’s',
  'is',
  'guy',
  'motherfucker',
  'is',
  'is'],
 'is': ['Dave.',
  'where',
  'very',
  'also',
  'broken.',
  'going',
  'outside',
  'gonna',
  'not',
  'in',
  'because,',
  'fucking',
  'fucked',
  'like,',
  'about',
  'a',
  'what',
  'blessed',
  'he',
  'that',
  'not',
  'what',
  'Asian.',
  'still',
  'not',
  'gayer,',
  'the',
  'like',
  'sometimes',
  'scary',
  'right',
  'this',
  'why',
  'taking',
  'the',
  'just',
  'a',
  'older',
  'Top',
  'when',
  'suffering.',
  'wrong',
  'wrong?',
  'just',
  'a',
  'Kevin',
  'when',
  'the',
  'directly',
  'this:',
  'not',
  'right',
  'too',
  'a',
  'your'],
 'Dave.': ['He'],
 'He': ['tells',
  'refused',
  'said,',
  'was',
  'said,',
  'said,',
  'said,',
  'said,',
  'said,',
  'was',
  'goes,',
  'was',
  'keeps',
  'was',
  'beats',
  'said,',
  'stopped,',
  'leaned',
  'looked',
  'alw

In [13]:
generate_sentence(bill_dict)

'Isn’t music!” You know why it is. Yeah. Absolutely, recycle. You hear the whole “Hold.'

In [14]:
generate_sentence(dave_dict)

'Chest before. I was involved. She didn’t even know who did it. My point is.'

In [15]:
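import random
import string

def new_generate_sentence(chain, count=15):
    '''Input a dictionary in the format of key = current word, value = list of next words
       along with the number of words you would like to see in your generated sentence.
       The sentence ends with a randomly chosen punctuation character.'''

    # Pick a random starting word and capitalize it
    word1 = random.choice(list(chain.keys()))
    sentence = word1.capitalize()

    # Pick the next word from the current word's value list, make it the current word, and repeat
    for _ in range(count - 1):
        # The last word of the corpus may not be a key; restart from a random key in that case
        followers = chain.get(word1) or list(chain.keys())
        word2 = random.choice(followers)
        word1 = word2
        sentence += ' ' + word2

    # End it with a random character from string.punctuation
    # (this includes symbols such as \ and #, not only sentence-ending marks)
    sentence += random.choice(string.punctuation)
    return sentence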

In [16]:
new_generate_sentence(dave_dict)

'Designed for people are young. Imagine if you know anything wrong. Not only did give\\'
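
The second idea from the assignment, stopping as soon as a word already ends with a punctuation mark, could be sketched as below. This is one possible approach rather than the notebook's own solution: the function name `generate_until_punctuation`, the `SENTENCE_ENDINGS` tuple, the `max_words` and `rng` parameters, and the toy chain are all assumptions made for illustration.

```python
import random

# Assumed set of sentence-ending marks; extend it (e.g. with '…') as needed
SENTENCE_ENDINGS = ('.', '!', '?')

def generate_until_punctuation(chain, max_words=15, rng=random):
    """Walk the chain until a word ends a sentence or max_words is reached."""
    word = rng.choice(list(chain.keys()))
    words = [word.capitalize()]

    while len(words) < max_words and not word.endswith(SENTENCE_ENDINGS):
        followers = chain.get(word)
        if not followers:
            break  # dead end: this word has no recorded followers
        word = rng.choice(followers)
        words.append(word)

    sentence = ' '.join(words)
    # If the walk stopped without reaching punctuation, add a sentence-ending mark
    if not sentence.endswith(SENTENCE_ENDINGS):
        sentence += rng.choice(SENTENCE_ENDINGS)
    return sentence

# Hypothetical toy chain in the same format as the dictionaries built earlier
toy_chain = {
    'I': ['like', 'saw'],
    'like': ['cats.', 'dogs!'],
    'saw': ['cats.'],
    'cats.': ['I'],
    'dogs!': ['I'],
}
sentence = generate_until_punctuation(toy_chain, max_words=10)
print(sentence)
```

Restricting the final mark to `.`, `!`, and `?` avoids endings such as a trailing backslash. Words that end with a closing quotation mark after the punctuation are not treated as sentence endings in this sketch.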

In [17]:
generate_sentence(ali_dict)

'About? I’m fucked. So, you know… where they like wiping and get sicker, and start.'