# Text Generation

## Introduction

Markov chains can be used for very basic text generation. Think of every word in a corpus as a state. We then make one simplifying assumption: the next word depends only on the current word, not on anything that came before it. This is the defining (Markov) property of a Markov chain.

Markov chains don't generate text as well as deep learning models do, but they're a good (and fun!) place to start.
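
To make the Markov assumption concrete: if each word is a state, the probability of moving from one word to the next can be estimated by counting how often each pair of consecutive words appears. Below is a minimal sketch that assumes whitespace tokenization. `transition_probabilities` is a hypothetical helper that is not part of this notebook, and the sentence is made-up toy data.

```python
from collections import Counter, defaultdict

def transition_probabilities(text):
    """Estimate P(next word | current word) from consecutive word pairs in `text`."""
    words = text.split()
    counts = defaultdict(Counter)
    for current_word, next_word in zip(words, words[1:]):
        counts[current_word][next_word] += 1
    # Normalize each word's follower counts into a probability distribution
    probs = {}
    for word, followers in counts.items():
        total = sum(followers.values())
        probs[word] = {nxt: n / total for nxt, n in followers.items()}
    return probs

probs = transition_probabilities("the cat sat on the mat the cat ran")
print(probs["the"])  # {'cat': 0.6666666666666666, 'mat': 0.3333333333333333}
print(probs["cat"])  # {'sat': 0.5, 'ran': 0.5}
```

The word "ran" never appears with a follower, so it has no row. A generator that reaches it has nowhere to go next.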

## Select Text to Imitate

In [4]:
# Read in the corpus, including punctuation!
import pandas as pd
data = pd.read_pickle('corpus.pkl')
data

| Unnamed: 0 | transcript | full_name |
|---|---|---|
| DESPICABLE ME 3 | In the animated adventure Despicable Me 3, Fel... | ELEMENTAL (2023) |
| ELEMENTAL | The film journeys alongside an unlikely pair, ... | THE MONKEY KING (2023) |
| KLAUS | When Jesper (Jason Schwartzman) distinguishes ... | DESPICABLE ME 3 (2017) |
| THE BOSS BABY | Tim and his Boss Baby little bro Ted have beco... | KLAUS (2019) |
| THE MONKEY KING | Inspired by an epic Chinese tale, THE MONKEY K... | THE BOSS BABY: FAMILY BUSINESS (2021) |


In [5]:
# Extract only The Monkey King's transcript
The_monkey_king_text = data.transcript.loc['THE MONKEY KING']
The_monkey_king_text[:200]

'Inspired by an epic Chinese tale, THE MONKEY KING is an action-packed family comedy that follows a charismatic Monkey (Jimmy O. Yang) and his magical fighting Stick on an epic quest for victory over 1'

## Build a Markov Chain Function

We are going to build a simple Markov chain function that creates a dictionary:
* The keys should be all of the words in the corpus
* The values should be a list of the words that follow the keys

In [6]:
from collections import defaultdict

def markov_chain(text):
    '''Map each word in `text` to the list of words that follow it.'''
    # Split on any run of whitespace so newlines and double spaces don't produce empty "words"
    words = text.split()
    
    m_dict = defaultdict(list)
    
    # Pair each word with the word after it and append to that word's list of followers
    for current_word, next_word in zip(words[:-1], words[1:]):
        m_dict[current_word].append(next_word)

    # Convert the defaultdict back into a regular dictionary
    m_dict = dict(m_dict)
    return m_dict

In [7]:
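def build_markov_chain(text):
    '''Same mapping as markov_chain, built with a plain dict. This is the version used below.'''
    words = text.split()  # Split the text into individual words
    chain = {}
    for i in range(len(words) - 1):
        current_word = words[i]
        next_word = words[i + 1]
        # Append to the existing follower list, or start a new one
        if current_word in chain:
            chain[current_word].append(next_word)
        else:
            chain[current_word] = [next_word]
    return chain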

In [8]:
# Create the dictionary for The Monkey King and take a look at it
The_monkey_king_dict = build_markov_chain(The_monkey_king_text)
The_monkey_king_dict

{'Inspired': ['by'],
 'by': ['an',
  'the',
  'those',
  'one',
  'a',
  'eating',
  'night,',
  'lending',
  'storm',
  'storm',
  'storm',
  'storm',
  'storm',
  'sunset.',
  'everyone.',
  'storm',
  'storm',
  'storm',
  'storm',
  'storm',
  'storm',
  'storm',
  'storm',
  'storm',
  'storm,',
  'storm',
  'storm',
  'storm',
  'storm',
  'storm',
  'storm',
  'storm',
  'storm',
  'storm',
  'storm',
  'storm',
  'storm'],
 'an': ['epic',
  'action-packed',
  'epic',
  'eccentric',
  'uncontrollable',
  'important',
  'insignificant',
  'Immortal',
  'accomplishment.',
  'Immortal',
  'insignificant,',
  'all-access',
  'assistant.',
  'original!',
  'enchanted',
  'enchanted',
  'old',
  'immortal',
  'assist.',
  'ancient',
  'evil',
  'illustrious'],
 'epic': ['Chinese', 'quest'],
 'Chinese': ['tale,'],
 'tale,': ['THE'],
 'THE': ['MONKEY'],
 'MONKEY': ['KING'],
 'KING': ['is'],
 'is': ['an',
  'a',
  'meant',
  'an',
  'here!',
  'me!',
  'happening?',
  'it?',
  'brave',
  ...
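
The repeated entries in these lists (such as all the `'storm'` entries after `'by'`) are intentional. `build_markov_chain` appends every follower, including duplicates. A word that follows more often therefore appears more often in the list. Because `random.choice` picks uniformly among list positions, drawing from the list is equivalent to drawing from the observed transition frequencies. The quick check below illustrates this. The follower list is made-up data in the same shape as the `'by'` entry.

```python
import random
from collections import Counter

# Hypothetical follower list: 'storm' is 6 of the 8 entries
followers = ['an', 'the', 'storm', 'storm', 'storm', 'storm', 'storm', 'storm']

rng = random.Random(0)  # fixed seed so the experiment is repeatable
draws = Counter(rng.choice(followers) for _ in range(10_000))

# The share of 'storm' should land close to 6/8 = 0.75
share = draws['storm'] / 10_000
print(share)
```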

## Create a Text Generator

We're going to create a function that generates sentences. It will take two things as inputs:
* The dictionary you just created
* The number of words you want generated

Here are some examples of generated sentences:

>'Shape right turn– I also takes so that she’s got women all know that snail-trail.'

>'Optimum level of early retirement, and be sure all the following Tuesday… because it’s too.'
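
The generator described above is a random walk. It picks a starting word, then repeatedly samples the next word from the current word's follower list. Because the notebook's version uses the global `random` module, every run produces different text. Below is a minimal sketch of a reproducible variant, for cases where you want repeatable output while testing. `generate_words` and its `seed` parameter are hypothetical. `toy_chain` is made-up data in the same shape as the dictionary built earlier.

```python
import random

def generate_words(chain, count=15, seed=None):
    """Walk `chain` for up to `count` words. A sketch; `seed` is a hypothetical addition."""
    rng = random.Random(seed)        # private generator: same seed gives the same walk
    word = rng.choice(list(chain))   # random starting state
    words = [word]
    # Stop early if the current word has no recorded followers
    while len(words) < count and word in chain:
        word = rng.choice(chain[word])  # next word depends only on the current word
        words.append(word)
    words[0] = words[0].capitalize()
    return ' '.join(words) + '.'

toy_chain = {'the': ['cat', 'mat'], 'cat': ['sat', 'ran'], 'sat': ['on'],
             'on': ['the'], 'mat': ['the'], 'ran': ['away']}
a = generate_words(toy_chain, 10, seed=42)
b = generate_words(toy_chain, 10, seed=42)
print(a == b)  # True
```

Passing the same `seed` returns the same sentence, which makes it easier to compare tweaks to the generator. Leaving `seed=None` seeds from system randomness, so the output varies from run to run, as it does with the notebook's version.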

In [9]:
import random

def generate_sentence(chain, count=15):
    '''Input a dictionary in the format of key = current word, value = list of next words
       along with the number of words you would like to see in your generated sentence.'''

    # Pick a random starting word and capitalize it
    word1 = random.choice(list(chain.keys()))
    sentence = word1.capitalize()

    # Generate the second word from the value list. Set the new word as the first word. Repeat.
    for i in range(count-1):
        # A word with no recorded successor (e.g. the corpus's final word) ends the chain early
        if word1 not in chain:
            break
        word2 = random.choice(chain[word1])
        word1 = word2
        sentence += ' ' + word2

    # End it with a period
    sentence += '.'
    return sentence

In [10]:
generate_sentence(The_monkey_king_dict,100)

'Oh! [monkeys gasp] [monkey] Hm. [Stick murmurs] Who’s a big one. I’m gonna buy this. Does this minute! Out of the right up. What’s it rain dance ♪ [both] ♪ Monkey King grunts] [Lin screams in the sun’s rays ♪ ♪ Monkey King? I’ve created a thief! Aw, you’re so exactly that stone monkey shrieks] [dramatic music continues] You’ve done good, girl. Go on. I’m starting to remind me. [Stick murmurs] I can welcome me A monkey squeals] Are in it work. Hai-yah! Agh! Agh! [dramatic music plays] [Stick groans] [groans] You’re so cares about you. He only a bath!.'

### Assignment:
1. Generate sentences for the other titles in the corpus as well.
2. Improve the `generate_sentence` function. For example, let it end with a random punctuation mark, or stop as soon as it reaches a word that already ends with a punctuation mark.
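
For the second idea, note that `string.punctuation` also includes commas, brackets, `#` and similar characters. A check based on it would stop early at words like `tale,` or `[groans]`. Below is a minimal sketch of a stricter stopping test. It assumes that only `.`, `!` and `?` end a sentence, and that trailing quotes or brackets should be looked past. `ends_sentence`, `SENTENCE_END` and `CLOSERS` are hypothetical names, not part of the notebook.

```python
# Marks assumed to close a sentence (an assumption; adjust for your corpus)
SENTENCE_END = ('.', '!', '?')
# Trailing characters to look past, e.g. closing quotes and brackets
CLOSERS = '"\')]’”'

def ends_sentence(word):
    """Return True if `word` ends a sentence, ignoring trailing quotes/brackets."""
    return word.rstrip(CLOSERS).endswith(SENTENCE_END)

words = ['tale,', 'here!', '[groans]', 'it?', 'said."']
print([w for w in words if ends_sentence(w)])  # ['here!', 'it?', 'said."']
```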

In [11]:
# 1. Generate sentences for the other titles:
for i in data.index:
    print(f"Generated text for {i}")
    x_text = data.transcript.loc[i]
    x_dict = build_markov_chain(x_text)
    print(generate_sentence(x_dict))
    print("\n")

Generated text for DESPICABLE ME 3
Softly) (SPEAKING MINIONESE) (ALL GASP) Ooh, la-la! You boys have meant be kidding me. Pardon.


Generated text for ELEMENTAL
Side of his feet. Teeth. Whatever. Food upstairs. It might not up being modest. Ember’s.


Generated text for KLAUS
Mail, you care? – Go home, you think it’s your act of the mail, you.


Generated text for THE BOSS BABY
Privacy. Here at this time? (gasps) No, no. Don’t, don’t, don’t. Yes, first, they won’t.


Generated text for THE MONKEY KING
Unattractive people will end up ♪ Name-calling, life-ruining Loneliness-inducing ♪ ♪ And you so could’ve.




In [12]:
# Generate a 100-word sequence for each title.
for i in data.index:
    print(f"Generated text for {i}")
    x_text = data.transcript.loc[i]
    x_dict = build_markov_chain(x_text)
    print(generate_sentence(x_dict,100))
    print("\n")

Generated text for DESPICABLE ME 3
Go tell you want. I make us unstoppable! Whoa! Ay, chihuahua! I can I should I can handle it. Well, now that I were such short notice. It is it! You’re making me about we can make a high note with me. (EXCLAIMS IN FRUSTRATION) (SINGING IN FRUSTRATION) (SINGING IN FRENCH) (CONTINUES LAUGHING) Oh, you and Gru and Gru loses! Enjoy the perfect idea. (AGNES CHEWING LOUDLY) BOTH: My turn! (GRUNTING) (BOTH LAUGH) Oh, come on. All right, lady, that’s pretty sure I just that poor little boys. (SIGHS) Okay, take good news! (SINGSONG) Ah! How about to an Emmy?.


Generated text for ELEMENTAL
Nice hat, by then, those doors. Water looks alike. You know where I sucked a mess. Nah. I guess. (CLANKS) Tempered glass. (SIGHS) Whew. So, uh, couldn’t go together You’re the show You need to embrace the shop? If you giving me across that city isn’t made me feel it. You don’t want to break my dad will lose temper, then you up. Still have to tell him though. Water, always ri

In [13]:
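# Question 2: improve the generate_sentence function.
import random

# Marks that plausibly end a sentence; string.punctuation also contains commas, brackets, '#', etc.
END_MARKS = ('.', '!', '?')

def generate_sentence(chain, count=15, punctuation=True):
    '''Generate up to `count` words, stopping early at a word that already ends a sentence.
       Otherwise end with a random mark (punctuation=True) or a period (punctuation=False).'''
    word1 = random.choice(list(chain.keys()))
    sentence = word1.capitalize()
    for i in range(count - 1):
        # Stop if the current word has no recorded successor (e.g. the corpus's final word)
        if word1 not in chain:
            break
        word2 = random.choice(chain[word1])
        sentence += ' ' + word2
        # Stop as soon as a word already ends the sentence
        if word2.endswith(END_MARKS):
            break
        word1 = word2

    # Add a closing mark only once, after the loop, and only if one isn't already there
    if not sentence.endswith(END_MARKS):
        if punctuation:
            sentence += random.choice(END_MARKS)
        else:
            sentence += '.'

    return sentence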