# Text Generation
## Introduction
Markov chains can be used for very basic text generation. Think of every word in a corpus as a state. We make the simplifying assumption that the next word depends only on the previous word, which is the defining assumption of a (first-order) Markov chain.
Markov chains don't generate text as well as deep learning models, but they're a good (and fun!) start.
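
The assumption above can be written a little more formally. As a sketch, let $w_t$ denote the word at position $t$ in the text (this notation is introduced here for illustration and does not appear elsewhere in the notebook):

```latex
P(w_t \mid w_1, w_2, \dots, w_{t-1}) = P(w_t \mid w_{t-1})
```

In other words, once the previous word is known, the earlier words are assumed to add no further information about the next one.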
## Select Text to Imitate
In this notebook, we're going to generate text in the style of Ram, one of the speakers in our chat corpus, so as a first step, let's load the corpus and extract Ram's messages.

In [1]:
# Read in the corpus, including punctuation!
import pandas as pd

data = pd.read_pickle('corpus.pkl')
data



| Unnamed: 0 | transcript |
| --- | --- |
| Deepak Ravishankar Ramkumar | Right up your alley macha.. u can convince int... |
| Karthik Kottiswaran | Watha unga kozha adi sandaila yen thaliya aruk... |
| Karthikeyan Narayanaswami | do you really have that much money? 50% un per... |
| Krishna Kumar | @919840648198 unga english lam padichu meaning... |
| Raj Kumar | This happens. There are companies that don't e... |
| Raj Kumar S | U wont be in trouble U r the trouble ‎GIF omit... |
| Ram | ‎sticker omitted I saw something like yesterda... |
| Ram Balaji Subbaiyan | ‎image omitted 🤷🏻‍♂ Enoda maximum eh 1 yr 10mo... |
| Riyas Khan | ‎sticker omitted ‎sticker omitted Yaaru master... |
| Selva Vignesh | Wat 😏😏 Na yenne pannen Even I remember this Pa... |


In [3]:
# Extract only Ram's text
ram_text = data.transcript.loc['Ram']
ram_text[:200]

'\u200esticker omitted I saw something like yesterday. A JD said 12 years of experience in kubernetes where Kubernetes originally introduced less than 10 years ago only 😂 \u200eimage omitted 😂 Guys, i ve changed'

## Build a Markov Chain Function
We are going to build a simple Markov chain function that creates a dictionary:
- The keys are the words in the corpus
- The values are lists of the words that follow each key

In [4]:
from collections import defaultdict

def markov_chain(text):
    '''The input is a string of text and the output will be a dictionary with each word as
       a key and each value as the list of words that come after the key in the text.'''
    
    # Tokenize the text by word, though including punctuation
    words = text.split(' ')
    
    # Initialize a default dictionary to hold all of the words and next words
    m_dict = defaultdict(list)
    
    # Create a zipped list of all of the word pairs and put them in word: list of next words format
    for current_word, next_word in zip(words[0:-1], words[1:]):
        m_dict[current_word].append(next_word)

    # Convert the default dict back into a dictionary
    m_dict = dict(m_dict)
    return m_dict
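
To see what the function returns, it can help to run it on a tiny input first. The sketch below restates the function compactly so the block runs on its own; `toy_text`, `toy_dict`, `counts`, and `probs` are made-up names for illustration and are not part of the notebook's corpus.

```python
from collections import Counter, defaultdict


def markov_chain(text):
    """Map each word to the list of words that follow it (same logic as above)."""
    words = text.split(' ')
    m_dict = defaultdict(list)
    for current_word, next_word in zip(words[:-1], words[1:]):
        m_dict[current_word].append(next_word)
    return dict(m_dict)


# Hypothetical toy corpus, chosen only for illustration
toy_text = 'the cat sat on the mat and the cat ran'
toy_dict = markov_chain(toy_text)

# Repeated followers stay in the list, so the list reflects how often each occurs
print(toy_dict['the'])  # ['cat', 'mat', 'cat']

# The final word never gets a key unless it also appears earlier in the text
print('ran' in toy_dict)  # False

# Turn the follower list for one word into estimated transition probabilities
counts = Counter(toy_dict['the'])
total = sum(counts.values())
probs = {word: count / total for word, count in counts.items()}
print(probs)
```

Because duplicates are kept, picking uniformly from a word's list (for example with `random.choice`) selects each follower in proportion to how often it occurred after that word.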

In [5]:
# Create the dictionary for Ram's messages and take a look at it
ram_dict = markov_chain(ram_text)
ram_dict

{'\u200esticker': ['omitted', 'omitted'],
 'omitted': ['I', '😂', 'Same', 'Hahaha', 'what', 'Mac', 'Dai'],
 'I': ['saw',
  'recently',
  'dont',
  'concur.',
  'told',
  'don’t',
  'prolly',
  'think.',
  'do',
  '?',
  'don’t',
  'was',
  'saw',
  'like',
  'was',
  'think',
  'must',
  'can',
  'no',
  've',
  'made',
  'm',
  'think',
  'can',
  'think',
  'dont',
  'am',
  'have',
  'bet',
  'agree',
  'heard',
  'found',
  'don’t'],
 'saw': ['something', 'her,'],
 'something': ['like', 'works', 'will', 'gross'],
 'like': ['yesterday.', 'why', 'i', 'amazon.', 'a', 'an', 'she’s', 'pretty'],
 'yesterday.': ['A'],
 'A': ['JD'],
 'JD': ['said'],
 'said': ['12'],
 '12': ['years'],
 'years': ['of', 'ago', 'da.', 'sleeping'],
 'of': ['experience',
  'amends',
  'companies',
  'your',
  'the',
  'me.',
  'me.',
  'us',
  '@919789987036',
  'photos',
  'macro',
  'pretence.',
  'your',
  'you'],
 'experience': ['in', 'history', 'truly', 'with'],
 'in': ['kubernetes',
  'the',
  'my',
  'my',