<a href="https://colab.research.google.com/github/Gr3gP/NLP-Projects/blob/main/Text_Generation_with_Markovify_Shakespeare.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# NLP: Building a Shakespearean Chatbot with Markovify

The goal of this notebook is to build a chatbot that generates text in the style of Shakespeare, trained on three of his tragedies: *Hamlet*, *Macbeth*, and *Julius Caesar*.

In [None]:
!pip install nltk
!pip install spacy
!pip install markovify
!python -m spacy download en

[38;5;2m✔ Download and installation successful[0m
You can now load the model via spacy.load('en_core_web_sm')
[38;5;2m✔ Linking successful[0m
/usr/local/lib/python3.7/dist-packages/en_core_web_sm -->
/usr/local/lib/python3.7/dist-packages/spacy/data/en
You can now load the model via spacy.load('en')


In [None]:
import re
import warnings

import markovify
import nltk
import spacy
from nltk.corpus import gutenberg

# Silence library warnings to keep the notebook output readable
warnings.filterwarnings('ignore')

# Download NLTK's Project Gutenberg sample corpus
nltk.download('gutenberg')

[nltk_data] Downloading package gutenberg to /root/nltk_data...
[nltk_data]   Package gutenberg is already up-to-date!


In [None]:
# List the texts available in NLTK's Gutenberg corpus
print(gutenberg.fileids())

['austen-emma.txt', 'austen-persuasion.txt', 'austen-sense.txt', 'bible-kjv.txt', 'blake-poems.txt', 'bryant-stories.txt', 'burgess-busterbrown.txt', 'carroll-alice.txt', 'chesterton-ball.txt', 'chesterton-brown.txt', 'chesterton-thursday.txt', 'edgeworth-parents.txt', 'melville-moby_dick.txt', 'milton-paradise.txt', 'shakespeare-caesar.txt', 'shakespeare-hamlet.txt', 'shakespeare-macbeth.txt', 'whitman-leaves.txt']


In [None]:
# Use the three Shakespeare tragedies in the corpus to train the model

hamlet = gutenberg.raw('shakespeare-hamlet.txt')
macbeth = gutenberg.raw('shakespeare-macbeth.txt')
caesar = gutenberg.raw('shakespeare-caesar.txt')


# Print the first 100 characters of each play
print('\nRaw:\n', hamlet[:100])
print('\nRaw:\n', macbeth[:100])
print('\nRaw:\n', caesar[:100])


Raw:
 [The Tragedie of Hamlet by William Shakespeare 1599]


Actus Primus. Scoena Prima.

Enter Barnardo a

Raw:
 [The Tragedie of Macbeth by William Shakespeare 1603]


Actus Primus. Scoena Prima.

Thunder and Lig

Raw:
 [The Tragedie of Julius Caesar by William Shakespeare 1599]


Actus Primus. Scoena Prima.

Enter Fla


In [None]:
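# Utility function for text cleaning
def text_cleaner(text):
    # Replace double hyphens (used as dashes) with spaces
    text = re.sub(r'--', ' ', text)
    # Remove bracketed text, e.g. the "[The Tragedie of ... 1599]" title line
    text = re.sub(r'\[.*?\]', '', text)
    # Remove standalone numbers (integers and decimals)
    text = re.sub(r'(\b|\s+\-?|^\-?)(\d+|\d*\.\d+)\b', '', text)
    # Collapse every run of whitespace (including newlines) into a single space
    text = ' '.join(text.split())
    return text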
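
To see what each substitution in `text_cleaner` does, the sketch below applies the same four steps one at a time to a short made-up string modeled on the plays' title lines. The sample text and the `steps` list are illustrative only and are not taken from the corpus.

```python
import re

# A made-up sample in the style of the Gutenberg plays (not taken from the corpus)
sample = "[The Tragedie of Hamlet by William Shakespeare 1599]\n\nHam. O--that this 2 too solid flesh"

# The same four cleaning steps as text_cleaner, applied one at a time
# (the bracket pattern is written as the raw string r'\[.*?\]')
steps = [
    ("double hyphens -> spaces", lambda t: re.sub(r'--', ' ', t)),
    ("bracketed text removed", lambda t: re.sub(r'\[.*?\]', '', t)),
    ("standalone numbers removed", lambda t: re.sub(r'(\b|\s+\-?|^\-?)(\d+|\d*\.\d+)\b', '', t)),
    ("whitespace collapsed", lambda t: ' '.join(t.split())),
]

text = sample
for label, step in steps:
    text = step(text)
    print(f"{label}: {text!r}")
```

The final value is `'Ham. O that this too solid flesh'`: the title line is gone, the stray number is dropped, and the blank lines are folded into single spaces.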

In [None]:
# Remove any "Chapter N" markers (likely a no-op here: the plays are divided into acts and scenes, not chapters)
hamlet = re.sub(r'Chapter \d+', '', hamlet)
macbeth = re.sub(r'Chapter \d+', '', macbeth)
caesar = re.sub(r'Chapter \d+', '', caesar)

# Apply the cleaning function to each play
hamlet = text_cleaner(hamlet)
caesar = text_cleaner(caesar)
macbeth = text_cleaner(macbeth)

In [None]:
# Parse the cleaned plays with spaCy
nlp = spacy.load('en_core_web_sm')
hamlet_doc = nlp(hamlet)
macbeth_doc = nlp(macbeth)
caesar_doc = nlp(caesar)

In [None]:
# Keep spaCy sentences longer than one character and join them back into one string per play
hamlet_sents = ' '.join([sent.text for sent in hamlet_doc.sents if len(sent.text) > 1])
macbeth_sents = ' '.join([sent.text for sent in macbeth_doc.sents if len(sent.text) > 1])
caesar_sents = ' '.join([sent.text for sent in caesar_doc.sents if len(sent.text) > 1])

In [None]:
# Join the three plays with spaces so sentences from adjacent plays don't run together
shakespeare_sents = ' '.join([hamlet_sents, macbeth_sents, caesar_sents])
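
Concatenating strings with `+` inserts no separator, so the last sentence of one play would run straight into the first sentence of the next; joining with a space avoids this. markovify's default sentence splitter looks for whitespace after end punctuation, so a glued pair would likely be treated as a single sentence. A tiny illustration with made-up stand-in strings:

```python
# Made-up stand-ins for the end of one play and the start of the next
play_a = "The rest is silence."
play_b = "When shall we three meet againe?"

glued = play_a + play_b              # no separator between the plays
joined = ' '.join([play_a, play_b])  # a single space between the plays

print(glued)   # The rest is silence.When shall we three meet againe?
print(joined)  # The rest is silence. When shall we three meet againe?
```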

In [None]:
# Build a Markov chain whose states are the previous three words
shakespeare_generator = markovify.Text(shakespeare_sents, state_size=3)
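
With `state_size=3`, each state of the chain is a tuple of the previous three words, and the model records how often each word followed that tuple in the corpus; generation then samples the next word in proportion to those counts. The sketch below is a simplified, hypothetical re-implementation of that idea, not markovify's actual code: `build_chain`, `generate`, the toy corpus, and the `BEGIN`/`END` padding tokens are made up here (markovify uses similar padding internally), and it uses `state_size=2` to keep the toy output readable.

```python
import random
from collections import Counter, defaultdict

# Hypothetical padding tokens marking sentence start and end
BEGIN, END = "__BEGIN__", "__END__"

def build_chain(sentences, state_size):
    """Map each tuple of `state_size` words to a Counter of the words that follow it."""
    chain = defaultdict(Counter)
    for sentence in sentences:
        words = [BEGIN] * state_size + sentence.split() + [END]
        for i in range(len(words) - state_size):
            state = tuple(words[i:i + state_size])
            chain[state][words[i + state_size]] += 1
    return chain

def generate(chain, state_size, rng):
    """Walk the chain from the all-BEGIN state, sampling successors by frequency."""
    state = (BEGIN,) * state_size
    output = []
    while True:
        followers = chain[state]
        next_word = rng.choices(list(followers), weights=list(followers.values()))[0]
        if next_word == END:
            return ' '.join(output)
        output.append(next_word)
        # Slide the window: drop the oldest word, append the new one
        state = state[1:] + (next_word,)

# Toy corpus (made up), not the Shakespeare text
corpus = [
    "to be or not to be",
    "to be or to sleep",
]
chain = build_chain(corpus, state_size=2)
print(chain[("to", "be")])  # Counter({'or': 2, '__END__': 1})
print(generate(chain, 2, random.Random(0)))
```

A larger state size makes each state more specific, so output tends to be more coherent but copies longer stretches of the source; this is one reason markovify checks generated sentences for overlap with the original text.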

In [None]:
# Randomly generate three sentences (make_sentence() returns None when it fails to build one)
for i in range(3):
    print(shakespeare_generator.make_sentence())

# Randomly generate three more sentences of at most 100 characters
for i in range(3):
    print(shakespeare_generator.make_short_sentence(max_chars=100))

None
No more that Thane of Cawdor too: went it not so?
E'ene so, my Lord Brut.
Peace, peace, you durst not so haue beene Durst I haue done the deed: Didst thou not heare a noyse?
Once more goodnight, And when you do them- Brut.
If you call me Iephta my Lord, I would know that Polon.
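
The `None` in the output appears because `make_sentence()` gives up after a limited number of attempts (markovify's `tries` keyword, 10 by default) when it cannot produce a sentence that passes its overlap check against the source text. One way to keep only real sentences is to retry and skip `None`. The helper below is a hypothetical sketch: `collect_sentences` is a name made up here, and `fake_generator` merely stands in for `shakespeare_generator.make_sentence` so the block runs without the corpus.

```python
import itertools

def collect_sentences(make_sentence, n, max_calls=100):
    """Call `make_sentence` until `n` non-None results are collected or `max_calls` is reached."""
    sentences = []
    for _ in range(max_calls):
        if len(sentences) == n:
            break
        sentence = make_sentence()
        if sentence is not None:
            sentences.append(sentence)
    return sentences

# Stand-in for shakespeare_generator.make_sentence: alternates None and a fixed line
_outputs = itertools.cycle([None, "Once more goodnight."])

def fake_generator():
    return next(_outputs)

print(collect_sentences(fake_generator, 3))
# ['Once more goodnight.', 'Once more goodnight.', 'Once more goodnight.']
```

With the real model this would be `collect_sentences(shakespeare_generator.make_sentence, 3)`. Passing a larger `tries` value, e.g. `make_sentence(tries=100)`, also lowers the chance of getting `None`.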


In [None]:
# Next, use spaCy's part-of-speech tags to generate more legible text: each token is stored as word::POS, so the chain's states also encode grammatical role

class POSifiedText(markovify.Text):

    def word_split(self, sentence):
        return ['::'.join((word.orth_, word.pos_)) for word in nlp(sentence)]

    def word_join(self, words):
        sentence = ' '.join(word.split('::')[0] for word in words)
        return sentence
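
`POSifiedText` changes only how words are split and joined: each token is stored as `word::POS`, so the same surface word with different tags (for example an auxiliary versus a noun) becomes a different chain state, and `word_join` strips the tags again before output. The sketch below mimics that round trip on hand-tagged tokens; the tags are written by hand for illustration rather than produced by spaCy.

```python
# Hand-tagged (word, POS) pairs standing in for spaCy's token.orth_ and token.pos_
tagged = [("I", "PRON"), ("will", "AUX"), ("make", "VERB"), ("my", "PRON"), ("will", "NOUN")]

# Same encoding as POSifiedText.word_split
encoded = ['::'.join((word, pos)) for word, pos in tagged]
print(encoded)
# ['I::PRON', 'will::AUX', 'make::VERB', 'my::PRON', 'will::NOUN']

# The two "will" tokens are now distinct states for the Markov chain
print(encoded[1] == encoded[4])  # False

# Same decoding as POSifiedText.word_join
decoded = ' '.join(word.split('::')[0] for word in encoded)
print(decoded)  # I will make my will
```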

In [None]:
shakespeare_generator = POSifiedText(shakespeare_sents, state_size=3)

In [None]:
# Generate five sentences with the POS-aware generator
for i in range(5):
    print(shakespeare_generator.make_sentence())


# Generate five sentences of at most 100 characters
for i in range(5):
    print(shakespeare_generator.make_short_sentence(max_chars=100))

If Brutus will vouchsafe , that Antony May safely come to him , or he to Hecuba , That he is growne so picked , that the day will end , And then is heard no more .
I , fashion you may call it , go too , go too , go too , go too , go too , go too Ophe .
None
Indeed I heard it not  but you , and Spundge you shall be dry againe Rosin .
You shall confesse , that you can let this goe ?
I , in my Heart of heart , As I shall finde time .
Caska , you and I behinde an Arras then , Marke the encounter  If he but blench I know my course .
You are merrie , my Lord Brut .
Not so sicke my Lord , if your Lordship would vouchsafe the Answere Ham .
Not this by no meanes vulgar  The friends thou hast , and their Damme At one fell swoope ?
