In this notebook, we'll explore using word pronunciation dictionaries and predictive models to represent both poetic meter and rhyme.

In [None]:
!pip install textdistance

In [None]:
!pip install g2p_en

In [None]:
import sys
import nltk
import string
import textdistance
import re
from g2p_en import G2p
import numpy as np

1. The CMU pronunciation dictionary lists words along with their pronunciation (using the ARPABET -- see [here](https://en.wikipedia.org/wiki/ARPABET) for a mapping between ARPABET and IPA, with example words).  Query this resource for the pronunciation of specific words.

In [None]:
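# the CMU dictionary data must be downloaded once before first use
nltk.download("cmudict", quiet=True)
arpabet = nltk.corpus.cmudict.dict()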

In [None]:
word="foil"
arpabet[word]

2. The CMU dictionary doesn't have pronunciations for all words, but there are several systems that have been trained to generate pronunciations, including [g2p](https://github.com/Kyubyong/g2p).  Find some words that don't exist in the CMU dictionary and generate pronunciations for them through g2p.  How accurate is it?

In [None]:
g2p = G2p()

In [None]:
word="chatbot"
g2p(word)
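
One way to put a number on "how accurate is it?" is a phoneme error rate: the edit distance between each predicted pronunciation and a reference pronunciation, divided by the total number of reference phonemes. The sketch below assumes you have reference transcriptions you trust (hand-checked, for example). The `pairs` shown are made-up placeholders rather than real g2p output, and `edit_distance` and `phoneme_error_rate` are hypothetical helper names.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # delete r
                            curr[j - 1] + 1,          # insert h
                            prev[j - 1] + (r != h)))  # substitute or match
        prev = curr
    return prev[-1]


def phoneme_error_rate(pairs):
    """Total edit distance divided by total number of reference phonemes."""
    errors = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    total = sum(len(ref) for ref, _ in pairs)
    return errors / total


# (reference, prediction) pairs; placeholders for illustration only
pairs = [
    (["F", "OY1", "L"], ["F", "OY1", "L"]),
    (["S", "AY1", "K", "AH0", "L"], ["S", "IH1", "K", "AH0", "L"]),
]
print(phoneme_error_rate(pairs))
```

In practice you would replace the second element of each pair with the output of `g2p(word)` for a word that is missing from the CMU dictionary.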

3. Now let's use this pronunciation information (along with the word *stress* information it includes) to build a simple system for metrical analysis, where we'll estimate whether a given piece of text is predominantly iambic (da DUM da DUM), trochaic (DUM da DUM da), spondaic (DUM DUM DUM DUM), or dactylic (DUM da da DUM da da).

In [None]:
def get_pronunciation(word):
    if word in arpabet:
        # pick the first pronunciation
        return arpabet[word][0]

    else:
        return g2p(word)

In [None]:
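def get_stress(pron):
    # keep one symbol per vowel: "1" for primary or secondary stress, "0" for unstressed
    stress=[]
    for sym in pron:
        final=sym[-1]
        try:
            sym_stress="1" if int(final) > 0 else "0"
            stress.append(sym_stress)
        except ValueError:
            # consonants and punctuation carry no stress digit
            pass
    return stress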
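
To see what the stress extraction produces, here is a small self-contained sketch. It uses a hypothetical `stress_pattern` helper that mirrors the logic above, and the two pronunciations are hand-written ARPABET transcriptions for illustration, so check them against the dictionary before relying on them. Note how the secondary stress (`EY2`) collapses to `"1"`.

```python
def stress_pattern(pron):
    """Map each vowel's stress digit to "1" (stressed) or "0" (unstressed)."""
    pattern = []
    for sym in pron:
        # only vowels end in a stress digit (0, 1 or 2)
        if sym and sym[-1].isdigit():
            pattern.append("0" if sym[-1] == "0" else "1")
    return pattern


# hand-written transcriptions, assumed for illustration
compare = ["K", "AH0", "M", "P", "EH1", "R"]
teenager = ["T", "IY1", "N", "EY2", "JH", "ER0"]

print(stress_pattern(compare))
print(stress_pattern(teenager))
```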

In [None]:
def get_metrical_feet(num_syllables):
    
    """
    For a given number of syllables, let's get an ideal line in each of the metrical feet we're examining.
    e.g. for a line with 9 syllables, we'd expect the following:
    
    iamb:    010101010
    trochee: 101010101
    spondee: 111111111
    dactyl:  100100100
    
    """
    iamb="01"*(int(num_syllables/2)+1)
    trochee="10"*(int(num_syllables/2)+1)
    spondee="11"*(int(num_syllables/2)+1)
    dactyl="100"*(int(num_syllables/3)+1)

    return list(iamb)[:num_syllables], list(trochee)[:num_syllables], list(spondee)[:num_syllables], list(dactyl)[:num_syllables]


In [None]:
def proc(text):
    
    """
    
    Now we'll compare the stress of a given piece of text to each of the idealized metrical lines; the best fit
    will be the one with the smallest distance (here we'll use the Levenshtein distance).
    
    Since both pronunciation methods often treat words with one syllable as stressed, we'll mainly use evidence
    from multi-syllabic words.
    
    """
    text_tokens=nltk.word_tokenize(text.lower())
    stress=[]
    
    for word in text_tokens:
        if word not in string.punctuation:
            pron=get_pronunciation(word)
            
            word_stress=get_stress(pron)

            if len(word_stress) == 1:
                # one-syllable word: its stress is unreliable, so use a placeholder
                stress.append("-")
            else:
                stress.extend(word_stress)

    # build the idealized lines from the number of syllables, not the number of tokens
    iamb, trochee, spondee, dactyl=get_metrical_feet(len(stress))

    print(''.join(stress))
    
    iamb_dist=textdistance.levenshtein(iamb, stress)
    trochee_dist=textdistance.levenshtein(trochee, stress)
    spondee_dist=textdistance.levenshtein(spondee, stress)
    dactyl_dist=textdistance.levenshtein(dactyl, stress)
    
    return iamb_dist, trochee_dist, spondee_dist, dactyl_dist
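
The comparison step can be sketched without any external packages. The example below uses a plain dynamic-programming `levenshtein` function (a hypothetical stand-in for `textdistance.levenshtein`) and a hard-coded stress string, which is what we would expect for the first line of the sonnet if every one-syllable word becomes `-`, "compare" is `01`, and "summers" is `10`. Those stress values are assumptions for illustration, not output from the dictionary.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute or match
        prev = curr
    return prev[-1]


# assumed stress pattern for "Shall I compare thee to a summers day"
stress = "--01---10-"
n = len(stress)

# idealized lines of the same length
templates = {
    "iamb": ("01" * n)[:n],
    "trochee": ("10" * n)[:n],
    "spondee": ("11" * n)[:n],
    "dactyl": ("100" * n)[:n],
}

distances = {name: levenshtein(stress, ideal) for name, ideal in templates.items()}
best = min(distances, key=distances.get)
print(distances)
print("Best guess:", best)
```

Because `-` never matches `0` or `1`, every placeholder costs at least one edit against every template, so the ranking is driven by the multi-syllable words.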

In [None]:
def line_by_line_meter(text):
    labels=["iamb", "trochee", "spondee", "dactyl"]
    scores=np.zeros(4)
    for line in text.split("\n"):
        scores+=np.array(proc(line))

    print (labels)
    print(scores)
    print("Best guess: %s" % labels[np.argmin(scores)])


In [None]:
def meter(text):
    # score the whole text as a single sequence (newlines are treated as spaces)
    labels=["iamb", "trochee", "spondee", "dactyl"]
    scores=np.array(proc(text.replace("\n", " ")))

    print(labels)
    print(scores)
    print("Best guess: %s" % labels[np.argmin(scores)])



In [None]:
text="""Shall I compare thee to a summers day?
Thou art more lovely and more temperate:
Rough winds do shake the darling buds of May,
And summers lease hath all too short a date;
Sometime too hot the eye of heaven shines,
And often is his gold complexion dimmd;
And every fair from fair sometime declines,
By chance or nature’s changing course untrimmd;
But thy eternal summer shall not fade,
Nor lose possession of that fair thou owst;
Nor shall death brag thou wanderst in his shade,
When in eternal lines to time thou growst:
   So long as men can breathe or eyes can see,
   So long lives this, and this gives life to thee."""

line_by_line_meter(text)

In [None]:
text="""The only news I know
Is bulletins all day
From Immortality.
The only shows I see,
Tomorrow and Today,
Perchance Eternity."""

line_by_line_meter(text)

Now can we use this to get a sense of the metrical qualities of prose texts?

In [None]:
text=""" When in the Course of human events, it becomes necessary for one people to dissolve the political bands which have connected them with another, and to assume among the powers of the earth, the separate and equal station to which the Laws of Nature and of Nature's God entitle them, a decent respect to the opinions of mankind requires that they should declare the causes which impel them to the separation."""
meter(text)

4. Here are the pronunciations for two terms. Can you write a function that determines whether they rhyme, and use that function to determine the rhyme scheme for the following lyrics?

In [None]:
get_pronunciation("cycle")

In [None]:
get_pronunciation("michael")

In [None]:
tribe_excursions="""Back in the days when I was a teenager
Before I had status and before I had a pager
You could find the Abstract listening to hip hop
My pops used to say, it reminded him of be-bop
I said, well daddy don't you know that things go in cycles
The way that Bobby Brown is just ampin' like Michael"""
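
One possible starting point, offered as a sketch rather than the intended solution: treat two words as rhyming when their phonemes match from the last stressed vowel to the end of the word, ignoring stress digits. The helper names (`rhyming_part`, `rhymes`, `rhyme_scheme`) are hypothetical, and the pronunciations are hand-written ARPABET transcriptions for illustration. In the notebook you would get them from `get_pronunciation` applied to the last word of each line.

```python
import re


def rhyming_part(pron):
    """Phonemes from the last stressed vowel to the end, with stress digits removed."""
    start = 0  # fall back to the whole word if no vowel is stressed
    for i, sym in enumerate(pron):
        if sym.endswith(("1", "2")):
            start = i
    return [re.sub(r"\d", "", sym) for sym in pron[start:]]


def rhymes(pron_a, pron_b):
    return rhyming_part(pron_a) == rhyming_part(pron_b)


def rhyme_scheme(final_prons):
    """Assign a letter to each line; lines whose final words rhyme share a letter."""
    groups = []  # one representative pronunciation per letter
    scheme = []
    for pron in final_prons:
        for idx, rep in enumerate(groups):
            if rhymes(pron, rep):
                scheme.append(idx)
                break
        else:
            groups.append(pron)
            scheme.append(len(groups) - 1)
    return "".join(chr(ord("A") + idx) for idx in scheme)


# hand-written transcriptions, assumed for illustration
teenager = ["T", "IY1", "N", "EY2", "JH", "ER0"]
pager = ["P", "EY1", "JH", "ER0"]
hop = ["HH", "AA1", "P"]
bebop = ["B", "IY1", "B", "AA2", "P"]
cycle = ["S", "AY1", "K", "AH0", "L"]
cycles = cycle + ["Z"]
michael = ["M", "AY1", "K", "AH0", "L"]

print(rhymes(cycle, michael))
print(rhyme_scheme([teenager, pager, hop, bebop, cycle, michael]))
```

Exact matching is strict: the lyric actually ends in "cycles", whose final `Z` keeps it from matching "Michael" here. Allowing a small edit distance between the rhyming parts is one way to capture near rhymes like that.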