# Notebook N2: measuring rhyme quality

This notebook shows how to determine whether two verses rhyme. The proposed function (`similar_endings`) actually returns the number of identical phonemes at the end of two verses, which are given as strings. One can decide that the verses rhyme if at least 3 or 4 phonemes are identical. Naturally, this function could be improved given a good representation of the syllables of the words.
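The idea of counting identical final phonemes can be sketched as follows. This is only a minimal illustration, not the `similar_endings` function itself: the name `count_shared_final_phonemes` and the phoneme lists are assumptions, and the sketch takes phoneme sequences rather than raw strings.

```python
def count_shared_final_phonemes(phonemes_a, phonemes_b):
    """Count identical phonemes at the end of two phoneme sequences."""
    count = 0
    # Walk both sequences backwards in lockstep and stop at the first mismatch.
    for pa, pb in zip(reversed(phonemes_a), reversed(phonemes_b)):
        if pa != pb:
            break
        count += 1
    return count


# Hypothetical ARPAbet-style transcriptions of the last words of three verses.
campaign = ["K", "AE", "M", "P", "EY", "N"]
champagne = ["SH", "AE", "M", "P", "EY", "N"]
rain = ["R", "EY", "N"]

shared_long = count_shared_final_phonemes(campaign, champagne)
shared_short = count_shared_final_phonemes(campaign, rain)

# Decision rule from the text: rhyme if at least 3 phonemes are shared.
print(shared_long, shared_long >= 3)
print(shared_short, shared_short >= 3)
```

With a threshold of 3, the first pair would count as rhyming and the second would not, which shows how strict such a threshold can be for short words.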

The function uses an English phonetic dictionary generated by Àlex Atrio from a (US) pronunciation dictionary of the CMU Sphinx system, with rules that define the rhymes (perfect or assonant, see https://rhymenow.com/types-of-rhymes). It can be downloaded from Switch Drive ([rhyming_dictionaries.pickle](https://drive.switch.ch/index.php/f/5482152834), 5.9 MB).
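The rules used to build the dictionary are not shown in this notebook. The sketch below is an assumption inferred from the example entries printed by the notebook (such as `['eyn', 'ey']` for "campaign"): the perfect-rhyme key is taken to be the last vowel plus all following consonants, and the assonant key the last vowel alone. The function name `rhyme_keys` and the transcriptions are illustrative.

```python
def rhyme_keys(pronunciation):
    """Return [perfect_key, assonant_key] for a CMU-style phoneme list.

    Assumed rule: the perfect key is the last vowel plus every following
    consonant; the assonant key is the last vowel alone.
    """
    # In CMU-style transcriptions, vowels end with a stress digit (0, 1 or 2).
    vowel_positions = [i for i, p in enumerate(pronunciation) if p[-1].isdigit()]
    if not vowel_positions:
        return ["", ""]
    last = vowel_positions[-1]
    # Drop stress digits and lowercase, to match keys such as 'eyn'.
    tail = [p.rstrip("012").lower() for p in pronunciation[last:]]
    return ["".join(tail), tail[0]]


# Illustrative CMU-style transcriptions (assumed, not read from the pickle).
print(rhyme_keys("K AE0 M P EY1 N".split()))   # campaign
print(rhyme_keys("D UW1".split()))             # do
print(rhyme_keys("W UH1 D AH0 N T".split()))   # wouldn't
```

Under this assumed rule the key starts at the last vowel, whether or not it is stressed, which is what the entry for "wouldn't" suggests.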

In addition, we provide functions that, given a poem, compute a rhyme accuracy score along with the proportions of perfect rhymes, assonant rhymes, and non-rhyming pairs.

In [1]:
import pickle
import numpy as np

In [2]:
def get_rhyming_dictionary(path="./rhyming_dictionaries.pickle"):
    """Pickle contains 3 dictionaries for faster search (hashes: O(1))
    {word: [perfect_rhyme, assonant_rhyme]}, {perfect_rhyme : [words...]} 
    and {assonant_rhyme : [words...]}, but we only need the first one.
    """
    with open(path,"rb") as fd:
        word2rhymes, _, _ = pickle.load(fd)
    return word2rhymes
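To make the three-dictionary layout from the docstring concrete, here is a small sketch that builds the two inverse indexes from a toy `word2rhymes`-style dictionary and round-trips them through a pickle. The toy entries, the file name, and the choice of a tuple as container are assumptions. The notebook only shows that the pickle unpacks into three objects.

```python
import os
import pickle
import tempfile
from collections import defaultdict

# Toy word -> [perfect_rhyme, assonant_rhyme] dictionary (illustrative entries).
word2rhymes_toy = {
    "bee": ["iy", "iy"],
    "see": ["iy", "iy"],
    "fade": ["eyd", "ey"],
    "date": ["eyt", "ey"],
}

# Inverse indexes: rhyme key -> list of words sharing it.
perfect2words = defaultdict(list)
assonant2words = defaultdict(list)
for word, (perfect, assonant) in word2rhymes_toy.items():
    perfect2words[perfect].append(word)
    assonant2words[assonant].append(word)

# Store the three dictionaries together, then read back only the first one.
path = os.path.join(tempfile.mkdtemp(), "toy_rhyming_dictionaries.pickle")
with open(path, "wb") as fd:
    pickle.dump((word2rhymes_toy, dict(perfect2words), dict(assonant2words)), fd)

with open(path, "rb") as fd:
    loaded_word2rhymes, _, _ = pickle.load(fd)

print(loaded_word2rhymes["fade"], assonant2words["ey"])
```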

In [3]:
word2rhymes = get_rhyming_dictionary()

In [4]:
print('Number of words:', len(word2rhymes))
print('Example entry:', word2rhymes['campaign']) # perfect rhyme, assonant rhyme
print('Example entry:', word2rhymes['do']) # short word
print('Example entry:', word2rhymes['wouldn\'t']) # contraction (also don't & others)

Number of words: 123631
Example entry: ['eyn', 'ey']
Example entry: ['uw', 'uw']
Example entry: ['ahnt', 'ah']


In [5]:
from nltk.tokenize import word_tokenize
# nltk.download('punkt') # may be needed the first time

In [6]:
import difflib

In [7]:
def verse2rhyme(verse, rhyme_dict):
    """
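    Returns a list with two elements: (1) the perfect rhyme and (2) the assonant
    rhyme of the verse's last word (ignoring punctuation, restoring contractions).
    Returns an empty list if the verse contains no words. If the last word is not
    in the dictionary, the closest key found by difflib.get_close_matches is used
    instead; ['', ''] is returned when no close match exists.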
    """
    punctuation = ['.', ',', ',', ':', ';', '!', '?', ' ', '-', '...', '_']
    verse = verse.replace('’', '\'')
    v = word_tokenize(verse)
    v = [w for w in v if w not in punctuation] # remove all punctuations
    if len(v) == 0:
        return []
    if v[-1] == 'n\'t':             # tokenizer's output on contraction: don't -> do, n't
        final_word = v[-2] + v[-1]  # restore full form (v[-2] is necessarily present)
    elif v[-1] == '\'d':            # for contraction of past participle (Shakespeare!)
        final_word = v[-2] + 'ed'
    else:
        final_word = v[-1]
    final_word = final_word.lower()
    if final_word not in rhyme_dict: # find a similar word that *is* in the dictionary
        similar_words = difflib.get_close_matches(final_word, rhyme_dict.keys(), n=1) # time consuming
        # print(final_word, '->', similar_words)
        if similar_words == []:
            return ['', ''] # if we really couldn't find anything
        else:
            return rhyme_dict[similar_words[0]]
    return rhyme_dict[final_word]
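The fallback for unknown words can be tried in isolation. The sketch below uses a toy dictionary and a hypothetical helper `fallback_rhyme`; only `difflib.get_close_matches` is the call used in the notebook. The tokenization step is left out so that the example runs without NLTK.

```python
import difflib

# Toy dictionary; the entries are illustrative, not read from the real pickle.
toy_dict = {"every": ["iy", "iy"], "river": ["er", "er"], "clover": ["er", "er"]}


def fallback_rhyme(word, rhyme_dict):
    """Look up a word, falling back to the most similarly spelled key if it is missing."""
    if word in rhyme_dict:
        return rhyme_dict[word]
    # n=1 keeps only the best candidate; the default similarity cutoff is 0.6.
    close = difflib.get_close_matches(word, list(rhyme_dict), n=1)
    return rhyme_dict[close[0]] if close else ["", ""]


print(fallback_rhyme("revery", toy_dict))  # resolved through a similarly spelled key
print(fallback_rhyme("zzzz", toy_dict))    # no close key, so ['', ''] is returned
```

Note that the match is based on spelling, not pronunciation, so the borrowed rhyme is a heuristic and can be wrong for words whose spelling and sound diverge.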

In [8]:
poem_ED = """To make a prairie it takes a clover and one bee,
One clover, and a bee.
And revery.
The revery alone will do,
If bees are few."""

In [9]:
for line in poem_ED.split('\n'):
    print(verse2rhyme(line, word2rhymes))

['iy', 'iy']
['iy', 'iy']
['iy', 'iy']
['uw', 'uw']
['uw', 'uw']


In [10]:
def test_rhyme(verse1, verse2, rhyme_dict):
    """
    Returns 2 if the verses have a "perfect rhyme", 1 for an "assonant rhyme", and 0 otherwise.
    See https://rhymenow.com/types-of-rhymes for definitions.
    """
    rh1 = verse2rhyme(verse1, rhyme_dict)
    rh2 = verse2rhyme(verse2, rhyme_dict)
    
    if (not rh1) or (not rh2) or ('' in (rh1[0], rh2[0])):  # empty verse, or no dictionary match
        return 0
    elif rh1[0] == rh2[0]:
        return 2
    elif rh1[1] == rh2[1]:
        return 1
    else:
        return 0

In [11]:
test_rhyme('When love has changed to kindliness\n',
  'Oh , love , our hungry lips that press\n', word2rhymes)

0

In [12]:
# not optimal, because we get the rhymes n^2 times instead of n 
# but test_rhyme does not take pronunciations
for line1 in poem_ED.split('\n'):
    for line2 in poem_ED.split('\n'):
        print(test_rhyme(line1, line2, word2rhymes), end=' ')
    print()

2 2 2 0 0 
2 2 2 0 0 
2 2 2 0 0 
0 0 0 2 2 
0 0 0 2 2 
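As the comment in the cell above notes, calling `test_rhyme` on every pair looks up each verse's rhyme repeatedly. One possible workaround, sketched here under the assumption that the rhyme keys have already been computed once per verse, is to compare the keys directly. The helper `score_rhyme_keys` is hypothetical and mirrors the scoring of `test_rhyme`.

```python
def score_rhyme_keys(keys1, keys2):
    """Score two [perfect, assonant] key pairs: 2 = perfect, 1 = assonant, 0 = none."""
    if not keys1 or not keys2:
        return 0
    if keys1[0] == keys2[0]:
        return 2
    if keys1[1] == keys2[1]:
        return 1
    return 0


# Rhyme keys of the five verses of poem_ED, as printed earlier by the notebook.
keys = [["iy", "iy"], ["iy", "iy"], ["iy", "iy"], ["uw", "uw"], ["uw", "uw"]]

# Each verse is looked up once; only the cheap key comparison runs n^2 times.
matrix = [[score_rhyme_keys(a, b) for b in keys] for a in keys]
for row in matrix:
    print(*row)
```

In the notebook itself, `keys` would come from `[verse2rhyme(line, word2rhymes) for line in poem_ED.split('\n')]`.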


In [13]:
# by default, round to 4 decimals
def round_4(x):
    return np.round(x, 4)

In [14]:
def split_in_list(poem, sep_elem):
    paragraph = []
    for verse in poem:
        if verse == sep_elem:
            yield paragraph
            paragraph = []
        else:
            paragraph.append(verse)
    yield paragraph
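A short usage sketch shows how `split_in_list` groups lines. The generator is repeated so that the example is self-contained, and the input list is hypothetical, shaped like the output of `f.readlines()`.

```python
def split_in_list(poem, sep_elem):
    # Same generator as in the notebook, repeated to keep the example self-contained.
    paragraph = []
    for verse in poem:
        if verse == sep_elem:
            yield paragraph
            paragraph = []
        else:
            paragraph.append(verse)
    yield paragraph


# Hypothetical input: a pair, a triple, two consecutive blank lines, a single verse.
lines = ["a1\n", "a2\n", "\n", "b1\n", "b2\n", "b3\n", "\n", "\n", "c1\n"]
paragraphs = list(split_in_list(lines, "\n"))
print([len(p) for p in paragraphs])  # [2, 3, 0, 1]
```

Two consecutive blank lines produce an empty paragraph, and the last paragraph is always yielded, even when it is empty.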

## For AABB pattern

In [15]:
# This function takes as input a poem (a list of lines) of the form
# AA BB CC ... (with a blank line between two pairs of rhyming verses).
# It returns a list whose elements are the pairs (A,A), (B,B), ...
# If a paragraph contains 3 or more verses (instead of 2), the first 2 are kept and the others are discarded.
# If a paragraph contains only 1 verse, it is discarded.

def process_and_pair_poem(poem):
    
    # IF THE POEM IS A TEXT: split the poem into paragraphs (normally 2 verses each)
    # poem_split = poem.split('\n\n')
    
    # IF THE POEM IS A LIST:
    # iterate over the list and group the lines together, splitting on the empty ones
    
    poem_split = split_in_list(poem, '\n')
    
    # will contain the correct pairs of verses
    poem_by_para = []
    
    # number of discarded lines
    discarded_lines = 0
    
    for p in poem_split:
        if len(p) == 2:
            # There are exactly 2 lines, which is what we want.
            # We add them as a pair: (A,A)
            poem_by_para.append((p[0], p[1]))
        elif len(p) >= 3:
            # There are too many verses: keep only the first and second ones
            #print("Length >= 3 : at least one line was deleted as it had no partner to rhyme with.")
            poem_by_para.append((p[0], p[1]))
            discarded_lines = discarded_lines + (len(p)-2)
        elif len(p) == 1:
            # Only 1 line was generated in this paragraph, so we cannot keep it.
            #print("Length == 1 : a line was deleted as it had no partner to rhyme with.")
            discarded_lines = discarded_lines + 1
        #elif len(p) == 0:
            
            #print("Discarded a superfluous blank line.")
            
    print("Number of total rhymes : ", len(poem_by_para))
    print("Number of discarded lines : ", discarded_lines)
    
    return poem_by_para, discarded_lines
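The pairing and discarding rules can be checked on a toy input. The sketch below is a compact, hypothetical variant (`pair_first_two`) that takes already-grouped paragraphs and prints nothing. It is meant to illustrate the rules, not to replace `process_and_pair_poem`.

```python
def pair_first_two(paragraphs):
    """Keep the first two verses of each paragraph; count every other verse as discarded."""
    pairs, discarded = [], 0
    for p in paragraphs:
        if len(p) >= 2:
            pairs.append((p[0], p[1]))
            discarded += len(p) - 2
        else:
            discarded += len(p)  # 0 or 1 verse: nothing to pair
    return pairs, discarded


# Hypothetical paragraphs: a proper pair, a triple, an empty one, a single verse.
paragraphs = [["a1", "a2"], ["b1", "b2", "b3"], [], ["c1"]]
pairs, discarded = pair_first_two(paragraphs)
print(pairs, discarded)
```

Here the third verse of the triple and the lone verse are discarded, so two pairs remain and two lines are counted as discarded.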

In [16]:
def get_ryhme_accuracy(poem_input, rhyme_dict=word2rhymes):
    """
    Arguments:
        poem_input : list of lines of the form AA BB CC ..., with blank lines between pairs of rhyming verses
        rhyme_dict : rhyming dictionary used to grade the rhymes
        
    Returns:
        accuracy : rhyming score divided by the maximum possible score, in percent (a perfect rhyme gives 2 points, an assonant rhyme gives 1 point);
        the formula is: 100 * (#perfect * 2 + #assonant * 1) / (#total_rhymes * 2)
        perfect_rhymes : percentage of pairs with a perfect rhyme
        assonant_rhymes : percentage of pairs with an assonant rhyme
        NON_rhymes : percentage of pairs that do not rhyme
        non_rhyme_pairs : list of the pairs of verses that do not rhyme
    """
        
    # match by pair each verse (the structure is AA BB CC...)
    poem, discarded = process_and_pair_poem(poem_input)
        
    total_number_rhymes = len(poem)
    
    # values, counts to get statistics
    score_poem = []
    number_perfect_ryhmes = 0
    number_assonant_ryhmes = 0
    number_NON_ryhmes = 0
    
    non_rhyme_pairs = []
    
    # for each pair compute 'how much' they rhyme with _test_rhyme_
    for pair in poem:
        score = test_rhyme(pair[0], pair[1], rhyme_dict)
        
        # counts are updated
        if score == 2:
            number_perfect_ryhmes += 1
        elif score == 1:
            number_assonant_ryhmes += 1
        else: 
            non_rhyme_pairs.append(pair)
            number_NON_ryhmes += 1
            
        score_poem.append(score)

    # compute percentage to display
    accuracy = 100 * (np.sum(score_poem) / (2*total_number_rhymes))
    perfect_rhymes = 100 * (number_perfect_ryhmes / (total_number_rhymes))
    assonant_rhymes = 100 * (number_assonant_ryhmes / (total_number_rhymes))
    NON_rhymes = 100 * (number_NON_ryhmes / (total_number_rhymes)) 
    
    print('The accuracy is :', round_4(accuracy), "%")
    print('The percentage of perfect rhyme is :', round_4(perfect_rhymes), "%")
    print('The percentage of assonant rhyme is :', round_4(assonant_rhymes), "%")
    print('The percentage of NON rhyme is :', round_4(NON_rhymes), "%")
    
    
    return accuracy, perfect_rhymes, assonant_rhymes, NON_rhymes, non_rhyme_pairs

In [63]:
with open('poems_generated_AABB/rhyming_poems_100_epoch_task1.txt', encoding='utf-8') as f:
    generation_to_test = f.readlines()

In [64]:
acc, perfect, good, bad, non_rhyme_pairs = get_ryhme_accuracy(generation_to_test, word2rhymes)

Number of total rhymes :  466
Number of discarded lines :  233
The accuracy is : 57.618 %
The percentage of perfect rhyme is : 55.1502 %
The percentage of assonant rhyme is : 4.9356 %
The percentage of NON rhyme is : 39.9142 %
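The reported accuracy can be cross-checked from the two rhyme percentages. Since a perfect rhyme is worth 2 points and an assonant rhyme 1 point out of a maximum of 2 per pair, the accuracy reduces to the perfect percentage plus half the assonant percentage. The helper name `weighted_accuracy` is an assumption made for this sketch.

```python
def weighted_accuracy(perfect_pct, assonant_pct):
    """Accuracy in percent when a perfect rhyme is worth 2 points and an assonant rhyme 1."""
    # 100 * (2*P + 1*A) / (2*N) simplifies to perfect_pct + assonant_pct / 2,
    # where perfect_pct = 100*P/N and assonant_pct = 100*A/N.
    return perfect_pct + assonant_pct / 2


# Percentages printed above for the AABB poems.
aabb_accuracy = weighted_accuracy(55.1502, 4.9356)
print(round(aabb_accuracy, 4))
```

The result agrees with the 57.618 % printed above, up to rounding.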


## For ABAB pattern

In [17]:
# This function takes as input a poem (a list of lines) made of stanzas of the form
# ABAB (with a blank line between two stanzas).
# It returns a list whose elements are the pairs (A,A), (B,B), i.e. verses 1-3 and 2-4 of each stanza.
# A stanza that does not contain exactly 4 verses is discarded entirely,
# and all of its lines are counted as discarded.

def process_and_pair_poem_ABAB(poem):
    
    # IF THE POEM IS A TEXT: split the poem into paragraphs (normally 4 verses each)
    # poem_split = poem.split('\n\n')
    
    # IF THE POEM IS A LIST:
    # iterate over the list and group the lines together, splitting on the empty ones
    
    poem_split = split_in_list(poem, '\n')
    
    # will contain the correct pairs of verses
    poem_by_para = []
    
    # number of discarded lines
    discarded_lines = 0
    
    for p in poem_split:
        if len(p) == 4:
            # There are exactly 4 lines, which is what we want.
            # We add them as two pairs: (A,A) and (B,B)
            poem_by_para.append((p[0], p[2]))
            poem_by_para.append((p[1], p[3]))
        else:
            # Any other paragraph length is discarded entirely
            # (separator lines never appear in p, so every element counts).
            discarded_lines = discarded_lines + len(p)

    print("Number of total rhymes : ", len(poem_by_para))
    print("Number of discarded lines : ", discarded_lines)
    
    return poem_by_para, discarded_lines
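The ABAB pairing itself is small enough to try on a single stanza. The sketch below uses a hypothetical helper `pair_abab` and the first stanza of `poem_WEH`, which is quoted later in this notebook.

```python
def pair_abab(stanza):
    """Pair verse 1 with 3 and verse 2 with 4; return no pairs for other lengths."""
    if len(stanza) != 4:
        return []
    return [(stanza[0], stanza[2]), (stanza[1], stanza[3])]


stanza = [
    "Out of the night that covers me,",
    "Black as the pit from pole to pole,",
    "I thank whatever gods may be",
    "For my unconquerable soul.",
]

# Show the last token of each paired verse.
for first, second in pair_abab(stanza):
    print(first.split()[-1], "/", second.split()[-1])
```

Each resulting pair can then be scored with `test_rhyme`, exactly as for the AABB pattern.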

In [18]:
def get_ryhme_accuracy_ABAB(poem_input, rhyme_dict=word2rhymes):
    """
    Arguments:
        poem_input : list of lines made of ABAB stanzas (4 verses each), with blank lines between stanzas
        rhyme_dict : rhyming dictionary used to grade the rhymes
        
    Returns:
        accuracy : rhyming score divided by the maximum possible score, in percent (a perfect rhyme gives 2 points, an assonant rhyme gives 1 point);
        the formula is: 100 * (#perfect * 2 + #assonant * 1) / (#total_rhymes * 2)
        perfect_rhymes : percentage of pairs with a perfect rhyme
        assonant_rhymes : percentage of pairs with an assonant rhyme
        NON_rhymes : percentage of pairs that do not rhyme
        non_rhyme_pairs : list of the pairs of verses that do not rhyme
    """
        
    # pair the verses of each stanza (the structure is ABAB: verses 1-3 and 2-4)
    poem, discarded = process_and_pair_poem_ABAB(poem_input)
        
    total_number_rhymes = len(poem)
    
    # values, counts to get statistics
    score_poem = []
    number_perfect_ryhmes = 0
    number_assonant_ryhmes = 0
    number_NON_ryhmes = 0
    
    non_rhyme_pairs = []
    
    # for each pair compute 'how much' they rhyme with _test_rhyme_
    for pair in poem:
        score = test_rhyme(pair[0], pair[1], rhyme_dict)
        
        # counts are updated
        if score == 2:
            number_perfect_ryhmes += 1
        elif score == 1:
            number_assonant_ryhmes += 1
        else: 
            non_rhyme_pairs.append(pair)
            number_NON_ryhmes += 1
            
        score_poem.append(score)

    # compute percentage to display
    accuracy = 100 * (np.sum(score_poem) / (2*total_number_rhymes))
    perfect_rhymes = 100 * (number_perfect_ryhmes / (total_number_rhymes))
    assonant_rhymes = 100 * (number_assonant_ryhmes / (total_number_rhymes))
    NON_rhymes = 100 * (number_NON_ryhmes / (total_number_rhymes)) 
    
    print('The accuracy is :', round_4(accuracy), "%")
    print('The percentage of perfect rhyme is :', round_4(perfect_rhymes), "%")
    print('The percentage of assonant rhyme is :', round_4(assonant_rhymes), "%")
    print('The percentage of NON rhyme is :', round_4(NON_rhymes), "%")
    
    
    return accuracy, perfect_rhymes, assonant_rhymes, NON_rhymes, non_rhyme_pairs

In [65]:
with open('poems_generated_ABAB/poem_ABAB_100.txt', encoding='utf-8') as f:
    generation_to_test = f.readlines()

In [66]:
acc, perfect, good, bad, non_rhyme_pairs = get_ryhme_accuracy_ABAB(generation_to_test, word2rhymes)

Number of total rhymes :  302
Number of discarded lines :  574
The accuracy is : 47.5166 %
The percentage of perfect rhyme is : 43.0464 %
The percentage of assonant rhyme is : 8.9404 %
The percentage of NON rhyme is : 48.0132 %


#### Here are other functions that compute and display rhyming information for a given poem.

In [12]:
# faster (and uglier), but does not actually use test_rhyme
def print_all_rhymes(poem, rhyme_dict):
    lines = poem.split('\n')
    rhymes = [verse2rhyme(line, rhyme_dict) for line in lines]
    print('Rhymes with previous verses\n', ' ' * 8, end='')
    for i in range(1, len(lines)):
        print(i % 10, end=' ')
    print('')
    for i in range(len(lines)):
        print('verse', (i+1) % 10, end=': ')
        for j in range(len(lines)-1):
            if j >= i:
                print('.', end=' ')
            elif rhymes[i][0] == rhymes[j][0]:
                print('2', end=' ')
            elif rhymes[i][1] == rhymes[j][1]:
                print('1', end=' ')
            else:
                print('0', end=' ')
        print('')
    return rhymes

In [13]:
print_all_rhymes(poem_ED, word2rhymes)

Rhymes with previous verses
         1 2 3 4 
verse 1: . . . . 
verse 2: 2 . . . 
verse 3: 2 2 . . 
verse 4: 0 0 0 . 
verse 5: 0 0 0 2 


[['iy', 'iy'], ['iy', 'iy'], ['iy', 'iy'], ['uw', 'uw'], ['uw', 'uw']]

In [14]:
poem_WS = """Shall I compare thee to a summer’s day?
Thou art more lovely and more temperate:
Rough winds do shake the darling buds of May,
And summer’s lease hath all too short a date:
Sometime too hot the eye of heaven shines,
And often is his gold complexion dimm’d;
And every fair from fair sometime declines,
By chance or nature’s changing course untrimm’d;
But thy eternal summer shall not fade
Nor lose possession of that fair thou owest;
Nor shall Death brag thou wander’st in his shade,
When in eternal lines to time thou growest:
So long as men can breathe or eyes can see,
So long lives this and this gives life to thee."""

In [15]:
print_all_rhymes(poem_WS, word2rhymes)

Rhymes with previous verses
         1 2 3 4 5 6 7 8 9 0 1 2 3 
verse 1: . . . . . . . . . . . . . 
verse 2: 0 . . . . . . . . . . . . 
verse 3: 2 0 . . . . . . . . . . . 
verse 4: 1 0 1 . . . . . . . . . . 
verse 5: 0 0 0 0 . . . . . . . . . 
verse 6: 0 0 0 0 0 . . . . . . . . 
verse 7: 0 0 0 0 2 0 . . . . . . . 
verse 8: 0 0 0 0 1 0 1 . . . . . . 
verse 9: 1 0 1 1 0 0 0 0 . . . . . 
verse 0: 0 1 0 0 0 0 0 0 0 . . . . 
verse 1: 1 0 1 1 0 0 0 0 2 0 . . . 
verse 2: 0 0 0 0 0 0 0 0 0 0 0 . . 
verse 3: 0 0 0 0 0 0 0 0 0 0 0 0 . 
verse 4: 0 0 0 0 0 0 0 0 0 0 0 0 2 


[['ey', 'ey'],
 ['aht', 'ah'],
 ['ey', 'ey'],
 ['eyt', 'ey'],
 ['aynz', 'ay'],
 ['ihmd', 'ih'],
 ['aynz', 'ay'],
 ['ayd', 'ay'],
 ['eyd', 'ey'],
 ['ahst', 'ah'],
 ['eyd', 'ey'],
 ['erz', 'er'],
 ['iy', 'iy'],
 ['iy', 'iy']]

In [16]:
poem_WEH = """Out of the night that covers me,
Black as the pit from pole to pole,
I thank whatever gods may be
For my unconquerable soul.
In the fell clutch of circumstance
I have not winced nor cried aloud.
Under the bludgeonings of chance
My head is bloody, but unbowed.
Beyond this place of wrath and tears
Looms but the Horror of the shade,
And yet the menace of the years
Finds and shall find me unafraid.
It matters not how strait the gate,
How charged with punishments the scroll,
I am the master of my fate :
I am the captain of my soul."""

In [17]:
print_all_rhymes(poem_WEH, word2rhymes)

Rhymes with previous verses
         1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 
verse 1: . . . . . . . . . . . . . . . 
verse 2: 0 . . . . . . . . . . . . . . 
verse 3: 2 0 . . . . . . . . . . . . . 
verse 4: 0 2 0 . . . . . . . . . . . . 
verse 5: 0 0 0 0 . . . . . . . . . . . 
verse 6: 0 0 0 0 0 . . . . . . . . . . 
verse 7: 0 0 0 0 2 0 . . . . . . . . . 
verse 8: 0 1 0 1 0 0 0 . . . . . . . . 
verse 9: 0 0 0 0 0 0 0 0 . . . . . . . 
verse 0: 0 0 0 0 0 0 0 0 0 . . . . . . 
verse 1: 0 0 0 0 0 0 0 0 0 0 . . . . . 
verse 2: 0 0 0 0 0 0 0 0 0 2 0 . . . . 
verse 3: 0 0 0 0 0 0 0 0 0 1 0 1 . . . 
verse 4: 0 2 0 2 0 0 0 1 0 0 0 0 0 . . 
verse 5: 0 0 0 0 0 0 0 0 0 1 0 1 2 0 . 
verse 6: 0 2 0 2 0 0 0 1 0 0 0 0 0 2 0 


[['iy', 'iy'],
 ['owl', 'ow'],
 ['iy', 'iy'],
 ['owl', 'ow'],
 ['aens', 'ae'],
 ['awd', 'aw'],
 ['aens', 'ae'],
 ['owd', 'ow'],
 ['ehrz', 'eh'],
 ['eyd', 'ey'],
 ['ihrz', 'ih'],
 ['eyd', 'ey'],
 ['eyt', 'ey'],
 ['owl', 'ow'],
 ['eyt', 'ey'],
 ['owl', 'ow']]
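The list of rhyme keys returned by `print_all_rhymes` can also be condensed into a rhyme-scheme string. The sketch below is one possible way to do it, using a hypothetical helper `scheme_from_keys` that labels verses by their perfect-rhyme key only. The keys are copied from the output above.

```python
from string import ascii_uppercase


def scheme_from_keys(rhymes):
    """Assign one letter per distinct perfect-rhyme key, in order of first appearance."""
    letters = {}
    scheme = []
    for perfect, _assonant in rhymes:
        if perfect not in letters:
            # Wrap around after 26 distinct keys (enough for short poems).
            letters[perfect] = ascii_uppercase[len(letters) % 26]
        scheme.append(letters[perfect])
    return "".join(scheme)


# Keys printed above for poem_WEH.
weh_keys = [["iy", "iy"], ["owl", "ow"], ["iy", "iy"], ["owl", "ow"],
            ["aens", "ae"], ["awd", "aw"], ["aens", "ae"], ["owd", "ow"],
            ["ehrz", "eh"], ["eyd", "ey"], ["ihrz", "ih"], ["eyd", "ey"],
            ["eyt", "ey"], ["owl", "ow"], ["eyt", "ey"], ["owl", "ow"]]
print(scheme_from_keys(weh_keys))  # ABABCDCEFGHGIBIB
```

Because only perfect-rhyme keys are compared, pairs such as 'awd' / 'owd' or 'ehrz' / 'ihrz' receive different letters, so the string reflects the dictionary's pronunciations rather than the poet's intended scheme.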