<a href="https://colab.research.google.com/github/anthonypinter/atls45195519/blob/main/Day3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Impractical Project #7: Counting Syllables

Objective: Write a Python program that counts the number of syllables in an English word or phrase.

Things to consider:
- In cmudict, vowel phonemes end in a stress digit (0, 1, or 2); count those digits to count vowels (and syllables)
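
As a minimal sketch of that idea: each vowel phoneme in the CMU Pronouncing Dictionary carries a trailing stress digit, while consonants carry none, so counting the phonemes that end in a digit counts the syllables. The pronunciation below is copied by hand from the dictionary entry for "aardvark"; the helper name `count_vowel_phonemes` is made up for this illustration.

```python
def count_vowel_phonemes(phonemes):
    """Count phonemes that end in a stress digit (one per vowel sound)."""
    return sum(1 for phoneme in phonemes if phoneme[-1].isdigit())

# Pronunciation copied by hand from the cmudict entry for 'aardvark'
aardvark = ['AA1', 'R', 'D', 'V', 'AA2', 'R', 'K']
print(count_vowel_phonemes(aardvark))  # 2
```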

Pseudocode:
 - Take some input from the user
 - Load CMU dictionary
 - Load missing_words.json 
 - Loop on all the words in the input
  - Check if in missing_words.json
    - Take the value from missing_words and add to a counter
  - Else look in the dictionary file for the word(s)
    - Look for numbers at the end of phonemes to indicate vowels (and syllables)
      - For each vowel, add 1 to counter


In [9]:
"""Find words in haiku corpus missing from cmudict & build exceptions dict."""
import nltk
nltk.download('cmudict')
  

import sys
from string import punctuation
import pprint
import json
from nltk.corpus import cmudict

cmudict = cmudict.dict()  # Carnegie Mellon University Pronouncing Dictionary

def load_haiku(filename):
    """Open and return training corpus of haiku as a set."""
    with open(filename) as in_file:
        haiku = set(in_file.read().replace('-', ' ').split())
        return haiku

def cmudict_missing(word_set):
    """Find and return words in word set missing from cmudict."""
    exceptions = set()
    for word in word_set:
        word = word.lower().strip(punctuation)
        if word.endswith("'s") or word.endswith("’s"):
            word = word[:-2]
        if word not in cmudict:
            exceptions.add(word)
    print("\nexceptions:")
    print(*exceptions, sep='\n')
    print("\nNumber of unique words in haiku corpus = {}".format(len(word_set)))
    print("Number of words in corpus not in cmudict = {}"
          .format(len(exceptions)))
    membership = (1 - (len(exceptions) / len(word_set))) * 100
    print("cmudict membership = {:.1f}{}".format(membership, '%'))
    return exceptions

def make_exceptions_dict(exceptions_set):
    """Return dictionary of words and syllable counts from set of words."""
    missing_words = {}
    print("Input # syllables in word. Mistakes can be corrected at end. \n")
    for word in exceptions_set:
        while True:
            num_sylls = input("Enter number syllables in {}: ".format(word))
            if num_sylls.isdigit():
                break
            else:
                print("                   Not a valid answer!", file=sys.stderr)                    
        missing_words[word] = int(num_sylls)              
    print()
    pprint.pprint(missing_words, width=1)

    print("\nMake Changes to Dictionary Before Saving?")
    print("""
    0 - Exit & Save
    1 - Add a Word or Change a Syllable Count 
    2 - Remove a Word
    """)

    while True:
        choice = input("\nEnter choice: ")   
        if choice == '0':
            break
        elif choice == '1':
            word = input("\nWord to add or change: ")
            missing_words[word] = int(input("Enter number syllables in {}: "
                                            .format(word)))
        elif choice == '2':
            word = input("\nEnter word to delete: ")
            missing_words.pop(word, None)
            
    print("\nNew words or syllable changes:")
    pprint.pprint(missing_words, width=1)

    return missing_words

def save_exceptions(missing_words):
    """Save exceptions dictionary as json file."""
    # the with-block closes the file even if writing fails
    with open('missing_words.json', 'w') as f:
        json.dump(missing_words, f)
    print("\nFile saved as missing_words.json")



[nltk_data] Downloading package cmudict to /root/nltk_data...
[nltk_data]   Package cmudict is already up-to-date!


In [10]:
haiku = load_haiku('train.txt')
exceptions = cmudict_missing(haiku)
build_dict = input("\nManually build an exceptions dictionary (y/n)? \n")
if build_dict.lower() == 'n':
    sys.exit()
else:
    missing_words_dict = make_exceptions_dict(exceptions)
    save_exceptions(missing_words_dict)
    


exceptions:
furue
priestling
hibiscus
camellia
dewdrop
swordhand
windblown
tendrilled
skims
yowl
mooing
morningglory
wisteria
tendrils
moonrise
wintery
storks
stretchings
oranged
deepener
cloudbanks
atsuta
whippoorwill
foregather
woodcutter
carven
paperweights
creepers
inuyasha
spiritless
petaled
nightingales
evenfall
nursemaid
fie
cumulus
lichened
samisen
windless
treeline
archways
samuri
pattering
shadeless
scatters
persimmons
beholders
froglings
bathwater
dusky
watersplash
ridgelines
dragonfly
cloudbank
colour
asakura
battlers
treehouse

Number of unique words in haiku corpus = 1523
Number of words in corpus not in cmudict = 58
cmudict membership = 96.2%

Manually build an exceptions dictionary (y/n)? 
y
Input # syllables in word. Mistakes can be corrected at end. 

Enter number syllables in furue: 2
Enter number syllables in priestling: 2
Enter number syllables in hibiscus: 3
Enter number syllables in camellia: 3
Enter number syllables in dewdrop: 2
Enter number syllables in sword

In [11]:
cmudict

{'a': [['AH0'], ['EY1']],
 'a.': [['EY1']],
 'a42128': [['EY1',
   'F',
   'AO1',
   'R',
   'T',
   'UW1',
   'W',
   'AH1',
   'N',
   'T',
   'UW1',
   'EY1',
   'T']],
 'aaa': [['T', 'R', 'IH2', 'P', 'AH0', 'L', 'EY1']],
 'aaberg': [['AA1', 'B', 'ER0', 'G']],
 'aachen': [['AA1', 'K', 'AH0', 'N']],
 'aachener': [['AA1', 'K', 'AH0', 'N', 'ER0']],
 'aaker': [['AA1', 'K', 'ER0']],
 'aalseth': [['AA1', 'L', 'S', 'EH0', 'TH']],
 'aamodt': [['AA1', 'M', 'AH0', 'T']],
 'aancor': [['AA1', 'N', 'K', 'AO2', 'R']],
 'aardema': [['AA0', 'R', 'D', 'EH1', 'M', 'AH0']],
 'aardvark': [['AA1', 'R', 'D', 'V', 'AA2', 'R', 'K']],
 'aaron': [['EH1', 'R', 'AH0', 'N']],
 "aaron's": [['EH1', 'R', 'AH0', 'N', 'Z']],
 'aarons': [['EH1', 'R', 'AH0', 'N', 'Z']],
 'aaronson': [['EH1', 'R', 'AH0', 'N', 'S', 'AH0', 'N'],
  ['AA1', 'R', 'AH0', 'N', 'S', 'AH0', 'N']],
 "aaronson's": [['EH1', 'R', 'AH0', 'N', 'S', 'AH0', 'N', 'Z'],
  ['AA1', 'R', 'AH0', 'N', 'S', 'AH0', 'N', 'Z']],
 'aarti': [['AA1', 'R', 'T', 'IY2'
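
Each entry maps a word to a list of pronunciations, and some words (such as 'aaronson' above) have more than one. Taking only the first pronunciation is safe when the variants agree on syllable count. The sketch below checks that for two entries by counting stress digits in every variant. The `entries` dict is a hand-copied stand-in for part of the output above, and `syllable_counts` is a hypothetical helper, not part of nltk.

```python
# Hand-copied stand-in for two cmudict entries shown in the output above
entries = {
    'a': [['AH0'], ['EY1']],
    'aaronson': [['EH1', 'R', 'AH0', 'N', 'S', 'AH0', 'N'],
                 ['AA1', 'R', 'AH0', 'N', 'S', 'AH0', 'N']],
}

def syllable_counts(word, pron_dict):
    """Return the syllable count of every listed pronunciation of word."""
    return [sum(1 for phoneme in pron if phoneme[-1].isdigit())
            for pron in pron_dict[word]]

for word in entries:
    print(word, syllable_counts(word, entries))
```

If the counts in a list ever differ, the choice of pronunciation changes the syllable total, which is worth keeping in mind when indexing with `[0]`.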

Pseudocode:
 - Take some input from the user
 - Load CMU dictionary
 - Load missing_words.json 
 - Loop on all the words in the input
  - Check if in missing_words.json
    - Take the value from missing_words and add to a counter
  - Else look in the dictionary file for the word(s)
    - Look for numbers at the end of phonemes to indicate vowels (and syllables)
      - For each vowel, add 1 to counter

In [60]:
import sys
from string import punctuation
import json
from nltk.corpus import cmudict

cmudict = cmudict.dict()

with open('missing_words.json') as f:
  missing_words = json.load(f)
#print(missing_words)

def count_syllables(words): # some user input (word or a phrase)
  """Return the total number of syllables in a word or phrase."""
  num_sylls = 0

  # split hyphenated words, then strip punctuation and possessives from each word
  for word in words.lower().replace('-', ' ').split():
    word = word.strip(punctuation)
    if word.endswith("'s") or word.endswith("’s"):
      word = word[:-2]
    if not word:
      continue
    if word in missing_words:
      num_sylls += missing_words[word]
    elif word in cmudict:
      # e.g. 'aardvark': [['AA1', 'R', 'D', 'V', 'AA2', 'R', 'K']]
      # use the first pronunciation; vowel phonemes end in a stress digit
      for phoneme in cmudict[word][0]:
        if phoneme[-1].isdigit():
          num_sylls += 1
    else:
      a_missing_word = input("We don't have {}, how many syllables does it have? ".format(word))
      num_sylls += int(a_missing_word)

  return num_sylls


user_word = input("Enter a word or phrase")
num_syllables = count_syllables(user_word)
print(num_syllables)

Enter a word or phraseaaa
3


Impractical Project #8: Writing Haikus

Objective: Write a program that generates haiku using Markov chain analysis. Allow the user to modify the haiku by independently regenerating lines two and three.

Pseudocode:
- Load training file
- Build models
  - Markov 1
  - Markov 2
- line creation function (leverages Markov 1 and 2)
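
To make the two models concrete, here is a small sketch on a toy word list (not the training file). A first-order model maps each word to the words seen after it; a second-order model maps each pair of consecutive words to the word that followed the pair. The function names `build_markov_1` and `build_markov_2` are placeholders chosen for this illustration.

```python
from collections import defaultdict

def build_markov_1(words):
    """Map each word to the list of words that follow it."""
    model = defaultdict(list)
    for current, following in zip(words, words[1:]):
        model[current].append(following)
    return model

def build_markov_2(words):
    """Map each pair of consecutive words to the words that follow the pair."""
    model = defaultdict(list)
    for first, second, following in zip(words, words[1:], words[2:]):
        model[first + ' ' + second].append(following)
    return model

toy = 'the old pond the old frog jumps'.split()
markov_1 = build_markov_1(toy)
markov_2 = build_markov_2(toy)
print(markov_1['the'])      # ['old', 'old']
print(markov_1['old'])      # ['pond', 'frog']
print(markov_2['the old'])  # ['pond', 'frog']
```

Repeated followers are kept in the lists on purpose, so that `random.choice` picks a follower in proportion to how often it appeared in the text.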


In [84]:
import sys
import logging
import random
from collections import defaultdict

with open('train.txt') as f:
  raw_haiku = f.read()
  print(raw_haiku)

# flatten lines, split hyphenated words, and drop punctuation (apostrophes are kept)
corpus = raw_haiku.replace('\n', ' ').replace('-', ' ')
corpus = corpus.translate(str.maketrans('', '', '!,.?;:"'))
corpus = corpus.split()
print(corpus)

def map_word_to_word(corpus):
  limit = len(corpus)-1
  dict1_to_1 = defaultdict(list)
  
  for index, word in enumerate(corpus):
    if index < limit:
      suffix = corpus[index+1]
      dict1_to_1[word].append(suffix)

  return(dict1_to_1)

def map_2_words_to_word(corpus):
  limit = len(corpus)-2
  dict2_to_1 = defaultdict(list)
  
  for index, word in enumerate(corpus):
    if index < limit:
      key = word + ' ' + corpus[index+1]
      suffix = corpus[index+2]
      #suffix = corpus[index+1]
      dict2_to_1[key].append(suffix)

  return(dict2_to_1)

#map_word_to_word(corpus)
#map_2_words_to_word(corpus)

def random_word(corpus):
  word = random.choice(corpus)
  #print(word)
  num_sylls = count_syllables(word)
  #print(num_sylls)
  return(word,num_sylls)

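def word_after(prefix, suffix_map, current_syls, target_syls):
  """Return a random follower of prefix that fits the syllable budget, or None."""
  accepted_words = []
  suffixes = suffix_map.get(prefix, [])  # prefix may have no recorded followers
  for candidate in suffixes:
    num_syls = count_syllables(candidate)
    # skip zero-syllable tokens so a line always makes progress
    if 0 < num_syls and current_syls + num_syls <= target_syls:
      accepted_words.append(candidate)
  if not accepted_words:
    return None  # dead end: no follower fits
  return random.choice(accepted_words)

def write_poetry(goal_syll):
  """Return one line of exactly goal_syll syllables (5 or 7 for a haiku)."""
  suffix_map = map_word_to_word(corpus)
  while True:
    word, syll = random_word(corpus)
    if syll == 0 or syll > goal_syll:
      continue  # unusable first word, pick again
    line = [word]
    while syll < goal_syll:
      next_word = word_after(line[-1], suffix_map, syll, goal_syll)
      if next_word is None:
        break  # dead end, restart with a new first word
      line.append(next_word)
      syll += count_syllables(next_word)
    if syll == goal_syll:
      return ' '.join(line)

#word_after('in these', map_2_words_to_word(corpus), 2, 5)
print(write_poetry(5))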

# Disregard Markov Model 2;
# Clean up corpus (remove punctuation)
# Write a function that generates a line of poetry using word_after and lets the user do
# either 5 or 7 syllables.

in these dark waters drawn up from my frozen well glittering of spring standing still at dusk listen in far distances the song of froglings! in silent midnight our old scarecrow topples down weird hollow echo women planting rice ugly every bit about them but their ancient song wild geese write a line flap flapping across the sky comical dutch script dead my old fine hopes and dry my dreaming but still iris, blue each spring in this windy nest open your hungry mouth now in vain my stepchild ballet in the air twin butterflies until, twice white they meet, they mate black cloud bank broken scatters in the night now see moon lighted mountains! seek on high bare trails sky-reflecting violets mountaintop jewels for a lovely bowl let us arrange these flowers since there is no rice now that eyes of hawks in dusky night are darkened chirping of the quails my two plum trees are so gracious see, they flower one now, one later one fallen flower returning to the branch? oh no! a white butterfly clo