Your task in this option is create a random Haiku generator program. A Haiku is a poem like:

Whitecaps on the bay: <br>
A broken signboard banging <br>
In the April wind. <br>
— Richard Wright, collected in Haiku: This Other World, 1998, copied from <a href = "https://en.wikipedia.org/wiki/Haiku">Wikipedia</a>

A haiku is defined not by a rhyme pattern, but by the number of syllables in each line. Traditionally, a haiku has three lines: <br>
First: Five syllables. <br>
Seven in the second line, <br>
and Five in the third. <br>
— Matt Haberland

Your random haiku genarator will generate haikus worthy of literary praise almost surely. Of course, it will generate many, many more bad haikus, like:

gnatcatcher julep <br>
renewable unite male <br>
miscreation loll <br>
— Matt Haberland's random haiku generator

Before you begin, you might need to use the NLTK downloader to get the corpora <tt>cmudict</tt> and <tt>words</tt>. If they are already installed, the following should succeed.

In [89]:
from nltk.corpus import cmudict
from nltk.corpus import words

In [98]:
words.words()

AttributeError: 'list' object has no attribute 'words'

If not, you need to get them.

In [78]:
import nltk
nltk.download()

showing info https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml


True

I'm going to give you some help. <tt>cmudict.dict()</tt> returns a Python dictionary in which each key is a word and the corresponding value is a <i>list</i> containing ways of pronouncing the word. When there is more than one pronunciation, the list has more than one element. I suggest you explore some entries in <tt>cmyduct.dict()</tt> to get a better sense of what's going on. Try looking up the pronunciation "hello" and "goodbye". <i>Don't worry at all if you don't understand how to interpret the pronunciations. I don't. It's irrelevant for this problem. The only point is that each key is a word and the corresponding value is a <i>list</i> containing some representation of ways of pronouncing the word</i>.

In [79]:
d = cmudict.dict()
# Explore this object.
# Suggestion: don't print it all. 
# That would take a while...

In [80]:
d.keys()
print d["hello"]

[[u'HH', u'AH0', u'L', u'OW1'], [u'HH', u'EH0', u'L', u'OW1']]


Based on this, we can write a function that determines the number of syllables in a given word.

In [81]:
def nsyl(word):
  return [len(list(y for y in x if y[-1].isdigit())) for x in d[word.lower()]]
# I did not write this. 
# Tomorrow I'll share the source...
# if you ask nicely.

You don't need to understand <i>exactly</i> how the function works, because that would require understanding how the dictionary represents pronunciations. But in short, <tt>nsyl</tt> does some processing on the pronunciations to determine the number of syllables in each pronunciation. Before proceeding, I suggest you try it out on "hello", "goodbye", and maybe some other common words to get a sense of how it works. 

In [82]:
nsyl("hello")

[2, 2]

1) Create a dictionary <tt>d2</tt> in which each key is an integer and the corresponding value is a list of all words with that many syllables. <i>For words with multiple pronunciations, consider only the first pronunciation. </i>

In [99]:
d2 = {}
words = []
keys = sorted(d.keys())
for i in keys:
    if i in words.words():
        if not nsyl(i)[0] in d2:
            d2[nsyl(i)[0]] = [i]
        else:
            (d2[nsyl(i)[0]]).append(i)

AttributeError: 'list' object has no attribute 'words'

2) One word in the dictionary contains more syllables than any other. Print this word.

In [None]:
print max(d2.keys()), d2[max(d2.keys())] 

3) Print the number of words with a given number of syllables like:
<pre>
0: 4
1: 16240
2: 56982
</pre>
etc... <br>
Note that there are are some words with zero syllables. That's fine. Not all "words" in the dictionary are real English words. We'll revisit this in the very last step.

In [None]:
for i in d2.keys():
    print i, ":", len(d2[i])

4) Create a histogram with the title "Number of Syllables in English Words" using Plotly (offline). I want you to use the <tt>Histogram</tt> trace, but if you can't figure out how to do that you may use a different type of trace (slight penalty). Whatever you do, the frequency/count of words of a given number of syllables should be represented by a vertical bar.

In [None]:
import plotly.plotly as py
#py.init_notebook_mode()
from plotly.graph_objs import Layout, Histogram, Figure, Data

In [None]:
new_x = []

for i in d2.keys():
    for j in range(len(d2[i])):
        new_x.append(i)
trace = Histogram(x = new_x)
data = [trace]
layout = Layout(title = "Number of Syllables in English Words")
fig = Figure(data = data, layout = layout)
py.iplot(fig)

5) Write a function <tt>sylPattern(n)</tt> that returns a list of random integers that sum to <tt>n</tt>. (Later, you will choose for each element of the list a random word with the given number of syllables.) Your function should work for any n > 1. Test it for n = 15. For example:
<pre>
x = sylPattern(15)
print x
print sum(x) == 15
</pre>
should print something like:
<pre>
[1, 5, 1, 1, 5, 2]
True
</pre>
<i>Hint: You don't need to know any special functions to do this; all you need is basic random number generation capability from the </i><tt>random</tt><i> module. The rest of the algorithm is up to you.</i>

In [84]:
import random
def sylPattern(n):
    numlist = []
    k = random.randint(1,n)
    numlist.append(k)
    remain = n - k
    while sum(numlist) < n:
        k = random.randint(1, remain)
        remain = remain - k
        
        if remain < 0:
            numlist = []
        elif remain == 0:
            numlist.append(k)
            break
        numlist.append(k)

    return numlist   
    
x = sylPattern(15)
print x
print sum(x) == 15

[7, 5, 1, 1, 1]
True


6) Write and test a function <tt>randWord(n)</tt> that returns a random word with <tt>n</tt> syllables. For instance:
<pre>
print randWord(6)
</pre>
shows something like:
<pre>
amiability
</pre>

In [85]:
def randWord(n):
    randindex = random.randint(1,len(d2[n]))
    return d2[n][randindex]
print randWord(6)

atherosclerosis


7) Write and test a function <tt>randLine(n)</tt> that returns a line with <tt>n</tt> syllables (separated by spaces). For instance:
<pre>
randLine(10)
</pre>
shows something like:
<pre>
porcupine melodrama gable scot
</pre>

In [86]:
def randLine(n):
    words = [randWord(i) for i in sylPattern(n)]
    wordstr = ""
    for i in words:
         wordstr += " {}".format(i)
    return wordstr
print randLine(10)

 scratch algeo applicability


8) Finally, write and test haiku(). For instance:
<pre>
print haiku()
</pre>
should show a haiku formatted like:
<pre>
psalm degenerate
lapsed land mend holl franchiser
chia ill pint draft
</pre>

In [87]:
def haiku():
    wordlist = [randLine(5), randLine(7), randLine(5)]
    wordstr = ""
    for i in wordlist:
        wordstr += "{}\n".format(i)
    return wordstr
print haiku()

 editha blaesing
 licitra mentions mann theall
 marietta pih



9) You might not recognize all the words in your haikus. That's partly because <tt>d = cmudict.dict()</tt> contains many proper nouns and some strings that aren't really words, like "ths". For the last little bit of credit, go back and make sure that your dictionary <tt>d2</tt> only contains words that are <i>also</i> in <tt>words.words</tt>, which is a list of true English words. Note that if you are not careful, assembling your <tt>d2</tt> with this criterion can take a very long time. If you want these points, your solution should be reasonably efficient. (Assembling <tt>d2</tt> should take no more than a few seconds.) <i>Hint: in general, searching through a list is slow.</i>