# Pronouncing Tutorial and Cookbook

By [Allison Parrish](http://www.decontextualize.com/)

This tutorial will demonstrate how to perform several common tasks with the
Pronouncing library and provide a few examples of how the library can be used
creatively. It's an updated version of the [tutorial and cookbook from the official documentation](https://pronouncing.readthedocs.io/en/latest/tutorial.html).

## Word pronunciations

Let's start by using Pronouncing to get the pronunciation for a given word. Here's the code:

In [2]:
import pronouncing
pronouncing.phones_for_word("permit")

['P ER0 M IH1 T', 'P ER1 M IH2 T']

The `pronouncing.phones_for_word()` function returns a list of all
pronunciations for the given word found in the CMU pronouncing dictionary.
Pronunciations are given using a special phonetic alphabet known as ARPAbet.
[Here's a list of ARPAbet symbols and what English sounds they stand for](http://www.speech.cs.cmu.edu/cgi-bin/cmudict#phones). Each token in a
pronunciation string is called a "phone." The numbers after the vowels indicate
the vowel's stress. The number `1` indicates primary stress; `2` indicates
secondary stress; and `0` indicates unstressed. ([Wikipedia has a good
overview of how stress works in English](https://en.wikipedia.org/wiki/Stress_and_vowel_reduction_in_English),
if you're interested.)
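
To make the notation concrete, here is a minimal sketch in plain Python (it does not call Pronouncing) that splits a pronunciation string into phones and separates each vowel's stress digit. The helper name `split_phone` is hypothetical and exists only for illustration.

```python
def split_phone(phone):
    """Split an ARPAbet phone into (symbol, stress); stress is None for consonants."""
    if phone[-1].isdigit():
        return phone[:-1], int(phone[-1])
    return phone, None

# First pronunciation of "permit", copied from the output above.
phones = "P ER0 M IH1 T".split()
parsed = [split_phone(p) for p in phones]
print(parsed)
# [('P', None), ('ER', 0), ('M', None), ('IH', 1), ('T', None)]
```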

Sometimes, the pronouncing dictionary has more than one pronunciation for the
same word. "Permit" is a good example: it can be pronounced either with the
stress on the first syllable ("do you have a permit to program here?") or
on the second syllable ("will you permit me to program here?"). For this
reason, `pronouncing.phones_for_word()` returns a list of
possible pronunciations. (You'll need to come up with your own criteria for
deciding which pronunciation is best for your purposes.)
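
One possible criterion, sketched below, is to pick the pronunciation whose first vowel carries a particular stress. This is only an illustration: `stress_pattern` and `choose_pronunciation` are hypothetical helpers, not part of Pronouncing, and the input list is the output for "permit" shown earlier.

```python
def stress_pattern(phones):
    """Collect the stress digits of a phone string, e.g. 'P ER0 M IH1 T' -> '01'."""
    return "".join(ch for ch in phones if ch.isdigit())

def choose_pronunciation(pronunciations, wanted_first_stress):
    """Return the first pronunciation whose first vowel has the wanted stress.

    Falls back to the first entry when nothing matches, or None for an empty list.
    """
    for phones in pronunciations:
        if stress_pattern(phones).startswith(wanted_first_stress):
            return phones
    return pronunciations[0] if pronunciations else None

permit = ['P ER0 M IH1 T', 'P ER1 M IH2 T']
noun = choose_pronunciation(permit, "1")  # stress on the first syllable
verb = choose_pronunciation(permit, "0")  # unstressed first syllable
print(noun)  # P ER1 M IH2 T
print(verb)  # P ER0 M IH1 T
```

A real application would likely need more context (part of speech, for instance) than a stress digit alone.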

Here's how to calculate the most common sounds in a given text:

In [4]:
from collections import Counter
text = "april is the cruelest month breeding lilacs out of the dead"
count = Counter()
words = text.split()
for word in words:
    pronunciation_list = pronouncing.phones_for_word(word)
    # skip words that aren't in the dictionary
    if len(pronunciation_list) > 0:
        # tally the phones of the first listed pronunciation
        count.update(pronunciation_list[0].split(" "))

count.most_common(5)

[('AH0', 4), ('L', 4), ('R', 3), ('D', 3), ('DH', 2)]

## Pronunciation search

Pronouncing has a helpful function `pronouncing.search()` which allows you
to search the pronouncing dictionary for words whose pronunciation matches a
particular regular expression. For example, to find words that have within them
the same sounds as the word "sighs":

In [8]:
phones = pronouncing.phones_for_word("sighs")[0]
pronouncing.search(phones)[:5]

['incise', 'incised', 'incisor', 'incisors', 'malloseismic']

For convenience, word-boundary anchors (`\b`) are added automatically to the
beginning and end of the pattern you pass to `pronouncing.search()`. You're
free to include any other regular expression syntax in the pattern. Here's
another example, which finds all of the words that end in "-iddle":

In [9]:
pronouncing.search("IH1 D AH0 L$")[:5]

['biddle', 'criddle', 'diddle', 'fiddle', 'friddle']
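
To see why the word-boundary anchors matter, here is a small sketch using only the standard `re` module. It assumes the pattern is simply wrapped in `\b` on both sides, as described above. The three-entry dictionary is hand-written for illustration, and `toy_search` is a hypothetical stand-in for `pronouncing.search()`.

```python
import re

# Hand-written toy dictionary for illustration only (not the full CMU data).
toy_dict = {
    "fiddle": "F IH1 D AH0 L",
    "tin": "T IH1 N",
    "thing": "TH IH1 NG",
}

def toy_search(pattern):
    """Return words whose phones match the pattern wrapped in word-boundary anchors."""
    regex = re.compile(r"\b" + pattern + r"\b")
    return [word for word, phones in toy_dict.items() if regex.search(phones)]

starts_with_t = toy_search("^T")           # 'T' must be a whole phone, so 'TH' is skipped
ends_in_iddle = toy_search("IH1 D AH0 L$")
unanchored = bool(re.search("^T", toy_dict["thing"]))  # without \b, 'TH' would match

print(starts_with_t)  # ['tin']
print(ends_in_iddle)  # ['fiddle']
print(unanchored)     # True
```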

### Rewriting by phone

The following example rewrites a text by replacing each word with a random
word that begins with the same first two phones:

In [30]:
import random
text = 'april is the cruelest month breeding lilacs out of the dead'
out = list()
for word in text.split():
    phones = pronouncing.phones_for_word(word)[0]
    first2 = phones.split()[:2]
    out.append(random.choice(pronouncing.search("^" + " ".join(first2))))
print(' '.join(out))

apec izzy's them crosley multifoods brunches lifer outlined of themselves devils


## Counting syllables

To get the number of syllables in a word, first get one of its pronunciations
with `pronouncing.phones_for_word()` and pass the resulting string of
phones to `pronouncing.syllable_count()`, like so:

In [12]:
pronunciation_list = pronouncing.phones_for_word("programming")
pronouncing.syllable_count(pronunciation_list[0])

3

The following example calculates the total number of syllables in a text
(assuming that all of the words are found in the pronouncing dictionary):

In [15]:
text = "april is the cruelest month breeding lilacs out of the dead"
phones = [pronouncing.phones_for_word(p)[0] for p in text.split()]
sum([pronouncing.syllable_count(p) for p in phones])

15
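
If some words might be missing from the dictionary, the list comprehension above will fail with an `IndexError`. Below is a minimal sketch of a more forgiving version. The names `count_syllables` and `total_syllables` are hypothetical, the syllable count is approximated by counting phones that end in a stress digit, and a hand-written dictionary stands in for `pronouncing.phones_for_word` so the example runs on its own.

```python
def count_syllables(phones):
    """Count vowels: in ARPAbet, every vowel phone ends in a stress digit."""
    return sum(1 for phone in phones.split() if phone[-1].isdigit())

def total_syllables(text, lookup):
    """Sum syllables over the words in text and collect words lookup cannot find.

    lookup is any callable that returns a list of pronunciation strings,
    such as pronouncing.phones_for_word.
    """
    total, missing = 0, []
    for word in text.split():
        pronunciations = lookup(word)
        if pronunciations:
            total += count_syllables(pronunciations[0])
        else:
            missing.append(word)
    return total, missing

# Hand-written stand-in for the dictionary, for illustration only.
toy = {"the": ["DH AH0"], "dead": ["D EH1 D"], "permit": ["P ER0 M IH1 T"]}
result = total_syllables("permit the dead zzyzx", lambda w: toy.get(w, []))
print(result)  # (4, ['zzyzx'])
```

Passing `pronouncing.phones_for_word` as `lookup` should work the same way, since it returns an empty list for unknown words.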

## Meter

Pronouncing includes a number of functions to help you isolate metrical
characteristics of a text. You can use `pronouncing.stresses()`
to get a string that represents the "stress pattern" of a string of
phones:

In [16]:
phones_list = pronouncing.phones_for_word("snappiest")
pronouncing.stresses(phones_list[0])

'102'

A "stress pattern" is a string that contains only the stress values from a
sequence of phones. (The numbers indicate the level of stress: `1` for
primary stress, `2` for secondary stress, and `0` for unstressed.)

You can use `pronouncing.search_stresses()` to find words based on their
stress patterns. For example, to find words that have two iambs in them
("iamb" is a metrical foot consisting of one unstressed syllable followed by
a stressed syllable):

In [26]:
pronouncing.search_stresses("0101")

['champaign-urbana',
 'chlorofluorocarbons',
 'coagulation',
 'computer-generated',
 'eosinophilia',
 'eosinophilic',
 'evacuation',
 'evacuations',
 'exclamation-point',
 'garrido-luna',
 'intergenerational',
 'japanimation',
 'kamehameha',
 "kamehameha's",
 'laryngoscopic',
 'laryngoscopic',
 'laryngoscopical',
 'laryngoscopical',
 'laryngoscopicaly',
 'laryngoscopicaly',
 'miscalculation',
 'miscalculations',
 'moholy-nagy',
 'momigliano',
 'overwhelmability',
 'regeneration',
 'revaluation',
 'ring-around-the-rosy',
 'rio-de-janeiro',
 'yabbadabbadoo',
 'ylang-ylang']

If you sound out a few of the words above, you'll notice that this function, like `pronouncing.search()`, does *not* anchor the pattern to the beginning or end of a word's pronunciation by default. The list above includes all words that *contain* two iambs, not all words that consist *only* of two iambs. To find the latter, add the `^` and `$` regular expression anchors to the beginning and end of the search string:

In [31]:
pronouncing.search_stresses("^0101$")

['ylang-ylang']
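
The difference between the two searches can be reproduced with the standard `re` module alone. The sketch below assumes the stress pattern is matched anywhere in the stress string (search semantics), as the results above suggest. The stress strings are hand-written for illustration rather than looked up in the dictionary.

```python
import re

# Hand-written stress strings for illustration; not taken from the dictionary.
stress_strings = ["0101", "20101", "01010", "0102"]

def matching(pattern):
    """Return the stress strings in which the pattern is found anywhere."""
    return [s for s in stress_strings if re.search(pattern, s)]

contains_two_iambs = matching("0101")
exactly_two_iambs = matching("^0101$")

print(contains_two_iambs)  # ['0101', '20101', '01010']
print(exactly_two_iambs)   # ['0101']
```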

You can use regular expression syntax inside of the patterns you give to
`pronouncing.search_stresses()`. For example, because the CMU Pronouncing Dictionary's transcriptions are not especially systematic in assigning primary and secondary stress, it's often useful to search for syllables that carry either one. Repeating the above search for words consisting of exactly two iambs, but accepting either primary or secondary stress on the stressed syllables, returns many more results (so many that I show only the first twelve):

In [33]:
pronouncing.search_stresses("^0[12]0[12]$")[:12]

['abbreviate',
 'abbreviates',
 'abdulaziz',
 'abilities',
 'ability',
 'accelerate',
 'accelerates',
 'accentuates',
 'accessorize',
 'accessorized',
 'accommodate',
 'accommodates']

Another quick example: find all words wholly consisting of two anapests (unstressed, unstressed, stressed), with "stressed" meaning either primary stress or secondary stress:

In [27]:
pronouncing.search_stresses("^00[12]00[12]$")

['neopositivist', 'undercapitalize', 'undercapitalized']
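
Patterns like these are easy to generate. The sketch below builds an anchored stress pattern from a foot name and a repeat count. The `FEET` table and `foot_pattern` helper are hypothetical conveniences, not part of Pronouncing, and `S` is a placeholder for a syllable with either primary or secondary stress.

```python
# Hypothetical foot table; "S" stands for a stressed syllable (primary or secondary).
FEET = {"iamb": "0S", "trochee": "S0", "anapest": "00S", "dactyl": "S00"}

def foot_pattern(foot, count):
    """Build an anchored stress regex for `count` repetitions of a metrical foot."""
    one_foot = FEET[foot].replace("S", "[12]")
    return "^" + one_foot * count + "$"

two_iambs = foot_pattern("iamb", 2)
two_anapests = foot_pattern("anapest", 2)

print(two_iambs)     # ^0[12]0[12]$
print(two_anapests)  # ^00[12]00[12]$
```

The resulting strings match the patterns used above, so they can be passed directly to `pronouncing.search_stresses()`.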

### Rewriting by stress

The following example rewrites a text, replacing each word with a random word
that has the same stress pattern:

In [35]:
out = []
text = 'april is the cruelest month breeding lilacs out of the dead'
for word in text.split():
    pronunciations = pronouncing.phones_for_word(word)
    pat = pronouncing.stresses(pronunciations[0])
    replacement = random.choice(pronouncing.search_stresses("^"+pat+"$"))
    out.append(replacement)
' '.join(out)

"clubbing thune hye pilots' strolled bolduc equine last's maxed who've flo"

## Rhyme

Pronouncing includes a simple function, `pronouncing.rhymes()`, which
returns a list of words that (potentially) rhyme with a given word. You can use
it like so:

In [37]:
pronouncing.rhymes("failings")

['mailings', 'railings', 'tailings']

The `pronouncing.rhymes()` function returns a list of all possible rhymes
for the given word, that is, words that rhyme with any of the given word's
pronunciations. If you only want rhymes for one particular pronunciation, the
`pronouncing.rhyming_part()` function gives a smaller part of a string
of phones that can be used with `pronouncing.search()` to find rhyming
words. The following code demonstrates how to find rhyming words for two
different pronunciations of "uses":

In [38]:
pronunciations = pronouncing.phones_for_word("uses")
sss = pronouncing.rhyming_part(pronunciations[0])
zzz = pronouncing.rhyming_part(pronunciations[1])
print(pronouncing.search(sss + "$")[:5])
print(pronouncing.search(zzz + "$")[:5])

["bruce's", 'juices', 'medusas', 'produces', 'reduces']
['abuses', 'cabooses', 'disabuses', 'excuses', "goose's"]
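
As a rough illustration of what a "rhyming part" can look like, the sketch below returns everything from the last stressed vowel (stress `1` or `2`) to the end of the phone string. This definition is an assumption made for the example, and `rhyming_part_sketch` is a hypothetical helper; consult the library's documentation for the exact behavior of `pronouncing.rhyming_part()`.

```python
def rhyming_part_sketch(phones):
    """Return the phones from the last stressed vowel (stress 1 or 2) to the end."""
    phone_list = phones.split()
    # Walk backwards until a phone ending in a primary or secondary stress digit is found.
    for i in range(len(phone_list) - 1, -1, -1):
        if phone_list[i][-1] in "12":
            return " ".join(phone_list[i:])
    return phones  # no stressed vowel: fall back to the whole string

permit_part = rhyming_part_sketch("P ER0 M IH1 T")
fiddle_part = rhyming_part_sketch("F IH1 D AH0 L")

print(permit_part)  # IH1 T
print(fiddle_part)  # IH1 D AH0 L
```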


Use the `in` operator to check whether one word rhymes with another:

In [39]:
"wheeze" in pronouncing.rhymes("cheese")

True

In [40]:
"geese" in pronouncing.rhymes("cheese")

False

### Rewriting with rhymes

The following example rewrites a text, replacing each word with a rhyming
word (when a rhyming word is available):

In [42]:
text = 'april is the cruelest month breeding lilacs out of the dead'
out = list()
for word in text.split():
    rhymes = pronouncing.rhymes(word)
    if len(rhymes) > 0:
        out.append(random.choice(rhymes))
    else:
        out.append(word)
print(' '.join(out))

april chandeliers the coolest month reading counterattacks gout above the red


## Next steps

Hopefully this is just the beginning of your rhyme- and meter-filled journey.
Consult [the Pronouncing documentation](https://pronouncing.readthedocs.io/en/latest/) for more information about individual functions in the library.

Pronouncing is just one possible interface for the CMU pronouncing dictionary,
and you may find that for your particular purposes, a more specialized
approach is necessary. In that case, feel free to [peruse Pronouncing's source
code](http://github.com/aparrish/pronouncingpy) for helpful hints and tidbits.