<a href="https://colab.research.google.com/github/MK316/workshop22/blob/main/class03_pronunciation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 📘 **Topic 03 Pronunciation teaching**

**Table of Contents:**  
using **{gTTS}** Text-to-Speech & CMU pronunciation dictionary.  

* Exposure to Keyword pronunciation (using 📍_frequency distribution, gTTS_)
* English rhyming (using 📍_CMU dictionary_): e.g., night, right, bite, etc.
* Learning English vowels with rhyming words.


* [🚫 Not Yet Available] Phonics for adult learners (using 📍_gTTS_): Sound-letter correspondences

💾 Sample text: [Aesop fable: The Heron](https://raw.githubusercontent.com/MK316/workshop22/main/data/TheHeron.txt) Copy it and have it ready to paste below :-)

In [1]:
#@markdown 🔳 Paste the text here for analysis: (text)
text = input()

 A Heron was walking sedately along the bank of a stream, his eyes on the clear water, and his long neck and pointed bill ready to snap up a likely morsel for his breakfast. The clear water swarmed with fish, but Master Heron was hard to please that morning. "No small fry for me," he said. "Such scanty fare is not fit for a Heron. "Now a fine young Perch swam near. "No indeed," said the Heron. "I wouldn't even trouble to open my beak for anything like that!" As the sun rose, the fish left the shallow water near the shore and swam below into the cool depths toward the middle. The Heron saw no more fish, and very glad was he at last to breakfast on a tiny Snail. Do not be too hard to suit or you may have to be content with the worst or with nothing at all.


> 🔲 **Preprocessing of the text**

Set up for processing: no action required.

In [23]:
#@markdown 🔳 Install and import packages
%%capture
import nltk
from nltk.tokenize import word_tokenize, sent_tokenize
nltk.download("punkt")

!pip install corpus-toolkit

#@markdown 🔳 Create a folder named "txtdata" for further processing
import os
os.makedirs("txtdata", exist_ok=True)  # no error if the cell is run twice

#@markdown 🔳 Write text to a file under 'txtdata' folder

with open('txtdata/mytext.txt','w') as f:
  f.write(text)


> 🔲 **Frequency analysis:** your actions required (2 times)
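
As a rough illustration of what a frequency analysis computes, here is a minimal sketch that uses only the Python standard library rather than NLTK or corpus-toolkit. The function name `word_frequencies` and the tiny `demo_stopwords` set are made up for this example; NLTK's English stopword list is much longer, so counts on real text will differ.

```python
import re
from collections import Counter

def word_frequencies(text, stopwords=frozenset()):
    """Return a Counter of lower-cased word tokens, skipping stopwords."""
    tokens = re.findall(r"\w+", text.lower())
    return Counter(t for t in tokens if t not in stopwords)

# Hand-made stopword set, for illustration only.
demo_stopwords = frozenset({"a", "the", "was", "to", "on", "and", "his"})

sample = "The Heron saw no more fish, and the Heron was hard to please."
freq = word_frequencies(sample, demo_stopwords)

# Most frequent remaining words first
print(freq.most_common(3))
```

Words that survive the stopword filter and occur often are reasonable candidates for the keyword pronunciation practice in section [1].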

In [None]:
#@markdown 🔳 Tokenize, getting frequency list with tagging information

from nltk.tokenize import RegexpTokenizer
retokenize = RegexpTokenizer(r"[\w]+")  # raw string so \w is read as a regex class
words = retokenize.tokenize(text)
print('Before stopwords: %d'%len(words))

# Lower case
words = [w.lower() for w in words]
#@markdown 🔳 Remove stopwords for frequency distribution analysis:

# import stopwords from nltk.corpus

from nltk.corpus import stopwords
nltk.download("stopwords")

stop_set = set(stopwords.words('english'))  # build the list once instead of once per word
words = [w for w in words if w not in stop_set]
print('After stopwords: %d'%len(words))

#@markdown 🔳 POS tagging

from corpus_toolkit import corpus_tools as ct

brown_corp = ct.ldcorpus("txtdata") #load and read text files under 'txtdata' directory
tok_corp = ct.tokenize(brown_corp)  #tokenize corpus - by default this lemmatizes as well
brown_freq = ct.frequency(tok_corp) #creates a frequency dictionary

ct.write_corpus("tagged_txt",ct.tag(ct.ldcorpus("txtdata")))

tagged_freq = ct.frequency(ct.reload("tagged_txt"))
# ct.head(tagged_freq, hits = 10)

#@markdown 🔳 Result saving as a csv file with POS information

import pandas as pd
data_dict = tagged_freq
data_items = data_dict.items()
data_list = list(data_items)
df = pd.DataFrame(data_list)

df.columns=['Tagged_words','Freq']

mycol = list(df['Tagged_words'])

# print(df)

# Word, POS into dataframe

wlist = []
cat = []

for w in mycol:
  w1 = w.split("_")
  wlist.append(w1[0])
  cat.append(w1[1])

df['Word'] = wlist
df['POS'] = cat

#@markdown 🔳 🚩 Sorting by? Answer [pop up box]

print("Sorting by Frequency (type '1'), POS & Freq (type '2'), or by Word alphabetically (type '3')")
sorting = input()

choice = sorting.strip()
if choice == "1":
  df = df.sort_values(by=['Freq'], ascending = False)
elif choice == "2":
  df = df.sort_values(by=['POS', 'Freq'], ascending = False)
elif choice == "3":
  df = df.sort_values(by=['Word'], ascending = True)
else:
  # only reached when the answer is not 1, 2, or 3
  print("Type 1, 2, or 3")
df['Index'] = range(1,len(df['POS'])+1)

df = df[["Index", "POS", "Word","Freq"]]
# print df.to_string(index=False)

#@markdown 🔳 🚩 Saving file? Answer [pop up box]

print('Save it as a file? (y/n)')
saving = input()

answer = saving.strip().lower()
if answer.startswith("y"):
  df.to_csv('pos_wordlist.csv')
  print('File is saved: pos_wordlist.csv')
elif answer.startswith("n"):
  print('No file will be saved.')

df.head()


## **[1] Generating audio file of word reading**  
Result file => df

In [110]:
#@markdown 🚩 {gTTS} package installation and import
%%capture
!pip install gTTS
from gtts import gTTS
from IPython.display import Audio

In [109]:
#@markdown Word reading by gTTS:

#@markdown 🚩 Select word POS: 

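word_POS_select = "NOUN" #@param ["NOUN","VERB","ADJ","ADV","PROPN","ALL"]

# "ALL" keeps every row; any other choice filters by that POS tag
wordlist = df if word_POS_select == "ALL" else df[df['POS'] == word_POS_select]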
wordlist = wordlist.sort_values(by=['Word'], ascending = True)

collist = list(wordlist['Word'])

print(collist)

#@markdown 🚩 Language to choose: (English, Korean, French, Spanish)
def tts(mytext):
  # Step ⓵ Language to choose:
  language_to_choose = "en" #@param ["en", "ko", "fr", "es"]
  print("Play language accent: %s"%language_to_choose)

  gtts_object = gTTS(text = mytext,
                     lang = language_to_choose,
                     slow = False)

  # gTTS writes MP3 data, so save under an .mp3 name
  gtts_object.save("mytext.mp3")
  return Audio("mytext.mp3")

text_to_say = '. '.join(collist)
intro_text = "Okay. I'm going to read a wordlist, so repeat after me."
text_to_say = intro_text + " " + text_to_say  # space so the intro and first word are not fused
tts(text_to_say)


['bank', 'beak', 'bill', 'breakfast', 'depth', 'eye', 'fare', 'fish', 'fry', 'middle', 'morning', 'morsel', 'neck', 'scanty', 'shore', 'stream', 'sun', 'swam', 'water']
Play language accent: en


## **[2] Rhyming words**

**_Note:_** In the following, you'll get rhyming words from the CMU dictionary.  
* [CMU pronunciation dictionary](http://www.speech.cs.cmu.edu/cgi-bin/cmudict)

* [CMU tools](http://www.speech.cs.cmu.edu/tools/lextool.html)


😊 Let's first set up target words for today.

In [136]:
#@markdown **Step 1:** Install {pronouncing}
%%capture
!pip install pronouncing

import pronouncing

\[Description]: 

* For a target word (your choice), we'll find rhyming words with the same number of syllables. _e.g., 'grow' (1 syllable) => the result is a list of one-syllable words (randomly chosen from the CMU dictionary)._

* We'll then create an audio file reading the rhyming wordlist.
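
To make the idea concrete, here is a minimal sketch of rhyme matching and syllable counting over ARPAbet transcriptions, the notation the CMU dictionary uses. It is a simplified approximation and not the `pronouncing` package's own code. The `DEMO_PHONES` entries are a few hand-copied dictionary-style transcriptions, and the function names are hypothetical.

```python
# Hand-copied CMUdict-style entries (ARPAbet); digits on vowels mark stress.
DEMO_PHONES = {
    "night": "N AY1 T",
    "right": "R AY1 T",
    "bite": "B AY1 T",
    "grow": "G R OW1",
    "glow": "G L OW1",
    "below": "B IH0 L OW1",
}

def syllable_count(phones):
    """Count vowels: in ARPAbet only vowel symbols end in a stress digit."""
    return sum(1 for p in phones.split() if p[-1].isdigit())

def rhyming_part(phones):
    """Return the phones from the last stressed vowel (1 or 2) to the end."""
    symbols = phones.split()
    for i in range(len(symbols) - 1, -1, -1):
        if symbols[i][-1] in "12":
            return " ".join(symbols[i:])
    return phones  # no stressed vowel found

def demo_rhymes(word, same_syllables=True):
    """List demo words whose rhyming part matches that of `word`."""
    target = DEMO_PHONES[word]
    matches = []
    for other, phones in DEMO_PHONES.items():
        if other == word:
            continue
        if rhyming_part(phones) != rhyming_part(target):
            continue
        if same_syllables and syllable_count(phones) != syllable_count(target):
            continue
        matches.append(other)
    return matches

print(demo_rhymes("night"))                       # ['right', 'bite']
print(demo_rhymes("grow"))                        # ['glow']
print(demo_rhymes("grow", same_syllables=False))  # ['glow', 'below']
```

The syllable filter is what drops 'below' from the default result for 'grow': it rhymes, but it has two syllables.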

In [170]:
#@markdown **Step 2:** 	🚩 Find rhyming words with ____:

print("Write a word to search rhyming words:")
rhyme_with = input()
word = rhyme_with

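phones = pronouncing.phones_for_word(word)
if not phones:
  # phones_for_word returns an empty list for words missing from the dictionary
  raise ValueError("'%s' is not in the CMU dictionary; try another word."%word)
syll_count = pronouncing.syllable_count(phones[0])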

print("Okay, I'm searching for rhyming words with %s."%word)

result = pronouncing.rhymes(word)

print('How many rhyming words to search? (1~20)')
n_words = input()
n_words = int(n_words)

# Among the result, select words with same syllable count

wlist = []
throw = []

def syllcount(x):
  phones = pronouncing.phones_for_word(x)
  n = pronouncing.syllable_count(phones[0])
  return n

for w in result:
  if syllcount(w) == syll_count:
    wlist.append(w)
  else:
    throw.append(w)

# random sample (capped at the number of rhymes actually found)
import random

n_words = min(n_words, len(wlist))
temp = random.sample(wlist, n_words)

wlist = '. '.join(temp)
intro = "These words rhyme with " + str(rhyme_with) + ". " + "Are you ready to listen? "

wlist = intro + wlist
print("Rhyming words: %s"%wlist)

#@markdown tts() function should be defined above [1]:
tts(wlist)


Write a word to search rhyming words:
grow
Okay, I'm searching for rhyming words with grow.
How many rhyming words to search? (1~20
20
Rhyming words: These words rhyme with grow. Are you ready to listen?noh. m'bow. noe. ro. au. glow. vo. floe. owe. sgro. breau. choe. lo. gogh. roe. stow. koh. crow. beaux. crowe
Play language accent: en


## **[3] Learning English vowels with rhyming words**

### English vowels: 14 vowels

![](https://github.com/MK316/workshop22/raw/main/img/englishvowels1.png)
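
The 14 keywords offered in the vowel menu can be related to CMU dictionary transcriptions through their stressed vowel. The sketch below is an illustration under assumptions: the `KEYWORD_VOWELS` mapping to ARPAbet symbols is hand-written for this example (it is not taken from the chart above), and the helper names are hypothetical.

```python
# Keyword -> ARPAbet vowel symbol (stress digit omitted); hand-written mapping.
KEYWORD_VOWELS = {
    "bean": "IY", "bin": "IH", "Ben": "EH", "ban": "AE",
    "mood": "UW", "would": "UH", "bought": "AO", "nod": "AA",
    "mud": "AH", "go": "OW", "how": "AW", "bay": "EY",
    "bye": "AY", "boy": "OY",
}

def stressed_vowel(phones):
    """Return the primary-stressed vowel of an ARPAbet string, without its digit."""
    for symbol in phones.split():
        if symbol.endswith("1"):
            return symbol[:-1]
    return None

def keyword_for(phones):
    """Find the menu keyword whose vowel matches the stressed vowel in `phones`."""
    vowel = stressed_vowel(phones)
    for keyword, symbol in KEYWORD_VOWELS.items():
        if symbol == vowel:
            return keyword
    return None

print(len(KEYWORD_VOWELS))       # 14 keywords, one per vowel
print(keyword_for("F L AH1 D"))  # 'flood' shares its vowel with 'mud'
```

A lookup like this could be used to check that a rhyming word really carries the target vowel before it is read aloud.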

In [171]:
#@markdown **Step 1:** Install {pronouncing}
%%capture
!pip install pronouncing
!pip install gTTS

from gtts import gTTS
from IPython.display import Audio
import pronouncing

In [195]:
#@markdown **Step 2:** 🚩 Vowels to learn:

Target_vowel_as_in = "mud" #@param ["bean","bin","Ben","ban","mood","would","bought","nod", "mud", "go","how","bay","bye","boy"]

word = Target_vowel_as_in

# phones = pronouncing.phones_for_word(word)
# syll_count = pronouncing.syllable_count(phones[0])

result = pronouncing.rhymes(word)

# syllable number counting as function
def syllcount(x):
  phones = pronouncing.phones_for_word(x)
  n = pronouncing.syllable_count(phones[0])
  return n

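wlist = []
throw = []  # rhymes that fail the filter

for w in result:
  # keep one- or two-syllable words spelled with more than one letter
  if (syllcount(w) in (1, 2)) and (len(w) > 1):
    wlist.append(w)
  else:
    throw.append(w)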

print("Rhyming words: %d"%len(wlist))

How_many_words_to_show = "10" #@param ["5","10","15","20","30"]
n_words = min(int(How_many_words_to_show), len(wlist))  # never ask for more words than were found
# random sample
import random

temp = random.sample(wlist, n_words)

# wordlist as dataframe to display
import pandas as pd
dft = pd.DataFrame()
dft['Words'] = temp

wlist = '. '.join(temp)
intro = "These are randomly chosen " + str(n_words) + " words that rhyme with " + str(word) + ". " + "Are you ready to listen? "

wlist = intro + wlist
print("Rhyming words: \n %s"%wlist)
print("*** Target vowel in '%s'"%word)
print(dft)
#@markdown tts() function should be defined above [1]:
tts(wlist)


Rhyming words: 26
Rhyming words: 
 These are randomly chosen 10 words that rhyme with mud. Are you ready to listen? flood. dud. spud. thud. budd. dudd. mudd. judd. uhde. rud
*** Target vowel in 'mud'
   Words
0  flood
1    dud
2   spud
3   thud
4   budd
5   dudd
6   mudd
7   judd
8   uhde
9    rud
Play language accent: en


## [4] Phonics for adult learners: Letter-Sound correspondences