Replace cmudict-0.4 dictionary with amepd. #36

Merged
merged 1 commit into from May 18, 2016

forslund
Collaborator

forslund commented May 5, 2016

What it is

Amepd (the American pronunciation dictionary) is a dictionary based on cmudict-0.7 with many additions. The source dictionary is available at http://github.com/rhdunn/amepd.

This would resolve the wrongly pronounced words listed in #15. @rhdunn, the maintainer of the dictionary, will add "Mycroft" to the dictionary in the next update, so this should not be merged before then.

In the meantime, it would be great if people tried the dictionary and verified the results.

@rhdunn
Contributor

rhdunn commented May 6, 2016

I have added Mycroft in the latest master commit to amepd, along with several new words and some pronunciation fixes.

@forslund
Collaborator Author

forslund commented May 6, 2016

Great! I'm currently out camping, but I'll update the PR with your updates when I return home (tomorrow evening, Sunday at the latest).

@ryanleesipes

I'm also pulling this down to play with it. I'll try to find some time to provide feedback.

@rhdunn
Contributor

rhdunn commented May 10, 2016

Not a problem with the dictionary, but since mimic removes ' entries:

1. 'd sounds wrong in places -- it is not always /IH0 D/:

   a. the word ends with /D/ or /T/, or is an adjective: use /IH0 D/;
   b. the word ends with a voiced consonant or a vowel (e.g. CALL'D): use /D/;
   c. otherwise the word ends with an unvoiced consonant (e.g. BLESS'D as a verb): use /T/.

   NOTE: This also needs to handle words like ENCLOS'D, where the O takes the OW rule for oCe words (an o, followed by a consonant, followed by an e).

2. 'st is not supported. It is found e.g. in William Shakespeare's "The Phoenix and the Turtle" (Project Gutenberg 1525), in words such as GIV'ST (i.e. an -est ending, expanded to GIVEST), and should be pronounced /IH0 S T/.
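The suffix rules listed above can be sketched as a small shell helper. This is only an illustration, not mimic's actual code: the function names `apostrophe_d_phones` and `apostrophe_st_phones` are hypothetical, the set of unvoiced consonants is an assumption based on the phone labels used by cmudict, and the vowel change needed for words like ENCLOS'D is not handled.

```shell
#!/bin/sh
# Hypothetical helper (not part of mimic): pick the phones for a 'D suffix.
#   $1 = last phone of the stem, in cmudict notation (e.g. L, S, AO1)
#   $2 = "yes" if the word is used as an adjective (optional, default "no")
apostrophe_d_phones() {
    last_phone=$1
    is_adjective=${2:-no}

    # Rule (a), adjective case: always the full syllable.
    if [ "$is_adjective" = "yes" ]; then
        echo "IH0 D"
        return 0
    fi

    case "$last_phone" in
        # Rule (a): stem ends with /D/ or /T/.
        D|T) echo "IH0 D" ;;
        # Rule (c): stem ends with an unvoiced consonant (assumed set).
        P|K|F|TH|S|SH|CH|HH) echo "T" ;;
        # Rule (b): voiced consonants and vowels (vowels carry a stress digit).
        *) echo "D" ;;
    esac
}

# Hypothetical helper: the 'ST suffix (an -est ending) has a single reading.
apostrophe_st_phones() {
    echo "IH0 S T"
}
```

For example, CALL ends in the phone L, so `apostrophe_d_phones L` prints `D`, while BLESS ends in S, so `apostrophe_d_phones S` prints `T`.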

@forslund
Collaborator Author

Is 1 worse with the new dict than with the old cmudict-0.4? Can you give an example where mimic fails?

@forslund
Collaborator Author

Just noticed the examples. Sorry about that :)

@rhdunn
Contributor

rhdunn commented May 10, 2016

Not worse. The words I gave as examples are cases where mimic's handling of ' entries (stripping them out) does not always give the correct pronunciation. I'm using "The Phoenix and the Turtle" as a test case.
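One way to collect such test words from a plain-text copy of a poem is sketched below. The function name `list_apostrophe_words` is hypothetical, and the sketch assumes the text uses ASCII apostrophes (typographic quotes would need to be normalised first).

```shell
#!/bin/sh
# Hypothetical helper: list the distinct words ending in 'D or 'ST
# found in the text file given as $1, one upper-cased word per line.
list_apostrophe_words() {
    # 1. Put every run of letters/apostrophes on its own line.
    # 2. Upper-case the words to match the dictionary's spelling.
    # 3. Keep only words that end in 'D or 'ST.
    # 4. Sort and drop duplicates.
    tr -cs "[:alpha:]'" '\n' < "$1" \
        | tr '[:lower:]' '[:upper:]' \
        | grep -E "^[A-Z]+'(D|ST)\$" \
        | sort -u
}
```

Running `list_apostrophe_words poem.txt` (where `poem.txt` is a placeholder file name) gives a word list that can be fed to the synthesizer and checked by ear.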

@forslund
Collaborator Author

Ok. I'll have to dig into the code and see if we can improve this. At least for systems with larger memory...

@zeehio
Contributor

zeehio commented May 16, 2016

If I am not mistaken, @rhdunn is taking care of the apostrophes in #39.

@forslund, if I understand this properly, you rebuilt the cmulex using the amepd dictionary, and this pull request contains the resulting models. To build this, you used the lang/cmulex/make_cmulex script, manually replacing the lang/cmulex/festival/lib/dict/cmudict-0.4.scm file with the amepd dictionary. Am I right?

@forslund
Collaborator Author

@zeehio basically, yes. I changed the Makefile target for cmudict-0.4.out so that its recipe is currently:

        cat amepd.scm cmudict_extensions.scm >all.scm
        ${ESTDIR}/../festival/bin/festival -b cmudict_compile.scm
        rm -f all.scm

and rebuilt the outfile. I also created a no-setup target in the make_cmulex script:

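# Run every stage except setup; stop at the first stage that fails.
if [ "$1" = "no-setup" ]
then
   $0 lts  || exit 1
   $0 lex  || exit 1
   $0 compresslex || exit 1
   $0 install || exit 1
   echo "make_cmulex finished successfully"
   exit 0
fi

This skips the setup stage, so the replaced files weren't overwritten.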

@zeehio
Contributor

zeehio commented May 18, 2016

@forslund, keeping the first commit of this PR does not make much sense. Could you please squash the commits into one?

As I see it, feel free to merge :-)

Amepd (the American pronunciation dictionary) is a dictionary based on cmudict-0.7 with many additions. The source dictionary is available at http://github.com/rhdunn/amepd.

The updated dictionary is based on amepd from May 6th.

@codecov-io

Current coverage is 15.85%

Merging #36 into master will not change coverage

@@             master        #36   diff @@
==========================================
  Files            89         89          
  Lines          9487       9487          
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
  Hits           1504       1504          
  Misses         7983       7983          
  Partials          0          0          

Powered by Codecov. Last updated by e30edcd...6c64a3d

@forslund forslund merged commit 25d6601 into development May 18, 2016
@forslund
Collaborator Author

After doing the manual squash, I noticed the "squash and merge" GitHub button. Anyway, it's squashed and merged now.

@zeehio
Contributor

zeehio commented May 18, 2016

If I had known it existed, I'd have merged it myself.
