# Immersion dictionary generation
2018.05 mitchellpkt@protonmail.com

**Generation of {English text} to {IPA symbols} dictionary.**

Imports the dictionary specified by variable `DictStr`, then uses mphilli's `eng_to_ipa` to convert to IPA using the CMU database.

Part of a project for in-browser English-to-IPA conversion, for practicing IPA by written immersion. In the process, I added multi-OS support to `eng_to_ipa`, merged in this pull request: https://github.com/mphilli/English-to-IPA/pull/4

The output JSON files can be loaded into the FoxReplace extension en masse, using its import from file or URL.

## Import IPA module

In [1]:
import sys
import os

# make the local copy of eng_to_ipa (and its resources) importable
basepath = os.getcwd()
sys.path.insert(0, os.path.join(basepath, "English-to-IPA-master"))
sys.path.insert(0, os.path.join(basepath, "English-to-IPA-master", "eng_to_ipa"))
sys.path.insert(0, os.path.join(basepath, "English-to-IPA-master", "eng_to_ipa", "resources"))

# use eng_to_ipa by mphilli for text-to-symbol conversion
# https://github.com/mphilli/English-to-IPA
import eng_to_ipa as ipa

## Test IPA module

In [2]:
ipa.convert("Immersion: If the output contains this string in IPA, then eng_to_ipa is correctly configured")

'ˌɪˈmərʒən: ɪf ðə ˈaʊtˌpʊt kənˈtenz ðɪs strɪŋ ɪn ipa*, ðɛn eng_to_ipa* ɪz kərˈɛktli kənˈfɪgjərd'
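
In the output above, `ipa` and `eng_to_ipa` come back unchanged with a trailing asterisk, which appears to be how words missing from the CMU database are flagged. The following is a minimal sketch, assuming that convention holds, for separating converted entries from unconverted ones before export. `split_unconverted` is a hypothetical helper, not part of `eng_to_ipa`.

```python
def split_unconverted(pairs):
    """Split (english, ipa) pairs into converted and unconverted lists.

    Assumption: a trailing '*' on any token marks a word that was not found.
    """
    converted, unconverted = [], []
    for english, ipa_text in pairs:
        # strip surrounding punctuation so a token like "ipa*," is still detected
        tokens = [tok.strip(".,:;!?\"'()") for tok in ipa_text.split()]
        if any(tok.endswith("*") for tok in tokens):
            unconverted.append((english, ipa_text))
        else:
            converted.append((english, ipa_text))
    return converted, unconverted


sample = [("immersion", "ˌɪˈmərʒən"), ("ipa", "ipa*")]
converted, unconverted = split_unconverted(sample)
print(unconverted)
```

Dropping the unconverted entries keeps the browser from replacing a word with a copy of itself plus an asterisk.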

## Import English dictionary

In [3]:
DictStr = "top1e5"
# options: 
#     "top1e5" = top 10k English words from Google
#     "pbs" = Public Brand Software dictionary, ~1e6 words... 
#     "CMU_dict" = extracted back from CMU IPA dictionary. ~1.3e6 words

# open the dictionary (closed automatically on leaving the block) and split into words
with open(os.path.join("Wordlists", DictStr + ".txt"), "r") as EngDictFile:
    EngDict = EngDictFile.read()
EngDictWords = EngDict.split()

## Use eng_to_ipa to store the IPA symbols

Beware: for large dictionaries this can take on the order of 10 minutes. Set `qVerbose = 1` for periodic progress feedback, or a value above 1 to also print each word as it is converted.

In [4]:
qVerbose = 1  # 0 = silent; 1 = progress every 1000 words; 2 = also print each word (small subsets only)

e2iDict = []
for counter, word_Eng in enumerate(EngDictWords):
    word_IPA = ipa.convert(word_Eng)
    e2iDict.append([word_Eng, word_IPA])
    if qVerbose and counter % 1000 == 0:
        print(str(round(counter / len(EngDictWords) * 100)) + " %")
    if qVerbose > 1:
        print(word_Eng + " --> " + word_IPA)

0 %
10 %
20 %
30 %
40 %
51 %
61 %
71 %
81 %
91 %


## Review dictionary (iff qVerbose == 1)

In [5]:
qVerbose = 0  # 0 or 1; 1 pretty-prints every [English, IPA] pair. Do not run on 1e6 words!

if qVerbose:
    from pprint import pprint
    pprint(e2iDict)

## Export dictionary for use with browser plug-in, etc

FoxReplace-style `.json`, in two variations:

- IPA alone, e.g. ˌɪˈmərʒən
- Eng/IPA, e.g. ˌɪˈmərʒən (immersion)
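
The two variations differ only in the `output` string and a couple of settings, so the file could also be assembled as a Python structure and serialized with the standard `json` module, which takes care of quoting and escaping. This is a sketch rather than the method used in the cells below. It assumes the same field names and values as the header and tail strings in those cells, and `build_foxreplace` is a hypothetical helper.

```python
import json


def build_foxreplace(pairs, with_english=False):
    """Return a FoxReplace-style JSON string for (english, ipa) pairs."""
    substitutions = []
    for english, ipa_text in pairs:
        output = f"{ipa_text} ({english})" if with_english else ipa_text
        substitutions.append({
            "input": english,
            "inputType": "text" if with_english else "wholewords",
            "output": output,
            "caseSensitive": False,
        })
    document = {
        "version": "2.1",
        "groups": [{
            "name": "",
            "urls": [],
            "substitutions": substitutions,
            "html": "inputoutput" if with_english else "output",
            "enabled": True,
            "mode": "auto&manual",
        }],
    }
    # ensure_ascii=False keeps IPA symbols readable instead of \u escapes
    return json.dumps(document, ensure_ascii=False, indent=1)


print(build_foxreplace([("immersion", "ˌɪˈmərʒən")], with_english=True))
```

The returned string can be written with `open(path, "w", encoding="utf-8")` so that the IPA symbols are stored the same way on every OS.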

### IPA alone

In [6]:
qVerbose = 1
headerstring = "{\"version\": \"2.1\",\"groups\": [{\"name\": \"\",\"urls\": [],\"substitutions\": ["
tailstring = "],\"html\": \"output\",\"enabled\": true,\"mode\": \"auto&manual\"}]}"

# write as UTF-8 so the IPA symbols survive regardless of the OS default encoding
with open("IPA_only_" + DictStr + ".json", "w", encoding="utf-8") as fileIPA:
    fileIPA.write(headerstring)

    count = 0
    for wordpair in e2iDict:
        fileIPA.write("\n{\n")
        fileIPA.write("\"input\": \"" + wordpair[0] + "\",\n")
        fileIPA.write("\"inputType\": \"wholewords\",\n")
        fileIPA.write("\"output\": \"" + wordpair[1] + "\",\n")
        fileIPA.write("\"caseSensitive\": false\n}")
        count += 1
        # comma between entries, but not after the last one
        if count < len(e2iDict):
            fileIPA.write(",")

        if qVerbose and count % 1000 == 0:
            print(str(round(count / len(e2iDict) * 100)) + " %")

    fileIPA.write(tailstring)


10 %
20 %
30 %
40 %
51 %
61 %
71 %
81 %
91 %


### IPA (Eng)

In [7]:
headerstring = "{\"version\": \"2.1\",\"groups\": [{\"name\": \"\",\"urls\": [],\"substitutions\": ["
tailstring = "],\"html\": \"inputoutput\",\"enabled\": true,\"mode\": \"auto&manual\"}]}"

# write as UTF-8 so the IPA symbols survive regardless of the OS default encoding
with open("IPA_ENG_" + DictStr + ".json", "w", encoding="utf-8") as fileIPA_ENG:
    fileIPA_ENG.write(headerstring)

    count = 0
    for wordpair in e2iDict:
        fileIPA_ENG.write("\n{\n")
        fileIPA_ENG.write("\"input\": \"" + wordpair[0] + "\",\n")
        fileIPA_ENG.write("\"inputType\": \"text\",\n")
        fileIPA_ENG.write("\"output\": \"" + wordpair[1] + " (" + wordpair[0] + ")" + "\",\n")
        fileIPA_ENG.write("\"caseSensitive\": false\n}")
        count += 1
        # comma between entries, but not after the last one
        if count < len(e2iDict):
            fileIPA_ENG.write(",")

        if qVerbose and count % 1000 == 0:
            print(str(round(count / len(e2iDict) * 100)) + " %")

    fileIPA_ENG.write(tailstring)


10 %
20 %
30 %
40 %
51 %
61 %
71 %
81 %
91 %
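
Since both files are assembled by string concatenation, a word containing a quote or backslash would produce invalid JSON. A quick sanity check is to parse each exported file and count its entries. The sketch below assumes the files are UTF-8 encoded, and `count_substitutions` is a hypothetical helper.

```python
import json


def count_substitutions(path):
    """Parse an exported file and return its total number of substitutions.

    Raises json.JSONDecodeError if the file is not valid JSON.
    """
    with open(path, encoding="utf-8") as handle:
        document = json.load(handle)
    return sum(len(group["substitutions"]) for group in document["groups"])
```

For either export, `count_substitutions("IPA_ENG_" + DictStr + ".json")` should equal `len(e2iDict)`.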


In [8]:
ipa.convert('Immersion dictionary generation complete')

'ˌɪˈmərʒən ˈdɪkʃəˌnɛri ˌʤɛnərˈeʃən kəmˈplit'