### Helper Classes

First, we import our helper modules. The 'prepare_EMG' module prepares the EMG data for phoneme recognition. The 'prepare_outputs' module prepares our target labels and aligns them with the EMG data. The 'prepare_data' module reads data from CSV into a dataframe. Finally, 'vis' helps visualize EMG data in both the time and frequency domains.

In [22]:
%load_ext autoreload
%autoreload 2

import prepare_EMG, prepare_outputs, prepare_data, vis
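# NOTE: Output_Prep is read here before it is assigned two lines below. This
# only works when an Output_Prep instance from an earlier run is still in the
# kernel; on a fresh kernel this line raises NameError.
autodetector = Output_Prep.detector
EMG_Prep = prepare_EMG.EMG_preparer(window_size=30.0)
Output_Prep = prepare_outputs.output_preparer(subvocal_detector=autodetector, window_size=30.0)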
Data_Prep = prepare_data.data_preparer()



The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


### Labeling the Data

First, we need to visualize a few EMG voltage graphs to find sections that most likely contain no subvocalization, and then regions that almost certainly do. These two classes of EMG readouts will train a detector that helps us automatically label EMG windows with phonemes. The model will most likely be an SVC, inside 'prepare_outputs'. It processes the EMG windows in order, and each time it finds one that most likely contains subvocalization, it assigns the next phoneme in the sequence as that window's label.
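The labeling pass described above can be sketched in a few lines. This is a minimal illustration, not the code in 'prepare_outputs': `label_windows` and `peak_to_peak_detector` are hypothetical names, and the amplitude threshold is only a stand-in for the trained SVC so that the example runs on its own.

```python
def label_windows(windows, phonemes, is_subvocal):
    """Give the next unused phoneme to each window flagged as subvocalization.

    Windows that are not flagged (or that come after the phonemes run out)
    are labeled None.
    """
    remaining = iter(phonemes)
    labels = []
    for window in windows:
        if is_subvocal(window):
            labels.append(next(remaining, None))
        else:
            labels.append(None)
    return labels


def peak_to_peak_detector(threshold):
    """Stand-in detector (an assumption, not the SVC): flag a window when its
    voltage range exceeds the threshold."""
    def detect(window):
        return (max(window) - min(window)) > threshold
    return detect


# Toy voltage windows: quiet, active, quiet, active
windows = [
    [-1.65, -1.65, -1.64],
    [-1.60, -0.41, -1.20],
    [-1.65, -1.64, -1.65],
    [-1.50, -0.80, -1.61],
]
labels = label_windows(windows, ['W', 'EH1'], peak_to_peak_detector(0.5))
print(labels)
```

Because labels are handed out strictly in order, a single false positive or missed window shifts every later label, so the detector's accuracy matters more than it would for independent per-window classification.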

In [2]:
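# Load the recording named by this timestamp
data_1 = Data_Prep.load('Sat Mar  4 00:44:23 2017')
data_1[:3000]  # not displayed: a cell only echoes its last expression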

data_no_sv = Data_Prep.sv_detection()
data_no_sv

(       time   voltage
 0         2 -0.412500
 1         3 -0.812109
 2         4 -1.057031
 3         5 -1.237500
 4         6 -1.353516
 5         7 -1.456641
 6         8 -1.521094
 7         9 -1.559766
 8        10 -1.598437
 9        11 -1.611328
 10       12 -1.624219
 11       13 -1.637109
 12       14 -1.650000
 13       15 -1.650000
 14       16 -1.650000
 15       17 -1.650000
 16       18 -1.650000
 17       19 -1.637109
 18       20 -1.624219
 19       21 -1.624219
 20       22 -1.624219
 21       23 -1.637109
 22       24 -1.637109
 23       25 -1.637109
 24       26 -1.637109
 25       27 -1.624219
 26       28 -1.611328
 27       29 -1.624219
 28       30 -1.637109
 29       31 -1.650000
 ...     ...       ...
 28716  1972 -1.611328
 28717  1973 -1.637109
 28718  1974 -1.650000
 28719  1975 -1.650000
 28720  1976 -1.650000
 28721  1977 -1.650000
 28722  1978 -1.650000
 28723  1979 -1.650000
 28724  1980 -1.637109
 28725  1981 -1.533984
 28726  1982 -1.533984
 28727  198

In [21]:
a = 4/2
b = 4//2

a,b

(2.0, 2)

In [26]:
phoneme_list = Output_Prep.transform('Well Hello There!')
phoneme_list_2 = Output_Prep.transform("What's for dinner?")

print(phoneme_list, phoneme_list_2)



wat W
wat EH
wat L
wat HH
wat AH
wat L
wat OW
wat DH
wat EH
wat R
wat W
wat AH
wat T
wat S
wat F
wat AO
wat R
wat D
wat IH
wat N
wat ER
['W', 'EH1', 'L', 'HH', 'AH0', 'L', 'OW1', 'DH', 'EH1', 'R'] ['W', 'AH0', 'T', 'S', 'F', 'AO1', 'R', 'D', 'IH1', 'N', 'ER0']


### Our Phonemes

We use a Counter to see which phonemes nltk actually provides, then sort and display them. This lets us better fill in articulatory features (AFs) from the 50 or so listed in the Rasipuram paper. The nltk docs say there are only 39 phonemes, but the list here is longer, mostly because each vowel appears with a stress digit (0, 1, or 2) appended. We'll try to fill in AFs for every phoneme here, duplicating them where the paper gives no unique AFs, because these are the phonemes we'll be classifying.

In [14]:
from collections import Counter

# Count every phoneme across all pronunciations of all words.
# Each dictionary value is a list of pronunciations; each pronunciation
# is a list of phoneme strings.
phonemes = Counter()
for pronunciations in Output_Prep.arpabet.values():
    for pronunciation in pronunciations:
        phonemes.update(pronunciation)

In [15]:
keys = sorted(phonemes)
keys

[autoreload of prepare_outputs failed: Traceback (most recent call last):
  File "/home/brian/anaconda3/lib/python3.6/site-packages/IPython/extensions/autoreload.py", line 247, in check
    superreload(m, reload, self.old_objects)
  File "/home/brian/Documents/Projects/MLND/p5/MLND-Subvocal/prepare_outputs.py", line 108
    vector =
           ^
SyntaxError: invalid syntax
]


['AA0',
 'AA1',
 'AA2',
 'AE0',
 'AE1',
 'AE2',
 'AH0',
 'AH1',
 'AH2',
 'AO0',
 'AO1',
 'AO2',
 'AW0',
 'AW1',
 'AW2',
 'AY0',
 'AY1',
 'AY2',
 'B',
 'CH',
 'D',
 'DH',
 'EH0',
 'EH1',
 'EH2',
 'ER0',
 'ER1',
 'ER2',
 'EY0',
 'EY1',
 'EY2',
 'F',
 'G',
 'HH',
 'IH0',
 'IH1',
 'IH2',
 'IY0',
 'IY1',
 'IY2',
 'JH',
 'K',
 'L',
 'M',
 'N',
 'NG',
 'OW0',
 'OW1',
 'OW2',
 'OY0',
 'OY1',
 'OY2',
 'P',
 'R',
 'S',
 'SH',
 'T',
 'TH',
 'UH0',
 'UH1',
 'UH2',
 'UW',
 'UW0',
 'UW1',
 'UW2',
 'V',
 'W',
 'Y',
 'Z',
 'ZH']
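One way to reconcile the sorted list above with the 39 phonemes in the nltk docs is to strip the trailing stress digit from each label and group the variants under their base phoneme. The sketch below rebuilds the same 70 labels locally so it runs without 'prepare_outputs'; `base_phoneme` and `variants` are hypothetical names introduced for illustration.

```python
from collections import defaultdict

VOWELS = ['AA', 'AE', 'AH', 'AO', 'AW', 'AY', 'EH', 'ER', 'EY',
          'IH', 'IY', 'OW', 'OY', 'UH', 'UW']
CONSONANTS = ['B', 'CH', 'D', 'DH', 'F', 'G', 'HH', 'JH', 'K', 'L', 'M', 'N',
              'NG', 'P', 'R', 'S', 'SH', 'T', 'TH', 'V', 'W', 'Y', 'Z', 'ZH']

# Rebuild the label list shown above: every vowel with stress 0/1/2,
# every consonant, plus the one unstressed 'UW' entry.
keys = sorted([v + s for v in VOWELS for s in '012'] + CONSONANTS + ['UW'])


def base_phoneme(label):
    """Drop the trailing stress digit, e.g. 'EH1' -> 'EH'."""
    return label.rstrip('012')


# Map each base phoneme to all of its stress-marked variants
variants = defaultdict(list)
for label in keys:
    variants[base_phoneme(label)].append(label)

print(len(keys), len(variants))
print(variants['UW'])
```

With a mapping like this, articulatory features could be specified once per base phoneme and shared across its stress variants, which is one way to handle the duplication mentioned earlier.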