## Testing pocketsphinx decoder
Code adapted from pocketsphinx's decoder_test.py: https://github.com/cmusphinx/pocketsphinx/tree/master/swig/python/test

In [1]:
from os import path  # used to build the model and audio file paths
from pocketsphinx import *
from sphinxbase import *

In [2]:
model_dir = get_model_path()
data_dir = 'C:\\Users\\Eleanor\\Documents\\github\\Springboard-Capstone-2\\data\\dataset_arthur-the-rat\\'

In [3]:
# Create a decoder configuration that points to the en-us acoustic model, language model, and dictionary
config = Decoder.default_config()
config.set_string('-hmm', path.join(model_dir, 'en-us'))
config.set_string('-lm', path.join(model_dir, 'en-us.lm.bin'))
config.set_string('-dict', path.join(model_dir, 'cmudict-en-us.dict'))

In [4]:
# Demonstrate pronunciation lookup (lookup_word returns None for a word that is not in the dictionary)
decoder = Decoder(config)
print ("Pronunciation for word 'hello' is ", decoder.lookup_word("hello"))
print ("Pronunciation for word 'abcdf' is ", decoder.lookup_word("abcdf"))

Pronunciation for word 'hello' is  HH AH L OW
Pronunciation for word 'abcdf' is  None


### Test file number one

Files goforward.raw and goforward.mfc from https://github.com/cmusphinx/pocketsphinx/tree/master/test/data

Correct transcription: "*Go forward ten meters*"

In [5]:
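# Decode streaming data: feed the raw audio to the decoder in 1024-byte chunks
decoder.start_utt()
with open(path.join(data_dir, 'Audio test\\goforward.raw'), 'rb') as stream:
    while True:
        buf = stream.read(1024)
        if not buf:
            break
        # Arguments after the buffer: no_search=False, full_utt=False
        decoder.process_raw(buf, False, False)
decoder.end_utt()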

hypothesis = decoder.hyp()
logmath = decoder.get_logmath()
print ('Best hypothesis: ', hypothesis.hypstr, " model score: ", hypothesis.best_score, " confidence: ", logmath.exp(hypothesis.prob))

Best hypothesis:  go forward ten meters  model score:  -7066  confidence:  0.04042641466841839
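
The read-and-feed loop above is repeated for every test file in this notebook. As a minimal sketch, it could be wrapped in a helper; `decode_stream` and `chunk_size` are hypothetical names introduced here, and the helper only assumes that `decoder` exposes `start_utt()`, `process_raw()`, `end_utt()` and `hyp()` as used in the cells of this notebook.

```python
def decode_stream(decoder, stream, chunk_size=1024):
    """Feed a binary stream to a decoder chunk by chunk and return its hypothesis.

    `decoder` is assumed to behave like the pocketsphinx Decoder used in this
    notebook; `stream` is any object with a read(n) method returning bytes.
    """
    decoder.start_utt()
    # iter() with a sentinel stops as soon as read() returns an empty bytes object
    for buf in iter(lambda: stream.read(chunk_size), b''):
        # Same arguments as in the notebook: no_search=False, full_utt=False
        decoder.process_raw(buf, False, False)
    decoder.end_utt()
    return decoder.hyp()


# Possible usage with the objects defined earlier in this notebook:
# with open(path.join(data_dir, 'Audio test\\goforward.raw'), 'rb') as stream:
#     hypothesis = decode_stream(decoder, stream)
```

Passing the stream in, rather than a file name, keeps the helper usable with open files and in-memory buffers alike.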


In [6]:
print ('Best hypothesis segments: ', [seg.word for seg in decoder.seg()])

Best hypothesis segments:  ['<s>', '<sil>', 'go', 'forward', 'ten', 'meters', '</s>']


In [7]:
# Access N best decodings.
print ('Best 10 hypothesis: ')
for best, i in zip(decoder.nbest(), range(10)):
    print (best.hypstr, best.score)

Best 10 hypothesis: 
go forward ten meters -28034
go for word ten meters -28570
go forward and majors -28670
go forward and meters -28681
go forward and readers -28685
go forward ten readers -28688
go forward ten leaders -28695
go forward can meters -28695
go forward and leaders -28706
go for work ten meters -28722


In [8]:
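# Decode precomputed MFCC features instead of raw audio
with open(path.join(data_dir, 'Audio test\\goforward.mfc'), 'rb') as stream:
    stream.read(4)  # skip the 4-byte header at the start of the .mfc file
    buf = stream.read(13780)
decoder.start_utt()
decoder.process_cep(buf, False, True)
decoder.end_utt()
hypothesis = decoder.hyp()
# Note: hypothesis.prob is printed here as a raw log-domain value, without logmath.exp()
print ('Best hypothesis: ', hypothesis.hypstr, " model score: ", hypothesis.best_score, " confidence: ", hypothesis.prob)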

Best hypothesis:  go forward ten meters  model score:  -7095  confidence:  -32715
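
The confidence printed here is negative because `hypothesis.prob` is a log-domain integer. The other cells pass it through `logmath.exp()` first. The sketch below shows the same conversion by hand; it assumes the decoder uses a log base of 1.0001, which I believe is the pocketsphinx default but which is not confirmed by this notebook. `log_to_probability` and `ASSUMED_LOG_BASE` are names introduced only for illustration.

```python
import math

# Assumption: the decoder's log base is 1.0001 (not confirmed by this notebook).
ASSUMED_LOG_BASE = 1.0001


def log_to_probability(log_value, base=ASSUMED_LOG_BASE):
    """Convert an integer log-domain value to a linear one: base ** log_value."""
    return math.exp(log_value * math.log(base))


# The raw value printed by the MFCC cell above
print(log_to_probability(-32715))
```

Under that assumption the result is roughly 0.038, the same order of magnitude as the 0.040 confidence reported for the raw-audio run. In the notebook itself, `decoder.get_logmath().exp(hypothesis.prob)` remains the safer route because it uses whatever base the decoder was actually configured with.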


### Test file number two

File sense_and_sensibility_01_austen_64kb-0870.wav from https://github.com/cmusphinx/pocketsphinx/tree/master/test/data/librivox

Correct transcription: "*and mister john dashwood had then leisure to consider how much there might be prudently in his power to do for them*"

In [9]:
decoder.start_utt()
# Note: the .wav file is read as raw bytes, so its header is passed to the decoder along with the audio
with open(path.join(data_dir, 'Audio test\\sense_and_sensibility_01_austen_64kb-0870.wav'), 'rb') as stream:
    while True:
        buf = stream.read(1024)
        if not buf:
            break
        decoder.process_raw(buf, False, False)
decoder.end_utt()

hypothesis = decoder.hyp()
logmath = decoder.get_logmath()
print ('Best hypothesis: ', hypothesis.hypstr, " model score: ", hypothesis.best_score, " confidence: ", logmath.exp(hypothesis.prob))

Best hypothesis:  and mr john s. would and then at leisure to consider how much there might be greatly in his power to do for them  model score:  -33908  confidence:  1.5083425365532256e-09
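
To compare a hypothesis with the correct transcription numerically rather than by eye, one common measure is word error rate (WER): the word-level edit distance divided by the number of reference words. The function below is a minimal sketch and is not part of pocketsphinx; `word_error_rate` is a name introduced here. It only lowercases and splits on whitespace, so spelling variants such as "mister" and "mr" count as errors unless the text is normalized first.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance between two strings, divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcription is empty")

    # prev[j] is the edit distance between the reference words processed so far
    # (excluding the current one) and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # reference word missing from hypothesis
                            curr[j - 1] + 1,      # extra word in hypothesis
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


# Example using the reference and the second-best hypothesis from test file one
print(word_error_rate("go forward ten meters", "go for word ten meters"))  # 0.5
```

In this notebook the reference would be the "Correct transcription" string and the hypothesis would be `hypothesis.hypstr`.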


In [10]:
# Access N best decodings.
print ('Best 10 hypothesis: ')
for best, i in zip(decoder.nbest(), range(10)):
    print (best.hypstr, best.score)

Best 10 hypothesis: 
and mr john s. would and then at leisure to consider owl much there might be greatly in his power to do for them -55659
and mr john s. would and then at leisure to consider owl much there might be greatly in his power to do for fun -55783
and mr john s. would and then at leisure to consider owl much there might be greatly in his power to do for love -55729
and mr john guess what and then at leisure to consider owl much there might be greatly in his power to do for them -55666
and mr john guess what and then at leisure to consider owl much there might be greatly in his power to do for fun -55790
and mr john guess what and then at leisure to consider owl much there might be greatly in his power to do for love -55736
and mr john s. would and then at leisure to consider owl much there might be greatly in his power to do for them -55758
and mr john s. would and then at leisure to consider owl much there might be greatly in his power to do for them -55758
and mr john s. 

In [11]:
print ("Pronunciation for word 'prudently' is ", decoder.lookup_word("prudently"))
print ("Pronunciation for word 'dashwood' is ", decoder.lookup_word("dashwood"))

Pronunciation for word 'prudently' is  P R UW D AH N T L IY
Pronunciation for word 'dashwood' is  D AE SH W UH D


### Test file number three

File OSR_us_000_0010_8k.wav from http://www.voiptroubleshooter.com/open_speech/american.html

Source for transcriptions: https://www.cs.columbia.edu/~hgs/audio/harvard.html

Correct transcription:
"*The birch canoe slid on the smooth planks.
Glue the sheet to the dark blue background.
It's easy to tell the depth of a well.
These days a chicken leg is a rare dish.
Rice is often served in round bowls.
The juice of lemons makes fine punch.
The box was thrown beside the parked truck.
The hogs were fed chopped corn and garbage.
Four hours of steady work faced us.
Large size in stockings is hard to sell.*"

In [12]:
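decoder.start_utt()
# Note: the .wav file is read as raw bytes, so its header is passed to the decoder along with the audio
with open(path.join(data_dir, 'Audio test\\OSR_us_000_0010_8k.wav'), 'rb') as stream:
    while True:
        buf = stream.read(1024)
        if not buf:
            break
        decoder.process_raw(buf, False, False)
decoder.end_utt()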

hypothesis = decoder.hyp()
logmath = decoder.get_logmath()
print ('Best hypothesis: ', hypothesis.hypstr, " model score: ", hypothesis.best_score, " confidence: ", logmath.exp(hypothesis.prob))
print ('Best 10 hypothesis: ')
for best, i in zip(decoder.nbest(), range(10)):
    print (best.hypstr, best.score)

Best hypothesis:  what do all the they go about it oh to the book and a lot of books at about where their bit and but at the top but that what's really the if been i get a bit  model score:  -73208  confidence:  2.9252480354628394e-25
Best 10 hypothesis: 
what do all that but don't talk about it oh that are bought and whatnot cause they at what will that bit and but at it but that aunts breath the if been a bit little bit -97312
what do all that but don't talk about it oh that are bought and whatnot cause they at what will that bit and but at it but that op's breath the if been a bit little bit -97313
but don't do that but don't talk about it oh that are bought and whatnot cause they at what will that bit and but at it but that aunts breath the if been a bit little bit -97318
but don't do that but don't talk about it oh that are bought and whatnot cause they at what will that bit and but at it but that op's breath the if been a bit little bit -97319
what do all that but don't talk abou
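
The hypotheses for this file bear little resemblance to the reference. One possible cause, offered as a hypothesis rather than a confirmed diagnosis, is an audio format mismatch: the file name suggests 8 kHz audio, while pocketsphinx's default configuration, as far as I know, expects 16 kHz, 16-bit, mono PCM. The cell also feeds the WAV header bytes to the decoder as if they were audio. The sketch below uses Python's standard `wave` module to inspect the format and extract only the PCM samples; `read_wav_pcm` is a name introduced here.

```python
import wave


def read_wav_pcm(source):
    """Return (sample_rate, channels, sample_width_bytes, pcm_bytes) for a PCM WAV file.

    `source` may be a file name or a binary file-like object.
    """
    with wave.open(source, 'rb') as wav:
        sample_rate = wav.getframerate()
        channels = wav.getnchannels()
        sample_width = wav.getsampwidth()
        # readframes() returns only the audio samples, without the WAV header
        pcm = wav.readframes(wav.getnframes())
    return sample_rate, channels, sample_width, pcm


# Possible usage with the objects defined earlier in this notebook:
# rate, channels, width, pcm = read_wav_pcm(path.join(data_dir, 'Audio test\\OSR_us_000_0010_8k.wav'))
# print(rate, channels, width)
```

If the reported rate differs from what the acoustic model expects, the audio would need to be resampled, or the decoder configured to match, before the transcription can be judged fairly.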

### Test file number four

File R0482.wav from https://datashare.is.ed.ac.uk/handle/10283/392

Correct transcription:
"*Once there was a young rat named Arthur, who could never make up his mind. Whenever his friends asked him if he would like to go out with them, he would only answer, "I don't know." He wouldn't say "yes" or "no" either. He could never learn to make a choice. His aunt Helen said to him, "No one is going to care for you if you carry on like this." One rainy day, the rats heard a great noise in the loft where the pine rafters were all rotten. At last the joints gave way and fell to the ground. The walls shook and all the rats' hair stood on end with fear and horror. "This won't do," said the captain. "I'll send scouts out to search for a new home." Three hours later the seven scouts came back and said, "We have found just what we wanted, a stone house where there is room and good food for us all. There is a kindly horse named Nelly, a cow, a calf, and a garden with an elm tree." Just then the old rat saw Arthur. "Stop," he ordered coarsely. "Are you coming?" "Of course I suppose I ought," Arthur sighed, "but the roof may not come down yet." "Well," said the angry old rat, "we can't wait all day for you. Right about face. March!" And they went off. Arthur stood and watched them hurry away. The idea of immediate decision was too much for him.  "Why do they have to go today?" he said calmly to himself.  That night there was a great crash that shook the earth. In the foggy morning some men rode up and looked at the ruins. One of them moved a board and saw a young rat lying on his side, quite dead, half in and half out of his hole. 
One two three four five six seven eight nine ten eleven twelve thirteen fourteen fifteen sixteen seventeen eighteen nineteen twenty.*"

In [13]:
decoder.start_utt()
# Note: the .wav file is read as raw bytes, so its header is passed to the decoder along with the audio
with open(path.join(data_dir, 'Arthur the Rat (recordings 476-500)\\R0482.wav'), 'rb') as stream:
    while True:
        buf = stream.read(1024)
        if not buf:
            break
        decoder.process_raw(buf, False, False)
decoder.end_utt()

hypothesis = decoder.hyp()
logmath = decoder.get_logmath()
print ('Best hypothesis: ', hypothesis.hypstr, " model score: ", hypothesis.best_score, " confidence: ", logmath.exp(hypothesis.prob))

Best hypothesis:  of the boob of both of ball if the brown would prefer it up from home to have a rule of hopefully it will have fun and move to move up and blue lived won't for the pop of how to find her they that for a white football move it for all at home often only a few and for the warm home war or outfit and warm out and lose if lot all of the war to move for how to ah moi food that war but moved moved heaven for the illusion that hold for all of a ham what do you what fact is if ah moved from fox who should move back move back to the left cold versus the room left vo and no fish will come from low move will post flew them up if the food food for an of new war who owned half around at you and found move up for a living on it to a whole lived for coffee for a yahoo whoo it all growth of the war of the time to time the notion that moved to move up poop for all all the time and that volvo of dog food time of doing all the time i have one who ah live how you do need it for at home a