
Issue #130: user changes to examples/audio_transcribe #133

Open
wants to merge 24 commits into base: master

Conversation


@combinatorist combinatorist commented Jun 29, 2016

#130

The point of my changes was to:

  • only use pocketsphinx
  • pass in file names from the command line (I'm aiming for a for loop or a file list)

It works for .aif files (extracted from inside a GarageBand .band media file) that are only 5 seconds long, but my next smallest is at least a minute, and it fails on that.

@combinatorist
Author

Ok, there are instructions in audio_transcribe about how to run this from the command line, but first you'd need to download long_interview_example.aif from my public Dropbox link and put it in examples/.

@combinatorist
Author

FYI: There are actually two copies of the short interview: one in examples/, the other deep inside examples/short_interview.band/, just to show where I grabbed it from the GarageBand project.

@combinatorist
Author

Ok, I tried listen instead of record and got a transcription of 810 characters out of the 41-minute "long" interview. So it seems I only got the first chunk, but that's definitely the most promising result so far! Do I need to wrap listen in a loop?

@combinatorist
Author

combinatorist commented Jul 13, 2016

Sorry, my participant changed their mind and I had to remove the file, so the dropbox link won't work anymore. I'll try to generate a replacement soon.

Meanwhile, do you have any idea why listen would produce such a short transcription?

@Uberi
Owner

Uberi commented Jul 14, 2016

listen only transcribes the first phrase, so you'll need to use a loop around that. It's a bit of a hacky workaround; I'll post an example here in a bit.
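Roughly this shape (a library-free sketch of the loop; WaitTimeoutError and listen_once here are stand-ins for speech_recognition.WaitTimeoutError and a recognizer.listen-plus-recognize call, so this isn't the library's actual API):

```python
class WaitTimeoutError(Exception):
    """Stand-in for speech_recognition.WaitTimeoutError."""

def transcribe_all(listen_once):
    """Call listen_once() repeatedly, one phrase per call, and stop
    once the source runs dry and WaitTimeoutError is raised."""
    phrases = []
    while True:
        try:
            phrases.append(listen_once())
        except WaitTimeoutError:
            break
    return phrases

# Usage with a fake source that yields two phrases, then times out:
_queue = ["first phrase", "second phrase"]
def fake_listen():
    if not _queue:
        raise WaitTimeoutError()
    return _queue.pop(0)

print(transcribe_all(fake_listen))  # ['first phrase', 'second phrase']
```

With the real library, listen_once would be a call to recognizer.listen on the audio source followed by one of the recognize_* methods.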

No worries about the Dropbox link, by the way. I got the issue to manifest with a long-ish podcast, so there's a good starting point.

@combinatorist
Author

I just got this to work with a while loop over listen that stops when it catches the sr.WaitTimeoutError raised via the timeout argument to listen.

I'm not sure this is reliable, particularly because I don't really know that it will always raise this timeout error at the end of the file (and never before), but at least it worked, and I'm really grateful for your help.

I'd love to dig deeper and do some better testing to improve this example or create a new one to your design standards, @Uberi (if you think that would be valuable).

@Uberi
Owner

Uberi commented Jul 23, 2016

Glad you got it working! I'd definitely like to include this, so when there's time I'll do a proper review and merge it. Ideally, we'd want to exclude the line ending changes and split out the long examples into their own files.

There's a small issue with listen() that will be fixed in the next release (it's actually done, but still needs to be packaged up and published) - that should get things working pretty robustly. I'll be sure to update in this thread.

@combinatorist
Author

Sorry, @Uberi, I'm really eager to make this useful, but I can't figure out what you mean by "the line ending changes": are you saying you'd like me to break the long transcription up into multiple lines, or that I messed up some line endings that used to be there?

Similarly, when you say "split out the long examples into their own files", do you mean write each loop of a transcription into its own file, or move the source code for long transcriptions into a separate file (from audio_transcribe.py) ... or something else?

I think I'll have some time next week (probably this Tuesday) to do some tidying I'd like to do anyway. I'd love it if you gave me a little direction:

  1. Should I make a separate "example" to demonstrate long transcriptions?
  2. Should I make the long transcription loop work on all the APIs? (I'd need keys.)
  3. Should I break up the resulting transcription somehow (line breaks, separate files, etc.)?
  4. Anything else?

Thanks!

@Uberi
Owner

Uberi commented Aug 1, 2016

Hey @combinatorist,

If you check out the diff for this PR, you'll notice that there are about 3500 lines changed, but the actual number of changes is somewhat less. That seems to be due to the line endings being changed from CRLF to LF. As for those questions:

  1. Ideally, we'd want to have a separate example for that (maybe called long_transcriptions.py).
  2. Just one API (Google's speech recognition maybe, since it doesn't require installing Sphinx or an API key) is totally fine.
  3. Sure, either way works!
  4. Nope, not at the moment. Note that I unfortunately won't have much time to look at it until exams are over.

@combinatorist
Author

Hi @Uberi, sorry, I was going to work on this during a flight, but had terrible wifi, so I put it off.

I think I might have some spare time later this month, so for what it's worth:

  1. Agreed
  2. I haven't really used the Google Speech Recognition side of the package, but I noticed it gives better results than Sphinx in the microphone example. I'm not sure what complications might arise from longer segments, but I bet it's worth it.
  3. Ok, I'll go with line breaks.

I'll also fix the line endings. Let me know if you happen to think of anything else!
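For the line-breaks option, I'm picturing the writing side looking roughly like this (a sketch; io.StringIO stands in for the real output file, and the phrase strings are made up):

```python
import io

def write_phrases(phrases, out):
    """Write each recognized phrase on its own line, skipping empty
    results, instead of appending everything into one run-on blob."""
    for phrase in phrases:
        phrase = phrase.strip()
        if phrase:
            out.write(phrase + "\n")

buf = io.StringIO()
write_phrases(["first phrase", "   ", "second phrase"], buf)
print(buf.getvalue())  # two lines, the blank result dropped
```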

@combinatorist
Author

Actually, when I look at the diff online, I don't see a difference in line endings. Maybe it's displaying differently, but I think what you're seeing is just where I commented out a lot of code that wasn't relevant to my specific use case.

Regardless, I'm going to start over with my code in a separate example (as you suggested), so I won't be commenting anything out anymore.

@combinatorist
Author

Hmm, for some reason I'm getting various networking errors with Google, but notice that the shortest phrase worked. I've tried requesting really short phrases in case it's a timeout issue, but that didn't appear to help.

(speech)[517] examples% python long_transcriptions.py long_interview_example.aif 
time: 04:02.54, loop_count: 1
google error; recognition connection failed: [Errno 32] Broken pipe
time: 04:06.56, loop_count: 2
google error; recognition request failed: Bad Gateway
time: 04:11.00, loop_count: 3
google error; recognition request failed: Bad Gateway
time: 04:15.03, loop_count: 4
google error; recognition connection failed: [Errno 32] Broken pipe
time: 04:19.06, loop_count: 5
google error; recognition connection failed: [Errno 32] Broken pipe
time: 04:23.08, loop_count: 6
number 10 I don't anticipate any reason I would need to withdraw from the study if you choose to withdraw yourself at some point you can do that no problem whenever you want to do
time: 04:23.14, loop_count: 7
google error; recognition request failed: Bad Gateway
time: 04:27.17, loop_count: 8
google error; recognition connection failed: [Errno 32] Broken pipe
time: 04:31.19, loop_count: 9
^C
Traceback (most recent call last):
  File "long_transcriptions.py", line 63, in <module>
    f.write(' ' + text)
TypeError: cannot concatenate 'str' and 'exceptions.KeyboardInterrupt' objects
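Looks like that TypeError happened because an exception object (the KeyboardInterrupt) ended up in text and was then passed to f.write. A defensive shape for the loop body (hypothetical names; recognize stands in for the real recognizer call) would keep failures out of the output file:

```python
import io

def transcribe_chunks(chunks, recognize, out):
    """Recognize each chunk, writing only successful string results;
    failures are collected instead of reaching the output file."""
    failures = []
    for count, chunk in enumerate(chunks, start=1):
        try:
            text = recognize(chunk)
        except Exception as exc:  # e.g. Broken pipe, Bad Gateway
            failures.append((count, exc))
            continue
        out.write(" " + text)
    return failures

# Usage with a fake recognizer that fails on odd-numbered chunks:
def flaky(chunk):
    if chunk % 2:
        raise IOError("Bad Gateway")
    return "phrase %d" % chunk

buf = io.StringIO()
errs = transcribe_chunks([1, 2, 3, 4], flaky, buf)
print(buf.getvalue())  # ' phrase 2 phrase 4'
print(len(errs))       # 2
```

Note that catching Exception (rather than BaseException) deliberately lets Ctrl-C interrupt the run instead of being written into the transcript.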

@combinatorist
Author

OK, FWIW, I downloaded Google's example docs and managed to get a 40-minute clip to transcribe in 4 minutes, but I had to use a URI in Google Cloud Storage (which required an account, etc.).

I just need to add this to the python module and we're set!
