

Some words are missing #48

Closed
ibndias opened this issue Feb 17, 2022 · 3 comments

Comments


ibndias commented Feb 17, 2022

Hi, thanks for the great project!

I have a problem with some words missing from the transcript.
But if I transcribe the same audio using only the DeepSpeech project (not AutoSub with the DeepSpeech engine), no words are missing.

Are there any tweaks that can be done via parameters, or is this caused by the silent-segment removal process?

Here is the .txt output from AutoSub with the DeepSpeech engine:

biggest . 

people make when larry english and probably one of the most common miss. 

people think that they. 





don't study. 

live in. 

an out let me explain what i . 

one does studying men and how do people usually approach this pro. 

and how do people. 

And here is the DeepSpeech output:

the biggest mistake people make when morning english and probably one of the most common misconceptions is that people think that they need to study english and usedn't study english live english an outlet explain what i mean one does studying men and how do people 

As you can see, some words are missing from the AutoSub output.

I am using the same DeepSpeech 0.9.3 version and model for both AutoSub and standalone DeepSpeech.

@abhirooptalasila
Owner

Hi,
Are you using the latest version of AutoSub? If yes: I switched the default inference engine to Coqui STT, as it has better support for different languages. You can change this by setting --engine to "ds" when running main.py and checking again.
If you are sure that you're using DeepSpeech, you can play around with the default parameter values here.

@ibndias
Author

ibndias commented Feb 17, 2022

> Are you using the latest version of AutoSub?

Yes, I am using the latest master branch.

> If yes, I switched the default inference to Coqui STT as it has better support for different languages. You can change this by setting --engine to "ds" while running main.py and checking again.

Yes, I also changed the engine to DeepSpeech:

(sub) derry@10700k:~/ws/AutoSub$ python3 autosub/main.py --engine ds --file ./qjbBeORPUA4-oo9mOmdonl.mp4 
[INFO] ARGS: Namespace(dry_run=False, engine='ds', file='./qjbBeORPUA4-oo9mOmdonl.mp4', format=['srt', 'vtt', 'txt'], model=None, scorer=None, split_duration=5)
[INFO] Model: /home/derry/ws/AutoSub/deepspeech-0.9.3-models.pbmm
[INFO] Scorer: /home/derry/ws/AutoSub/deepspeech-0.9.3-models.scorer
[INFO] Input file: ./qjbBeORPUA4-oo9mOmdonl.mp4
[INFO] Extracted audio to audio/qjbBeORPUA4-oo9mOmdonl.wav
[INFO] Splitting on silent parts in audio file
[INFO] Running inference...
TensorFlow: v2.3.0-6-g23ad988
DeepSpeech: v0.9.3-0-gf2e9c85
...

> play around with the default parameter values here.

Thanks for the hints. I got 'better' results using smoothing_window=0.5 and weight=0.01.
However, I don't really understand how these parameters work, nor the magic numbers for st_win and st_step. Can you explain a little bit?

I think adding a switch to disable silence removal is needed for non-movie videos (full conversations). :)
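For intuition about what knobs like these typically do, here is a self-contained sketch of the general short-term-energy thresholding idea behind silence removal. This is an illustration only, not AutoSub's actual implementation (which delegates to a library routine): the window/step values control framing, smoothing_window averages the per-frame score, and weight slides the silence threshold between the quietest and loudest frames, so a lower weight discards fewer borderline frames (and loses fewer words).

```python
# Illustrative energy-based silence detection. Hypothetical helper names;
# NOT AutoSub's exact algorithm, just the common thresholding pattern.

def short_term_energy(signal, win, step):
    """Mean squared amplitude per frame of length `win`, hopped by `step`."""
    frames = []
    for start in range(0, len(signal) - win + 1, step):
        frame = signal[start:start + win]
        frames.append(sum(x * x for x in frame) / win)
    return frames

def smooth(values, window):
    """Simple moving average over `window` frames."""
    out = []
    for i in range(len(values)):
        lo = max(0, i - window // 2)
        hi = min(len(values), i + window // 2 + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out

def speech_frames(signal, win=4, step=2, smoothing_window=3, weight=0.5):
    """Return a boolean mask: True where a frame is judged to be speech.

    `weight` slides the threshold between the quietest and loudest
    observed energies: lower weight -> lower threshold -> fewer frames
    discarded as silence (so fewer words clipped at segment edges).
    """
    energy = smooth(short_term_energy(signal, win, step), smoothing_window)
    e_min, e_max = min(energy), max(energy)
    threshold = e_min + weight * (e_max - e_min)
    return [e > threshold for e in energy]

# Quiet lead-in, loud middle, quiet tail
sig = [0.01] * 8 + [0.9, -0.8, 0.9, -0.9, 0.8, -0.9] + [0.01] * 8

strict = speech_frames(sig, weight=0.5)    # aggressive: trims more audio
lenient = speech_frames(sig, weight=0.01)  # keeps borderline frames
print(sum(strict), sum(lenient))
```

With the higher weight, only the clearly loud frames survive; with weight=0.01 the quieter frames at the edges of the burst are kept too, which matches the observation above that a small weight recovers words that were being cut.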

@abhirooptalasila
Owner

This is a better explanation.
Thanks for the suggestion about silence removal. I'll think about how to decouple it from splitting the file.
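One way such a decoupling could look (a purely hypothetical sketch, not a committed design for AutoSub): when silence removal is disabled, fall back to fixed-duration chunking. The `split_duration` name mirrors the existing CLI flag visible in the log above; the function itself is invented for illustration.

```python
# Hypothetical sketch: fixed-duration splitting that skips silence
# detection entirely, e.g. behind a --no-silence-removal style switch.

def fixed_chunks(total_seconds, split_duration=5.0):
    """Yield (start, end) second offsets covering the whole file."""
    start = 0.0
    while start < total_seconds:
        end = min(start + split_duration, total_seconds)
        yield (start, end)
        start = end

print(list(fixed_chunks(12.0, 5.0)))
```

Fixed windows guarantee no audio is dropped, at the cost of sometimes cutting mid-word; overlapping the chunks slightly is a common mitigation.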
