

Some words are missing #48

Closed
ibndias opened this issue Feb 17, 2022 · 3 comments

Comments


ibndias commented Feb 17, 2022

Hi, thanks for the great project!

I have a problem with some words missing from the transcript.
But if I transcribe the same audio using only the DeepSpeech project (not AutoSub with the DeepSpeech engine), no words are missing.

Are there any tweaks that can be done via parameters, or is this caused by the silent-segment removal process?

Here is the .txt output from AutoSub with the DeepSpeech engine:

biggest . 

people make when larry english and probably one of the most common miss. 

people think that they. 





don't study. 

live in. 

an out let me explain what i . 

one does studying men and how do people usually approach this pro. 

and how do people. 

And here is the DeepSpeech output:

the biggest mistake people make when morning english and probably one of the most common misconceptions is that people think that they need to study english and usedn't study english live english an outlet explain what i mean one does studying men and how do people 

As you can see, some words are missing from the AutoSub output.

I am using the same DeepSpeech 0.9.3 version and model for both AutoSub and standalone DeepSpeech.

@abhirooptalasila
Owner

Hi,
Are you using the latest version of AutoSub? If yes: I switched the default inference engine to Coqui STT, as it has better support for different languages. You can change this by setting --engine to "ds" when running main.py and checking again.
If you are sure that you're using DeepSpeech, you can play around with the default parameter values here.

@ibndias
Author

ibndias commented Feb 17, 2022

> Are you using the latest version of AutoSub?

Yes, I am using the latest master branch.

> If yes, I switched the default inference to Coqui STT as it has better support for different languages. You can change this by setting --engine to "ds" while running main.py and checking again.

Yes, I also changed the engine to DeepSpeech:

(sub) derry@10700k:~/ws/AutoSub$ python3 autosub/main.py --engine ds --file ./qjbBeORPUA4-oo9mOmdonl.mp4 
[INFO] ARGS: Namespace(dry_run=False, engine='ds', file='./qjbBeORPUA4-oo9mOmdonl.mp4', format=['srt', 'vtt', 'txt'], model=None, scorer=None, split_duration=5)
[INFO] Model: /home/derry/ws/AutoSub/deepspeech-0.9.3-models.pbmm
[INFO] Scorer: /home/derry/ws/AutoSub/deepspeech-0.9.3-models.scorer
[INFO] Input file: ./qjbBeORPUA4-oo9mOmdonl.mp4
[INFO] Extracted audio to audio/qjbBeORPUA4-oo9mOmdonl.wav
[INFO] Splitting on silent parts in audio file
[INFO] Running inference...
TensorFlow: v2.3.0-6-g23ad988
DeepSpeech: v0.9.3-0-gf2e9c85
...

> play around with the default parameter values here.

Thanks for the hints. I got 'better' results using smoothing_window=0.5 and weight=0.01.
However, I don't really understand how these parameters work, nor the magic numbers for st_win and st_step. Can you explain a little bit?

I think adding a switch to disable silence removal is needed for non-movie videos (full conversations). :)
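For intuition about what knobs like these typically do, here is a self-contained sketch of the general short-term-energy thresholding idea behind silence removal. This is an illustration only, not AutoSub's actual implementation (which delegates to a library routine): the window/step values control framing, smoothing_window averages the per-frame score, and weight slides the silence threshold between the quietest and loudest frames, so a lower weight discards fewer borderline frames (and loses fewer words).

```python
# Illustrative energy-based silence detection. Hypothetical helper names;
# NOT AutoSub's exact algorithm, just the common thresholding pattern.

def short_term_energy(signal, win, step):
    """Mean squared amplitude per frame of length `win`, hopped by `step`."""
    frames = []
    for start in range(0, len(signal) - win + 1, step):
        frame = signal[start:start + win]
        frames.append(sum(x * x for x in frame) / win)
    return frames

def smooth(values, window):
    """Simple moving average over `window` frames."""
    out = []
    for i in range(len(values)):
        lo = max(0, i - window // 2)
        hi = min(len(values), i + window // 2 + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out

def speech_frames(signal, win=4, step=2, smoothing_window=3, weight=0.5):
    """Return a boolean mask: True where a frame is judged to be speech.

    `weight` slides the threshold between the quietest and loudest
    observed energies: lower weight -> lower threshold -> fewer frames
    discarded as silence (so fewer words clipped at segment edges).
    """
    energy = smooth(short_term_energy(signal, win, step), smoothing_window)
    e_min, e_max = min(energy), max(energy)
    threshold = e_min + weight * (e_max - e_min)
    return [e > threshold for e in energy]

# Quiet lead-in, loud middle, quiet tail
sig = [0.01] * 8 + [0.9, -0.8, 0.9, -0.9, 0.8, -0.9] + [0.01] * 8

strict = speech_frames(sig, weight=0.5)    # aggressive: trims more audio
lenient = speech_frames(sig, weight=0.01)  # keeps borderline frames
print(sum(strict), sum(lenient))
```

With the higher weight, only the clearly loud frames survive; with weight=0.01 the quieter frames at the edges of the burst are kept too, which matches the observation above that a small weight recovers words that were being cut.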

@abhirooptalasila
Owner

This is a better explanation.
Thanks for the suggestion about silence removal. I'll think about how to decouple it from splitting the file.
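One way such a decoupling could look (a purely hypothetical sketch, not a committed design for AutoSub): when silence removal is disabled, fall back to fixed-duration chunking. The `split_duration` name mirrors the existing CLI flag visible in the log above; the function itself is invented for illustration.

```python
# Hypothetical sketch: fixed-duration splitting that skips silence
# detection entirely, e.g. behind a --no-silence-removal style switch.

def fixed_chunks(total_seconds, split_duration=5.0):
    """Yield (start, end) second offsets covering the whole file."""
    start = 0.0
    while start < total_seconds:
        end = min(start + split_duration, total_seconds)
        yield (start, end)
        start = end

print(list(fixed_chunks(12.0, 5.0)))
```

Fixed windows guarantee no audio is dropped, at the cost of sometimes cutting mid-word; overlapping the chunks slightly is a common mitigation.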
