Chinese characters are spoken faster than English words, will this model work on Chinese? #51

zwfcrazy · 2020-04-09T03:03:37Z

I want to build a dataset of Chinese characters to train this model.
I applied speech recognition on some Chinese news videos (by CCTV).
The recognition part was fine, but I found that Chinese characters are too short in terms of pronunciation time because each of them has only one syllable.
The average number of video frames it takes to show the lip movement of a single Chinese character is only 5 (fps=25), and it can be as low as 2 frames. This is far fewer than the required 29 frames. Obviously, interpolation won't work well in this case.
So I would like to know whether you have considered Chinese. Will this model work? Is there any workaround?
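To put the numbers above in seconds, here is a minimal arithmetic sketch. The function names are made up for illustration; only the 25 fps rate, the 5-frame average, the 2-frame minimum, and the 29-frame requirement come from the post.

```python
# Frame-count arithmetic for the figures quoted in the post.
# Function names are illustrative, not part of the model's code.

FPS = 25.0            # frame rate of the news videos
REQUIRED_FRAMES = 29  # clip length the post says the model requires


def frames_to_seconds(n_frames, fps=FPS):
    """Duration in seconds covered by n_frames at the given frame rate."""
    return n_frames / fps


def stretch_factor(n_frames, required=REQUIRED_FRAMES):
    """How many times a clip must be stretched to reach the required length."""
    return required / n_frames


avg_char_seconds = frames_to_seconds(5)                 # average character
window_seconds = frames_to_seconds(REQUIRED_FRAMES)     # required clip
avg_stretch = stretch_factor(5)                         # average case
worst_stretch = stretch_factor(2)                       # shortest case

print(f"average character: {avg_char_seconds:.2f} s")
print(f"required clip:     {window_seconds:.2f} s")
print(f"stretch needed:    {avg_stretch:.1f}x (average), {worst_stretch:.1f}x (worst)")
```

An average character would need to be stretched almost sixfold to fill the required clip, which is why interpolation looks unpromising here.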

Hangz-nju-cuhk · 2020-04-15T09:48:27Z

You can remove the recognition and adversarial parts of the model. It can then work regardless of language and input length. Although a crucial part is removed, I think you can still get reasonable results with acceptable performance. It would be better to load the pretrained weights of our model and then fine-tune them on your dataset. However, you may need to modify the code (delete several parts, change the input length) for it to work well.
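A minimal sketch of the weight-loading step suggested above, assuming the checkpoint is a flat mapping from parameter names to values. The key prefixes (`lip_classifier.`, `discriminator.`) and the helper name are hypothetical; the real parameter names in this repository may differ and should be checked against the actual checkpoint.

```python
# Drop the weights of the removed parts before loading a pretrained checkpoint.
# All key names below are hypothetical placeholders, not the repository's names.


def keep_generator_weights(state_dict, drop_prefixes):
    """Return a copy of state_dict without keys that start with any drop prefix."""
    prefixes = tuple(drop_prefixes)
    return {k: v for k, v in state_dict.items() if not k.startswith(prefixes)}


# Toy checkpoint standing in for the pretrained weights.
checkpoint = {
    "audio_encoder.conv1.weight": [0.1],
    "decoder.deconv1.weight": [0.2],
    "lip_classifier.fc.weight": [0.3],      # recognition part (hypothetical name)
    "discriminator.conv1.weight": [0.4],    # adversarial part (hypothetical name)
}

kept = keep_generator_weights(checkpoint, ["lip_classifier.", "discriminator."])
print(sorted(kept))
```

With PyTorch, the filtered mapping could then be passed to `model.load_state_dict(kept, strict=False)`, which tolerates keys that are missing from the checkpoint, before fine-tuning on the new dataset.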

ak9250 · 2020-04-28T14:22:46Z

@zwfcrazy Have you tried https://github.com/yiranran/Audio-driven-TalkingFace-HeadPose ?
It seems to work regardless of language.

@Hangz-nju-cuhk This paper, https://arxiv.org/pdf/2004.12992.pdf , cites this work and is able to handle head pose and speaker awareness.

Hangz-nju-cuhk · 2020-04-30T08:38:20Z

@ak9250 Thanks for the references. I am familiar with both of these papers and had even seen their videos before they appeared on arXiv. They are both great works. I would definitely recommend that researchers try the state-of-the-art models, as mine seems a little out of date now.

zwfcrazy · 2020-05-06T03:53:49Z

@ak9250 @Hangz-nju-cuhk Sorry for the late reply. Thank you both! I will close this issue for now.

zwfcrazy closed this as completed May 6, 2020
