
Chinese characters are spoken faster than English words, will this model work on Chinese? #51

Closed
zwfcrazy opened this issue Apr 9, 2020 · 4 comments


@zwfcrazy

zwfcrazy commented Apr 9, 2020

I want to build a dataset of Chinese characters to train this model.
I ran speech recognition on some Chinese news videos (from CCTV).
The recognition part was fine, but I found that Chinese characters take very little time to pronounce because each of them has only one syllable.
The average number of video frames it takes to show the lip movement of a single Chinese character is only 5 (fps=25), and it can be as low as 2 frames. This is far fewer than the required 29 frames. Obviously, interpolation won't work well in this case.
So I would like to know if you guys have considered Chinese? Will this model work? Is there any workaround?
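
To put the numbers above in perspective, here is a small sketch of the arithmetic. It only uses the figures quoted in this issue (25 fps, 5 or 2 frames per character, 29 required frames); the function and constant names are made up for illustration and are not part of the repository.

```python
FPS = 25               # frame rate reported in this issue
REQUIRED_FRAMES = 29   # clip length the model expects, as stated in this issue


def clip_stats(n_frames, fps=FPS, required=REQUIRED_FRAMES):
    """Return (duration in seconds, stretch factor needed to reach `required` frames)."""
    duration_s = n_frames / fps
    stretch = required / n_frames
    return duration_s, stretch


# Duration of the window the model expects, for comparison.
window_s = REQUIRED_FRAMES / FPS

for n in (5, 2):
    duration_s, stretch = clip_stats(n)
    print(f"{n} frames = {duration_s:.2f} s "
          f"(model window = {window_s:.2f} s), "
          f"would need {stretch:.1f}x temporal upsampling")
```

A 5-frame character would have to be stretched almost sixfold and a 2-frame character more than fourteenfold, which is why interpolation looks unpromising here.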

@Hangz-nju-cuhk
Owner

You can get rid of the recognition and adversarial parts of the model. Then it can work regardless of language and input length. Although a crucial part is removed, I think reasonable results with acceptable performance can still be obtained this way. It would be better if the pretrained weights of our model could be loaded and then finetuned on your dataset. However, you may need to modify the code (delete several parts, modify the input length) for it to work well.
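
As a rough illustration of the "load the pretrained weights, then finetune" suggestion, the sketch below drops checkpoint entries that belong to removed parts before loading. It assumes the checkpoint is a plain mapping from parameter names to values; every parameter name and prefix shown is hypothetical and would need to be replaced with the names actually used in this repository.

```python
def filter_state_dict(state_dict, drop_prefixes):
    """Return a copy of `state_dict` without entries whose name starts with any given prefix."""
    prefixes = tuple(drop_prefixes)
    return {
        name: value
        for name, value in state_dict.items()
        if not name.startswith(prefixes)
    }


# Hypothetical parameter names, for illustration only.
checkpoint = {
    "encoder.conv1.weight": [0.1, 0.2],
    "decoder.fc.weight": [0.3],
    "recognition_head.fc.weight": [0.4],
    "discriminator.fc.weight": [0.5],
}

# Drop the (hypothetically named) recognition and adversarial parts.
kept = filter_state_dict(checkpoint, ["recognition_head.", "discriminator."])
print(sorted(kept))

# With a real PyTorch model, the filtered dict could then be loaded with
# model.load_state_dict(kept, strict=False), which tolerates missing keys.
```

Layers whose shapes depend on the input length would still need to be handled separately, since a shape mismatch is not something `strict=False` skips over.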

@ak9250

ak9250 commented Apr 28, 2020

@zwfcrazy have you tried this? https://github.com/yiranran/Audio-driven-TalkingFace-HeadPose
It seems to work regardless of language.

@Hangz-nju-cuhk this paper https://arxiv.org/pdf/2004.12992.pdf cites this work and is able to handle head pose and speaker awareness

@Hangz-nju-cuhk
Owner

@ak9250 Thanks for the references. I am familiar with both of these papers and had even seen their videos before they appeared on arXiv. They are both great works. I would definitely recommend that researchers try the state-of-the-art models, as mine seems a little out of date by now.

@zwfcrazy
Author

zwfcrazy commented May 6, 2020

@ak9250 @Hangz-nju-cuhk sorry for the late reply. Thank you both! I will close this issue for now.

@zwfcrazy zwfcrazy closed this as completed May 6, 2020