
Some problems about lip sync #21

Closed · tju-zxy opened this issue Apr 10, 2020 · 8 comments

Comments

tju-zxy commented Apr 10, 2020

Hi, @prajwalkr. Thanks for sharing this revolutionary work. However, when I run the code with the same image you provided in a previous issue, I cannot get a satisfactory result. My result video is linked below. Could you give me some advice on improving the result or correcting any mistakes on my end? Thanks a lot.
https://www.youtube.com/watch?v=beuf71Wrg3g

prajwalkr (Collaborator) commented

Hello, for some reason the face is not detected properly. This is not a failure of LipGAN but rather of the face detector. You can adjust the detected box by adding padding with this parameter:

parser.add_argument('--pads', nargs='+', type=int, default=[0, 0, 0, 0], help='Padding (top, bottom, left, right)')

You can find an example mentioned in another similar issue: #14 (comment)

Please experiment with this padding a little bit to ensure the detected face box covers most of the face.
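For illustration, a minimal sketch of what those padding values do when cropping the detected face. The frame, box coordinates, and pad values below are hypothetical, and only the (top, bottom, left, right) ordering comes from the help string above:

    import numpy as np

    frame = np.zeros((480, 640, 3), dtype=np.uint8)  # stand-in video frame
    pads = [0, 20, 0, 0]        # as in --pads 0 20 0 0: (top, bottom, left, right)
    top, bottom, left, right = pads

    x1, y1, x2, y2 = 120, 80, 220, 200       # hypothetical face-detector box
    y1 = max(0, y1 - top)
    y2 = min(frame.shape[0], y2 + bottom)    # extend downward to cover the chin
    x1 = max(0, x1 - left)
    x2 = min(frame.shape[1], x2 + right)

    crop = frame[y1:y2, x1:x2]               # region actually fed to the model

Rerunning the script with different --pads values until the crop covers the whole face, chin included, is the experiment suggested above.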

tju-zxy (Author) commented Apr 10, 2020

Thanks for your help! It indeed works; your advice is very valuable. Now the image generates a decent result. However, when I input a video and audio, the lip movements in the generated video are almost the same as those in the source video. I have tried extracting some frames from the video to test the model, and the result generated from a frame is acceptable, so I sincerely hope you can give me some advice on improving the result generated from a video. My results are linked below.
The source video:
https://youtu.be/vM2HlaztgCM
The result generated from the video:
https://youtu.be/3b9p1h7Df4c
The result generated from a frame:
https://youtu.be/8YFsRaRJaPo
Thanks a lot!

prajwalkr (Collaborator) commented

Hello,

Glad the result improved. For the result generated from a single frame, I think you can improve it further by adjusting the padding so the box extends just to the chin at the bottom and to the sides of the face.

Results from a static frame will always be superior to results on moving frames. As ours is a frame-based model, you will observe temporal inconsistencies, and thus poor results in some cases, especially during silences. We are working on a future work to resolve these issues and will update this repo accordingly.


ak9250 commented Apr 26, 2020

@tju-zxy have you tried https://github.com/yiranran/Audio-driven-TalkingFace-HeadPose for your video input?

shikhar-scs commented

Hey @Rudrabha, for a different problem: if I want to mask the complete face with the ground truth in the face encoder, do I need to make any changes in the preprocessing part, or only in the batch_inference part by adjusting the paddings?

prajwalkr (Collaborator) commented

“want to mask the complete face with the ground truth in the face encoder”

I am sorry, I do not understand; please explain more. But I can assure you that nothing needs to be done in the preprocessing part.
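For context, a minimal sketch of the kind of masking being asked about, assuming a LipGAN-style setup in which the face encoder receives the target frame with its lower half zeroed out; the array names and the 96-px resolution are illustrative, not taken from the repo:

    import numpy as np

    img_size = 96                               # assumed input resolution
    face = np.random.rand(img_size, img_size, 3).astype(np.float32)  # stand-in frame

    half_masked = face.copy()
    half_masked[img_size // 2:, :, :] = 0.0     # usual setup: hide the mouth half

    fully_masked = np.zeros_like(face)          # the variant asked about:
                                                # mask the complete face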

shikhar-scs commented

Hey, I think I have figured out that part, no worries.
Meanwhile, if you could have a look at https://stackoverflow.com/questions/61608295/attributeerror-nonetype-object-has-no-attribute-inbound-nodes-add-conv-la, it would be a great help.
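For readers hitting the same error: one common cause (a guess at the linked question, not a confirmed diagnosis) is applying a raw TensorFlow op to Keras tensors, which breaks the layer graph that Model() traverses; wrapping the op in a Lambda layer avoids it. A minimal sketch:

    import tensorflow as tf
    from tensorflow.keras.layers import Input, Conv2D, Lambda
    from tensorflow.keras.models import Model

    inp = Input(shape=(96, 96, 3))
    x = Conv2D(16, 3, padding='same')(inp)
    # y = x + tf.ones_like(x)   # with older Keras/TF1, this raw op led to the
    #                           # '_inbound_nodes' AttributeError at Model()
    y = Lambda(lambda t: t + tf.ones_like(t))(x)  # stays a proper Keras layer
    model = Model(inp, y)                         # builds without the error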

prajwalkr (Collaborator) commented

“However, when I input the video and the audio, the lip movements of the generated video are almost like those in the source video”

Please switch to this latest improved work: https://github.com/Rudrabha/Wav2Lip :-)
