Some problems about the paper #2
Comments
Hello, have you reimplemented the project?
I guess 137 is a typo.
Still not solved? I'm also curious about the number 137... 👍🏻
I have reimplemented this paper, and the experiments show promising initial results.
Here is a free-view talking-head demo (animation with different roll, yaw, and pitch) from my implementation: concat-14.mp4
Right now the model does seem to work, but many problems remain to be solved. Training progress of my model (for visualization, the 3D keypoints are projected to 2D): (image not preserved). In addition, I found that the detected 3D keypoints were inaccurate, or even unreasonable.
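For anyone reproducing the free-view demo, here is a minimal sketch of how head-pose angles could drive 3D keypoints and how those keypoints can be projected to 2D for visualization. It assumes a keypoint transform of the form x_k = R x_c,k + t + δ_k and an Euler-angle convention (pitch about x, yaw about y, roll about z, composed as Rz·Ry·Rx) that is an assumption here, not necessarily the one used in the paper or in this reimplementation. The helper names `rotation_from_euler`, `transform_keypoints`, and `project_to_2d` are hypothetical.

```python
import math


def matmul3(a, b):
    """Multiply two 3x3 matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def rotation_from_euler(yaw, pitch, roll):
    """Build a 3x3 rotation matrix from head-pose angles given in degrees.

    Assumed (hypothetical) convention: pitch about x, yaw about y,
    roll about z, composed as R = Rz(roll) @ Ry(yaw) @ Rx(pitch).
    """
    p, y, r = (math.radians(a) for a in (pitch, yaw, roll))
    rx = [[1.0, 0.0, 0.0],
          [0.0, math.cos(p), -math.sin(p)],
          [0.0, math.sin(p), math.cos(p)]]
    ry = [[math.cos(y), 0.0, math.sin(y)],
          [0.0, 1.0, 0.0],
          [-math.sin(y), 0.0, math.cos(y)]]
    rz = [[math.cos(r), -math.sin(r), 0.0],
          [math.sin(r), math.cos(r), 0.0],
          [0.0, 0.0, 1.0]]
    return matmul3(rz, matmul3(ry, rx))


def transform_keypoints(canonical, rot, t=(0.0, 0.0, 0.0), deltas=None):
    """Apply x_k = R @ x_c,k + t + delta_k to each canonical 3D keypoint."""
    if deltas is None:
        deltas = [(0.0, 0.0, 0.0)] * len(canonical)
    return [tuple(sum(rot[i][j] * kp[j] for j in range(3)) + t[i] + d[i]
                  for i in range(3))
            for kp, d in zip(canonical, deltas)]


def project_to_2d(keypoints):
    """Orthographic projection for visualization: drop the depth (z) coordinate."""
    return [(x, y) for x, y, _ in keypoints]


# A free-view edit only changes (yaw, pitch, roll); the canonical keypoints stay fixed.
canonical = [(1.0, 0.0, 0.0), (0.0, 0.5, 0.2)]
posed = transform_keypoints(canonical, rotation_from_euler(yaw=30, pitch=-10, roll=5))
print(project_to_2d(posed))
```

Dropping z is the simplest projection for plotting; a perspective projection would need camera intrinsics that this thread does not specify.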
Update: I will release the code later. @XiaoWen-AI @charan223
Nice to see the reimplementation! Sorry for the delay in the code release; we are still in the long, tedious process of getting company approval... hopefully it will be approved soon.
Thank you very much for your explanation of "137". Can you tell me more about the feature compression, e.g., how many 3D convolution layers you use and what the kernel size is?
Thank you very much for this explanation! I'm also starting to reimplement this paper, but I am confused by the architecture figure; some of my questions were already raised in earlier comments:
Thanks to your advice, my reimplementation seems to work now. However, the generated occlusion_map is very different from yours, and the same problem remains: opening and closing of the eyes and mouth doesn't work. What changes did you make to fix it? In addition, the keypoints are very dense and have no semantic meaning. I will try to solve these problems in my free time. Thanks again! @zhanglonghao1992
I have a similar question to @zhengkw18. @tcwang0509, can you explain how the input of the U-Net in the motion field estimator is formed?
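One plausible answer, borrowed from FOMM's dense motion network rather than confirmed for this paper: the U-Net input stacks, for each of the K keypoint-driven warps plus one identity (background) warp, the compressed warped feature volume and a one-channel difference-of-Gaussians heatmap. This sketch only evaluates the heatmap at a single 3D point and computes the resulting channel count; the 4 compressed channels, the variance of 0.01, and the helper names are assumptions.

```python
import math


def gaussian(point, center, variance):
    """Unnormalized isotropic Gaussian: exp(-0.5 * ||point - center||^2 / variance)."""
    d2 = sum((p - c) ** 2 for p, c in zip(point, center))
    return math.exp(-0.5 * d2 / variance)


def heatmap_value(point, kp_driving, kp_source, variance=0.01):
    """FOMM-style difference of Gaussians for one keypoint at one 3D grid point.

    Positive near the driving keypoint, negative near the source keypoint.
    """
    return gaussian(point, kp_driving, variance) - gaussian(point, kp_source, variance)


def unet_input_channels(num_kp, compressed_channels):
    """Channel count of the U-Net input under the assumed layout.

    num_kp keypoint-driven warps plus one identity (background) warp; each
    contributes its compressed warped feature volume plus one heatmap channel
    (all zeros for the background warp), concatenated along the channel axis.
    """
    return (num_kp + 1) * (compressed_channels + 1)


print(unet_input_channels(num_kp=20, compressed_channels=4))  # 105
```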
Well, I read the code of FOMM and found that it answers most of the questions. This paper is similar to FOMM in many respects.
I'm implementing this paper these days to see whether my understanding holds. Besides, I wonder whether the authors use a special method to model the motion of the eyes and mouth. In the provided demo, we can even control eye rotation, which I don't think is mentioned in the paper.
FOMM is definitely a good reference repo. I recommend that everybody who wants to work on motion transfer read the FOMM paper. Here, I want to point out several major differences between our model and FOMM to help reproduce the results.
@XiaoWen-AI
Hello,
Your project is very awesome! I am trying to reproduce your method myself, but I have some confusion. Can you give me some suggestions?
In short, I'm not sure about the output shapes of some modules. Following your description and given an input tensor of shape 1x3x256x256, I get fs with shape 1x32x16x64x64, and the output of the last UpBlock3D of L is 1x32x512x256x256 (which is very expensive when computing Jc,k).
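A quick shape check for the appearance feature extractor F, assuming (hypothetically) two 2D downsampling steps from 256x256 to 64x64 and a 512-channel 2D feature map that is reshaped into C32 x D16. It also counts elements to show how much larger a 1x32x512x256x256 tensor is than fs. Both function names are illustrative only.

```python
def appearance_feature_shape(batch=1, image_size=256, channels_2d=512,
                             depth=16, num_downsamples=2):
    """Trace the shape of fs under assumed settings (all defaults hypothetical).

    2D down blocks halve H and W num_downsamples times; the final 2D feature map
    with channels_2d channels is then reshaped into (C, D) = (channels_2d // depth, depth).
    """
    if channels_2d % depth != 0:
        raise ValueError("channels_2d must be divisible by depth")
    size = image_size // (2 ** num_downsamples)
    return (batch, channels_2d // depth, depth, size, size)


def num_elements(shape):
    """Number of scalars in a tensor of the given shape."""
    n = 1
    for s in shape:
        n *= s
    return n


fs_shape = appearance_feature_shape()
print(fs_shape)                              # (1, 32, 16, 64, 64)
print(num_elements(fs_shape))                # 2097152
print(num_elements((1, 32, 512, 256, 256)))  # 1073741824, i.e. 512x more than fs
```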
1. Appearance feature extractor F: this one is simple, but I want to confirm that its output (called fs) has shape 1x32x16x64x64 when the input is 1x3x256x256. The warped fs is fed into the motion field estimator M, but M has 5 DownBlock3D layers. Since D is 16, only 4 down blocks are needed to reduce it to 1, so why do we need 5?
2. Occlusion path in the motion field estimator M: why is there a reshape C137*D16 -> C2192? The output of the last UpBlock3D has 32 channels, so what is D? 137x16/32 = 68.5, but I think D should be 16, the same as in fs.
3. Mask path in the motion field estimator M: there is a 7x7x7-Conv-21, but the number of keypoints k is 20, so why is C 21? Does it need global pooling? Is the mask a 20-d vector that is simply multiplied with every pixel of Wk?
4. I want to confirm the behavior of 3D blocks such as UpBlock3D: do they double D just as they double H and W?
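The four questions above are all about shapes, so here is a shape-only trace under explicit assumptions, not the authors' code. (a) DownBlock3D and UpBlock3D are assumed to resample only H and W (factor (1, 2, 2)), which would make five down blocks possible with D = 16. (b) As in FOMM's hourglass, the last decoder output (32 channels) is assumed to be concatenated with the U-Net input; with 20 keypoints plus background and 4 compressed channels plus 1 heatmap channel per warp, that gives 32 + 21 x 5 = 137 channels, and folding D = 16 into the channel axis gives 137 x 16 = 2192. (c) The 21 mask channels are taken to be k = 20 keypoint warps plus one background warp, normalized by a per-voxel softmax rather than global pooling. Every number not quoted in the questions (4 compressed channels, block_expansion = 32, max_features = 512) is an assumption.

```python
import math


def hourglass_shapes(in_channels, depth=16, size=64, block_expansion=32,
                     max_features=512, num_blocks=5):
    """Trace per-block output shapes (C, D, H, W) of an assumed FOMM-style 3D hourglass.

    Assumption: DownBlock3D pools with factor (1, 2, 2) and UpBlock3D upsamples with
    factor (1, 2, 2), so only H and W change and D stays fixed.
    """
    shapes = []
    h = size
    for i in range(num_blocks):
        h //= 2
        c = min(max_features, block_expansion * 2 ** (i + 1))
        shapes.append(("down", c, depth, h, h))
    for i in reversed(range(num_blocks)):
        h *= 2
        c = min(max_features, block_expansion * 2 ** i)  # block output, before skip concat
        shapes.append(("up", c, depth, h, h))
    # FOMM-style decoder: the last up-block output is concatenated with the U-Net input.
    final = (block_expansion + in_channels, depth, size, size)
    return shapes, final


def fold_depth(shape):
    """Reshape (C, D, H, W) -> (C * D, H, W) so a 2D conv can predict the occlusion map."""
    c, d, h, w = shape
    return (c * d, h, w)


def blend_flows(mask_logits, flows):
    """Per-voxel softmax over the K+1 mask channels, then a weighted sum of the K+1 flows.

    mask_logits: K+1 scalars at one voxel; flows: K+1 3-vectors at the same voxel.
    Returns the blended 3-vector and the softmax weights.
    """
    m = max(mask_logits)
    exps = [math.exp(v - m) for v in mask_logits]  # subtract max for numerical stability
    total = sum(exps)
    weights = [e / total for e in exps]
    blended = tuple(sum(w * f[i] for w, f in zip(weights, flows)) for i in range(3))
    return blended, weights


# Hypothetical U-Net input: (20 keypoints + background) x (4 compressed + 1 heatmap channel).
in_ch = (20 + 1) * (4 + 1)
shapes, final = hourglass_shapes(in_ch)
print(shapes[4])          # ('down', 512, 16, 2, 2): D is still 16 after 5 down blocks
print(final)              # (137, 16, 64, 64)
print(fold_depth(final))  # (2192, 64, 64)
```

If the real model also pools D, `hourglass_shapes` would need a depth update inside each loop, and the constant D = 16 shown here would no longer hold.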