about the results #1
can you re-upload the video?
i think we can switch out the resnet18 for resnet50 - have a look at this from IRFD - the `Emtn(nn.Module)` encoder class.
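Roughly something like this - just my sketch of the swap, not the paper's exact architecture (the 2048 -> 512 projection and the embedding size are my assumptions):

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

class Emtn(nn.Module):
    def __init__(self, emb_dim: int = 512):
        super().__init__()
        trunk = resnet50(weights=None)
        # keep everything up to and including the global avgpool, drop the fc head
        self.backbone = nn.Sequential(*list(trunk.children())[:-1])
        # ResNet-50 ends at 2048 channels (ResNet-18 ends at 512), so project down
        self.proj = nn.Linear(2048, emb_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = self.backbone(x).flatten(1)  # (B, 2048)
        return self.proj(feats)              # (B, emb_dim)
```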
Thanks, I will try it.
https://github.com/johndpope/SPEAK-hack - it needs some data wired up. PRs welcome. UPDATE: some progress, but i need a generator.
I don't have results yet. Still working on bug fixing. You can check out my prune branch.
Are you currently not using the g2d, g3d, and warping generators?
this is a new paper from 2024 - IRFD
It appears to be simpler than MegaPortraits, and it requires less computation in the training phase. However, I doubt it can maintain 3D consistency when the head turns, since it drops the 3D volume used in MegaPortraits.
good news - it's training.
check my branch - when you run images smaller than 224x224 against resnet50, the feature size collapses from 7x7 -> 1x1. share your code - mine gets stuck in a local minimum.
My code is located at https://github.com/JaLnYn/talkinghead/tree/new_model. I'm not sure what you mean - are you talking about the lpips loss? Or the encoder?
i want to train the net with 64x64 images - my branch handles that - but the resnet encoder produces different-size features for smaller images than it does for 224. there's a test file in my branch you can look at.
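A quick way to see the effect (assumes torchvision; the sizes in the comment are what i'd expect from the stride-32 backbone, not numbers measured from either repo):

```python
import torch
from torchvision.models import resnet50

# ResNet-50 trunk without the global avgpool and fc head
backbone = torch.nn.Sequential(*list(resnet50(weights=None).children())[:-2])
backbone.eval()

with torch.no_grad():
    for size in (224, 112, 64):
        feats = backbone(torch.randn(1, 3, size, size))
        # total stride is 32, so 224 -> 7x7, 112 -> 4x4, 64 -> 2x2
        print(size, tuple(feats.shape))
```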
UPDATE
going back to the SPEAK paper - they base the GAN architecture on StyleGAN - wondering whether to do the same.
I decided to take StyleGAN code from a blog. Seems to work. You can check my new_model branch I linked above.
a couple of lines to help - define the transformations:

```python
from torchvision.transforms import Compose, Lambda, Resize

transform = Compose([
    Lambda(lambda x: x.permute(0, 3, 1, 2).float()),  # (N, H, W, C) -> (N, C, H, W)
    Lambda(lambda x: x / 255.0),  # normalize to [0, 1]
    Resize((256, 256)),  # my videos were 512 - caused me some headaches
])
```
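and this is roughly how i'd expect it to plug into the decord loading path (hypothetical usage - the file name and frame indices are made up):

```python
import torch
from decord import VideoReader

vr = VideoReader("some_clip.mp4")
frames = torch.from_numpy(vr.get_batch(list(range(8))).asnumpy())  # (N, H, W, C), uint8
x = transform(frames)  # (N, 3, 256, 256), float32 in [0, 1]
```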
```python
parser.add_argument('--config_path', type=str, default='./config/local_train.yaml', help='Path to the config')
```

the game with SPEAK should be in the forward pass - see the "Training of Disentanglement Module" section: during training they randomly swap one type of feature between source and target in the forward pass.

```python
# Randomly swap one type of feature (keeping this functionality)
swap_type = torch.randint(0, 3, (1,)).item()
if swap_type == 0:
    fi_s, fi_t = fi_t, fi_s
elif swap_type == 1:
    fe_s, fe_t = fe_t, fe_s
else:
    fp_s, fp_t = fp_t, fp_s
```

that should take the happy guy, keep his identity, but replace his emotion with the sad lady's emotion, etc. in my repo there's a dataset using a 2GB AffectNet image set (8 emotions): https://www.kaggle.com/datasets/thienkhonghoc/affectnet

Epoch 1/5, Total Loss: 3.3473: 0%|▏ | 76/17833 [00:15<55:35, 5.32it/s]x.shape: torch.Size([2, 3, 256, 256])
Epoch 1/5, Total Loss: 3.7954: 0%|▎ | 77/17833 [00:16<53:18, 5.55it/s]x.shape: torch.Size([2, 3, 256, 256])
Epoch 1/5, Total Loss: 2.1625: 0%|▎ | 78/17833 [00:16<52:48, 5.60it/s]x.shape: torch.Size([2, 3, 256, 256])
Epoch 1/5, Total Loss: 3.1936: 0%|▎ | 79/17833 [00:16<51:22, 5.76it/s]Epoch 1/5, Total Loss: 3.1936: 0%|▏ | 79/17833 [00:16<1:01:58, 4.77it/s]
Traceback (most recent call last):
File "/media/oem/12TB/talkinghead/src/trainer.py", line 239, in <module>
main()
File "/media/oem/12TB/talkinghead/src/trainer.py", line 235, in main
train_model(config, p, video_dataset)
File "/media/oem/12TB/talkinghead/src/trainer.py", line 137, in train_model
for idx, (Xs, Xd, Xsp, Xdp) in enumerate(train_iterator):
File "/home/oem/miniconda3/envs/comfyui/lib/python3.11/site-packages/tqdm/std.py", line 1195, in __iter__
for obj in iterable:
File "/home/oem/miniconda3/envs/comfyui/lib/python3.11/site-packages/torch/utils/data/dataloader.py", line 629, in __next__
data = self._next_data()
^^^^^^^^^^^^^^^^^
File "/home/oem/miniconda3/envs/comfyui/lib/python3.11/site-packages/torch/utils/data/dataloader.py", line 672, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/oem/miniconda3/envs/comfyui/lib/python3.11/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/oem/miniconda3/envs/comfyui/lib/python3.11/site-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
~~~~~~~~~~~~^^^^^
File "/media/oem/12TB/talkinghead/src/dataloader.py", line 29, in __getitem__
video_data = vr.get_batch(frame_indices).asnumpy()
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/oem/.local/lib/python3.11/site-packages/decord-0.6.0-py3.11-linux-x86_64.egg/decord/video_reader.py", line 175, in get_batch
arr = _CAPI_VideoReaderGetBatch(self._handle, indices)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/oem/.local/lib/python3.11/site-packages/decord-0.6.0-py3.11-linux-x86_64.egg/decord/_ffi/_ctypes/function.py", line 173, in __call__
check_call(_LIB.DECORDFuncCall(
^^^^^^^^^^^^^^^^^^^^
hmmm I've never seen this bug before. I've trained over 10 epochs... I'm using a small VoxCeleb2 dataset at 224x224. I've been re-reading the paper and I'm confused about whether they use pretrained encoders or train the encoders from scratch. I used to think they were not pretrained, but now I think they are. I'm going to rewrite some code tomorrow to reflect this. I'm planning to use vggface2 for the face, but I haven't found encoders for the others yet. Please let me know if you've found anything. Also, I found this project which may interest you. It uses warpings and works pretty well. I've tried it: https://github.com/KwaiVGI/LivePortrait
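One option for the vggface2 part (assuming the facenet-pytorch package - just one possible stand-in, not necessarily what the paper uses):

```python
import torch
from facenet_pytorch import InceptionResnetV1

# VGGFace2-pretrained face embedding network, frozen as a fixed identity encoder
id_encoder = InceptionResnetV1(pretrained='vggface2').eval()
for p in id_encoder.parameters():
    p.requires_grad = False

with torch.no_grad():
    emb = id_encoder(torch.randn(2, 3, 160, 160))  # expects roughly 160x160 face crops
print(emb.shape)  # torch.Size([2, 512])
```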
from my work with megaportraits - i'd say they're resnet50. i'd like to work more closely with you - at least in a similar direction (model / loss / generator / discriminator). i did play with your code - but the noise in stylegan seems erroneous to pass around everywhere.
I'd be happy to work more closely with you :). We can get connected off GitHub if you'd like.
What is the issue you're having with this? I did some testing and it seems to work? I'll do more testing later.
From what I've read, I think the implementation is correct? Can you point to the specific location in the code that you think is wrong and explain why?
came across this new paper the other day - which has code. it has a noteworthy class
@ChenyangWang95 - did you get anywhere with anything?
@johndpope I think the pretrained LivePortrait model may be a good choice for achieving the disentanglement for VASA-1.
I'm currently struggling to generate faces based on an input face without disentanglement, but maybe my StyleGAN is just not training long enough. I've tried ArcFace, but VGGFace seems similar, if not better.
i was hoping the emoportrait code would drop - that would effectively allow me to archive the Megaportrait codebase and reveal the answers to those questions... but it hasn't yet. in coding the generator - i thought of using CIPS, from https://openaccess.thecvf.com/content/CVPR2021/papers/Anokhin_Image_Generators_With_Conditionally-Independent_Pixel_Synthesis_CVPR_2021_paper.pdf - which has no convolution layers, just a Fourier backbone. i had some problems with this noise - and this tug of war between using @JaLnYn Alan's stylegan v1 or a stylegan from a pytorch implementation (that has noise baked in), then switching back to speak-hack. for the vasa-1 stuff i used the Megaportrait code to come up with this branch - it uses DPE as per the whitepaper - that has the most hope to disentangle stuff: johndpope/VASA-1-hack#13
Adding some batch norms usually helps with my gradient explosions. I'm mostly waiting for my current project to train.
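For reference, the generic version of what I mean (illustrative names, not code from either repo) - BatchNorm in the conv blocks, plus gradient-norm clipping as a second safety net:

```python
import torch
import torch.nn as nn

# generic conv block with BatchNorm to keep activations (and gradients) in a sane range
block = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=3, padding=1),
    nn.BatchNorm2d(64),
    nn.LeakyReLU(0.2),
)

opt = torch.optim.Adam(block.parameters(), lr=1e-4)
x = torch.randn(2, 64, 32, 32)
loss = block(x).pow(2).mean()
loss.backward()
torch.nn.utils.clip_grad_norm_(block.parameters(), max_norm=1.0)  # clip before the step
opt.step()
```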
i got distracted again wanting to improve compute efficiencies - spent all weekend on these - one Apple has a patent on (recently renewed). I did a basic proof of concept and extended it further to cuda - but i think it's specific to hardware / ASICs - won't work with a GPU. this liveportrait implementation is rather amazing - https://x.com/purzbeats/status/1812287664240107969?s=46&t=-tkSIrsyNobBjIvQ2IAQwQ - the obscuring of the face is awesome.
Hi, very excited to see the work.
I wonder if there are any results on the disentanglement?
I tried and revised the repo https://github.com/johndpope/MegaPortrait-hack
but the mouth is fixed and seems to show an average emotion.
output_video4.mp4