
Original Model 128 worth it? #385

Closed
andenixa opened this issue May 2, 2018 · 89 comments

Comments

@andenixa
Contributor

andenixa commented May 2, 2018

I have created a rough version of Original Model with dimensions 128, 128, 3.

Rationale:

There seems to be increasing demand for HD face-swapping, while no one has had any luck with GAN128 as far as I can tell from the issues and the playground. In addition, it could also cover more of the face area.

Is releasing Original128 worth it? I'm still assessing its efficiency. I had to sacrifice some color data to keep within memory / speed limitations, but overall it's not very visible (as opposed to GAN128, which discards the original color data). Speed seems to be up to snuff. I'm trying one-to-many scenarios as well. A LowMem version probably won't ever be created for this.

Could also do Orig256 and Orig512, but those definitely won't fit in consumer GPU RAM.
--cheers
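For readers who want a concrete picture of what "dimensions 128, 128, 3" means for the plugin, here is a minimal illustrative Keras sketch of a 128px encoder front-end. The layer counts, filter sizes and names (IMAGE_SHAPE, ENCODER_DIM) are placeholders in the spirit of the Original model, not the actual code in the PR:

```python
from keras.layers import Input, Conv2D, Dense, Flatten, Reshape
from keras.models import Model

IMAGE_SHAPE = (128, 128, 3)   # Original is (64, 64, 3); doubling the side quadruples the pixel count
ENCODER_DIM = 1024            # bottleneck width; VRAM use grows with this and the image size

inp = Input(shape=IMAGE_SHAPE)
x = Conv2D(128, 5, strides=2, padding='same', activation='relu')(inp)   # 64x64
x = Conv2D(256, 5, strides=2, padding='same', activation='relu')(x)     # 32x32
x = Conv2D(512, 5, strides=2, padding='same', activation='relu')(x)     # 16x16
x = Conv2D(1024, 5, strides=2, padding='same', activation='relu')(x)    # 8x8
x = Dense(ENCODER_DIM)(Flatten()(x))         # the expensive bottleneck
x = Dense(8 * 8 * 512)(x)
x = Reshape((8, 8, 512))(x)                  # a decoder then upscales 8 -> 128
encoder = Model(inp, x)
```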

@Kirin-kun

Post results eventually.

Something like a short clip, before and after, so we can judge whether it's worth it?

I think the main problem isn't the resolution, but the averaging.

@andenixa
Contributor Author

andenixa commented May 2, 2018

Sure I will. I don't fully understand the theory, though a high-res data-set may contain more detail for reconstruction (and more face coverage as well). I'm also working on the idea that concatenated Conv2D tensors with different kernel sizes could preserve facial features such as wrinkles, freckles, and moles.
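The "concatenated Conv2D tensors with different kernels" idea is essentially an Inception-style block: run the same input through several kernel sizes and stack the results, so both fine detail (wrinkles, moles) and broader structure stay in the feature map. A hedged sketch, not code from the PR:

```python
from keras.layers import Conv2D, Concatenate

def multi_kernel_block(x, filters=64):
    """Convolve the same input with several kernel sizes and concatenate the results."""
    fine   = Conv2D(filters, 3, padding='same', activation='relu')(x)   # small details
    medium = Conv2D(filters, 5, padding='same', activation='relu')(x)
    broad  = Conv2D(filters, 7, padding='same', activation='relu')(x)   # larger structure
    return Concatenate(axis=-1)([fine, medium, broad])
```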

@ruah1984

ruah1984 commented May 2, 2018

You can share it out, and we'll run it and give it a try.

@torzdf
Collaborator

torzdf commented May 2, 2018

Yeah, I think it's worth it. If you can add a new model, then choice is good.

If it's in a state you can share, then please raise a Pull Request so others can test.

Thanks 👍

@gessyoo

gessyoo commented May 2, 2018

I'm willing to test it. Is the Dfaker plugin/model still planned for integration at some point? I can run the Original model with no issues, but can't get the proposed Dfaker code here to work.

@ruah1984

ruah1984 commented May 3, 2018

@andenixa we can test it; I believe a 1080 Ti can support your request.

@andenixa
Contributor Author

andenixa commented May 3, 2018

@torzdf yes, it's in a state I can share. The major issue is to see whether it gives any meaningful results, and since 128 models train longer I am still checking whether it can perform at the level of Original quality-wise. At this stage it learns rather well, but the decoder part significantly lags behind the encoder and I can't predict its limitations.
The funny part is that I can run it with an ENCODER_DIM of ~3k and a batch size of ~42 (there is no -bs limitation such as even numbers only or powers of 2) and it still fits in 1080 Ti memory.
PS: I shamelessly peeked at how GAN128 is implemented, but my model doesn't share its architecture. I only use some GAN128 tricks to conserve GPU RAM.

@torzdf
Collaborator

torzdf commented May 3, 2018

Excellent. Well, whenever you're ready, please raise a PR pointing at the Staging branch. Thanks!

@andenixa
Contributor Author

andenixa commented May 4, 2018

@torzdf
I might be making a PR to Staging as you've suggested. Perhaps someone could get better results, either by using a better data-set and giving it more training or by tweaking the model itself, while I work on my version.
Though I am yet to see consistent results that would at least surpass GAN128. I am pretty happy with its learning ability, but the result it generates is a little "low-fi". On the plus side, it doesn't create aberrations such as twisted lines out of nowhere (the major reason I started making this one instead of the GAN128 trainer).

@andenixa
Contributor Author

andenixa commented May 5, 2018

Still tuning the net. Memory consumption is modest, even for mid-range cards. Speed is quite good, but I can't get a crisp picture, even from the decoder.

@tjdwo13579

tjdwo13579 commented May 7, 2018

I'm not trying to nitpick but is conversion not possible with this model?

I've tried adding "-t OriginalHighRes" to the conversion command, but it's not working.

It says:
Reason: Error when checking : expected input_4 to have shape (None, 128, 128, 3) but got array with shape (1, 64, 64, 3)

Was this commit only meant for training as of now?
Mind my ignorance.. I'm not an expert in this field
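For what it's worth, the error just says the converter is still feeding 64x64 crops to a network whose input layer now expects 128x128, so a converter path for the new size is needed. A quick way to confirm what a loaded model expects (the variable names here are hypothetical):

```python
# Compare what the network expects with what the converter actually feeds it.
print(autoencoder_A.input_shape)   # e.g. (None, 128, 128, 3) for the 128 model
print(face_batch.shape)            # (1, 64, 64, 3) from the old convert path -> mismatch
```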

@andenixa
Contributor Author

andenixa commented May 7, 2018

@tjdwo13579 you might be right; I had forgotten to add the conversion code. I did a PR, yet I wasn't able to test it with the latest git version.
Somehow the new releases became less Windows-path friendly, especially if you are using SMB paths like in my case.
@iperov I shall be sure to check it. Thanks.

@tjdwo13579

@andenixa Thanks for adding the conversion code! I'll try it out now.

@andenixa
Contributor Author

andenixa commented May 7, 2018

@iperov I shall try your interleaved Upscale/ResBlock approach on the decoder, if you don't mind. I also like the face extractor you are using. I want to create something akin to H128, yet maskless. I noticed you reduce memory consumption by using smaller batch sizes. Does it play well with diverse (different lighting condition) data-sets? I've noticed bigger batches contribute to more accurate / generalized models. I wasn't able to refine anything with bs < ~45-48.
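For readers unfamiliar with that decoder style, "interleaved Upscale/ResBlock" roughly means alternating an upscaling step with a residual block at each resolution on the way back up to 128x128. A minimal sketch, assuming a plain UpSampling2D-based upscale rather than whatever pixel-shuffler variant the other repo uses:

```python
from keras.layers import Add, Conv2D, LeakyReLU, UpSampling2D

def res_block(x, filters):
    """Two 3x3 convolutions with a skip connection around them."""
    shortcut = x
    x = Conv2D(filters, 3, padding='same')(x)
    x = LeakyReLU(0.2)(x)
    x = Conv2D(filters, 3, padding='same')(x)
    x = Add()([shortcut, x])
    return LeakyReLU(0.2)(x)

def upscale(x, filters):
    """Double the spatial resolution, then convolve."""
    x = UpSampling2D()(x)
    x = Conv2D(filters, 3, padding='same')(x)
    return LeakyReLU(0.2)(x)

def decoder(x):
    # 8x8 -> 16 -> 32 -> 64 -> 128, with a residual block after every upscale
    for filters in (512, 256, 128, 64):
        x = upscale(x, filters)
        x = res_block(x, filters)
    return Conv2D(3, 5, padding='same', activation='sigmoid')(x)
```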

@andenixa
Contributor Author

andenixa commented May 7, 2018

@iperov looks excellent. Do you think it's possible to preserve the TARGET face's details, freckles, perhaps through another special layer? Do you feel that additional conv layers (in the Encoder) contribute to better detail preservation? I also want to try a deeper approach with an additional Dense+Dropout layer in the middle of the Encoder.
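The "Dense+Dropout layer in the middle of the Encoder" idea would sit at the bottleneck, roughly like this (a sketch assuming a standard flatten/bottleneck layout; the sizes and the 0.4 rate are illustrative):

```python
from keras.layers import Dense, Dropout, Flatten, Reshape

def bottleneck(x, encoder_dim=1024, drop_rate=0.4):
    """Compress the conv features to a vector, drop units to discourage
    memorising identities, then expand back to a feature map."""
    x = Flatten()(x)
    x = Dense(encoder_dim)(x)
    x = Dropout(drop_rate)(x)       # the extra regularisation in the middle
    x = Dense(8 * 8 * 512)(x)
    x = Reshape((8, 8, 512))(x)
    return x
```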

@iperov
Contributor

iperov commented May 7, 2018

@andenixa model experiments with result comparisons are very welcome.

@andenixa
Contributor Author

andenixa commented May 7, 2018

Thanks to @iperov, I am currently testing another revision of the HighRes model adopting their re-scaling idea.
Memory consumption is somewhat high, but you guys with 6 to 8 GB should be fine. Training speed is slower, as there are many more deep layers in the Encoder. When I get a model with reasonably good clarity, I shall adjust it for more face coverage.

@torzdf
Collaborator

torzdf commented May 7, 2018

I'll leave it open as a pr for now. Let me know when you think it's ready for merging.

@andenixa
Contributor Author

andenixa commented May 8, 2018

@torzdf sure, I am just trying to make sure it's not worse than the previous one, considering the decreased learning speed and increased memory demands. I am also trying a sliced-bread design with a dropout layer in the middle, because the previous 64x model (which is the basis for HighResv2) overtrained due to the increased number of Conv2D layers.
I shall of course credit the ideas I may have borrowed from other contributors.
Generally I just want a working 128x tensor with HD quality ;)

@andenixa
Contributor Author

andenixa commented May 12, 2018

Still working on the model. The clarity is fascinating now, but the target vectors sometimes match wrongly aligned faces. I am trying to reduce the number of deep layers to see if it helps, but I shall leave the high-clarity (very deep) Encoder in the code as well, for those who want to experiment.

@andenixa
Contributor Author

@torzdf I've updated my PR for the new model. It seems to be rather sane and stable. It takes some time to train, and resource consumption is around 5 GB at a batch size of 24. The clarity is rather good with a nice data-set. It seems to work with the multi-GPU model as well.

@iperov
Contributor

iperov commented May 13, 2018

@andenixa is SeparableConv useful? What benefits does it provide? Have you compared it against the regular approach with residual layers?

@andenixa
Contributor Author

andenixa commented May 13, 2018

@iperov I think it's slightly faster and less accurate with colors, as it processes the color channels separately (presumably sequentially). It consumes less memory, though it probably has worse convergence in general. I'm trying to squeeze in more layers while keeping reasonable training speed and RAM requirements. Also, ideally the first conv layer has to be 2x the retina side size, yet I think that's unfeasible with Conv2D memory-wise.
If you can fit a proper 128x HalfFace using regular Conv2D, I'd appreciate it.

PS: The reason I can't use OpenFaceSwap is that it's not compatible with current training sets, and I have a lot of manually crafted sets.
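For reference, the conv vs conv_sep trade-off being discussed: a depthwise-separable convolution factorises spatial filtering and channel mixing, so at the same width it needs far fewer weights (and less memory) than a regular convolution. A quick illustrative comparison:

```python
from keras.layers import Conv2D, Input, SeparableConv2D
from keras.models import Model

inp = Input(shape=(32, 32, 256))
regular   = Model(inp, Conv2D(512, 5, padding='same')(inp))
separable = Model(inp, SeparableConv2D(512, 5, padding='same')(inp))

print(regular.count_params())     # 256*512*5*5 + 512       = 3,277,312
print(separable.count_params())   # 256*5*5 + 256*512 + 512 =   137,984
```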

@torzdf
Collaborator

torzdf commented May 14, 2018

Ok, I haven't got time to test this at the moment, but I will merge it into staging.

If anyone wants to check out the staging branch, give it a go and report back your findings, that would be appreciated.

@iperov
Contributor

iperov commented May 14, 2018

@andenixa I made the best H128, without the suxx residual blocks. I removed res blocks from all models. A new super update for all models is upcoming...

@andenixa
Contributor Author

andenixa commented May 15, 2018

@iperov sounds fascinating if you can make it happen. In fact, perhaps we should aim for H256 next. I'm very excited to give your H128 a try; I just need some time to make a training set.

Is H128 considerably different from full-face? For regular faceswap it's just a matter of adjusting the margin matrix and, of course, training it to catch more "space". I actually changed the new HighRes model to cover most of the face, which is going to be in the next revision.

@iperov
Contributor

iperov commented May 15, 2018

H128 has more detail than full-face 128, but doesn't cover the cheeks and beard.
Half-face is good for fakes of women whose cheeks are occluded by hair.

@andenixa
Contributor Author

andenixa commented May 15, 2018

@iperov I am not exactly aiming to create fakes, but rather to have a one-to-many model where I merge multiple faces in the target data-set to catch the unique features of each face. I have been successful with the basic Model by adding extra Conv layer(s) and increasing the neuron count in the dense layers. The problem of poor generalization and over-fitting still persists. It needs some learning rate decay and a lot of training epochs, and it still sucks quality-wise.
The other major problem is that the approach faceswap uses puts too much emphasis on matching color rather than shape, which makes it difficult to "melt" multiple sets together.
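On the learning-rate-decay point: in Keras this can be as simple as giving the optimiser a decay term when compiling the two autoencoders. The values below are placeholders in the style of the Original trainer settings, not something tuned for a one-to-many set-up:

```python
from keras.optimizers import Adam

# lr shrinks each update roughly as lr / (1 + decay * iterations)
optimizer = Adam(lr=5e-5, beta_1=0.5, beta_2=0.999, decay=1e-6)
autoencoder_A.compile(optimizer=optimizer, loss='mean_absolute_error')
autoencoder_B.compile(optimizer=optimizer, loss='mean_absolute_error')
```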

@iperov
Contributor

iperov commented May 15, 2018

Then what are you doing in the faceswap repo?

@andenixa
Contributor Author

andenixa commented May 15, 2018

@iperov faceswap serves my purpose to some extent. It also doesn't have any working 128 model, so I thought I could provide one. Still not sure if my "concoction" works well enough (though it's gotten much better now). Perhaps you could donate some of your code to create a basic H128 with decent quality and speed for the faceswap repo.

@tjess78

tjess78 commented Jun 15, 2018

hmn, latest push still says OOM in both modes, even with batch size 2 on GTX 1060 6Gb

@Kirin-kun

@tjess78 I think you have to checkout the andenixa-patch-1 branch

@andenixa
Contributor Author

andenixa commented Jun 15, 2018

I tested it. It seems to be learning rather well, both performance- and quality-wise.
Further increasing the image size would require a mask, otherwise it won't converge because of the background and Dense-layer limitations (due to memory).
You may expect an increase in image quality, as there is much room for improvement. I might also implement re-loading old saved weights and resizing them to fit the new topology, if that is reasonably practical.
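On re-loading old saved weights into a changed topology: one common Keras approach is loading by layer name and skipping layers whose shapes no longer match, so at least the unchanged convolutional layers keep their training. A sketch, assuming a recent-enough Keras and layer names that line up between the old and new models:

```python
# Layers whose names and shapes still match keep their trained weights;
# resized layers (e.g. the Dense bottleneck) start from fresh initialisation.
new_model.load_weights("encoder.h5", by_name=True, skip_mismatch=True)
```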

@Jasas9754

Jasas9754 commented Jun 15, 2018

@andenixa
1. What are the benefits of the 'shaoanlu' type? I don't think it's learning faster than the original, or more clearly. What's the advantage?
2. You have raised the encoder dim to 1024 (not 512) and changed conv to conv_sep. Do you think the encoder dim is more important?

Thank you

@andenixa
Contributor Author

andenixa commented Jun 15, 2018

@Jasas9754

  1. I think the shaoanlu type has better generalization, but you are not required to use it.
  2. conv_sep takes less memory, but the final result is no different from conv, though it might be slower to catch up at later stages. The encoder dim is very important, especially with unmasked auto-encoders; in fact it plays a great role in learning, clarity, and reconstruction (rough numbers below). Though we don't always have the luxury of making it big enough, because of video memory.
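To make the encoder-dim point concrete with rough numbers: the two Dense layers around the bottleneck dominate the parameter count, and both scale linearly with the encoder dim. An illustrative calculation, assuming an 8x8x1024 feature map before the Flatten (not the exact model shape):

```python
flat = 8 * 8 * 1024                             # flattened encoder output: 65,536 values
for encoder_dim in (512, 1024, 2048):
    down = flat * encoder_dim + encoder_dim     # Flatten -> Dense(ENCODER_DIM)
    up   = encoder_dim * flat + flat            # Dense back up to the feature map
    print(encoder_dim, round((down + up) / 1e6, 1), "M parameters")
# 512 -> ~67.2M, 1024 -> ~134.3M, 2048 -> ~268.6M
```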

@tjess78

tjess78 commented Jun 15, 2018

Narf, I don't get it. No matter what I do, it will not run on my 1060 6GB.
Here is the error. I tried it with the last push from gui 3.0 and andenixa patches 1, 2 and 2-1. Could it be an error in my environment?

Exception in thread Thread-1:
Traceback (most recent call last):
File "d:\ProgramData\Anaconda3\envs\gui\lib\threading.py", line 914, in _bootstrap_inner
self.run()
File "d:\ProgramData\Anaconda3\envs\gui\lib\threading.py", line 862, in run
self._target(*self._args, **self._kwargs)
File "D:\faceswap-and\scripts\train.py", line 97, in process_thread
raise err
File "D:\faceswap-and\scripts\train.py", line 89, in process_thread
self.run_training_cycle(model, trainer)
File "D:\faceswap-and\scripts\train.py", line 124, in run_training_cycle
trainer.train_one_step(epoch, viewer)
File "D:\faceswap-and\plugins\Model_OriginalHighRes\Trainer.py", line 39, in train_one_step
loss_B = self.model.autoencoder_B.train_on_batch(warped_B, target_B)
File "d:\ProgramData\Anaconda3\envs\gui\lib\site-packages\keras\engine\training.py", line 1220, in train_on_batch
outputs = self.train_function(ins)
File "d:\ProgramData\Anaconda3\envs\gui\lib\site-packages\keras\backend\tensorflow_backend.py", line 2661, in call
return self._call(inputs)
File "d:\ProgramData\Anaconda3\envs\gui\lib\site-packages\keras\backend\tensorflow_backend.py", line 2631, in _call
fetched = self._callable_fn(*array_vals)
File "d:\ProgramData\Anaconda3\envs\gui\lib\site-packages\tensorflow\python\client\session.py", line 1454, in call
self._session._session, self._handle, args, status, None)
File "d:\ProgramData\Anaconda3\envs\gui\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 519, in exit
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[65536,1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[Node: training_1/Adam/gradients/model_1/dense_1/MatMul_grad/MatMul_1 = MatMul[T=DT_FLOAT, _class=["loc:@training_1/Adam/gradients/model_1/dense_1/MatMul_grad/MatMul"], transpose_a=true, transpose_b=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](model_1/flatten_1/Reshape, training_1/Adam/gradients/model_1/dense_2/MatMul_grad/MatMul)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
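Reading that OOM message: the tensor TensorFlow fails to allocate is a gradient for the bottleneck Dense weight matrix, and at shape [65536, 1024] a single float32 copy is already 256 MiB; Adam keeps extra per-parameter state on top, which is why a 6 GB card can run out even at small batch sizes. A quick sanity check of the size:

```python
print(65536 * 1024 * 4 / 2**20)   # 256.0 MiB for one float32 copy of that matrix
```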

@andenixa
Contributor Author

andenixa commented Jun 15, 2018

@tjess78 seems to be a genuine OOM. What batch size do you use, and are there any other programs using video memory running in the background?
I generally test with a clean environment and a dedicated card. You probably want to disable the Aero theme and switch to Basic altogether. Also close any applications, such as web browsers, that heavily utilize video memory. You should be able to run it with batch sizes of 8-13 (depending on free RAM).

@Jasas9754

People who are short on VRAM can adjust the settings:

ENCODER_DIM = 512

x = Dense(dense_shape * dense_shape * 1024)(x)
x = Reshape((dense_shape, dense_shape, 1024))(x)

x = self.upscale(512, kernel_initializer=RandomNormal(0, 0.02))(inpt)
x = self.upscale(256, kernel_initializer=RandomNormal(0, 0.02))(x)

I trained up to 100k iterations with it (-bs 8). The quality is still better than the original 64. The speed is also acceptable.

@ruah1984

Can anyone integrate iperov's DeepFaceLab here? I think the deepfakes master here should include all kinds of faceswap model studies. I'm not sure what the differences would be, but I hope the master here can include all the different deepfakes models, such as DFaker, the original model, the low memory model, and LIAEF128YAW (5GB+).

@Kirin-kun

Since it's open source, I suppose anyone, with proper credit given, can port iperov's ideas here.

But don't expect him to help.

@andenixa
Contributor Author

andenixa commented Jun 19, 2018

@ruah1984 I don't want to put you off, as we might actually port some of these models eventually.
But if you are really interested in testing iperov's models, I'd suggest just using his fork for that purpose, as everything is already in place there.

@Kirin-kun iperov says he doesn't mind if some of his work is ported to faceswap. I actually asked his permission twice, and he said I don't need his permission.

@ruah1984

I have been trying his work, and the results look good. I just hope we can have a GUI version and put it all together in here as a team.

@torzdf
Collaborator

torzdf commented Jun 19, 2018

As @andenixa says. Anyone is welcome to open a PR to port other models.

@oracle9i88

TypeError: join() argument must be str or bytes, not 'PosixPath'

@oracle9i88

Other models are fine.
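For context, that TypeError usually means a pathlib.Path object reached an os.path.join() (or json) call that, on older Python versions, only accepts strings; the usual workaround is an explicit str() conversion. A rough illustration, not the actual fix that landed:

```python
import os
from pathlib import Path

model_dir = Path("models") / "OriginalHighRes"
# Older os.path.join() rejects Path objects ("must be str or bytes, not 'PosixPath'"),
# so convert explicitly before joining.
weights_path = os.path.join(str(model_dir), "encoder.h5")
```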

@torzdf
Collaborator

torzdf commented Jun 26, 2018

Should be fixed in latest commit.

@agilebean

Yes, I can confirm that the last reported bug (json expects str or bytes) is fixed.
OriginalHighRes works!
Thanks a lot to @andenixa and @torzdf for this extremely short turnaround time, really incredible.

@Kirin-kun

Kirin-kun commented Jul 3, 2018

I tested OriginalHighRes with a small dataset and it looks really good. The faces are more detailed and have more depth / better-defined features than with Original.

A few differences:

  • OriginalHighRes seems to learn gaze direction better
  • OriginalHighRes pupils are a little blurrier than Original's, and the color is lighter/grayish
  • OriginalHighRes conversion seems to blend colors better. I had some patches of lighter skin on the cheeks with Original, but OriginalHighRes gave a better skin tone overall (same convert params)
  • OriginalHighRes seems to have difficulty learning half-open/closed eyes and open/smiling mouths. I'm at 130k iterations and the smiles are just starting to look like the ones produced much faster by Original.

In the end, the only caveats are the pupils, which look a little too much like lifeless gray circles (in Original they are more black, so it's less visible), and eventually the smiles showing teeth that look blurred, but I will see if that improves with more training.

Overall, I'm really satisfied with this model. It might become my preferred model. And its memory management is amazing: with a GTX 1060 6GB, I can manage a batch size of 16.

@andenixa
Contributor Author

andenixa commented Jul 4, 2018

@Kirin-kun thank you for your feedback. I think we would essentially need to add a mask for the eyes to work.
The batch size of 16 has nothing to do with that, as it's quite sufficient. For now, if you are doing something to produce a video, not just testing the model, I'd suggest running the training to 300 epochs. I know it's overkill, but that would probably make the situation a little better with the pupils and eye positions.
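On the idea of a mask for the eyes: since the extractor already produces 68-point landmarks, a toy sketch of an eye-region mask could look like the following (points 36-47 are the standard dlib eye landmarks; the function and array names are illustrative, not existing faceswap API):

```python
import cv2
import numpy as np

def eye_mask(landmarks, size=128):
    """Binary mask over both eye regions, from a (68, 2) array of landmark coordinates."""
    mask = np.zeros((size, size), dtype=np.uint8)
    for start, end in ((36, 42), (42, 48)):        # right eye, left eye point ranges
        hull = cv2.convexHull(landmarks[start:end].astype(np.int32))
        cv2.fillConvexPoly(mask, hull, 255)
    return mask
```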

@Kirin-kun

For the moment, I'm not doing videos.

I use photos as source material. They look a lot better than flickering videos, and it's easier to adjust the convert parameters than with a video of thousands of frames with different zooms on the faces. I tried to do videos, but I had mixed results (with just about all the models). In the end, still pictures of models posing give the best visually seamless faceswap.

I trained more on the same dataset, from 130k to 150k iterations, and the changes are really minute on the converted faces. When comparing, the differences are barely visible.

Further improvements I'd like to see are the pupils eventually looking more lifelike, and a way to handle, at convert time, obstacles like glasses, hair, hands, hats, etc. that cover parts of the destination face.

@torzdf
Collaborator

torzdf commented Sep 22, 2018

Closed as OriginalHighRes is implemented.

@andenixa feel free to open new issues for your new models.

@torzdf torzdf closed this as completed Sep 22, 2018
Repository owner deleted a comment from iperov Jun 29, 2019
Repository owner deleted a comment from iperov Jun 29, 2019
Repository owner deleted a comment from iperov Jun 29, 2019