What is the reasoning behind using PixelShuffler over Transposed Conv #127

Closed
FabianIsensee opened this issue Feb 6, 2018 · 11 comments

Comments

@FabianIsensee

Hi,
while reading the code of the decoder I noticed that you are using PixelShuffler instead of a transposed convolution to increase the spatial dimension of the output. This is done by mapping some of the 'color' channels into the spatial dimension. Why was this approach chosen over a transposed convolution?
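
For concreteness, here is a minimal NumPy sketch of the channel-to-space mapping I mean (my own illustration, not the repo's actual Pixel_Shuffler.py code):

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange an (H, W, C*r*r) array into (H*r, W*r, C)."""
    h, w, c = x.shape
    assert c % (r * r) == 0
    c_out = c // (r * r)
    x = x.reshape(h, w, r, r, c_out)   # split channels into r x r sub-pixel blocks
    x = x.transpose(0, 2, 1, 3, 4)     # interleave the blocks with the spatial axes
    return x.reshape(h * r, w * r, c_out)

x = np.random.rand(4, 4, 12)           # 12 channels, upscale factor r = 2
print(pixel_shuffle(x, 2).shape)       # (8, 8, 3)
```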
Kind regards,
Fabian

@Clorr
Contributor

Clorr commented Feb 6, 2018

My opinion is that Deepfakes' original code is far from optimized and is the result of some experimentation. To me the model is quite strange, but I may not be experienced enough to judge that. I think it can really be optimized, and you should try to apply your knowledge there and see if your approach brings better results ;-)

That's really the reason I built a plugin architecture: we should try other solutions, like the GAN one.

@FabianIsensee
Author

Hey, thanks for getting back to me! I was actually not referring to the plugin structure itself (which I find kind of hard to read, but that's your design choice and, like you said, it certainly makes things much easier when swapping components). What I was referring to is that the increase in spatial dimension in the decoder is done by reshaping the output to have fewer color channels but a larger spatial extent. Some color channels thereby encode similar information, just slightly shifted spatially. This is unlike anything I have seen so far. I am not questioning it at all (especially since it seems to work); I would just like to know why this approach was chosen over simply using a transposed convolution, which would be the much more obvious choice in this context. Was there a specific reasoning behind it?
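
For concreteness, this is the kind of replacement I had in mind (just a sketch, not tested against this codebase; upscale_transposed is my own placeholder name):

```python
from keras.layers import Conv2DTranspose

def upscale_transposed(filters):
    # One strided layer that doubles H and W, like PixelShuffler with r = 2
    return Conv2DTranspose(filters, kernel_size=3, strides=2,
                           padding='same', activation='relu')
```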

What exactly do you mean by 'the GAN one'? Do you have a specific experiment in mind that you would like to try? I guess working with GANs might improve results, but it will also complicate things. If you ditch the L1 loss the network currently uses to reconstruct images and replace it with an adversarial loss, you will need to condition the generated image on something to ensure that the decoder reconstructs the same person/expression that was presented to the encoder. Or am I missing something here?

@Clorr
Contributor

Clorr commented Feb 7, 2018

I think there is some misunderstanding here...

I'm not the one who wrote the original code, and I don't know why it is made this way. My "opinion" is that there is no strong reason, and that the original code is more the result of an experiment than of a strongly theorized approach.

All I am saying is: if you think you have a better approach, or even just an alternative, feel free to try it and see if you can do better. If so, you can propose alternatives here, thanks to the plugin architecture (which, I agree, is more complex than just a flat script).

The GAN version I'm referring to is here, and I'm working on making it a plugin because it is an interesting approach.

@Clorr Clorr closed this as completed Feb 14, 2018
@Clorr
Contributor

Clorr commented Mar 19, 2018

PixelShuffler seems to be an implementation of this paper: https://arxiv.org/pdf/1609.05158.pdf

@kvrooman
Contributor

A good high-level summary of the issue, and of why the sub-pixel shuffle was implemented:
https://distill.pub/2016/deconv-checkerboard/

Note: I haven't really checked which method is implemented in Pixel_Shuffler.py. There have been dozens of suggested improvements since the original paper came out.

In this thread
keras-team/keras#3940
someone found that standard Keras (UpSampling2D followed by a Conv2D) was superior to the (Conv2D then pixel shuffle) approach.
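
Roughly, that resize-convolution alternative looks like this in Keras (a sketch only, with a made-up helper name; I haven't benchmarked it here):

```python
from keras.layers import Conv2D, UpSampling2D

def upscale_resize_conv(filters):
    # Nearest-neighbour upsampling followed by a plain convolution --
    # the pattern the distill article recommends against checkerboards
    def block(x):
        x = UpSampling2D(size=(2, 2))(x)
        return Conv2D(filters, kernel_size=3, padding='same',
                      activation='relu')(x)
    return block
```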

One paper discussing the concepts in a bit more detail, as well as a potential upgrade:
https://arxiv.org/ftp/arxiv/papers/1707/1707.02937.pdf

Honestly, the best practices should probably be lifted from the winners of the NTIRE 2017 Super-Resolution Challenge.

@FabianIsensee
Author

Thanks a lot! That is very helpful!

@Clorr
Contributor

Clorr commented Mar 22, 2018

Note that I'm currently experimenting with different setups for the decoder part; you can check my dedicated branch. It hasn't brought much success so far, but at least it lets me learn different things about NNs.

@Clorr
Contributor

Clorr commented Mar 22, 2018

The guy @titu1994 who posted keras-team/keras#3940 is incredible; he has many implementations of common architectures in Keras. It is awesome!

@kvrooman
Contributor

kvrooman commented Mar 22, 2018

Yup, he's added a lot to keras.contrib. I'm almost done implementing a tweak of his version of FCN-DenseNet (from the 100-layer Tiramisu paper) as a faceswap autoencoder. I have high expectations of improvements.

We really need to move the models over to using fit() as well, so we can create TensorBoard logs for them.
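
Roughly what I mean (a toy sketch with a stand-in model; the real autoencoder and data loading would be the repo's own):

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Conv2D
from keras.callbacks import TensorBoard

# Toy stand-in for the real autoencoder; the point is the callback wiring
model = Sequential([Conv2D(3, 3, padding='same', input_shape=(64, 64, 3))])
model.compile(optimizer='adam', loss='mean_absolute_error')

x = np.random.rand(8, 64, 64, 3).astype('float32')
model.fit(x, x, epochs=1, batch_size=4,
          callbacks=[TensorBoard(log_dir='./logs')])
```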

@kvrooman
Contributor

kvrooman commented Mar 22, 2018

Regardless of whether UpSampling2D is better than sub-pixel shuffling or not, I was also thinking of suggesting we use the official keras.contrib version rather than our custom Pixel Shuffler code: better dependencies, testing, and update reliance... if the quality turns out to be comparable.
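
If we go that route, the swap would look roughly like this (assuming keras-contrib's SubPixelUpscaling layer; an untested sketch with a made-up helper name):

```python
from keras.layers import Conv2D
from keras_contrib.layers import SubPixelUpscaling

def upscale_contrib(filters, r=2):
    # Same Conv2D-then-shuffle pattern, but using the maintained layer
    def block(x):
        x = Conv2D(filters * r * r, kernel_size=3, padding='same')(x)
        return SubPixelUpscaling(scale_factor=r)(x)
    return block
```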

@Clorr
Contributor

Clorr commented Mar 23, 2018

Yes, we have many possibilities here ;-) The thing is that, after having checked many different setups, I think updating PixelShuffler or switching to UpSampling2D will only bring small improvements, so I'm still trying new architectures.

The Tiramisu paper seems interesting but is a bit above my understanding for now. I'll check the FCN approach; it seems very interesting.
