
what's the difference between pulse and stylegan encoder? #29

Open
danielkaifeng opened this issue Jun 23, 2020 · 4 comments

Comments

@danielkaifeng

  1. The StyleGAN encoder predicts a face similar to the input face, while PULSE predicts a face whose LR (low-resolution) version is close to the input face.
  2. Just like the StyleGAN encoder, PULSE can only predict faces covered by a pretrained GAN such as StyleGAN. If you want to predict something else, you must train a GAN first. In this respect, it is not a general-purpose SR model.

Thanks for correcting me if I missed anything.
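If it helps to write it down, here is a minimal sketch of the two objectives as I understand them (not code from either repository; `G` stands for any pretrained generator mapping a latent to a 1024x1024 image, and the targets are image tensors):

```python
import torch.nn.functional as F

def encoder_objective(G, z, hr_target):
    # StyleGAN-encoder style projection: compare the generated image
    # to the high-resolution input directly.
    return F.mse_loss(G(z), hr_target)

def pulse_objective(G, z, lr_target, lr_size=16):
    # PULSE: compare only the *downscaled* generated image to the
    # low-resolution input; the HR output itself is unconstrained
    # beyond staying on the generator's manifold.
    hr = G(z)
    down = F.interpolate(hr, size=(lr_size, lr_size),
                         mode='bicubic', align_corners=False)
    return F.mse_loss(down, lr_target)
```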

@woctezuma commented Jun 23, 2020

I assume that what you call a "StyleGAN encoder" is code like this one. Similarly to what was later suggested in the StyleGAN2 article, the aim is to project a real image into the latent space of the generative model. So what you get is one of the generated images closest to your high-resolution input. The images are close in high resolution.

The original paper is about training the StyleGAN2 generative model, and the ability to project real images is a nice side property. People have since built tools to do so because they wanted to have fun editing real images. Indeed, this makes it possible to edit images by moving from the projection along specified latent directions. You will get results like this one, where Mona Lisa's pose is changed.

[Image: StyleGAN2 encoder and editor]

In contrast, PULSE looks for plausible images in the latent space which would downscale correctly with respect to the low-resolution input, e.g. PULSE produces a plausible 1024x1024 image which would downscale to something close to the 16x16 input. There are many potential 1024x1024 images, so what you get is one among very many suitable candidates. The images are close in low resolution.

You can play with it on Colab.

[Image: Figure 3 in the PULSE paper]

Overall, I agree that there are broad similarities between the projects, in that both look for a plausible image in the latent space. However, as you pointed out, there are differences in terms of input (high-resolution vs. low-resolution) and objective (find the projection vs. find one of very many equally plausible candidates).

One cool experiment could be to play with these two Colab notebooks (projector vs. upsampling) and compare the kind of results you get by feeding a low-resolution image to the StyleGAN2 projector. I would not expect the projector to perform well at all, because it will try to produce a blurry high-resolution image that looks like the low-resolution input rendered at high resolution. It is quite easy to produce unrealistic images with the projector, while PULSE sticks to producing plausible high-resolution images.
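For what it's worth, something along these lines (paths and sizes are just placeholders) would prepare the two inputs for that comparison, since the projector expects a full-resolution image while PULSE takes the 16x16 image directly:

```python
from PIL import Image

img = Image.open("face.png").convert("RGB")           # placeholder path
lr = img.resize((16, 16), Image.BICUBIC)              # input for PULSE
lr_blown_up = lr.resize((1024, 1024), Image.BICUBIC)  # blurry input for the projector
lr.save("face_16x16.png")
lr_blown_up.save("face_16x16_upsampled.png")
```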

Anyway, let me know if I am mistaken, or if I rehashed information which you already knew. ;)

@danielkaifeng (Author)

With the pretrained projector's weights and some tricky fine-tuning, you can get a very similar HR image, like this one:
[Image: face1_01]

For a very blurry face, you can use a very small learning rate and increase the GAN loss weight to ensure it doesn't generate a blurry HR image.
[Image: 1_01]

I believe the stylegan-encoder projection method can do a good job at face SR, and PULSE improves its generalization by changing the optimization target from (input vs. prediction) to (input vs. the prediction's downsampled low-resolution image).
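Just to illustrate what I mean, a rough sketch of one optimization step (not the actual code of either project; `G` and `D` stand for a pretrained generator and discriminator, and the weights are made up):

```python
import torch
import torch.nn.functional as F

def finetune_step(G, D, w, lr_target, opt, gan_weight=5.0, lr_size=16):
    # Compare in LR space (the PULSE-style target) and add a discriminator
    # term, weighted up, to keep the HR output from going blurry.
    opt.zero_grad()
    hr = G(w)
    down = F.interpolate(hr, size=(lr_size, lr_size),
                         mode='bicubic', align_corners=False)
    loss = F.mse_loss(down, lr_target) - gan_weight * D(hr).mean()
    loss.backward()
    opt.step()
    return loss.item()

# e.g. with a very small learning rate on the latent being optimized:
# w = w_init.clone().requires_grad_(True)
# opt = torch.optim.Adam([w], lr=1e-4)
```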

@danielkaifeng (Author)

To take this a step further, I don't think either stylegan-encoder or PULSE can become a general SR application. You can't get full-body person SR unless you train a GAN for it, so the SR quality relies heavily on your pretrained GAN: train a good GAN and you get nice results, and vice versa.

Even for face generation, we must align the face first, which means the eyes, mouth, etc. must be in exactly the right positions. It would be much better to generalize the method so that the input images need less preprocessing.
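For reference, the alignment step both projects rely on boils down to something like this rough dlib-based sketch (the real FFHQ alignment script does more than this; the model file is the standard dlib 68-landmark predictor, and the crop margin is made up):

```python
import dlib
import numpy as np
from PIL import Image

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def align_face(path, out_size=1024):
    # Locate the eyes, rotate the image so they lie on a horizontal line,
    # then crop around the face and resize to the generator's resolution.
    img = np.array(Image.open(path).convert("RGB"))
    face = detector(img, 1)[0]                              # assume one face
    pts = predictor(img, face)
    left_eye = np.mean([[pts.part(i).x, pts.part(i).y] for i in range(36, 42)], axis=0)
    right_eye = np.mean([[pts.part(i).x, pts.part(i).y] for i in range(42, 48)], axis=0)
    dy, dx = right_eye[1] - left_eye[1], right_eye[0] - left_eye[0]
    angle = np.degrees(np.arctan2(dy, dx))
    cx, cy = (left_eye[0] + right_eye[0]) / 2, (left_eye[1] + right_eye[1]) / 2
    rotated = Image.fromarray(img).rotate(angle, center=(cx, cy), resample=Image.BILINEAR)
    size = int(2.2 * face.width())                          # rough crop margin
    crop = rotated.crop((cx - size // 2, cy - size // 2,
                         cx + size // 2, cy + size // 2))
    return crop.resize((out_size, out_size), Image.BICUBIC)
```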

@yuqiu1233

Can you tell me how to fine-tune this project? Thanks
