
Fixed VGG encoder weights query #7

Closed
SidShenoy opened this issue Jun 3, 2019 · 2 comments

@SidShenoy

In your paper you mention that 'We use the encoder-decoder architecture with fixed VGG encoder weights' and that you only train the decoder on the COCO dataset. However, since you have replaced the max pooling/unpooling layers of the original VGG-19 architecture with wavelet pooling/unpooling layers, don't you need to train the entire encoder-decoder architecture? Since these layers are changed, the VGG-19 features will also be affected, so we cannot simply reuse the original ImageNet weights. Can you please provide some clarification on this?

@jaejun-yoo
Collaborator

jaejun-yoo commented Jun 3, 2019

@SidShenoy Thank you for your comment. That is a very sharp observation!

Yes, indeed, changing the pooling layers affects the subsequent feature maps. Since the max-pooling output is now replaced by the LL-filter output (similar to average pooling), the feature maps after each pooling layer are slightly different. (Note that the feature maps from the other three wavelet filters do not propagate to the next layer of the encoder; they are skipped to the decoder, so the only change the encoder has to absorb is the switch from max pooling to the LL filter.)
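
For illustration, here is a minimal sketch of the wavelet (Haar) pooling idea; it is a simplified, hand-written example with normalized Haar kernels (not the exact code in this repository) showing how only the LL band moves forward while the high-frequency bands become skips:

import torch
import torch.nn.functional as F

def haar_pool(x):
    # Minimal illustrative sketch, not the repository's actual implementation.
    # 1-D Haar low- and high-pass filters, normalized by 1/sqrt(2).
    L = torch.tensor([1.0, 1.0]) / 2 ** 0.5
    H = torch.tensor([-1.0, 1.0]) / 2 ** 0.5
    # 2-D kernels as outer products: LL, LH, HL, HH.
    kernels = torch.stack([torch.outer(a, b) for a in (L, H) for b in (L, H)])  # (4, 2, 2)
    n, c, _, _ = x.shape
    # Apply all four kernels depthwise (one group per input channel) with stride 2.
    weight = kernels.repeat(c, 1, 1).unsqueeze(1).to(x)    # (4*c, 1, 2, 2)
    out = F.conv2d(x, weight, stride=2, groups=c)          # (n, 4*c, h/2, w/2)
    out = out.view(n, c, 4, out.shape[-2], out.shape[-1])
    ll, lh, hl, hh = out.unbind(dim=2)
    # Only LL propagates to the next encoder layer; LH/HL/HH are "skipped"
    # straight to the decoder for unpooling.
    return ll, (lh, hl, hh)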

However, as we wrote in the paper, we decided not to touch the encoder and instead let the decoder adapt to those changes. You can fine-tune the encoder by partially or entirely unfreezing its weights, and we actually tried some variants, such as unfreezing only the convolution layers that immediately follow each pooling layer so that the change is also handled inside the encoder (see the sketch below). There was not much difference in the final outcomes, so we chose to stick with the simpler training strategy.
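
For reference, partially unfreezing the encoder could be done along these lines; this is only a hypothetical sketch, and the layer attributes (encoder.conv2_1 and friends) are assumed names rather than the ones used in this repository:

# Hypothetical sketch: fine-tune only the convolutions that immediately
# follow a pooling layer, keeping the rest of the VGG encoder frozen.
for param in encoder.parameters():
    param.requires_grad = False            # freeze the whole encoder first
for layer in (encoder.conv2_1, encoder.conv3_1, encoder.conv4_1):  # assumed names
    for param in layer.parameters():
        param.requires_grad = True         # then free the post-pooling convs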

This can be explained in two ways. 1) It is a well-known and consistently reported observation that style transfer still works when max pooling is replaced with average pooling (even though the VGG network was trained with max pooling), and the effect is sometimes even better. Similarly, our encoder with the LL filter, which is just average pooling up to a scaling factor, shares this characteristic (see the quick check below). 2) Since the decoder is newly trained, it has enough capacity to compensate for such shifts of the encoder feature maps and still output a good reconstruction.
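
As a quick sanity check on point 1), with the normalized Haar kernels from the sketch above, the LL output is exactly 2x average pooling (again, just an illustrative check using the hypothetical haar_pool helper):

x = torch.randn(1, 3, 32, 32)
ll, _ = haar_pool(x)
# The LL kernel is [[0.5, 0.5], [0.5, 0.5]], i.e. twice the 2x2 average-pooling kernel.
print(torch.allclose(ll, 2 * F.avg_pool2d(x, 2), atol=1e-6))  # True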

Still, your comment is very valuable, and we will describe the training procedure in more detail to clarify this point. Thanks a lot for your attention :)

@jaejun-yoo
Collaborator

jaejun-yoo commented Jun 3, 2019

Maybe this partial code snippet will help you understand what we did:

# Freeze the VGG encoder: only the decoder receives gradient updates.
for param in self.encoder.parameters():
    param.requires_grad = False
self.dec_optim = torch.optim.Adam(
    filter(lambda p: p.requires_grad, self.decoder.parameters()),
    lr=self.lr,
    betas=(self.beta1, self.beta2)
)

# Encode, reconstruct, then re-encode the reconstruction.
feature, skips = self.encoder(real_image)
recon_image = self.decoder(feature, skips)
feature_recon, _ = self.encoder(recon_image)

# Pixel-level reconstruction loss plus a feature-level loss against the
# (detached) encoder features.
recon_loss = self.MSE_loss(recon_image, real_image)
feature_loss = torch.zeros(1).to(self.device)
feature_loss += self.MSE_loss(feature_recon, feature.detach())
loss = recon_loss * self.recon_weight + feature_loss * self.feature_weight

# Standard optimization step, updating the decoder only.
self.reset_grad()
loss.backward()
self.dec_optim.step()
