Student Network

Thanos Masouris edited this page Aug 16, 2022 · 6 revisions

DiStyleGAN

Overview of DiStyleGAN's architecture

Generator

Initially, the Gaussian random noise vector is projected to 128 dimensions using a Fully Connected layer. The projected noise vector and the condition embedding are then concatenated and passed through another Fully Connected layer, which is followed by 3 consecutive Upsampling blocks. Each upsampling block consists of an upsample layer (scale_factor=2, mode='nearest'), a 3x3 convolution with padding, a Batch Normalization layer, and a Gated Linear Unit (GLU). Finally, a convolutional block, consisting of a 3x3 convolution with padding and a hyperbolic tangent activation function (tanh), produces the fake image.
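The generator described above can be sketched in PyTorch as follows. The noise dimension, channel widths, and the initial 4x4 spatial size are illustrative assumptions, not values taken from the DiStyleGAN code; note that the GLU halves the channel dimension, so each upsampling convolution emits twice its nominal output channels.

```python
import torch
import torch.nn as nn


class Generator(nn.Module):
    """Sketch of the generator described above. Channel sizes, the noise
    dimension, and the initial 4x4 spatial size are assumptions."""

    def __init__(self, noise_dim=512, cond_dim=128, base_channels=256):
        super().__init__()
        self.base_channels = base_channels
        # Project the Gaussian noise vector to 128 dimensions.
        self.project_noise = nn.Linear(noise_dim, 128)
        # Joint projection of [projected noise ; condition embedding]
        # to an initial 4x4 feature map.
        self.fc = nn.Linear(128 + cond_dim, base_channels * 4 * 4)

        def up_block(in_ch, out_ch):
            # Upsample -> 3x3 conv -> BatchNorm -> GLU.
            # GLU(dim=1) halves the channels, so the conv emits 2*out_ch.
            return nn.Sequential(
                nn.Upsample(scale_factor=2, mode='nearest'),
                nn.Conv2d(in_ch, out_ch * 2, kernel_size=3, padding=1),
                nn.BatchNorm2d(out_ch * 2),
                nn.GLU(dim=1),
            )

        # Three consecutive upsampling blocks: 4x4 -> 8x8 -> 16x16 -> 32x32.
        self.blocks = nn.Sequential(
            up_block(base_channels, base_channels // 2),
            up_block(base_channels // 2, base_channels // 4),
            up_block(base_channels // 4, base_channels // 8),
        )
        # Final convolutional block producing the fake image.
        self.to_image = nn.Sequential(
            nn.Conv2d(base_channels // 8, 3, kernel_size=3, padding=1),
            nn.Tanh(),
        )

    def forward(self, noise, cond):
        z = self.project_noise(noise)
        h = self.fc(torch.cat([z, cond], dim=1))
        h = h.view(-1, self.base_channels, 4, 4)
        h = self.blocks(h)
        return self.to_image(h)
```

With these assumed sizes, a 512-dimensional noise vector and a 128-dimensional condition embedding yield a 3x32x32 image in [-1, 1].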

Discriminator

DiStyleGAN's discriminator consists of 4 consecutive Downsampling blocks (4x4 strided-convolution, Spectral Normalization, and a LeakyReLU), each of which reduces the spatial size of the input image by a factor of 2. Subsequently, the resulting feature map is flattened, projected to 128 dimensions, and concatenated with the class condition embedding, before being passed through a final fully connected layer to produce the class-conditional logit used in the discriminator loss.

These four downsampling blocks are also the ones producing the feature maps used in the Feature Loss term of the objective function.
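A sketch of this discriminator, returning both the class-conditional logit and the per-block feature maps for the Feature Loss, might look as follows. The channel widths and the 32x32 input resolution are illustrative assumptions rather than the actual DiStyleGAN settings.

```python
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm


class Discriminator(nn.Module):
    """Sketch of the discriminator described above. Channel sizes and the
    32x32 input resolution are assumptions."""

    def __init__(self, cond_dim=128, base_channels=64, img_size=32):
        super().__init__()

        def down_block(in_ch, out_ch):
            # 4x4 strided convolution with spectral normalization,
            # followed by LeakyReLU; halves the spatial size.
            return nn.Sequential(
                spectral_norm(nn.Conv2d(in_ch, out_ch, kernel_size=4,
                                        stride=2, padding=1)),
                nn.LeakyReLU(0.2, inplace=True),
            )

        # Four downsampling blocks whose feature maps feed the Feature Loss.
        self.blocks = nn.ModuleList([
            down_block(3, base_channels),                      # 32 -> 16
            down_block(base_channels, base_channels * 2),      # 16 -> 8
            down_block(base_channels * 2, base_channels * 4),  # 8 -> 4
            down_block(base_channels * 4, base_channels * 8),  # 4 -> 2
        ])
        feat_dim = base_channels * 8 * (img_size // 16) ** 2
        # Flatten -> project to 128 dims -> concat with condition -> logit.
        self.project = nn.Linear(feat_dim, 128)
        self.classify = nn.Linear(128 + cond_dim, 1)

    def forward(self, image, cond):
        features = []  # intermediate maps, reused by the Feature Loss
        h = image
        for block in self.blocks:
            h = block(h)
            features.append(h)
        h = self.project(h.flatten(1))
        logit = self.classify(torch.cat([h, cond], dim=1))
        return logit, features
```

Returning the intermediate feature maps alongside the logit is one straightforward way to expose them to the Feature Loss without running the network twice.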
