Student Network

Thanos Masouris edited this page Aug 16, 2022 · 6 revisions

DiStyleGAN

Overview of DiStyleGAN's architecture

Generator

Initially, the Gaussian random noise vector is projected to 128 dimensions using a Fully Connected layer. The projected noise vector and the condition embedding are then concatenated and passed through another Fully Connected layer, which is followed by 3 consecutive Upsampling blocks. Each upsampling block consists of an upsample layer (scale_factor=2, mode='nearest'), a 3x3 convolution with padding, a Batch Normalization layer, and a Gated Linear Unit (GLU). Finally, a convolutional block, consisting of a 3x3 convolution with padding and a hyperbolic tangent activation function (tanh), produces the fake image.
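The generator described above can be sketched in PyTorch as follows. The noise dimension, channel widths, and the initial 4x4 spatial size are illustrative assumptions, not values taken from the DiStyleGAN code; note that the GLU halves the channel dimension, so each upsampling convolution emits twice its nominal output channels.

```python
import torch
import torch.nn as nn


class Generator(nn.Module):
    """Sketch of the generator described above. Channel sizes, the noise
    dimension, and the initial 4x4 spatial size are assumptions."""

    def __init__(self, noise_dim=512, cond_dim=128, base_channels=256):
        super().__init__()
        self.base_channels = base_channels
        # Project the Gaussian noise vector to 128 dimensions.
        self.project_noise = nn.Linear(noise_dim, 128)
        # Joint projection of [projected noise ; condition embedding]
        # to an initial 4x4 feature map.
        self.fc = nn.Linear(128 + cond_dim, base_channels * 4 * 4)

        def up_block(in_ch, out_ch):
            # Upsample -> 3x3 conv -> BatchNorm -> GLU.
            # GLU(dim=1) halves the channels, so the conv emits 2*out_ch.
            return nn.Sequential(
                nn.Upsample(scale_factor=2, mode='nearest'),
                nn.Conv2d(in_ch, out_ch * 2, kernel_size=3, padding=1),
                nn.BatchNorm2d(out_ch * 2),
                nn.GLU(dim=1),
            )

        # Three consecutive upsampling blocks: 4x4 -> 8x8 -> 16x16 -> 32x32.
        self.blocks = nn.Sequential(
            up_block(base_channels, base_channels // 2),
            up_block(base_channels // 2, base_channels // 4),
            up_block(base_channels // 4, base_channels // 8),
        )
        # Final convolutional block producing the fake image.
        self.to_image = nn.Sequential(
            nn.Conv2d(base_channels // 8, 3, kernel_size=3, padding=1),
            nn.Tanh(),
        )

    def forward(self, noise, cond):
        z = self.project_noise(noise)
        h = self.fc(torch.cat([z, cond], dim=1))
        h = h.view(-1, self.base_channels, 4, 4)
        h = self.blocks(h)
        return self.to_image(h)
```

With these assumed sizes, a 512-dimensional noise vector and a 128-dimensional condition embedding yield a 3x32x32 image in [-1, 1].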

Discriminator

DiStyleGAN's discriminator consists of 4 consecutive Downsampling blocks (4x4 strided-convolution, Spectral Normalization, and a LeakyReLU), each of which reduces the spatial size of the input image by a factor of 2. Subsequently, the resulting feature map is flattened, projected to 128 dimensions, and concatenated with the class condition embedding, before being passed through a final fully connected layer to produce the class-conditional logit used in the discriminator loss.

These four downsampling blocks are also the ones producing the feature maps used in the Feature Loss term of the objective function.
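A sketch of this discriminator, returning both the class-conditional logit and the per-block feature maps for the Feature Loss, might look as follows. The channel widths and the 32x32 input resolution are illustrative assumptions rather than the actual DiStyleGAN settings.

```python
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm


class Discriminator(nn.Module):
    """Sketch of the discriminator described above. Channel sizes and the
    32x32 input resolution are assumptions."""

    def __init__(self, cond_dim=128, base_channels=64, img_size=32):
        super().__init__()

        def down_block(in_ch, out_ch):
            # 4x4 strided convolution with spectral normalization,
            # followed by LeakyReLU; halves the spatial size.
            return nn.Sequential(
                spectral_norm(nn.Conv2d(in_ch, out_ch, kernel_size=4,
                                        stride=2, padding=1)),
                nn.LeakyReLU(0.2, inplace=True),
            )

        # Four downsampling blocks whose feature maps feed the Feature Loss.
        self.blocks = nn.ModuleList([
            down_block(3, base_channels),                      # 32 -> 16
            down_block(base_channels, base_channels * 2),      # 16 -> 8
            down_block(base_channels * 2, base_channels * 4),  # 8 -> 4
            down_block(base_channels * 4, base_channels * 8),  # 4 -> 2
        ])
        feat_dim = base_channels * 8 * (img_size // 16) ** 2
        # Flatten -> project to 128 dims -> concat with condition -> logit.
        self.project = nn.Linear(feat_dim, 128)
        self.classify = nn.Linear(128 + cond_dim, 1)

    def forward(self, image, cond):
        features = []  # intermediate maps, reused by the Feature Loss
        h = image
        for block in self.blocks:
            h = block(h)
            features.append(h)
        h = self.project(h.flatten(1))
        logit = self.classify(torch.cat([h, cond], dim=1))
        return logit, features
```

Returning the intermediate feature maps alongside the logit is one straightforward way to expose them to the Feature Loss without running the network twice.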
