## Progressive Growing of GANs
*Note: the implementation follows this YouTube [video](https://www.youtube.com/watch?v=nkQHASviYac)*
*Link to GitHub [here](https://github.com/aladdinpersson/Machine-Learning-Collection/blob/master/ML/Pytorch/GANs/ProGAN/model.py)*

By now, you have probably realized how difficult it can be to train GANs. They are fairly unstable, especially when trying to generate high-dimensional samples such as high-resolution images!

However, researchers never lack ideas to improve them, and this 2017 paper made GAN training more stable: *Progressive Growing of GANs for Improved Quality, Stability, and Variation*.

The main idea behind this paper is the following: since training GANs on smaller images is easier, we can progressively grow the network and the generated images' dimensions to make training easier for the network. It is illustrated by the figure below:

<img src='assets/progan2.png' width=70% />


### Layer fading 

Each level, or depth, is trained for a certain number of epochs (e.g., 10 epochs). Then a new layer is added to both the discriminator and the generator, and training continues with these additional layers. However, a new layer is not switched on abruptly: it is faded in smoothly, as described by the following figure:

<img src='assets/layer_fading2.png' width=70% />

The `toRGB` and `fromRGB` layers are the layers projecting the feature vector to the RGB space (HxWx3) and the layer doing the opposite, respectively. 
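
As a rough sketch, both projections can be written as 1x1 convolutions, which is also how the implementation further down builds them. The channel count of 128 and the variable names are assumptions chosen for illustration. Note that PyTorch stores images channels-first, so HxWx3 becomes `(N, 3, H, W)`.

```python
import torch
import torch.nn as nn

feature_channels = 128  # illustrative value, not taken from the paper

# toRGB: feature map (N, C, H, W) -> image (N, 3, H, W)
to_rgb = nn.Conv2d(feature_channels, 3, kernel_size=1)
# fromRGB: image (N, 3, H, W) -> feature map (N, C, H, W)
from_rgb = nn.Conv2d(3, feature_channels, kernel_size=1)

features = torch.randn(2, feature_channels, 16, 16)
image = to_rgb(features)          # spatial size is unchanged by a 1x1 conv
features_again = from_rgb(image)  # back to the feature space

print(image.shape)           # torch.Size([2, 3, 16, 16])
print(features_again.shape)  # torch.Size([2, 128, 16, 16])
```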

Let's look at the example:
* **(a)** The network is currently training at 16x16 resolution, meaning that the generated images are 16x16x3
* **(b)** We are adding two new layers to train at 32x32 resolution. However, we are fading in the new layers by doing the following:
    * For the generator, we take the output of the 16x16 layer and use nearest neighbor image resize to double its resolution to 32x32. The same output will also be fed to the 32x32 layer. Then we calculate the output of the network by doing a weighted sum of $(1- \alpha)$ the upsampled 16x16 image and $\alpha$ the 32x32 layer output. 
    * For the discriminator, we do something similar but to reduce the resolution, we use an average pooling layer
    * The network trains for $N$ epochs at each resolution. During the first $N/2$ epochs, we start with $\alpha = 0$ and increase $\alpha$ linearly to $\alpha = 1$. Then we train for the remaining $N/2$ epochs with $\alpha = 1$.
* **(c)** The network is now training at 32x32 resolution
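
The generator-side blending and the $\alpha$ schedule described above can be sketched as follows. This is a minimal illustration, not the code used later in this notebook: the helper names `fade_in` and `alpha_at` are hypothetical, and both inputs are assumed to be already projected to RGB.

```python
import torch
import torch.nn.functional as F


def fade_in(alpha, low_res_rgb, high_res_rgb):
    """Blend the upsampled low-resolution image with the new layer's output."""
    upscaled = F.interpolate(low_res_rgb, scale_factor=2, mode="nearest")
    return (1 - alpha) * upscaled + alpha * high_res_rgb


def alpha_at(epoch, n_epochs):
    """Linear ramp from 0 to 1 over the first half of the phase, then constant 1."""
    return min(1.0, epoch / (n_epochs / 2))


low = torch.rand(1, 3, 16, 16)   # output of the 16x16 path
high = torch.rand(1, 3, 32, 32)  # output of the new 32x32 path

for epoch in (0, 2, 5, 10):
    alpha = alpha_at(epoch, n_epochs=10)
    blended = fade_in(alpha, low, high)
    print(epoch, alpha, tuple(blended.shape))
```

With $\alpha = 0$ the result is exactly the upsampled 16x16 image, and with $\alpha = 1$ it is exactly the output of the new 32x32 layer.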

### Architecture used in the paper
**Note**: 
  * The first conv layer of the generator is a transposed convolution that goes from 1x1 to 4x4.
  * The other blocks are the same for the generator and the discriminator, except that the discriminator does not use PixelNorm.
  * After each block there is an RGB layer (`toRGB` in the generator, `fromRGB` in the discriminator).
  * Once training is finished, only the RGB layer of the last block is needed; the ones in between exist for progressive training only.

<img src='assets/progan_architecture.png' width=80% />

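**Note:** In the paper, the authors use a new type of normalization called pixelwise feature vector normalization (PixelNorm). I encourage you to read the paper for the details. For the sake of simplicity, the `GeneratorFirstBlock` example below does not apply any normalization, while the `Generator` class does so through its `PixelNorm` layers.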
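
For reference, pixelwise normalization rescales the feature vector at each pixel so that its mean squared value across channels is approximately one. Writing $a_{x,y}$ for the original feature vector at pixel $(x, y)$, $b_{x,y}$ for the normalized one, and $N$ for the number of feature maps, it can be summarized as below. This matches the computation in the `PixelNorm` module defined later in this notebook, with $\epsilon = 10^{-8}$.

```latex
b_{x,y} = \frac{a_{x,y}}{\sqrt{\frac{1}{N} \sum_{j=0}^{N-1} \left( a_{x,y}^{j} \right)^{2} + \epsilon}}
```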

In [1]:
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F

In [2]:
# channel multipliers relative to in_channels; with in_channels=512:
# 512 (x4) -> 256 -> 128 -> 64 -> 32 -> 16
factors = [1, 1, 1, 1, 1/2, 1/4, 1/8, 1/16, 1/32]

In [3]:
class WSConv2d(nn.Module):
    """ 
    Equalized learning rate: scale the weights by a per-layer constant at runtime.
    In the paper, the authors note that the dynamic range of the weights differs
    between layers, so the same learning rate can be too large for some weights
    and too small for others. Rescaling at every forward pass gives all weights
    the same dynamic range, so they all learn at the same speed:
    W = W * sqrt(gain / (in_channels * k**2)), with gain = 2
    """    
    def __init__(self, in_channels, out_channels, kernel_size=3, stride=1, padding=1, gain=2):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size, stride, padding)
        self.scale = (gain/(in_channels*kernel_size**2)) ** 0.5 
        # the bias should not be scaled.
        self.bias = self.conv.bias
        self.conv.bias = None
        
        # initialize conv layer
        nn.init.normal_(self.conv.weight)
        nn.init.zeros_(self.bias)
        
    def forward(self, x):
        return  self.conv(x * self.scale) + self.bias.view(1, self.bias.shape[0], 1, 1)
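
The forward pass above multiplies the input by `self.scale` rather than the weights. Because a convolution without bias is linear, the two are equivalent, which the following sketch checks numerically. The tensor sizes are arbitrary values picked for illustration.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
in_channels, out_channels, k = 8, 4, 3
scale = (2 / (in_channels * k ** 2)) ** 0.5  # same constant as in WSConv2d, gain = 2

weight = torch.randn(out_channels, in_channels, k, k)  # N(0, 1) initialization
x = torch.randn(1, in_channels, 16, 16)

scaled_input = F.conv2d(x * scale, weight, padding=1)   # what WSConv2d.forward does
scaled_weight = F.conv2d(x, weight * scale, padding=1)  # what the paper describes

# the two results agree up to floating-point error
print(torch.allclose(scaled_input, scaled_weight, atol=1e-4))
```

Scaling the input avoids creating a rescaled copy of the weight tensor at every step, while the bias is added afterwards and therefore stays unscaled.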

In [4]:
class WSConvTranspose2d(nn.Module):
    """ 
    Equalized learning rate: scale the weights by a per-layer constant at runtime.
    In the paper, the authors note that the dynamic range of the weights differs
    between layers, so the same learning rate can be too large for some weights
    and too small for others. Rescaling at every forward pass gives all weights
    the same dynamic range, so they all learn at the same speed:
    W = W * sqrt(gain / (z_dim * k**2)), with gain = 2
    """    
    def __init__(self, z_dim, out_channels, kernel_size=3, stride=1, padding=1, gain=2):
        super().__init__()
        self.tconv = nn.ConvTranspose2d(z_dim, out_channels, kernel_size, stride, padding)
        self.scale = (gain/(z_dim*kernel_size**2)) ** 0.5 
        # the bias should not be scaled.
        self.bias = self.tconv.bias
        self.tconv.bias = None
        
        # initialize conv layer
        nn.init.normal_(self.tconv.weight)
        nn.init.zeros_(self.bias)
        
    def forward(self, x):
        return  self.tconv(x * self.scale) + self.bias.view(1, self.bias.shape[0], 1, 1)

In [5]:
class PixelNorm(nn.Module):
    def __init__(self):
        super().__init__()
        self.epsilon = 1e-8

    def forward(self, x):
        return x / torch.sqrt(torch.mean(x ** 2, dim=1, keepdim=True) + self.epsilon)

In [6]:
class ConvBlock(nn.Module):
    def __init__(self, in_channels, out_channels, use_pixelNorm=True):
        super().__init__()
        self.conv1 = WSConv2d(in_channels, out_channels)
        self.conv2 = WSConv2d(out_channels, out_channels)
        self.leaky = nn.LeakyReLU(0.2)
        self.pn = PixelNorm()
        self.use_pn = use_pixelNorm
        
    def forward(self, x):
        x = self.leaky(self.conv1(x))
        x = self.pn(x) if self.use_pn else x
        
        x = self.leaky(self.conv2(x))
        x = self.pn(x) if self.use_pn else x
        
        return x

In [7]:
class GeneratorFirstBlock(nn.Module):
    """
    This block follows the ProGan paper implementation.
    Takes the latent vector and creates feature maps.
    """
    def __init__(self, latent_dim: int):
        super().__init__()
        # initial block 
        self.conv0 = nn.ConvTranspose2d(latent_dim, 512, kernel_size=4) # 1x1 -> 4x4
        self.conv1 = nn.Conv2d(512, 512, kernel_size=3, padding=1) # 4x4 -> 4x4
        self.activation = nn.LeakyReLU(0.2)

    def forward(self, x: torch.Tensor):
        # x is a (batch_size, latent_dim) latent vector, we need to turn it into a feature map
        x = torch.unsqueeze(torch.unsqueeze(x, -1), -1) # batch_size, latent_dim, H, W
        x = self.conv0(x)
        x = self.activation(x)
        
        x = self.conv1(x)
        x = self.activation(x)
        return x

Using the above two blocks, you can implement the Generator module. The end resolution that we want to reach is 512x512 and we will start at a 4x4 resolution. 


#### init
The `__init__` method should contain enough blocks to work at full resolution. We are only instantiating the generator once! So you will need to:
* Create one GeneratorFirstBlock module
* Create enough GeneratorBlocks modules such that the final resolution is 512x512
* Create one `toRGB` layer per resolution. 

The number of filters in each layer is controlled by the `num_filters` function below.


#### forward

The forward method does the following:
* Takes the latent vector, the current resolution and `alpha` as input 
* Runs the latent vector through the different blocks and performs `alpha` fading


In the original paper, the number of filters of the convolution layers changes with depth. The `num_filters` function below will help you programmatically set the number of filters based on the stage (or depth) of the generator. A depth of 1 corresponds to a 4x4 resolution, a depth of 2 to an 8x8 resolution, etc.

* you can use the torch `interpolate` function to double the resolution of an image
* you can use the `np.log2` function to map the resolution of the input image to a "depth" (or stage) level. For example, `np.log2(512) = 9` and `np.log2(4)` = 2.
* when training at 4x4 resolution, you should not perform $\alpha-$fading.
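
The first two hints can be tried out in isolation. In the sketch below, `resolution_to_steps` is a hypothetical helper name; it returns the number of progressive blocks applied after the initial 4x4 block, which is the `steps` convention used by the `Generator` class further down.

```python
import numpy as np
import torch
import torch.nn.functional as F


def resolution_to_steps(resolution: int) -> int:
    """Number of resolution doublings needed to go from 4x4 to `resolution`."""
    return int(np.log2(resolution)) - 2  # log2(4) = 2, so 4x4 maps to 0


x = torch.randn(1, 3, 4, 4)
doubled = F.interpolate(x, scale_factor=2, mode="nearest")

print(doubled.shape)                                      # torch.Size([1, 3, 8, 8])
print(resolution_to_steps(4), resolution_to_steps(512))   # 0 7
```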


In [8]:
# import tests

from torch.nn.functional import interpolate

In [9]:
def num_filters(stage: int, 
                fmap_base: int = 8192,
                fmap_decay: float = 1.0,
                fmap_max: int = 512): 
    """
    A small helper function to compute the number of filters for conv layers based on the stage/depth,
    stage = log2(resolution)
    From the original repo https://github.com/tkarras/progressive_growing_of_gans/blob/master/networks.py#L252
    """
    return min(int(fmap_base / (2.0 ** (stage * fmap_decay))), fmap_max)
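
With the default arguments, the filter count stays capped at 512 up to stage 4 and then halves at every stage. A quick check, repeating the function so that the snippet runs on its own:

```python
def num_filters(stage: int,
                fmap_base: int = 8192,
                fmap_decay: float = 1.0,
                fmap_max: int = 512) -> int:
    # same formula as above: halve fmap_base per stage, capped at fmap_max
    return min(int(fmap_base / (2.0 ** (stage * fmap_decay))), fmap_max)


filters_per_stage = [num_filters(stage) for stage in range(1, 10)]
print(filters_per_stage)
# [512, 512, 512, 512, 256, 128, 64, 32, 16]
```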

In [10]:
num_filters(5)
int(np.log2(256))

8

In [11]:
class Generator(nn.Module):
    def __init__(self, z_dim, in_channels, img_channels=3):
        super().__init__()
        self.initial = nn.Sequential(
            WSConvTranspose2d(z_dim, in_channels, kernel_size=4, stride=1, padding=0), # 1x1 -> 4x4
            nn.LeakyReLU(0.2),
            WSConv2d(in_channels, in_channels),
            nn.LeakyReLU(0.2),
            PixelNorm(),
        )
        self.initial_rgb = WSConv2d(in_channels, img_channels, kernel_size=1, stride=1, padding=0)
        self.prog_blocks, self.rgb_layers = nn.ModuleList(), nn.ModuleList([self.initial_rgb])
        
        for i in range(len(factors)-1):
            # factors[i] -> factors[i+1]
            conv_in_c = int(in_channels*factors[i])
            conv_out_c = int(in_channels*factors[i+1])
            self.prog_blocks.append(ConvBlock(conv_in_c, conv_out_c))
            self.rgb_layers.append(WSConv2d(conv_out_c, img_channels, kernel_size=1, stride=1, padding=0))
            
        
    def fade_in(self, alpha, upscaled, generated):
        return torch.tanh(alpha*generated + (1-alpha)*upscaled)
        
    def forward(self, x, alpha, steps):
        out = self.initial(x)
        
        if steps == 0:
            return self.initial_rgb(out)
        
        for step in range(steps):
            upscaled = F.interpolate(out, scale_factor=2, mode="nearest")
            out = self.prog_blocks[step](upscaled)
        
        last_upscaled = self.rgb_layers[steps-1](upscaled) # take the last upscaled result and run it through rgb
        final_output = self.rgb_layers[steps](out) # take the last generated conv image and run it through rgb
        
        return self.fade_in(alpha, last_upscaled, final_output) # final result is a mixture of upscaled and actual output.

In [12]:
generator = Generator(z_dim=128, in_channels=512)

In [13]:
class Discriminator(nn.Module):
    def __init__(self, in_channels, img_channels=3):
        super(Discriminator, self).__init__()
        self.prog_blocks, self.rgb_layers = nn.ModuleList([]), nn.ModuleList([])
        self.leaky = nn.LeakyReLU(0.2)

        # here we work back ways from factors because the discriminator
        # should be mirrored from the generator. So the first prog_block and
        # rgb layer we append will work for input size 1024x1024, then 512->256-> etc
        for i in range(len(factors) - 1, 0, -1):
            conv_in = int(in_channels * factors[i])
            conv_out = int(in_channels * factors[i - 1])
            self.prog_blocks.append(ConvBlock(conv_in, conv_out, use_pixelNorm=False))
            self.rgb_layers.append(
                WSConv2d(img_channels, conv_in, kernel_size=1, stride=1, padding=0)
            )

        # perhaps confusing name "initial_rgb" this is just the RGB layer for 4x4 input size
        # did this to "mirror" the generator initial_rgb
        self.initial_rgb = WSConv2d(
            img_channels, in_channels, kernel_size=1, stride=1, padding=0
        )
        self.rgb_layers.append(self.initial_rgb)
        self.avg_pool = nn.AvgPool2d(
            kernel_size=2, stride=2
        )  # down sampling using avg pool

        # this is the block for 4x4 input size
        self.final_block = nn.Sequential(
            # +1 to in_channels because we concatenate from MiniBatch std
            WSConv2d(in_channels + 1, in_channels, kernel_size=3, padding=1),
            nn.LeakyReLU(0.2),
            WSConv2d(in_channels, in_channels, kernel_size=4, padding=0, stride=1),
            nn.LeakyReLU(0.2),
            WSConv2d(
                in_channels, 1, kernel_size=1, padding=0, stride=1
            ),  # we use this instead of linear layer
        )

    def fade_in(self, alpha, downscaled, out):
        """Used to fade in downscaled using avg pooling and output from CNN"""
        # alpha should be a scalar within [0, 1], and downscaled.shape == out.shape
        return alpha * out + (1 - alpha) * downscaled

    def minibatch_std(self, x):
        batch_statistics = (
            torch.std(x, dim=0).mean().repeat(x.shape[0], 1, x.shape[2], x.shape[3])
        )
        # we take the std across the batch for every channel and pixel, average it into
        # a single scalar, then repeat it as one extra channel and concatenate it with
        # the input. In this way the discriminator gets information about the variation
        # in the batch
        return torch.cat([x, batch_statistics], dim=1)

    def forward(self, x, alpha, steps):
        # where we should start in the list of prog_blocks, maybe a bit confusing but
        # the last is for the 4x4. So example let's say steps=1, then we should start
        # at the second to last because input_size will be 8x8. If steps==0 we just
        # use the final block
        cur_step = len(self.prog_blocks) - steps

        # convert from rgb as initial step, this will depend on
        # the image size (each will have its own rgb layer)
        out = self.leaky(self.rgb_layers[cur_step](x))

        if steps == 0:  # i.e, image is 4x4
            out = self.minibatch_std(out)
            return self.final_block(out).view(out.shape[0], -1)

        # because prog_blocks might change the channels, for down scale we use rgb_layer
        # from previous/smaller size which in our case correlates to +1 in the indexing
        downscaled = self.leaky(self.rgb_layers[cur_step + 1](self.avg_pool(x)))
        out = self.avg_pool(self.prog_blocks[cur_step](out))

        # the fade_in is done first between the downscaled and the input
        # this is opposite from the generator
        out = self.fade_in(alpha, downscaled, out)

        for step in range(cur_step + 1, len(self.prog_blocks)):
            out = self.prog_blocks[step](out)
            out = self.avg_pool(out)

        out = self.minibatch_std(out)
        return self.final_block(out).view(out.shape[0], -1)
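
The `minibatch_std` method above appends one extra channel that holds a single scalar: the standard deviation across the batch, averaged over all channels and pixels. A standalone sketch of the same computation, with arbitrary tensor sizes, makes the shapes explicit:

```python
import torch


def minibatch_std(x: torch.Tensor) -> torch.Tensor:
    # std over the batch dimension -> (C, H, W), then mean over everything -> scalar
    stat = torch.std(x, dim=0).mean()
    # repeat the scalar into one extra feature map per sample: (N, 1, H, W)
    stat_map = stat.repeat(x.shape[0], 1, x.shape[2], x.shape[3])
    return torch.cat([x, stat_map], dim=1)


x = torch.randn(4, 512, 4, 4)
out = minibatch_std(x)
print(out.shape)  # torch.Size([4, 513, 4, 4])
```

This is why the first convolution of `final_block` takes `in_channels + 1` input channels. Note that `torch.std` uses the unbiased estimator by default, so the statistic is NaN for a batch of a single sample; the sketch therefore uses a batch of 4.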


In [14]:
if __name__ == "__main__":
    Z_DIM = 100
    IN_CHANNELS = 512
    gen = Generator(Z_DIM, IN_CHANNELS, img_channels=3)
    critic = Discriminator(IN_CHANNELS, img_channels=3)

    for img_size in [4, 8, 16, 32, 64, 128, 256, 512, 1024]:
        num_steps = int(np.log2(img_size / 4))
        x = torch.randn((1, Z_DIM, 1, 1))
        z = gen(x, 0.5, steps=num_steps)
        assert z.shape == (1, 3, img_size, img_size)
        out = critic(z, alpha=0.5, steps=num_steps)
        print(out.shape)
        assert out.shape == (1, 1)
        print(f"Success! At img size: {img_size}")

torch.Size([1, 1])
Success! At img size: 4
torch.Size([1, 1])
Success! At img size: 8
torch.Size([1, 1])
Success! At img size: 16
torch.Size([1, 1])
Success! At img size: 32
torch.Size([1, 1])
Success! At img size: 64
torch.Size([1, 1])
Success! At img size: 128
torch.Size([1, 1])
Success! At img size: 256
torch.Size([1, 1])
Success! At img size: 512
torch.Size([1, 1])
Success! At img size: 1024


# Training
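
The training cell below imports `gradient_penalty` from a `utils` module that is not shown in this notebook. A minimal sketch of what such a helper could look like for the WGAN-GP loss follows. The signature is inferred from the call `gradient_penalty(critic, real, fake, alpha, step, device=...)`, and the body is an assumption rather than the author's code. The toy critic at the end is purely illustrative.

```python
import torch


def gradient_penalty(critic, real, fake, alpha, train_step, device="cpu"):
    batch_size = real.shape[0]
    # one random mixing coefficient per sample
    beta = torch.rand(batch_size, 1, 1, 1, device=device)
    interpolated = (beta * real + (1 - beta) * fake.detach()).requires_grad_(True)

    scores = critic(interpolated, alpha, train_step)
    # gradient of the critic scores with respect to the interpolated images
    grads = torch.autograd.grad(
        outputs=scores,
        inputs=interpolated,
        grad_outputs=torch.ones_like(scores),
        create_graph=True,
        retain_graph=True,
    )[0]
    grad_norm = grads.reshape(batch_size, -1).norm(2, dim=1)
    # penalize deviation of the per-sample gradient norm from 1
    return torch.mean((grad_norm - 1) ** 2)


# toy critic: sums all pixels, so every per-sample gradient is a tensor of ones
toy_critic = lambda x, alpha, step: x.sum(dim=(1, 2, 3)).unsqueeze(1)
real = torch.randn(3, 1, 2, 2)
fake = torch.randn(3, 1, 2, 2)
gp = gradient_penalty(toy_critic, real, fake, alpha=1.0, train_step=0)
print(gp.item())  # gradient norm is sqrt(4) = 2, so the penalty is (2 - 1)^2 = 1
```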

In [None]:
""" Training of ProGAN using WGAN-GP loss"""

import torch
import torch.optim as optim
import torchvision.datasets as datasets
import torchvision.transforms as transforms
from torch.utils.data import DataLoader
from torch.utils.tensorboard import SummaryWriter
from utils import (
    gradient_penalty,
    plot_to_tensorboard,
    save_checkpoint,
    load_checkpoint,
    generate_examples,
)
# from model import Discriminator, Generator
from math import log2
from tqdm import tqdm
import config

# let cuDNN pick the fastest convolution algorithms (the attribute is `benchmark`)
torch.backends.cudnn.benchmark = True


def get_loader(image_size):
    transform = transforms.Compose(
        [
            transforms.Resize((image_size, image_size)),
            transforms.ToTensor(),
            transforms.RandomHorizontalFlip(p=0.5),
            transforms.Normalize(
                [0.5 for _ in range(config.CHANNELS_IMG)],
                [0.5 for _ in range(config.CHANNELS_IMG)],
            ),
        ]
    )
    batch_size = config.BATCH_SIZES[int(log2(image_size / 4))]
    dataset = datasets.ImageFolder(root=config.DATASET, transform=transform)
    loader = DataLoader(
        dataset,
        batch_size=batch_size,
        shuffle=True,
        num_workers=config.NUM_WORKERS,
        pin_memory=True,
    )
    return loader, dataset


def train_fn(
    critic,
    gen,
    loader,
    dataset,
    step,
    alpha,
    opt_critic,
    opt_gen,
    tensorboard_step,
    writer,
    scaler_gen,
    scaler_critic,
):
    loop = tqdm(loader, leave=True)
    for batch_idx, (real, _) in enumerate(loop):
        real = real.to(config.DEVICE)
        cur_batch_size = real.shape[0]

        # Train Critic: max E[critic(real)] - E[critic(fake)] <-> min -E[critic(real)] + E[critic(fake)]
        # which is equivalent to minimizing the negative of the expression
        noise = torch.randn(cur_batch_size, config.Z_DIM, 1, 1).to(config.DEVICE)

        with torch.cuda.amp.autocast():
            fake = gen(noise, alpha, step)
            critic_real = critic(real, alpha, step)
            critic_fake = critic(fake.detach(), alpha, step)
            gp = gradient_penalty(critic, real, fake, alpha, step, device=config.DEVICE)
            loss_critic = (
                -(torch.mean(critic_real) - torch.mean(critic_fake))
                + config.LAMBDA_GP * gp
                + (0.001 * torch.mean(critic_real ** 2))
            )

        opt_critic.zero_grad()
        scaler_critic.scale(loss_critic).backward()
        scaler_critic.step(opt_critic)
        scaler_critic.update()

        # Train Generator: max E[critic(gen_fake)] <-> min -E[critic(gen_fake)]
        with torch.cuda.amp.autocast():
            gen_fake = critic(fake, alpha, step)
            loss_gen = -torch.mean(gen_fake)

        opt_gen.zero_grad()
        scaler_gen.scale(loss_gen).backward()
        scaler_gen.step(opt_gen)
        scaler_gen.update()

        # Update alpha and ensure less than 1
        alpha += cur_batch_size / (
            (config.PROGRESSIVE_EPOCHS[step] * 0.5) * len(dataset)
        )
        alpha = min(alpha, 1)

        if batch_idx % 500 == 0:
            with torch.no_grad():
                fixed_fakes = gen(config.FIXED_NOISE, alpha, step) * 0.5 + 0.5
            plot_to_tensorboard(
                writer,
                loss_critic.item(),
                loss_gen.item(),
                real.detach(),
                fixed_fakes.detach(),
                tensorboard_step,
            )
            tensorboard_step += 1

        loop.set_postfix(
            gp=gp.item(),
            loss_critic=loss_critic.item(),
        )

    return tensorboard_step, alpha


def main():
    # initialize gen and disc, note: discriminator should be called critic,
    # according to WGAN paper (since it no longer outputs between [0, 1])
    # but really who cares..
    gen = Generator(
        config.Z_DIM, config.IN_CHANNELS, img_channels=config.CHANNELS_IMG
    ).to(config.DEVICE)
    critic = Discriminator(
        config.IN_CHANNELS, img_channels=config.CHANNELS_IMG
    ).to(config.DEVICE)

    # initialize optimizers and scalers for FP16 training
    opt_gen = optim.Adam(gen.parameters(), lr=config.LEARNING_RATE, betas=(0.0, 0.99))
    opt_critic = optim.Adam(
        critic.parameters(), lr=config.LEARNING_RATE, betas=(0.0, 0.99)
    )
    scaler_critic = torch.cuda.amp.GradScaler()
    scaler_gen = torch.cuda.amp.GradScaler()

    # for tensorboard plotting
    writer = SummaryWriter("logs/gan1")

    if config.LOAD_MODEL:
        load_checkpoint(
            config.CHECKPOINT_GEN, gen, opt_gen, config.LEARNING_RATE,
        )
        load_checkpoint(
            config.CHECKPOINT_CRITIC, critic, opt_critic, config.LEARNING_RATE,
        )

    gen.train()
    critic.train()

    tensorboard_step = 0
    # start at step that corresponds to img size that we set in config
    step = int(log2(config.START_TRAIN_AT_IMG_SIZE / 4))
    for num_epochs in config.PROGRESSIVE_EPOCHS[step:]:
        alpha = 1e-5  # start with very low alpha
        loader, dataset = get_loader(4 * 2 ** step)  # 4->0, 8->1, 16->2, 32->3, 64 -> 4
        print(f"Current image size: {4 * 2 ** step}")

        for epoch in range(num_epochs):
            print(f"Epoch [{epoch+1}/{num_epochs}]")
            tensorboard_step, alpha = train_fn(
                critic,
                gen,
                loader,
                dataset,
                step,
                alpha,
                opt_critic,
                opt_gen,
                tensorboard_step,
                writer,
                scaler_gen,
                scaler_critic,
            )

            if config.SAVE_MODEL:
                save_checkpoint(gen, opt_gen, filename=config.CHECKPOINT_GEN)
                save_checkpoint(critic, opt_critic, filename=config.CHECKPOINT_CRITIC)

        step += 1  # progress to the next img size


if __name__ == "__main__":
    main()


Current image size: 128
Epoch [1/30]


100%|██████████| 1875/1875 [08:25<00:00,  3.71it/s, gp=0.119, loss_critic=-11.4]  


=> Saving checkpoint
=> Saving checkpoint
Epoch [2/30]


100%|██████████| 1875/1875 [08:23<00:00,  3.72it/s, gp=0.0909, loss_critic=-7]    


=> Saving checkpoint
=> Saving checkpoint
Epoch [3/30]


100%|██████████| 1875/1875 [08:23<00:00,  3.73it/s, gp=0.17, loss_critic=3.36]    


=> Saving checkpoint
=> Saving checkpoint
Epoch [4/30]


100%|██████████| 1875/1875 [08:25<00:00,  3.71it/s, gp=0.0745, loss_critic=-4.65] 


=> Saving checkpoint
=> Saving checkpoint
Epoch [5/30]


100%|██████████| 1875/1875 [08:23<00:00,  3.72it/s, gp=0.0935, loss_critic=-5.86] 


=> Saving checkpoint
=> Saving checkpoint
Epoch [6/30]


100%|██████████| 1875/1875 [08:24<00:00,  3.72it/s, gp=0.0524, loss_critic=-4.98] 


=> Saving checkpoint
=> Saving checkpoint
Epoch [7/30]


100%|██████████| 1875/1875 [08:23<00:00,  3.72it/s, gp=0.0582, loss_critic=-5.61] 


=> Saving checkpoint
=> Saving checkpoint
Epoch [8/30]


100%|██████████| 1875/1875 [08:23<00:00,  3.72it/s, gp=0.0443, loss_critic=-2.58] 


=> Saving checkpoint
=> Saving checkpoint
Epoch [9/30]


100%|██████████| 1875/1875 [08:23<00:00,  3.72it/s, gp=0.0514, loss_critic=-1.44] 


=> Saving checkpoint
=> Saving checkpoint
Epoch [10/30]


100%|██████████| 1875/1875 [08:23<00:00,  3.72it/s, gp=0.0521, loss_critic=-3.48] 


=> Saving checkpoint
=> Saving checkpoint
Epoch [11/30]


100%|██████████| 1875/1875 [08:23<00:00,  3.72it/s, gp=0.0593, loss_critic=-1.25]  


=> Saving checkpoint
=> Saving checkpoint
Epoch [12/30]


100%|██████████| 1875/1875 [08:23<00:00,  3.72it/s, gp=0.0504, loss_critic=1.2]    


=> Saving checkpoint
=> Saving checkpoint
Epoch [13/30]


100%|██████████| 1875/1875 [08:23<00:00,  3.72it/s, gp=0.0268, loss_critic=-2.74]  


=> Saving checkpoint
=> Saving checkpoint
Epoch [14/30]


100%|██████████| 1875/1875 [08:23<00:00,  3.72it/s, gp=0.0394, loss_critic=-2.42]   


=> Saving checkpoint
=> Saving checkpoint
Epoch [15/30]


100%|██████████| 1875/1875 [08:23<00:00,  3.72it/s, gp=0.0381, loss_critic=-2.76]  


=> Saving checkpoint
=> Saving checkpoint
Epoch [16/30]


 95%|█████████▍| 1774/1875 [07:56<00:26,  3.74it/s, gp=0.0385, loss_critic=-3.46] 