
#### Generative Adversarial Networks (GANs)

Generative Adversarial Networks (GANs) are a class of machine learning models introduced by Ian Goodfellow and colleagues in 2014. They generate realistic synthetic data using two neural networks: a generator and a discriminator. The generator creates fake data, such as images, from random noise, while the discriminator judges whether a sample is real (from the training set) or fake (produced by the generator).

These networks compete in a minimax game: the generator tries to fool the discriminator by producing increasingly realistic data, while the discriminator gets better at distinguishing real from fake. This adversarial training pushes the generator toward highly realistic outputs.
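
For reference, the game described above is usually written as the value function from the original GAN paper. Here $D(x)$ is the discriminator's estimate that $x$ is real, $G(z)$ is the generator's output for a noise vector $z$, and $p_{\text{data}}$ and $p_z$ are the data and noise distributions:

$$
\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]
$$

The discriminator maximizes $V$ by pushing $D(x)$ toward 1 and $D(G(z))$ toward 0, while the generator minimizes it by pushing $D(G(z))$ toward 1.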

GANs have found applications in various fields, including image generation, style transfer, super-resolution, and data augmentation. However, they can be challenging to train due to issues like mode collapse and training instability, requiring careful tuning of hyperparameters and architectures.

Despite these challenges, GANs remain one of the most popular techniques for generative tasks due to their ability to produce high-quality outputs.

#### Deep Convolutional GAN (DCGAN)
Deep Convolutional GAN (DCGAN) is an extension of the original GAN architecture that utilizes convolutional neural networks to improve the quality of generated images. Introduced by Alec Radford, Luke Metz, and Soumith Chintala in 2015, DCGAN replaces the fully connected layers used in traditional GANs with convolutional and transposed convolutional layers, making it particularly effective for image generation tasks.

The generator in DCGAN uses transposed convolutions to upsample the latent noise vector into an image, while the discriminator applies strided convolutions to classify images as real or fake. Techniques such as batch normalization, ReLU activations in the generator, and Leaky ReLU activations in the discriminator help stabilize training and reduce issues like mode collapse.
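
To make the upsampling concrete, here is a minimal sketch that traces spatial sizes through a stack of transposed convolutions and then back down through the mirrored convolutions. The layer settings (kernel 4, stride 2, padding 1, ending at 64x64) are an assumption based on the commonly used DCGAN layout, not necessarily the configuration in this repo, and the names `GENERATOR_LAYERS` and `trace_sizes` are hypothetical. A 28x28 MNIST model would need different settings.

```python
def conv_transpose_out(size, kernel, stride, padding):
    """Output size of a transposed convolution (dilation 1, no output padding)."""
    return (size - 1) * stride - 2 * padding + kernel


def conv_out(size, kernel, stride, padding):
    """Output size of a regular convolution (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


# Assumed layer settings, one (kernel, stride, padding) tuple per layer.
GENERATOR_LAYERS = [(4, 1, 0), (4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 2, 1)]


def trace_sizes(start, layers, step):
    """Apply `step` layer by layer and record every intermediate size."""
    sizes = [start]
    for kernel, stride, padding in layers:
        sizes.append(step(sizes[-1], kernel, stride, padding))
    return sizes


# Generator: a 1x1 latent "pixel" is upsampled to a 64x64 image.
generator_sizes = trace_sizes(1, GENERATOR_LAYERS, conv_transpose_out)
# Discriminator: the mirrored convolutions shrink 64x64 back to a 1x1 score.
discriminator_sizes = trace_sizes(64, list(reversed(GENERATOR_LAYERS)), conv_out)

print(generator_sizes)      # [1, 4, 8, 16, 32, 64]
print(discriminator_sizes)  # [64, 32, 16, 8, 4, 1]
```

Each stride-2 transposed convolution doubles the height and width, which is why a handful of layers is enough to go from a latent vector to a full image.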

DCGAN has been influential in advancing image generation, serving as a foundation for more sophisticated architectures like StyleGAN and CycleGAN. Its ability to leverage convolutional structures makes it ideal for tasks such as image synthesis, super-resolution, and artistic style transfer.

#### GAN vs DCGAN

| Aspect                        | GAN                                         | DCGAN                                     |
|-------------------------------|---------------------------------------------|-------------------------------------------|
| **Architecture**              | Fully connected layers (MLPs)               | Convolutional layers (Conv2d, ConvTranspose2d) |
| **Spatial Awareness**         | No spatial awareness                        | Captures spatial features and hierarchies |
| **Generator Upsampling**      | Uses linear layers                          | Uses transposed convolutions              |
| **Discriminator**             | Uses linear layers                          | Uses convolutional layers                 |
| **Activation Functions**      | Varies by implementation (often Sigmoid/Tanh outputs) | ReLU in generator, Leaky ReLU in discriminator |
| **Normalization**             | Not consistently used                       | Batch normalization in most layers        |
| **Training Stability**        | Prone to instability and mode collapse      | More stable, reduced mode collapse        |
| **Output**                    | Works with various data types               | Primarily designed for image generation   |


#### Model training

All models were trained on an Intel(R) Core(TM) i7-9850H CPU @ 2.60GHz with 24 GB RAM.

##### 1st Iteration: Simple GAN on MNIST (1 min 20 sec per epoch)

* Constant learning rate of 0.0002 for both discriminator and generator
* Batch size = 64
* Latent dims = 100
* Weight initialization from a normal distribution

![generated images](../images/gan_genupd1_epoch_24.png)
![loss](../images/gan_genupd1_losses.png)


##### 2nd Iteration: Simple GAN (1 min 5 sec per epoch)

* Batch size = 128
* Label smoothing applied to real labels
* Adam optimizer with betas=(0.5, 0.999)
* Generator weights updated twice for every discriminator update
* lr for generator = 0.0002, discriminator = 0.0001
* Weight initialization from a normal distribution
* Latent dims = 256

![generated images](../images/gan_genupd2_epoch_16.png)

NOTE: Iteration 1 produced better results than iteration 2.
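
The label smoothing and the 2:1 generator-to-discriminator update ratio used in this iteration can be sketched in plain Python as below. This is an illustration under stated assumptions, not the repo's training code: the smoothed real label of 0.9 is an assumed value (the exact value used is not listed above), the update order within a batch is also assumed, and `bce`, `discriminator_loss`, and `update_schedule` are hypothetical helper names.

```python
import math


def bce(prediction, target):
    """Binary cross-entropy for a single prediction in (0, 1)."""
    eps = 1e-12  # clamp to avoid log(0)
    p = min(max(prediction, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


REAL_LABEL_SMOOTHED = 0.9  # assumed smoothing value, not taken from the post
FAKE_LABEL = 0.0
GEN_UPDATES_PER_DISC = 2   # generator updated twice per discriminator update


def discriminator_loss(d_real, d_fake):
    """Discriminator loss with one-sided label smoothing on real samples."""
    return bce(d_real, REAL_LABEL_SMOOTHED) + bce(d_fake, FAKE_LABEL)


def update_schedule(num_batches):
    """Assumed update order: one discriminator step, then two generator steps."""
    steps = []
    for _ in range(num_batches):
        steps.append("D")
        steps.extend(["G"] * GEN_UPDATES_PER_DISC)
    return steps


# With smoothed labels the loss on real samples is lowest at D(x) = 0.9,
# so an overconfident discriminator is penalized.
confident = discriminator_loss(d_real=0.999, d_fake=0.001)
calibrated = discriminator_loss(d_real=0.9, d_fake=0.001)

print(confident > calibrated)  # True
print(update_schedule(2))      # ['D', 'G', 'G', 'D', 'G', 'G']
```

Both techniques aim at the same thing: keeping the discriminator from running too far ahead of the generator.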

##### DCGAN (time per epoch ~ 11 minutes)

* Batch size = 64
* Adam optimizer with betas=(0.5, 0.999)
* lr = 0.0002 for both discriminator and generator
* Latent dims = 100

![gen and disc losses](../images/final_dcgan_losses.png)
![generated images](../images/dcgan_epoch_16.png)

##### DCGAN with extra generator updates (time per epoch ~ 12 minutes)

* Batch size = 64
* Latent dims = 100
* Generator weights updated twice for every discriminator update
* Adam optimizer with betas=(0.5, 0.999)
* lr for generator = 0.0002, discriminator = 0.0001

![final dcgan loss](../images/dcgan_genupd2_losses.png)
![generated images](../images/dcgan_genupd2_epoch_16.png)


A discriminator loss that keeps decreasing while the generator loss increases often indicates an imbalance in the training dynamics. The strategies tried above (label smoothing, a lower learning rate for the discriminator, and extra generator updates) aim to slow down the discriminator's learning or boost the generator, helping to balance the two networks. Try different combinations of these techniques to find the most suitable configuration for your specific GAN model.
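
The loss pattern described above can be reproduced with a few made-up discriminator outputs. This is a minimal sketch: the snapshot values are illustrative rather than measured, and it assumes the common non-saturating generator loss, which may differ from the loss used in the repo.

```python
import math


def discriminator_loss(d_real, d_fake):
    """-[log D(x) + log(1 - D(G(z)))] for a single real/fake pair."""
    return -(math.log(d_real) + math.log(1.0 - d_fake))


def generator_loss(d_fake):
    """Non-saturating generator loss: -log D(G(z))."""
    return -math.log(d_fake)


# Hypothetical (D(real), D(fake)) pairs as the discriminator grows more confident.
snapshots = [(0.6, 0.4), (0.8, 0.2), (0.95, 0.05)]

d_losses = [discriminator_loss(r, f) for r, f in snapshots]
g_losses = [generator_loss(f) for _, f in snapshots]

for (r, f), d, g in zip(snapshots, d_losses, g_losses):
    print(f"D(real)={r:.2f} D(fake)={f:.2f} d_loss={d:.3f} g_loss={g:.3f}")
```

As the discriminator separates real from fake more confidently, its loss falls and the generator's rises, even if the generated images have not become worse. This is one reason loss curves alone are a weak measure of image quality.
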

I couldn't get the generator loss to converge, but the generated images still look reasonable. Computing FID scores would be a more reliable check of image quality than the loss curves alone.

I stopped my experiments here for lack of time and compute (more complex datasets would need more of both). The code for FID calculation is in the repo.
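
For context, the Fréchet Inception Distance (FID) is normally defined as the distance between two Gaussians fitted to Inception network features of real and generated images. Here $\mu_r, \Sigma_r$ and $\mu_g, \Sigma_g$ denote the feature mean and covariance for real and generated images. This is the standard definition, and the implementation in the repo may differ in details such as the feature layer used:

$$
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2 + \operatorname{Tr}\left(\Sigma_r + \Sigma_g - 2\left(\Sigma_r \Sigma_g\right)^{1/2}\right)
$$

Lower values mean the generated images are statistically closer to the real ones, and identical feature statistics give a score of 0.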

##### My observations:

* DCGANs started generating better images early in training, but the generator loss did not converge. DCGAN may not be a good fit for a simple dataset like MNIST.
* With the simple GANs, the generator loss converged, but the generated image quality did not improve further.
* Even though the DCGANs had a higher generator loss than the simple GANs, their generated image quality was better. I would rely on FID scores rather than purely on loss.

Source code: https://github.com/Kunal627/kunal627.github.io/tree/main/code





#### Further reading and references

* Goodfellow et al., Generative Adversarial Nets (2014) - https://arxiv.org/pdf/1406.2661
* Radford, Metz, and Chintala, Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks (2015) - https://arxiv.org/pdf/1511.06434
 