# CycleGAN Project Write-up

Anthony Lee 2025-01-06

## Abstract
This project created a Generative Adversarial Network (GAN) as proposed by Ian Goodfellow (Goodfellow et al., 2014), with a generator network inspired by the U-Net architecture and a discriminator built as a simple multilayer perceptron (MLP). The U-Net architecture was chosen for its ability to exploit data augmentation and capture image context from relatively few training samples. Because only a limited number of Monet paintings are available, in contrast to the practically unlimited supply of non-Monet images, this is a very desirable trait. Inspired by Amy Jang's CycleGAN tutorial on Kaggle (Monet CycleGAN Tutorial, n.d.), the GAN also leverages cycled images to further augment the limited training data. The model is evaluated as part of Kaggle's ongoing competition "I'm Something of a Painter Myself" using a modified Fréchet Inception Distance (FID) (Bioinf-Jku/TTUR, 2017/2024). The model received a score of 145.49744 (lower is better) on approximately 7000 generated Monets after training for 10 epochs. For reference, the best performing model on the leaderboard had a score of 33.82955 at the time of writing.


## Introduction
Happy new year!

In 2014, Ian Goodfellow and colleagues published a machine learning framework termed Generative Adversarial Nets (GAN), which is composed of a generator and a discriminator. In the original formulation, both are multilayer perceptrons trained with backpropagation. In the space of arbitrary functions, a unique solution exists in which the generator `G` recovers the training data distribution and the discriminator `D` equals `1/2` everywhere (Goodfellow et al., 2014). Because the framework needs no Markov chains, it sidesteps the intractable likelihood computations faced by much more complex models.
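
For reference, the two-player objective behind this result, as I understand it from Goodfellow et al. (2014), can be written as below. The symbols $p_{\text{data}}$, $p_z$, and $p_g$ (the data, noise, and generator distributions) follow the paper's notation and are not used elsewhere in this write-up.

$$
\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right]
$$

For a fixed `G`, the optimal discriminator is

$$
D^*(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}
$$

so when $p_g = p_{\text{data}}$ it evaluates to `1/2` everywhere.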

In this project I aimed to implement my first GAN rather than focus on model performance and optimization. I also wanted to practice organizing my code and diagramming to support my own understanding of this complex model.

## Method
Overview of the CycleGAN model
<div><img src="../CycleGAN_Process.png" width="800" style="display: block; margin-left: auto; margin-right: auto;"/></div>

### Model
The challenge here is an image-to-image translation problem: the aim is to learn a mapping from one image style domain to another. Paired datasets, such as the same scene rendered both as a Claude Monet painting and as a candid photograph, are not easy to obtain.

CycleGAN (Zhu et al., 2020) incorporates an algorithm that learns the translation between two image style domains without paired input-output examples to train on. Its "cycle consistency loss" extends the GAN model proposed by Goodfellow et al. (2014). The paper reports that the model succeeds at color and texture translations, but that geometric changes remain challenging (Zhu et al., 2020).
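
The cycle consistency idea can be sketched in a few lines of plain Python. This is a toy illustration rather than the project's training code: images are flat lists of floats, `to_monet` and `to_photo` are hypothetical stand-ins for the two generators, and the weight of 10 is the value I recall from the CycleGAN paper's experiments, assumed here rather than taken from this project.

```python
def l1_mean(a, b):
    """Mean absolute difference between two equally sized flat images."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_consistency_loss(photo, monet, to_monet, to_photo, weight=10.0):
    """Penalize images that do not survive a round trip through both generators.

    `to_monet` and `to_photo` are placeholder callables (assumptions),
    each mapping a flat image to a flat image of the same length.
    """
    cycled_photo = to_photo(to_monet(photo))   # photo -> Monet -> photo
    cycled_monet = to_monet(to_photo(monet))   # Monet -> photo -> Monet
    return weight * (l1_mean(photo, cycled_photo) + l1_mean(monet, cycled_monet))


# Toy generators that are exact inverses of each other give zero loss.
photo = [0.25, -0.5, 0.75]
monet = [0.5, 0.25, -0.25]
halve = lambda img: [0.5 * p for p in img]
double = lambda img: [2.0 * p for p in img]
perfect = cycle_consistency_loss(photo, monet, halve, double)

# A generator pair that drifts on the round trip is penalized.
double_biased = lambda img: [2.0 * p + 0.25 for p in img]
drifting = cycle_consistency_loss(photo, monet, halve, double_biased)
```

No paired photo/Monet examples are needed: each image is compared only with its own reconstruction.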

My implementation is inspired by Amy Jang's implementation of CycleGAN (Monet CycleGAN Tutorial, n.d.). The model consists of four main components: the monet-generator, photo-generator, monet-discriminator, and photo-discriminator. Each generator transforms a 256-pixel by 256-pixel image into the style of its target domain: Claude Monet paintings (monet-generator) or candid photographs (photo-generator). In contrast, each discriminator aims to tell a real Monet (monet-discriminator) or a real photo (photo-discriminator) from a generated one, outputting a 30-pixel by 30-pixel map.

### Generators and Discriminators
The generators are modeled after a U-Net for its strong use of data augmentation "to teach the network the desired invariance and robustness properties when only few training samples are available" (Ronneberger et al., 2015). The U-Net's contracting path captures context as the receptive field grows with each contraction layer, while its symmetric expanding path enables precise localization.
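
To make the symmetry of the two paths concrete, the sketch below traces spatial sizes through a U-Net style generator. It assumes every contracting step halves the size and every expanding step doubles it, and the depth of 8 is a hypothetical choice for a 256-pixel input, not a value confirmed by this project.

```python
def unet_sizes(input_size=256, depth=8):
    """Trace spatial sizes through a U-Net style contracting/expanding path.

    Assumes stride-2 halving on the way down and doubling on the way up.
    `depth` is a hypothetical number of contracting steps.
    """
    down = []
    size = input_size
    for _ in range(depth):
        size //= 2
        down.append(size)

    # Every contracting output except the bottleneck feeds a skip connection,
    # consumed in reverse order on the way back up.
    skips = list(reversed(down[:-1]))

    up = []
    size = down[-1]
    for skip in skips:
        size *= 2
        up.append((size, skip))  # (expanding size, matching skip size)

    final = size * 2  # last expanding step restores the input resolution
    return down, up, final


down, up, final = unet_sizes()
```

Each pair in `up` holds equal sizes, which is what lets an expanding layer concatenate the skip features it receives from the contracting path.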

The discriminator was built as a simple contracting multilayer perceptron with padding and dropout; dropout augments the data by introducing noise to the input (Bouthillier et al., 2016). The monet-discriminator outputs a 30-pixel by 30-pixel map indicating whether the input is a real Monet or a generated one. Likewise, the photo-discriminator's output indicates whether a photo is real or generated.
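
One way a 256-pixel input can end up as a 30-pixel output is sketched below. The layer layout is an assumption borrowed from my reading of the cited tutorial (three stride-2 "same" convolutions, then two rounds of 1-pixel zero padding followed by a 4x4 stride-1 "valid" convolution); the exact layers of this project are not listed here, and `conv_out` and `patch_output_size` are hypothetical helper names.

```python
import math


def conv_out(size, kernel, stride, padding):
    """Output length of a convolution along one spatial dimension."""
    if padding == "same":
        return math.ceil(size / stride)
    return (size - kernel) // stride + 1  # "valid": no implicit padding


def patch_output_size(size=256):
    """Trace one spatial dimension through the assumed discriminator layout."""
    for _ in range(3):
        size = conv_out(size, kernel=4, stride=2, padding="same")  # 128, 64, 32
    size += 2                                                     # zero pad -> 34
    size = conv_out(size, kernel=4, stride=1, padding="valid")    # 31
    size += 2                                                     # zero pad -> 33
    size = conv_out(size, kernel=4, stride=1, padding="valid")    # 30
    return size


output_size = patch_output_size(256)
```

Under these assumptions each of the 30 x 30 outputs scores one region of the input rather than the whole image.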

### Optimizers
The model uses the Adaptive Moment Estimation (Adam) optimization algorithm, a memory-efficient method that computes individual learning rates for different parameters from estimates of the first and second moments of the gradients (Kingma & Ba, 2017).

Each of the four sub-models has its own Adam optimizer, so that each keeps its own moment estimates and adapted learning rates.
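
The update rule can be sketched in plain Python to show why separate optimizers matter: each instance carries its own moment estimates. This is a minimal illustration of the algorithm in Kingma & Ba, not the library optimizer used in training, and the hyperparameters (`lr=2e-4`, `beta1=0.5`) are assumed values rather than settings confirmed by this project.

```python
import math


class Adam:
    """Minimal Adam for a flat list of parameters (illustrative only)."""

    def __init__(self, lr=2e-4, beta1=0.5, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = None  # first-moment (mean) estimates
        self.v = None  # second-moment (uncentered variance) estimates
        self.t = 0     # number of steps taken

    def step(self, params, grads):
        if self.m is None:
            self.m = [0.0] * len(params)
            self.v = [0.0] * len(params)
        self.t += 1
        updated = []
        for i, (p, g) in enumerate(zip(params, grads)):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)  # bias correction
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            updated.append(p - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return updated


# One independent optimizer (and thus one set of moments) per sub-model.
optimizers = {
    name: Adam()
    for name in (
        "monet_generator",
        "photo_generator",
        "monet_discriminator",
        "photo_discriminator",
    )
}

new_params = optimizers["monet_generator"].step([1.0], [0.5])
```

Stepping one optimizer leaves the state of the other three untouched, so a noisy discriminator gradient does not distort the step sizes used for a generator.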

### Other Utilities
Other utilities were also used: the Pillow library for JPEG decoding and Draw.io for diagramming.

## Discussion



## Future Improvements
- Transfer learning


## Conclusion



## References
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., & Bengio, Y. (2014). Generative Adversarial Nets. Advances in Neural Information Processing Systems, 27. https://proceedings.neurips.cc/paper_files/paper/2014/hash/5ca3e9b122f61f8f06494c97b1afccf3-Abstract.html

Ronneberger, O., Fischer, P., & Brox, T. (2015). U-Net: Convolutional Networks for Biomedical Image Segmentation (No. arXiv:1505.04597). arXiv. https://doi.org/10.48550/arXiv.1505.04597

Monet CycleGAN Tutorial. (n.d.). Retrieved January 6, 2025, from https://kaggle.com/code/amyjang/monet-cyclegan-tutorial

Bioinf-jku/TTUR. (2024). [Jupyter Notebook]. Institute of Bioinformatics, Johannes Kepler University Linz. https://github.com/bioinf-jku/TTUR (Original work published 2017)

Bouthillier, X., Konda, K., Vincent, P., & Memisevic, R. (2016). Dropout as data augmentation (No. arXiv:1506.08700). arXiv. https://doi.org/10.48550/arXiv.1506.08700

Kingma, D. P., & Ba, J. (2017). Adam: A Method for Stochastic Optimization (No. arXiv:1412.6980). arXiv. https://doi.org/10.48550/arXiv.1412.6980

Zhu, J.-Y., Park, T., Isola, P., & Efros, A. A. (2020). Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks (No. arXiv:1703.10593). arXiv. https://doi.org/10.48550/arXiv.1703.10593




## Appendix

### Data Exploration

### Image data type

