This project is mainly inspired by the Generative Adversarial Text-to-Image Synthesis paper [1]. We implemented the model using PyTorch. In this model we train a conditional generative adversarial network, conditioned on text captions, to generate images that correspond to those captions. The network architecture, shown below, is based on DCGAN.
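As a rough illustration of this architecture, a DCGAN-style conditional generator might look like the sketch below: the text embedding is projected to a lower dimension and concatenated with the noise vector before the up-convolution stack. The class name and layer layout are illustrative, not the repository's exact code; the 1024-d embedding, 128-d projection, and 100-d noise dimensions follow the paper.

```python
import torch
import torch.nn as nn

class ConditionalGenerator(nn.Module):
    # Illustrative DCGAN-style generator conditioned on a text embedding.
    def __init__(self, noise_dim=100, embed_dim=1024, projected_embed_dim=128, ngf=64):
        super().__init__()
        # Project the raw text embedding down before conditioning, as in [1].
        self.projection = nn.Sequential(
            nn.Linear(embed_dim, projected_embed_dim),
            nn.BatchNorm1d(projected_embed_dim),
            nn.LeakyReLU(0.2, inplace=True),
        )
        # Standard DCGAN up-convolutions: (noise + projected text) -> 64x64 RGB image.
        self.net = nn.Sequential(
            nn.ConvTranspose2d(noise_dim + projected_embed_dim, ngf * 8, 4, 1, 0, bias=False),
            nn.BatchNorm2d(ngf * 8), nn.ReLU(True),
            nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 4), nn.ReLU(True),
            nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 2), nn.ReLU(True),
            nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf), nn.ReLU(True),
            nn.ConvTranspose2d(ngf, 3, 4, 2, 1, bias=False),
            nn.Tanh(),  # pixel values in [-1, 1]
        )

    def forward(self, noise, text_embedding):
        projected = self.projection(text_embedding)
        # Concatenate along channels and reshape to a 1x1 spatial map for the deconvs.
        latent = torch.cat([noise, projected], dim=1).unsqueeze(2).unsqueeze(3)
        return self.net(latent)
```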
Credits: [1]. We used the hdf5 format of these datasets, which can be found here for birds_hdf5 and here for flowers_hdf5. These hdf5 files were converted from the Caltech-UCSD Birds 200 and Oxford Flowers datasets.
We used the text embeddings provided by the authors of the paper [1].
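Both the images and the text embeddings live in these hdf5 files. A quick way to inspect one with h5py is sketched below; the file name and the assumption that splits are top-level groups are illustrative, so check the actual file layout.

```python
import h5py

# Open the converted dataset read-only and print its top-level structure.
# 'birds.hdf5' and the split-as-group layout are assumptions about the file.
with h5py.File('birds.hdf5', 'r') as f:
    for split_name in f.keys():
        print(split_name, '->', len(f[split_name].keys()), 'entries')
```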
- PyTorch
- h5py
- EasyDict
- PIL
- NumPy
This implementation only supports running on GPUs.
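Before training, you can verify that PyTorch can see a GPU:

```python
import torch

# Fails fast if no CUDA-capable GPU is visible to PyTorch.
assert torch.cuda.is_available(), "This implementation requires a CUDA-capable GPU."
print(torch.cuda.get_device_name(0))
```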
To install all the dependencies, run:
$ pip install -r requirements.txt
To use this code for training, run:
$ git clone https://github.com/Rakshith-Manandi/text-to-image-using-GAN.git
$ cd ./text-to-image-using-GAN
$ python -u runtime.py
Inputs to the model for training/prediction:
- dataset: the dataset to use (birds | flowers)
- split: an integer indicating which split to use (0: train | 1: valid | 2: test)
- save_path: path for saving the models and results
- pre_trained_disc: path to a pre-trained discriminator model, used to initialize training or to continue from a checkpoint
- pre_trained_gen: path to a pre-trained generator model, used to initialize training or to continue from a checkpoint
- cls: boolean flag indicating whether to train with the GAN-CLS algorithm from [1] (see the sketch after this list)
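For example, a training run might be launched like this (the flag syntax is an assumption based on the parameter names above; check runtime.py for the exact interface):

$ python runtime.py --dataset=birds --split=0 --save_path=./checkpoints --cls=True

The cls flag refers to the matching-aware GAN-CLS training from [1], in which the discriminator also sees real images paired with mismatched captions and learns to score them as fake. Below is a minimal sketch of that discriminator loss, assuming a disc(image, embedding) callable that returns probabilities; the function and variable names are illustrative, not the repository's actual API.

```python
import torch
import torch.nn as nn

bce = nn.BCELoss()

def discriminator_cls_loss(disc, real_images, wrong_images, fake_images, embeddings):
    # GAN-CLS loss from [1]: {real image, right text} is labeled real, while
    # {real image, wrong text} and {fake image, right text} are both labeled fake.
    batch = real_images.size(0)
    real_labels = torch.ones(batch, device=real_images.device)
    fake_labels = torch.zeros(batch, device=real_images.device)

    loss_real = bce(disc(real_images, embeddings).view(-1), real_labels)
    loss_wrong = bce(disc(wrong_images, embeddings).view(-1), fake_labels)
    loss_fake = bce(disc(fake_images, embeddings).view(-1), fake_labels)
    return loss_real + 0.5 * (loss_wrong + loss_fake)
```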
To get a glimpse of the generated results:
First, make sure you have installed all the dependencies listed in the Requirements section and that you have GPU access.
$ git clone https://github.com/Rakshith-Manandi/text-to-image-using-GAN.git
$ cd ./text-to-image-using-GAN
$ jupyter notebook GAN_demo.ipynb (i.e. open the 'GAN_demo.ipynb' file)
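For reference, the inference the notebook performs amounts to something like the following sketch, which reuses the ConditionalGenerator sketch from above; the checkpoint path and the embedding are placeholders, not real artifacts from the repository.

```python
import torch

# Illustrative inference only: the checkpoint path is a placeholder, and the
# random tensor stands in for a real caption embedding from the hdf5 files.
gen = ConditionalGenerator().cuda().eval()
gen.load_state_dict(torch.load('./checkpoints/gen.pth', map_location='cuda'))

embedding = torch.randn(1, 1024, device='cuda')  # stand-in caption embedding
noise = torch.randn(1, 100, device='cuda')
with torch.no_grad():
    image = gen(noise, embedding)  # (1, 3, 64, 64) tensor in [-1, 1]
```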
Here are a few examples of the images generated by our model:
[1] Generative Adversarial Text-to-Image Synthesis, https://arxiv.org/abs/1605.05396
[2] https://github.com/reedscot/icml2016 (the authors' implementation)