## **Image Generation from Text**
* Converting text in a high-level language (for example, English sentences) into realistic images, i.e. depicting the content of the text in a picture, could be influential in fields such as training artificial intelligence. For example, a statement such as “this girl is wearing a black dress and is carrying a white purse” should give us an image of a girl in a black dress with a white purse. Further examples, taken from a research paper on the same topic, are shown below [reference: Generative Adversarial Text-to-Image Synthesis http://arxiv.org/abs/1605.05396].
* A few examples:

**The following image is generated by training on the CUB dataset of birds.**

<img src='birds1.PNG'>
    
    
**The following image is generated from the Oxford Flowers dataset.**

<img src='flowers.PNG'>

**Both of the images above prominently show a single object. The following image, generated from the Microsoft COCO dataset, has multiple objects in the frame.**

<img src='coco.PNG'>

In recent years, generic and powerful recurrent neural network architectures have been developed to learn discriminative text feature representations. Meanwhile, deep convolutional generative adversarial networks (GANs) have begun to generate highly compelling images of specific categories, such as faces, album covers, and room interiors. Realistic image generation from text descriptions could have a major impact on fields such as interactive computational graphic design and the training and tuning of artificial intelligence.

For this project, I will be using Deep Convolutional GANs (DCGANs). They improve on the vanilla GANs that were developed first: they are more stable to train and generate higher-quality samples. The following changes were made for DCGANs:

* Batch normalization is a must in both networks.

* ReLU activations in the generator and leaky ReLU activations in the discriminator.
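The two ingredients listed above can be illustrated with a minimal plain-Python sketch. It assumes a batch of scalar activations and leaves out the learned scale and shift parameters that real batch normalization layers carry. The function names and the leaky slope of 0.2 are illustrative choices, not taken from the project code.

```python
import math

def batch_norm(values, eps=1e-5):
    """Normalize a batch of scalars to zero mean and (roughly) unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]

def relu(x):
    """Pass positive values through and clamp negative values to zero."""
    return max(0.0, x)

def leaky_relu(x, slope=0.2):
    """Like ReLU, but negative values keep a small slope (assumed 0.2 here)."""
    return x if x > 0 else slope * x

batch = [1.0, 2.0, 3.0, 4.0]
normalized = batch_norm(batch)              # centered around zero
activated = [relu(v) for v in normalized]   # negative entries become 0.0
```

Normalizing each batch keeps the activations in a similar range from layer to layer, which is one reason DCGAN training tends to be more stable.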


**System Architecture**
<img src='sys1.PNG'>

### GAN for text-to-image conversion (StackGAN)

**Step 1: Preprocess the images and captions**

* In image preprocessing, the images are resized to a common size so that all image dimensions match. The data is divided into training and testing sets. The image data is then converted to pickle files for training the generative model.

* The caption data is extracted and a TensorLayer vocabulary is built. The vocabulary list and the caption data are then pickled and saved for training the algorithm.
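The caption step can be sketched roughly as follows. This is a plain-Python stand-in that does not use TensorLayer's vocabulary utilities. The sample captions, the `<pad>`/`<unk>` tokens, and the helper names `build_vocab` and `encode` are all hypothetical, and the data is pickled to bytes in memory rather than to the project's actual files.

```python
import pickle
from collections import Counter

# Hypothetical captions standing in for the real dataset.
captions = [
    "this bird has a red head",
    "this flower has white petals",
    "a small bird with a white belly",
]

def build_vocab(captions, min_count=1):
    """Map each sufficiently frequent word to an integer id."""
    counts = Counter(word for c in captions for word in c.lower().split())
    vocab = {"<pad>": 0, "<unk>": 1}  # reserved ids (an assumption)
    for word in sorted(w for w, n in counts.items() if n >= min_count):
        vocab[word] = len(vocab)
    return vocab

def encode(caption, vocab):
    """Convert a caption to a list of word ids, using <unk> for unseen words."""
    return [vocab.get(w, vocab["<unk>"]) for w in caption.lower().split()]

vocab = build_vocab(captions)
caption_ids = [encode(c, vocab) for c in captions]

# Serialize vocabulary and encoded captions together; pickle.dump would
# write the same payload to a file opened in binary mode.
payload = pickle.dumps({"vocab": vocab, "captions": caption_ids})
restored = pickle.loads(payload)
```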

### Generative Adversarial Networks
**Generative adversarial networks (GANs)** are a class of artificial intelligence algorithms used in unsupervised machine learning, implemented by a system of two neural networks contesting with each other in a zero-sum game framework. They were introduced by Ian Goodfellow et al. in 2014.

GANs have proven highly effective at content generation and have been modified and improved repeatedly since they were first developed.

Evolution of GANs (types of GANs):

* DCGANs
* Improved DCGANs
* Conditional GANs
* InfoGANs
* Wasserstein GANs



## GAN Architecture


GANs are composed of two networks:
* Generator network: a deep network that generates realistic images.
* Discriminator network: a deep network that distinguishes real images from computer-generated images.

* Both networks are trained at the same time and compete against each other in a minimax game.

<img src='Ganoverview.PNG'>

* Step 1: The generator produces images by sampling a noise vector Z from a simple distribution (e.g. normal) and then upsampling this vector to an image. Initially, the generated images look very noisy.

* Step 2: The discriminator is given fake and real images and learns to distinguish them. 

* Step 3: The generator later receives the “feedback” of the discriminator through a backpropagation step, becoming better at generating images.

* Result: The distribution of fake images should be as close as possible to the distribution of real images, i.e. the generated images should look as real as possible.
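The game described in the steps above is commonly written as the following value function, which is the standard formulation from the original GAN paper. The symbols $p_{data}$ and $p_z$ for the data and noise distributions are notation introduced here; they do not appear elsewhere in this write-up.

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$

The discriminator D tries to maximize this value by scoring real images high and generated images low, while the generator G tries to minimize it by producing images that D scores as real.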


## DCGAN (Deep Convolutional Generative Adversarial Networks)
* First improvement over the vanilla GAN
* Added batch normalization to both networks, with ReLU activations in the generator and leaky ReLU in the discriminator

An image is generated directly by a deep network, while a second discriminator network guides the generation process.

This method trains a deep convolutional generative adversarial network (DC-GAN) conditioned on text features encoded by a hybrid character-level convolutional recurrent neural network. Both the generator network G and the discriminator network D perform feed-forward inference conditioned on the text feature.

* **Generator network: G : R<sup>Z</sup> x R<sup>T</sup> -> R<sup>D</sup>**
* **Discriminator network: D : R<sup>D</sup> x R<sup>T</sup> -> {0,1}**

* T: dimension of the text description embedding
* D: dimension of the image
* Z: dimension of the noise input to G

Step 1: Encode the text query t using the text encoder ϕ. The description embedding ϕ(t) is first compressed to a smaller dimension and then concatenated with the noise vector Z.

Step 2: A standard deconvolutional network follows, and a synthetic image is generated.

Step 3: The discriminator applies several layers of stride-2 convolution with spatial batch normalization, followed by leaky ReLU.

<img src='dcgan.PNG'>
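How the generator input is assembled in Step 1 can be sketched as below. The sizes (a 1024-dimensional embedding compressed to 128 and a 100-dimensional noise vector) are assumptions chosen for illustration. The random embedding stands in for the real encoder output ϕ(t), and the projection weights are untrained placeholders rather than a learned layer.

```python
import random

def linear(vector, weights):
    """Multiply a vector by a weight matrix given as a list of rows."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]

def generator_input(text_embedding, weights, noise_dim, rng):
    """Compress the text embedding and concatenate it with Gaussian noise."""
    compressed = linear(text_embedding, weights)              # phi(t) -> smaller dimension
    noise = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]   # z ~ N(0, 1)
    return noise + compressed                                 # list concatenation

rng = random.Random(0)
T, T_SMALL, Z = 1024, 128, 100  # assumed dimensions, for illustration only

phi_t = [rng.uniform(-1.0, 1.0) for _ in range(T)]  # stand-in for the encoder output
W = [[rng.gauss(0.0, 0.02) for _ in range(T)] for _ in range(T_SMALL)]  # untrained projection

g_in = generator_input(phi_t, W, Z, rng)  # vector of length Z + T_SMALL
```

The concatenated vector is what the deconvolutional network in Step 2 would upsample into an image.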

* Steps of the algorithm

Input: minibatch of images x, matching text t, mismatching text t̂, number of training batch steps S

1. Encode the matching text description.
2. Encode the mismatching text description.
3. Draw a sample of random noise.
4. Forward the noise and the text encoding through the generator to generate an image.
5. Calculate the score for a real image with the right text (an image and its corresponding text).
6. Calculate the score for a real image with the wrong text.
7. Calculate the score for a fake image with the right text.
8. Determine the losses of the discriminator and the generator from these scores.
9. Update the discriminator and the generator using the gradients of their objectives with respect to their parameters.

* Batch normalization is used in both the networks.
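The scoring and loss steps of the algorithm above can be sketched in plain Python. This assumes the matching-aware objective from the cited Generative Adversarial Text-to-Image Synthesis paper, written here as losses to minimize. The function names and the example scores are hypothetical, and the gradient updates of the final step are left to a deep learning framework.

```python
import math

def discriminator_loss(s_real, s_wrong, s_fake):
    """Loss from three discriminator scores in (0, 1).

    s_real:  real image with the right text  (should be scored high)
    s_wrong: real image with the wrong text  (should be scored low)
    s_fake:  fake image with the right text  (should be scored low)
    """
    return -(math.log(s_real)
             + 0.5 * (math.log(1.0 - s_wrong) + math.log(1.0 - s_fake)))

def generator_loss(s_fake):
    """The generator wants the discriminator to score its image as real."""
    return -math.log(s_fake)

# Hypothetical scores from a discriminator that is currently winning.
s_r, s_w, s_f = 0.9, 0.2, 0.1
loss_d = discriminator_loss(s_r, s_w, s_f)
loss_g = generator_loss(s_f)
```

Including the real-image, wrong-text pair pushes the discriminator to judge whether the image matches the description, not only whether the image looks real.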

## Stacked Generative Adversarial Networks

* Used to generate high-resolution images.


<img src='stackgan.PNG'>

StackGAN is trained in two stages:

* Initially, the text description of the image is taken as input, and an image of size 64x64 is produced using an architecture similar to DCGANs.

* Then the generated 64x64 image (downsampled to 512x16x16) and the caption of the image are passed as inputs to the Stage-II generator, which generates images of size 256x256.

* In both stages, the discriminator is trained with the caption and the corresponding generated image as input.

* In StackGAN, the authors first trained the Stage-I generator, then generated images of size 64x64 and used those images to generate 256x256 images with Stage-II.

<img src ='stackgan2.PNG'>
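The resolutions quoted for the two stages imply a fixed number of resampling blocks, which a short arithmetic sketch can check. It assumes that every downsampling or upsampling block changes the spatial size by a factor of 2. The helper name `resample_steps` is hypothetical.

```python
def resample_steps(size_in, size_out, factor=2):
    """Count the blocks needed to go from size_in to size_out,
    assuming each block scales the spatial size by `factor`."""
    steps = 0
    size = size_in
    while size != size_out:
        size = size * factor if size_out > size_in else size // factor
        steps += 1
        if steps > 32:  # guard against sizes that never meet
            raise ValueError("sizes are not related by a power of the factor")
    return steps

down = resample_steps(64, 16)   # Stage-I image -> 16x16 feature map
up = resample_steps(16, 256)    # 16x16 feature map -> Stage-II image
```

Under this assumption, `down` is 2 and `up` is 4: two halvings take the 64x64 image to 16x16, and four doublings take it to 256x256.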


Next Steps:

* Test the models on multi-object images using a larger dataset of approximately 120,000 training samples.

References:


https://jhui.github.io/2017/03/05/Generative-adversarial-models/

http://guimperarnau.com/blog/2017/03/Fantastic-GANs-and-where-to-find-them

https://github.com/zsdonghao/text-to-image

https://github.com/hanzhanggit/StackGAN
