
Paddle_T2I

This is a PaddlePaddle implementation of Generative Adversarial Text to Image Synthesis.

English | 简体中文

1 Introduction

This project replicates T2I_GAN, the first conditional GAN for text-to-image synthesis, based on the PaddlePaddle framework. Given a text description, the model understands the meaning of the text and synthesizes a semantically matching image.
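As a minimal sketch of the conditioning idea from Reed et al. (not code from this repo; the dimensions and the class name are illustrative assumptions), the precomputed sentence embedding is projected to a compact vector and concatenated with a noise vector to form the generator input:

```python
# Illustrative sketch of text conditioning (Reed et al., 2016); the dimensions
# (1024/128/100) and the class name TextConditioning are assumptions, not
# taken from this repository.
import paddle
import paddle.nn as nn

class TextConditioning(nn.Layer):
    def __init__(self, text_dim=1024, projected_dim=128):
        super().__init__()
        # Project the precomputed sentence embedding to a compact vector
        self.proj = nn.Sequential(nn.Linear(text_dim, projected_dim),
                                  nn.LeakyReLU(0.2))

    def forward(self, embedding):
        return self.proj(embedding)

cond = TextConditioning()
text_emb = paddle.randn([4, 1024])                    # batch of 4 embeddings
z = paddle.randn([4, 100])                            # noise vectors
g_input = paddle.concat([cond(text_emb), z], axis=1)  # shape [4, 228]
```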

Paper:

  • [1] Reed S, Akata Z, Yan X, et al. Generative adversarial text to image synthesis[C]//International Conference on Machine Learning. PMLR, 2016: 1060-1069.

Reference project:

Online Project:

2 Accuracy

The acceptance criterion for this project is human evaluation of the images generated on the Oxford-102 dataset, so there are no quantitative metrics; only synthesized samples are shown.

(Qualitative results on Oxford-102: generated samples from Paddle_T2I alongside those from the reference Text_to_Image_Synthesis implementation; images omitted here.)

3 Dataset

Oxford-102: this dataset is provided by text-to-image-synthesis and has been converted to HDF5 format for faster reading. Download the dataset and save it under Data\.
If you want to convert the data format yourself, you can follow the steps below (only the storage format changes; the data itself is unchanged, and no neural-network feature extraction is performed):

  • Download the dataset: flowers
  • Add the path to the dataset to config.yaml
  • Run convert_flowers_to_hd5_script.py to convert the dataset storage format (a typical invocation is shown below)
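A typical invocation (assuming, as in the reference project, that the script takes no command-line arguments and reads the dataset path from config.yaml):

```
python convert_flowers_to_hd5_script.py
```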

Data Organization Format

There are three subsets under the whole dataset, namely "train", "valid" and "test". Each subset contains five types of data per sample (note: the text embedding vectors are provided by the authors of the paper, already converted from string form to vector form, and are included in the dataset downloaded above); a sketch for inspecting them follows the list.

  • File name: name
  • Image: img
  • Text embeddings: embeddings
  • Class of image: class
  • Text description of image: txt
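A hedged sketch for inspecting the converted file with h5py (the file name Data/flowers.hdf5 is an assumption; use the path configured in config.yaml):

```python
import h5py

# Open the converted dataset and look at one sample from the training split.
with h5py.File("Data/flowers.hdf5", "r") as f:
    print(list(f.keys()))                     # expected: ['test', 'train', 'valid']
    train = f["train"]
    sample = train[next(iter(train.keys()))]  # one sample group
    print(list(sample.keys()))                # expected: the five fields above
    print(sample["txt"][()])                  # raw text description
```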

Dataset size:

  • Train + validation: 8192
  • Test: 800
  • Number of texts per image: 5
  • Data format: flower images and the corresponding text descriptions

4 Environment

  • Hardware: GPU, CPU

  • Framework:

    • PaddlePaddle >= 2.0.0
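For example, PaddlePaddle can be installed with pip; paddlepaddle is the CPU build and paddlepaddle-gpu the GPU build (pick the wheel matching your CUDA version per the official installation guide):

```
python -m pip install "paddlepaddle-gpu>=2.0.0"
```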

5 Quick start

step1: clone

git clone https://github.com/Caimthefool/Paddle_T2I.git
cd Paddle_T2I

step2: Training

python main.py --split=0

step3: Test

Save the model parameters in model\, set the value of pretrain_model accordingly, and then run the following command; the output images are saved in the image\ directory.

python main.py --validation --split=2 --pretrain_model=model/netG.pdparams

Prediction using a pre-trained model

Place the model file to be evaluated at the path given by the pretrain_model parameter and run the following command; the output images are saved in the image\ directory.

python main.py --validation --split=2 --pretrain_model=model/netG.pdparams

6 Code structure

6.1 structure

Because this project is accepted by human inspection of the generated images (a user study), evaluation works the same way as prediction.

├─Data
├─Log
├─examples
├─image
├─model
├─sample
│  T2IDataset.py
│  config.yaml
│  convert_flowers_to_hd5_script.py
│  README.md
│  README_cn.md
│  discriminator.py
│  generator.py
│  trainer.py
│  main.py
│  requirement.txt

6.2 Parameter description

Training and evaluation-related parameters can be set in main.py, as follows.

| Parameter | Default | Description | Other |
| --- | --- | --- | --- |
| config | None, mandatory | Configuration file path | |
| --split | 0, mandatory | Dataset split to use | 0: training set, 1: validation set, 2: test set |
| --validation | false, optional | Run prediction and evaluation | |
| --pretrain_model | None, optional | Pre-trained model path | |
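As a sketch, the flags in the table could be declared with argparse roughly as below; the actual definitions live in main.py and may differ in details such as types and help strings:

```python
import argparse

# Hypothetical declaration of the four parameters; check main.py for the
# authoritative version.
parser = argparse.ArgumentParser()
parser.add_argument("--config", default=None, help="configuration file path")
parser.add_argument("--split", type=int, default=0,
                    help="0: training set, 1: validation set, 2: test set")
parser.add_argument("--validation", action="store_true",
                    help="run prediction/evaluation instead of training")
parser.add_argument("--pretrain_model", default=None,
                    help="path to pre-trained generator weights")
args = parser.parse_args()
```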

6.3 Training

python main.py --split=0

Training output

When training is running, the output looks like the following. Each training batch prints the current epoch, step, and loss values; D(X) and D(G(X)) are the discriminator's average outputs on real and generated images, respectively.

Epoch: [1 | 600]
(1/78) Loss_D: 1.247 | Loss_G: 20.456 | D(X): 0.673 | D(G(X)): 0.415

6.4 Evaluation and Test

Our pre-trained model is already included in this repo, in the model directory:

python main.py --validation --split=2 --pretrain_model=model/netG.pdparams

7 Model information

For other information about the model, please refer to the following table:

| Information | Description |
| --- | --- |
| Author | Weiyuan Zeng |
| Date | 2021.09 |
| Framework version | Paddle 2.0.2 |
| Application scenario | Text-to-Image Synthesis |
| Supported hardware | GPU, CPU |

Log

Training logs are written to the Log directory and can be visualized with VisualDL:

visualdl --logdir Log --port 8080

Results

(Generated samples on Oxford-102 from Paddle_T2I alongside the reference Text_to_Image_Synthesis implementation; images omitted here.)
