Pix2Pix

Many problems in image processing, computer graphics, and computer vision can be posed as "translating" an input image into a corresponding output image. Each of these tasks has traditionally been tackled with separate, special-purpose machinery, even though the setting is always the same: predict pixels from pixels. The goal of the paper is to develop a common framework for all of these problems. Pix2Pix is a conditional GAN consisting of two modules, a generator and a discriminator, that transforms an input image into a corresponding output image; the essence of the model is a pixel-to-pixel mapping.

Paper: Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A. Efros. "Image-to-Image Translation with Conditional Adversarial Networks", in CVPR 2017.
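
The training objective, as formulated in the paper, combines a conditional-GAN term with an L1 reconstruction term (the weight λ corresponds to the LAMBDA_L1 parameter listed further below, default 100):

    G* = arg min_G max_D  L_cGAN(G, D) + λ · L_L1(G)
    L_cGAN(G, D) = E_{x,y}[log D(x, y)] + E_{x,z}[log(1 − D(x, G(x, z)))]
    L_L1(G)      = E_{x,y,z}[ ||y − G(x, z)||_1 ]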

[Figure: Pix2Pix example images (imgs/Pix2Pix-examples.jpg)]

Pix2Pix contains a generator network and a discriminator network. The generator can be any pixel-to-pixel mapping network (the original paper proposes a U-Net). The discriminator is a PatchGAN that judges whether each N×N patch is real or fake, which improves the realism of the generated images.

Generator (U-Net-based) architecture:

Encoder:

C64-C128-C256-C512-C512-C512-C512-C512

Decoder:

CD512-CD1024-CD1024-C1024-C1024-C512-C256-C128

Discriminator (70 × 70 PatchGAN) architecture:

C64-C128-C256-C512

Note: Let Ck denote a Convolution-BatchNorm-ReLU layer with k filters. CDk denotes a Convolution-BatchNorm-Dropout-ReLU layer with a dropout rate of 50%.
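
As a rough illustration, the Ck / CDk blocks above could be built with the MindSpore 1.x nn API as sketched below. This is not the repository's code (the actual definitions live in src/models/generator_model.py and src/models/discriminator_model.py), and details such as kernel size, padding mode, normalization on the first layer, and activation slope may differ; the sketch only shows how the notation maps to layers.

```python
# Rough sketch of the Ck / CDk building blocks (MindSpore 1.x API); illustrative only.
import mindspore.nn as nn

def Ck(in_ch, k, alpha=0.2):
    """Convolution-BatchNorm-LeakyReLU block with k filters (encoder / discriminator side)."""
    return nn.SequentialCell([
        nn.Conv2d(in_ch, k, kernel_size=4, stride=2, pad_mode='pad', padding=1),
        nn.BatchNorm2d(k),
        nn.LeakyReLU(alpha),
    ])

def CDk(in_ch, k):
    """Transposed Convolution-BatchNorm-Dropout-ReLU block with k filters and 50% dropout (decoder side)."""
    return nn.SequentialCell([
        nn.Conv2dTranspose(in_ch, k, kernel_size=4, stride=2, pad_mode='pad', padding=1),
        nn.BatchNorm2d(k),
        nn.Dropout(keep_prob=0.5),  # MindSpore 1.x uses keep_prob; 0.5 corresponds to 50% dropout
        nn.ReLU(),
    ])

# Assembling the 70 x 70 PatchGAN body (C64-C128-C256-C512) from these blocks.
# The input has 6 channels because the condition image and the real/generated
# image are concatenated along the channel axis.
patchgan_body = nn.SequentialCell([Ck(6, 64), Ck(64, 128), Ck(128, 256), Ck(256, 512)])
```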

Dataset_1 used: facades

    Dataset size: 29M, 606 images
                  400 train images
                  100 validation images
                  106 test images
    Data format: .jpg images

Dataset_2 used: maps

    Dataset size: 239M, 2194 images
                  1096 train images
                  1098 validation images
    Data format: .jpg images

Note: We provide data/download_Pix2Pix_dataset.sh to download the datasets.

Environment requirements:

  • Python==3.8.5
  • MindSpore==1.2

The overall code structure is as follows:

.Pix2Pix
├─ README.md                           # descriptions about Pix2Pix
├─ data
│  └─ download_Pix2Pix_dataset.sh      # download dataset
├─ scripts
│  ├─ run_infer_310.sh                 # launch Ascend 310 inference
│  ├─ run_train_ascend.sh              # launch Ascend training (1 pcs)
│  ├─ run_distribute_train_ascend.sh   # launch Ascend training (8 pcs)
│  ├─ run_eval_ascend.sh               # launch Ascend evaluation
│  ├─ run_train_gpu.sh                 # launch GPU training (1 pcs)
│  ├─ run_distribute_train_gpu.sh      # launch GPU training (8 pcs)
│  └─ run_eval_gpu.sh                  # launch GPU evaluation
├─ imgs
│  └─ Pix2Pix-examples.jpg             # Pix2Pix example images
├─ src
│  ├─ __init__.py                      # init file
│  ├─ dataset
│  │  ├─ __init__.py                   # init file
│  │  └─ pix2pix_dataset.py            # create the Pix2Pix dataset
│  ├─ models
│  │  ├─ __init__.py                   # init file
│  │  ├─ discriminator_model.py        # discriminator model (PatchGAN)
│  │  ├─ generator_model.py            # generator model (U-Net based)
│  │  ├─ init_w.py                     # initialize network weights
│  │  ├─ loss.py                       # define losses
│  │  └─ pix2pix.py                    # define the Pix2Pix model
│  └─ utils
│     ├─ __init__.py                   # init file
│     ├─ config.py                     # parse args
│     └─ tools.py                      # tools for the Pix2Pix model
├─ eval.py                             # evaluate the Pix2Pix model
├─ train.py                            # training script
└─ export.py                           # export MINDIR script

The major parameters in train.py and config.py are as follows:

"device_target": Ascend                     # run platform, only support Ascend.
"device_num": 1                             # device num, default is 1.
"device_id": 0                              # device id, default is 0.
"save_graphs": False                        # whether save graphs, default is False.
"init_type": normal                         # network initialization, default is normal.
"init_gain": 0.02                           # scaling factor for normal, xavier and orthogonal, default is 0.02.
"load_size": 286                            # scale images to this size, default is 286.
"batch_size": 1                             # batch_size, default is 1.
"LAMBDA_Dis": 0.5                           # weight for Discriminator Loss, default is 0.5.
"LAMBDA_GAN": 1                             # weight for GAN Loss, default is 1.
"LAMBDA_L1": 100                            # weight for L1 Loss, default is 100.
"beta1": 0.5                                # adam beta1, default is 0.5.
"beta2": 0.999                              # adam beta2, default is 0.999.
"lr": 0.0002                                # the initial learning rate, default is 0.0002.
"lr_policy": linear                         # learning rate policy, default is linear.
"epoch_num": 200                            # epoch number for training, default is 200.
"n_epochs": 100                             # number of epochs with the initial learning rate, default is 100.
"n_epochs_decay": 100                       # number of epochs with the dynamic learning rate, default is 100.
"dataset_size": 400                         # for Facade_dataset,the number is 400; for Maps_dataset,the number is 1096.
"train_data_dir": None                      # the file path of input data during training.
"val_data_dir": None                        # the file path of input data during validating.
"train_fakeimg_dir": ./results/fake_img/    # during training, the file path of stored fake img.
"loss_show_dir": ./results/loss_show        # during training, the file path of stored loss img.
"ckpt_dir": ./results/ckpt                  # during training, the file path of stored CKPT.
"ckpt": None                                # during validating, the file path of the CKPT used.
"predict_dir": ./results/predict/           # during validating, the file path of Generated images.
Training:

  • running on Ascend with default parameters
python train.py --device_target [Ascend] --device_id [0] --train_data_dir [./data/facades/train]
  • running distributed training on Ascend with fixed parameters
bash run_distribute_train_ascend.sh [DEVICE_NUM] [DISTRIBUTE] [RANK_TABLE_FILE] [DATASET_PATH] [DATASET_NAME]
  • running on GPU with fixed parameters
python train.py --device_target [GPU] --run_distribute [1] --device_num [8] --dataset_size 400 --train_data_dir [./data/facades/train] --pad_mode REFLECT
OR
bash scripts/run_train_gpu.sh [DATASET_PATH] [DATASET_NAME]
  • running distributed training on GPU with fixed parameters
bash run_distribute_train_gpu.sh [DATASET_PATH] [DATASET_NAME] [DEVICE_NUM]
Evaluation:

  • running on Ascend
python eval.py --device_target [Ascend] --device_id [0] --val_data_dir [./data/facades/test] --ckpt [./results/ckpt/Generator_200.ckpt] --pad_mode REFLECT
OR
bash scripts/run_eval_ascend.sh [DATASET_PATH] [DATASET_NAME] [CKPT_PATH] [RESULT_DIR]
  • running on GPU
python eval.py --device_target [GPU] --device_id [0] --val_data_dir [./data/facades/test] --ckpt [./train/results/ckpt/Generator_200.ckpt] --predict_dir [./train/results/predict/] \
--dataset_size 1096 --pad_mode REFLECT
OR
bash scripts/run_eval_gpu.sh [DATASET_PATH] [DATASET_NAME] [CKPT_PATH] [RESULT_PATH]

Note: Before training and evaluating, create the output folders under "./results/..." (see the example command below). The generated results will then appear in "./results/predict".
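
For example, using the default output directories from the configuration above:

mkdir -p ./results/fake_img ./results/loss_show ./results/ckpt ./results/predict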

Ascend 310 inference:

bash run_infer_310.sh [The path of the MINDIR for 310 infer] [The path of the dataset for 310 infer] y Ascend 0

Note: Before running 310 inference, create the MINDIR/AIR model with "python export.py --ckpt [The path of the CKPT for exporting] --train_data_dir [The path of the training dataset]".

Training Performance on a single device

| Parameters | single Ascend | single GPU |
| --- | --- | --- |
| Model Version | Pix2Pix | Pix2Pix |
| Resource | Ascend 910 | PCIE V100-32G |
| MindSpore Version | 1.2 | 1.3.0 |
| Dataset | facades | facades |
| Training Parameters | epoch=200, steps=400, batch_size=1, lr=0.0002 | epoch=200, steps=400, batch_size=1, lr=0.0002, pad_mode=REFLECT |
| Optimizer | Adam | Adam |
| Loss Function | SigmoidCrossEntropyWithLogits Loss & L1 Loss | SigmoidCrossEntropyWithLogits Loss & L1 Loss |
| Outputs | probability | probability |
| Speed | 1pc (Ascend): 10 ms/step | 1pc (GPU): 40 ms/step |
| Total time | 1pc (Ascend): 0.3 h | 1pc (GPU): 0.8 h |
| Checkpoint for Fine tuning | 207M (.ckpt file) | 207M (.ckpt file) |

| Parameters | single Ascend | single GPU |
| --- | --- | --- |
| Model Version | Pix2Pix | Pix2Pix |
| Resource | Ascend 910 | PCIE V100-32G |
| MindSpore Version | 1.2 | 1.3.0 |
| Dataset | maps | maps |
| Training Parameters | epoch=200, steps=1096, batch_size=1, lr=0.0002 | epoch=200, steps=400, batch_size=1, lr=0.0002, pad_mode=REFLECT |
| Optimizer | Adam | Adam |
| Loss Function | SigmoidCrossEntropyWithLogits Loss & L1 Loss | SigmoidCrossEntropyWithLogits Loss & L1 Loss |
| Outputs | probability | probability |
| Speed | 1pc (Ascend): 20 ms/step | 1pc (GPU): 90 ms/step |
| Total time | 1pc (Ascend): 1.58 h | 1pc (GPU): 3.3 h |
| Checkpoint for Fine tuning | 207M (.ckpt file) | 207M (.ckpt file) |

Distributed Training Performance

| Parameters | Ascend (8pcs) | GPU (8pcs) |
| --- | --- | --- |
| Model Version | Pix2Pix | Pix2Pix |
| Resource | Ascend 910 | PCIE V100-32G |
| MindSpore Version | 1.4.1 | 1.3.0 |
| Dataset | facades | facades |
| Training Parameters | epoch=200, steps=400, batch_size=1, lr=0.0002 | epoch=200, steps=400, batch_size=1, lr=0.0002, pad_mode=REFLECT |
| Optimizer | Adam | Adam |
| Loss Function | SigmoidCrossEntropyWithLogits Loss & L1 Loss | SigmoidCrossEntropyWithLogits Loss & L1 Loss |
| Outputs | probability | probability |
| Speed | 8pc (Ascend): 15 ms/step | 8pc (GPU): 30 ms/step |
| Total time | 8pc (Ascend): 0.5 h | 8pc (GPU): 1 h |
| Checkpoint for Fine tuning | 207M (.ckpt file) | 207M (.ckpt file) |

| Parameters | Ascend (8pcs) | GPU (8pcs) |
| --- | --- | --- |
| Model Version | Pix2Pix | Pix2Pix |
| Resource | Ascend 910 | PCIE V100-32G |
| MindSpore Version | 1.4.1 | 1.3.0 |
| Dataset | maps | maps |
| Training Parameters | epoch=200, steps=1096, batch_size=1, lr=0.0002 | epoch=200, steps=400, batch_size=1, lr=0.0002, pad_mode=REFLECT |
| Optimizer | Adam | Adam |
| Loss Function | SigmoidCrossEntropyWithLogits Loss & L1 Loss | SigmoidCrossEntropyWithLogits Loss & L1 Loss |
| Outputs | probability | probability |
| Speed | 8pc (Ascend): 20 ms/step | 8pc (GPU): 40 ms/step |
| Total time | 8pc (Ascend): 1.2 h | 8pc (GPU): 2.8 h |
| Checkpoint for Fine tuning | 207M (.ckpt file) | 207M (.ckpt file) |

Evaluation Performance

| Parameters | single Ascend | single GPU |
| --- | --- | --- |
| Model Version | Pix2Pix | Pix2Pix |
| Resource | Ascend 910 | PCIE V100-32G |
| MindSpore Version | 1.2 | 1.3.0 |
| Dataset | facades / maps | facades / maps |
| batch_size | 1 | 1 |
| Outputs | probability | probability |

Please check the official homepage.
