
WideResNet

Description

Zagoruyko and Komodakis proposed WideResNet on the basis of ResNet to address a problem of deep and thin network models: only a limited number of layers learn useful representations, while many additional layers contribute little to the final result. This problem is also called diminishing feature reuse. The authors of WideResNet widened the residual blocks, which speeds up training by several times and also noticeably improves accuracy.

Just like ResNet, WideResNet is not a single fixed architecture but a family of networks built on the idea of wide residual blocks, so there is a group of networks called "wideresnet". Unlike ResNet, a WideResNet is identified by two numbers instead of one: the first is the number of layers, as in ResNet, and the second is the "widening factor", which shows how many times wider its blocks are than the corresponding ResNet blocks.

This is an example of training WideResNet-40-10 (40 layers, 10 times wider) on the CIFAR-10 dataset in MindSpore.

Paper

1. [paper] Wide Residual Networks. Sergey Zagoruyko, Nikos Komodakis.

The overall network architecture of WideResNet is shown in the paper.

Dataset used: CIFAR-10

  • Dataset size: 60,000 32*32 color images in 10 classes
    • Train: 50,000 images
    • Test: 10,000 images
  • Data format: binary files
    • Note: data will be processed in dataset.py (see the sketch after the directory tree below)
  • Download the dataset; the directory structure is as follows:
├─cifar-10-batches-bin
│
└─cifar-10-verify-bin
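
As a minimal sketch (not the repository's exact code), dataset.py is expected to build the CIFAR-10 pipeline roughly as below with MindSpore's built-in reader; the augmentation choices, normalization constants, and the create_dataset signature are assumptions for illustration.

import mindspore.dataset as ds
import mindspore.dataset.vision.c_transforms as C
import mindspore.dataset.transforms.c_transforms as C2
import mindspore.common.dtype as mstype

def create_dataset(dataset_path, batch_size=32, training=True):
    """Sketch of a CIFAR-10 pipeline; the real dataset.py may differ."""
    data = ds.Cifar10Dataset(dataset_path, shuffle=training)

    trans = []
    if training:
        # Standard CIFAR-10 augmentation: pad-and-crop plus horizontal flip.
        trans += [C.RandomCrop((32, 32), (4, 4, 4, 4)), C.RandomHorizontalFlip()]
    trans += [
        C.Rescale(1.0 / 255.0, 0.0),
        C.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
        C.HWC2CHW(),
    ]
    data = data.map(operations=trans, input_columns="image")
    data = data.map(operations=C2.TypeCast(mstype.int32), input_columns="label")
    return data.batch(batch_size, drop_remainder=True)

For example, create_dataset("cifar-10-batches-bin") would yield batched training data, while create_dataset("cifar-10-verify-bin", training=False) would yield the evaluation set.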

After installing MindSpore via the official website, you can start training and evaluation as follows:

  • Running on Ascend
# Distributed training
usage: bash run_distribute_train.sh [RANK_TABLE_FILE] [DATASET_PATH] [CONFIG_PATH] [EXPERIMENT_LABEL]
[DATASET_PATH] is the path of the dataset.

# Standalone training
usage: bash run_standalone_train.sh [DATASET_PATH] [CONFIG_PATH] [EXPERIMENT_LABEL]
[DATASET_PATH] is the path of the dataset.

# Run evaluation example
usage: bash run_eval.sh [DATASET_PATH] [CHECKPOINT_PATH] [CONFIG_PATH]
[DATASET_PATH] is the path of the dataset.
[CHECKPOINT_PATH] is the path of the trained ckpt file.

The structure of the scripts is as follows:
.
└──WideResNet
  ├── requirements.txt
  ├── README.md
  ├── config                               # parameter configuration
    ├── wideresnet_cifar10_config.yaml
  ├── scripts
    ├── run_distribute_train.sh            # launch ascend distributed training(8 pcs)
    ├── run_standalone_train.sh            # launch ascend standalone training(1 pcs)
    ├── run_eval.sh                        # launch ascend evaluation
    └── cache_util.sh                      # a collection of helper functions to manage cache
  ├── src
    ├── dataset.py                         # data preprocessing
    ├── callbacks.py                       # evaluation and save callbacks
    ├── cross_entropy_smooth.py            # loss definition for ImageNet2012 dataset
    ├── generator_lr.py                    # generate learning rate for each step
    ├── wide_resnet.py                     # wide_resnet backbone  
    ├── model_utils
       └── config.py                       # parameter configuration
  ├── export.py                            # Ascend 910 export network
  ├── eval.py                              # eval net
  └── train.py                             # train net

Parameters for both training and evaluation can be set in the config file.

  • Config for WideResNet-40-10, CIFAR-10 dataset
"num_classes": 10,                   # number of dataset classes
"batch_size": 32,                    # input tensor batch size
"epoch_size": 300,                   # number of training epochs
"save_checkpoint_path": "./",        # checkpoint save path, relative to the execution path
"repeat_num": 1,                     # number of dataset repetitions
"widen_factor": 10,                  # network widening factor
"depth": 40,                         # network depth
"lr_init": 0.1,                      # initial learning rate
"weight_decay": 5e-4,                # weight decay
"momentum": 0.9,                     # momentum for the Momentum optimizer
"loss_scale": 32,                    # loss scale
"save_checkpoint": False,            # whether to save checkpoints during training
"save_checkpoint_epochs": 5,         # epoch interval between two checkpoints; by default, the last checkpoint is saved after the final epoch
"use_label_smooth": True,            # whether to use label smoothing
"label_smooth_factor": 0.1,          # label smoothing factor
"pretrain_epoch_size": 0,            # number of pretraining epochs
"warmup_epochs": 5,                  # number of warm-up epochs

Usage

Running on Ascend
# Distributed training
usage: bash run_distribute_train.sh [RANK_TABLE_FILE] [DATASET_PATH] [CONFIG_PATH] [LABEL]
[DATASET_PATH] is the path of the dataset.

# Standalone training
usage: bash run_standalone_train.sh [DATASET_PATH] [CONFIG_PATH] [LABEL]
[DATASET_PATH] is the path of the dataset.

For distributed training, an HCCL configuration file in JSON format needs to be created in advance.

Please follow the instructions in the link hccn_tools.

Training results are stored in the example path, in a folder whose name begins with "train" or "train_parallel". Under this folder you can find checkpoint files together with results like the following in the log.

If you want to change the device_id for standalone training, you can set the environment variable export DEVICE_ID=x or set device_id=x in the context, as sketched below.
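
A minimal sketch of the second option, assuming the context is configured near the top of train.py (the exact location is an assumption):

import os
from mindspore import context

# Respect DEVICE_ID from the environment, defaulting to device 0.
device_id = int(os.getenv("DEVICE_ID", "0"))
context.set_context(mode=context.GRAPH_MODE,
                    device_target="Ascend",
                    device_id=device_id)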

Evaluation while training
# distributed training Ascend with evaluation example:
bash run_distribute_train.sh [RANK_TABLE_FILE] [DATASET_PATH] [CONFIG_PATH] [LABEL] [RUN_EVAL] [EVAL_DATASET_PATH]

# standalone training Ascend with evaluation example:
bash run_standalone_train.sh [DATASET_PATH] [CONFIG_PATH] [LABEL] [RUN_EVAL] [EVAL_DATASET_PATH]

RUN_EVAL and EVAL_DATASET_PATH are optional arguments; setting RUN_EVAL=True enables evaluation while training. When RUN_EVAL is set, EVAL_DATASET_PATH must also be set. In that case you can also pass the optional arguments save_best_ckpt, eval_start_epoch, and eval_interval to the Python script (see the callback sketch below).
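
A hedged sketch of how an evaluation callback in callbacks.py might use eval_start_epoch, eval_interval, and save_best_ckpt; the class name, metric key, and saving logic are assumptions for illustration.

from mindspore.train.callback import Callback
from mindspore.train.serialization import save_checkpoint

class EvalCallback(Callback):
    """Sketch: evaluate every eval_interval epochs starting at eval_start_epoch."""
    def __init__(self, model, eval_dataset, eval_start_epoch=1, eval_interval=1,
                 save_best_ckpt=True, ckpt_path="WideResNet_best.ckpt"):
        self.model = model
        self.eval_dataset = eval_dataset
        self.eval_start_epoch = eval_start_epoch
        self.eval_interval = eval_interval
        self.save_best_ckpt = save_best_ckpt
        self.ckpt_path = ckpt_path
        self.best_acc = 0.0

    def epoch_end(self, run_context):
        cb_params = run_context.original_args()
        cur_epoch = cb_params.cur_epoch_num
        if cur_epoch < self.eval_start_epoch:
            return
        if (cur_epoch - self.eval_start_epoch) % self.eval_interval != 0:
            return
        acc = self.model.eval(self.eval_dataset)["top_1_accuracy"]
        print(f"epoch {cur_epoch}: top-1 accuracy {acc:.4f}")
        if self.save_best_ckpt and acc > self.best_acc:
            self.best_acc = acc
            save_checkpoint(cb_params.train_network, self.ckpt_path)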

By default, a standalone cache server will be started to cache all eval images in tensor format in memory to improve evaluation performance. Please make sure the dataset fits in memory (around 30 GB of memory is required for the ImageNet2012 eval dataset, and about 6 GB for the CIFAR-10 eval dataset).

Users can choose to shut down the cache server after training or leave it running for future use (the helper functions in scripts/cache_util.sh can be used to manage the cache).

Usage

Running on Ascend
# distributed training
usage: bash run_distribute_train.sh [RANK_TABLE_FILE] [DATASET_PATH] [CONFIG_PATH] [EXPERIMENT_LABEL] [PRETRAINED_CKPT_PATH]

# standalone training
usage: bash run_standalone_train.sh [DATASET_PATH] [CONFIG_PATH] [EXPERIMENT_LABEL] [PRETRAINED_CKPT_PATH]

Result

  • Training WideResNet-40-10 with CIFAR-10 dataset
# distributed training result (8 pcs)
epoch: 1 step: 5, loss is 2.3153763
epoch: 1 step: 5, loss is 2.274118
epoch: 1 step: 5, loss is 2.2663743
epoch: 1 step: 5, loss is 2.324574
epoch: 1 step: 5, loss is 2.253627
epoch: 1 step: 5, loss is 2.2363935
epoch: 1 step: 5, loss is 2.3112013
epoch: 1 step: 5, loss is 2.252127
...

Usage

Running on Ascend
# Evaluation
usage: bash run_eval.sh [DATASET_PATH] [CONFIG_PATH] [CHECKPOINT_PATH]
[DATASET_PATH] is the path of the dataset.
[CHECKPOINT_PATH] is the path of the trained ckpt file.
# Evaluation example
bash run_eval.sh /cifar10 ../config/wideresnet.yaml WideResNet_best.ckpt

The checkpoint is produced during the training process.
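
As a rough sketch of what eval.py does with that checkpoint (the network constructor name, the loss choice, and the create_dataset signature are assumptions; create_dataset itself is the helper in src/dataset.py):

from mindspore import nn
from mindspore.train.model import Model
from mindspore.train.serialization import load_checkpoint, load_param_into_net

from src.wide_resnet import wideresnet   # assumed constructor name
from src.dataset import create_dataset   # helper from src/dataset.py, signature assumed

net = wideresnet()                        # WideResNet-40-10 backbone
load_param_into_net(net, load_checkpoint("WideResNet_best.ckpt"))

loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction="mean")
model = Model(net, loss_fn=loss, metrics={"top_1_accuracy"})

eval_ds = create_dataset("/cifar10/cifar-10-verify-bin", training=False)
print(model.eval(eval_ds))                # e.g. {'top_1_accuracy': 0.96...}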

Result

Evaluation results are stored in the example path, in a folder named "eval". Under this folder you can find results like the following in the log.

  • Evaluating WideResNet-40-10 with CIFAR-10 dataset
result: {'top_1_accuracy': 0.961738782051282}

Export

python export.py --ckpt_file [CKPT_PATH] --file_format [FILE_FORMAT] --device_id [0]

[CKPT_PATH] is the ckpt file saved after training.

The parameter ckpt_file is required, and file_format must be chosen from ["AIR", "MINDIR"].
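
A hedged sketch of roughly what export.py does with these parameters (the network constructor name and the dummy input shape are assumptions):

import numpy as np
from mindspore import Tensor, context
from mindspore.train.serialization import export, load_checkpoint, load_param_into_net

from src.wide_resnet import wideresnet   # assumed constructor name

context.set_context(mode=context.GRAPH_MODE, device_target="Ascend", device_id=0)

net = wideresnet()
load_param_into_net(net, load_checkpoint("WideResNet_best.ckpt"))

# CIFAR-10 input after preprocessing: NCHW, 32x32, batch size from the config.
dummy_input = Tensor(np.zeros([32, 3, 32, 32], np.float32))
export(net, dummy_input, file_name="wideresnet", file_format="MINDIR")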

Before performing inference, the MINDIR file must be exported through the export.py script. The following shows an example of using the MINDIR model to perform inference.

# Ascend310 inference
bash run_infer_310.sh [MINDIR_PATH] [DATASET_PATH] [DEVICE_ID]
  • MINDIR_PATH: path of the MINDIR file
  • DATASET_PATH: path of the inference dataset
  • DEVICE_ID: optional, the default value is 0

The inference result is saved in the current path of the script execution. You can view the inference accuracy in acc.log in the current folder and the inference time in time_Result.

Evaluation Performance

WideResNet on CIFAR-10

Parameters                   Ascend 910
Model Version                WideResNet-40-10
Resource                     Ascend 910; CPU 2.60 GHz, 192 cores; Memory 755 GB
Uploaded Date                02/25/2021 (month/day/year)
MindSpore Version            1.1.1
Dataset                      CIFAR-10
Training Parameters          epoch=300, steps per epoch=195, batch_size=32
Optimizer                    Momentum
Loss Function                Softmax Cross Entropy
Outputs                      probability
Loss                         0.545541
Speed                        65.2 ms/step (8 cards)
Total Time                   70 minutes
Parameters (M)               52.1
Checkpoint for Fine Tuning   426.49 MB (.ckpt file)
Scripts                      Link

In dataset.py, we set the seed inside the "create_dataset" function. We also use a random seed in train.py.
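
A minimal sketch of where those seeds would be set (the seed value 1 is an assumption):

import mindspore.dataset as ds
from mindspore.common import set_seed

set_seed(1)            # global seed used by weight initialization in train.py
ds.config.set_seed(1)  # seed for shuffling inside create_dataset in dataset.py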

Please check the official homepage.

FAQ

Refer to the ModelZoo FAQ for some common questions.

  • Q: What should I do if memory overflow occurs when using PYNATIVE_MODE?

    A: The memory overflow usually occurs because PYNATIVE_MODE requires more memory. Set the batch size to 16 to reduce memory consumption and allow the network to train.
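
A hedged sketch of the two changes that answer suggests (where the batch size is applied and the create_dataset signature are assumptions):

from mindspore import context
from src.dataset import create_dataset   # helper from src/dataset.py, signature assumed

# Switch to PyNative mode on Ascend.
context.set_context(mode=context.PYNATIVE_MODE, device_target="Ascend")

# Use a smaller batch size in the data pipeline to reduce memory consumption.
train_ds = create_dataset("/cifar10/cifar-10-batches-bin", batch_size=16)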
