This folder contains an example implementation of Fully Convolutional Networks (FCN) in MXNet.
The example is based on the FCN paper by Long et al. of UC Berkeley.
We have trained a simple fcn-xs model; the hyper-parameters are listed below.
(When using the newest MXNet, you should use a larger learning rate, such as 1e-4, 1e-5, or 1e-6, because the newest MXNet performs gradient normalization in `SoftmaxOutput`.)
The training dataset size is only 2027, and the validation dataset size is 462.
## Training the model
### Step 1: Setup pre-requisites

- Install the required Python package:

```shell
pip install --upgrade Pillow
```
- Set up your working directory. Assume your working directory is `~/train_fcn_xs` and MXNet is built at `~/mxnet`. Copy the example scripts into the working directory:

```shell
cp ~/mxnet/example/fcn-xs/* .
```
### Step 2: Download the vgg16fc model and training data
- vgg16fc model: you can download `VGG_FC_ILSVRC_16_layers-0074.params` from baidu yun or dropbox. This is the fully convolutional version of the original `VGG_ILSVRC_16_layers.caffemodel` and its corresponding `VGG_ILSVRC_16_layers_deploy.prototxt`. Note that the vgg16 model is licensed for non-commercial use only.
- Training data: download `VOC2012.rar` from robots.ox.ac.uk, and extract it into your working directory.
- Mapping files: download `val.lst` from baidu yun into the `.\VOC2012` directory.
Once you have completed these steps, your working directory should contain a `.\VOC2012` directory holding the extracted training data and the mapping files.
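The mapping files pair each training image with its segmentation label. The exact `.lst` format used by this example is an assumption here, but a plausible layout is one tab-separated pair of paths per line, which can be parsed like this (illustrative sketch only):

```python
# Illustrative sketch: we assume each .lst line pairs an input JPEG with its
# segmentation label PNG, separated by a tab. This format is an assumption,
# not a documented guarantee of the example's FileIter.

def parse_flist(lines):
    """Parse (image_path, label_path) pairs from hypothetical .lst lines."""
    pairs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        img_path, label_path = line.split("\t")
        pairs.append((img_path, label_path))
    return pairs

sample = [
    "JPEGImages/2007_000032.jpg\tSegmentationClass/2007_000032.png",
    "JPEGImages/2007_000039.jpg\tSegmentationClass/2007_000039.png",
]
print(parse_flist(sample))
```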
### Step 3: Train the fcn-xs model
- Based on your hardware, configure CPU or GPU training with the `--gpu` parameter. GPU is recommended due to the computational complexity and data load. You can view the available parameters with the following command:

```shell
python fcn_xs.py -h
```

```
usage: fcn_xs.py [-h] [--model MODEL] [--prefix PREFIX] [--epoch EPOCH]
                 [--init-type INIT_TYPE] [--retrain] [--gpu GPU]

Convert vgg16 model to vgg16fc model.

optional arguments:
  -h, --help            show this help message and exit
  --model MODEL         The type of fcn-xs model, e.g. fcnxs, fcn16s, fcn8s.
  --prefix PREFIX       The prefix (including path) of the vgg16 model in
                        mxnet format.
  --epoch EPOCH         The epoch number of the vgg16 model.
  --init-type INIT_TYPE
                        the init type of fcn-xs model, e.g. vgg16, fcnxs
  --retrain             true means continue training.
  --gpu GPU             0 to use GPU, not set to use CPU
```
- It is recommended to train the fcn-32s and fcn-16s models before training the fcn-8s model. To train the fcn-32s model, run the following:

```shell
python -u fcn_xs.py --model=fcn32s --prefix=VGG_FC_ILSVRC_16_layers --epoch=74 --init-type=vgg16
```
- In `fcn_xs.py`, you may need to change `flist_name` and `fcnxs_model_prefix` to match your own data.
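For example, the relevant settings might look like the following (the values are illustrative, drawn from the file names used elsewhere in this example, not the script's guaranteed defaults):

```python
# Hypothetical values -- adjust these to your own dataset layout.
flist_name = "train.lst"            # list file pairing images with labels
fcnxs_model_prefix = "FCN32s_VGG16"  # prefix under which checkpoints are saved
```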
- When you train the fcn-16s or fcn-8s model, change the corresponding line in `run_fcnxs.sh`. For example, when training fcn-16s, comment out the fcn-32s command, so the script looks like this:

```shell
python -u fcn_xs.py --model=fcn16s --prefix=FCN32s_VGG16 --epoch=31 --init-type=fcnxs
```
- The output log may look like this (when training fcn-8s):

```
INFO:root:Start training with gpu(3)
INFO:root:Epoch Batch  Speed: 1.16 samples/sec Train-accuracy=0.894318
INFO:root:Epoch Batch  Speed: 1.11 samples/sec Train-accuracy=0.904681
INFO:root:Epoch Batch  Speed: 1.13 samples/sec Train-accuracy=0.908053
INFO:root:Epoch Batch  Speed: 1.12 samples/sec Train-accuracy=0.912219
INFO:root:Epoch Batch  Speed: 1.13 samples/sec Train-accuracy=0.914238
INFO:root:Epoch Batch  Speed: 1.13 samples/sec Train-accuracy=0.912170
INFO:root:Epoch Batch  Speed: 1.12 samples/sec Train-accuracy=0.912080
```
## Using the pre-trained model for image segmentation
To try out the pre-trained model, follow these steps:
- Download the pre-trained symbol and weight files from yun.baidu.
- Run the segmentation script, providing it your input image path:

```shell
python image_segmentaion.py --input <your JPG image path>
```
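The visible part of the script's work is turning the network's per-pixel class predictions into a color image. A rough sketch of that post-processing step, assuming the standard PASCAL VOC bit-interleaved palette (the helper names here are illustrative, not the script's actual API):

```python
import numpy as np

def voc_palette_color(cls_id):
    """Standard PASCAL VOC color for a class index (bit-interleaving scheme)."""
    r = g = b = 0
    cid = cls_id
    for shift in range(8):
        r |= ((cid >> 0) & 1) << (7 - shift)
        g |= ((cid >> 1) & 1) << (7 - shift)
        b |= ((cid >> 2) & 1) << (7 - shift)
        cid >>= 3
    return (r, g, b)

def colorize(label_map):
    """Turn an HxW array of class indices into an HxWx3 RGB image."""
    h, w = label_map.shape
    out = np.zeros((h, w, 3), dtype=np.uint8)
    for cls_id in np.unique(label_map):
        out[label_map == cls_id] = voc_palette_color(int(cls_id))
    return out

# Tiny demo: background (class 0) and class 1 ("aeroplane" in VOC).
demo = np.array([[0, 1], [1, 0]])
print(voc_palette_color(0))  # (0, 0, 0)
print(voc_palette_color(1))  # (128, 0, 0)
print(colorize(demo).shape)  # (2, 2, 3)
```

The resulting array could then be saved as a `.png` with Pillow's `Image.fromarray(...).save(...)`.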
- The segmented output `.png` file will be generated in the working directory.
- This example trains on full-size images, so there is no need to resize or crop input images to a common size. Accordingly, `batch_size` is set to 1 during training.
- The fcn-xs model is based on the vgg16 model, with crop, deconvolution, and element-wise sum layers added, so the model is quite large. Moreover, the example uses whole-image training, so if the input image is large (such as 700*500), memory consumption may be high. For that reason, we suggest using a GPU with at least 12GB of memory for training.
- If you don't have access to a GPU with 12GB of memory for training, we suggest setting `cut_off_size` to a small value when constructing the `FileIter`, for example:

```python
train_dataiter = FileIter(
    root_dir="./VOC2012",
    flist_name="train.lst",
    cut_off_size=400,
    rgb_mean=(123.68, 116.779, 103.939),
)
```
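The intent of `cut_off_size` is to bound the spatial size of training samples and thus the memory footprint. Conceptually it behaves like the cropping sketch below (an illustration of the idea, not `FileIter`'s exact implementation):

```python
import numpy as np

def crop_to_cut_off(img, cut_off_size):
    """Center-crop each spatial dimension down to at most cut_off_size.

    img is an HxWxC array. Illustrative sketch only -- the real FileIter
    may crop or skip oversized images differently.
    """
    h, w = img.shape[:2]
    new_h, new_w = min(h, cut_off_size), min(w, cut_off_size)
    top = (h - new_h) // 2
    left = (w - new_w) // 2
    return img[top:top + new_h, left:left + new_w]

big = np.zeros((700, 500, 3), dtype=np.uint8)
small = crop_to_cut_off(big, 400)
print(small.shape)  # (400, 400, 3)
```

Smaller crops mean smaller feature maps at every layer, which is what makes training feasible on GPUs with less memory.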