Implementation of Mask R-CNN in Chainer
Jul 24, 2018


Chainer implementation of Mask R-CNN - the multi-task network for object detection, object classification, and instance segmentation.

What's New

  • Training result for R-50-C4 model has been evaluated!
  • COCO box AP = 0.346 using our trainer (0.355 with official boxes)
  • COCO mask AP = 0.287 using our trainer (0.314 with official boxes)


  • to be updated


  • Chainer
  • ChainerCV
  • CuPy
    (should work in any environment that can run Chainer > v3 with CUDA and cuDNN.)
    (verified with chainer==3.1.0, chainercv==0.7.0, cupy==1.0.3)
$ pip install chainer   
$ pip install chainercv
$ pip install cupy
  • Python 3.0+
  • NumPy
  • Matplotlib
  • OpenCV
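
To see which of the packages above are already available, a small check like the following can help. This is a minimal sketch, not part of this repository: the helper name `check_requirements` is hypothetical, and the import names (in particular `cv2` for OpenCV) are the usual ones but are assumed here.

```python
import importlib.util
import sys

# Import names assumed for the packages listed above (OpenCV imports as cv2).
REQUIRED_MODULES = ("chainer", "chainercv", "cupy", "numpy", "matplotlib", "cv2")


def check_requirements(modules=REQUIRED_MODULES):
    """Return {module name: bool}, True when the module can be located.

    find_spec only locates a top-level module; it does not import it,
    so this stays cheap even for heavy packages.
    """
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    for name, found in check_requirements().items():
        print(f"{name:12s} {'found' if found else 'MISSING'}")
```

The check only reports whether a module can be located; it does not verify versions or that CUDA and cuDNN are usable.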


  • Precision Evaluator (bbox, COCO metric)
  • Detectron Model Parser
  • Modify ROIAlign
  • Mask inference using refined ROIs
  • Precision Evaluator (mask, COCO metric)
  • Improve segmentation AP for R-50-C4 model
  • Feature Pyramid Network (R-50-FPN)
  • Keypoint Detection (R-50-FPN, Keypoints)

Benchmark Results

                    Box AP 50:95    Segm AP 50:95
Ours (1 GPU)        0.346           0.287
Detectron model     0.350           0.295
Detectron caffe2    0.355           0.314
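
In the table above, "AP 50:95" follows the COCO convention: AP is computed at ten IoU thresholds from 0.50 to 0.95 in steps of 0.05 and then averaged. Box AP uses bounding-box IoU, while segmentation AP uses mask IoU. The sketch below only illustrates the box IoU and the threshold grid; it is not the evaluator used by this repository, and the `(x_min, y_min, x_max, y_max)` box layout and the helper names are assumptions made for the example.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x_min, y_min, x_max, y_max) (assumed layout)."""
    # Width and height of the overlap, clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# The ten COCO IoU thresholds: 0.50, 0.55, ..., 0.95.
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(iou):
    """Thresholds at which a detection with this IoU counts as a match."""
    return [t for t in IOU_THRESHOLDS if iou >= t]


if __name__ == "__main__":
    ground_truth = (0.0, 0.0, 10.0, 10.0)
    detection = (0.0, 0.0, 10.0, 7.2)
    iou = box_iou(ground_truth, detection)
    print(f"IoU = {iou:.2f}, matches at thresholds {thresholds_passed(iou)}")
```

A detection that overlaps its ground truth only loosely contributes at the low thresholds but not at the high ones, which is why AP 50:95 is stricter than AP at IoU 0.50 alone.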

Inference with Pretrained Models

python utils/
  • The converted model file is saved in modelfiles
  • Run the demo:
python --bn2affine --modelfile modelfiles/e2e_mask_rcnn_R-50-C4_1x_d2c.npz --image <input image>

Prerequisites for training

  • Download 'ResNet-50-model.caffemodel' from the "OneDrive download" of ResNet pretrained models for model initialization and place it in ~/.chainer/dataset/pfnet/chainer/models/

  • COCO 2017 dataset: the COCO dataset can be downloaded and unzipped by:


Set up the COCO API:

git clone
cd coco/PythonAPI/
python install
cd ../../

Note: the official COCO repository is not compatible with Python 3.
Use the repository above to run our evaluation.



The arguments and their default values are defined as follows:

'--dataset', choices=('coco2017',), default='coco2017'
'--extractor', choices=('resnet50','resnet101'), default='resnet50', help='extractor network'
'--gpu', '-g', type=int, default=0   
'--lr', '-l', type=float, default=1e-4   
'--batchsize', '-b', type=int, default=8   
'--freeze_bn', action='store_true', default=False, help='freeze batchnorm gamma/beta'
'--bn2affine', action='store_true', default=False, help='batchnorm to affine'
'--out', '-o', default='result',  help='output directory'   
'--seed', '-s', type=int, default=0   
'--roialign', action='store_true', default=True, help='True: ROIAlign, False: ROIpooling'
'--step_size', '-ss', type=int, default=400000  
'--lr_step', '-ls', type=int, default=480000    
'--lr_initialchange', '-li', type=int, default=800     
'--pretrained', '-p', type=str, default='imagenet'   
'--snapshot', type=int, default=4000   
'--validation', type=int, default=30000   
'--resume', type=str   
'--iteration', '-i', type=int, default=800000   
'--roi_size', '-r', type=int, default=14, help='ROI size for mask head input'
'--gamma', type=float, default=1, help='mask loss balancing factor'   
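
To see how these definitions behave, here is a minimal sketch that rebuilds a subset of them with `argparse`. It is not the repository's training script; `build_parser` is a hypothetical name, and only options listed above are included.

```python
import argparse


def build_parser():
    """Rebuild a subset of the options listed above (illustrative only)."""
    parser = argparse.ArgumentParser(description="Subset of the training options")
    parser.add_argument('--extractor', choices=('resnet50', 'resnet101'),
                        default='resnet50', help='extractor network')
    parser.add_argument('--gpu', '-g', type=int, default=0)
    parser.add_argument('--lr', '-l', type=float, default=1e-4)
    parser.add_argument('--batchsize', '-b', type=int, default=8)
    # store_true combined with default=True: the value is True whether or
    # not the flag is passed, so this flag alone cannot select ROIpooling.
    parser.add_argument('--roialign', action='store_true', default=True,
                        help='True: ROIAlign, False: ROIpooling')
    parser.add_argument('--iteration', '-i', type=int, default=800000)
    parser.add_argument('--roi_size', '-r', type=int, default=14,
                        help='ROI size for mask head input')
    return parser


if __name__ == "__main__":
    # Parse an empty command line to show the defaults.
    print(build_parser().parse_args([]))
```

Passing, for example, `-b 16 --extractor resnet101` overrides the batch size and extractor while every other option keeps its default.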

Note that we use a subdivision-based updater to enable training with a large batch size.
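
The updater itself is not shown here. A common way to realize a subdivision-based update is gradient accumulation: the batch is split into smaller sub-batches, the gradient of each is accumulated, and the parameters are updated once. The toy sketch below assumes that interpretation and uses a one-parameter least-squares model in plain Python; all function names are hypothetical and none of this is the repository's code.

```python
def grad_mse(w, xs, ys):
    """Gradient of mean((w * x - y) ** 2) with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def subdivided_grad(w, xs, ys, subdivisions):
    """Accumulate the gradient over equally sized sub-batches.

    Each sub-batch gradient is scaled by 1 / subdivisions, so the sum
    equals the gradient of the full batch while only one sub-batch has
    to be processed at a time.
    """
    n = len(xs)
    if n % subdivisions != 0:
        raise ValueError("batch size must be divisible by subdivisions")
    size = n // subdivisions
    accumulated = 0.0
    for start in range(0, n, size):
        sub_x = xs[start:start + size]
        sub_y = ys[start:start + size]
        accumulated += grad_mse(w, sub_x, sub_y) / subdivisions
    return accumulated


def sgd_step(w, xs, ys, lr, subdivisions=1):
    """One parameter update using the accumulated gradient."""
    return w - lr * subdivided_grad(w, xs, ys, subdivisions)


if __name__ == "__main__":
    xs = [float(i) for i in range(1, 9)]   # a batch of 8 samples
    ys = [2.0 * x for x in xs]             # targets for y = 2x
    w = 0.5
    print("full batch :", grad_mse(w, xs, ys))
    print("4 x 2 split:", subdivided_grad(w, xs, ys, subdivisions=4))
```

Both gradients agree up to floating-point rounding, so the update matches a full-batch step while peak memory depends only on the sub-batch size.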


Segment the objects in the input image by executing:

python --image <input image> --modelfile result/snapshot_model.npz --contour


Evaluate the trained model with the COCO metric (bounding box, segmentation):

python --lr 0 --iteration 1 --validation 1 --resume <trained_model> 


