Channel-wise Knowledge Distillation for Dense Prediction

Performance on the Cityscapes dataset

We apply the distillation method to train PSPNet. We use the dataset splits (train/val/test) provided here and train the models at a resolution of 512x512.

Checkpoints

new_rn18-cityscape_singleAndWhole_val-75.02_test-73.86.pth [Google Drive]

new_rn18-cityscape_singleAndWhole_val-75.90_test-74.58.pth [Google Drive]

Introduction

This repository is the official implementation of Channel-wise Knowledge Distillation for Dense Prediction, ICCV 2021. The channel-wise distillation is simple and effective, and we demonstrate its effectiveness on semantic segmentation and object detection.


This repository contains the PyTorch implementation for the semantic segmentation experiments on Cityscapes.
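For reference, the core idea of channel-wise distillation is to turn each channel of the score map into a probability distribution over its spatial locations (a softmax with a temperature) and to minimize the KL divergence between the teacher's and student's per-channel distributions. The snippet below is a minimal sketch of that idea, not the exact loss used by exp_cwd.sh; the function name, tensor shapes, and temperature value are assumptions.

import torch
import torch.nn.functional as F

def channel_wise_kd_loss(student_logits, teacher_logits, temperature=4.0):
    # Flatten H x W so every channel becomes a distribution over spatial locations.
    s = student_logits.flatten(2) / temperature            # (N, C, H*W)
    t = teacher_logits.detach().flatten(2) / temperature   # teacher is not updated
    # KL(teacher || student) per channel; 'batchmean' sums over channels and
    # locations and divides by N, and the T^2 factor rescales the gradients.
    return F.kl_div(F.log_softmax(s, dim=-1), F.softmax(t, dim=-1),
                    reduction="batchmean") * temperature ** 2

# Toy example with assumed shapes (19 Cityscapes classes, downsampled score maps):
student = torch.randn(2, 19, 64, 64)
teacher = torch.randn(2, 19, 64, 64)
print(channel_wise_kd_loss(student, teacher))

Unlike per-pixel distillation, the softmax here runs over the H*W locations within each channel rather than over the classes at each pixel.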

To reproduce other experiments, please refer to the following links:

Experiments on Pascal VOC and Ade20K: https://github.com/pppppM/mmsegmentation-distiller

Experiments on object detection: https://github.com/pppppM/mmdetection-distiller

Requirements

All the code is tested in the following environment:

  • Linux (tested on Ubuntu 16.04)
  • Python 3.6.2
  • PyTorch 0.4.1

Installation

  • Install PyTorch: conda install pytorch=0.4.1 cuda90 torchvision -c pytorch
  • Install other dependencies: pip install opencv-python scipy
  • Install InPlace-ABN:
cd libs
sh build.sh
python build.py

The build.sh script assumes that the nvcc compiler is available in the current system search path. The CUDA kernels are compiled for sm_50, sm_52 and sm_61 by default. To change this (e.g. if you are using a Kepler GPU), please edit the CUDA_GENCODE variable in build.sh.

Dataset & Models

Please create a new folder ckpt and move all downloaded models to it.
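As an optional sanity check, a downloaded checkpoint can be inspected with plain torch.load before training. The sketch below uses one of the checkpoint file names listed above and assumes it has been placed under ckpt/ as described.

import torch

# Load one of the checkpoints listed above (path assumed to follow the ckpt/ layout).
ckpt = torch.load("ckpt/new_rn18-cityscape_singleAndWhole_val-75.90_test-74.58.pth",
                  map_location="cpu")
# Some checkpoints wrap the weights in a 'state_dict' entry; fall back to the raw dict.
state_dict = ckpt["state_dict"] if isinstance(ckpt, dict) and "state_dict" in ckpt else ckpt
print(len(state_dict), "tensors,",
      round(sum(v.numel() for v in state_dict.values()) / 1e6, 1), "M parameters")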

Usage

1. Training with evaluation

To train a model with only channel-wise distillation on the logit maps:

bash exp_cwd.sh channel kl 3 4 False False False v1

The result fluctuates between 74.6 and 75.0 across runs; we report the average result in the paper.

To train a model with channel-wise distillation, the GAN loss, and pixel-wise distillation:

bash exp_cwd.sh channel kl 3 4 True True False v1
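For comparison with the channel-wise sketch in the introduction, conventional pixel-wise distillation normalizes over the class dimension at each spatial location instead of over the locations within each channel. The following is a minimal sketch under the same assumed tensor shapes, not the exact pixel-wise term used by exp_cwd.sh; the GAN loss follows the adversarial setup inherited from IFVD and is not sketched here.

import torch
import torch.nn.functional as F

def pixel_wise_kd_loss(student_logits, teacher_logits, temperature=1.0):
    # Softmax over the class dimension at every pixel, then KL(teacher || student).
    s = F.log_softmax(student_logits / temperature, dim=1)
    t = F.softmax(teacher_logits.detach() / temperature, dim=1)
    # Sum the KL over classes, average over batch and spatial locations.
    return F.kl_div(s, t, reduction="none").sum(dim=1).mean() * temperature ** 2

# Same assumed shapes as the channel-wise sketch above:
student = torch.randn(2, 19, 64, 64)
teacher = torch.randn(2, 19, 64, 64)
print(pixel_wise_kd_loss(student, teacher))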

Citation

Please consider citing this work if it helps your research:

@inproceedings{shu2021channel,
  title={Channel-wise Knowledge Distillation for Dense Prediction},
  author={Shu, Changyong and Liu, Yifan and Gao, Jianfei and Zheng, Yan and Shen, Chunhua},
  booktitle={ICCV},
  year={2021}
}

Acknowledgment

Thanks to [Changyong Shu] and [Jianfei Gao] for their valuable contributions. This codebase borrows heavily from IFVD.