# Elastic
This repo contains a PyTorch implementation of Elastic. It is compatible with PyTorch 1.0-stable, PyTorch 1.0-preview, and PyTorch 0.4.1. All released models are exactly the models evaluated in the paper.
## ImageNet
We prepare the ImageNet data following https://github.com/pytorch/examples/tree/master/imagenet.
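For reference, the upstream example loads that layout (one subdirectory per class under `train/` and `val/`) with torchvision's `ImageFolder`. A minimal sketch, with illustrative paths and the standard ImageNet transforms:
```
import torchvision.datasets as datasets
import torchvision.transforms as transforms

# Standard ImageNet normalization used by torchvision models.
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])

# /path/to/imagenet/train contains one subdirectory per class.
train_dataset = datasets.ImageFolder(
    '/path/to/imagenet/train',
    transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        normalize,
    ]))
print(len(train_dataset.classes), 'classes')
```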

Pretrained models can be downloaded with:
```
for a in resnext50 resnext50_elastic resnext101 resnext101_elastic dla60x dla60x_elastic dla102x se_resnext50_elastic densenet201 densenet201_elastic; do
wget http://ai2-vision.s3.amazonaws.com/elastic/imagenet_models/"$a".pth.tar
done
```
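To inspect one of the downloaded checkpoints outside the training script, something like the sketch below should work. The `'state_dict'` key is an assumption based on the common `torch.save` convention, so check the checkpoint's actual keys first:
```
import torch

# Load on the CPU so no GPU is needed just to inspect the file.
checkpoint = torch.load('resnext50_elastic.pth.tar', map_location='cpu')
print(sorted(checkpoint.keys()))

# Assuming the weights live under a 'state_dict' key (a common
# convention); fall back to the checkpoint itself otherwise.
state_dict = checkpoint.get('state_dict', checkpoint)
for name, tensor in list(state_dict.items())[:5]:
    print(name, tuple(tensor.shape))
```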
### Testing
```
python classify.py /path/to/imagenet/ --evaluate --resume /path/to/model.pth.tar
```
### Training
```
python classify.py /path/to/imagenet/
```
### Multi-processing distributed training in Docker (recommended)
We train all models in Docker containers: https://docs.nvidia.com/deeplearning/dgx/pytorch-release-notes/rel_18.07.html

You may need to follow the instructions in the link above to install [Docker](https://www.docker.com/) and [nvidia-docker](https://github.com/NVIDIA/nvidia-docker) if you haven't done so.

After pulling the Docker image, we run a Docker container:
```
nvidia-docker run -it -e NVIDIA_VISIBLE_DEVICES=0,1 --ipc=host --rm -v /path/to/code:/path/to/code -v /path/to/imagenet:/path/to/imagenet nvcr.io/nvidia/pytorch:18.07-py3
```
Then run the training script inside the Docker container:
```
python -m apex.parallel.multiproc docker_classify.py /path/to/imagenet
```
## MSCOCO
We extract the data into the following structure and use the Python COCO API (https://github.com/cocodataset/cocoapi) to load it:
```
/path/to/mscoco/annotations/instances_train2014.json
/path/to/mscoco/annotations/instances_val2014.json
/path/to/mscoco/train2014
/path/to/mscoco/val2014
```
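As a quick sanity check that the layout above is correct, the COCO API can load the annotation files directly; a minimal sketch:
```
from pycocotools.coco import COCO

# Loading the training annotations verifies the layout above.
coco = COCO('/path/to/mscoco/annotations/instances_train2014.json')
print(len(coco.getImgIds()), 'training images')

# The category names double as the multi-label targets.
cats = coco.loadCats(coco.getCatIds())
print([c['name'] for c in cats][:10])
```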
Pretrained models can be downloaded with:
```
for a in resnext50 resnext50_elastic resnext101 resnext101_elastic dla60x dla60x_elastic densenet201 densenet201_elastic; do
wget http://ai2-vision.s3.amazonaws.com/elastic/coco_models/coco_"$a".pth.tar
done
```
### Testing
```
python multilabel_classify.py /path/to/mscoco --resume /path/to/model.pth.tar --evaluate
```
### Fine-tuning or resuming training
```
python multilabel_classify.py /path/to/mscoco --resume /path/to/model.pth.tar
```
## PASCAL VOC semantic segmentation
We prepare the PASCAL VOC data following https://github.com/chenxi116/DeepLabv3.pytorch.

Pretrained models can be downloaded with:
```
for a in resnext50 resnext50_elastic resnext101 resnext101_elastic dla60x dla60x_elastic; do
wget http://ai2-vision.s3.amazonaws.com/elastic/pascal_models/deeplab_"$a"_pascal_v3_original_epoch50.pth
done
```
### Testing
Models should be placed at data/deeplab_*.pth:
```
CUDA_VISIBLE_DEVICES=0 python segment.py --exp original
```
### Fine-tuning or resuming training
All PASCAL VOC semantic segmentation models are trained on one GPU.
```
CUDA_VISIBLE_DEVICES=0 python segment.py --exp my_exp --train --resume /path/to/model.pth.tar
```
## Note
Distributed training maintains batchnorm statistics on each GPU/worker/process without synchronization, which leads to slightly different performance on different GPUs. At the end of each epoch, our distributed script reports the average performance (top-1, top-5) by evaluating the whole validation set across all GPUs, and saves only the model on the first GPU (models on the other GPUs are discarded). As a result, evaluating the saved model after training yields slightly different numbers (within 0.1%, either better or worse). In the paper, we report the average performance for all models. Averaging batchnorm statistics before evaluation may lead to marginally better numbers.
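For example, averaging the `running_mean` and `running_var` buffers across per-worker checkpoints could look like the sketch below. This is a hypothetical illustration: the file names and the `'state_dict'` key are assumptions, and the script as described only saves the first GPU's model, so such per-worker files are not produced by default.
```
import torch

# Hypothetical: assumes one checkpoint was saved per worker/GPU,
# each holding a 'state_dict'.
paths = ['worker0.pth.tar', 'worker1.pth.tar']
state_dicts = [torch.load(p, map_location='cpu')['state_dict']
               for p in paths]

averaged = state_dicts[0]
for key in averaged:
    # Weights are synchronized by gradient all-reduce, so only the
    # batchnorm running statistics differ across workers; averaging
    # them approximates statistics over the full data stream.
    if 'running_mean' in key or 'running_var' in key:
        averaged[key] = sum(sd[key] for sd in state_dicts) / len(state_dicts)

torch.save({'state_dict': averaged}, 'averaged.pth.tar')
```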

## Credits
ImageNet training script is modified from https://github.com/pytorch/pytorch

ImageNet distributed training script is modified from https://github.com/NVIDIA/apex

Pascal segmentation code is modified from https://github.com/chenxi116/DeepLabv3.pytorch

ResNeXt model is modified from https://github.com/last-one/tools

DLA models are modified from https://github.com/ucbdrive/dla

DenseNet model is modified from https://github.com/csrhddlam/pytorch-checkpoint
