The goal of this repo is:
- to help to reproduce research papers results (transfer learning setups for instance),
- to access pretrained ConvNets with a unique interface/API inspired by torchvision.
Updates specific to this fork: This repo is my own personal fork of this popular model zoo for PyTorch. Since my work focuses on action recognition in videos, I plan to accumulate standard model architectures trained on the popular video datasets such as Moments in Time, Kinetics, Something-Something, etc., as well models specifically designed for action recognition. For example, you can load 3DResNet50 pretrained on Moments in Time with the following:
model_name = 'resnet3d50'
model = pretorched.__dict__[model_name](num_classes=339, pretrained='moments')
model.eval()
Not every architecture will be trained on every dataset, but I will do the best I can to include all that I have accumulated. I will try to maintain the same API where appropriate, but may decided to make modifications to specifically handle multi-frame nature of videos.
News:
- 04/06/2018: PolyNet and PNASNet-5-Large thanks to Alex Parinov
- 16/04/2018: SE-ResNet* and SE-ResNeXt* thanks to Alex Parinov
- 09/04/2018: SENet154 thanks to Alex Parinov
- 22/03/2018: CaffeResNet101 (good for localization with FasterRCNN)
- 21/03/2018: NASNet Mobile thanks to Veronika Yurchuk and Anastasiia
- 25/01/2018: DualPathNetworks thanks to Ross Wightman, Xception thanks to T Standley, improved TransformImage API
- 30/11/2017: improve API (
model.features(input)
,model.logits(features)
,model.forward(input)
,model.last_linear
) - 16/11/2017: nasnet-a-large pretrained model ported by T. Durand and R. Cadene
- 22/07/2017: torchvision pretrained models
- 22/07/2017: momentum in inceptionv4 and inceptionresnetv2 to 0.1
- 17/07/2017: model.input_range attribute
- 17/07/2017: BNInception pretrained on Imagenet
- Installation
- Quick examples
- Few use cases
- Evaluation on ImageNet
- Documentation
- Available models
- AlexNet
- BNInception
- CaffeResNet101
- DenseNet121
- DenseNet161
- DenseNet169
- DenseNet201
- DenseNet201
- DualPathNet68
- DualPathNet92
- DualPathNet98
- DualPathNet107
- DualPathNet113
- FBResNet152
- InceptionResNetV2
- InceptionV3
- InceptionV4
- NASNet-A-Large
- NASNet-A-Mobile
- PNASNet-5-Large
- PolyNet
- ResNeXt101_32x4d
- ResNeXt101_64x4d
- ResNet101
- ResNet152
- ResNet18
- ResNet34
- ResNet50
- SENet154
- SE-ResNet50
- SE-ResNet101
- SE-ResNet152
- SE-ResNeXt50_32x4d
- SE-ResNeXt101_32x4d
- SqueezeNet1_0
- SqueezeNet1_1
- VGG11
- VGG13
- VGG16
- VGG19
- VGG11_BN
- VGG13_BN
- VGG16_BN
- VGG19_BN
- Xception
- Model API
- Available models
- Reproducing porting
git clone https://github.com/alexandonian/pretorched-x.git
cd pretorched-x
python setup.py install
- To import
pretorched
:
import pretorched
- To print the available pretorched-x models:
print(pretorched.model_names)
> ['fbresnet152', 'bninception', 'resnext101_32x4d', 'resnext101_64x4d', 'inceptionv4', 'inceptionresnetv2', 'alexnet', 'densenet121', 'densenet169', 'densenet201', 'densenet161', 'resnet18', 'resnet34', 'resnet50', 'resnet101', 'resnet152', 'inceptionv3', 'squeezenet1_0', 'squeezenet1_1', 'vgg11', 'vgg11_bn', 'vgg13', 'vgg13_bn', 'vgg16', 'vgg16_bn', 'vgg19_bn', 'vgg19', 'nasnetalarge', 'nasnetamobile', 'cafferesnet101', 'senet154', 'se_resnet50', 'se_resnet101', 'se_resnet152', 'se_resnext50_32x4d', 'se_resnext101_32x4d', 'cafferesnet101', 'polynet', 'pnasnet5large']
- To print the available pretorched-x for a chosen model:
print(pretorched.pretrained_settings['nasnetalarge'])
> {'imagenet': {'url': 'http://pretorched-x.csail.mit.edu/models/nasnetalarge-a1897284.pth', 'input_space': 'RGB', 'input_size': [3, 331, 331], 'input_range': [0, 1], 'mean': [0.5, 0.5, 0.5], 'std': [0.5, 0.5, 0.5], 'num_classes': 1000}, 'imagenet+background': {'url': 'http://pretorched-x.csail.mit.edu/models/nasnetalarge-a1897284.pth', 'input_space': 'RGB', 'input_size': [3, 331, 331], 'input_range': [0, 1], 'mean': [0.5, 0.5, 0.5], 'std': [0.5, 0.5, 0.5], 'num_classes': 1001}}
- To load a pretrained models from imagenet:
model_name = 'nasnetalarge' # could be fbresnet152 or inceptionresnetv2
model = pretorched.__dict__[model_name](num_classes=1000, pretrained='imagenet')
model.eval()
Note: By default, models will be downloaded to your $HOME/.torch
folder. You can modify this behavior using the $TORCH_MODEL_ZOO
variable as follow: export TORCH_MODEL_ZOO="/local/pretorched
- To load an image and do a complete forward pass:
import torch
import pretorched.utils as utils
load_img = utils.LoadImage()
# transformations depending on the model
# rescale, center crop, normalize, and others (ex: ToBGR, ToRange255)
tf_img = utils.TransformImage(model)
path_img = 'data/cat.jpg'
input_img = load_img(path_img)
input_tensor = tf_img(input_img) # 3x400x225 -> 3x299x299 size may differ
input_tensor = input_tensor.unsqueeze(0) # 3x299x299 -> 1x3x299x299
input = torch.autograd.Variable(input_tensor,
requires_grad=False)
output_logits = model(input) # 1x1000
- To extract features (beware this API is not available for all networks):
output_features = model.features(input) # 1x14x14x2048 size may differ
output_logits = model.logits(output_features) # 1x1000
- See examples/imagenet_logits.py to compute logits of classes appearance over a single image with a pretrained model on imagenet.
$ python examples/imagenet_logits.py -h
> nasnetalarge, resnet152, inceptionresnetv2, inceptionv4, ...
$ python examples/imagenet_logits.py -a nasnetalarge --path_img data/cat.png
> 'nasnetalarge': data/cat.png' is a 'tiger cat'
- See examples/imagenet_eval.py to evaluate pretrained models on imagenet valset.
$ python examples/imagenet_eval.py /local/common-data/imagenet_2012/images -a nasnetalarge -b 20 -e
> * Acc@1 92.693, Acc@5 96.13
Results were obtained using (center cropped) images of the same size than during the training process.
Model | Version | Acc@1 | Acc@5 |
---|---|---|---|
PNASNet-5-Large | Tensorflow | 82.858 | 96.182 |
PNASNet-5-Large | Our porting | 82.736 | 95.992 |
NASNet-A-Large | Tensorflow | 82.693 | 96.163 |
NASNet-A-Large | Our porting | 82.566 | 96.086 |
SENet154 | Caffe | 81.32 | 95.53 |
SENet154 | Our porting | 81.304 | 95.498 |
PolyNet | Caffe | 81.29 | 95.75 |
PolyNet | Our porting | 81.002 | 95.624 |
InceptionResNetV2 | Tensorflow | 80.4 | 95.3 |
InceptionV4 | Tensorflow | 80.2 | 95.3 |
SE-ResNeXt101_32x4d | Our porting | 80.236 | 95.028 |
SE-ResNeXt101_32x4d | Caffe | 80.19 | 95.04 |
InceptionResNetV2 | Our porting | 80.170 | 95.234 |
InceptionV4 | Our porting | 80.062 | 94.926 |
DualPathNet107_5k | Our porting | 79.746 | 94.684 |
ResNeXt101_64x4d | Torch7 | 79.6 | 94.7 |
DualPathNet131 | Our porting | 79.432 | 94.574 |
DualPathNet92_5k | Our porting | 79.400 | 94.620 |
DualPathNet98 | Our porting | 79.224 | 94.488 |
SE-ResNeXt50_32x4d | Our porting | 79.076 | 94.434 |
SE-ResNeXt50_32x4d | Caffe | 79.03 | 94.46 |
Xception | Keras | 79.000 | 94.500 |
ResNeXt101_64x4d | Our porting | 78.956 | 94.252 |
Xception | Our porting | 78.888 | 94.292 |
ResNeXt101_32x4d | Torch7 | 78.8 | 94.4 |
SE-ResNet152 | Caffe | 78.66 | 94.46 |
SE-ResNet152 | Our porting | 78.658 | 94.374 |
ResNet152 | Pytorch | 78.428 | 94.110 |
SE-ResNet101 | Our porting | 78.396 | 94.258 |
SE-ResNet101 | Caffe | 78.25 | 94.28 |
ResNeXt101_32x4d | Our porting | 78.188 | 93.886 |
FBResNet152 | Torch7 | 77.84 | 93.84 |
SE-ResNet50 | Caffe | 77.63 | 93.64 |
SE-ResNet50 | Our porting | 77.636 | 93.752 |
DenseNet161 | Pytorch | 77.560 | 93.798 |
ResNet101 | Pytorch | 77.438 | 93.672 |
FBResNet152 | Our porting | 77.386 | 93.594 |
InceptionV3 | Pytorch | 77.294 | 93.454 |
DenseNet201 | Pytorch | 77.152 | 93.548 |
DualPathNet68b_5k | Our porting | 77.034 | 93.590 |
CaffeResnet101 | Caffe | 76.400 | 92.900 |
CaffeResnet101 | Our porting | 76.200 | 92.766 |
DenseNet169 | Pytorch | 76.026 | 92.992 |
ResNet50 | Pytorch | 76.002 | 92.980 |
DualPathNet68 | Our porting | 75.868 | 92.774 |
DenseNet121 | Pytorch | 74.646 | 92.136 |
VGG19_BN | Pytorch | 74.266 | 92.066 |
NASNet-A-Mobile | Tensorflow | 74.0 | 91.6 |
NASNet-A-Mobile | Our porting | 74.080 | 91.740 |
ResNet34 | Pytorch | 73.554 | 91.456 |
BNInception | Our porting | 73.522 | 91.560 |
VGG16_BN | Pytorch | 73.518 | 91.608 |
VGG19 | Pytorch | 72.080 | 90.822 |
VGG16 | Pytorch | 71.636 | 90.354 |
VGG13_BN | Pytorch | 71.508 | 90.494 |
VGG11_BN | Pytorch | 70.452 | 89.818 |
ResNet18 | Pytorch | 70.142 | 89.274 |
VGG13 | Pytorch | 69.662 | 89.264 |
VGG11 | Pytorch | 68.970 | 88.746 |
SqueezeNet1_1 | Pytorch | 58.250 | 80.800 |
SqueezeNet1_0 | Pytorch | 58.108 | 80.428 |
Alexnet | Pytorch | 56.432 | 79.194 |
Notes:
- the Pytorch version of ResNet152 is not a porting of the Torch7 but has been retrained by facebook.
- For the PolyNet evaluation each image was resized to 378x378 without preserving the aspect ratio and then the central 331×331 patch from the resulting image was used.
Beware, the accuracy reported here is not always representative of the transferable capacity of the network on other tasks and datasets. You must try them all! :P
Please see Compute imagenet validation metrics
Source: TensorFlow Slim repo
- `nasnetalarge(num_classes=1000,
nasnetalarge(num_classes=1001, pretrained='imagenet+background')
nasnetamobile(num_classes=1000, pretrained='imagenet')
Source: Torch7 repo of FaceBook
There are a bit different from the ResNet* of torchvision. ResNet152 is currently the only one available.
fbresnet152(num_classes=1000, pretrained='imagenet')
Source: Caffe repo of KaimingHe
cafferesnet101(num_classes=1000, pretrained='imagenet')
Source: TensorFlow Slim repo and Pytorch/Vision repo for inceptionv3
inceptionresnetv2(num_classes=1000, pretrained='imagenet')
inceptionresnetv2(num_classes=1001, pretrained='imagenet+background')
inceptionv4(num_classes=1000, pretrained='imagenet')
inceptionv4(num_classes=1001, pretrained='imagenet+background')
inceptionv3(num_classes=1000, pretrained='imagenet')
Source: Trained with Caffe by Xiong Yuanjun
bninception(num_classes=1000, pretrained='imagenet')
Source: ResNeXt repo of FaceBook
resnext101_32x4d(num_classes=1000, pretrained='imagenet')
resnext101_62x4d(num_classes=1000, pretrained='imagenet')
Source: MXNET repo of Chen Yunpeng
The porting has been made possible by Ross Wightman in his PyTorch repo.
As you can see here DualPathNetworks allows you to try different scales. The default one in this repo is 0.875 meaning that the original input size is 256 before croping to 224.
dpn68(num_classes=1000, pretrained='imagenet')
dpn98(num_classes=1000, pretrained='imagenet')
dpn131(num_classes=1000, pretrained='imagenet')
dpn68b(num_classes=1000, pretrained='imagenet+5k')
dpn92(num_classes=1000, pretrained='imagenet+5k')
dpn107(num_classes=1000, pretrained='imagenet+5k')
'imagenet+5k'
means that the network has been pretrained on imagenet5k before being finetuned on imagenet1k.
Source: Keras repo
The porting has been made possible by T Standley.
xception(num_classes=1000, pretrained='imagenet')
Source: Caffe repo of Jie Hu
senet154(num_classes=1000, pretrained='imagenet')
se_resnet50(num_classes=1000, pretrained='imagenet')
se_resnet101(num_classes=1000, pretrained='imagenet')
se_resnet152(num_classes=1000, pretrained='imagenet')
se_resnext50_32x4d(num_classes=1000, pretrained='imagenet')
se_resnext101_32x4d(num_classes=1000, pretrained='imagenet')
Source: TensorFlow Slim repo
pnasnet5large(num_classes=1000, pretrained='imagenet')
pnasnet5large(num_classes=1001, pretrained='imagenet+background')
Source: Caffe repo of the CUHK Multimedia Lab
polynet(num_classes=1000, pretrained='imagenet')
Source: Pytorch/Vision repo
(inceptionv3
included in Inception*)
resnet18(num_classes=1000, pretrained='imagenet')
resnet34(num_classes=1000, pretrained='imagenet')
resnet50(num_classes=1000, pretrained='imagenet')
resnet101(num_classes=1000, pretrained='imagenet')
resnet152(num_classes=1000, pretrained='imagenet')
densenet121(num_classes=1000, pretrained='imagenet')
densenet161(num_classes=1000, pretrained='imagenet')
densenet169(num_classes=1000, pretrained='imagenet')
densenet201(num_classes=1000, pretrained='imagenet')
squeezenet1_0(num_classes=1000, pretrained='imagenet')
squeezenet1_1(num_classes=1000, pretrained='imagenet')
alexnet(num_classes=1000, pretrained='imagenet')
vgg11(num_classes=1000, pretrained='imagenet')
vgg13(num_classes=1000, pretrained='imagenet')
vgg16(num_classes=1000, pretrained='imagenet')
vgg19(num_classes=1000, pretrained='imagenet')
vgg11_bn(num_classes=1000, pretrained='imagenet')
vgg13_bn(num_classes=1000, pretrained='imagenet')
vgg16_bn(num_classes=1000, pretrained='imagenet')
vgg19_bn(num_classes=1000, pretrained='imagenet')
Once a pretrained model has been loaded, you can use it that way.
Important note: All image must be loaded using PIL
which scales the pixel values between 0 and 1.
Attribut of type list
composed of 3 numbers:
- number of color channels,
- height of the input image,
- width of the input image.
Example:
[3, 299, 299]
for inception* networks,[3, 224, 224]
for resnet* networks.
Attribut of type str
representating the color space of the image. Can be RGB
or BGR
.
Attribut of type list
composed of 2 numbers:
- min pixel value,
- max pixel value.
Example:
[0, 1]
for resnet* and inception* networks,[0, 255]
for bninception network.
Attribut of type list
composed of 3 numbers which are used to normalize the input image (substract "color-channel-wise").
Example:
[0.5, 0.5, 0.5]
for inception* networks,[0.485, 0.456, 0.406]
for resnet* networks.
Attribut of type list
composed of 3 numbers which are used to normalize the input image (divide "color-channel-wise").
Example:
[0.5, 0.5, 0.5]
for inception* networks,[0.229, 0.224, 0.225]
for resnet* networks.
/!\ work in progress (may not be available)
Method which is used to extract the features from the image.
Example when the model is loaded using fbresnet152
:
print(input_224.size()) # (1,3,224,224)
output = model.features(input_224)
print(output.size()) # (1,2048,1,1)
# print(input_448.size()) # (1,3,448,448)
output = model.features(input_448)
# print(output.size()) # (1,2048,7,7)
/!\ work in progress (may not be available)
Method which is used to classify the features from the image.
Example when the model is loaded using fbresnet152
:
output = model.features(input_224)
print(output.size()) # (1,2048, 1, 1)
output = model.logits(output)
print(output.size()) # (1,1000)
Method used to call model.features
and model.logits
. It can be overwritten as desired.
Note: A good practice is to use model.__call__
as your function of choice to forward an input to your model. See the example bellow.
# Without model.__call__
output = model.forward(input_224)
print(output.size()) # (1,1000)
# With model.__call__
output = model(input_224)
print(output.size()) # (1,1000)
Attribut of type nn.Linear
. This module is the last one to be called during the forward pass.
- Can be replaced by an adapted
nn.Linear
for fine tuning. - Can be replaced by
pretrained.utils.Identity
for features extraction.
Example when the model is loaded using fbresnet152
:
print(input_224.size()) # (1,3,224,224)
output = model.features(input_224)
print(output.size()) # (1,2048,1,1)
output = model.logits(output)
print(output.size()) # (1,1000)
# fine tuning
dim_feats = model.last_linear.in_features # =2048
nb_classes = 4
model.last_linear = nn.Linear(dim_feats, nb_classes)
output = model(input_224)
print(output.size()) # (1,4)
# features extraction
model.last_linear = pretrained.utils.Identity()
output = model(input_224)
print(output.size()) # (1,2048)
th pretrainedmodels/fbresnet/resnet152_dump.lua
python pretrainedmodels/fbresnet/resnet152_load.py
https://github.com/clcarwin/convert_torch_to_pytorch
https://github.com/alexandonian/tensorflow-model-zoo.torch
Thanks to the deep learning community and especially to the contributers of the pytorch ecosystem.