## Intel / Nervana Systems Distiller Models

https://nervanasystems.github.io/distiller/model_zoo.html#distiller-model-zoo

Code adapted from:

https://github.com/NervanaSystems/distiller/blob/master/jupyter/alexnet_insights.ipynb


### Learning Structured Sparsity in Deep Neural Networks  (SSL)

  ResNet20 models
   
###   

In [25]:
# Suppress the powerlaw package warnings
# "powerlaw.py:700: RuntimeWarning: divide by zero encountered in true_divide"
# "powerlaw.py:700: RuntimeWarning: invalid value encountered in true_divide"
import warnings
warnings.simplefilter(action='ignore', category=RuntimeWarning)

In [104]:
import weightwatcher as ww
import logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)

# Load some common jupyter code
%run './distiller_jupyter_helpers.ipynb'
from distiller.models import create_model
from distiller.apputils import *
import qgrid

from ipywidgets import *
from bqplot import *
import bqplot.pyplot as bqplt
from functools import partial

from sklearn.decomposition import TruncatedSVD

import matplotlib
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline  

In [45]:
DISTILLER_DIR = "/Users/charleshmartin/work/distiller/"
SSL_CHECKPOINTS_DIR = DISTILLER_DIR+"examples/ssl/checkpoints/"

In [28]:
ls $CHECKPOINTS_DIR

checkpoint_trained_4D_regularized_5Lremoved.pth.tar
checkpoint_trained_4D_regularized_5Lremoved_finetuned.pth.tar
checkpoint_trained_ch_regularized_dense.pth.tar
checkpoint_trained_channel_regularized_resnet20.pth.tar
checkpoint_trained_channel_regularized_resnet20_finetuned.pth.tar
checkpoint_trained_dense.pth.tar


Load the checkpoint captured after one pruning event, and fine-tuning for one epoch:

## Resnet20

In [30]:
cpfiles = {
    'checkpoint_trained_4D_regularized_5Lremoved.pth.tar': 90.620,
    'checkpoint_trained_4D_regularized_5Lremoved_finetuned.pth.tar': 94.240,
    'checkpoint_trained_ch_regularized_dense.pth.tar': 91.700,
    'checkpoint_trained_channel_regularized_resnet20.pth.tar': 91.420,
    'checkpoint_trained_channel_regularized_resnet20_finetuned.pth.tar': 91.420,
    'checkpoint_trained_dense.pth.tar': 92.540,
}

'checkpoint_trained_4D_regularized_5Lremoved.pth.tar'

In [39]:
resnet20_model = create_model(False, 'cifar10', 'resnet20_cifar', parallel=True)
checkpoint_file = SSL_CHECKPOINTS_DIR+list(cpfiles.keys())[0]
print(checkpoint_file)

try:
    load_checkpoint(resnet20_model, checkpoint_file);
except Exception as e:
    print("Did you forget to download the checkpoint file?")
    raise e

INFO:root:==> using cifar10 dataset
INFO:root:=> creating resnet20_cifar model for CIFAR10
INFO:root:=> loading checkpoint /Users/charleshmartin/work/distiller/examples/ssl/checkpoints/checkpoint_trained_4D_regularized_5Lremoved.pth.tar
INFO:root:   best top@1: 90.620
INFO:root:Loaded compression schedule from checkpoint (epoch 179)
INFO:root:=> loaded checkpoint '/Users/charleshmartin/work/distiller/examples/ssl/checkpoints/checkpoint_trained_4D_regularized_5Lremoved.pth.tar' (epoch 179)


/Users/charleshmartin/work/distiller/examples/ssl/checkpoints/checkpoint_trained_4D_regularized_5Lremoved.pth.tar


In [41]:
ww = ww.WeightWatcher()

2019-04-12 18:26:49,682 INFO 
WeightWatcher v0.1.2 by Calculation Consulting
Analyze weight matrices of Deep Neural Networks
https://calculationconsulting.com/
python      version 3.6.5 |Anaconda, Inc.| (default, Apr 26 2018, 08:42:37) 
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]
numpy       version 1.15.3
tensforflow version 1.10.1
keras       version 2.2.2
INFO:weightwatcher.weightwatcher:
WeightWatcher v0.1.2 by Calculation Consulting
Analyze weight matrices of Deep Neural Networks
https://calculationconsulting.com/
python      version 3.6.5 |Anaconda, Inc.| (default, Apr 26 2018, 08:42:37) 
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]
numpy       version 1.15.3
tensforflow version 1.10.1
keras       version 2.2.2


### Show that 5L removed layers are not analyzed

see: https://nervanasystems.github.io/distiller/model_zoo.html#distiller-model-zoo

<pre>
Removing layer: module.layer1.0.conv1 [layer=0 block=0 conv=0]
Removing layer: module.layer1.0.conv2 [layer=0 block=0 conv=1]
Removing layer: module.layer1.1.conv1 [layer=0 block=1 conv=0]
Removing layer: module.layer1.1.conv2 [layer=0 block=1 conv=1]
Removing layer: module.layer2.2.conv2 [layer=1 block=2 conv=1]

</pre>

I think most of these are so small that the WW does not consider them anyway

In [43]:
ww.analyze(resnet20_model)

2019-04-12 18:27:57,533 INFO Analyzing model
INFO:weightwatcher.weightwatcher:Analyzing model
2019-04-12 18:27:57,546 INFO ### Printing results ###
INFO:weightwatcher.weightwatcher:### Printing results ###
2019-04-12 18:27:57,548 INFO LogNorm: min: -0.13694187998771667, max: 0.8701463937759399, avg: 0.5175516605377197
INFO:weightwatcher.weightwatcher:LogNorm: min: -0.13694187998771667, max: 0.8701463937759399, avg: 0.5175516605377197
2019-04-12 18:27:57,550 INFO LogNorm compound: min: -0.08011464857392842, max: 0.8399592306878831, avg: 0.5175515964627266
INFO:weightwatcher.weightwatcher:LogNorm compound: min: -0.08011464857392842, max: 0.8399592306878831, avg: 0.5175515964627266


{0: {'id': 0, 'type': ResNetCifar(
    (conv1): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (bn1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu): ReLU(inplace)
    (layer1): Sequential(
      (0): BasicBlock(
        (conv1): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU()
        (conv2): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU()
      )
      (1): BasicBlock(
        (conv1): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU()
        (conv2): Conv2d(16, 16, kern

##  Iterative Pruning


### Compare comptessed models to PyTorch AlexNet and Resnet18


### AlexNet Iterative Pruning

alexnet.checkpoint.89.pth.tar

https://s3-us-west-1.amazonaws.com/nndistiller/sensitivity-pruning/alexnet.checkpoint.89.pth.tar


Our reference is TorchVision's pretrained Alexnet model which has a Top1 accuracy of 56.55 and Top5=79.09. 

We prune away 88.44% of the parameters and achieve Top1=56.61 and Top5=79.45. 
- 

In [46]:
!wget https://s3-us-west-1.amazonaws.com/nndistiller/sensitivity-pruning/alexnet.checkpoint.89.pth.tar

--2019-04-12 19:19:21--  https://s3-us-west-1.amazonaws.com/nndistiller/sensitivity-pruning/alexnet.checkpoint.89.pth.tar
Resolving s3-us-west-1.amazonaws.com (s3-us-west-1.amazonaws.com)... 54.231.236.41
Connecting to s3-us-west-1.amazonaws.com (s3-us-west-1.amazonaws.com)|54.231.236.41|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 733174951 (699M) [application/x-tar]
Saving to: ‘alexnet.checkpoint.89.pth.tar’


2019-04-12 19:20:27 (10.6 MB/s) - ‘alexnet.checkpoint.89.pth.tar’ saved [733174951/733174951]



In [88]:
alexnet89_model = create_model(False, 'imagenet', 'alexnet', parallel=True)
checkpoint_file = 'alexnet.checkpoint.89.pth.tar'
try:
    load_checkpoint(alexnet89_model, checkpoint_file)
    watcher = ww.WeightWatcher(model=resnet20_model, logger=logger)
    watcher.analyze(compute_alphas=True)
    summary = watcher.get_summary()
except NameError as e:
    print("Did you forget to download the checkpoint file?")
    raise e
    
summary

INFO:root:==> using imagenet dataset
INFO:root:=> using alexnet model for ImageNet
INFO:root:=> loading checkpoint alexnet.checkpoint.89.pth.tar
INFO:root:   best top@1: 52.043
INFO:root:Loaded compression schedule from checkpoint (epoch 89)
INFO:root:=> loaded checkpoint 'alexnet.checkpoint.89.pth.tar' (epoch 89)
INFO:app_cfg:
WeightWatcher v0.1.2 by Calculation Consulting
Analyze weight matrices of Deep Neural Networks
https://calculationconsulting.com/
python      version 3.6.5 |Anaconda, Inc.| (default, Apr 26 2018, 08:42:37) 
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]
numpy       version 1.15.3
tensforflow version 1.10.1
keras       version 2.2.2
INFO:app_cfg:Analyzing model
INFO:app_cfg:### Printing results ###
INFO:app_cfg:LogNorm: min: -0.13694187998771667, max: 0.8701463937759399, avg: 0.5175516605377197
INFO:app_cfg:LogNorm compound: min: -0.08011464857392842, max: 0.8399592306878831, avg: 0.5175515964627266
INFO:app_cfg:Alpha: min: 1.490533392059119, max: 10

{'lognorm': 0.51755166,
 'lognorm_compound': 0.5175515964627266,
 'alpha': 3.196053477405699,
 'alpha_compound': 3.196053477405699,
 'alpha_weighted': 0.3071941264704492,
 'alpha_weighted_compound': 0.30719412647044925}

In [145]:
import torchvision.models as models
alexnet_baseline_model = models.alexnet(pretrained=True)
watcher = ww.WeightWatcher(model=alexnet_baseline_model)
results = watcher.analyze(compute_alphas=True)
watcher.get_summary()

2019-04-12 22:27:05,874 INFO 
WeightWatcher v0.1.2 by Calculation Consulting
Analyze weight matrices of Deep Neural Networks
https://calculationconsulting.com/
python      version 3.6.5 |Anaconda, Inc.| (default, Apr 26 2018, 08:42:37) 
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]
numpy       version 1.15.3
tensforflow version 1.10.1
keras       version 2.2.2
INFO:weightwatcher.weightwatcher:
WeightWatcher v0.1.2 by Calculation Consulting
Analyze weight matrices of Deep Neural Networks
https://calculationconsulting.com/
python      version 3.6.5 |Anaconda, Inc.| (default, Apr 26 2018, 08:42:37) 
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]
numpy       version 1.15.3
tensforflow version 1.10.1
keras       version 2.2.2
2019-04-12 22:27:05,884 INFO Analyzing model
INFO:weightwatcher.weightwatcher:Analyzing model
2019-04-12 22:29:21,593 INFO ### Printing results ###
INFO:weightwatcher.weightwatcher:### Printing results ###
2019-04-12 22:29:21,594 INFO LogNorm

{'lognorm': 0.84261876,
 'lognorm_compound': 1.2041019360602847,
 'alpha': 2.8006915646495956,
 'alpha_compound': 2.79836381328899,
 'alpha_weighted': 1.7093696729133068,
 'alpha_weighted_compound': 2.892666070125807}

In [95]:
layers = alexnet89_model.modules()
for i, l in enumerate(layers):
    if (type(l)==torch.nn.modules.linear.Linear):
        print(i,l)

17 Linear(in_features=9216, out_features=4096, bias=True)
20 Linear(in_features=4096, out_features=4096, bias=True)
22 Linear(in_features=4096, out_features=1000, bias=True)


In [151]:
def randomize_mat(W):
    Wshape = W.shape
    Wrand = W.flatten()
    np.random.shuffle(Wrand)
    Wrand = Wrand.reshape(Wshape)
    return Wrand

In [154]:
layers = alexnet89_model.modules()
for i, l in enumerate(layers):
    if (type(l)==torch.nn.modules.linear.Linear):
        print(i,l)
        W = [np.array(l.weight.data.clone().cpu())][0]
        
        svd = TruncatedSVD(n_components=999)
        svd.fit(W)
        sv = svd.singular_values_
        max_sv =  np.max(sv)
        print("actual", max_sv)

        svd = TruncatedSVD(n_components=999)
        svd.fit(randomize_mat(W))
        sv = svd.singular_values_
        max_rand_sv =  np.max(sv)
        print("random ", max_rand_sv, max_rand_sv/max_sv )


17 Linear(in_features=9216, out_features=4096, bias=True)
actual 3.2532797
random  1.404957 0.4318587
20 Linear(in_features=4096, out_features=4096, bias=True)
actual 4.510313
random  2.8153994 0.62421376
22 Linear(in_features=4096, out_features=1000, bias=True)
actual 4.740425
random  1.6384666 0.34563705


In [155]:
layers = alexnet_baseline_model.modules()
for i, l in enumerate(layers):
    if (type(l)==torch.nn.modules.linear.Linear):
        print(i,l)
        W = [np.array(l.weight.data.clone().cpu())][0]
        
        svd = TruncatedSVD(n_components=999)
        svd.fit(W)
        sv = svd.singular_values_
        max_sv =  np.max(sv)
        print("actual", max_sv)

        svd = TruncatedSVD(n_components=999)
        svd.fit(randomize_mat(W))
        sv = svd.singular_values_
        max_rand_sv =  np.max(sv)
        print("random ", max_rand_sv, max_rand_sv/max_sv )


17 Linear(in_features=9216, out_features=4096, bias=True)
actual 5.9382377
random  3.6277204 0.61090857
20 Linear(in_features=4096, out_features=4096, bias=True)
actual 9.300288
random  6.8204 0.7333536
22 Linear(in_features=4096, out_features=1000, bias=True)
actual 6.6769724
random  1.7816509 0.26683515


# Models not working yet

### MobileNet   Iterative Pruning

As our baseline we used a pretrained PyTorch MobileNet model (width=1) which has Top1=68.848 and Top5=88.740.
In their paper, 

Zhu and Gupta prune 50% of the elements of MobileNet (width=1) with a 1.1% drop in accuracy. We pruned about 51.6% of the elements, with virtually no change in the accuracies (Top1: 68.808 and Top5: 88.656).

In [68]:
!wget https://s3-us-west-1.amazonaws.com/nndistiller/agp-pruning/mobilenet/checkpoint.pth.tar

--2019-04-12 21:18:48--  https://s3-us-west-1.amazonaws.com/nndistiller/agp-pruning/mobilenet/checkpoint.pth.tar
Resolving s3-us-west-1.amazonaws.com (s3-us-west-1.amazonaws.com)... 52.219.120.32
Connecting to s3-us-west-1.amazonaws.com (s3-us-west-1.amazonaws.com)|52.219.120.32|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 50527563 (48M) [application/x-tar]
Saving to: ‘checkpoint.pth.tar’


2019-04-12 21:18:54 (9.67 MB/s) - ‘checkpoint.pth.tar’ saved [50527563/50527563]



### ResNet18  Iterative Pruning

Results not reportedon Blog

Can not run ?  IDK why

In [72]:
!wget https://s3-us-west-1.amazonaws.com/nndistiller/agp-pruning/resnet18/checkpoint.pth.tar
!mv checkpoint.pth.tar resnet18.checkpoint.pth.tar

--2019-04-12 21:24:17--  https://s3-us-west-1.amazonaws.com/nndistiller/agp-pruning/resnet18/checkpoint.pth.tar
Resolving s3-us-west-1.amazonaws.com (s3-us-west-1.amazonaws.com)... 52.219.28.5
Connecting to s3-us-west-1.amazonaws.com (s3-us-west-1.amazonaws.com)|52.219.28.5|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 138082569 (132M) [application/x-tar]
Saving to: ‘checkpoint.pth.tar.2’


2019-04-12 21:24:36 (7.29 MB/s) - ‘checkpoint.pth.tar.2’ saved [138082569/138082569]



In [None]:
import torchvision.models as models
model = models.resnet18(pretrained=True)
watcher = ww.WeightWatcher(model=model)
results = watcher.analyze(compute_alphas=True)
watcher.get_summary()

### ResNet56 .  Network thinning  CIFAR10

We started by training the baseline ResNet56-Cifar dense network (180 epochs) since we didn't have a pre-trained model.

We trained a ResNet56-Cifar10 network and achieve accuracy results which are on-par with published results: Top1: 92.970 and Top5: 99.740.

We used Hao et al.'s algorithm to remove 37.3% of the original convolution MACs, while maintaining virtually the same accuracy as the baseline: Top1: 92.830 and Top5: 99.760

In [84]:
!wget https://s3-us-west-1.amazonaws.com/nndistiller/pruning_filters_for_efficient_convnets/checkpoint.resnet56_cifar_baseline.pth.tar


--2019-04-12 21:42:51--  https://s3-us-west-1.amazonaws.com/nndistiller/pruning_filters_for_efficient_convnets/checkpoint.resnet56_cifar_baseline.pth.tar
Resolving s3-us-west-1.amazonaws.com (s3-us-west-1.amazonaws.com)... 52.219.116.88
Connecting to s3-us-west-1.amazonaws.com (s3-us-west-1.amazonaws.com)|52.219.116.88|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 6937961 (6.6M) [application/x-tar]
Saving to: ‘checkpoint.resnet56_cifar_baseline.pth.tar’


2019-04-12 21:42:53 (3.11 MB/s) - ‘checkpoint.resnet56_cifar_baseline.pth.tar’ saved [6937961/6937961]



In [85]:
!wget https://s3-us-west-1.amazonaws.com/nndistiller/pruning_filters_for_efficient_convnets/checkpoint_finetuned.pth.tar
!mv checkpoint_finetuned.pth.tar checkpoint.resnet56_cifar_finetuned.pth.tar



--2019-04-12 21:42:53--  https://s3-us-west-1.amazonaws.com/nndistiller/pruning_filters_for_efficient_convnets/checkpoint_finetuned.pth.tar
Resolving s3-us-west-1.amazonaws.com (s3-us-west-1.amazonaws.com)... 52.219.116.88
Connecting to s3-us-west-1.amazonaws.com (s3-us-west-1.amazonaws.com)|52.219.116.88|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 5235933 (5.0M) [application/x-tar]
Saving to: ‘checkpoint_finetuned.pth.tar’


2019-04-12 21:42:56 (2.41 MB/s) - ‘checkpoint_finetuned.pth.tar’ saved [5235933/5235933]



In [86]:
resnet56_model = create_model(False, 'cifar10', 'resnet56_cifar', parallel=True)
checkpoint_file = 'checkpoint.resnet56_cifar_baseline.pth.tar'

try:
    load_checkpoint(resnet56_model, checkpoint_file)
    watcher = ww.WeightWatcher(model=resent18_model, logger=logger)
    watcher.analyze(compute_alphas=True)
    summary = watcher.get_summary()
except NameError as e:
    print("Did you forget to download the checkpoint file?")
    raise e
    
summary

INFO:root:==> using cifar10 dataset
INFO:root:=> creating resnet56_cifar model for CIFAR10
INFO:root:=> loading checkpoint checkpoint.resnet56_cifar_baseline.pth.tar
INFO:root:   best top@1: 92.920
INFO:root:Loaded compression schedule from checkpoint (epoch 179)
INFO:root:=> loaded checkpoint 'checkpoint.resnet56_cifar_baseline.pth.tar' (epoch 179)
INFO:app_cfg:
WeightWatcher v0.1.2 by Calculation Consulting
Analyze weight matrices of Deep Neural Networks
https://calculationconsulting.com/
python      version 3.6.5 |Anaconda, Inc.| (default, Apr 26 2018, 08:42:37) 
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]
numpy       version 1.15.3
tensforflow version 1.10.1
keras       version 2.2.2
INFO:app_cfg:Analyzing model
INFO:app_cfg:### Printing results ###
INFO:app_cfg:LogNorm: min: 0.5676514506340027, max: 1.3524690866470337, avg: 0.7858442068099976
INFO:app_cfg:LogNorm compound: min: 0.5736563735538058, max: 1.3524690866470337, avg: 0.8627014862166511
INFO:app_cfg:Alpha:

{'lognorm': 0.7858442,
 'lognorm_compound': 0.8627014862166511,
 'alpha': 7.889852459338699,
 'alpha_compound': 8.006562435126574,
 'alpha_weighted': -0.5332915968457409,
 'alpha_weighted_compound': 0.4614291985654078}

In [87]:
resnet56_model = create_model(False, 'cifar10', 'resnet56_cifar', parallel=True)
checkpoint_file = 'checkpoint.resnet56_cifar_finetuned.pth.tar'

try:
    load_checkpoint(resnet56_model, checkpoint_file)
    watcher = ww.WeightWatcher(model=resent18_model, logger=logger)
    watcher.analyze(compute_alphas=True)
    summary = watcher.get_summary()
except NameError as e:
    print("Did you forget to download the checkpoint file?")
    raise e
    
summary

INFO:root:==> using cifar10 dataset
INFO:root:=> creating resnet56_cifar model for CIFAR10
INFO:root:=> loading checkpoint checkpoint.resnet56_cifar_finetuned.pth.tar
INFO:root:   best top@1: 92.960
INFO:root:Loaded compression schedule from checkpoint (epoch 59)
INFO:root:=> loaded checkpoint 'checkpoint.resnet56_cifar_finetuned.pth.tar' (epoch 59)


RuntimeError: Error(s) in loading state_dict for ResNetCifar:
	size mismatch for layer1.0.conv1.weight: copying a param with shape torch.Size([7, 16, 3, 3]) from checkpoint, the shape in current model is torch.Size([16, 16, 3, 3]).
	size mismatch for layer1.0.bn1.weight: copying a param with shape torch.Size([7]) from checkpoint, the shape in current model is torch.Size([16]).
	size mismatch for layer1.0.bn1.bias: copying a param with shape torch.Size([7]) from checkpoint, the shape in current model is torch.Size([16]).
	size mismatch for layer1.0.bn1.running_mean: copying a param with shape torch.Size([7]) from checkpoint, the shape in current model is torch.Size([16]).
	size mismatch for layer1.0.bn1.running_var: copying a param with shape torch.Size([7]) from checkpoint, the shape in current model is torch.Size([16]).
	size mismatch for layer1.0.conv2.weight: copying a param with shape torch.Size([16, 7, 3, 3]) from checkpoint, the shape in current model is torch.Size([16, 16, 3, 3]).
	size mismatch for layer1.1.conv1.weight: copying a param with shape torch.Size([7, 16, 3, 3]) from checkpoint, the shape in current model is torch.Size([16, 16, 3, 3]).
	size mismatch for layer1.1.bn1.weight: copying a param with shape torch.Size([7]) from checkpoint, the shape in current model is torch.Size([16]).
	size mismatch for layer1.1.bn1.bias: copying a param with shape torch.Size([7]) from checkpoint, the shape in current model is torch.Size([16]).
	size mismatch for layer1.1.bn1.running_mean: copying a param with shape torch.Size([7]) from checkpoint, the shape in current model is torch.Size([16]).
	size mismatch for layer1.1.bn1.running_var: copying a param with shape torch.Size([7]) from checkpoint, the shape in current model is torch.Size([16]).
	size mismatch for layer1.1.conv2.weight: copying a param with shape torch.Size([16, 7, 3, 3]) from checkpoint, the shape in current model is torch.Size([16, 16, 3, 3]).
	size mismatch for layer1.2.conv1.weight: copying a param with shape torch.Size([7, 16, 3, 3]) from checkpoint, the shape in current model is torch.Size([16, 16, 3, 3]).
	size mismatch for layer1.2.bn1.weight: copying a param with shape torch.Size([7]) from checkpoint, the shape in current model is torch.Size([16]).
	size mismatch for layer1.2.bn1.bias: copying a param with shape torch.Size([7]) from checkpoint, the shape in current model is torch.Size([16]).
	size mismatch for layer1.2.bn1.running_mean: copying a param with shape torch.Size([7]) from checkpoint, the shape in current model is torch.Size([16]).
	size mismatch for layer1.2.bn1.running_var: copying a param with shape torch.Size([7]) from checkpoint, the shape in current model is torch.Size([16]).
	size mismatch for layer1.2.conv2.weight: copying a param with shape torch.Size([16, 7, 3, 3]) from checkpoint, the shape in current model is torch.Size([16, 16, 3, 3]).
	size mismatch for layer1.3.conv1.weight: copying a param with shape torch.Size([7, 16, 3, 3]) from checkpoint, the shape in current model is torch.Size([16, 16, 3, 3]).
	size mismatch for layer1.3.bn1.weight: copying a param with shape torch.Size([7]) from checkpoint, the shape in current model is torch.Size([16]).
	size mismatch for layer1.3.bn1.bias: copying a param with shape torch.Size([7]) from checkpoint, the shape in current model is torch.Size([16]).
	size mismatch for layer1.3.bn1.running_mean: copying a param with shape torch.Size([7]) from checkpoint, the shape in current model is torch.Size([16]).
	size mismatch for layer1.3.bn1.running_var: copying a param with shape torch.Size([7]) from checkpoint, the shape in current model is torch.Size([16]).
	size mismatch for layer1.3.conv2.weight: copying a param with shape torch.Size([16, 7, 3, 3]) from checkpoint, the shape in current model is torch.Size([16, 16, 3, 3]).
	size mismatch for layer1.4.conv1.weight: copying a param with shape torch.Size([7, 16, 3, 3]) from checkpoint, the shape in current model is torch.Size([16, 16, 3, 3]).
	size mismatch for layer1.4.bn1.weight: copying a param with shape torch.Size([7]) from checkpoint, the shape in current model is torch.Size([16]).
	size mismatch for layer1.4.bn1.bias: copying a param with shape torch.Size([7]) from checkpoint, the shape in current model is torch.Size([16]).
	size mismatch for layer1.4.bn1.running_mean: copying a param with shape torch.Size([7]) from checkpoint, the shape in current model is torch.Size([16]).
	size mismatch for layer1.4.bn1.running_var: copying a param with shape torch.Size([7]) from checkpoint, the shape in current model is torch.Size([16]).
	size mismatch for layer1.4.conv2.weight: copying a param with shape torch.Size([16, 7, 3, 3]) from checkpoint, the shape in current model is torch.Size([16, 16, 3, 3]).
	size mismatch for layer1.5.conv1.weight: copying a param with shape torch.Size([7, 16, 3, 3]) from checkpoint, the shape in current model is torch.Size([16, 16, 3, 3]).
	size mismatch for layer1.5.bn1.weight: copying a param with shape torch.Size([7]) from checkpoint, the shape in current model is torch.Size([16]).
	size mismatch for layer1.5.bn1.bias: copying a param with shape torch.Size([7]) from checkpoint, the shape in current model is torch.Size([16]).
	size mismatch for layer1.5.bn1.running_mean: copying a param with shape torch.Size([7]) from checkpoint, the shape in current model is torch.Size([16]).
	size mismatch for layer1.5.bn1.running_var: copying a param with shape torch.Size([7]) from checkpoint, the shape in current model is torch.Size([16]).
	size mismatch for layer1.5.conv2.weight: copying a param with shape torch.Size([16, 7, 3, 3]) from checkpoint, the shape in current model is torch.Size([16, 16, 3, 3]).
	size mismatch for layer1.6.conv1.weight: copying a param with shape torch.Size([7, 16, 3, 3]) from checkpoint, the shape in current model is torch.Size([16, 16, 3, 3]).
	size mismatch for layer1.6.bn1.weight: copying a param with shape torch.Size([7]) from checkpoint, the shape in current model is torch.Size([16]).
	size mismatch for layer1.6.bn1.bias: copying a param with shape torch.Size([7]) from checkpoint, the shape in current model is torch.Size([16]).
	size mismatch for layer1.6.bn1.running_mean: copying a param with shape torch.Size([7]) from checkpoint, the shape in current model is torch.Size([16]).
	size mismatch for layer1.6.bn1.running_var: copying a param with shape torch.Size([7]) from checkpoint, the shape in current model is torch.Size([16]).
	size mismatch for layer1.6.conv2.weight: copying a param with shape torch.Size([16, 7, 3, 3]) from checkpoint, the shape in current model is torch.Size([16, 16, 3, 3]).
	size mismatch for layer2.0.conv1.weight: copying a param with shape torch.Size([16, 16, 3, 3]) from checkpoint, the shape in current model is torch.Size([32, 16, 3, 3]).
	size mismatch for layer2.0.bn1.weight: copying a param with shape torch.Size([16]) from checkpoint, the shape in current model is torch.Size([32]).
	size mismatch for layer2.0.bn1.bias: copying a param with shape torch.Size([16]) from checkpoint, the shape in current model is torch.Size([32]).
	size mismatch for layer2.0.bn1.running_mean: copying a param with shape torch.Size([16]) from checkpoint, the shape in current model is torch.Size([32]).
	size mismatch for layer2.0.bn1.running_var: copying a param with shape torch.Size([16]) from checkpoint, the shape in current model is torch.Size([32]).
	size mismatch for layer2.0.conv2.weight: copying a param with shape torch.Size([32, 16, 3, 3]) from checkpoint, the shape in current model is torch.Size([32, 32, 3, 3]).
	size mismatch for layer2.1.conv1.weight: copying a param with shape torch.Size([16, 32, 3, 3]) from checkpoint, the shape in current model is torch.Size([32, 32, 3, 3]).
	size mismatch for layer2.1.bn1.weight: copying a param with shape torch.Size([16]) from checkpoint, the shape in current model is torch.Size([32]).
	size mismatch for layer2.1.bn1.bias: copying a param with shape torch.Size([16]) from checkpoint, the shape in current model is torch.Size([32]).
	size mismatch for layer2.1.bn1.running_mean: copying a param with shape torch.Size([16]) from checkpoint, the shape in current model is torch.Size([32]).
	size mismatch for layer2.1.bn1.running_var: copying a param with shape torch.Size([16]) from checkpoint, the shape in current model is torch.Size([32]).
	size mismatch for layer2.1.conv2.weight: copying a param with shape torch.Size([32, 16, 3, 3]) from checkpoint, the shape in current model is torch.Size([32, 32, 3, 3]).
	size mismatch for layer2.2.conv1.weight: copying a param with shape torch.Size([16, 32, 3, 3]) from checkpoint, the shape in current model is torch.Size([32, 32, 3, 3]).
	size mismatch for layer2.2.bn1.weight: copying a param with shape torch.Size([16]) from checkpoint, the shape in current model is torch.Size([32]).
	size mismatch for layer2.2.bn1.bias: copying a param with shape torch.Size([16]) from checkpoint, the shape in current model is torch.Size([32]).
	size mismatch for layer2.2.bn1.running_mean: copying a param with shape torch.Size([16]) from checkpoint, the shape in current model is torch.Size([32]).
	size mismatch for layer2.2.bn1.running_var: copying a param with shape torch.Size([16]) from checkpoint, the shape in current model is torch.Size([32]).
	size mismatch for layer2.2.conv2.weight: copying a param with shape torch.Size([32, 16, 3, 3]) from checkpoint, the shape in current model is torch.Size([32, 32, 3, 3]).
	size mismatch for layer2.3.conv1.weight: copying a param with shape torch.Size([16, 32, 3, 3]) from checkpoint, the shape in current model is torch.Size([32, 32, 3, 3]).
	size mismatch for layer2.3.bn1.weight: copying a param with shape torch.Size([16]) from checkpoint, the shape in current model is torch.Size([32]).
	size mismatch for layer2.3.bn1.bias: copying a param with shape torch.Size([16]) from checkpoint, the shape in current model is torch.Size([32]).
	size mismatch for layer2.3.bn1.running_mean: copying a param with shape torch.Size([16]) from checkpoint, the shape in current model is torch.Size([32]).
	size mismatch for layer2.3.bn1.running_var: copying a param with shape torch.Size([16]) from checkpoint, the shape in current model is torch.Size([32]).
	size mismatch for layer2.3.conv2.weight: copying a param with shape torch.Size([32, 16, 3, 3]) from checkpoint, the shape in current model is torch.Size([32, 32, 3, 3]).
	size mismatch for layer2.5.conv1.weight: copying a param with shape torch.Size([16, 32, 3, 3]) from checkpoint, the shape in current model is torch.Size([32, 32, 3, 3]).
	size mismatch for layer2.5.bn1.weight: copying a param with shape torch.Size([16]) from checkpoint, the shape in current model is torch.Size([32]).
	size mismatch for layer2.5.bn1.bias: copying a param with shape torch.Size([16]) from checkpoint, the shape in current model is torch.Size([32]).
	size mismatch for layer2.5.bn1.running_mean: copying a param with shape torch.Size([16]) from checkpoint, the shape in current model is torch.Size([32]).
	size mismatch for layer2.5.bn1.running_var: copying a param with shape torch.Size([16]) from checkpoint, the shape in current model is torch.Size([32]).
	size mismatch for layer2.5.conv2.weight: copying a param with shape torch.Size([32, 16, 3, 3]) from checkpoint, the shape in current model is torch.Size([32, 32, 3, 3]).
	size mismatch for layer2.6.conv1.weight: copying a param with shape torch.Size([16, 32, 3, 3]) from checkpoint, the shape in current model is torch.Size([32, 32, 3, 3]).
	size mismatch for layer2.6.bn1.weight: copying a param with shape torch.Size([16]) from checkpoint, the shape in current model is torch.Size([32]).
	size mismatch for layer2.6.bn1.bias: copying a param with shape torch.Size([16]) from checkpoint, the shape in current model is torch.Size([32]).
	size mismatch for layer2.6.bn1.running_mean: copying a param with shape torch.Size([16]) from checkpoint, the shape in current model is torch.Size([32]).
	size mismatch for layer2.6.bn1.running_var: copying a param with shape torch.Size([16]) from checkpoint, the shape in current model is torch.Size([32]).
	size mismatch for layer2.6.conv2.weight: copying a param with shape torch.Size([32, 16, 3, 3]) from checkpoint, the shape in current model is torch.Size([32, 32, 3, 3]).
	size mismatch for layer2.7.conv1.weight: copying a param with shape torch.Size([16, 32, 3, 3]) from checkpoint, the shape in current model is torch.Size([32, 32, 3, 3]).
	size mismatch for layer2.7.bn1.weight: copying a param with shape torch.Size([16]) from checkpoint, the shape in current model is torch.Size([32]).
	size mismatch for layer2.7.bn1.bias: copying a param with shape torch.Size([16]) from checkpoint, the shape in current model is torch.Size([32]).
	size mismatch for layer2.7.bn1.running_mean: copying a param with shape torch.Size([16]) from checkpoint, the shape in current model is torch.Size([32]).
	size mismatch for layer2.7.bn1.running_var: copying a param with shape torch.Size([16]) from checkpoint, the shape in current model is torch.Size([32]).
	size mismatch for layer2.7.conv2.weight: copying a param with shape torch.Size([32, 16, 3, 3]) from checkpoint, the shape in current model is torch.Size([32, 32, 3, 3]).
	size mismatch for layer3.2.conv1.weight: copying a param with shape torch.Size([45, 64, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3]).
	size mismatch for layer3.2.bn1.weight: copying a param with shape torch.Size([45]) from checkpoint, the shape in current model is torch.Size([64]).
	size mismatch for layer3.2.bn1.bias: copying a param with shape torch.Size([45]) from checkpoint, the shape in current model is torch.Size([64]).
	size mismatch for layer3.2.bn1.running_mean: copying a param with shape torch.Size([45]) from checkpoint, the shape in current model is torch.Size([64]).
	size mismatch for layer3.2.bn1.running_var: copying a param with shape torch.Size([45]) from checkpoint, the shape in current model is torch.Size([64]).
	size mismatch for layer3.2.conv2.weight: copying a param with shape torch.Size([64, 45, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3]).
	size mismatch for layer3.3.conv1.weight: copying a param with shape torch.Size([45, 64, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3]).
	size mismatch for layer3.3.bn1.weight: copying a param with shape torch.Size([45]) from checkpoint, the shape in current model is torch.Size([64]).
	size mismatch for layer3.3.bn1.bias: copying a param with shape torch.Size([45]) from checkpoint, the shape in current model is torch.Size([64]).
	size mismatch for layer3.3.bn1.running_mean: copying a param with shape torch.Size([45]) from checkpoint, the shape in current model is torch.Size([64]).
	size mismatch for layer3.3.bn1.running_var: copying a param with shape torch.Size([45]) from checkpoint, the shape in current model is torch.Size([64]).
	size mismatch for layer3.3.conv2.weight: copying a param with shape torch.Size([64, 45, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3]).
	size mismatch for layer3.5.conv1.weight: copying a param with shape torch.Size([45, 64, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3]).
	size mismatch for layer3.5.bn1.weight: copying a param with shape torch.Size([45]) from checkpoint, the shape in current model is torch.Size([64]).
	size mismatch for layer3.5.bn1.bias: copying a param with shape torch.Size([45]) from checkpoint, the shape in current model is torch.Size([64]).
	size mismatch for layer3.5.bn1.running_mean: copying a param with shape torch.Size([45]) from checkpoint, the shape in current model is torch.Size([64]).
	size mismatch for layer3.5.bn1.running_var: copying a param with shape torch.Size([45]) from checkpoint, the shape in current model is torch.Size([64]).
	size mismatch for layer3.5.conv2.weight: copying a param with shape torch.Size([64, 45, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3]).
	size mismatch for layer3.6.conv1.weight: copying a param with shape torch.Size([45, 64, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3]).
	size mismatch for layer3.6.bn1.weight: copying a param with shape torch.Size([45]) from checkpoint, the shape in current model is torch.Size([64]).
	size mismatch for layer3.6.bn1.bias: copying a param with shape torch.Size([45]) from checkpoint, the shape in current model is torch.Size([64]).
	size mismatch for layer3.6.bn1.running_mean: copying a param with shape torch.Size([45]) from checkpoint, the shape in current model is torch.Size([64]).
	size mismatch for layer3.6.bn1.running_var: copying a param with shape torch.Size([45]) from checkpoint, the shape in current model is torch.Size([64]).
	size mismatch for layer3.6.conv2.weight: copying a param with shape torch.Size([64, 45, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3]).
	size mismatch for layer3.7.conv1.weight: copying a param with shape torch.Size([45, 64, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3]).
	size mismatch for layer3.7.bn1.weight: copying a param with shape torch.Size([45]) from checkpoint, the shape in current model is torch.Size([64]).
	size mismatch for layer3.7.bn1.bias: copying a param with shape torch.Size([45]) from checkpoint, the shape in current model is torch.Size([64]).
	size mismatch for layer3.7.bn1.running_mean: copying a param with shape torch.Size([45]) from checkpoint, the shape in current model is torch.Size([64]).
	size mismatch for layer3.7.bn1.running_var: copying a param with shape torch.Size([45]) from checkpoint, the shape in current model is torch.Size([64]).
	size mismatch for layer3.7.conv2.weight: copying a param with shape torch.Size([64, 45, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3]).
	size mismatch for layer3.8.conv1.weight: copying a param with shape torch.Size([45, 64, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3]).
	size mismatch for layer3.8.bn1.weight: copying a param with shape torch.Size([45]) from checkpoint, the shape in current model is torch.Size([64]).
	size mismatch for layer3.8.bn1.bias: copying a param with shape torch.Size([45]) from checkpoint, the shape in current model is torch.Size([64]).
	size mismatch for layer3.8.bn1.running_mean: copying a param with shape torch.Size([45]) from checkpoint, the shape in current model is torch.Size([64]).
	size mismatch for layer3.8.bn1.running_var: copying a param with shape torch.Size([45]) from checkpoint, the shape in current model is torch.Size([64]).
	size mismatch for layer3.8.conv2.weight: copying a param with shape torch.Size([64, 45, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3]).