# WeightWatcher

https://calculationconsulting.com

In [1]:
# Suppress the powerlaw package warnings
# "powerlaw.py:700: RuntimeWarning: divide by zero encountered in true_divide"
# "powerlaw.py:700: RuntimeWarning: invalid value encountered in true_divide"
import warnings
warnings.simplefilter(action='ignore', category=RuntimeWarning)
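
The filter above silences every `RuntimeWarning` for the rest of the session. If that is broader than you want, a scoped variant is possible with the standard library alone. The sketch below is only an illustration: `noisy_divide` is a hypothetical stand-in for the library call that emits the warning, and `record=True` is used so the effect can be inspected instead of printed.

```python
import warnings


def noisy_divide():
    # Hypothetical stand-in for a library call that emits a RuntimeWarning.
    warnings.warn("divide by zero encountered", RuntimeWarning)
    return float("inf")


# Filters changed inside the block are restored when the block exits.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always", category=UserWarning)
    warnings.simplefilter("ignore", category=RuntimeWarning)
    value = noisy_divide()
    warnings.warn("other warnings are still reported", UserWarning)

# Only the UserWarning was recorded; the RuntimeWarning was dropped.
remaining = [str(w.message) for w in caught]
print(remaining)
```

Outside the `with` block the previous warning filters apply again, so unrelated `RuntimeWarning`s elsewhere in the notebook would still surface.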

## 1. Quick start example

### 1.1 Import your model (Keras or PyTorch)

In [2]:
from keras.models import load_model
from keras.applications import vgg16

kmodel = vgg16.VGG16
model = kmodel(weights='imagenet')

Using TensorFlow backend.


### 1.2 Run WeightWatcher

In [3]:
import weightwatcher as ww

watcher = ww.WeightWatcher(model=model)

results = watcher.analyze()

2019-11-03 17:30:52,299 INFO 

python      version 3.7.3 (default, Mar 27 2019, 16:54:48) 
[Clang 4.0.1 (tags/RELEASE_401/final)]
numpy       version 1.16.4
tensforflow version 1.13.1
keras       version 2.2.4
2019-11-03 17:30:52,301 INFO Analyzing model 'vgg16' with 23 layers
2019-11-03 17:30:53,481 INFO ### Printing results ###
2019-11-03 17:30:55,720 INFO Norm: min: 2.4488985538482666, max: 23.428979873657227, avg: 4.01861572265625
2019-11-03 17:30:55,721 INFO Norm compound: min: 2.728064775466919, max: 23.428979873657227, avg: 6.7535719871521
2019-11-03 17:30:55,723 INFO LogNorm: min: 0.3889707922935486, max: 1.369753360748291, avg: 0.5674788951873779
2019-11-03 17:30:55,725 INFO LogNorm compound: min: 0.43449220061302185, max: 1.369753360748291, avg: 0.6947276592254639


In [4]:
results

{0: {'id': 0,
  'type': <keras.engine.input_layer.InputLayer at 0x10dd2cac8>,
  'message': 'Skipping (Layer not supported)'},
 1: {'layer_type': <LAYER_TYPE.CONV2D: 4>,
  0: {'N': 64,
   'M': 3,
   'Q': 21.333333333333332,
   'summary': 'Weight matrix 1/9 (3,64): Skipping: too small (<50)'},
  1: {'N': 64,
   'M': 3,
   'Q': 21.333333333333332,
   'summary': 'Weight matrix 2/9 (3,64): Skipping: too small (<50)'},
  2: {'N': 64,
   'M': 3,
   'Q': 21.333333333333332,
   'summary': 'Weight matrix 3/9 (3,64): Skipping: too small (<50)'},
  3: {'N': 64,
   'M': 3,
   'Q': 21.333333333333332,
   'summary': 'Weight matrix 4/9 (3,64): Skipping: too small (<50)'},
  4: {'N': 64,
   'M': 3,
   'Q': 21.333333333333332,
   'summary': 'Weight matrix 5/9 (3,64): Skipping: too small (<50)'},
  5: {'N': 64,
   'M': 3,
   'Q': 21.333333333333332,
   'summary': 'Weight matrix 6/9 (3,64): Skipping: too small (<50)'},
  6: {'N': 64,
   'M': 3,
   'Q': 21.333333333333332,
   'summary': 'Weight matrix 7/9 

In [5]:
watcher.get_summary()

{'norm': 4.0186157,
 'norm_compound': 6.753572,
 'lognorm': 0.5674789,
 'lognorm_compound': 0.69472766}
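
The logged numbers make it possible to check how `norm` and `lognorm` relate. The sketch below uses only values copied from the log lines above and tests the guess that LogNorm is the base-10 logarithm of Norm. That guess is an assumption inferred from the output, not something the notebook states.

```python
import math

# Values copied from the "Norm" and "LogNorm" log lines above.
norm_min, norm_max, norm_avg = 2.4488985538482666, 23.428979873657227, 4.01861572265625
lognorm_min, lognorm_max, lognorm_avg = 0.3889707922935486, 1.369753360748291, 0.5674788951873779

# The extremes agree with log10(norm) up to float32 rounding.
min_matches = math.isclose(math.log10(norm_min), lognorm_min, abs_tol=1e-5)
max_matches = math.isclose(math.log10(norm_max), lognorm_max, abs_tol=1e-5)

# The averages do not: the mean of the logs is not the log of the mean.
avg_matches = math.isclose(math.log10(norm_avg), lognorm_avg, abs_tol=1e-5)

print(min_matches, max_matches, avg_matches)
```

The minimum and maximum line up, while the averages differ. That is consistent with `lognorm` being averaged over the individual log values and not computed from the averaged norm.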

In [6]:
watcher.print_results()

2019-11-03 17:30:55,864 INFO ### Printing results ###
2019-11-03 17:30:58,896 INFO Norm: min: 2.4488985538482666, max: 23.428979873657227, avg: 4.01861572265625
2019-11-03 17:30:58,898 INFO Norm compound: min: 2.728064775466919, max: 23.428979873657227, avg: 6.7535719871521
2019-11-03 17:30:58,899 INFO LogNorm: min: 0.3889707922935486, max: 1.369753360748291, avg: 0.5674788951873779
2019-11-03 17:30:58,900 INFO LogNorm compound: min: 0.43449220061302185, max: 1.369753360748291, avg: 0.6947276592254639


## 2. Advanced examples

### 2.1 Filter by layer type (CONV1D, CONV2D, DENSE)

In this example we are interested in the DENSE layers only.

In [None]:
from keras.applications import vgg16

kmodel = vgg16.VGG16
model = kmodel(weights='imagenet')

import weightwatcher as ww

watcher = ww.WeightWatcher(model=model)

watcher.analyze(layers=ww.LAYER_TYPE.DENSE)

2019-11-03 17:31:02,300 INFO 

python      version 3.7.3 (default, Mar 27 2019, 16:54:48) 
[Clang 4.0.1 (tags/RELEASE_401/final)]
numpy       version 1.16.4
tensforflow version 1.13.1
keras       version 2.2.4
2019-11-03 17:31:02,301 INFO Analyzing model 'vgg16' with 23 layers
2019-11-03 17:31:02,880 INFO ### Printing results ###
2019-11-03 17:31:03,442 INFO Norm: min: 16.757492065429688, max: 23.428979873657227, avg: 19.402746200561523
2019-11-03 17:31:03,443 INFO Norm compound: min: 16.757492065429688, max: 23.428979873657227, avg: 19.402746200561523
2019-11-03 17:31:03,445 INFO LogNorm: min: 1.2242090702056885, max: 1.369753360748291, avg: 1.2832533121109009
2019-11-03 17:31:03,446 INFO LogNorm compound: min: 1.2242090702056885, max: 1.369753360748291, avg: 1.2832533121109009


{0: {'id': 0,
  'type': <keras.engine.input_layer.InputLayer at 0xb3fe8dd68>,
  'message': 'Skipping (Layer not supported)'},
 1: {'layer_type': <LAYER_TYPE.CONV2D: 4>,
  'message': 'Skipping (Layer type not requested to analyze)'},
 2: {'layer_type': <LAYER_TYPE.CONV2D: 4>,
  'message': 'Skipping (Layer type not requested to analyze)'},
 3: {'id': 3,
  'type': <keras.layers.pooling.MaxPooling2D at 0xb3fe8aa90>,
  'message': 'Skipping (Layer not supported)'},
 4: {'layer_type': <LAYER_TYPE.CONV2D: 4>,
  'message': 'Skipping (Layer type not requested to analyze)'},
 5: {'layer_type': <LAYER_TYPE.CONV2D: 4>,
  'message': 'Skipping (Layer type not requested to analyze)'},
 6: {'id': 6,
  'type': <keras.layers.pooling.MaxPooling2D at 0xb3fedbeb8>,
  'message': 'Skipping (Layer not supported)'},
 7: {'layer_type': <LAYER_TYPE.CONV2D: 4>,
  'message': 'Skipping (Layer type not requested to analyze)'},
 8: {'layer_type': <LAYER_TYPE.CONV2D: 4>,
  'message': 'Skipping (Layer type not requested

In [None]:
watcher.print_results()

2019-11-03 17:31:03,495 INFO ### Printing results ###
2019-11-03 17:31:04,109 INFO Norm: min: 16.757492065429688, max: 23.428979873657227, avg: 19.402746200561523
2019-11-03 17:31:04,110 INFO Norm compound: min: 16.757492065429688, max: 23.428979873657227, avg: 19.402746200561523
2019-11-03 17:31:04,111 INFO LogNorm: min: 1.2242090702056885, max: 1.369753360748291, avg: 1.2832533121109009
2019-11-03 17:31:04,113 INFO LogNorm compound: min: 1.2242090702056885, max: 1.369753360748291, avg: 1.2832533121109009


### 2.2 Filter by multiple layer types

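In this example we are interested in the CONV1D and DENSE layers.

Layer types act as a bitmask, so several types can be combined into a single filter with the bitwise OR operator (`|`). VGG16 has no CONV1D layers, so the statistics below are identical to the DENSE-only run in 2.1.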

In [None]:
import weightwatcher as ww

watcher = ww.WeightWatcher(model=model)

watcher.analyze(layers=ww.LAYER_TYPE.CONV1D|ww.LAYER_TYPE.DENSE)

2019-11-03 17:31:04,155 INFO 

python      version 3.7.3 (default, Mar 27 2019, 16:54:48) 
[Clang 4.0.1 (tags/RELEASE_401/final)]
numpy       version 1.16.4
tensforflow version 1.13.1
keras       version 2.2.4
2019-11-03 17:31:04,157 INFO Analyzing model 'vgg16' with 23 layers
2019-11-03 17:31:04,669 INFO ### Printing results ###
2019-11-03 17:31:05,256 INFO Norm: min: 16.757492065429688, max: 23.428979873657227, avg: 19.402746200561523
2019-11-03 17:31:05,258 INFO Norm compound: min: 16.757492065429688, max: 23.428979873657227, avg: 19.402746200561523
2019-11-03 17:31:05,259 INFO LogNorm: min: 1.2242090702056885, max: 1.369753360748291, avg: 1.2832533121109009
2019-11-03 17:31:05,261 INFO LogNorm compound: min: 1.2242090702056885, max: 1.369753360748291, avg: 1.2832533121109009


{0: {'id': 0,
  'type': <keras.engine.input_layer.InputLayer at 0xb3fe8dd68>,
  'message': 'Skipping (Layer not supported)'},
 1: {'layer_type': <LAYER_TYPE.CONV2D: 4>,
  'message': 'Skipping (Layer type not requested to analyze)'},
 2: {'layer_type': <LAYER_TYPE.CONV2D: 4>,
  'message': 'Skipping (Layer type not requested to analyze)'},
 3: {'id': 3,
  'type': <keras.layers.pooling.MaxPooling2D at 0xb3fe8aa90>,
  'message': 'Skipping (Layer not supported)'},
 4: {'layer_type': <LAYER_TYPE.CONV2D: 4>,
  'message': 'Skipping (Layer type not requested to analyze)'},
 5: {'layer_type': <LAYER_TYPE.CONV2D: 4>,
  'message': 'Skipping (Layer type not requested to analyze)'},
 6: {'id': 6,
  'type': <keras.layers.pooling.MaxPooling2D at 0xb3fedbeb8>,
  'message': 'Skipping (Layer not supported)'},
 7: {'layer_type': <LAYER_TYPE.CONV2D: 4>,
  'message': 'Skipping (Layer type not requested to analyze)'},
 8: {'layer_type': <LAYER_TYPE.CONV2D: 4>,
  'message': 'Skipping (Layer type not requested
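
To see how a bitmask filter of this kind behaves, here is a minimal sketch with the standard-library `enum.IntFlag`. `LayerType` is a hypothetical stand-in for `ww.LAYER_TYPE`. Only `CONV2D = 4` is visible in the output above; the other two values and the `is_requested` helper are assumptions made for illustration.

```python
from enum import IntFlag


class LayerType(IntFlag):
    # Hypothetical stand-in for ww.LAYER_TYPE. Only CONV2D = 4 appears in the
    # notebook output; DENSE and CONV1D values are assumed.
    DENSE = 1
    CONV1D = 2
    CONV2D = 4


def is_requested(layer_type, mask):
    # A layer passes the filter when its flag shares a bit with the mask.
    return bool(layer_type & mask)


# Combine two layer types into one filter with bitwise OR.
requested = LayerType.CONV1D | LayerType.DENSE

all_types = (LayerType.DENSE, LayerType.CONV1D, LayerType.CONV2D)
selected = [t.name for t in all_types if is_requested(t, requested)]
print(selected)
```

Each flag occupies its own bit, so OR-ing flags never loses information and a single integer can carry any combination of layer types.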

### 2.3 Filter by layer IDs

In [None]:
import weightwatcher as ww

watcher = ww.WeightWatcher(model=model)

watcher.analyze(layers=[20])

2019-11-03 17:31:05,316 INFO 

python      version 3.7.3 (default, Mar 27 2019, 16:54:48) 
[Clang 4.0.1 (tags/RELEASE_401/final)]
numpy       version 1.16.4
tensforflow version 1.13.1
keras       version 2.2.4
2019-11-03 17:31:05,318 INFO Analyzing model 'vgg16' with 23 layers
2019-11-03 17:31:05,772 INFO ### Printing results ###
2019-11-03 17:31:06,334 INFO Norm: min: 23.428979873657227, max: 23.428979873657227, avg: 23.428979873657227
2019-11-03 17:31:06,335 INFO Norm compound: min: 23.428979873657227, max: 23.428979873657227, avg: 23.428979873657227
2019-11-03 17:31:06,337 INFO LogNorm: min: 1.369753360748291, max: 1.369753360748291, avg: 1.369753360748291
2019-11-03 17:31:06,338 INFO LogNorm compound: min: 1.369753360748291, max: 1.369753360748291, avg: 1.369753360748291


{0: {'id': 0,
  'type': <keras.engine.input_layer.InputLayer at 0xb3fe8dd68>,
  'message': 'Skipping (Layer id not requested to analyze)'},
 1: {'id': 1,
  'type': <keras.layers.convolutional.Conv2D at 0xb3fe8dc18>,
  'message': 'Skipping (Layer id not requested to analyze)'},
 2: {'id': 2,
  'type': <keras.layers.convolutional.Conv2D at 0xb3fe8d8d0>,
  'message': 'Skipping (Layer id not requested to analyze)'},
 3: {'id': 3,
  'type': <keras.layers.pooling.MaxPooling2D at 0xb3fe8aa90>,
  'message': 'Skipping (Layer id not requested to analyze)'},
 4: {'id': 4,
  'type': <keras.layers.convolutional.Conv2D at 0xb3fe8a8d0>,
  'message': 'Skipping (Layer id not requested to analyze)'},
 5: {'id': 5,
  'type': <keras.layers.convolutional.Conv2D at 0xb3fec2198>,
  'message': 'Skipping (Layer id not requested to analyze)'},
 6: {'id': 6,
  'type': <keras.layers.pooling.MaxPooling2D at 0xb3fedbeb8>,
  'message': 'Skipping (Layer id not requested to analyze)'},
 7: {'id': 7,
  'type': <keras.l

### 2.4 Get the return values per layer

In [None]:
import weightwatcher as ww

watcher = ww.WeightWatcher(model=model)

results = watcher.analyze()

2019-11-03 17:31:06,379 INFO 

python      version 3.7.3 (default, Mar 27 2019, 16:54:48) 
[Clang 4.0.1 (tags/RELEASE_401/final)]
numpy       version 1.16.4
tensforflow version 1.13.1
keras       version 2.2.4
2019-11-03 17:31:06,381 INFO Analyzing model 'vgg16' with 23 layers
2019-11-03 17:31:07,196 INFO ### Printing results ###
2019-11-03 17:31:09,081 INFO Norm: min: 2.4488985538482666, max: 23.428979873657227, avg: 4.01861572265625
2019-11-03 17:31:09,082 INFO Norm compound: min: 2.728064775466919, max: 23.428979873657227, avg: 6.7535719871521
2019-11-03 17:31:09,082 INFO LogNorm: min: 0.3889707922935486, max: 1.369753360748291, avg: 0.5674788951873779
2019-11-03 17:31:09,083 INFO LogNorm compound: min: 0.43449220061302185, max: 1.369753360748291, avg: 0.6947276592254639


In [None]:
for layer_id, result in results.items():
    for slice_id, summary in result.items():
        if not str(slice_id).isdigit() or "lognorm" not in summary:
            continue
        lognorm = summary["lognorm"]
        print("Layer {}, Slice {}: Lognorm: {}".format(layer_id, slice_id, lognorm))    

Layer 2, Slice 0: Lognorm: 0.3978934586048126
Layer 2, Slice 1: Lognorm: 0.45358702540397644
Layer 2, Slice 2: Lognorm: 0.40578144788742065
Layer 2, Slice 3: Lognorm: 0.45428669452667236
Layer 2, Slice 4: Lognorm: 0.49695152044296265
Layer 2, Slice 5: Lognorm: 0.45737624168395996
Layer 2, Slice 6: Lognorm: 0.4044671952724457
Layer 2, Slice 7: Lognorm: 0.4511153995990753
Layer 2, Slice 8: Lognorm: 0.3889707922935486
Layer 4, Slice 0: Lognorm: 0.44109997153282166
Layer 4, Slice 1: Lognorm: 0.4613628089427948
Layer 4, Slice 2: Lognorm: 0.4370166063308716
Layer 4, Slice 3: Lognorm: 0.4667660593986511
Layer 4, Slice 4: Lognorm: 0.5201410055160522
Layer 4, Slice 5: Lognorm: 0.46935534477233887
Layer 4, Slice 6: Lognorm: 0.4464665651321411
Layer 4, Slice 7: Lognorm: 0.48161619901657104
Layer 4, Slice 8: Lognorm: 0.4471622705459595
Layer 5, Slice 0: Lognorm: 0.45507872104644775
Layer 5, Slice 1: Lognorm: 0.4839082956314087
Layer 5, Slice 2: Lognorm: 0.4593982994556427
Layer 5, Slice 3: Lognorm
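
The per-slice values can also be aggregated per layer. The sketch below rebuilds a small stand-in for the `results` dictionary, reproducing only the keys used here with the layer 2 `lognorm` values copied from the output above, and averages the slices of each layer. The helper name `per_layer_mean_lognorm` is hypothetical.

```python
from statistics import mean

# Layer 2 slice values copied from the printed output above.
layer2_lognorms = [
    0.3978934586048126, 0.45358702540397644, 0.40578144788742065,
    0.45428669452667236, 0.49695152044296265, 0.45737624168395996,
    0.4044671952724457, 0.4511153995990753, 0.3889707922935486,
]

# Stand-in for the structure returned by watcher.analyze(): integer keys are
# slice ids, string keys hold layer-level metadata.
layer2 = {"layer_type": "CONV2D"}
layer2.update({i: {"lognorm": v} for i, v in enumerate(layer2_lognorms)})
results = {
    0: {"id": 0, "message": "Skipping (Layer not supported)"},
    2: layer2,
}


def per_layer_mean_lognorm(results):
    """Average the slice-level lognorm values of every analyzed layer."""
    averages = {}
    for layer_id, layer in results.items():
        values = [
            entry["lognorm"]
            for key, entry in layer.items()
            if isinstance(key, int) and isinstance(entry, dict) and "lognorm" in entry
        ]
        if values:  # skipped layers have no slices and are left out
            averages[layer_id] = mean(values)
    return averages


layer_means = per_layer_mean_lognorm(results)
print(layer_means)
```

The layer 2 average agrees, up to float32 rounding, with the `LogNorm compound` minimum of 0.43449220061302185 in the log. This suggests, but does not prove, that the compound metrics average the slices of each layer before aggregating across layers.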

### 2.5 Power Law Fit

In [None]:
import weightwatcher as ww

watcher = ww.WeightWatcher(model=model)

results = watcher.analyze(alphas=True)

2019-11-03 17:31:09,120 INFO 

python      version 3.7.3 (default, Mar 27 2019, 16:54:48) 
[Clang 4.0.1 (tags/RELEASE_401/final)]
numpy       version 1.16.4
tensforflow version 1.13.1
keras       version 2.2.4
2019-11-03 17:31:09,121 INFO Analyzing model 'vgg16' with 23 layers
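
For readers unfamiliar with power-law exponents, the following sketch shows the textbook maximum-likelihood estimate of alpha for a continuous power law with a known lower cutoff `xmin`. It is not the code used by WeightWatcher or by the powerlaw package. The function names are hypothetical, and the test data are exact quantiles of a power law with alpha = 3, generated only to check the estimator.

```python
import math


def powerlaw_alpha_mle(values, xmin):
    """Estimate alpha for p(x) ~ x**(-alpha), x >= xmin, by maximum likelihood."""
    tail = [v for v in values if v >= xmin]
    log_sum = sum(math.log(v / xmin) for v in tail)
    if not tail or log_sum == 0.0:
        raise ValueError("need values strictly above xmin")
    return 1.0 + len(tail) / log_sum


def powerlaw_quantiles(alpha, xmin, n):
    """Deterministic sample: the inverse CDF evaluated at n midpoints in (0, 1)."""
    exponent = -1.0 / (alpha - 1.0)
    return [xmin * (1.0 - (i + 0.5) / n) ** exponent for i in range(n)]


samples = powerlaw_quantiles(alpha=3.0, xmin=1.0, n=1000)
alpha_hat = powerlaw_alpha_mle(samples, xmin=1.0)
print(round(alpha_hat, 2))
```

A heavier tail (more very large values) gives a smaller alpha, which is the quantity the later plots compare against accuracy.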


### 2.6 Debug and Custom Logging

#### Custom Logging at Debug Level

In [None]:
import logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)

import weightwatcher as ww

watcher = ww.WeightWatcher(model=model, logger=logger)

results = watcher.analyze()

#### Disable Logging

In [None]:
import weightwatcher as ww

watcher = ww.WeightWatcher(model=model, log=False)

results = watcher.analyze()
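
The custom logging cell in this section sets the root logger to INFO and one named logger to DEBUG. The standard-library sketch below shows why DEBUG records from that named logger still appear: the level check happens on the logger that creates the record, not on the root logger it propagates to. The logger names `ww_demo` and `other_demo` are hypothetical, and output is captured in a string instead of being printed.

```python
import io
import logging

stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(name)s %(message)s"))

root = logging.getLogger()
previous_level = root.level
root.addHandler(handler)
root.setLevel(logging.INFO)  # root level INFO, as with basicConfig(level=logging.INFO)

logger = logging.getLogger("ww_demo")  # hypothetical logger name
logger.setLevel(logging.DEBUG)
logger.debug("visible")  # passes this logger's own DEBUG threshold

# A logger without its own level inherits INFO from the root and drops DEBUG.
logging.getLogger("other_demo").debug("hidden")

captured = stream.getvalue()

# Undo the changes so the demo does not affect later cells.
root.removeHandler(handler)
root.setLevel(previous_level)
print(captured)
```

This way only the logger passed to WeightWatcher becomes verbose, while other libraries that log through the root configuration stay at INFO.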

## 3. PyTorch Models

In [None]:
data = []

In [None]:
import weightwatcher as ww
import torchvision.models as models

model = models.vgg16(pretrained=True)

watcher = ww.WeightWatcher(model=model)

results = watcher.analyze(alphas=True)

data.append({"name": "vgg16torch", "summary": watcher.get_summary()})

In [None]:
import weightwatcher as ww
import torchvision.models as models

model = models.vgg16_bn(pretrained=True)

watcher = ww.WeightWatcher(model=model)

results = watcher.analyze(alphas=True)

data.append({"name": "vgg16bntorch", "summary": watcher.get_summary()})

In [None]:
import weightwatcher as ww
import torchvision.models as models

model = models.vgg11(pretrained=True)

watcher = ww.WeightWatcher(model=model)

results = watcher.analyze(alphas=True)

data.append({"name": "vgg11torch", "summary": watcher.get_summary()})

In [None]:
import weightwatcher as ww
import torchvision.models as models

model = models.vgg11_bn(pretrained=True)

watcher = ww.WeightWatcher(model=model)

results = watcher.analyze(alphas=True)

data.append({"name": "vgg11bntorch", "summary": watcher.get_summary()})

In [None]:
import weightwatcher as ww
import torchvision.models as models

model = models.vgg13(pretrained=True)

watcher = ww.WeightWatcher(model=model)

results = watcher.analyze(alphas=True)

data.append({"name": "vgg13torch", "summary": watcher.get_summary()})

In [None]:
import weightwatcher as ww
import torchvision.models as models

model = models.vgg13_bn(pretrained=True)

watcher = ww.WeightWatcher(model=model)

results = watcher.analyze(alphas=True)

data.append({"name": "vgg13bntorch", "summary": watcher.get_summary()})

In [None]:
import weightwatcher as ww
import torchvision.models as models

model = models.vgg19(pretrained=True)

watcher = ww.WeightWatcher(model=model)

results = watcher.analyze(alphas=True)

data.append({"name": "vgg19torch", "summary": watcher.get_summary()})

In [None]:
import weightwatcher as ww
import torchvision.models as models

model = models.vgg19_bn(pretrained=True)

watcher = ww.WeightWatcher(model=model)

results = watcher.analyze(alphas=True)

data.append({"name": "vgg19bntorch", "summary": watcher.get_summary()})

In [None]:
data

In [None]:
# PyTorch model top-1 accuracies (%)
# https://github.com/Cadene/pretrained-models.pytorch

accuracies = {
    "vgg11torch": 68.970,
    "vgg11bntorch": 70.452,
    "vgg13torch": 69.662,
    "vgg13bntorch": 71.508,
    "vgg16torch": 71.636,
    "vgg16bntorch": 73.518,
    "vgg19torch": 72.080,
    "vgg19bntorch": 74.266,
}

In [None]:
# PyTorch model top-5 accuracies (%)
# https://github.com/Cadene/pretrained-models.pytorch

accuracies5 = {
    "vgg11torch": 88.746,
    "vgg11bntorch": 89.818,
    "vgg13torch": 89.264,
    "vgg13bntorch": 90.494,
    "vgg16torch": 90.354,
    "vgg16bntorch": 91.608,
    "vgg19torch": 90.822,
    "vgg19bntorch": 92.066,
}

### 3.1 Log Norm of Weight Matrices vs Accuracies of models

The following graph shows the linear relationship between the average Log Norm of the weight matrices and the top-5 test accuracies of the models (notice that we didn't need the test data to compute the Log Norm):

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline
plt.rcParams['figure.figsize'] = [8,8]

for modelname, accuracy in accuracies5.items():
    x = accuracy
    summary = [d["summary"] for d in data if d["name"] == modelname]
    y = summary[0]["lognorm"]
    label = modelname
    plt.scatter(x,y,label=label)

plt.legend()
plt.title(r"Test Accuracy vs Average Log Norm $\langle\log\Vert W\Vert\rangle$"+"\nPretrained VGG and VGG_BN Models")
plt.xlabel(r"Test Accuracy")
plt.ylabel(r"$\langle\log\Vert W\Vert\rangle$");
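
A scatter plot shows the trend visually. To put a number on how linear it is, one option is the Pearson correlation coefficient. The sketch below is a plain standard-library implementation. The helper name `pearson_r` is hypothetical, and the toy inputs are synthetic values chosen to lie exactly on a line, not measurements from any model.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Synthetic toy data on an exact line (not model measurements).
toy_x = [1.0, 2.0, 3.0, 4.0]
toy_y = [2.0, 4.0, 6.0, 8.0]

r_up = pearson_r(toy_x, toy_y)          # perfectly increasing line
r_down = pearson_r(toy_x, toy_y[::-1])  # perfectly decreasing line
print(round(r_up, 6), round(r_down, 6))
```

With the notebook's own variables, the inputs would be `[accuracies5[d["name"]] for d in data]` and `[d["summary"]["lognorm"] for d in data]`. A value near -1 or +1 indicates a strongly linear trend.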

Let's compare the average Log Norm with the average Log Norm compound:

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline
plt.rcParams['figure.figsize'] = [8,8]

x = []
y1, y2 = [], []
for modelname, accuracy in accuracies5.items():
    x.append(accuracy)
    summary = [d["summary"] for d in data if d["name"] == modelname]
    y1.append(summary[0]["lognorm"])
    y2.append(summary[0]["lognorm_compound"])
plt.scatter(x,y1,label="Log Norm", color='r')
plt.scatter(x,y2,label="Log Norm Compound", color='b')

plt.legend()
plt.title(r"Test Accuracy vs (Average Log Norm $\langle\log\Vert W\Vert\rangle$ and Log Norm Compound)"+"\nPretrained VGG and VGG_BN Models")
plt.xlabel(r"Test Accuracy")
plt.ylabel(r"$\langle\log\Vert W\Vert\rangle$");

### 3.2 Power law fitting (Alpha) of Weight Matrices vs Accuracies of models

The following graph shows the linear relationship between the weighted power-law exponent (Alpha) of the weight matrices and the top-5 test accuracies of the models:

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline
plt.rcParams['figure.figsize'] = [8,8]

for modelname, accuracy in accuracies5.items():
    x = accuracy
    summary = [d["summary"] for d in data if d["name"] == modelname]
    y = summary[0]["alpha_weighted"]
    label = modelname
    plt.scatter(x,y,label=label)

plt.legend()
plt.title(r"Test Accuracy vs Weighted Alpha"+"\nPretrained VGG and VGG_BN Models")
plt.xlabel(r"Test Accuracy")
plt.ylabel(r"Weighted Alpha");

The more accurate the model, the lower the exponent of the power-law fit of its weight matrices. The next plot compares the weighted Alpha with the weighted Alpha compound:

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline
plt.rcParams['figure.figsize'] = [8,8]

x = []
y1, y2 = [], []
for modelname, accuracy in accuracies5.items():
    x.append(accuracy)
    summary = [d["summary"] for d in data if d["name"] == modelname]
    y1.append(summary[0]["alpha_weighted"])
    y2.append(summary[0]["alpha_weighted_compound"])
plt.scatter(x,y1,label="Weighted Alpha", color='r')
plt.scatter(x,y2,label="Weighted Alpha Compound", color='b')

plt.legend()
plt.title(r"Test Accuracy vs (Weighted Alpha and Weighted Alpha compound)"+"\nPretrained VGG and VGG_BN Models")
plt.xlabel(r"Test Accuracy")
plt.ylabel(r"Weighted Alpha");

## 4. Conclusion

WeightWatcher helps you choose the best pretrained model for your needs.

You can use WeightWatcher to compare several pretrained models and choose the one with the lowest Log Norm.
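
As a minimal sketch of that selection step, the helper below picks the entry with the lowest value of a summary metric from a list shaped like the `data` list built in section 3. The helper name `pick_lowest`, the model names, and the numbers are placeholders for illustration only, not results.

```python
def pick_lowest(entries, metric="lognorm"):
    """Return the name of the entry whose summary has the smallest metric."""
    best = min(entries, key=lambda entry: entry["summary"][metric])
    return best["name"]


# Placeholder entries in the same shape as `data`; the values are made up.
candidates = [
    {"name": "model_a", "summary": {"lognorm": 0.7}},
    {"name": "model_b", "summary": {"lognorm": 0.5}},
    {"name": "model_c", "summary": {"lognorm": 0.6}},
]

best_name = pick_lowest(candidates)
print(best_name)
```

Passing `metric="alpha_weighted"` would rank by the power-law exponent instead, provided the summaries were produced with `alphas=True`.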