<a href="https://colab.research.google.com/github/CalculatedContent/WeightWatcher/blob/master/examples/WeightWatcher-VGG.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


# WeightWatcher - VGG

https://towardsdatascience.com/the-w3h-of-alexnet-vggnet-resnet-and-inception-7baaaecccc96


VGG is one of the first large-scale modern architectures based on the classic convolutional model of LeCun, resembling a larger-scale version of LeNet5.  While there are earlier models (AlexNet, etc.), VGG is useful because:

-  There are several variants, with 11, 13, 16, and 19 layers, and with and without Batch Normalization
-  It is widely available in PyTorch and other frameworks
-  There are versions trained on all of ImageNet, ImageNet-1K (a smaller data set), etc.
-  Although large, it is still used in Transfer Learning.

The general VGG series architecture consists of 

-  Several sets of Conv2D+ReLU layers (followed by Max Pooling), with feature maps increasing in size
-  3 final Fully Connected (Dense/Linear) layers
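
The layer counts in the variant names can be recovered from this layout. The sketch below assumes the standard VGG configurations (A, B, D, and E); the lists are written out here for illustration and are not read from the models loaded later in this notebook. Integers are the output channels of a Conv2D layer and `"M"` marks a Max Pooling step.

```python
# Assumed: the standard VGG A/B/D/E convolutional layouts.
# Integers are Conv2D output channels, "M" is a Max Pooling step.
VGG_CONFIGS = {
    "vgg11": [64, "M", 128, "M", 256, 256, "M", 512, 512, "M", 512, 512, "M"],
    "vgg13": [64, 64, "M", 128, 128, "M", 256, 256, "M",
              512, 512, "M", 512, 512, "M"],
    "vgg16": [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
              512, 512, 512, "M", 512, 512, 512, "M"],
    "vgg19": [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M",
              512, 512, 512, 512, "M", 512, 512, 512, 512, "M"],
}
NUM_FC_LAYERS = 3  # the final Fully Connected layers


def count_weight_layers(config):
    """Count layers with weights: every Conv2D entry plus the FC layers."""
    num_conv = sum(1 for item in config if item != "M")
    return num_conv + NUM_FC_LAYERS


layer_counts = {name: count_weight_layers(cfg) for name, cfg in VGG_CONFIGS.items()}
print(layer_counts)
```

Only layers with weights are counted, so pooling and ReLU steps do not contribute to the number in the model name.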

Below we show the VGG16 architecture, consisting of 16 layers.

The VGG models are considered very large, with an enormous number of parameters compared to later models like the ResNet and DenseNet series.  Most notably, compared to later models,

-  VGG models contain large FC layers at the end
-  VGG does not contain residual connections
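
To see how much the FC layers dominate, we can count parameters by hand. This is a minimal sketch under stated assumptions: the standard VGG16 layout, 3x3 convolutions with bias terms, a 3-channel 224x224 input (so a 7x7x512 feature map enters the first FC layer), and 1000 output classes. None of these numbers are read from the models loaded in this notebook.

```python
# Assumptions: standard VGG16 layout, 3x3 convolutions with bias,
# 224x224 RGB input (7x7x512 after five poolings), 1000 classes.
VGG16_CONV = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
              512, 512, 512, "M", 512, 512, 512, "M"]
FC_SIZES = [512 * 7 * 7, 4096, 4096, 1000]


def conv_parameters(config, in_channels=3, kernel=3):
    """Sum weights and biases over all Conv2D layers in the layout."""
    total = 0
    for item in config:
        if item == "M":
            continue  # pooling has no trainable parameters
        total += in_channels * item * kernel * kernel + item
        in_channels = item
    return total


def fc_parameters(sizes):
    """Sum weights and biases over consecutive Fully Connected layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))


conv_total = conv_parameters(VGG16_CONV)
fc_total = fc_parameters(FC_SIZES)
total = conv_total + fc_total
print(f"conv: {conv_total:,}  fc: {fc_total:,}  total: {total:,}")
print(f"fraction in FC layers: {fc_total / total:.1%}")
```

Under these assumptions the model has about 138 million parameters, and roughly 89% of them sit in the three FC layers.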

(note: Figures may not display in Google Colab)


In [1]:
!touch VGG16.1.png VGG16.2.png CV-models.png

In [None]:
from IPython.display import Image
Image(filename='VGG16.1.png',width=600, height=300)

In [None]:
Image(filename='VGG16.2.png',width=800, height=400)

It is noted that the VGGNet model (16?) has over 2X the number of parameters of one of the largest ResNet models, with far lower top-5 accuracy.

In [None]:
from IPython.display import Image
Image(filename='CV-models.png',width=800, height=400)

## Summary of results


We first compute the Average Alpha $\langle\alpha\rangle$ for all models, and compare it to the Test Accuracy across the models.  In contrast to expectations, and to other models like ResNet, on average, $\langle\alpha\rangle$ is increasing with Test Accuracy instead of decreasing.  In fact, $\langle\alpha\rangle$ is strongly *negatively correlated* with reported Test Accuracy.

#### Why is this ?

If we look at $\alpha$ vs Layer Id, we see that $\alpha$ is increasing with Layer Id. That is, as information flows through the network, the layers are less and less correlated.  This suggests (to me) that the VGG networks are fairly bad at funneling information through the network.

Instead, we need to use the Weighted Alpha metric $\hat{\alpha}$ , which is positively correlated with the Test Accuracy.  
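
As a rough illustration of how the two metrics differ, the sketch below assumes that the weighted alpha of a layer is $\alpha\log_{10}\lambda_{max}$, where $\lambda_{max}$ is the largest eigenvalue of the layer correlation matrix, and that the model-level value is the plain average over layers. The per-layer numbers are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math

# Hypothetical per-layer values, for illustration only
alphas = [2.0, 3.0, 4.0]
lambda_maxes = [100.0, 10.0, 1.0]

# Assumed definition: weighted alpha of a layer = alpha * log10(lambda_max)
weighted = [a * math.log10(lm) for a, lm in zip(alphas, lambda_maxes)]

avg_alpha = sum(alphas) / len(alphas)      # plain layer average
alpha_hat = sum(weighted) / len(weighted)  # weighted layer average

print("per-layer weighted alpha:", weighted)
print("average alpha:", avg_alpha)
print("weighted alpha:", alpha_hat)
```

In this toy case the layer with the largest $\alpha$ has the smallest $\lambda_{max}$ and contributes nothing to $\hat{\alpha}$, which is how the weighting can change the trend seen with the plain average.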

## Calculation of Results

In [None]:
# Suppress the powerlaw package warnings
# "powerlaw.py:700: RuntimeWarning: divide by zero encountered in true_divide"
# "powerlaw.py:700: RuntimeWarning: invalid value encountered in true_divide"
import warnings
warnings.simplefilter(action='ignore', category=RuntimeWarning)

In [None]:
#%%capture
#!pip install weightwatcher

In [None]:
import numpy as np
import pandas as pd

from tqdm import tqdm

from sklearn.linear_model import LinearRegression
from sklearn import metrics
import scipy.stats as stats

import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline

### Import WeightWatcher

Set custom logging at INFO level

In [None]:
import weightwatcher as ww
import torchvision
import torchvision.models as models


import logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(ww.__name__)
logger.setLevel(logging.INFO) 


ww.__version__,  torchvision.__version__

### Create all models now

Pick colors from https://matplotlib.org/3.1.0/gallery/color/named_colors.html

In [None]:
series_name = 'VGG'
all_names = [ 'vgg11', 'vgg11_bn', 'vgg13', 'vgg13_bn', 'vgg16', 'vgg16_bn', 'vgg19', 'vgg19_bn']
colors =    ['indigo', 'blue',    'purple',  'cyan',   'darkgreen','goldenrod','darkorange','red']


all_models = []
all_models.append(models.vgg11(weights="IMAGENET1K_V1"))
all_models.append(models.vgg11_bn(weights="IMAGENET1K_V1"))

all_models.append(models.vgg13(weights="IMAGENET1K_V1"))
all_models.append(models.vgg13_bn(weights="IMAGENET1K_V1"))

all_models.append(models.vgg16(weights="IMAGENET1K_V1"))
all_models.append(models.vgg16_bn(weights="IMAGENET1K_V1"))

all_models.append(models.vgg19(weights="IMAGENET1K_V1"))
all_models.append(models.vgg19_bn(weights="IMAGENET1K_V1"))


### Get reported accuracies from pytorch website

https://pytorch.org/docs/stable/torchvision/models.html

<pre>
<table class="docutils align-default">
<colgroup>
<col style="width: 55%" />
<col style="width: 22%" />
<col style="width: 22%" />
</colgroup>
<thead>
<tr class="row-odd"><th class="head"><p>Network</p></th>
<th class="head"><p>Top-1 error</p></th>
<th class="head"><p>Top-5 error</p></th>
</tr>
</thead>
<tbody>

<tr class="row-odd"><td><p>VGG-11</p></td>
<td><p>30.98</p></td>
<td><p>11.37</p></td>
</tr>
<tr class="row-even"><td><p>VGG-13</p></td>
<td><p>30.07</p></td>
<td><p>10.75</p></td>
</tr>
<tr class="row-odd"><td><p>VGG-16</p></td>
<td><p>28.41</p></td>
<td><p>9.62</p></td>
</tr>
<tr class="row-even"><td><p>VGG-19</p></td>
<td><p>27.62</p></td>
<td><p>9.12</p></td>
</tr>
<tr class="row-odd"><td><p>VGG-11 with batch normalization</p></td>
<td><p>29.62</p></td>
<td><p>10.19</p></td>
</tr>
<tr class="row-even"><td><p>VGG-13 with batch normalization</p></td>
<td><p>28.45</p></td>
<td><p>9.63</p></td>
</tr>
<tr class="row-odd"><td><p>VGG-16 with batch normalization</p></td>
<td><p>26.63</p></td>
<td><p>8.50</p></td>
</tr>
<tr class="row-even"><td><p>VGG-19 with batch normalization</p></td>
<td><p>25.76</p></td>
<td><p>8.15</p></td>
</tr>
</tbody>
</table>
</pre>

In [None]:
top1_errors= {
    
    "vgg11": 30.98,
    "vgg11_bn": 29.62,
    "vgg13": 30.07,
    "vgg13_bn": 28.45,
    "vgg16": 28.41,
    "vgg16_bn": 26.63,
    "vgg19": 27.62,
    "vgg19_bn": 25.76,
}

In [None]:
top5_errors= {
    
    "vgg11": 11.37,
    "vgg11_bn": 10.19,
    "vgg13": 10.75,
    "vgg13_bn": 9.63,
    "vgg16": 9.62,
    "vgg16_bn": 8.50,
    "vgg19": 9.12,
    "vgg19_bn": 8.15,
}

### Run WeightWatcher, collect summary and details (as dataframes) for all models

 The following metrics are analyzed below
 
- 'log_norm'
- 'alpha'
- 'alpha_weighted'

Other metrics are available by default in the summary statistics

    'log_alpha_norm', 'log_spectral_norm', 'stable_rank', 'mp_softrank'
    
For new metrics, the summary can be computed directly from the details dataframes
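
A minimal sketch of that idea, assuming each details dataframe has one row per layer and that a summary value is a plain column average. The dataframe below is a hypothetical stand-in with made-up values, and `alpha_median` is an illustrative name, not a WeightWatcher metric.

```python
import pandas as pd

# Hypothetical stand-in for one `details` dataframe (one row per layer);
# the values are made up for illustration.
details = pd.DataFrame({
    "layer_id": [0, 1, 2, 3],
    "alpha": [2.0, 2.5, 3.5, 6.0],
    "log_norm": [1.0, 1.5, 2.0, 2.5],
})

# Layer average, in the spirit of the existing summary metrics (assumed)
mean_alpha = details["alpha"].mean()

# A new summary metric computed the same way: the median over layers
median_alpha = details["alpha"].median()

summary = {"alpha": mean_alpha, "alpha_median": median_alpha}
print(summary)
```

The median is less sensitive than the mean to a single layer with a very large value, such as the last row here.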


In [None]:
#%%capture

import warnings
warnings.filterwarnings('ignore')

all_summaries, all_details = [], []
for im, name in enumerate(tqdm(all_names)):
    print(im, name)
    watcher = ww.WeightWatcher(model=all_models[im]) 
    details = watcher.analyze(ww2x=True, mp_fit=True, randomize=False, vectors=False, min_evals=50)
    all_details.append(details)
    all_summaries.append(watcher.get_summary(details))

    

### The summary contains the following metrics (currently)

In [None]:

plt.rcParams.update({'font.size': 10})
from pylab import rcParams

params = {'legend.fontsize': 'x-large',
          'figure.figsize': (7,7),
         'axes.labelsize': 'x-large',
         'axes.titlesize':'xx-large',
         'xtick.labelsize':'xx-large',
         'ytick.labelsize':'xx-large'}
plt.rcParams.update(params)

In [None]:
def plot_test_accuracy(metric, xlabel, title, series_name, \
                       all_names, all_summaries, top_errors, save=False):
    """Create plot of Metric vs Reported Test Accuracy, and run Linear Regression"""
    
    num = len(all_names)
    xs, ys = np.empty(num), np.empty(num)
    for im, modelname in enumerate(all_names):    

        summary = all_summaries[im]
        x = summary[metric]
        xs[im] = x

        error = top_errors[modelname]
        y = 100.0-error
        ys[im] = y

        label = modelname
        plt.scatter(x, y, label=label)


    xs = xs.reshape(-1,1)
    ys = ys.reshape(-1,1)
    regr = LinearRegression()
    regr.fit(xs, ys)
    y_pred = regr.predict(xs)
    plt.plot(xs, y_pred, color='red', linewidth=1)

    rmse = np.sqrt(metrics.mean_squared_error(ys, y_pred))
    r2 = metrics.r2_score(ys, y_pred)

    # Kendall tau expects 1-D inputs; report RMSE, R2 and tau in the title
    tau, p_value = stats.kendalltau(xs.ravel(), ys.ravel())
    title2 = " RMSE: {:0.2} R2: {:0.2} K-tau: {:0.2}".format(rmse, r2, tau)

    plt.legend()
    plt.title("Test Accuracy vs "+title+"\n"+xlabel+title2)
    plt.ylabel(r"Test Accuracy")
    plt.xlabel(xlabel);
    
    if save:
        figname = "img/{}_{}_accs.png".format(series_name, metric)
        print("saving {}".format(figname))
        plt.savefig(figname)
    # Show the figure whether or not it was saved
    plt.show()

### Plots of Test Accuracy vs WeightWatcher metrics
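
The plot titles report a Kendall tau rank correlation between the metric and the Test Accuracy. As a reminder of what that number measures, here is a small pure-Python sketch of the tau-a variant (no tie correction), applied to the reported Top-1 errors of the four models without Batch Normalization. Note that `scipy.stats.kendalltau` computes a tie-corrected variant by default; when there are no ties, as here, the two agree.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """Kendall tau-a: (concordant - discordant) / number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


# Depth and reported Top-1 errors for vgg11, vgg13, vgg16, vgg19 (no BN)
depths = [11, 13, 16, 19]
top1_accuracy = [100.0 - e for e in [30.98, 30.07, 28.41, 27.62]]

tau = kendall_tau_a(depths, top1_accuracy)
print("Kendall tau (depth vs Top-1 accuracy):", tau)
```

A value of +1 means the two orderings agree on every pair of models, and -1 means they disagree on every pair; only the ranking matters, not the size of the differences.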

In [None]:
metric = "log_norm"
xlabel = r"$\langle\log\Vert W\Vert_{F}\rangle$"
title = "Avg. log Frobenius Norm "
plot_test_accuracy(metric, xlabel, title, series_name, all_names, all_summaries, top1_errors, save = False)

In [None]:
metric = "alpha"
xlabel = r"$\langle\alpha\rangle$"
title = "Avg. Alpha  "
plot_test_accuracy(metric, xlabel, title, series_name, all_names, all_summaries, top1_errors)

In [None]:
metric = "alpha_weighted"
xlabel = r"$\hat{\alpha}$"
title = "Weighted Alpha "
plot_test_accuracy(metric, xlabel, title, series_name,  all_names, all_summaries, top1_errors)

### Correlation Flow

VGG is an odd architecture in that the Alpha $\alpha$ metric gets worse with increasing depth.

This suggests that the Correlation Flow is so poor that we cannot use the layer-averaged Alpha $\alpha$ metric as a total model metric.

Notice that the Flow of Alpha-Hat $(\hat{\alpha})$ is more aligned between models of very different depths (VGG11 vs VGG19).
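
One simple way to put a number on such a trend is the least-squares slope of a per-layer metric against the layer id. The sketch below is an illustration only: the helper name `least_squares_slope` and the per-layer values are hypothetical, not WeightWatcher output.

```python
def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through the points (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Hypothetical per-layer alpha values that grow with depth
layer_ids = [0, 1, 2, 3, 4]
alpha_per_layer = [2.0, 2.5, 3.0, 3.5, 4.0]

slope = least_squares_slope(layer_ids, alpha_per_layer)
print("change in alpha per layer:", slope)
```

A positive slope means the metric grows with depth, which is the pattern described above for $\alpha$; comparing slopes across models gives a rough sense of how well their flows align.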

In [None]:
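def plot_correlation_flow(metric, ylabel, title, series_name,
    all_names, all_details, colors, log=False, valid_ids=None, save=False):
    """Scatter plot of a per-layer metric vs Layer id for the selected models"""

    # Default to all models when no subset is given
    if valid_ids is None or len(valid_ids) == 0:
        valid_ids = range(len(all_details))
        idname='all'
    else:
        idname='fnl'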
        
        
    for im, details in enumerate(all_details):
        if im in valid_ids:
            
            details = all_details[im]
            name = all_names[im]
            x = details.index.to_numpy()
            y = details[metric].to_numpy()
            if log:
                # np.float was removed from NumPy; the builtin float is equivalent
                y = np.log10(np.array(y+0.000001, dtype=float))

            plt.scatter(x,y, label=name, color=colors[im])

    plt.legend()
    plt.title("Depth vs "+title+" "+ylabel)
    plt.xlabel("Layer id")
    plt.ylabel(ylabel)
    
    if save:
        figname = "img/{}_{}_{}_depth.png".format(series_name, idname, metric)
        print("saving {}".format(figname))
        plt.savefig(figname)
    # Show the figure whether or not it was saved
    plt.show()

In [None]:
metric = "alpha"
xlabel = r"Alpha $\alpha$"
title = 'VGG'
plot_correlation_flow(metric, xlabel, title, series_name, all_names, all_details, colors, log=False, valid_ids = [0, 6])

In [None]:
metric = "alpha_weighted"
xlabel = r"Alpha-Hat $(\hat{\alpha})$"
title = 'VGG'
plot_correlation_flow(metric, xlabel, title, series_name, all_names, all_details, colors, log=False, valid_ids = [0, 6])