# Question: Given a pretrained model, how small can we make the file that stores the model weights?

## Let's calculate our theoretical lower limit. We can do this with a simple calculation:

$$ \text{min file size} = \text{num of model params} \times \text{size per param} $$

In [1]:
import torchvision.models as models
import torch
import numpy as np

In [2]:
model = models.vgg16(pretrained=True)

In [3]:
# http://pytorch.org/docs/master/tensors.html
# Bits per element for each tensor type, keyed by the string that `tensor.type()` returns.
PRECISION = {
    'torch.FloatTensor': 32,
    'torch.DoubleTensor': 64,
    'torch.HalfTensor': 16,
    'torch.ByteTensor': 8,
    'torch.CharTensor': 8,
    'torch.ShortTensor': 16,
    'torch.IntTensor': 32,
    'torch.LongTensor': 64,
}
BITS_PER_BYTE = 8

def get_param_count(model):
    # Total number of scalar weights across all parameter tensors.
    num_model_params = 0
    for p in model.parameters():
        num_model_params += p.numel()
    return num_model_params

def get_param_precision(model):
    # Assumes every parameter has the same type as the first one.
    tensor_type = next(model.parameters()).data.type()
    return PRECISION[tensor_type]

def get_min_file_size(model):
    num_model_params = get_param_count(model)
    # Integer division keeps the result an exact byte count.
    size_per_param = get_param_precision(model) // BITS_PER_BYTE
    return num_model_params * size_per_param

min_file_size = get_min_file_size(model)
print('min file size = {:,} bytes '.format(min_file_size))

min file size = 553,430,176 bytes 


## So our lower limit is ~553 MB. 
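
To make the "~553 MB" figure concrete, here is a minimal sketch of the unit conversion. It assumes the 553,430,176 bytes printed above and 4 bytes per float32 weight; the helper name `to_units` is hypothetical and not part of the original notebook.

```python
# Sketch: express the computed lower limit in human-readable units.
# Assumption: 553,430,176 bytes is the value printed by the cell above.
MIN_FILE_SIZE_BYTES = 553430176
BYTES_PER_FLOAT32 = 4

def to_units(num_bytes):
    """Return the size in decimal megabytes (MB) and binary mebibytes (MiB)."""
    megabytes = num_bytes / 1e6            # 1 MB  = 10**6 bytes
    mebibytes = num_bytes / float(2 ** 20)  # 1 MiB = 2**20 bytes
    return megabytes, mebibytes

# Parameter count implied by the byte total.
num_params = MIN_FILE_SIZE_BYTES // BYTES_PER_FLOAT32

mb, mib = to_units(MIN_FILE_SIZE_BYTES)
print('{:,} params'.format(num_params))
print('{:.1f} MB, {:.1f} MiB'.format(mb, mib))
```

The two units differ by about 5%, so it is worth stating which one a reported file size uses.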

## How close is this calculation to the value we see when using `torch.save`?

In [4]:
import os

In [6]:
def get_actual_file_size(model):
    file_name = 'model.pt'
    torch.save(model, file_name)
    return os.path.getsize(file_name)

actual_file_size = get_actual_file_size(model)
print('actual file size = {:,}'.format(actual_file_size))
# Relative difference, expressed as a percentage of the lower limit.
print('difference = {:.2}%'.format(100.0 * abs(min_file_size - actual_file_size) / min_file_size))

actual file size = 553,458,637
difference = 0.0051%
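
As a rough, torch-free illustration of why a serialized file can sit so close to the lower limit, the sketch below pickles a flat array of 32-bit floats and compares the result with the raw payload size. The standard library's `array` and `pickle` modules stand in for `torch.save` here, which is an assumption made for illustration; the notebook itself does not use them, and all names below are hypothetical.

```python
import array
import pickle

# Sketch only: `array` + `pickle` stand in for a model and torch.save.
NUM_VALUES = 1000000
weights = array.array('f', [0.0]) * NUM_VALUES  # one million 32-bit floats

# Raw data size: number of values times bytes per value (the "lower limit").
payload_bytes = NUM_VALUES * weights.itemsize
# Size after serialization: payload plus a small amount of bookkeeping.
serialized_bytes = len(pickle.dumps(weights, protocol=pickle.HIGHEST_PROTOCOL))

overhead_bytes = serialized_bytes - payload_bytes
overhead_pct = 100.0 * overhead_bytes / payload_bytes

print('payload    = {:,} bytes'.format(payload_bytes))
print('serialized = {:,} bytes'.format(serialized_bytes))
print('overhead   = {:.4f}%'.format(overhead_pct))
```

The bookkeeping cost is roughly constant, so its share shrinks as the payload grows.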


## The actual file size is approximately equal to the lower limit. Is this a fluke?

In [9]:
# WARNING: This will take a while to run, especially if it is your first
# time using one of the pretrained models; pytorch needs to download the
# weights if they do not already exist locally.

excluded = [models.ResNet, models.VGG, 
            models.DenseNet, models.AlexNet, 
            models.SqueezeNet, models.Inception3]

# https://stackoverflow.com/questions/21885814/how-to-iterate-through-a-modules-functions
max_diff = 0.0
for _, method in models.__dict__.items():
    # Is it a pretrained model?
    if callable(method) and method not in excluded:
        model = method(pretrained=True)
        model_min_size = get_min_file_size(model)
        # Relative difference in percent, measured against this model's own lower limit.
        diff = 100.0 * abs(model_min_size - get_actual_file_size(model)) / model_min_size
        max_diff = max(diff, max_diff)

print('max difference = {:.2}%'.format(max_diff))

Downloading: "https://download.pytorch.org/models/inception_v3_google-1a9a5a14.pth" to /Users/jkarimi91/.torch/models/inception_v3_google-1a9a5a14.pth
100.0%
Downloading: "https://download.pytorch.org/models/vgg13-c768596a.pth" to /Users/jkarimi91/.torch/models/vgg13-c768596a.pth
100.0%
Downloading: "https://download.pytorch.org/models/densenet201-c1103571.pth" to /Users/jkarimi91/.torch/models/densenet201-c1103571.pth
100.0%
Downloading: "https://download.pytorch.org/models/vgg11-bbd30ac9.pth" to /Users/jkarimi91/.torch/models/vgg11-bbd30ac9.pth
100.0%
Downloading: "https://download.pytorch.org/models/resnet101-5d3b4d8f.pth" to /Users/jkarimi91/.torch/models/resnet101-5d3b4d8f.pth
100.0%
Downloading: "https://download.pytorch.org/models/squeezenet1_0-a815701f.pth" to /Users/jkarimi91/.torch/models/squeezenet1_0-a815701f.pth
100.0%
Downloading: "https://download.pytorch.org/models/squeezenet1_1-f364aa15.pth" to /Users/jkarimi91/.torch/models/squeezenet1_1-f364aa15.pth
100.0%
Downloadin

max difference = 0.21%


## Nope, `torch.save` really is optimal!

## So what does this tell us, i.e., what is the answer to our original question?

# Answer: For a given parameter count and precision, our file sizes are already essentially optimal. To reduce the file size further, we would have to use lower-precision weights or a smaller model altogether.
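
To see how much lower-precision weights could save, here is a small sketch that reapplies the lower-limit formula at several bit widths. It assumes 138,357,544 parameters (553,430,176 bytes divided by 4 bytes per float32 weight); the function name `min_file_size_bytes` is hypothetical. It covers storage arithmetic only and says nothing about how accuracy is affected.

```python
# Sketch: lower limit on file size for one parameter count at several precisions.
# Assumption: 138,357,544 parameters, i.e. 553,430,176 bytes / 4 bytes per weight.
NUM_PARAMS = 138357544
BITS_PER_BYTE = 8

def min_file_size_bytes(num_params, bits_per_param):
    """Lower limit in bytes: number of parameters times bytes per parameter."""
    return num_params * bits_per_param // BITS_PER_BYTE

sizes = {}
for name, bits in [('float32', 32), ('float16', 16), ('int8', 8)]:
    sizes[name] = min_file_size_bytes(NUM_PARAMS, bits)
    print('{:>8}: {:,} bytes'.format(name, sizes[name]))
```

Halving the precision halves the lower limit, so 16-bit weights would bring the bound to roughly 277 MB and 8-bit weights to roughly 138 MB.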