# Pruning with our Automatic Structured Pruning framework
Welcome to an end-to-end example of magnitude-based weight pruning.

**Summary**

In this tutorial, you will:

* Train a tf.keras model for CIFAR10 from scratch.
* Fine-tune the model by applying the pruning framework and compare the accuracy.

If you execute this notebook in Google Colab, the cell below clones the repository and adds its source folder to the Python path. Otherwise it uses the local `../src` folder.

In [1]:
import sys

if 'google.colab' in sys.modules:
    !git clone https://github.com/Hahn-Schickard/Automatic-Structured-Pruning
    !echo $CWD
    sys.path.append("Automatic-Structured-Pruning/src")
else:
    sys.path.append("../src")
    
import pruning

## Train a model for CIFAR10 without pruning
Download and prepare the CIFAR10 dataset.
The CIFAR10 dataset contains 60,000 color images in 10 classes, with 6,000 images in each class. The dataset is divided into 50,000 training images and 10,000 testing images. The classes are mutually exclusive and there is no overlap between them.

Create the convolutional base
The 6 lines of code below define the convolutional base using a common pattern: a stack of Conv2D and MaxPooling2D layers.

As input, a CNN takes tensors of shape (image_height, image_width, color_channels), ignoring the batch size. If you are new to these dimensions, color_channels refers to (R,G,B). In this example, you will configure our CNN to process inputs of shape (32, 32, 3), which is the format of CIFAR images. You can do this by passing the argument input_shape to our first layer.

To complete our model, you will feed the last output tensor from the convolutional base (of shape (2, 2, 64)) into one or more Dense layers to perform classification. Dense layers take vectors as input (which are 1D), while the current output is a 3D tensor. First, you will flatten (or unroll) the 3D output to 1D, then add one or more Dense layers on top. CIFAR has 10 output classes, so you use a final Dense layer with 10 outputs and a softmax activation.

In [1]:
import tensorflow as tf
from tensorflow.keras import datasets, layers, models


(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()

# Normalize pixel values to be between 0 and 1
train_images, test_images = train_images / 255.0, test_images / 255.0

model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dropout(0.3))
model.add(layers.Dense(32, activation='relu'))
model.add(layers.Dropout(0.25))
model.add(layers.Dense(10, activation='softmax'))

model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 30, 30, 32)        896       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 15, 15, 32)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 13, 13, 64)        18496     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 6, 6, 64)          0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 4, 4, 64)          36928     
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 2, 2, 64)          0         
_________________________________________________________________
flatten (Flatten)            (None, 256)               0

Above, you can see that the output of every Conv2D and MaxPooling2D layer is a 3D tensor of shape (height, width, channels). The width and height dimensions tend to shrink as you go deeper in the network. The number of output channels for each Conv2D layer is controlled by the first argument (e.g., 32 or 64). Typically, as the width and height shrink, you can afford (computationally) to add more output channels in each Conv2D layer.
As you can see, our (2, 2, 64) outputs were flattened into vectors of shape (256) before going through the Dense layers.
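The shapes in the summary can be reproduced by hand. The following sketch assumes what the model code uses: 3x3 convolutions without padding and 2x2 max pooling. The helper names `conv_out`, `pool_out`, and `trace_shapes` are made up for this illustration.

```python
def conv_out(size, kernel=3, stride=1):
    """Output width/height of an unpadded ('valid') convolution."""
    return (size - kernel) // stride + 1


def pool_out(size, pool=2):
    """Output width/height of a non-overlapping max pooling layer."""
    return size // pool


def trace_shapes(size=32, filters=(32, 64, 64)):
    """Return the (height, width, channels) shape after every Conv2D/MaxPooling2D layer."""
    shapes = []
    for channels in filters:
        size = conv_out(size)
        shapes.append((size, size, channels))
        size = pool_out(size)
        shapes.append((size, size, channels))
    return shapes


shapes = trace_shapes()
height, width, channels = shapes[-1]
flat = height * width * channels  # length of the vector produced by Flatten

print(shapes)
print(flat)
```

The last entry is the tensor that enters the Flatten layer, so `flat` is the input size of the first Dense layer.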

## Compile and train the model

In [2]:
comp = {
    "optimizer": 'adam',
    "loss": tf.keras.losses.SparseCategoricalCrossentropy(),
    "metrics": ['accuracy'],
}

model.compile(**comp)
callbacks = [tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)]

model.fit(train_images, train_labels, validation_split=0.2, epochs=1, batch_size=128, callbacks=callbacks)

model_test_loss, model_test_acc = model.evaluate(test_images, test_labels, verbose=2)
print(f"Model accuracy after Training: {model_test_acc*100:.2f}%")

313/313 - 1s - loss: 1.6066 - accuracy: 0.4124
Model accuracy after Training: 41.24%


## Fine-tune pre-trained model with pruning
You will apply pruning to the whole model and see this in the model summary.

In this example, you prune the model with 40% dense pruning and 40% filter pruning.
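To make the idea of filter pruning concrete, the following NumPy sketch ranks the filters of a convolution kernel by their L1 norm and removes the weakest 40%. It is an illustration only, not the framework's implementation: the helper `filters_to_prune_l1`, the random kernels, and the round-down rule are assumptions made for this example.

```python
import numpy as np


def filters_to_prune_l1(kernel, prune_percent):
    """Return the sorted indices of the filters with the smallest L1 norm.

    `kernel` uses the Keras Conv2D layout (kernel_h, kernel_w, in_channels, filters).
    """
    num_filters = kernel.shape[-1]
    num_to_prune = num_filters * prune_percent // 100  # round down (assumption)
    l1_norms = np.abs(kernel).sum(axis=(0, 1, 2))      # one norm per filter
    return np.sort(np.argsort(l1_norms)[:num_to_prune])


rng = np.random.default_rng(0)
kernel = rng.normal(size=(3, 3, 3, 32))        # first conv layer: 32 filters
bias = np.zeros(32)
next_kernel = rng.normal(size=(3, 3, 32, 64))  # second conv layer reads 32 channels

drop = filters_to_prune_l1(kernel, 40)

# Structured pruning removes whole filters together with their biases ...
pruned_kernel = np.delete(kernel, drop, axis=-1)
pruned_bias = np.delete(bias, drop)
# ... and the matching input channels of the following layer.
pruned_next_kernel = np.delete(next_kernel, drop, axis=2)

print(pruned_kernel.shape, pruned_bias.shape, pruned_next_kernel.shape)
```

Because entire filters disappear, the resulting model is a smaller dense network rather than a sparse one, which is why the parameter count in the model summary shrinks.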

In [3]:
import asp as pruning

In [7]:
dir(pruning)

['Model',
 'NamedTuple',
 'NetStructure',
 'Sequential',
 'ThresholdCallback',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__path__',
 '__spec__',
 'accuracy_pruning',
 'build_pruned_model',
 'copy',
 'delete_dense_neuron',
 'delete_filter',
 'factor_pruning',
 'get_filter_to_prune_avarage',
 'get_filter_to_prune_l2',
 'get_last_layer_with_params',
 'get_layer_shape_conv',
 'get_layer_shape_dense',
 'get_neurons_to_prune_l1',
 'get_neurons_to_prune_l2',
 'load_model',
 'load_model_param',
 'model_pruning',
 'np',
 'os',
 'prun_filters_conv',
 'prun_neurons_dense',
 'pruning',
 'pruning_helper_classes',
 'pruning_helper_functions',
 'pruning_helper_functions_conv',
 'pruning_helper_functions_dense',
 'stepwise_accuracy_pruning',
 'stepwise_factor_pruning',
 'tf',
 'train_test_split']

In [4]:
dense_prune_rate=40
conv_prune_rate=40
pruned_model=pruning.factor_pruning(model, dense_prune_rate, conv_prune_rate,'L1')


The pruned model has fewer parameters than the original model, as its summary shows.

## Compile and re-train the model

In [5]:
pruned_model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 30, 30, 20)        560       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 15, 15, 20)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 13, 13, 39)        7059      
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 6, 6, 39)          0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 4, 4, 39)          13728     
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 2, 2, 39)          0         
_________________________________________________________________
flatten (Flatten)            (None, 156)               0

In [6]:
comp = {
    "optimizer": 'adam',
    "loss": tf.keras.losses.SparseCategoricalCrossentropy(),
    "metrics": ['accuracy'],
}

pruned_model.compile(**comp)

pruned_model.fit(train_images, train_labels, epochs=10, validation_split=0.2)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x202455a16d0>

Compare both models

In [7]:
model_test_loss, model_test_acc = model.evaluate(test_images,  test_labels, verbose=2)
pruned_model_test_loss, pruned_model_test_acc = pruned_model.evaluate(test_images,  test_labels, verbose=2)

print(f"Model accuracy before pruning: {model_test_acc*100:.2f}%")
print(f"Model accuracy after pruning: {pruned_model_test_acc*100:.2f}%")

313/313 - 1s - loss: 0.9143 - accuracy: 0.6925
313/313 - 1s - loss: 0.9476 - accuracy: 0.6848
Model accuracy before pruning: 69.25%
Model accuracy after pruning: 68.48%


In [8]:
print(f"Total number of parameters before pruning: {model.count_params()}")
print(f"Total number of parameters after pruning: {pruned_model.count_params()}")
print(f"Pruned model contains only {(pruned_model.count_params()/model.count_params())*100:.2f}% of the original number of parameters.")

Total number of parameters before pruning: 75178
Total number of parameters after pruning: 28480
Pruned model contains only 37.88% of the original number of parameters.
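The parameter counts reported above can be checked by hand. The sketch below counts weights and biases for the layer stack used in this tutorial. The helper `count_params` is written for this illustration, and the pruned Dense widths (39 and 20) are an assumption: they follow from removing 40% of the units, rounded down, which is consistent with the reported total.

```python
def count_params(conv_filters, dense_units, in_channels=3, kernel=3, image_size=32):
    """Count the trainable parameters of the tutorial's Conv2D/MaxPooling2D/Dense stack."""
    total, size, channels = 0, image_size, in_channels
    for filters in conv_filters:
        total += (kernel * kernel * channels + 1) * filters  # weights + one bias per filter
        size = (size - kernel + 1) // 2                      # valid conv, then 2x2 pooling
        channels = filters
    inputs = size * size * channels                          # Flatten
    for units in dense_units:
        total += (inputs + 1) * units                        # weights + one bias per neuron
        inputs = units
    return total


original = count_params((32, 64, 64), (64, 32, 10))
# Layer widths after 40% pruning with the number of removed units rounded down
# (assumption); the 10-unit output layer is left untouched.
pruned = count_params((20, 39, 39), (39, 20, 10))
ratio = f"{pruned / original * 100:.2f}"

print(original, pruned, ratio)
```

Note that removing 40% of the filters removes more than 40% of the parameters, because each convolution loses both output filters and input channels.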


# Prune a model to a maximum accuracy loss

We define the arguments to compile the model. In this case, we accept an accuracy loss of at most 3 percentage points.
This example loads the data directly from a TensorFlow dataset, so there is no dataloader defined by a path or file. The training data nevertheless has the same structure as data read from a Python file. As a workaround, an existing file from the current folder is passed as the dataloader path, so that the correct functions are executed afterwards and no error is raised.
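The idea of pruning up to a maximum accuracy loss can be sketched as a search over pruning factors. The code below is a simplified illustration, not the framework's algorithm: `largest_factor_within_budget` and `toy_evaluate` are hypothetical, and a real `evaluate` callback would prune a copy of the model, retrain it, and measure the test accuracy.

```python
def largest_factor_within_budget(evaluate, start_acc, max_acc_loss=3, step=5, max_factor=95):
    """Return the largest tested pruning factor whose accuracy stays within the budget.

    `evaluate(factor)` is a hypothetical callback that returns the accuracy (in
    percent) of the model after pruning it by `factor` percent and retraining.
    """
    min_acc = start_acc - max_acc_loss
    best = 0
    for factor in range(step, max_factor + 1, step):
        if evaluate(factor) < min_acc:
            break  # accuracy dropped below the minimum, keep the previous factor
        best = factor
    return best, min_acc


def toy_evaluate(factor):
    """Toy stand-in for prune + retrain + evaluate: accuracy falls off quadratically."""
    return 69.25 - (factor / 30) ** 2


best, min_acc = largest_factor_within_budget(toy_evaluate, start_acc=69.25)
print(best, min_acc)
```

With a start accuracy of 69.25% and `max_acc_loss=3`, the minimum required accuracy is 66.25%, which matches the value printed by `accuracy_pruning` in the next cell.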

In [9]:
comp = {
  "optimizer": 'adam',
  "loss": tf.keras.losses.SparseCategoricalCrossentropy(),
  "metrics": 'accuracy'
}

auto_model = pruning.accuracy_pruning(model, comp, train_images, train_labels, test_images,
                                     test_labels, pruning_acc=None, max_acc_loss=3,
                                     label_one_hot=False)

Start model accuracy: 69.25 %
Minimum required model accuracy: 66.25 %
Next pruning factors: 5
Finish with pruning
Before pruning:
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 30, 30, 32)        896       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 15, 15, 32)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 13, 13, 64)        18496     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 6, 6, 64)          0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 4, 4, 64)          36928     
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 2, 2, 64)          0 

In [10]:
auto_model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 30, 30, 16)        448       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 15, 15, 16)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 13, 13, 31)        4495      
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 6, 6, 31)          0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 4, 4, 31)          8680      
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 2, 2, 31)          0         
_________________________________________________________________
flatten (Flatten)            (None, 124)               0

Compare both models

In [11]:
model_test_loss, model_test_acc = model.evaluate(test_images,  test_labels, verbose=2)
auto_model_test_loss, auto_model_test_acc = auto_model.evaluate(test_images,  test_labels, verbose=2)

print(f"Model accuracy before pruning: {model_test_acc*100:.2f}%")
print(f"Model accuracy after pruning: {auto_model_test_acc*100:.2f}%")

313/313 - 1s - loss: 0.9143 - accuracy: 0.6925
313/313 - 1s - loss: 0.9854 - accuracy: 0.6630
Model accuracy before pruning: 69.25%
Model accuracy after pruning: 66.30%


In [12]:
print(f"Total number of parameters before pruning: {model.count_params()}")
print(f"Total number of parameters after pruning: {auto_model.count_params()}")
print(f"Pruned model contains only {(auto_model.count_params()/model.count_params())*100:.2f}% of the original number of parameters.")

Total number of parameters before pruning: 75178
Total number of parameters after pruning: 18180
Pruned model contains only 24.18% of the original number of parameters.


In [13]:
step_factor_model = pruning.stepwise_factor_pruning(model, train_images, train_labels, test_images,
                               test_labels, prun_factor_dense=10, prun_factor_conv=10,
                               num_steps=10, comp=comp)

pruning step: 1/10
Finish with pruning
Before pruning:
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 30, 30, 32)        896       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 15, 15, 32)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 13, 13, 64)        18496     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 6, 6, 64)          0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 4, 4, 64)          36928     
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 2, 2, 64)          0         
_________________________________________________________________
f

In [14]:
model_test_loss, model_test_acc = model.evaluate(test_images,  test_labels, verbose=2)
step_factor_model_test_loss, step_factor_model_test_acc = step_factor_model.evaluate(test_images,  test_labels, verbose=2)

print(f"Model accuracy before pruning: {model_test_acc*100:.2f}%")
print(f"Model accuracy after pruning: {step_factor_model_test_acc*100:.2f}%")

313/313 - 1s - loss: 0.9143 - accuracy: 0.6925
313/313 - 1s - loss: 1.0082 - accuracy: 0.6604
Model accuracy before pruning: 69.25%
Model accuracy after pruning: 66.04%


In [15]:
print(f"Total number of parameters before pruning: {model.count_params()}")
print(f"Total number of parameters after pruning: {step_factor_model.count_params()}")
print(f"Pruned model contains only {(step_factor_model.count_params()/model.count_params())*100:.2f}% of the original number of parameters.")

Total number of parameters before pruning: 75178
Total number of parameters after pruning: 12545
Pruned model contains only 16.69% of the original number of parameters.
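Pruning in several small steps compounds. The sketch below assumes that each step removes 10% of the units that are still left, rounded down, and that the output layer is never pruned. These rules are assumptions for illustration, although they are consistent with the parameter count reported above. The helpers `prune_stepwise` and `count_params` are hypothetical.

```python
def prune_stepwise(units, factor_percent, num_steps):
    """Remove floor(factor_percent % of the remaining units) in every step."""
    for _ in range(num_steps):
        units -= units * factor_percent // 100
    return units


def count_params(conv_filters, dense_units, in_channels=3, kernel=3, image_size=32):
    """Count the trainable parameters of the tutorial's Conv2D/MaxPooling2D/Dense stack."""
    total, size, channels = 0, image_size, in_channels
    for filters in conv_filters:
        total += (kernel * kernel * channels + 1) * filters  # weights + biases
        size = (size - kernel + 1) // 2                      # valid conv, then 2x2 pooling
        channels = filters
    inputs = size * size * channels                          # Flatten
    for units in dense_units:
        total += (inputs + 1) * units                        # weights + biases
        inputs = units
    return total


conv = tuple(prune_stepwise(n, 10, 10) for n in (32, 64, 64))
dense = tuple(prune_stepwise(n, 10, 10) for n in (64, 32)) + (10,)  # keep the output layer

remaining = count_params(conv, dense)
share = f"{remaining / 75178 * 100:.2f}"
print(conv, dense, remaining, share)
```

Ten steps of 10% therefore leave roughly 35% to 47% of the units per layer rather than none, because every step acts on an already smaller layer.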


In [16]:
step_acc_model = pruning.stepwise_accuracy_pruning(model, train_images, train_labels, test_images,
                            test_labels, pruning_acc=None, max_acc_loss=3,
                            prun_factor_dense=5, prun_factor_conv=5, 
                            metric='L1', comp=comp, label_one_hot=False)

Start model accuracy: 69.25 %
Minimum required model accuracy: 66.25 %
prun_factor_dense: 5 %
prun_factor_conv: 5 %
pruning step: 1
Finish with pruning
Before pruning:
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 30, 30, 32)        896       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 15, 15, 32)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 13, 13, 64)        18496     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 6, 6, 64)          0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 4, 4, 64)          36928     
_________________________________________________________________
max_pooling2d_2 (Max

In [20]:
step_acc_model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 30, 30, 15)        420       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 15, 15, 15)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 13, 13, 27)        3672      
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 6, 6, 27)          0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 4, 4, 27)          6588      
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 2, 2, 27)          0         
_________________________________________________________________
flatten (Flatten)            (None, 108)               0

In [19]:
model_test_loss, model_test_acc = model.evaluate(test_images,  test_labels, verbose=2)
step_acc_model_test_loss, step_acc_model_test_acc = step_acc_model.evaluate(test_images,  test_labels, verbose=2)

print(f"Model accuracy before pruning: {model_test_acc*100:.2f}%")
print(f"Model accuracy after pruning: {step_acc_model_test_acc*100:.2f}%")

313/313 - 1s - loss: 0.9143 - accuracy: 0.6925
313/313 - 1s - loss: 0.9983 - accuracy: 0.6668
Model accuracy before pruning: 69.25%
Model accuracy after pruning: 66.68%


In [18]:
print(f"Total number of parameters before pruning: {model.count_params()}")
print(f"Total number of parameters after pruning: {step_acc_model.count_params()}")
print(f"Pruned model contains only {(step_acc_model.count_params()/model.count_params())*100:.2f}% of the original number of parameters.")

Total number of parameters before pruning: 75178
Total number of parameters after pruning: 14203
Pruned model contains only 18.89% of the original number of parameters.
