# Custom Sparsification and Pruning Routines Using IDAES Sparsification and Pruning Utils

The IDAES toolkit provides utility functions that allow the user to sparsify and prune Keras sequential neural networks (NNs). Sparsification is the process of setting a desired percentage of the weights in each layer to zero. Sparse NNs are desirable because they reduce inference time. Pruning is the process of removing inactive nodes (nodes that do not contribute to the output) from the network entirely. Like sparsification, pruning lowers the inference time of a NN, and it also decreases the size of the NN.
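To make the idea of sparsifying a layer concrete, the following is a minimal NumPy sketch that zeroes a target fraction of a single weight matrix. The helper name `sparsify_matrix` is hypothetical, and choosing the smallest-magnitude weights is an assumption made for illustration; it is not necessarily the criterion used by the IDAES utilities.

```python
import numpy as np

def sparsify_matrix(weights, sparsity):
    """Return a copy of `weights` with the smallest-magnitude entries set to zero.

    Hypothetical helper for illustration only; `sparsity` is the target
    fraction of zero entries (0.5 means 50% of the weights become zero).
    """
    w = np.array(weights, dtype=float)
    n_zero = int(round(sparsity * w.size))
    if n_zero == 0:
        return w
    # Flat indices of the n_zero entries with the smallest absolute value
    idx = np.argsort(np.abs(w), axis=None)[:n_zero]
    w.flat[idx] = 0.0
    return w

# Stand-in for the kernel of a Dense layer with 3 inputs and 50 nodes
rng = np.random.default_rng(0)
layer_weights = rng.normal(size=(3, 50))
sparse_weights = sparsify_matrix(layer_weights, 0.5)

print(f"Fraction of zero weights: {np.mean(sparse_weights == 0.0):.2f}")
```

The sketch operates on one matrix at a time, which mirrors the per-layer definition of sparsity given above.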

## Initial Dependencies
To begin, the libraries needed for a simple NN training workflow are imported, along with the IDAES utilities:

In [1]:
# Import libraries used to train and deploy NNs
import numpy as np
import pandas as pd
from math import prod
import tensorflow as tf
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.optimizers import Adamax

# Import IDAES utility functions
from prune import prune_sequential
from sparsify import count_N_zero_weights, sparsify_sequential

## Building a Neural Network
As a simple example, data will be generated to train a NN that takes three inputs and returns one output. For this example, the following nonlinear model will be used:
$$y = x_{1} * (x_{2} + x_{3})$$

### Data Generation
A data set of one million points is generated by sampling each component of the input vector $x$ uniformly on $[0, 1)$, and the output $y$ is calculated from the model above. The resulting dataframe is then separated into input and output dataframes. A train/test split would typically be used when training a real NN, but it is not necessary here because the accuracy and degree of overfitting of the neural network are not important for this example.

In [2]:
def get_data():
    df = pd.DataFrame(np.random.uniform(0, 1, size=(10**6, 3)), columns=['x1', 'x2', 'x3'])
    df['y'] =  df['x1'] * (df['x2'] + df['x3'])
    return df

# Get the data necessary for the NN and separate into inputs and outputs
df = get_data()
print(df.describe())

inputs = df[['x1', 'x2', 'x3']]
outputs = df['y']

                   x1            x2              x3               y
count  1000000.000000  1.000000e+06  1000000.000000  1000000.000000
mean         0.499951  4.997686e-01        0.499727        0.499638
std          0.288685  2.886080e-01        0.288788        0.372343
min          0.000002  8.299529e-07        0.000004        0.000002
25%          0.250168  2.495583e-01        0.249427        0.193802
50%          0.499694  4.998767e-01        0.499510        0.425727
75%          0.750144  7.496302e-01        0.749917        0.736170
max          0.999999  9.999992e-01        0.999999        1.996314


### Model Formulation
Currently, the IDAES utility functions support Keras Sequential models. For this example, a model with ReLU activations and three hidden layers of 50 nodes each will be used:

In [3]:
# Define the model used to predict the outputs: three hidden ReLU layers of 50 nodes each.
def get_model():
    model = Sequential()
    model.add(Input(shape=(3,)))
    model.add(Dense(50, activation='relu'))
    model.add(Dense(50, activation='relu'))
    model.add(Dense(50, activation='relu'))
    model.add(Dense(1))
    model.compile(optimizer=Adamax(learning_rate=0.1), loss='mse')
    return model
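As a quick check on the size of this architecture, the number of weights and biases can be worked out by hand from the layer widths. The sketch below does only that arithmetic; the `layer_sizes` list is an illustrative variable, not part of the notebook.

```python
# Layer widths: 3 inputs, three hidden layers of 50 nodes, 1 output
layer_sizes = [3, 50, 50, 50, 1]

# Each Dense layer has (inputs x outputs) weights and one bias per output node
n_weights = sum(n_in * n_out for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))
n_biases = sum(layer_sizes[1:])

print(f"Weights: {n_weights}, biases: {n_biases}")
```

Only the weights (not the biases) are counted by the `get_num_weights` helper defined later in this example.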

### Learning Rate Scheduling

Sparsification routines are typically run alongside a learning rate schedule, and the sparsification schedule should roughly match it. In this example the learning rate decreases by 2% each epoch. In Keras, a learning rate schedule can be implemented with a callback, as shown here:

In [4]:
# Learning rate schedule for the neural network: reduce the learning rate by 2% at the end of each epoch
class LearningRateReducer(tf.keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        lr = self.model.optimizer.lr.read_value()
        lr = lr*0.98
        self.model.optimizer.lr.assign(lr)
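The effect of a 2% reduction per epoch can be previewed without training anything. The sketch below assumes the callback fires exactly once per completed epoch and that nothing else changes the learning rate; `lr_after` is a hypothetical helper used only for this illustration.

```python
initial_lr = 0.1   # learning rate passed to Adamax in get_model()
decay = 0.98       # 2% reduction applied at the end of every epoch

def lr_after(n_epochs, lr0=initial_lr, factor=decay):
    """Learning rate in effect after `n_epochs` completed epochs."""
    return lr0 * factor ** n_epochs

for n_epochs in (0, 1, 10, 20):
    print(f"after {n_epochs:2d} epochs: lr = {lr_after(n_epochs):.5f}")
```

Because the decay is geometric, the learning rate keeps shrinking during the retraining epochs that follow each sparsification step.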

After model formulation is complete, the model and learning rate schedule objects can be instantiated for training:

In [5]:
# Build the neural network and the learning rate schedule. If training has already occurred, t0 should be set accordingly.
model = get_model()
lr_schedule = LearningRateReducer()

## Initial Training of the Neural Network and Sparsification Routine

A sparsification routine is run in tandem with training such that the sparsification schedule matches the learning rate schedule to a certain degree. In this example the sparsification routine follows the procedure proposed by [Zhu and Gupta 2017](https://arxiv.org/pdf/1710.01878.pdf):
$$s_{t} = s_{f} + (s_{i} - s_{f}) \left(1 - \frac{t - t_{0}}{n \Delta t}\right)^{3}$$

Here $s_{i}$ is the initial sparsity, $s_{t}$ is the sparsity after a sparsification step, and $s_{f}$ is the desired final sparsity. The variable $t$ is the current training step, $\Delta t$ is the number of training steps between sparsification steps, and $t_{0}$ is the number of training steps conducted before sparsification begins. Finally, $n$ is the desired number of sparsification steps. Note that $t_{0}$, $\Delta t$, and $n$ are hyperparameters and can be adjusted to maintain accuracy during post-sparsification retraining. Parameters $t_{0}$ and $\Delta t$ are set before the initial training of a NN, while $n$ can be set afterwards.
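The schedule can be evaluated on its own to see how quickly the target sparsity ramps up. The sketch below is a direct transcription of the equation; the function name `sparsity_schedule` and the choice of an initial sparsity of zero are assumptions made for illustration.

```python
def sparsity_schedule(t, t0, dt, n, si, sf):
    """Target sparsity s_t at training step t for the cubic schedule."""
    progress = (t - t0) / (n * dt)
    return sf + (si - sf) * (1.0 - progress) ** 3

# Illustrative values: start dense (si = 0) and end at 50% sparsity
# after n = 15 sparsification steps spaced dt = 1000 training steps apart
t0, dt, n, si, sf = 20_000, 1_000, 15, 0.0, 0.5

for k in (0, 5, 10, 15):
    t = t0 + k * dt
    print(f"k = {k:2d}  s_t = {sparsity_schedule(t, t0, dt, n, si, sf):.4f}")
```

The cubic term makes the schedule aggressive early on, when many weights are redundant, and gentle near the end, when the target sparsity is approached.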

For this example, the initial training of the NN uses 20 epochs with $\Delta t = 1000$ steps per epoch, so the initial timestep is $t_{0} = \text{epochs} \times \Delta t$. By default, a Keras epoch runs $\Delta t = N_{x} / N_{batch}$ steps, where $N_{x}$ is the number of data points and $N_{batch}$ is the batch size, so that the NN trains over all data points in batches of size $N_{batch}$. For this routine, the number of training steps per epoch is instead specified by setting the steps_per_epoch argument.
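The difference between the default epoch length and the one chosen here can be checked with a little arithmetic. The sketch below uses the data set size, batch size, and validation split from this example; exactly how Keras rounds the split and the final partial batch is an assumption, so the numbers should be read as approximate.

```python
from math import ceil

n_points = 10**6          # rows produced by get_data()
validation_split = 0.2    # fraction held out by model.fit
batch_size = 32

# Rows left for training after the validation split
n_train = int(n_points * (1 - validation_split))

# Steps needed for one full pass over the training rows
default_steps = ceil(n_train / batch_size)

# Steps actually used per epoch in this example (dt)
steps_per_epoch = 1000
samples_per_epoch = steps_per_epoch * batch_size

print(f"Default steps per epoch: {default_steps}")
print(f"Samples seen per epoch with dt = {steps_per_epoch}: {samples_per_epoch}")
print(f"Fraction of training data per epoch: {samples_per_epoch / n_train:.2%}")
```

With these settings each epoch covers only a small part of the training data, which keeps the interval between sparsification steps short.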

In [6]:
# Define initial training information
epochs = 20
dt = 1000
t0 = epochs*dt

# Initial training of the model
model.fit(x=inputs, y=outputs, epochs=epochs, steps_per_epoch=dt, callbacks=[lr_schedule],
          verbose=1, validation_split=0.2, batch_size=32)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<tensorflow.python.keras.callbacks.History at 0x189730075e0>

To compare the sparse model with the non-sparse model, the Keras evaluate function can be called to obtain a score for each model.

In [7]:
non_sparse_mse = model.evaluate(x=inputs, y=outputs, batch_size=64)
print(f"Initial model MSE on training data: {non_sparse_mse}")

Initial model MSE on training data: 0.00015240127686411142


Because initial training has already been conducted, the timestep before sparsification begins is $t = t_{0}$. For this example, 15 sparsification steps (an arbitrary choice) and a desired final sparsity of 50% are used.

In [None]:
# Define sparsification parameters; t0 = epochs*dt training steps were completed during initial training
t = t0
n_steps = 15
sf = 0.5

To illustrate the effect of sparsification, a helper function is defined to count the total number of weights in the model. The sparsification utility provides a helper function, imported above, that counts the number of zero weights. This function can also be used to set the initial sparsity of the model in case some weights are already equal to zero.

In [None]:
# Define a helper function to get the total number of weights in the neural net
def get_num_weights(model):

    # Get weights and biases (ordered kernel, bias, kernel, bias, ...)
    w = model.get_weights()

    # Drop the biases by keeping every second array
    w = w[::2]

    return sum(prod(l_w.shape) for l_w in w)

N_weights = get_num_weights(model)
N_zero_weights = count_N_zero_weights(model.get_weights())

print(f"Total weights in model: {N_weights}")
print(f"Total number of zero weights in model: {N_zero_weights}")

# Define initial sparsity
si = N_zero_weights/N_weights
print(f"Initial total model (not layer by layer) sparsity {si*100:.2f}%")
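The value above is the sparsity of the model as a whole. A layer-by-layer view can be obtained in a similar way, as in the following sketch. It assumes the weight list alternates kernel and bias arrays, as `get_num_weights` does; the helper `layer_sparsities` and the small example arrays are hypothetical stand-ins for `model.get_weights()`.

```python
import numpy as np

def layer_sparsities(weight_list):
    """Fraction of zero entries in each kernel of a [kernel, bias, ...] list."""
    kernels = weight_list[::2]   # skip the bias vectors
    return [float(np.mean(k == 0.0)) for k in kernels]

# Stand-in for model.get_weights(): two small layers with known zeros
example_weights = [
    np.array([[0.0, 1.0], [2.0, 4.0]]), np.zeros(2),   # kernel 1 (1 of 4 zero), bias 1
    np.array([[3.0], [0.0]]), np.zeros(1),             # kernel 2 (1 of 2 zero), bias 2
]

for i, s in enumerate(layer_sparsities(example_weights), start=1):
    print(f"Layer {i} sparsity: {s * 100:.2f}%")
```

With a trained Keras model, the same function could be called as `layer_sparsities(model.get_weights())`.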

With the initial training completed, the sparsification loop can be written. The sparsification utility function takes a model and sparsifies it to a provided sparsity. After each sparsification step, the timestep is incremented by $\Delta t$ and the model is retrained to reduce the loss of accuracy caused by sparsification. This retraining uses the sparse NN as the initial state and a reduced learning rate. Keras does not allow individual weights to be made untrainable (only entire layers can be frozen), so the model is not retrained after the last sparsification step. Ideally, individual weights would be frozen so that the contribution of the sparsified weights is shifted to the non-sparsified weights.

In [None]:
# Create a custom sparsification loop
for step in range(1, n_steps + 1):

    # Sparsification schedule can be modified to be whatever is desired.
    # The denominator uses the total number of sparsification steps (n in the schedule equation).
    st = sf + (si - sf) * (1 - (t - t0) / (n_steps * dt)) ** 3

    # Sparsify to desired sparsification value
    model = sparsify_sequential(model, st)

    # Update timestep
    t += dt

    # Retrain if not last step
    if step != n_steps:
        model.fit(x=inputs, y=outputs, epochs=1, steps_per_epoch=dt, callbacks=[lr_schedule],
                  verbose=1, validation_split=0.2, batch_size=64)
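The idea of holding sparsified weights at zero during retraining can be illustrated outside Keras. The sketch below records a binary mask of the non-zero weights and re-applies it after a stand-in update; this is a generic workaround shown with NumPy only, under the assumption that weights can be overwritten after each update, and it is not something the IDAES utilities are stated to do.

```python
import numpy as np

rng = np.random.default_rng(1)

# A sparsified kernel: roughly half of the entries are exactly zero
kernel = rng.normal(size=(4, 4))
kernel[rng.random(kernel.shape) < 0.5] = 0.0

# Record which weights survived sparsification (1 = trainable, 0 = held at zero)
mask = (kernel != 0.0).astype(kernel.dtype)

# Stand-in for a training update that would otherwise move every weight
updated = kernel - 0.01 * rng.normal(size=kernel.shape)

# Re-apply the mask so the sparsified weights stay at zero
updated *= mask

print(f"Zeros before update: {np.count_nonzero(kernel == 0.0)}")
print(f"Zeros after masked update: {np.count_nonzero(updated == 0.0)}")
```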


After sparsification, the model error can be checked with the Keras evaluate function and compared with the accuracy of the full model:

In [None]:
sparse_mse = model.evaluate(x=inputs, y=outputs, batch_size=64)
print(f"Final model MSE after sparsification: {sparse_mse}")
print(f"Change in MSE after sparsification: {sparse_mse - non_sparse_mse}")

The achieved sparsity can be confirmed by recounting the zero weights:

In [None]:
total_w = get_num_weights(model)
zero_weights = count_N_zero_weights(model.get_weights())
print(f"Total Weights: {total_w}\nZero Weights: {zero_weights}\nSparsity: {zero_weights/total_w}")

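The sparsified model can then be pruned to remove inactive nodes, and the resulting reduction in the number of weights reported:

In [None]: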
pruned_model = prune_sequential(model, verbose=1)
new_total_w = get_num_weights(pruned_model)
print(f"Total Weights Pruned Model: {new_total_w}")
print(f"Total Reduction in Weights: {total_w - new_total_w} ({(total_w - new_total_w)/total_w*100:.2f}%)")
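To show why removing inactive nodes leaves the predictions unchanged, the following NumPy sketch prunes a small two-layer network by hand. Treating a hidden node as inactive when all of its outgoing weights are zero is an assumption made for this illustration; the criteria used by `prune_sequential` may differ.

```python
import numpy as np

def forward(x, w1, b1, w2, b2):
    """Two-layer network: ReLU hidden layer followed by a linear output."""
    hidden = np.maximum(x @ w1 + b1, 0.0)
    return hidden @ w2 + b2

rng = np.random.default_rng(2)
w1, b1 = rng.normal(size=(3, 5)), rng.normal(size=5)
w2, b2 = rng.normal(size=(5, 1)), rng.normal(size=1)

# Make hidden nodes 1 and 3 inactive by zeroing their outgoing weights
w2[[1, 3], :] = 0.0

# Keep only hidden nodes with at least one non-zero outgoing weight
active = np.any(w2 != 0.0, axis=1)
w1_p, b1_p, w2_p = w1[:, active], b1[active], w2[active, :]

x = rng.uniform(size=(10, 3))
same = np.allclose(forward(x, w1, b1, w2, b2), forward(x, w1_p, b1_p, w2_p, b2))

print(f"Hidden nodes: {w1.shape[1]} -> {w1_p.shape[1]}")
print(f"Outputs unchanged: {same}")
```

Removing a node deletes both its incoming and outgoing weights, which is why pruning reduces the total weight count by more than the number of zeroed weights alone would suggest.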

The total number of nodes removed can also be found using the following helper function:

In [None]:
def count_nodes(model):
    cfg = model.get_config()
    N_nodes = 0
    for layer_config in cfg['layers']:
        if 'config' in layer_config and 'units' in layer_config['config']:
            N_nodes += layer_config['config']['units']
    return N_nodes

# Exclude the single output node from the counts
N_full = count_nodes(model) - 1
N_pruned = count_nodes(pruned_model) - 1

print(f"Initial Model Nodes: {N_full}")
print(f"Pruned Model Nodes: {N_pruned}")
print(f"Reduction: {N_full - N_pruned} ({(N_full - N_pruned)/N_full*100:.2f}%)")