# Compute the number of floating point operations of a model

For models running in the hardware trigger system of a particle detector, latency and resource consumption are as important as accuracy. A reasonable trade-off between them must therefore be found, often by iteratively compressing and synthesizing the model to obtain an accurate resource and latency estimate.

Evaluating the DNN firmware of your algorithm is slightly out of scope for this challenge, although we do encourage you to give it a try: if you have a Vivado license, have a look at the [hls4ml tutorials](https://github.com/fastmachinelearning/hls4ml-tutorial) and see what you get! Instead, we will count the number of floating point operations (FLOPs) in the model, which gives a reasonable idea of the model size and hence of its resource consumption.

Three examples are provided: using the TensorFlow graph, a back-of-the-envelope calculation, and the keras-flops tool. The examples below are for TensorFlow Keras models and must be adapted if you use other libraries.

This code is based on TensorFlow 2.3.1.

## Fetch the autoencoder

We'll use the fully connected dense neural network autoencoder for this demonstration.

In [1]:
import math
import os
import pathlib

import h5py
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
import tensorflow.keras as keras
from sklearn.metrics import roc_curve, auc
from tensorflow.keras import backend as K
from tensorflow.keras import layers
from tensorflow.keras.callbacks import ModelCheckpoint
from tensorflow.keras.layers import Input, Concatenate, Dense, BatchNormalization, Activation, Layer, ReLU, LeakyReLU
from tensorflow.keras.models import Model

# Hide all GPUs so that the model runs on the CPU
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

# build model
input_shape = 57
latent_dimension = 6
#num_nodes=[40,30,20]
num_nodes=[25,20]

EPOCHS = 20
BATCH_SIZE = 512


class custom_func(layers.Layer):
    def call(self, inputs):
        z_mean, z_log_var = inputs
        batches = tf.shape(z_mean)[0]
        dimension = tf.shape(z_mean)[1]
        epsilon = tf.keras.backend.random_normal(shape=(batches, dimension))
        return z_mean + tf.exp(-1*(z_log_var)*(z_log_var)) * epsilon + tf.exp(-1*z_mean) * epsilon


inputArray = keras.Input(shape=(input_shape,))
x = Dense(num_nodes[0], activation='LeakyReLU',use_bias=False)(inputArray)
x = Dense(num_nodes[1], activation='LeakyReLU',use_bias=False)(x)

z_mean_1 = layers.Dense(latent_dimension, activation='ReLU', name="z_mean_1")(x)
z_log_var = layers.Dense(latent_dimension, activation='ReLU', name="z_log_var")(x)
z_1 = custom_func()([z_mean_1, z_log_var])

bottle_neck = Dense(latent_dimension, activation='LeakyReLU',use_bias=False)(z_1)

x = Dense(num_nodes[1], activation='LeakyReLU',use_bias=False)(bottle_neck)
x = Dense(num_nodes[0], activation='LeakyReLU',use_bias=False)(x)

decoder = Dense(input_shape)(x)
#create autoencoder
autoencoder = Model(inputs = inputArray, outputs=decoder)
autoencoder.summary()
autoencoder.compile(optimizer = tf.keras.optimizers.Adam(), loss='mse')

autoencoder.save('ae.h5')



Model: "model"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 input_1 (InputLayer)           [(None, 57)]         0           []                               
                                                                                                  
 dense (Dense)                  (None, 25)           1425        ['input_1[0][0]']                
                                                                                                  
 dense_1 (Dense)                (None, 20)           500         ['dense[0][0]']                  
                                                                                                  
 z_mean_1 (Dense)               (None, 6)            126         ['dense_1[0][0]']                
                                                                                              



## Example 1: Using the TF graph
Use the TF graph to profile the model and get the total number of floating point ops:

In [2]:
def get_flops():
    session = tf.compat.v1.Session()
    graph = tf.compat.v1.get_default_graph()

    with graph.as_default():
        with session.as_default():
            # The saved model contains the custom layer custom_func, which
            # must be passed to load_model through custom_objects.
            model = tf.keras.models.load_model(
                'ae.h5', custom_objects={'custom_func': custom_func})

            run_meta = tf.compat.v1.RunMetadata()
            opts = tf.compat.v1.profiler.ProfileOptionBuilder.float_operation()

            # Optional: save printed results to file
            # flops_log_path = os.path.join(tempfile.gettempdir(), 'tf_flops_log.txt')
            # opts['output'] = 'file:outfile={}'.format(flops_log_path)

            # We use the Keras session graph in the call to the profiler.
            flops = tf.compat.v1.profiler.profile(graph=graph,
                                                  run_meta=run_meta, cmd='op', options=opts)

    tf.compat.v1.reset_default_graph()

    return flops.total_float_ops


print('TF Profile: Total number of FLOPs =  {}'.format(get_flops()))
# Example of the profiler output format:
# Profile:
# node name | # float_ops
# Mul                      2.02k float_ops (100.00%, 49.95%)
# Add                      2.02k float_ops (50.05%, 49.93%)
# Sub                          5 float_ops (0.12%, 0.12%)

The model contains the custom layer `custom_func`, so it has to be passed to `load_model` through `custom_objects`; otherwise loading fails with `ValueError: Unknown layer: custom_func. Please ensure this object is passed to the custom_objects argument.` (see https://www.tensorflow.org/guide/keras/save_and_serialize#registering_the_custom_object for details).

The profile quoted in the comments totals 4,054 floating point operations. It is an example of the output format; rerun the cell to obtain the count for the autoencoder defined above. Check your terminal for more detailed per-operation information. If your model is a Keras/TensorFlow model, we recommend estimating the FLOPs this way.

However, if you are for some reason forced to compute it by hand, you can find an example below.

## Example 2: Doing a back of the envelope calculation

Below you can find an example of how to compute the FLOPs of a linear/conv2D layer (based on [keras-Opcounter](https://github.com/kentaroy47/keras-Opcounter)), not taking the activations into account. One multiply-and-accumulate (MAC) operation is counted as 2 FLOPs, and one ADD is counted as one FLOP.
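As a quick sanity check of this counting rule, the sketch below applies it to layer shapes directly, without building a model. The helper names `dense_flops` and `conv2d_flops` are illustrative and not part of keras-Opcounter. The Conv2D variant assumes the same convention as the counter in this section (bias additions counted once per output channel), and the Conv2D shapes in the last line are made up for illustration.

```python
def dense_flops(n_in, n_out, use_bias=True):
    """FLOPs of a dense layer: 2 per MAC, plus 1 per bias addition."""
    macs = n_in * n_out
    adds = n_out if use_bias else 0
    return 2 * macs + adds


def conv2d_flops(out_h, out_w, kernel_h, kernel_w, c_in, c_out, use_bias=True):
    """FLOPs of a Conv2D layer, with the bias counted once per output channel."""
    macs_per_position = kernel_h * kernel_w * c_in * c_out
    positions = out_h * out_w
    adds = c_out if use_bias else 0
    return 2 * macs_per_position * positions + adds


# First encoder layer of the autoencoder: 57 inputs, 25 nodes, no bias.
print(dense_flops(57, 25, use_bias=False))  # 2850
# Output layer of the autoencoder: 25 inputs, 57 nodes, with bias.
print(dense_flops(25, 57, use_bias=True))   # 2907
# Hypothetical 3x3 convolution with an 8x8 output, 4 -> 16 channels, with bias.
print(conv2d_flops(8, 8, 3, 3, 4, 16))      # 73744
```

For a dense layer without bias, the FLOP count under this rule is simply twice the number of weights.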

In [3]:
def count_linear(layers):
    MAC = layers.output_shape[1] * layers.input_shape[1]
    if layers.get_config()["use_bias"]:
        ADD = layers.output_shape[1]
    else:
        ADD = 0
    return MAC*2 + ADD

def count_conv2d(layers, log = False):
    if log:
        print(layers.get_config())

    numshifts = int(layers.output_shape[1] * layers.output_shape[2])
    
    MACperConv = layers.get_config()["kernel_size"][0] * layers.get_config()["kernel_size"][1] * layers.input_shape[3] * layers.output_shape[3]
    
    if layers.get_config()["use_bias"]:
        ADD = layers.output_shape[3]
    else:
        ADD = 0
        
    return MACperConv * numshifts * 2 + ADD

def profile(model, log = False):

    layer_name = []
    layer_flops = []
    inshape = []
    weights = []

    for layer in model.layers:
        if "act" in layer.get_config()["name"]:
            print("Skipping activation functions")
           
        elif "dense" in layer.get_config()["name"] or "fc" in layer.get_config()["name"]:
            layer_flops.append(count_linear(layer))
            layer_name.append(layer.get_config()["name"])
            inshape.append(layer.input_shape)
            weights.append(int(np.sum([K.count_params(p) for p in (layer.trainable_weights)])))
            
        elif "conv" in layer.get_config()["name"] and "pad" not in layer.get_config()["name"] and "bn" not in layer.get_config()["name"] and "relu" not in layer.get_config()["name"] and "concat" not in layer.get_config()["name"]:
            layer_flops.append(count_conv2d(layer,log))
            layer_name.append(layer.get_config()["name"])
            inshape.append(layer.input_shape)
            weights.append(int(np.sum([K.count_params(p) for p in (layer.trainable_weights)])))
            
        elif "res" in layer.get_config()["name"] and "branch" in layer.get_config()["name"]:
            layer_flops.append(count_conv2d(layer,log))
            layer_name.append(layer.get_config()["name"])
            inshape.append(layer.input_shape)
            weights.append(int(np.sum([K.count_params(p) for p in (layer.trainable_weights)])))
            
    return layer_name, layer_flops, inshape, weights

def doOPS(model):
    print("Counting number of FLOPs in model")

    # Profile the model that was passed in, not the global autoencoder
    layer_name, layer_flops, inshape, weights = profile(model)
    for name, flop, shape, weight in zip(layer_name, layer_flops, inshape, weights):
        print("layer:", name, shape, " FLOPs:", flop, "Weights:", weight)
    totalFlops = sum(layer_flops)
    print("By hand: Total number of FLOPs = {}".format(totalFlops))
    return totalFlops

In [4]:
totalFlops = doOPS(autoencoder)

Counting number of FLOPs in model
layer: dense (None, 57)  FLOPs: 2850 Weights: 1425
layer: dense_1 (None, 25)  FLOPs: 1000 Weights: 500
layer: dense_2 (None, 6)  FLOPs: 72 Weights: 36
layer: dense_3 (None, 6)  FLOPs: 240 Weights: 120
layer: dense_4 (None, 20)  FLOPs: 1000 Weights: 500
layer: dense_5 (None, 25)  FLOPs: 2907 Weights: 1482
By hand: Total number of FLOPs = 8069


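This back-of-the-envelope estimate will generally differ somewhat from the profiler's count: it ignores the activations, and it selects layers by name, so the `z_mean_1` and `z_log_var` Dense layers are not included in the total above because their names contain neither "dense" nor "fc". We will therefore prioritize the number returned by the TF profiler when evaluating contributions, and whenever this is not possible we'll do a double check by hand.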
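Because `profile` selects layers by name, Dense layers with custom names are easy to miss. The sketch below tallies every Dense layer of the autoencoder from its shape, as read off the model definition. The table `DENSE_LAYERS` and the helpers `dense_flops` and `matches_profile_rule` are illustrative names, and the name rule only mirrors the "dense"/"fc" branch of `profile`.

```python
# (name, n_in, n_out, use_bias) for each Dense layer, read off the model
# definition; unnamed layers get the Keras default names dense, dense_1, ...
DENSE_LAYERS = [
    ("dense", 57, 25, False),
    ("dense_1", 25, 20, False),
    ("z_mean_1", 20, 6, True),
    ("z_log_var", 20, 6, True),
    ("dense_2", 6, 6, False),
    ("dense_3", 6, 20, False),
    ("dense_4", 20, 25, False),
    ("dense_5", 25, 57, True),
]


def dense_flops(n_in, n_out, use_bias):
    """2 FLOPs per MAC, plus 1 FLOP per bias addition."""
    return 2 * n_in * n_out + (n_out if use_bias else 0)


def matches_profile_rule(name):
    """Mirror of the name test used for dense layers in profile()."""
    return "dense" in name or "fc" in name


by_name = sum(dense_flops(i, o, b) for n, i, o, b in DENSE_LAYERS
              if matches_profile_rule(n))
all_dense = sum(dense_flops(i, o, b) for n, i, o, b in DENSE_LAYERS)
skipped = [n for n, _, _, _ in DENSE_LAYERS if not matches_profile_rule(n)]

print(by_name)    # 8069
print(all_dense)  # 8561
print(skipped)    # ['z_mean_1', 'z_log_var']
```

In a Keras model, selecting layers with `isinstance(layer, tf.keras.layers.Dense)` rather than by name avoids this kind of omission.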

## Example 3: Using the keras-flops tool

Another minimal-code option, also built on top of the TF profiler, is the library [keras-flops](https://pypi.org/project/keras-flops/). This library supports dense, convolutional and pooling layers. Let's give it a try too:


In [None]:
# Install keras-flops?
#!pip install keras-flops

In [None]:
from keras_flops import get_flops

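# Let's load the model again so we have a clean graph.
# The custom layer custom_func must be passed through custom_objects.
model = tf.keras.models.load_model('ae.h5', custom_objects={'custom_func': custom_func})

# Compute FLOPs of the freshly loaded model
flops = get_flops(model, batch_size=1)
print("keras-flops: Total number of FLOPs = {} ".format(flops))
# Example of the output format (these numbers are not from the autoencoder defined above):
# FLOPS: 4.1e-06 G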
# _TFProfRoot (--/4.11k flops)
#   functional_1/dense/MatMul (1.82k/1.82k flops)
#   functional_1/dense_4/MatMul (1.82k/1.82k flops)
#   functional_1/dense_3/MatMul (256/256 flops)
#   functional_1/dense_1/MatMul (96/96 flops)
#   functional_1/dense_4/BiasAdd (57/57 flops)
#   functional_1/dense_2/MatMul (48/48 flops)
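The total in the printout above is given in units of 10^9 FLOPs ("G"), which is hard to read for models this small. A minimal sketch of a formatting helper follows; the name `format_flops` is hypothetical and not part of keras-flops.

```python
def format_flops(flops):
    """Return a human-readable string for a raw FLOP count."""
    for factor, suffix in ((1e9, "G"), (1e6, "M"), (1e3, "k")):
        if flops >= factor:
            return "{:.3g} {}FLOPs".format(flops / factor, suffix)
    return "{} FLOPs".format(flops)


# A raw count of 4110 corresponds to the "4.1e-06 G" style of printout.
print("{:.1e} G".format(4110 / 1e9))  # 4.1e-06 G
print(format_flops(4110))             # 4.11 kFLOPs
print(format_flops(8069))             # 8.07 kFLOPs
```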