# Memory Equivalent Capacity (MEC)

This notebook calculates and documents the Memory Equivalent Capacity (MEC) of the model.

The MEC calculation for our model only looks at the classifier section. This means that the feature extraction from images performed by the CNN is not included. Generalization is therefore assessed by comparing the information content of the latent embedding with the capacity of the classifier.

In [10]:
import tensorflow as tf
import tensorflow.keras as keras

In [11]:
model = tf.keras.applications.MobileNetV2(
    alpha=1.0,
)

In [12]:
model.layers[-1].input_shape, model.layers[-1].output_shape


((None, 1280), (None, 1000))

The feature extractor outputs $1280$ features, which is the number of input features for the final linear (dense) layer.

The MEC of the model is defined as in Section 7.2 of the book.

In summary, we derived four engineering rules to determine the Memory-Equivalent Capacity of a neural network:
1. The output of a single neuron yields maximally one bit of information.
2. The capacity of a single neuron is the number of its parameters (weights and threshold) in bits.
3. The total capacity $C_{tot}$ of $M$ neurons in parallel is $C_{tot} = \sum_{i=1}^{M} C_i$, where $C_i$ is the capacity of each neuron.
4. For perceptrons in series (e.g., in subsequent layers), the capacity of a subsequent layer cannot be larger than the output of the previous layer.
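
As a rough illustration of how the four rules combine, the sketch below counts capacity from layer widths alone. The helper names `dense_layer_mec` and `network_mec` are hypothetical (they come neither from the book nor from Keras), and capping every later layer at the number of neurons in the preceding layer is one reading of rules 1 and 4, stated here as an assumption.

```python
def dense_layer_mec(n_inputs, n_neurons, use_bias=True):
    """Rules 2 and 3: parameters per neuron, summed over parallel neurons."""
    per_neuron = n_inputs + (1 if use_bias else 0)
    return per_neuron * n_neurons


def network_mec(widths):
    """Capacity in bits of a fully connected stack given [inputs, layer1, layer2, ...]."""
    total = 0
    for i in range(1, len(widths)):
        capacity = dense_layer_mec(widths[i - 1], widths[i])
        if i > 1:
            # Rule 4 (assumed reading): the previous layer emits at most one bit
            # per neuron (rule 1), which caps what this layer can use.
            capacity = min(capacity, widths[i - 1])
        total += capacity
    return total


print(dense_layer_mec(1280, 1000))  # single dense layer on a 1280-d embedding
print(network_mec([3, 2, 1]))       # 2*(3+1) bits, then min(1*(2+1), 2) bits
```

For a single dense layer only rules 2 and 3 matter, which is the case relevant to the classifier in this notebook.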

In [13]:
# MEC of a dense layer: every neuron stores its weights plus one threshold (rule 2),
# and the neurons in parallel add up (rule 3).
def mec_of_linear_layer(layer):
    assert isinstance(layer, keras.layers.Dense), "Only works for dense layers"
    # get_weights()[0] is the kernel with shape (inputs, neurons); [1] is the bias
    n_inputs, n_neurons = layer.get_weights()[0].shape
    cap_neuron = n_inputs
    if layer.bias is not None:
        # The threshold (bias) adds one parameter per neuron
        cap_neuron += 1
    return cap_neuron * n_neurons

**Calculate MEC**

In [14]:
MEC = mec_of_linear_layer(model.layers[-1])
print(f"Model has {MEC} bits of capacity or {MEC / 8 / 1024:.2f} KiB or {MEC / 8 / 1024**2:.4f} MiB")

Model has 1281000 bits of capacity or 156.37 KiB or 0.1527 MiB


### Width multiplier


<i><u>Excerpt from the original paper</u></i>


Although the base MobileNet architecture is already small and low latency, many times a specific use case or application may require the model to be smaller and faster.
In order to construct these smaller and less computationally expensive models we introduce a very simple parameter $\alpha$ called width multiplier. The role of the width multiplier $\alpha$ is to thin a network uniformly at each layer. For a given layer and width multiplier $\alpha$, the number of input channels $M$ becomes $\alpha M$ and the number of output channels $N$ becomes $\alpha N$.

The computational cost of a depthwise separable convolution with width multiplier $\alpha$ is:

$D_K\cdot D_K\cdot \alpha M \cdot D_F\cdot D_F + \alpha M \cdot \alpha N \cdot D_F \cdot D_F$

where $\alpha \in (0, 1]$ with typical settings of 1, 0.75, 0.5 and
0.25. $\alpha = 1$ is the baseline MobileNet and $\alpha < 1$ are
reduced MobileNets. Width multiplier has the effect of reducing computational cost and the number of parameters quadratically by roughly $\alpha^2$. Width multiplier can be applied to any model structure to define a new smaller model with a reasonable accuracy, latency and size trade off. It is used to define a new reduced structure that needs to be trained from scratch.
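
To make the "roughly $\alpha^2$" claim concrete, here is a small sketch that evaluates the cost expression above for one layer. The function name `separable_conv_cost` and the layer dimensions (3x3 kernel, 512 input and output channels, 14x14 feature map) are illustrative assumptions, not values taken from this notebook.

```python
def separable_conv_cost(d_k, m, n, d_f, alpha=1.0):
    """Multiply-adds of one depthwise separable convolution with width multiplier alpha."""
    m_a = int(alpha * m)  # thinned input channels
    n_a = int(alpha * n)  # thinned output channels
    depthwise = d_k * d_k * m_a * d_f * d_f
    pointwise = m_a * n_a * d_f * d_f
    return depthwise + pointwise


base = separable_conv_cost(3, 512, 512, 14)
half = separable_conv_cost(3, 512, 512, 14, alpha=0.5)
print(base, half, half / base)
```

The ratio comes out slightly above $0.25$ because the depthwise term shrinks only linearly with $\alpha$, while the pointwise term shrinks quadratically.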




In [15]:
import plotly.express as px
import pandas as pd

In [16]:
# Classifier MEC for each width multiplier that MobileNetV2 is loaded with
alphas = [0.35, 0.5, 0.75, 1.0, 1.3, 1.4]
data = [
    [alpha, mec_of_linear_layer(tf.keras.applications.MobileNetV2(alpha=alpha).layers[-1])]
    for alpha in alphas
]



In [17]:
df = pd.DataFrame(data, columns=["alpha","mec"])

In [25]:
px.line(df, x="alpha",y="mec", title="MEC as function of Alpha")
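
The shape of this curve can be anticipated without downloading any weights. The sketch below assumes that the Keras MobileNetV2 implementation keeps the final feature width at 1280 for $\alpha \le 1$ and otherwise scales it to the nearest multiple of 8. This is an assumption about the Keras source and should be checked against the installed version. The helper names are hypothetical, and capacity is counted as (inputs + 1 threshold) per output neuron.

```python
def make_divisible(value, divisor=8):
    """Round value to the nearest multiple of divisor (assumed Keras-style rounding)."""
    rounded = max(divisor, int(value + divisor / 2) // divisor * divisor)
    if rounded < 0.9 * value:  # never shrink by more than 10%
        rounded += divisor
    return rounded


def embedding_width(alpha):
    """Assumed width of the MobileNetV2 embedding that feeds the classifier."""
    return make_divisible(1280 * alpha) if alpha > 1.0 else 1280


def classifier_mec(alpha, n_classes=1000):
    """Bits of capacity of the final dense layer: (inputs + 1 threshold) per class neuron."""
    return (embedding_width(alpha) + 1) * n_classes


for alpha in [0.35, 0.5, 0.75, 1.0, 1.3, 1.4]:
    print(alpha, embedding_width(alpha), classifier_mec(alpha))
```

Under these assumptions the classifier MEC is flat for $\alpha \le 1$ and only grows for $\alpha > 1$, since the width multiplier mostly thins the convolutional layers that this MEC calculation leaves out.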

### Resolution Multiplier: Reduced Representation (NOT IN V2)


<i><u>Excerpt from the original paper</u></i>


The second hyper-parameter to reduce the computational cost of a neural network is a resolution multiplier $\rho$. We apply this to the input image and the internal representation of every layer is subsequently reduced by the same multiplier. In practice we implicitly set $\rho$ by setting the input resolution. We can now express the computational cost for the core layers of our network as depthwise separable convolutions with width multiplier $\alpha$ and resolution multiplier $\rho$:

$D_K\cdot D_K\cdot \alpha M \cdot \rho D_F\cdot \rho D_F + \alpha M \cdot \alpha N \cdot \rho D_F \cdot \rho D_F$

where $\rho \in (0, 1]$ which is typically set implicitly so that
the input resolution of the network is 224, 192, 160 or 128. $\rho = 1$ is the baseline MobileNet and $\rho < 1$ are reduced computation MobileNets. Resolution multiplier has the effect of reducing computational cost by $\rho^2$.
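
Since $\rho$ is set implicitly through the input resolution, the saving can be sketched by comparing the same layer at several input sizes. The function names, the layer dimensions, and the assumption that the layer sits at a total stride of 16 (so a 224-pixel input gives a 14x14 feature map) are illustrative and not taken from this notebook.

```python
def separable_conv_cost(d_k, m, n, d_f):
    """Multiply-adds of one depthwise separable convolution on a d_f x d_f feature map."""
    return d_k * d_k * m * d_f * d_f + m * n * d_f * d_f


def feature_map_size(input_resolution, total_stride=16):
    """Spatial size rho * D_F at a layer with the given (assumed) total stride."""
    return input_resolution // total_stride


base = separable_conv_cost(3, 512, 512, feature_map_size(224))
for resolution in [224, 192, 160, 128]:
    rho = resolution / 224
    cost = separable_conv_cost(3, 512, 512, feature_map_size(resolution))
    # The second and third columns should agree: cost scales with rho squared
    print(resolution, round(rho**2, 4), round(cost / base, 4))
```

Both terms of the cost contain the feature map area, so unlike the width multiplier the reduction here is exactly $\rho^2$ whenever the resolution divides evenly by the stride.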


