# Architectural choices in computer vision and their impact on energy consumption

<a href="https://colab.research.google.com/drive/1G3tP5kLD1MUjdVVOpg3vDWeqpI0Q03gh" target="_blank">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab">
</a>

Return to the [Castle](https://github.com/Nkluge-correa/TeenyTinyCastle)

Convolutional neural networks (CNNs) have driven major progress in computer vision. However, this progress comes at a cost: training these networks can be expensive in terms of resources such as energy. Reaching very high performance typically requires a massive dataset, which in turn means long training times and, ultimately, non-trivial carbon emissions tied to the development of a CV model.

The carbon emissions associated with training CNNs can vary depending on factors such as the network's size, the length of the training process, and the energy efficiency of the hardware used. Efforts are currently underway to develop more energy-efficient hardware and to optimize training algorithms so that training CNNs for computer vision emits less carbon. These efforts are usually discussed under the umbrella term "_Sustainable AI_".

<img src="https://lh5.googleusercontent.com/prAfirs8L4UBqWWkX9dPoAEQwYHIJ0CLR9sUDNSrRMC44R3vXaQfGFycHjq68rT6Z5_B6sJJ9jlQmhun0adWX2BBfVFr6BZ8OFTXQskjPNqTBkPfl5ysmdMinxn7CPEgkGoXL1hT=s0" alt="image" width="600">

Source: [The Imperative for Sustainable AI Systems](https://thegradient.pub/sustainable-ai/).

> For more information on the matter (Sustainable Computer Vision), we recommend "_[Highlighting the Importance of Reducing Research Bias and Carbon Emissions in CNNs](https://arxiv.org/abs/2106.03242)_".

For this tutorial, we want to compare different CNN architectural choices. More specifically, we want to compare the energy consumption and carbon emissions generated by networks that use two different convolutional layers: `Conv2D` and `SeparableConv2D`.

In other words, we want to compare how architectural choices can impact the carbon footprint related to training such models. We will be using [`CodeCarbon`](https://github.com/mlco2/codecarbon) for measuring our consumption and emissions, as already demonstrated in [this notebook](https://github.com/Nkluge-correa/TeenyTinyCastle/blob/master/ML-Accountability/CO2-Emission-tracking/emission_tracker.ipynb).

But first, let us understand the difference between traditional convolutions (`Conv2D`) and depthwise separable convolutions (`SeparableConv2D`).

## `Conv2D` versus `SeparableConv2D`

Imagine a traditional convolutional layer applied to 15x15x3 pixel images. A forward pass through this layer can require more than 45,000 multiplications per image. Separable convolutions help with this problem: one large convolution (e.g., the original `Conv2D` layer) is split into smaller ones that are applied sequentially. In the depthwise separable variant implemented by `SeparableConv2D`, a depthwise convolution first filters each input channel on its own, and a pointwise (1x1) convolution then combines the channels, producing an output of the same shape as the original layer. For our 15x15x3 pixel image, this takes fewer than 10,000 multiplications: a decrease of roughly 80% in multiplication operations!

One benefit of performing convolutions sequentially in this way is a decrease in the number of multiplications: fewer multiplications -> less computation -> less energy consumption.

> **Note: To learn more about traditional convolutions and separable convolutions, we recommend the [following explanation](https://machinecurve.com/index.php/2019/09/23/understanding-separable-convolutions#how-many-multiplications-do-we-save). We also recommend the original article, "_[Xception: Deep Learning with Depthwise Separable Convolutions](https://arxiv.org/abs/1610.02357)_", where depthwise separable convolutions were introduced.**
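The multiplication counts quoted above can be reproduced with a few lines of arithmetic. The sketch below assumes a setup that the text does not state explicitly: ten 3x3 kernels, stride 1, and no padding. The helper names `conv2d_mults` and `separable_conv2d_mults` are hypothetical and exist only for this illustration.

```python
def conv2d_mults(height, width, channels, kernel, filters):
    """Multiplications for a standard convolution (stride 1, no padding)."""
    out_h, out_w = height - kernel + 1, width - kernel + 1
    # Every output position of every filter needs kernel * kernel * channels products.
    return out_h * out_w * kernel * kernel * channels * filters


def separable_conv2d_mults(height, width, channels, kernel, filters):
    """Multiplications for a depthwise separable convolution (stride 1, no padding)."""
    out_h, out_w = height - kernel + 1, width - kernel + 1
    # Depthwise step: one kernel x kernel filter per input channel.
    depthwise = out_h * out_w * kernel * kernel * channels
    # Pointwise step: a 1x1 convolution that mixes the channels into `filters` outputs.
    pointwise = out_h * out_w * channels * filters
    return depthwise + pointwise


# Assumed setup: 15x15x3 image, ten 3x3 kernels.
standard = conv2d_mults(15, 15, 3, kernel=3, filters=10)
separable = separable_conv2d_mults(15, 15, 3, kernel=3, filters=10)
reduction = 1 - separable / standard

print(f"Standard convolution:  {standard} multiplications")
print(f"Separable convolution: {separable} multiplications")
print(f"Reduction: {reduction:.1%}")
```

Under these assumptions the standard layer needs 45,630 multiplications and the separable one 9,633, a reduction of about 79%.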

In principle, depthwise separable convolutional layers may significantly improve your model's energy efficiency while preserving its performance. Let us test this. First, we create a standard convolutional network using the original `Conv2D` layer. Our model uses the `Sequential` API provided by Keras and stacks all layers on top of each other. We employ `Conv2D` twice, each time followed by max pooling and dropout, before we flatten the abstract feature map and classify the data using densely connected layers. Our loss function is categorical cross-entropy, and the optimizer is [Adam](https://paperswithcode.com/method/adam).

For this tutorial, we will use the MNIST dataset to train our model, which we can load straight from TensorFlow.

In [1]:
import tensorflow as tf

# Model configuration
img_width, img_height = 28, 28
batch_size = 250
no_epochs = 25
no_classes = 10
validation_split = 0.2
verbosity = 1

# Load MNIST dataset
(input_train, target_train), (input_test, target_test) = tf.keras.datasets.mnist.load_data()

# Reshape the data
input_train = input_train.reshape(input_train.shape[0], img_width, img_height, 1)
input_test = input_test.reshape(input_test.shape[0], img_width, img_height, 1)
input_shape = (img_width, img_height, 1)

# Parse numbers as floats
input_train = input_train.astype('float32')
input_test = input_test.astype('float32')

# Scale data
input_train = input_train / 255
input_test = input_test / 255

# Convert target vectors to categorical targets
target_train = tf.keras.utils.to_categorical(target_train, no_classes)
target_test = tf.keras.utils.to_categorical(target_test, no_classes)

# Create a CNN using `Conv2D` layers
model_Conv2D = tf.keras.models.Sequential()
model_Conv2D.add(tf.keras.layers.Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=input_shape))
model_Conv2D.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2)))
model_Conv2D.add(tf.keras.layers.Dropout(0.25))
model_Conv2D.add(tf.keras.layers.Conv2D(64, kernel_size=(3, 3), activation='relu'))
model_Conv2D.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2)))
model_Conv2D.add(tf.keras.layers.Dropout(0.25))
model_Conv2D.add(tf.keras.layers.Flatten())
model_Conv2D.add(tf.keras.layers.Dense(256, activation='relu'))
model_Conv2D.add(tf.keras.layers.Dense(no_classes, activation='softmax'))

# Compile the model
model_Conv2D.compile(loss=tf.keras.losses.categorical_crossentropy,
              optimizer=tf.keras.optimizers.Adam(),
              metrics=['accuracy'])

# Display a model summary
print("Version: ", tf.__version__)
print("Eager mode: ", tf.executing_eagerly())
print("GPU is", "available" if tf.config.list_physical_devices('GPU') else "NOT AVAILABLE")
model_Conv2D.summary()

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
Version:  2.15.0
Eager mode:  True
GPU is available
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d (Conv2D)             (None, 26, 26, 32)        320       
                                                                 
 max_pooling2d (MaxPooling2  (None, 13, 13, 32)        0         
 D)                                                              
                                                                 
 dropout (Dropout)           (None, 13, 13, 32)        0         
                                                                 
 conv2d_1 (Conv2D)           (None, 11, 11, 64)        18496     
                                                                 
 max_pooling2d_1 (MaxPoolin  (None, 5, 5, 64)          0         
 g2D)                                       

In this tutorial, we will use the `EmissionsTracker` from CodeCarbon to track our experiments. To learn more about how to use CodeCarbon, visit this [tutorial](https://github.com/Nkluge-correa/TeenyTinyCastle/blob/master/ML-Accountability/CO2-Emission-tracking/emission_tracker.ipynb).


In [2]:
!pip install codecarbon -q

from codecarbon import EmissionsTracker

tracker = EmissionsTracker(
    project_name="Conv2D",
    log_level="critical",
    measure_power_secs=15,
    output_dir="./",
    output_file="emissions-conv2d.csv",
    tracking_mode='machine',
)

tracker.start()

# Train the model
model_Conv2D.fit(input_train, target_train,
          batch_size=batch_size,
          epochs=no_epochs,
          verbose=verbosity,
          validation_split=validation_split)

tracker.stop()

Epoch 1/25
Epoch 2/25
Epoch 3/25
Epoch 4/25
Epoch 5/25
Epoch 6/25
Epoch 7/25
Epoch 8/25
Epoch 9/25
Epoch 10/25
Epoch 11/25
Epoch 12/25
Epoch 13/25
Epoch 14/25
Epoch 15/25
Epoch 16/25
Epoch 17/25
Epoch 18/25
Epoch 19/25
Epoch 20/25
Epoch 21/25
Epoch 22/25
Epoch 23/25
Epoch 24/25
Epoch 25/25


0.001056536112822383
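The number printed after training is the value returned by `tracker.stop()`, CodeCarbon's emissions estimate in kg of CO₂-equivalents. As a rough mental model only (this is not CodeCarbon's actual implementation), such an estimate can be thought of as power readings taken every `measure_power_secs` seconds, summed into energy, and multiplied by the carbon intensity of the local grid. The power readings and the carbon intensity below are hypothetical values chosen purely for illustration.

```python
def energy_kwh(power_samples_w, interval_s):
    """Approximate energy by treating each power reading as constant over one interval."""
    joules = sum(power * interval_s for power in power_samples_w)
    return joules / 3.6e6  # 1 kWh = 3.6e6 J


def emissions_kg(energy, carbon_intensity_kg_per_kwh):
    """Convert energy (kWh) into estimated emissions (kg CO2eq)."""
    return energy * carbon_intensity_kg_per_kwh


# Hypothetical readings in watts, one per 15-second interval (one minute in total).
samples = [90.0, 100.0, 110.0, 100.0]
energy = energy_kwh(samples, interval_s=15)

# Hypothetical grid carbon intensity in kg CO2eq per kWh.
emissions = emissions_kg(energy, carbon_intensity_kg_per_kwh=0.5)

print(f"Energy: {energy:.6f} kWh")
print(f"Emissions: {emissions:.6f} kg CO2eq")
```

A shorter sampling interval follows changes in power draw more closely, at the price of more frequent measurements.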

Now, let us introduce the `SeparableConv2D` layer into our model. It is very similar to the traditional `Conv2D` layer and is easy to add, given that Keras already implements it. However, it comes with some separation-specific configuration options (all with default values) that can be adjusted before training. The [Keras documentation](https://keras.io/api/layers/convolution_layers/separable_convolution2d/) defines the `SeparableConv2D` layer as follows:

```python
keras.layers.SeparableConv2D(
    filters,
    kernel_size,
    strides=(1, 1),
    padding="valid",
    data_format=None,
    dilation_rate=(1, 1),
    depth_multiplier=1,
    activation=None,
    use_bias=True,
    depthwise_initializer="glorot_uniform",
    pointwise_initializer="glorot_uniform",
    bias_initializer="zeros",
    depthwise_regularizer=None,
    pointwise_regularizer=None,
    bias_regularizer=None,
    activity_regularizer=None,
    depthwise_constraint=None,
    pointwise_constraint=None,
    bias_constraint=None,
    **kwargs
)
```

Where:

-   **filters**: int, the dimensionality of the output space (i.e. the number of filters in the pointwise convolution).
-   **kernel_size**: int or tuple/list of 2 integers, specifying the size of the depthwise convolution window.
-   **strides**: int or tuple/list of 2 integers, specifying the stride length of the depthwise convolution. If only one int is specified, the same stride size will be used for all dimensions.  `strides > 1`  is incompatible with  `dilation_rate > 1`.
-   **padding**: string, either  `"valid"`  or  `"same"`  (case-insensitive).  `"valid"`  means no padding.  `"same"`  results in padding evenly to the left/right or up/down of the input. When  `padding="same"`  and  `strides=1`, the output has the same size as the input.
-   **data_format**: string, either  `"channels_last"`  or  `"channels_first"`. The ordering of the dimensions in the inputs.  `"channels_last"`  corresponds to inputs with shape  `(batch, height, width, channels)`  while  `"channels_first"`  corresponds to inputs with shape  `(batch, channels, height, width)`. It defaults to the  `image_data_format`  value found in your Keras config file at  `~/.keras/keras.json`. If you never set it, then it will be  `"channels_last"`.
-   **dilation_rate**: int or tuple/list of 2 integers, specifying the dilation rate to use for dilated convolution. If only one int is specified, the same dilation rate will be used for all dimensions.
-   **depth_multiplier**: The number of depthwise convolution output channels for each input channel. The total number of depthwise convolution output channels will be equal to  `input_channel * depth_multiplier`.
-   **activation**: Activation function. If  `None`, no activation is applied.
-   **use_bias**: bool, if  `True`, bias will be added to the output.
-   **depthwise_initializer**: An initializer for the depthwise convolution kernel. If None, then the default initializer (`"glorot_uniform"`) will be used.
-   **pointwise_initializer**: An initializer for the pointwise convolution kernel. If None, then the default initializer (`"glorot_uniform"`) will be used.
-   **bias_initializer**: An initializer for the bias vector. If None, the default initializer (`"zeros"`) will be used.
-   **depthwise_regularizer**: Optional regularizer for the depthwise convolution kernel.
-   **pointwise_regularizer**: Optional regularizer for the pointwise convolution kernel.
-   **bias_regularizer**: Optional regularizer for the bias vector.
-   **activity_regularizer**: Optional regularizer function for the output.
-   **depthwise_constraint**: Optional projection function to be applied to the depthwise kernel after being updated by an  `Optimizer`  (e.g. used for norm constraints or value constraints for layer weights). The function must take as input the unprojected variable and must return the projected variable (which must have the same shape).
-   **pointwise_constraint**: Optional projection function to be applied to the pointwise kernel after being updated by an  `Optimizer`.
-   **bias_constraint**: Optional projection function to be applied to the bias after being updated by an  `Optimizer`.
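To make the effect of `kernel_size`, `strides`, `padding`, and `dilation_rate` more concrete, here is a minimal sketch of the usual output-size rule for convolution and pooling layers, applied along one spatial axis. The helper `conv_output_size` is hypothetical (it is not part of Keras); the values it produces agree with the output shapes printed in the model summaries of this tutorial (28 -> 26 -> 13 -> 11 -> 5).

```python
import math


def conv_output_size(size, kernel, stride=1, padding="valid", dilation=1):
    """Output length along one spatial axis for a convolution or pooling window."""
    if padding.lower() == "same":
        # With "same" padding, only the stride shrinks the output.
        return math.ceil(size / stride)
    # A dilated kernel covers a wider window of the input.
    effective_kernel = kernel + (kernel - 1) * (dilation - 1)
    return (size - effective_kernel) // stride + 1


# Trace one axis of a 28x28 MNIST image through the layers used in this tutorial.
trace = [28]
trace.append(conv_output_size(trace[-1], kernel=3))            # 3x3 convolution, "valid"
trace.append(conv_output_size(trace[-1], kernel=2, stride=2))  # 2x2 max pooling
trace.append(conv_output_size(trace[-1], kernel=3))            # 3x3 convolution, "valid"
trace.append(conv_output_size(trace[-1], kernel=2, stride=2))  # 2x2 max pooling

print(trace)
print(conv_output_size(28, kernel=3, padding="same"))
```

With `padding="same"` and a stride of 1, the 28-pixel axis keeps its length, as described in the parameter list above.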

Now that we understand how to configure a depthwise separable convolutional layer in Keras, we can adapt our CNN from above to use depthwise separable convolutions, i.e., replace `Conv2D` with `SeparableConv2D`, keeping the separation-specific options at their default values.

In [3]:
# Create a CNN using `SeparableConv2D` layers
model_SeparableConv2D = tf.keras.models.Sequential()
model_SeparableConv2D.add(tf.keras.layers.SeparableConv2D(32, kernel_size=(3, 3), activation='relu', input_shape=input_shape))
model_SeparableConv2D.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2)))
model_SeparableConv2D.add(tf.keras.layers.Dropout(0.25))
model_SeparableConv2D.add(tf.keras.layers.SeparableConv2D(64, kernel_size=(3, 3), activation='relu'))
model_SeparableConv2D.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2)))
model_SeparableConv2D.add(tf.keras.layers.Dropout(0.25))
model_SeparableConv2D.add(tf.keras.layers.Flatten())
model_SeparableConv2D.add(tf.keras.layers.Dense(256, activation='relu'))
model_SeparableConv2D.add(tf.keras.layers.Dense(no_classes, activation='softmax'))

# Compile the model
model_SeparableConv2D.compile(loss=tf.keras.losses.categorical_crossentropy,
              optimizer=tf.keras.optimizers.Adam(),
              metrics=['accuracy'])

# Display a model summary
print("Version: ", tf.__version__)
print("Eager mode: ", tf.executing_eagerly())
print("GPU is", "available" if tf.config.list_physical_devices('GPU') else "NOT AVAILABLE")
model_SeparableConv2D.summary()

print(f"This model has {model_Conv2D.count_params() - model_SeparableConv2D.count_params()} fewer parameters!")

Version:  2.15.0
Eager mode:  True
GPU is available
Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 separable_conv2d (Separabl  (None, 26, 26, 32)        73        
 eConv2D)                                                        
                                                                 
 max_pooling2d_2 (MaxPoolin  (None, 13, 13, 32)        0         
 g2D)                                                            
                                                                 
 dropout_2 (Dropout)         (None, 13, 13, 32)        0         
                                                                 
 separable_conv2d_1 (Separa  (None, 11, 11, 64)        2400      
 bleConv2D)                                                      
                                                                 
 max_pooling2d_3 (MaxPoolin  (None, 5, 5, 64)          0         
 g
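The parameter counts of the convolutional layers in the two summaries can be reproduced by hand. The sketch below uses two hypothetical helpers, `conv2d_params` and `separable_conv2d_params`, which count kernel weights plus one bias per filter and assume the default `depth_multiplier=1`.

```python
def conv2d_params(kernel, in_channels, filters, use_bias=True):
    """Weights of a standard convolution: one kernel x kernel x in_channels block per filter."""
    bias = filters if use_bias else 0
    return kernel * kernel * in_channels * filters + bias


def separable_conv2d_params(kernel, in_channels, filters, depth_multiplier=1, use_bias=True):
    """Weights of a depthwise separable convolution: depthwise kernels plus 1x1 pointwise kernels."""
    depthwise = kernel * kernel * in_channels * depth_multiplier
    pointwise = in_channels * depth_multiplier * filters
    bias = filters if use_bias else 0
    return depthwise + pointwise + bias


# First layer: 1 input channel -> 32 filters; second layer: 32 channels -> 64 filters.
standard_layers = [conv2d_params(3, 1, 32), conv2d_params(3, 32, 64)]
separable_layers = [separable_conv2d_params(3, 1, 32), separable_conv2d_params(3, 32, 64)]

print(standard_layers)   # matches the Conv2D summary: 320 and 18496
print(separable_layers)  # matches the SeparableConv2D summary: 73 and 2400
print(sum(standard_layers) - sum(separable_layers))
```

Since the pooling, dropout, and dense layers are identical in both models, the difference between the convolutional layers (16,343 parameters) is also the difference between the two models.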

Again, we use the `EmissionsTracker` from CodeCarbon to track our experiments.

In [4]:
tracker = EmissionsTracker(
    project_name="SeparableConv2D",
    log_level="critical",
    measure_power_secs=15,
    output_dir="./",
    output_file="emissions-separableconv2d.csv",
    tracking_mode='machine',
)

tracker.start()

# Train the model
model_SeparableConv2D.fit(input_train, target_train,
          batch_size=batch_size,
          epochs=no_epochs,
          verbose=verbosity,
          validation_split=validation_split)

tracker.stop()

Epoch 1/25
Epoch 2/25
Epoch 3/25
Epoch 4/25
Epoch 5/25
Epoch 6/25
Epoch 7/25
Epoch 8/25
Epoch 9/25
Epoch 10/25
Epoch 11/25
Epoch 12/25
Epoch 13/25
Epoch 14/25
Epoch 15/25
Epoch 16/25
Epoch 17/25
Epoch 18/25
Epoch 19/25
Epoch 20/25
Epoch 21/25
Epoch 22/25
Epoch 23/25
Epoch 24/25
Epoch 25/25


0.0005982410172611527

### Traditional vs. Depthwise separable CNN: energy consumption comparison

Now is the time for us to compare the performance of our two networks and their energy consumption and estimated emissions.

In [6]:
perf_model_Conv2D = model_Conv2D.evaluate(input_test, target_test, verbose=0)  # [loss, accuracy]
perf_model_Conv2D

[0.023584142327308655, 0.9930999875068665]

In [13]:
import pandas as pd
from IPython.display import Markdown, display

# Evaluate the models
loss_Conv2D, acc_Conv2D = model_Conv2D.evaluate(input_test, target_test, verbose=0)
loss_SeparableConv2D, acc_SeparableConv2D = model_SeparableConv2D.evaluate(input_test, target_test, verbose=0)

# Read the generated emissions data
emissions_conv2d = pd.read_csv("emissions-conv2d.csv")
emissions_separableconv2d = pd.read_csv("emissions-separableconv2d.csv")

# Create a dataframe with the combined results
emissions_conv2d['accuracy'] = acc_Conv2D
emissions_conv2d['loss'] = loss_Conv2D
emissions_separableconv2d['accuracy'] = acc_SeparableConv2D
emissions_separableconv2d['loss'] = loss_SeparableConv2D

emissions_conv2d.index = ['Conv2D']
emissions_conv2d = emissions_conv2d[['accuracy','loss', 'duration','energy_consumed',
                    'emissions','emissions_rate']]

emissions_separableconv2d.index = ['SeparableConv2D']
emissions_separableconv2d = emissions_separableconv2d[['accuracy', 'loss', 'duration','energy_consumed',
                    'emissions','emissions_rate']]

# Concatenate the dataframes and display the results
emissions_report = pd.concat([emissions_conv2d, emissions_separableconv2d])
display(Markdown(emissions_report.transpose().to_markdown()))

|                 |       Conv2D |   SeparableConv2D |
|:----------------|-------------:|------------------:|
| accuracy        |  0.9931      |       0.9895      |
| loss            |  0.0235841   |       0.0362006   |
| duration        | 83.0816      |      40.3688      |
| energy_consumed |  0.00216531  |       0.00122606  |
| emissions       |  0.00105654  |       0.000598241 |
| emissions_rate  |  1.27168e-05 |       1.48194e-05 |
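The relative differences between the two columns of the table above can be computed directly. The sketch below copies the reported values by hand; the unit labels (seconds, kWh, kg CO₂eq) assume CodeCarbon's default output units, and `relative_decrease` is a hypothetical helper used only here.

```python
# Values copied from the results table above.
results = {
    "Conv2D": {
        "accuracy": 0.9931,
        "duration": 83.0816,
        "energy_consumed": 0.00216531,
        "emissions": 0.00105654,
    },
    "SeparableConv2D": {
        "accuracy": 0.9895,
        "duration": 40.3688,
        "energy_consumed": 0.00122606,
        "emissions": 0.000598241,
    },
}


def relative_decrease(baseline, new):
    """Fraction by which `new` is lower than `baseline`."""
    return (baseline - new) / baseline


base, sep = results["Conv2D"], results["SeparableConv2D"]

decrease = {
    key: relative_decrease(base[key], sep[key])
    for key in ("duration", "energy_consumed", "emissions")
}
accuracy_gap = base["accuracy"] - sep["accuracy"]

# Ratio of emissions to energy: the carbon intensity implied by the measurements.
implied_intensity = base["emissions"] / base["energy_consumed"]

for key, value in decrease.items():
    print(f"{key}: {value:.1%} lower with SeparableConv2D")
print(f"accuracy gap: {accuracy_gap * 100:.2f} percentage points")
print(f"implied carbon intensity: {implied_intensity:.3f} kg CO2eq per kWh")
```

Energy and emissions drop by the same fraction because, in this run, the emissions estimate is the measured energy scaled by a constant carbon intensity.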

In terms of accuracy, `Conv2D` marginally outperforms `SeparableConv2D` (0.9931 versus 0.9895). However, when considering energy efficiency, `SeparableConv2D` exhibits better results: it consumes significantly less energy and produces lower estimated emissions (a decrease of roughly 43% in both; CodeCarbon reports energy in kWh and emissions in kg of CO₂-equivalents). The duration of the experiment is also influenced by the type of layer we use, with the network implementing `SeparableConv2D` taking about half the time to train. These findings highlight a tradeoff between model accuracy and energy efficiency, where `Conv2D` excels in accuracy but consumes more energy and emits more carbon than `SeparableConv2D`. The choice between the two models depends on the specific priorities and constraints of the application, emphasizing the importance of considering both performance and energy efficiency in model selection.

For example:

> **_How do we deal with the tradeoff between accuracy and sustainability if low accuracy means that children with pneumonia will likely receive false negative results?_**

We do not have answers to these questions, although the CV field is working to optimize current techniques and reduce their environmental footprint. The point here is that, when discussing sustainability in certain areas of application, values sometimes collide: intergenerational justice (being fair to those who haven't yet arrived) and beneficence/non-maleficence. What would you choose? 🤔

---

Return to the [castle](https://github.com/Nkluge-correa/TeenyTinyCastle).