# DenseNet and CSPNet

**Author:** Varchita Lalwani<br>
**Date created:** 27 June 2021<br>
**Last modified:** 27 June 2021<br>
**Description:** Training DenseNet, CSPNet and their variants on the CIFAR-10 dataset.

In [0]:
!densenet_cspnet_variants.ipynb
!
!Automatically generated by Colaboratory.
!
!Original file is located at
!    https://colab.research.google.com/drive/1ejnjgRzfo60FWXMp8_HyzQJ_5kBLIdwd
!
!##DenseNet
!
!Densely Connected Convolutional Networks (DenseNet): with dense connections, DenseNet achieves high accuracy with fewer parameters than ResNet and Pre-Activation ResNet. In a standard ConvNet, the input image passes through a stack of convolutions to obtain high-level features. ResNet adds identity mappings, combined with the layer output by element-wise addition, to promote gradient propagation. An advantage of ResNets is that the gradient can flow directly through the identity function from later layers to earlier layers. However, because the identity function and the layer output are combined by summation, the information flow in the network may be impeded.
!
!To further improve the information flow between layers, the authors of the DenseNet paper propose a different connectivity pattern: direct connections from any layer to all subsequent layers. Each layer has access to all the preceding feature-maps in its block and, therefore, to the network's "collective knowledge". Since each layer receives feature-maps from all preceding layers, the network can be thinner and more compact, i.e. each layer can have fewer channels. The growth rate k is the number of channels that each layer adds. As a result, DenseNet has higher computational and memory efficiency.
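The effect of the growth rate on channel counts can be made concrete with a little arithmetic. The sketch below is only an illustration: the helper name `dense_block_channels` and the values `k0=16`, `k=12` are assumptions chosen for the example, not settings used in this notebook.

```python
def dense_block_channels(k0, k, num_layers):
    """Channel count seen at each depth of a dense block.

    Entry l is the number of input channels of layer l (0-indexed);
    the last entry is the number of channels leaving the block.
    """
    # Layer l receives the block input (k0 channels) plus the outputs
    # of the l layers before it, each of which adds k channels.
    return [k0 + k * l for l in range(num_layers + 1)]


channels = dense_block_channels(k0=16, k=12, num_layers=4)
print(channels)  # [16, 28, 40, 52, 64]
```

Because every layer only adds k channels, k can stay small even in deep blocks, which is where the parameter savings come from.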
!
!####Bottleneck Layers
!A 1×1 convolution can be introduced as a bottleneck layer before each 3×3 convolution to reduce the number of input feature-maps, and thus to improve computational efficiency.
!
!So a dense block will have a convolution with kernel size 1 followed by a convolution with kernel size 3.
!
!DenseNet-B uses BN-ReLU-Conv(1×1) followed by BN-ReLU-Conv(3×3).
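To see why the bottleneck helps, one can count convolution weights with and without it. This is a rough sketch under stated assumptions: biases are ignored, the input width `c_in = 512` and growth rate `k = 32` are illustrative values, and the 1×1 convolution is assumed to produce 4k feature-maps as in the DenseNet paper. The helper `conv_weights` is hypothetical.

```python
def conv_weights(kernel, c_in, c_out):
    """Number of weights in a kernel x kernel convolution (bias ignored)."""
    return kernel * kernel * c_in * c_out


c_in, k = 512, 32  # illustrative input channels and growth rate

# A 3x3 convolution applied directly to all input feature-maps.
direct = conv_weights(3, c_in, k)

# A 1x1 bottleneck down to 4k channels, followed by the 3x3 convolution.
bottleneck = conv_weights(1, c_in, 4 * k) + conv_weights(3, 4 * k, k)

print(direct, bottleneck)  # 147456 102400
```

The saving grows with the number of input feature-maps, which is why the bottleneck matters most in the later layers of a dense block.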
!
!####Composite function
!The composite function consists of three consecutive operations: batch normalization (BN), followed by a rectified linear unit (ReLU) and a 3×3 convolution (Conv).
!
!####Pooling Layers
!The concatenation operation is not viable when the size of the feature-maps changes. However, down-sampling layers that change the size of feature-maps are an essential part of convolutional networks. To facilitate down-sampling, the authors divide the network into multiple densely connected dense blocks and refer to the layers between blocks as transition layers, which perform convolution and pooling. The transition layers used in the experiments consist of a batch normalization layer and a 1×1 convolutional layer followed by a 2×2 average pooling layer.
!
!####Compression
!To further improve model compactness, the number of feature-maps at transition layers can be reduced. If a dense block contains m feature-maps, the following transition layer generates floor(theta * m) output feature-maps, where 0 < theta <= 1. theta is referred to as the compression factor. When theta = 1, the number of feature-maps across transition layers remains unchanged.
!
!When both the bottleneck and transition layers with theta < 1 are used, the model is referred to as DenseNet-BC.
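As a small worked example of compression, the sketch below computes the number of feature-maps a transition layer would output, following the DenseNet paper's rule floor(theta * m). The function name `transition_channels` and the input values are illustrative assumptions.

```python
import math


def transition_channels(m, theta):
    """Feature-maps produced by a transition layer with compression factor theta."""
    if not 0 < theta <= 1:
        raise ValueError("theta must be in the interval (0, 1]")
    return math.floor(theta * m)


print(transition_channels(352, 0.5))  # 176
print(transition_channels(352, 1.0))  # 352 (no compression)
print(transition_channels(75, 0.5))   # 37 (37.5 rounded down)
```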
!
!DenseNets utilize parameters more efficiently than alternative architectures (in particular, ResNets). DenseNet-BC, with its bottleneck structure and dimension reduction at transition layers, is particularly parameter-efficient.
!
!##CSPNet
!Cross Stage Partial DenseNet (CSPDenseNet)
!A stage of CSPDenseNet is composed of a partial dense block and a partial transition layer. In a partial dense block, the feature maps of the base layer in a stage are split into two parts along the channel dimension. The former is linked directly to the end of the stage, and the latter goes through a dense block. The partial transition layer works in two steps. First, the output of the dense layers undergoes a transition layer. Second, the output of this transition layer is concatenated with the first part and undergoes another transition layer, which generates the final output.
!
!The proposed CSPDenseNet preserves the advantages of DenseNet's feature reuse, but at the same time prevents an excessive amount of duplicate gradient information by truncating the gradient flow. This idea is realized by a hierarchical feature fusion strategy, which is used in the partial transition layer.
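The channel split at the start of a stage can be sketched with plain index lists. This is an illustration only: `split_channels` is a hypothetical helper, and the values (32 channels, 24 of them sent through the dense block) mirror the setting used for the first CSP block later in this notebook rather than the even split described in the CSPNet paper.

```python
def split_channels(num_channels, partition):
    """Split channel indices into a bypass part and a dense-block part.

    The last `partition` channels form part2 (processed by the dense block);
    the remaining leading channels form part1 (linked to the end of the stage).
    """
    features = num_channels - partition
    part1 = list(range(0, features))
    part2 = list(range(features, num_channels))
    return part1, part2


part1, part2 = split_channels(num_channels=32, partition=24)
print(len(part1), len(part2))  # 8 24
```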
!
!####Partial Dense Block
!The purpose of designing partial dense blocks is to:
!1.) Increase gradient paths: through the split-and-merge strategy, the number of gradient paths can be doubled. Because of the cross-stage strategy, one can alleviate the disadvantages caused by using explicit feature-map copies for concatenation.
!2.) Balance the computation of each layer: usually, the channel number in the base layer of a DenseNet is much larger than the growth rate. Since the base layer channels involved in the dense layer operation in a partial dense block account for only half of the original number, it can effectively solve nearly half of the computational bottleneck.
!3.) Reduce memory traffic.
!
!####Partial Transition Layer
!The purpose of designing partial transition layers is to maximize the difference of gradient combination. The partial transition layer is a hierarchical feature fusion mechanism, which uses the strategy of truncating the gradient flow to prevent distinct layers from learning duplicate gradient information.
!
!##To summarize
!1) DenseNet: Base layer -> Dense block (each layer's output is concatenated with its input) -> Transition layer
!2) CSPDenseNet: Divide the base layer into part1 and part2 along the channels
!Part2 -> Dense block -> Transition layer -> concat part1 with the transition output -> Transition layer
!3) Variant of CSPNet (Fusion First): Divide the base layer into part1 and part2 along the channels
!Part2 -> Dense block -> concat part1 with the output of the dense block -> Transition layer
!4) Variant of CSPNet (Fusion Last): Divide the base layer into part1 and part2 along the channels
!Part2 -> Dense block -> Transition layer -> concat part1 with the transition output
!
!CSP (fusion first) concatenates the feature maps generated by the two parts and then applies the transition operation. If this strategy is adopted, a large amount of gradient information will be reused. With the CSP (fusion last) strategy, the output of the dense block goes through the transition layer and is then concatenated with the feature map coming from part 1. In this case the gradient information will not be reused, since the gradient flow is truncated.
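The difference between the four orderings can be traced in terms of channel counts. The sketch below is a simplified model, not the notebook's implementation: the function `stage_output_channels`, the even split, and the values `base=64`, `growth=32`, `layers=4`, `theta=0.5` are all assumptions made for illustration, and each transition is assumed to keep floor(theta * channels) channels.

```python
def stage_output_channels(variant, base=64, growth=32, layers=4, theta=0.5):
    """Trace the output channel count of one stage for each variant."""

    def transition(channels):
        # Assumption: a transition layer compresses channels by theta.
        return int(theta * channels)

    def dense_block(channels):
        # Each of the `layers` dense layers appends `growth` channels.
        return channels + growth * layers

    if variant == "densenet":
        return transition(dense_block(base))

    # CSP variants: split the base layer into two parts along the channels.
    part1 = base // 2
    part2 = base - part1
    if variant == "cspdensenet":
        # dense block -> transition -> concat with part1 -> transition
        return transition(part1 + transition(dense_block(part2)))
    if variant == "fusion_first":
        # dense block -> concat with part1 -> transition
        return transition(part1 + dense_block(part2))
    if variant == "fusion_last":
        # dense block -> transition -> concat with part1
        return part1 + transition(dense_block(part2))
    raise ValueError(f"unknown variant: {variant}")


for name in ("densenet", "cspdensenet", "fusion_first", "fusion_last"):
    print(name, stage_output_channels(name))
# densenet 96
# cspdensenet 56
# fusion_first 96
# fusion_last 112
```

The point of the sketch is the position of the concatenation relative to the transitions; the actual channel counts depend on the filter settings chosen in the model code.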
!
!DenseNet: - https://arxiv.org/pdf/1608.06993.pdf
!CSPNet: - https://arxiv.org/pdf/1911.11929.pdf
!
Setup

In [0]:
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential, Model
import tensorflow.keras.activations as act
from tensorflow.keras.layers import (
    Conv2D,
    Input,
    MaxPool2D,
    Concatenate,
    AveragePooling2D,
    GlobalAveragePooling2D,
    Dropout,
    BatchNormalization,
    Reshape,
    Flatten,
)
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.losses import CategoricalCrossentropy
from tensorflow.keras import backend as K
from tensorflow.keras.utils import to_categorical
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split as split
from tensorflow.keras.callbacks import LearningRateScheduler

Get the data (CIFAR-10)

In [0]:
(x_train, y_train), (x_test, y_test) = keras.datasets.cifar10.load_data()
X_train, x_val, Y_train, y_val = split(x_train, y_train, test_size=0.1)
print(
    "X_train.shape, Y_train.shape, x_test.shape, y_test.shape, x_val.shape, y_val.shape"
)
print(
    X_train.shape,
    Y_train.shape,
    x_test.shape,
    y_test.shape,
    x_val.shape,
    y_val.shape,
)

Normalize the data

In [0]:
train_norm = X_train.astype("float32")
test_norm = x_test.astype("float32")
val_norm = x_val.astype("float32")
train_norm /= 255.0
test_norm /= 255.0
val_norm /= 255.0
del X_train, x_test, x_val

Prepare the labels

In [0]:
train_label = to_categorical(Y_train)
test_label = to_categorical(y_test)
val_label = to_categorical(y_val)
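For readers unfamiliar with one-hot encoding, the following pure-Python sketch shows the kind of transformation `to_categorical` performs on integer class labels. The helper `one_hot` is a hypothetical stand-in written for illustration, not the Keras implementation.

```python
def one_hot(labels, num_classes):
    """Turn integer class labels into one-hot rows."""
    return [
        [1.0 if index == label else 0.0 for index in range(num_classes)]
        for label in labels
    ]


print(one_hot([0, 2], num_classes=3))  # [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
```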

Activation function: - Mish

In [0]:

def mish(x):
    x = x * (act.tanh(act.softplus(x)))
    return x
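The same formula, x * tanh(softplus(x)), can be checked on scalars without TensorFlow. This is a minimal sketch for intuition; `mish_scalar` is a hypothetical helper and does not guard against overflow for very large inputs.

```python
import math


def mish_scalar(x):
    """Mish activation for a single float: x * tanh(softplus(x))."""
    softplus = math.log1p(math.exp(x))  # ln(1 + e^x)
    return x * math.tanh(softplus)


for value in (-2.0, 0.0, 2.0):
    print(value, round(mish_scalar(value), 4))
```

Unlike ReLU, Mish lets small negative values through and is smooth around zero, while behaving close to the identity for large positive inputs.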


Define Convolution block

In [0]:

def conv_block(inps, convs):
    x = inps
    for conv in convs:
        x = Conv2D(
            conv["filter"],
            conv["kernel"],
            conv["strides"],
            conv["padding"],
            name="conv_" + str(conv["layer_ids"]),
        )(x)
        x = mish(x)
    return x


In place of a dense softmax classifier, a Network in Network block is used, which helps reduce the number of parameters

In [0]:

def nin_block(inps, filter, ker):
    x = inps
    x = Conv2D(filter, ker, padding="same", name="nin_" + str(0))(x)
    x = mish(x)
    x = Conv2D(filter, 1, padding="same", name="nin_" + str(1))(x)
    x = mish(x)
    x = Conv2D(filter, 1, padding="same", name="nin_" + str(2))(x)
    x = mish(x)
    return x


Define Dense Block (1x1 followed by 3x3 convolutions)

In [0]:
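# For DenseNet-BC, uncomment the BatchNormalization lines in this block.
def dense_block(inps, filter, times, id):
    # `id` is accepted for call-site compatibility; it is not used to name layers.
    for _ in range(times):
        # inps = BatchNormalization()(inps)
        part1 = inps
        # 1x1 bottleneck convolution
        part2 = Conv2D(filter[0], 1, padding="same")(part1)
        part2 = mish(part2)
        # part2 = BatchNormalization()(part2)
        # 3x3 convolution that produces the new feature maps
        part2 = Conv2D(filter[1], 3, padding="same")(part2)
        part2 = mish(part2)
        # Concatenate the layer input with the new feature maps along the channels
        inps = Concatenate()([part1, part2])
    return inps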


Define Dense Block CSPNet

In [0]:
# For CSPNet and fusion last, this block stays as is.
# For fusion first, comment out the final
# `part2 = Conv2D(filter[2], 1, padding="same")(part2)` and `part2 = mish(part2)` lines.
def dense_block_cspnet(inps, partition, filter, times, id):
    # Split along the channel axis: the last `partition` channels (part2)
    # go through the convolutions, the remaining channels (part1) bypass them.
    features = inps.shape[3] - partition
    part1 = inps[:, :, :, 0:features]
    part2 = inps[:, :, :, features:]
    for _ in range(times):
        part2 = Conv2D(filter[0], 1, padding="same")(part2)
        part2 = mish(part2)
        part2 = Conv2D(filter[1], 3, padding="same")(part2)
        part2 = mish(part2)

    # 1x1 transition on part2 before it is merged with part1
    part2 = Conv2D(filter[2], 1, padding="same")(part2)
    part2 = mish(part2)
    inps = Concatenate()([part1, part2])
    return inps
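The output width of this block follows directly from the split and the final 1x1 convolution. The sketch below only does the arithmetic; `csp_block_channels` is a hypothetical helper, and the `fusion_first` flag models the case where the trailing 1x1 convolution is commented out.

```python
def csp_block_channels(c_in, partition, filters, fusion_first=False):
    """Output channels of dense_block_cspnet for a given input width."""
    part1 = c_in - partition  # channels that bypass the convolutions
    # part2 ends with filters[2] channels after the trailing 1x1 convolution,
    # or with filters[1] channels when that convolution is removed.
    part2 = filters[1] if fusion_first else filters[2]
    return part1 + part2


# First CSP block in cspnet(): 32 input channels, partition = 24.
print(csp_block_channels(32, 24, [64, 64, 64]))  # 72
# Later CSP blocks receive 64 channels from the preceding conv_block.
print(csp_block_channels(64, 24, [64, 64, 64]))  # 104
```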


Define DenseNet

In [0]:
# For DenseNet and DenseNet-B, this function stays as is.
# For DenseNet-BC, in each conv_block call that follows a dense block,
# change the filters to half of those used in dense_block
# (for example, 48 / 2 = 24).
# The conv_block with layer_ids = 0 stays unchanged.
def densenet():
    shp = train_norm.shape
    input_image = Input(shape=(shp[1], shp[2], shp[3]))
    # Layer 0 => 1
    x = conv_block(
        input_image,
        [{"filter": 32, "kernel": 7, "padding": "same", "strides": 2, "layer_ids": 0}],
    )
    x = MaxPool2D(pool_size=(2, 2), strides=(2, 2))(x)

    # Dense block 1
    x = dense_block(x, [32, 32], 10, "dense_block1_")
    x = Dropout(0.5)(x)
    x = BatchNormalization()(x)

    x = conv_block(
        x,
        [{"filter": 32, "kernel": 1, "strides": 1, "padding": "same", "layer_ids": 2}],
    )
    x = AveragePooling2D(pool_size=(2, 2), strides=(2, 2))(x)

    # dense block 2
    x = dense_block(x, [32, 32], 10, "dense_block2_")
    x = Dropout(0.5)(x)
    x = BatchNormalization()(x)

    x = conv_block(
        x,
        [{"filter": 32, "kernel": 1, "strides": 1, "padding": "same", "layer_ids": 3}],
    )
    x = AveragePooling2D(pool_size=(2, 2), strides=2)(x)

    # dense block 3
    x = dense_block(x, [32, 32], 10, "dense_block3_")
    x = Dropout(0.5)(x)
    x = BatchNormalization()(x)

    x = conv_block(
        x,
        [{"filter": 32, "kernel": 1, "strides": 1, "padding": "same", "layer_ids": 4}],
    )
    x = AveragePooling2D(pool_size=(2, 2), strides=2)(x)

    # nin block
    x = nin_block(x, num_clas, 3)
    x = GlobalAveragePooling2D()(x)
    x = Reshape((1, 1, num_clas))(x)
    x = Flatten()(x)
    x = Model(inputs=input_image, outputs=x)
    return x
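It can help to trace the feature-map size and channel count through `densenet()` for a 32x32 CIFAR-10 input. The sketch below reproduces that bookkeeping in plain Python under the settings used above (stem of 32 filters, 10 dense layers adding 32 channels each, 1x1 transitions with 32 filters); the function `trace_densenet_shapes` is a hypothetical helper, not part of the model.

```python
import math


def trace_densenet_shapes(
    size=32, stem=32, growth=32, layers=10, transition=32, blocks=3
):
    """Return (stage, spatial size, channels) after each stage of densenet()."""
    size = math.ceil(size / 2)  # 7x7 convolution, stride 2, "same" padding
    size = size // 2  # 2x2 max pooling, stride 2
    channels = stem
    trace = [("stem", size, channels)]
    for block in range(1, blocks + 1):
        # Every dense layer concatenates `growth` new channels.
        channels += growth * layers
        trace.append((f"dense_block{block}", size, channels))
        # 1x1 convolution followed by 2x2 average pooling.
        channels = transition
        size = size // 2
        trace.append((f"transition{block}", size, channels))
    return trace


for stage in trace_densenet_shapes():
    print(stage)
```

With these settings the feature maps shrink from 8x8 after the stem to 1x1 after the third transition, so the Network in Network block operates on a single spatial position.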


Define CSPDenseNet

In [0]:
# For CSPNet and fusion first, this function stays as is.
# For fusion last, comment out the three conv_block calls
# that follow the dense blocks.
def cspnet():
    shp = train_norm.shape
    input_image = Input(shape=(shp[1], shp[2], shp[3]))
    # Layer 0 => 1
    x = conv_block(
        input_image,
        [{"filter": 32, "kernel": 7, "padding": "same", "strides": 2, "layer_ids": 0}],
    )
    x = MaxPool2D(pool_size=(2, 2), strides=(2, 2))(x)

    # Dense block 1
    x = dense_block_cspnet(x, 24, [64, 64, 64], 20, "dense_block1_")
    x = Dropout(0.5)(x)
    x = BatchNormalization()(x)

    x = conv_block(
        x,
        [{"filter": 64, "kernel": 1, "strides": 1, "padding": "same", "layer_ids": 2}],
    )
    x = AveragePooling2D(pool_size=(2, 2), strides=(2, 2))(x)

    # dense block 2
    x = dense_block_cspnet(x, 24, [64, 64, 64], 20, "dense_block2_")
    x = Dropout(0.5)(x)
    x = BatchNormalization()(x)

    x = conv_block(
        x,
        [{"filter": 64, "kernel": 1, "strides": 1, "padding": "same", "layer_ids": 3}],
    )
    x = AveragePooling2D(pool_size=(2, 2), strides=2)(x)

    # dense block 3
    x = dense_block_cspnet(x, 24, [64, 64, 64], 20, "dense_block3_")
    x = Dropout(0.5)(x)
    x = BatchNormalization()(x)

    x = conv_block(
        x,
        [{"filter": 64, "kernel": 1, "strides": 1, "padding": "same", "layer_ids": 4}],
    )
    x = AveragePooling2D(pool_size=(2, 2), strides=2)(x)

    # nin block
    x = nin_block(x, num_clas, 3)
    x = GlobalAveragePooling2D()(x)
    x = Reshape((1, 1, num_clas))(x)
    x = Flatten()(x)
    x = Model(inputs=input_image, outputs=x)
    return x


Define the training function

In [0]:

def train(
    func, train_norm, train_label, num_epochs, lr, val_norm, val_label, batch_size
):
    optimizer = Adam(learning_rate=lr)
    loss = CategoricalCrossentropy(from_logits=True)
    model = func()
    model.compile(optimizer=optimizer, loss=loss, metrics=["accuracy"])
    model.fit(
        train_norm,
        train_label,
        epochs=num_epochs,
        validation_data=(val_norm, val_label),
        callbacks=callbacks,
        batch_size=batch_size,
    )
    return model


Train the model
Pass the model function to train as the first argument when calling train:
densenet or cspnet

In [0]:
batch_size = 64
lr = 0.01
num_epochs = 1
num_clas = 10


def lr_scheduler(epoch, lr):
    if epoch == 11:
        lr = 0.001
    return lr


callbacks = [LearningRateScheduler(lr_scheduler)]
model = train(
    densenet, train_norm, train_label, num_epochs, lr, val_norm, val_label, batch_size
)
model.evaluate(test_norm, test_label)
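The learning-rate schedule can be checked in isolation by simulating how `LearningRateScheduler` calls it: once at the start of each epoch, with the 0-indexed epoch and the current learning rate. The loop below is a simplified stand-in for that callback, and the 15-epoch horizon is an assumption for illustration.

```python
def lr_scheduler(epoch, lr):
    # Same rule as above: drop the learning rate when epoch index 11 starts.
    if epoch == 11:
        lr = 0.001
    return lr


lr = 0.01
history = []
for epoch in range(15):  # hypothetical 15-epoch run
    lr = lr_scheduler(epoch, lr)
    history.append(lr)

print(history[:11])  # epochs 0-10 keep the initial rate of 0.01
print(history[11:])  # epochs 11-14 use 0.001
```

Since epochs are 0-indexed, the drop happens at the start of the twelfth epoch; with `num_epochs = 1` as set above, the schedule never changes the learning rate.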