# Tests in DWT

In the DWT domain, we primarily focus on two experiments.

## Experiment 1

Assume a cloud-based image classification scenario in which a source device (for example, a mobile phone) sends an inference image over a band-limited channel to a server to obtain its class label. The server receives the image and feeds it to a trained classifier, which predicts the label. To conserve limited channel bandwidth and storage capacity, source devices often encode and compress images before transmitting them to the cloud, using standardized compression techniques such as JPEG2000. Because most neural networks are designed to classify images in the spatial RGB domain, the cloud currently decodes the received JPEG2000 (j2k) images back into the RGB domain before forwarding them to the trained neural networks for further processing, as illustrated in the top part of the following figure. A natural question is therefore how to achieve faster training and inference, with improved accuracy, in cloud-based image classification under bandwidth, storage, and computation constraints.
	

<img src="files/figures/j2kcoder2.jpg">

We claim that the conventional image reconstruction step is unnecessary for classifying JPEG2000-encoded images. We show this by constructing and training a deep CNN model directly on the DWT coefficients obtained with CDF 9/7 wavelets; see the bottom part of the figure above. Furthermore, we establish that more accurate classification is possible with shallower models, which train and classify faster than models trained on spatial RGB image inputs.

## Result 1

We trained a set of ResNet models on the CIFAR-10 dataset. The following figures compare test accuracy and speed for the training and inference processes.

<img src="files/figures/result12.JPG">

In the above figure, (a) shows test accuracy versus inference speed for the CIFAR-10 dataset. The blue curves show the results for reconstructed RGB images, and the red curve shows the results for DWT coefficients with CDF 9/7 wavelets. Panel (b) shows test error versus training speed per epoch, where the rate is the number of images that pass through the model in each epoch. The proposed model delivers fast and accurate classification in both training and inference. The points a, b, c, d, e and f correspond to six different ResNet models, which the following table summarizes.

<img src="files/figures/tabelresnet.JPG">

The following is the code for the above implementation.

1. Import the necessary libraries.

In [1]:
from __future__ import print_function
import keras
from keras.layers import Dense, Conv2D, BatchNormalization, Activation
from keras.layers import AveragePooling2D, Input, Flatten
from keras.optimizers import Adam
from keras.callbacks import ModelCheckpoint, LearningRateScheduler
from keras.callbacks import ReduceLROnPlateau
from keras.preprocessing.image import ImageDataGenerator
from keras.regularizers import l2
import tensorflow as tf
from keras.models import Model
import time
import numpy as np
import os
from keras.datasets import cifar10

Using TensorFlow backend.


2. We used the following ResNet model, adapted from the Keras ResNet implementation for CIFAR-10.

In [2]:
n=1 #this is an indicator of depth (see the Keras ResNet implementation for CIFAR-10 for more details)
#the following are the n values of the models a,b,c,d,e,f
#a: n= 4
#b: n= 3
#c: n= 2
#d: n= 3
#e: n= 2
#f: n= 1

depth = n * 6 + 2 #model depth

# Model name, depth and version
model_type = 'ResNet%d' % (depth)

def resnet_layer(inputs,
                 num_filters=64,
                 kernel_size=3,  # try changing this to see its effect
                 strides=1,
                 activation='relu',
                 batch_normalization=True,
                 conv_first=True):
    """2D Convolution-Batch Normalization-Activation stack builder
    # Arguments
        inputs (tensor): input tensor from input image or previous layer
        num_filters (int): Conv2D number of filters
        kernel_size (int): Conv2D square kernel dimensions
        strides (int): Conv2D square stride dimensions
        activation (string): activation name
        batch_normalization (bool): whether to include batch normalization
        conv_first (bool): conv-bn-activation (True) or
            bn-activation-conv (False)
    # Returns
        x (tensor): tensor as input to the next layer
    """
    conv = Conv2D(num_filters,
                  kernel_size=kernel_size,
                  strides=strides,
                  padding='same',
                  kernel_initializer='he_normal',
                  kernel_regularizer=l2(1e-4))

    x = inputs
    if conv_first:
        x = conv(x)
        if batch_normalization:
            x = BatchNormalization()(x)
        if activation is not None:
            x = Activation(activation)(x)
    else:
        if batch_normalization:
            x = BatchNormalization()(x)
        if activation is not None:
            x = Activation(activation)(x)
        x = conv(x)
    return x


def resnet_v1(input_shape, depth, num_classes=10):
    """ResNet Version 1 Model builder [a]
    Stacks of 2 x (3 x 3) Conv2D-BN-ReLU
    Last ReLU is after the shortcut connection.
    At the beginning of each stage after the first, the feature map size is
    halved (downsampled) by a convolutional layer with strides=2. The number
    of filters is multiplied by 1.5 after each stage. Within each stage, the
    layers have the same number of filters and the same feature map size.
    Feature map sizes for the 16x16 DWT input:
    stage 0: 16x16, 64
    stage 1:  8x8,  96
    stage 2:  4x4, 144
    The parameter counts in Table 6 of [a] (ResNet20 0.27M, ResNet32 0.46M,
    ResNet44 0.66M, ResNet56 0.85M, ResNet110 1.7M) refer to the original
    16/32/64 filter configuration, not to this wider variant.
    # Arguments
        input_shape (tensor): shape of input image tensor
        depth (int): number of core convolutional layers
        num_classes (int): number of classes (CIFAR10 has 10)
    # Returns
        model (Model): Keras model instance
    """
    if (depth - 2) % 6 != 0:
        raise ValueError('depth should be 6n+2 (eg 20, 32, 44 in [a])')
    # Start model definition.
    num_filters = 64 
    num_res_blocks = int((depth - 2) / 6)

    inputs = Input(shape=input_shape)
    x = resnet_layer(inputs=inputs)
    # Instantiate the stack of residual units
    for stack in range(3):
        for res_block in range(num_res_blocks):
            strides = 1
            if stack > 0 and res_block == 0:  # first layer but not first stack
                strides = 2  # downsample
            y = resnet_layer(inputs=x,
                             num_filters=num_filters,
                             strides=strides)
            y = resnet_layer(inputs=y,
                             num_filters=num_filters,
                             activation=None)
            if stack > 0 and res_block == 0:  # first layer but not first stack
                # linear projection residual shortcut connection to match
                # changed dims
                x = resnet_layer(inputs=x,
                                 num_filters=num_filters,
                                 kernel_size=1,
                                 strides=strides,
                                 activation=None,
                                 batch_normalization=False)
            x = keras.layers.add([x, y])
            x = Activation('relu')(x)
        num_filters  = int(num_filters*1.5)

    # Add classifier on top.
    # v1 does not use BN after last shortcut connection-ReLU
    x = AveragePooling2D(pool_size=4)(x)
    y = Flatten()(x)
    outputs = Dense(num_classes,
                    activation='softmax',
                    kernel_initializer='he_normal')(y)

    # Instantiate model.
    model = Model(inputs=inputs, outputs=outputs)
    return model
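
The shapes that flow through `resnet_v1` can be traced by hand. The following is a minimal sketch, assuming a 16x16 input, the `num_filters = 64` starting value and the 1.5x growth used above. `stage_shapes` is a hypothetical helper for illustration and is not part of the original notebook.

```python
def stage_shapes(input_size=16, num_filters=64, num_stages=3, growth=1.5):
    """Return (feature map size, filter count) for each residual stage.

    Mirrors the loop in resnet_v1: every stage after the first halves the
    feature map with a stride-2 convolution, and the filter count is
    multiplied by `growth` after each stage.
    """
    shapes = []
    size = input_size
    for stage in range(num_stages):
        if stage > 0:
            size //= 2  # stride-2 convolution at the start of the stage
        shapes.append((size, num_filters))
        num_filters = int(num_filters * growth)
    return shapes


shapes = stage_shapes()
for stage, (size, filters) in enumerate(shapes):
    print('stage %d: %dx%d, %d filters' % (stage, size, size, filters))
```

Under these assumptions the last stage produces a 4x4 map, which `AveragePooling2D(pool_size=4)` reduces to 1x1 before the dense layer.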

3. We used the following data augmentation method, which gives the flexibility to manipulate mini-batches as necessary.

In [3]:
def creategen(X,Y,batch_size):
    while True:
        # shuffled indices
        #idx = np.random.permutation( X.shape[0])
        # create image generator
        datagen = ImageDataGenerator(
                
                featurewise_center=False,  # set input mean to 0 over the dataset
                samplewise_center=False,  # set each sample mean to 0
                featurewise_std_normalization=False,  # divide inputs by std of the dataset
                samplewise_std_normalization=False,  # divide each input by its std
                zca_whitening=False,  # apply ZCA whitening
                rotation_range=0,  # randomly rotate images in the range (degrees, 0 to 180)
                width_shift_range=0.1,  # randomly shift images horizontally (fraction of total width)
                height_shift_range=0.1,  # randomly shift images vertically (fraction of total height)
                horizontal_flip=True,  # randomly flip images
                vertical_flip=False)

        batches= datagen.flow( X, Y, batch_size=batch_size,shuffle=True)
       
        idx0 = 0
        for batch in batches:
            idx1 = idx0 + batch[0].shape[0]
            yield  batch[0], batch[1]

            idx0 = idx1
            if idx1 >= X.shape[0]:
                break
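
The pattern behind `creategen` is an endless generator that reshuffles the data on every pass and yields one mini-batch at a time. The sketch below shows that pattern with NumPy only. `simple_batches` is a hypothetical stand-in: it applies only a random horizontal flip and does not reproduce the shifts performed by `ImageDataGenerator`.

```python
import numpy as np


def simple_batches(X, Y, batch_size, seed=0):
    """Yield (images, labels) mini-batches forever, reshuffling every pass."""
    rng = np.random.RandomState(seed)
    num_samples = X.shape[0]
    while True:
        order = rng.permutation(num_samples)
        for start in range(0, num_samples, batch_size):
            idx = order[start:start + batch_size]
            x_batch = X[idx].copy()
            # Flip roughly half of the images along the width axis.
            flip = rng.rand(len(idx)) < 0.5
            x_batch[flip] = x_batch[flip, :, ::-1, :]
            yield x_batch, Y[idx]


# Toy data with the same layout as the DWT inputs: (N, 16, 16, 12).
X_demo = np.zeros((10, 16, 16, 12), dtype='float32')
Y_demo = np.arange(10)
gen = simple_batches(X_demo, Y_demo, batch_size=4)
sizes = [next(gen)[0].shape[0] for _ in range(3)]
print(sizes)
```

Because the generator never terminates, the training call has to say how many batches make up one epoch, which is the role of `steps_per_epoch`.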

4. We used an initial learning rate of 0.001, which is reduced progressively after epochs 80, 120, 160 and 180 over 200 epochs.

In [4]:
def lr_schedule(epoch):
    lr = 1e-3
    if epoch > 180:
        lr *= 0.5e-2
    elif epoch > 160:
        lr *= 1e-2
    elif epoch > 120:
        lr *= 1e-1
    elif epoch > 80:
        lr *= 0.5
    print('Learning rate: ', lr)
    return lr
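
To see the values this schedule produces, the sketch below evaluates it at the epochs around each threshold. `lr_at` is a hypothetical copy of `lr_schedule` without the `print` call, written only for this illustration.

```python
def lr_at(epoch, base_lr=1e-3):
    """Same thresholds and factors as lr_schedule, without the print."""
    if epoch > 180:
        return base_lr * 0.5e-2
    if epoch > 160:
        return base_lr * 1e-2
    if epoch > 120:
        return base_lr * 1e-1
    if epoch > 80:
        return base_lr * 0.5
    return base_lr


for epoch in (0, 80, 81, 121, 161, 181):
    print('epoch %3d -> lr %.1e' % (epoch, lr_at(epoch)))
```

The comparisons are strict, so epoch index 80 still uses the initial rate and the first reduction takes effect at index 81.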

We used the following methods for preprocessing.

In [8]:
from numpy.linalg import inv

#RGB2YCbCr - RGB to YCbCr conversion
def batchRGB2YCRCB(x_batch):
    alpha_R = 0.299
    alpha_G = 0.587
    alpha_B = 0.114
    x_batchnew = np.zeros((x_batch.shape)).astype('float32')
    for i in range(0,x_batch.shape[0]):
        #Y
        x_batchnew[i,:,:,0] = alpha_R*x_batch[i,:,:,0] + alpha_G*x_batch[i,:,:,1] + alpha_B*x_batch[i,:,:,2]
        #Cb
        x_batchnew[i,:,:,1] = (0.5/(1-alpha_B))*(x_batch[i,:,:,2]-x_batchnew[i,:,:,0])
        #Cr
        x_batchnew[i,:,:,2] = (0.5/(1-alpha_R))*(x_batch[i,:,:,0]-x_batchnew[i,:,:,0])
    return x_batchnew


#generate the matrix for CDF 9/7 transform
def getTcdf97(height):
    a1 = -1.586134342
    a2 = -0.05298011854
    a3 = 0.8829110762
    a4 = 0.4435068522

    # Scale coefficients:
    k1 = 0.8128662109 # ~ 1/1.230174104914, applied to even samples (0,2,4,6,...)
    k2 = 0.6149902344 # ~ 1.230174104914/2, applied to odd samples (1,3,5,7,...)
    X1 = np.identity(height)
    X2 = np.identity(height)
    X3 = np.identity(height)
    X4 = np.identity(height)
    X5 = np.zeros((height,height)).astype('float32')
    for col in range(1,height-2,2):
        X1[col-1,col]=X1[col+1,col]=a1
    X1[height-2,height-1] = 2*a1
    
    #print(X1)
    for col in range(2,height-1,2):
        X2[col-1,col]=X2[col+1,col]=a2
    X2[1,0] = 2*a2
    #print(X2)
    for col in range(1,height-2,2):
        X3[col-1,col]=X3[col+1,col]=a3
    X3[height-2,height-1] = 2*a3
    
    #print(X1)
    for col in range(2,height-1,2):
        X4[col-1,col]=X4[col+1,col]=a4
    X4[1,0] = 2*a4
    
    for col in range(0,height,1):
        if(col%2==0 ):
            #print(col)
            X5[col,int(col/2)]=k1
        else:
            X5[col,int(height/2 + (col-1)/2)]=k2
    #print(X3)
    X =np.matmul(np.matmul(np.matmul(np.matmul(X1,X2),X3),X4),X5)
    return X,inv(X)

#take Level 1 DWT
def batchwaveletcdf97mat(x_batch,X,dimhalf):
    x_batchnew = np.zeros((x_batch.shape[0],dimhalf,dimhalf,12)).astype('float32')
    for i in range(0,x_batch.shape[0]):
        for j in range(0,x_batch.shape[3]):
            coeff_array = np.matmul(np.matmul(X.transpose(),x_batch[i,:,:,j]),X)
            x_batchnew[i,:,:,j*4+0]=coeff_array[0:dimhalf,0:dimhalf]
            x_batchnew[i,:,:,j*4+1]=coeff_array[0:dimhalf,dimhalf:2*dimhalf]
            x_batchnew[i,:,:,j*4+2]=coeff_array[dimhalf:2*dimhalf,0:dimhalf]
            x_batchnew[i,:,:,j*4+3]=coeff_array[dimhalf:2*dimhalf,dimhalf:2*dimhalf]
    return x_batchnew
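
Each of the matrices `X1` to `X4` in `getTcdf97` encodes one lifting step, and `X5` scales the result and moves the low-pass samples to the first half. The sketch below applies the same steps in place to a single 1D signal, which may make the structure easier to follow. `cdf97_lifting_1d` is a hypothetical helper, and it assumes the exact scale factors `1/K` and `K/2` rather than the rounded `k1` and `k2` used above.

```python
import numpy as np

# Lifting and scaling constants, copied from getTcdf97.
A1, A2, A3, A4 = -1.586134342, -0.05298011854, 0.8829110762, 0.4435068522
K = 1.230174104914


def cdf97_lifting_1d(signal):
    """One CDF 9/7 level on an even-length 1D signal, via in-place lifting."""
    s = np.array(signal, dtype=np.float64)
    n = s.shape[0]

    def lift_odd(a):
        # Odd samples are updated from their even neighbours.
        for i in range(1, n - 2, 2):
            s[i] += a * (s[i - 1] + s[i + 1])
        s[n - 1] += 2 * a * s[n - 2]  # symmetric extension, right edge

    def lift_even(a):
        # Even samples are updated from their odd neighbours.
        for i in range(2, n - 1, 2):
            s[i] += a * (s[i - 1] + s[i + 1])
        s[0] += 2 * a * s[1]  # symmetric extension, left edge

    lift_odd(A1)
    lift_even(A2)
    lift_odd(A3)
    lift_even(A4)
    low = s[0::2] / K         # approximation (low-pass) coefficients
    high = s[1::2] * (K / 2)  # detail (high-pass) coefficients
    return low, high


low, high = cdf97_lifting_1d(np.full(32, 3.0))
print(np.abs(high).max(), low[:4])
```

For a constant input the detail coefficients vanish up to the rounding of the constants, and the approximation coefficients reproduce the input value. This gives a quick sanity check for the lifting constants.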

5. Load the dataset and preprocess it.

In [11]:
# Load the CIFAR10 data.
(X_train, y_train), (X_test, y_test) = cifar10.load_data()
num_classes = 10
#convert to float32
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')


#level offset
X_train = X_train - 128.0
X_test = X_test - 128.0

#RGB2YCbCr - This converts RGB images to YCbCr format to facilitate compression - optional
X_train = batchRGB2YCRCB(X_train)
X_test = batchRGB2YCRCB(X_test)

#generate necessary matrices for DWT cdf9/7 transformation
M,M_inv = getTcdf97(32)

#take level-1 DWT with CDF 9/7
x_train = batchwaveletcdf97mat(X_train.astype('float32'),M,16)
x_test = batchwaveletcdf97mat(X_test.astype('float32'),M,16)

### max normalization
x_train=x_train/np.max(np.abs(x_train))
x_test=x_test/np.max(np.abs(x_test))

input_shape = x_test.shape[1:]
print('input shape to resnet: ',input_shape)

input shape to resnet:  (16, 16, 12)
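
The 12 input channels come from three colour components with four subbands each. The sketch below lists which component and subband end up in each channel, following the `j*4+k` indexing in `batchwaveletcdf97mat`. It assumes the optional YCbCr conversion was applied; the subband labels (row half / column half of the coefficient array) are illustrative names, not identifiers from the original code.

```python
components = ['Y', 'Cb', 'Cr']
# Subband order written by batchwaveletcdf97mat for each component:
# (row half / column half) of the 32x32 coefficient array.
subbands = ['low/low', 'low/high', 'high/low', 'high/high']

layout = {}
for j, comp in enumerate(components):
    for k, band in enumerate(subbands):
        layout[j * 4 + k] = (comp, band)

for channel in sorted(layout):
    print(channel, layout[channel])
```

Channels 0, 4 and 8 therefore hold the low-pass approximations of Y, Cb and Cr, and the remaining channels hold the detail subbands.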


6. Convert the labels to one-hot encoding.

In [12]:
# Convert class vectors to binary class matrices.
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
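
For readers unfamiliar with one-hot encoding, the sketch below shows the same transformation in plain NumPy. `one_hot` is a hypothetical helper that is meant to behave like `to_categorical` for integer labels of shape `(N, 1)`; the three labels are made up for the example.

```python
import numpy as np


def one_hot(labels, num_classes):
    """Encode integer class labels as rows with a single 1.0 entry."""
    labels = np.asarray(labels).reshape(-1)
    encoded = np.zeros((labels.shape[0], num_classes), dtype='float32')
    encoded[np.arange(labels.shape[0]), labels] = 1.0
    return encoded


# CIFAR-10 labels arrive with shape (N, 1); three made-up labels here.
demo = one_hot(np.array([[3], [0], [9]]), num_classes=10)
print(demo.shape)
print(demo[0])
```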

7. Compile the model. We used the Adam optimizer.

In [13]:
batch_size = 32  
epochs = 200

model = resnet_v1(input_shape=input_shape, depth=depth,num_classes=num_classes)

model.compile(loss='categorical_crossentropy',optimizer=Adam(lr=lr_schedule(0)),metrics=['accuracy'])
model.summary()
print(model_type)

Learning rate:  0.001
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_1 (InputLayer)            (None, 16, 16, 12)   0                                            
__________________________________________________________________________________________________
conv2d_1 (Conv2D)               (None, 16, 16, 64)   6976        input_1[0][0]                    
__________________________________________________________________________________________________
batch_normalization_1 (BatchNor (None, 16, 16, 64)   256         conv2d_1[0][0]                   
__________________________________________________________________________________________________
activation_1 (Activation)       (None, 16, 16, 64)   0           batch_normalization_1[0][0]      
_______________________________________________________________________________________

8. Set callback methods

In [14]:
lr_scheduler = LearningRateScheduler(lr_schedule)

lr_reducer = ReduceLROnPlateau(factor=np.sqrt(0.1),
                               cooldown=0,
                               patience=5,
                               min_lr=0.5e-6)

callbacks = [lr_reducer, lr_scheduler]

9. Train the model. We used a server with a Titan V GPU.

In [15]:
# Fit the model on the batches generated by datagen.flow().
model.fit_generator(creategen(x_train, y_train, batch_size=batch_size),
                    steps_per_epoch=int(np.ceil(x_train.shape[0] / float(batch_size))),
                    epochs=epochs, verbose=1, workers=1,
                    callbacks=callbacks)

Epoch 1/200


Learning rate:  0.001
Epoch 2/200
Learning rate:  0.001
   9/1563 [..............................] - ETA: 22s - loss: 1.3493 - acc: 0.5903



 318/1563 [=====>........................] - ETA: 25s - loss: 1.3328 - acc: 0.5854

KeyboardInterrupt: 
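
The `1563` shown in the progress bar follows from the training set size and the batch size. A minimal sketch of that arithmetic, assuming the 50,000 CIFAR-10 training images and the batch size of 32 used here:

```python
import math

num_train = 50000  # size of the CIFAR-10 training set
batch_size = 32

# The last batch is smaller, so round up to cover every training image.
steps_per_epoch = int(math.ceil(num_train / float(batch_size)))
last_batch = num_train - (steps_per_epoch - 1) * batch_size
print(steps_per_epoch, last_batch)
```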

10. Evaluate the model on the test set.

In [None]:
start = time.time()
# Score trained model.
scores = model.evaluate(x_test, y_test, verbose=1)
print('time per image :',(time.time()-start)*1000/x_test.shape[0],' ms')
print('Test loss:', scores[0])
print('Test accuracy:', scores[1])