# DIFFERENTIAL METHODS 

Import some useful libraries for this notebook

In [2]:
import numpy as np
import matplotlib.pyplot as plt
from keras.datasets import cifar10
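import ssl
# Workaround: disable SSL certificate verification so the CIFAR-10 download succeeds.
# This is only acceptable for a local experiment.
ssl._create_default_https_context = ssl._create_unverified_context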

Throughout this notebook, we use the CIFAR-10 dataset for testing purposes. The script below loads the dataset into a training set and a test set. We will apply the differential methods to the training set.

In [3]:
# Load the raw CIFAR-10 data.
cifar10_dir = r'C:\Users\alexa\OneDrive\Bureau\projet clustering\kmeans_clusters\cluster_fold\C1'

# Clean up variables to avoid loading the data multiple times (which may cause memory issues)
try:
   del x_train, y_train
   del x_test, y_test
   print('Clear previously loaded data.')
except NameError:
   # Nothing was loaded yet
   pass

(x_train, y_train), (x_test, y_test) = cifar10.load_data()

# As a sanity check, we print out the size of the training and test data.
print('Training data shape: ', x_train.shape)
print('Training labels shape: ', y_train.shape)
print('Test data shape: ', x_test.shape)
print('Test labels shape: ', y_test.shape)

Training data shape:  (50000, 32, 32, 3)
Training labels shape:  (50000, 1)
Test data shape:  (10000, 32, 32, 3)
Test labels shape:  (10000, 1)


In [4]:
a=np.load('kmeans_cifar10.npy',allow_pickle=True)

array([[[[ 54,  31,  18],
         [ 59,  34,  19],
         [ 56,  37,  22],
         ...,
         [133,  92,  53],
         [131,  89,  49],
         [132,  94,  53]],

        [[ 59,  38,  26],
         [ 62,  39,  26],
         [ 55,  36,  25],
         ...,
         [193, 135,  77],
         [200, 144,  86],
         [197, 142,  86]],

        [[ 41,  26,  18],
         [ 41,  25,  16],
         [ 36,  21,  13],
         ...,
         [202, 144,  84],
         [196, 141,  81],
         [190, 136,  79]],

        ...,

        [[105, 161, 157],
         [ 97, 163, 162],
         [ 96, 170, 164],
         ...,
         [ 98, 145, 144],
         [100, 138, 135],
         [100, 130, 123]],

        [[ 86, 149, 143],
         [ 96, 157, 156],
         [ 97, 164, 160],
         ...,
         [ 94, 127, 122],
         [ 98, 126, 120],
         [100, 124, 114]],

        [[ 70, 136, 128],
         [ 92, 146, 145],
         [113, 169, 168],
         ...,
         [ 98, 121, 111],
        

## Min-max differential method

Suppose that the images in this set are **very similar**. We call `I_min` and `I_max` the pixel-wise minimum and maximum images of the set, respectively.
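
As a small illustration of these definitions, the sketch below computes the pixel-wise minimum and maximum of a toy stack of images. The array `stack` and its values are made up for the example; only the use of `np.min` and `np.max` along the image axis mirrors what this notebook does later.

```python
import numpy as np

# Hypothetical toy data: 3 single-channel "images" of shape (2, 2).
stack = np.array([
    [[10, 20], [30, 40]],
    [[12, 18], [33, 41]],
    [[11, 25], [29, 40]],
], dtype=np.uint8)

# Reduce over the image axis (axis 0) to get one min image and one max image.
I_min_toy = np.min(stack, axis=0)   # values: [[10, 18], [29, 40]]
I_max_toy = np.max(stack, axis=0)   # values: [[12, 25], [33, 41]]

print(I_min_toy)
print(I_max_toy)
```

Every pixel of every image in the stack lies between the corresponding pixels of the min and max images, which is what keeps both distances non-negative.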

**For the encoder**
- For each channel:
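    - By default, we start by storing the distance to `I_min`.
    - We keep storing the distance to `I_min` until, at some pixel, this distance is greater than the distance to `I_max`. From the next pixel on we switch to the distance to `I_max`, and we switch back in the symmetric case.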

**For the decoder**
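- We apply the same rule in reverse: each value is added to `I_min` or subtracted from `I_max`, and the decoded pixel then determines which reference the next pixel uses.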
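
To make the switching rule concrete, here is a minimal sketch on a flat sequence of four pixels. The helpers `encode_1d` and `decode_1d` and the toy values are hypothetical and exist only for this illustration; they are not the notebook's functions.

```python
def encode_1d(pixels, lo, hi, start_by_min=True):
    """Encode a flat pixel sequence with the min/max switching rule (toy helper)."""
    codes = []
    use_min = start_by_min
    for p, a, b in zip(pixels, lo, hi):
        d_min, d_max = p - a, b - p
        codes.append(d_min if use_min else d_max)
        # The next pixel uses the reference this pixel is closer to.
        use_min = d_min <= d_max
    return codes

def decode_1d(codes, lo, hi, start_by_min=True):
    """Invert encode_1d by replaying the same switching decisions (toy helper)."""
    pixels = []
    use_min = start_by_min
    for c, a, b in zip(codes, lo, hi):
        p = a + c if use_min else b - c
        pixels.append(p)
        use_min = (p - a) <= (b - p)
    return pixels

pixels = [12, 48, 47, 11]
lo = [10, 10, 10, 10]   # toy "min image"
hi = [50, 50, 50, 50]   # toy "max image"

codes = encode_1d(pixels, lo, hi)
print(codes)                     # [2, 38, 3, 39]
print(decode_1d(codes, lo, hi))  # [12, 48, 47, 11]
```

Note that the switch lags by one pixel: the value 48 is still stored as its distance to the minimum (38), and only the following pixel benefits from the switch to the maximum.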

Below is a naive approach, starting with the encoding of a single channel.

In [10]:
def channel_encoder_note(channel, min_channel, max_channel, start_by_min = True):
    H, W = channel.shape
    encoded_channel = np.zeros((H,W))
    # is_min selects the reference for the current pixel:
    # the min image if True, the max image otherwise
    is_min = start_by_min
    for h in range(H):
        for w in range(W):
            dis_to_min = channel[h,w] - min_channel[h,w]
            dis_to_max = max_channel[h,w] - channel[h,w]
            if is_min:
                encoded_channel[h,w] = dis_to_min
            else:
                encoded_channel[h,w] = dis_to_max

            # The next pixel uses whichever reference the current pixel is closer to
            is_min = dis_to_min <= dis_to_max

    return encoded_channel

def channel_decoder_note(encoded_channel, min_channel, max_channel, start_by_min = True):
    H, W = encoded_channel.shape
    decoded_channel = np.zeros((H,W))
    # Must start from the same reference as the encoder
    is_min = start_by_min

    for h in range(H):
        for w in range(W):
            if is_min:
                decoded_channel[h,w] = min_channel[h,w] + encoded_channel[h,w]
            else:
                decoded_channel[h,w] = max_channel[h,w] - encoded_channel[h,w]

            # Replay the encoder's decision using the pixel we just recovered
            dis_to_min = decoded_channel[h,w] - min_channel[h,w]
            dis_to_max = max_channel[h,w] - decoded_channel[h,w]
            is_min = (dis_to_min <= dis_to_max)

    return decoded_channel
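
In the encoder, the reference used for a pixel depends only on the previous pixel of the original channel, so the double loop can in principle be replaced by array operations. The following is a sketch under that observation, assuming integer pixel values; `channel_encoder_vectorized` is a hypothetical name, not part of this notebook. The decoder cannot be rewritten the same way, because each decoded pixel is needed before the next reference can be chosen.

```python
import numpy as np

def channel_encoder_vectorized(channel, min_channel, max_channel, start_by_min=True):
    # Work in a signed integer type so that uint8 inputs cannot wrap around.
    ch = np.asarray(channel, dtype=np.int64)
    dis_to_min = ch - np.asarray(min_channel, dtype=np.int64)
    dis_to_max = np.asarray(max_channel, dtype=np.int64) - ch

    # closer_to_min[k] decides the reference of pixel k + 1 (row-major order).
    closer_to_min = (dis_to_min <= dis_to_max).ravel()
    use_min = np.empty(closer_to_min.shape, dtype=bool)
    use_min[0] = start_by_min
    use_min[1:] = closer_to_min[:-1]

    encoded = np.where(use_min.reshape(ch.shape), dis_to_min, dis_to_max)
    return encoded.astype(np.float64)  # same dtype as the loop version's output

# Toy example: one row of four pixels, with constant min and max images.
row = np.array([[12, 48, 47, 11]], dtype=np.uint8)
row_min = np.full_like(row, 10)
row_max = np.full_like(row, 50)
encoded_row = channel_encoder_vectorized(row, row_min, row_max)
print(encoded_row)  # values 2, 38, 3, 39
```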

Here we test our implementation on a single channel of one image and check that the round trip is exact.

In [14]:
# Use an all-zeros channel as the minimum and an all-255 channel as the maximum
en = channel_encoder_note(x_train[0,:,:,0], np.zeros((32,32)), 255*np.ones((32,32)))
de = channel_decoder_note(en, np.zeros((32,32)), 255*np.ones((32,32)))
np.sum(de != x_train[0,:,:,0])

0

The decoded channel is exactly the same as the original channel, so we can go further and encode a whole set of images.

In [15]:
def MMD_Encoder_note(X, start_by_min = True):
    '''
    Min-max differential encoder
    Input: - X : numpy array of shape (N, H, W, C) where N is the number of images,
                 H is the height, W is the width and C is the number of channels.
                 We suppose that the images in X are very similar
           - start_by_min: boolean indicating whether we start with the difference
                 to the min image
    Return: Y, (I_min, I_max) where:
            - Y : numpy array of the same shape as the input, containing the differential parts
            - I_min : min image of shape (H, W, C)
            - I_max : max image of shape (H, W, C)
    '''
    N, C = X.shape[0], X.shape[3]
    Y = np.zeros(X.shape)
    I_min = np.min(X, axis=0)
    I_max = np.max(X, axis=0)

    for n in range(N):
        for c in range(C):
            Y[n,:,:,c] = channel_encoder_note(X[n,:,:,c], I_min[:,:,c], I_max[:,:,c],start_by_min)

    return Y, (I_min, I_max)

In [16]:
def MMD_Decoder_note(Y, I_min, I_max, start_by_min = True):
    '''
    Min-max differential decoder
    Input: - Y : numpy array of shape (N, H, W, C) containing the differential
                 parts, as returned by the encoder
           - I_min : min image of shape (H, W, C)
           - I_max : max image of shape (H, W, C)
           - start_by_min: boolean indicating whether we start with the difference
                 to the min image (must match the value used by the encoder)
    Return: - X : numpy array of shape (N, H, W, C) where N is the number of images,
                 H is the height, W is the width and C is the number of channels.
    '''
    N, C = Y.shape[0], Y.shape[3]
    X = np.zeros(Y.shape)

    for n in range(N):
        for c in range(C):
            X[n,:,:,c] = channel_decoder_note(Y[n,:,:,c],I_min[:,:,c], I_max[:,:,c],start_by_min)
    
    return X

All the pieces are now in place for the min-max differential method. Below we apply it to the training images selected by `a[1]`.

In [17]:
X_encoded, (I_min, I_max) = MMD_Encoder_note(x_train[a[1]])
X_decoded = MMD_Decoder_note(X_encoded, I_min, I_max)

We save the original and encoded data, check that decoding is lossless, and compare their sizes.

In [21]:
np.save("./saved_data/X_train_encoded.npy", X_encoded)
np.save("./saved_data/X_train.npy", x_train[a[1]] )

In [9]:
np.sum(X_decoded != x_train[a[1]])

0

In [25]:
X_encoded.size

3072

In [26]:
x_train[a[1]].size

3072

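The decoded data matches the original exactly, and the encoded array has the same number of elements as the input. Note that `.size` counts elements, not bytes.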
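
Since `.size` only counts elements, it cannot show a storage gain on its own. One possible way to compare actual storage is to look at `nbytes` and at the length of the compressed buffer. The sketch below uses synthetic arrays rather than the notebook's data, and it assumes `zlib` as a stand-in compressor; `compressed_nbytes` is a hypothetical helper. It also assumes that the encoded values are integers in the range 0 to 255, so that they can be cast back to `uint8` before saving.

```python
import zlib
import numpy as np

def compressed_nbytes(arr):
    """Number of bytes after zlib compression of the raw array buffer."""
    return len(zlib.compress(np.ascontiguousarray(arr).tobytes(), 9))

# Synthetic stand-ins: a smooth uint8 "image" and a float64 copy,
# mimicking the dtype that np.zeros produces by default.
original = np.tile(np.arange(32, dtype=np.uint8), (32, 1))
as_float = original.astype(np.float64)

print(original.size, as_float.size)      # same element count
print(original.nbytes, as_float.nbytes)  # 1024 vs 8192 bytes in memory

# Casting back to uint8 before compressing gives a like-for-like comparison.
print(compressed_nbytes(original), compressed_nbytes(as_float.astype(np.uint8)))
```

With this kind of measurement, the encoded array would be cast to `uint8` and compressed with the same settings as the original before the two byte counts are compared.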