## Mutual Information, Self-information and Split Learning

In this exercise we want to see how split learning helps privacy by decreasing the information content of a raw input. This is a simplified example, meant only to aid understanding of the concepts in this lesson.

We will use the MNIST data for this exercise. We first download and load the data, then load a small pretrained DNN. Our aim is to compare the information in the raw inputs of the MNIST test set with the information in the output of the final convolution layer, to see whether information is degraded.
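To make the two quantities concrete before turning to MNIST, here is a minimal sketch on a toy discrete variable. It is not the estimator used later in this exercise (which works on continuous, high-dimensional data); the helper names `entropy` and `mutual_information` are illustrative only. It shows that the mutual information of a variable with itself equals its entropy, and that a deterministic, lossy feature shares less information with the input.

```python
from collections import Counter
from math import log2


def entropy(xs):
    """Empirical Shannon entropy (in bits) of a list of discrete samples."""
    n = len(xs)
    return -sum((c / n) * log2(c / n) for c in Counter(xs).values())


def mutual_information(xs, ys):
    """Empirical mutual information via I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


# Toy "raw input": uniform over 8 values, so it carries 3 bits.
x = [0, 1, 2, 3, 4, 5, 6, 7]
# Toy "intermediate activation": a coarse feature that keeps only 1 bit.
z = [v // 4 for v in x]

h_x = entropy(x)
i_xx = mutual_information(x, x)  # self-information of x
i_xz = mutual_information(x, z)  # what z still reveals about x
print(h_x, i_xx, i_xz)  # 3.0 3.0 1.0
```

The rest of the exercise follows the same pattern: the image compared with itself gives the baseline, and the image compared with the activations gives what survives the first layers.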

#### imports

In [8]:
import torch
from torch import nn

from lenet_5 import LeNet5_5
from torchvision.datasets.mnist import MNIST
import torchvision.transforms as transforms
from torch.utils.data import DataLoader
import numpy as np



#### Load data

In [18]:
BATCH_SIZE = 256
BATCH_TEST_SIZE = 1024
data_train = MNIST('./data/mnist',
                   download=True,
                   transform=transforms.Compose([
                       transforms.Resize((32, 32)),
                       transforms.ToTensor()]))
data_test = MNIST('./data/mnist',
                  train=False,
                  download=True,
                  transform=transforms.Compose([
                      transforms.Resize((32, 32)),
                      transforms.ToTensor()]))
data_train_loader = DataLoader(data_train, batch_size = BATCH_SIZE , shuffle=True, num_workers=8)
data_test_loader = DataLoader(data_test,  batch_size = BATCH_TEST_SIZE, num_workers=8)
data_test_loader2 = DataLoader(data_test,  batch_size = 1, num_workers=0)

TRAIN_SIZE = len(data_train_loader.dataset)
TEST_SIZE = len(data_test_loader.dataset)
NUM_BATCHES = len(data_train_loader)
NUM_TEST_BATCHES = len(data_test_loader)

#### Load pre-trained model

In [19]:
model_loaded = LeNet5_5()
model_loaded.load_state_dict(torch.load("./LeNet-saved-5"))
criterion = nn.NLLLoss()

#### Validate

In [20]:
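def validate(net, criterion):
    """Evaluate net on the test set for the binary task 'digit > 5'."""
    net.eval()
    total_correct = 0
    avg_loss = 0.0
    with torch.no_grad():  # no gradients are needed for evaluation
        for images, labels in data_test_loader:
            # Binarize the labels: 1 for digits 6-9, 0 for digits 0-5.
            labels = (labels > 5).long()
            output = net(images)
            # criterion already averages over the batch, so this adds one
            # mean loss per batch.
            avg_loss += criterion(output, labels).item()
            pred = output.argmax(dim=1)
            total_correct += pred.eq(labels).sum().item()

    # Sum of per-batch mean losses divided by the number of test samples.
    avg_loss /= len(data_test)
    accuracy = total_correct / len(data_test)
    print('Test Avg. Loss: %f, Accuracy: %f' % (avg_loss, accuracy))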

#### Run validate to check the accuracy of the pretrained model

In [21]:
validate (model_loaded, criterion)

Test Avg. Loss: 0.000023, Accuracy: 0.992900


### Splitting and measuring information content

At this point, we want to split the network into two parts and observe how the information content of the original images differs from that of the intermediate activations. We chose the last convolution layer of the pretrained model as the split point. We will feed all the test data through the convolutional layers and save their outputs, so that we can later use them to quantitatively measure the bits of information.
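The idea of a split point can be sketched with a small stand-in network. The layer sizes below are an assumption chosen to resemble a LeNet-style model on 32x32 inputs; they are not taken from the `LeNet5_5` class used in this exercise, and the names `client_part` and `server_part` are hypothetical. The first part would run where the raw data lives, and only its output would be passed on to the second part.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical convolutional part (everything up to the split point).
client_part = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5),     # 1x32x32  -> 6x28x28
    nn.ReLU(),
    nn.MaxPool2d(2),                    # 6x28x28  -> 6x14x14
    nn.Conv2d(6, 16, kernel_size=5),    # 6x14x14  -> 16x10x10
    nn.ReLU(),
    nn.MaxPool2d(2),                    # 16x10x10 -> 16x5x5
    nn.Conv2d(16, 120, kernel_size=5),  # 16x5x5   -> 120x1x1
    nn.ReLU(),
)

# Hypothetical fully connected part (everything after the split point).
server_part = nn.Sequential(
    nn.Flatten(),
    nn.Linear(120, 84),
    nn.ReLU(),
    nn.Linear(84, 2),  # two classes, matching the binary task above
)

images = torch.rand(4, 1, 32, 32)  # random stand-in for a batch of inputs
with torch.no_grad():
    activations = client_part(images)  # the only thing that crosses the split
    logits = server_part(activations)

print(activations.shape)  # torch.Size([4, 120, 1, 1])
print(logits.shape)       # torch.Size([4, 2])
```

Each 1024-pixel input is reduced to 120 values at the split, which is the representation whose information content we measure next.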

#### Save the raw images and the intermediate activations

In [30]:
imgs = []
intermediate_activations = []

model_loaded.eval()
with torch.no_grad():  # no gradients are needed to record activations
    for images, _ in data_test_loader2:
        # Flatten each 1x32x32 image into a row vector of length 1024.
        imgs.append(images.numpy().reshape(1, -1))
        # Run only the convolutional part of the network (up to the split).
        x = model_loaded.convnet(images)
        intermediate_activations.append(x.numpy().reshape(1, -1))

# Save once, after the loop, as arrays of shape (num_samples, num_features).
np.save("images", np.concatenate(imgs, axis=0))
np.save("intermediate_act", np.concatenate(intermediate_activations, axis=0))



#### load the Information Toolbox

In [32]:
import sys
sys.path.insert(1,'./ite-repo')
import ite

#### Load the NumPy files and calculate mutual information

In [37]:
images_raw=np.load("images.npy")
print(images_raw.shape)
intermediate_activation=np.load("intermediate_act.npy")
print(intermediate_activation.shape)

(10000, 1024)
(10000, 120)


#### Mutual Information function

In [39]:
co = ite.cost.MIShannon_DKL()

In [40]:
ds = np.array([1024, 1024])
y = np.concatenate((images_raw, images_raw),axis=1)
print(y.shape)
i = co.estimation(y, ds) 
print(i)

(10000, 2048)
730.6892832072778


In [42]:
ds = np.array([1024, 120])
y = np.concatenate((images_raw, intermediate_activation),axis=1)
print(y.shape)
i = co.estimation(y, ds) 
print(i)

(10000, 1144)
303.22723470399853
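Both estimation cells above follow the same pattern: the two sample matrices are concatenated column-wise into `y`, and `ds` lists the width of each column block. A minimal sketch of that bookkeeping, using random stand-in data and a hypothetical helper `stack_for_mi` (it does not call the toolbox itself):

```python
import numpy as np


def stack_for_mi(x, z):
    """Hypothetical helper: build the (y, ds) pair used in the cells above.

    x and z hold one sample per row; y places them side by side and ds
    records how many columns belong to each of them.
    """
    if x.shape[0] != z.shape[0]:
        raise ValueError("x and z must have the same number of samples")
    y = np.concatenate((x, z), axis=1)
    ds = np.array([x.shape[1], z.shape[1]])
    return y, ds


rng = np.random.default_rng(0)
x = rng.random((5, 1024))  # stand-in for the flattened images
z = rng.random((5, 120))   # stand-in for the flattened activations

y, ds = stack_for_mi(x, z)
print(y.shape, ds)  # (5, 1144) [1024  120]
```

Keeping the two steps in one place makes it harder for `ds` and the columns of `y` to drift out of sync when the split point changes.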


### Observation and conclusion

The estimator reports about 730 bits for the raw image's mutual information with itself, which we use here as its self-information, whereas the intermediate activations share only about 303 bits with the raw image. In other words, the activations retain roughly 41% of the information that was originally in the raw image, so the first layers of the neural network alone have degraded more than half (about 58%) of it.
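The "more than half" figure can be checked directly from the two estimates printed above. This is plain arithmetic on the reported numbers; the variable names are illustrative.

```python
# Estimates copied from the outputs of the two estimation cells.
mi_image_image = 730.6892832072778        # image vs. itself
mi_image_activation = 303.22723470399853  # image vs. intermediate activations

retained = mi_image_activation / mi_image_image
lost = 1.0 - retained
print(f"retained: {retained:.1%}, lost: {lost:.1%}")
# retained: 41.5%, lost: 58.5%
```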