# Testing a network for jet images

This Jupyter notebook tests a pre-trained autoencoder on jet images. The input data is produced by examples/process_jet_images.

## Preparing the data

Note: Change the path below so that it points to your processed file.

In [1]:
import pandas as pd

# Change this path to point to where you have stored the data: 
pkl_path = '/Users/nallenallis/Documents/LTH/Exjobb/data/jet images/images/jet_images_LAGAN_images_signal.pkl'

# Read the .pkl-file with Pandas
df = pd.read_pickle(pkl_path)

In [2]:
# Split the DataFrame into two sets, in stored order (no shuffling).
split = round(0.8*len(df))
train = df[:split] # first 80 % of the rows
test = df[split:] # remaining 20 % of the rows
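
As a quick sanity check, the slicing above can be tried on a toy DataFrame. This is a minimal sketch: `toy_df` and its contents are made up for illustration and stand in for the real jet-image DataFrame.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the jet-image DataFrame: 10 rows, 4 columns.
toy_df = pd.DataFrame(np.arange(40).reshape(10, 4))

toy_split = round(0.8 * len(toy_df))  # 8 for a 10-row frame
toy_train = toy_df[:toy_split]        # first 8 rows
toy_test = toy_df[toy_split:]         # last 2 rows

print(len(toy_train), len(toy_test))
```

Because the slices follow the stored row order, the test set is always the tail of the file.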

## Setting up the network

### Inputting the data

We wrap the two datasets in PyTorch TensorDatasets and import the other classes we will need later.

In [3]:
import torch
import torch.nn as nn
import torch.optim as optim
import torch.utils.data
from torch.autograd import Variable

from torch.utils.data import TensorDataset
from torch.utils.data import DataLoader

from fastai import learner
from fastai.data import core

train_x = train
test_x = test
train_y = train_x  # y = x since we are building an autoencoder
test_y = test_x

# Convert the data to float tensors and wrap each (input, target) pair in a TensorDataset.
train_ds = TensorDataset(torch.tensor(train_x.values, dtype=torch.float), torch.tensor(train_y.values, dtype=torch.float))
valid_ds = TensorDataset(torch.tensor(test_x.values, dtype=torch.float), torch.tensor(test_y.values, dtype=torch.float))

We now set up the data loaders, using a batch size that was optimized by previous students. Note that this notebook uses fastai v2; the migration is thanks to Jessica Lastow.

In [4]:
bs = 256

# Wrap each TensorDataset in a DataLoader, then combine both into a single DataLoaders object
# (a basic wrapper around several DataLoader objects).
train_dl = DataLoader(train_ds, batch_size=bs, shuffle=True)
valid_dl = DataLoader(valid_ds, batch_size=bs * 2)
dls = core.DataLoaders(train_dl, valid_dl)
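
To see what the loaders yield, here is a minimal sketch with random data in place of the jet images. The names `toy_x`, `toy_ds`, and `toy_dl`, and the small batch size, are assumptions for illustration only. Each batch is an (input, target) pair, and for an autoencoder the two are identical.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical data: 10 random "images" with 625 features each.
toy_x = torch.rand(10, 625)

# Input and target are the same tensor, as in the autoencoder setup above.
toy_ds = TensorDataset(toy_x, toy_x)
toy_dl = DataLoader(toy_ds, batch_size=4, shuffle=True)

xb, yb = next(iter(toy_dl))
print(xb.shape)             # one batch of 4 rows with 625 features
print(torch.equal(xb, yb))  # inputs and targets match row by row
```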

### Preparing the network

Here we have an example network. The details are not too important, as long as they match the network that was already trained for us. In this case the network uses tanh activations (despite the LeakyReLU in the class name), and the layer sizes go from 625 (the number of input features) to 300 to 100 (the size of the hidden layer that we pick for testing compression) and then back through 300 to 625.

In [5]:
class AE_3D_200_LeakyReLU_test(nn.Module):
    def __init__(self, n_features=625):
        super(AE_3D_200_LeakyReLU_test, self).__init__()
        self.en1 = nn.Linear(n_features, 300)
        self.en2 = nn.Linear(300, 100)
        self.de1 = nn.Linear(100, 300)
        self.de2 = nn.Linear(300, n_features)
        #self.en1 = nn.Linear(n_features, 625)
        #self.en2 = nn.Linear(625, 625)
        #self.de1 = nn.Linear(625, 625)
        #self.de2 = nn.Linear(625, n_features)
        self.tanh = nn.Tanh()

    def encode(self, x):
        return self.en2(self.tanh(self.en1(x)))

    def decode(self, x):
        return self.de2(self.tanh(self.de1(self.tanh(x))))

    def forward(self, x):
        z = self.encode(x)
        return self.decode(z)

    def describe(self):
        return 'text'

model = AE_3D_200_LeakyReLU_test()
model.to('cpu')

AE_3D_200_LeakyReLU_test(
  (en1): Linear(in_features=625, out_features=300, bias=True)
  (en2): Linear(in_features=300, out_features=100, bias=True)
  (de1): Linear(in_features=100, out_features=300, bias=True)
  (de2): Linear(in_features=300, out_features=625, bias=True)
  (tanh): Tanh()
)
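
The layer sizes can be checked by pushing a dummy batch through the network. The following is a minimal sketch: `SketchAE` is a hypothetical, freshly initialised copy of the layer layout above (not the trained model), and the input is random noise.

```python
import torch
import torch.nn as nn

class SketchAE(nn.Module):
    """Hypothetical copy of the 625-300-100-300-625 layout, untrained."""
    def __init__(self, n_features=625):
        super().__init__()
        self.en1 = nn.Linear(n_features, 300)
        self.en2 = nn.Linear(300, 100)
        self.de1 = nn.Linear(100, 300)
        self.de2 = nn.Linear(300, n_features)
        self.tanh = nn.Tanh()

    def encode(self, x):
        return self.en2(self.tanh(self.en1(x)))

    def decode(self, z):
        return self.de2(self.tanh(self.de1(self.tanh(z))))

    def forward(self, x):
        return self.decode(self.encode(x))

sketch = SketchAE()
dummy = torch.rand(2, 625)     # two random inputs
latent = sketch.encode(dummy)  # compressed representation
recon = sketch(dummy)          # reconstruction

print(latent.shape, recon.shape)
print(625 / 100)               # compression factor of the bottleneck
```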

We now have to pick a loss function. MSE loss is appropriate for a compression autoencoder since it penalizes the difference between input and output, which is closely related to the relative residual (input - output)/input that we want to minimize.
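
To make the relation between MSE and the relative residual (input - output)/input concrete, the sketch below computes both for three made-up values. The numbers are illustrative assumptions, not taken from the jet images.

```python
import torch
import torch.nn as nn

# Made-up inputs and reconstructions, for illustration only.
x = torch.tensor([1.0, 2.0, 4.0])
y = torch.tensor([1.1, 1.8, 4.4])

# MSE: mean of the squared differences, (0.01 + 0.04 + 0.16) / 3 = 0.07.
mse_value = nn.MSELoss()(y, x).item()

# Relative residual per element: (input - output) / input.
rel_residual = (x - y) / x

print(mse_value)
print(rel_residual)
```

In this example every element has the same relative residual magnitude (0.1), yet the squared errors differ, so MSE weights large-valued entries more heavily than the relative residual does.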

In [6]:
from fastai.metrics import mse

loss_func = nn.MSELoss()
wd = 1e-6

learn = learner.Learner(dls, model=model, wd=wd, loss_func=loss_func)

## Loading the pre-trained network

In [7]:
#learn.load('bg_all_img_625-625-625-625')
learn.load('bg_all_img_625-300-100-300-625')

<fastai.learner.Learner at 0x1213d0370>

In [8]:
learn.validate()

(#1) [0.08032570034265518]
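
For intuition about the number returned here, the sketch below computes a validation loss by hand in plain PyTorch: the per-batch MSE is averaged over all samples, weighted by batch size. This is an assumption about what the reported value represents, not a copy of fastai's internals, and the model and data are untrained, random stand-ins.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Hypothetical stand-ins: a tiny linear model and 10 random samples.
toy_model = nn.Linear(5, 5)
toy_x = torch.rand(10, 5)
toy_dl = DataLoader(TensorDataset(toy_x, toy_x), batch_size=4)

loss_fn = nn.MSELoss()
total, n = 0.0, 0
toy_model.eval()
with torch.no_grad():  # no gradients are needed for evaluation
    for xb, yb in toy_dl:
        # Weight each batch loss by its size so the short last batch counts less.
        total += loss_fn(toy_model(xb), yb).item() * len(xb)
        n += len(xb)

manual_loss = total / n
print(manual_loss)
```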

## Preparing for analysis

In [9]:
import os

save_dir = "plotOutput"
if not os.path.exists(save_dir):
    os.makedirs(save_dir)

In [10]:
import numpy as np

model.to('cpu')

# Run the test set through the network, then convert inputs and predictions to NumPy arrays.
data = torch.tensor(test.values, dtype=torch.float)
pred = model(data)
pred = pred.detach().numpy()
data = data.detach().numpy()

data_df = pd.DataFrame(data, columns=test.columns)
pred_df = pd.DataFrame(pred, columns=test.columns)
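
With inputs and predictions available as arrays, a per-image reconstruction error takes one line of NumPy. The sketch below uses random arrays as stand-ins. The names are hypothetical, and the 25 x 25 image shape is an assumption based on 625 = 25 x 25 features per row.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for `data` and `pred`: 3 flattened images each.
toy_data = rng.random((3, 625))
toy_pred = toy_data + 0.1  # pretend every pixel is off by 0.1

# Mean squared error per image (one value per row).
per_image_mse = ((toy_data - toy_pred) ** 2).mean(axis=1)

# Reshape the first flattened image back to 2D for plotting (assumed 25 x 25).
first_image = toy_data[0].reshape(25, 25)

print(per_image_mse)
print(first_image.shape)
```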

In [11]:
pred_df.to_pickle(os.path.join(save_dir,'bg_all_img_625-300-100-300-625_output_signal.pkl'))
#pred_df.to_pickle(os.path.join(save_dir,'bg_all_img_625-625-625-625-625_output_signal.pkl'))
#pred_df.to_pickle(os.path.join(save_dir,'bg_all_img_625-300-100-300-625_output.pkl'))
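
A saved file can be read back with `pd.read_pickle` to confirm the round trip. This is a minimal sketch that uses a temporary directory, a made-up DataFrame, and a made-up file name rather than the real output.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical stand-in for pred_df.
toy_pred_df = pd.DataFrame(np.ones((2, 3)), columns=["a", "b", "c"])

with tempfile.TemporaryDirectory() as tmp_dir:
    toy_path = os.path.join(tmp_dir, "toy_output.pkl")
    toy_pred_df.to_pickle(toy_path)      # write the DataFrame to disk
    restored = pd.read_pickle(toy_path)  # read it back before the directory is removed

print(restored.equals(toy_pred_df))
```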